Stepping
Stepping lets you advance through the program one source line or one machine instruction at a time.
Consider the following examples.
numba_dpex/examples/debug/simple_sum.py:
15  import dpctl
16  import numpy as np
17
18  import numba_dpex as dppy
19
20
21  @dppy.kernel(debug=True)
22  def data_parallel_sum(a, b, c):
23      i = dppy.get_global_id(0)
24      c[i] = a[i] + b[i]  # Condition breakpoint location
25
26
27  global_size = 10
28  N = global_size
29
30  a = np.array(np.random.random(N), dtype=np.float32)
31  b = np.array(np.random.random(N), dtype=np.float32)
32  c = np.ones_like(a)
33
34  device = dpctl.SyclDevice("opencl:gpu")
35  with dpctl.device_context(device):
36      data_parallel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)
37
38  print("Done...")
Example with a nested function, numba_dpex/examples/debug/simple_dppy_func.py:
15  import dpctl
16  import numpy as np
17
18  import numba_dpex as dppy
19
20
21  @dppy.func(debug=True)
22  def func_sum(a_in_func, b_in_func):
23      result = a_in_func + b_in_func
24      return result
25
26
27  @dppy.kernel(debug=True)
28  def kernel_sum(a_in_kernel, b_in_kernel, c_in_kernel):
29      i = dppy.get_global_id(0)
30      c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
31
32
33  global_size = 10
34  a = np.arange(global_size, dtype=np.float32)
35  b = np.arange(global_size, dtype=np.float32)
36  c = np.empty_like(a)
37
38  device = dpctl.SyclDevice("opencl:gpu")
39  with dpctl.device_context(device):
40      kernel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)
41
42  print("Done...")
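When you inspect variables while stepping through kernel_sum, it helps to know which values to expect. The following is a minimal host-only sketch, not part of the numba_dpex examples: it emulates the kernel with a plain Python loop, where the hypothetical helper kernel_sum_work_item stands in for one work item and the loop index stands in for dppy.get_global_id(0). It assumes one work item per global id, as in the listing, and uses plain lists instead of NumPy arrays.

```python
# Host-only emulation of kernel_sum from simple_dppy_func.py.
# Assumption: each work item handles exactly one global id.


def func_sum(a_in_func, b_in_func):
    result = a_in_func + b_in_func
    return result


def kernel_sum_work_item(i, a_in_kernel, b_in_kernel, c_in_kernel):
    # `i` stands in for dppy.get_global_id(0) (hypothetical host substitute).
    c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])


global_size = 10
# Plain lists mirror np.arange(global_size, dtype=np.float32).
a = [float(x) for x in range(global_size)]
b = [float(x) for x in range(global_size)]
c = [0.0] * global_size

# On a device the work items run in parallel; here they run one by one.
for i in range(global_size):
    kernel_sum_work_item(i, a, b, c)

print(c)
```

Because a and b hold the same values, each c[i] should equal twice its index, which gives a quick sanity check for values printed in the debugger.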
step
Run the debugger and use the following commands:
$ NUMBA_OPT=0 gdb-oneapi -q python
(gdb) set breakpoint pending on
(gdb) break simple_sum.py:22
(gdb) run simple_sum.py
...
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0-7], __main__::data_parallel_sum () at simple_sum.py:22
22 i = dppy.get_global_id(0)
(gdb) step
[Switching to Thread 1.1073742080 lane 0]
Thread 2.3 hit Breakpoint 1, with SIMD lanes [0-1], __main__::data_parallel_sum () at simple_sum.py:22
22 i = dppy.get_global_id(0)
(gdb) step
23 c[i] = a[i] + b[i]
(gdb) continue
...
Done...
You can use step to enter a nested function. See the example below:
$ NUMBA_OPT=0 gdb-oneapi -q python
(gdb) set breakpoint pending on
(gdb) break simple_dppy_func.py:29
(gdb) run simple_dppy_func.py
...
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0-7], __main__::kernel_sum () at simple_dppy_func.py:29
29 c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) step
[Switching to Thread 1.1073742080 lane 0]
Thread 2.3 hit Breakpoint 1, with SIMD lanes [0-1], __main__::kernel_sum () at simple_dppy_func.py:29
29 c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) step
__main__::func_sum () at simple_dppy_func.py:22
22 result = a_in_func + b_in_func
(gdb) continue
...
Done...
stepi
The stepi command advances by one machine instruction. You can combine it with the x/i $pc command, which prints the instruction that will be executed next.
$ NUMBA_OPT=0 gdb-oneapi -q python
(gdb) set breakpoint pending on
(gdb) break simple_dppy_func.py:29
(gdb) run simple_dppy_func.py
...
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0-7], __main__::kernel_sum () at simple_dppy_func.py:29
29 c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) stepi
0x00000000fffeb630 29 c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) stepi
[Switching to Thread 1.1073742080 lane 0]
Thread 2.3 hit Breakpoint 1, with SIMD lanes [0-1], __main__::kernel_sum () at simple_dppy_func.py:29
29 c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) continue
...
Done...
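In the session above, the first stepi stays on line 29 because a single source line usually compiles to several instructions. The sketch below illustrates the same idea on the host with CPython bytecode and the standard dis module. This is only an analogy: GPU machine instructions differ from bytecode, and the function kernel_line is a hypothetical stand-in for the kernel body.

```python
import dis


def kernel_line(a, b, c, i):
    c[i] = a[i] + b[i]


code = kernel_line.__code__
# Map each bytecode offset that begins a source line to its line number.
line_starts = dict(dis.findlinestarts(code))

ops_per_line = {}
current_line = None
for instr in dis.get_instructions(code):
    if instr.offset in line_starts:
        current_line = line_starts[instr.offset]
    ops_per_line.setdefault(current_line, []).append(instr.opname)

# The body is the line right after the `def` line.
body_line = code.co_firstlineno + 1
body_ops = ops_per_line[body_line]
print(f"{len(body_ops)} bytecode instructions for one source line:")
for opname in body_ops:
    print("   ", opname)
```

The exact instruction names and their count depend on the Python version, but there is always more than one, so an instruction-level step does not necessarily move to a new source line.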
next
The next command behaves like step, but it steps over nested function calls instead of entering them.
$ NUMBA_OPT=0 gdb-oneapi -q python
(gdb) set breakpoint pending on
(gdb) break simple_dppy_func.py:29
(gdb) run simple_dppy_func.py
...
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0-7], __main__::kernel_sum () at simple_dppy_func.py:29
29 c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) next
[Switching to Thread 1.1073742080 lane 0]
Thread 2.3 hit Breakpoint 1, with SIMD lanes [0-1], __main__::kernel_sum () at simple_dppy_func.py:29
29 c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) next
...
Done...
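The difference between step and next can be illustrated on the host with CPython's sys.settrace hook. This is a sketch by analogy, not how gdb-oneapi works internally: the helper trace_lines and its enter_calls flag are hypothetical names, and kernel_body is a plain Python stand-in for the kernel.

```python
import sys


def func_sum(a_in_func, b_in_func):
    result = a_in_func + b_in_func
    return result


def kernel_body(a, b):
    c = func_sum(a, b)
    return c


def trace_lines(target, args, enter_calls):
    """Run target(*args) and return the function name for every line event.

    enter_calls=True mimics `step` (nested calls are traced);
    enter_calls=False mimics `next` (nested calls run untraced).
    """
    visited = []

    def local_tracer(frame, event, arg):
        if event == "line":
            visited.append(frame.f_code.co_name)
        return local_tracer

    def global_tracer(frame, event, arg):
        # Called once per new frame; returning None leaves that frame untraced.
        if frame.f_code is target.__code__ or enter_calls:
            return local_tracer
        return None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        target(*args)
    finally:
        sys.settrace(previous)
    return visited


step_like = trace_lines(kernel_body, (1.0, 2.0), enter_calls=True)
next_like = trace_lines(kernel_body, (1.0, 2.0), enter_calls=False)
print("step-like:", step_like)
print("next-like:", next_like)
```

The step-like trace includes lines inside func_sum, while the next-like trace only contains lines of kernel_body, even though func_sum still runs in both cases.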
set scheduler-locking step
In the sessions above, the first line of the kernel or function is reported twice. This happens because the program is multi-threaded, so the debugger receives events from several threads: after the first step, another thread hits the same breakpoint and the debugger switches to it. This is the default behavior. To make the current thread execute a single line without interference from other threads, set scheduler-locking to on or step:
$ NUMBA_OPT=0 gdb-oneapi -q python
(gdb) set breakpoint pending on
(gdb) break simple_dppy_func.py:29
(gdb) run simple_dppy_func.py
...
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0-7], __main__::kernel_sum () at simple_dppy_func.py:29
29 c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) set scheduler-locking step
(gdb) step
__main__::func_sum () at simple_dppy_func.py:22
22 result = a_in_func + b_in_func
(gdb) step
23 return result
(gdb) continue
...
[Switching to Thread 1.1073742080 lane 0]
Thread 2.3 hit Breakpoint 1, with SIMD lanes [0-1], __main__::kernel_sum () at simple_dppy_func.py:29
29 c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) continue
...
Done...
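The effect of scheduler locking can be summarized with a toy model. The sketch below is a deliberately simplified assumption, not the debugger's actual logic: the function step, the positions dictionary, and the line arithmetic are hypothetical, and each thread is reduced to the number of the line it is about to execute. The thread names are borrowed from the sessions above.

```python
BREAKPOINT_LINE = 29


def step(positions, current, scheduler_locking):
    """Return the (thread, line) pair a toy debugger reports after one step.

    positions maps a thread name to the line it is about to execute.
    """
    for name in positions:
        if scheduler_locking and name != current:
            continue  # other threads stay frozen during the step
        positions[name] += 1
    # A different thread arriving at the breakpoint interrupts the step.
    for name, line in positions.items():
        if name != current and line == BREAKPOINT_LINE:
            return name, line
    return current, positions[current]


# Thread 2.2 is stopped at the breakpoint; thread 2.3 has not reached it yet.
unlocked = step({"2.2": 29, "2.3": 28}, current="2.2", scheduler_locking=False)
locked = step({"2.2": 29, "2.3": 28}, current="2.2", scheduler_locking=True)
print("without locking:", unlocked)
print("with locking:   ", locked)
```

Without locking, the second thread reaches the breakpoint during the step and the model reports it at line 29 again. With locking, only the current thread advances, which matches the behavior shown in the session above.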
See also: