Stepping

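Stepping lets you advance through a program one source line or one machine instruction at a time.

Consider the following examples. The line numbers in the debugger sessions refer to the complete source files, so they do not match the line positions in the listings shown here.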

numba_dpex/examples/debug/simple_sum.py:

import dpctl
import numpy as np

import numba_dpex as dppy


@dppy.kernel(debug=True)
def data_parallel_sum(a, b, c):
    i = dppy.get_global_id(0)
    c[i] = a[i] + b[i]  # Condition breakpoint location


global_size = 10
N = global_size

a = np.array(np.random.random(N), dtype=np.float32)
b = np.array(np.random.random(N), dtype=np.float32)
c = np.ones_like(a)

device = dpctl.SyclDevice("opencl:gpu")
with dpctl.device_context(device):
    data_parallel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)

print("Done...")

Example with a nested function, numba_dpex/examples/debug/simple_dppy_func.py:

import dpctl
import numpy as np

import numba_dpex as dppy


@dppy.func(debug=True)
def func_sum(a_in_func, b_in_func):
    result = a_in_func + b_in_func
    return result


@dppy.kernel(debug=True)
def kernel_sum(a_in_kernel, b_in_kernel, c_in_kernel):
    i = dppy.get_global_id(0)
    c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])


global_size = 10
a = np.arange(global_size, dtype=np.float32)
b = np.arange(global_size, dtype=np.float32)
c = np.empty_like(a)

device = dpctl.SyclDevice("opencl:gpu")
with dpctl.device_context(device):
    kernel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)

print("Done...")

`step`

Run the debugger and use the following commands:

$ NUMBA_OPT=0 gdb-oneapi -q python
(gdb) set breakpoint pending on
(gdb) break simple_sum.py:22
(gdb) run simple_sum.py
...
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0-7], __main__::data_parallel_sum () at simple_sum.py:22
22          i = dppy.get_global_id(0)
(gdb) step
[Switching to Thread 1.1073742080 lane 0]
Thread 2.3 hit Breakpoint 1, with SIMD lanes [0-1], __main__::data_parallel_sum () at simple_sum.py:22
22          i = dppy.get_global_id(0)
(gdb) step
23          c[i] = a[i] + b[i]
(gdb) continue
...
Done...
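
To repeat the session above without retyping each command, the same gdb commands can be passed on the command line. The following is a minimal sketch, not part of numba-dpex: the helper names build_gdb_command and build_environment are hypothetical, and it assumes gdb-oneapi accepts the standard gdb -ex option. It only builds and prints the command line; it does not start the debugger.

```python
import os
import shlex


def build_gdb_command(script, line):
    """Return the argv list for a gdb-oneapi session that stops at script:line.

    Hypothetical helper: it only assembles the commands used in the session above.
    """
    gdb_commands = [
        "set breakpoint pending on",
        f"break {script}:{line}",
        f"run {script}",
    ]
    argv = ["gdb-oneapi", "-q"]
    for command in gdb_commands:
        # Assumption: gdb-oneapi supports gdb's -ex option (run one command at startup).
        argv += ["-ex", command]
    argv.append("python")  # the program being debugged
    return argv


def build_environment():
    """Copy the current environment and set NUMBA_OPT=0, as in the session above."""
    env = dict(os.environ)
    env["NUMBA_OPT"] = "0"
    return env


argv = build_gdb_command("simple_sum.py", 22)
print("NUMBA_OPT=0", shlex.join(argv))
```

To launch the session, the list and environment could be handed to subprocess.run(argv, env=build_environment()).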

You can use step to enter a nested function. See the example below:

$ NUMBA_OPT=0 gdb-oneapi -q python
(gdb) set breakpoint pending on
(gdb) break simple_dppy_func.py:29
(gdb) run simple_dppy_func.py
...
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0-7], __main__::kernel_sum () at simple_dppy_func.py:29
29          c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) step
[Switching to Thread 1.1073742080 lane 0]
Thread 2.3 hit Breakpoint 1, with SIMD lanes [0-1], __main__::kernel_sum () at simple_dppy_func.py:29
29          c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) step
__main__::func_sum () at simple_dppy_func.py:22
22          result = a_in_func + b_in_func
(gdb) continue
...
Done...
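
To make the call structure behind this session concrete, the sketch below emulates simple_dppy_func.py on the host in plain Python. It is an illustration only: it does not use numba_dpex or a device, the work-item index is passed in explicitly instead of coming from dppy.get_global_id(0), and Python lists stand in for the NumPy arrays. It shows which values to expect in func_sum when stepping into it for a given work-item.

```python
# Host-only emulation of simple_dppy_func.py; no numba_dpex or device required.
# Each loop iteration stands in for one work-item of the kernel.


def func_sum(a_in_func, b_in_func):
    # Same body as the device function that `step` enters.
    result = a_in_func + b_in_func
    return result


def kernel_sum(i, a_in_kernel, b_in_kernel, c_in_kernel):
    # Assumption: `i` replaces dppy.get_global_id(0) and is passed in explicitly.
    c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])


global_size = 10
a = [float(i) for i in range(global_size)]  # stands in for np.arange(global_size)
b = [float(i) for i in range(global_size)]
c = [0.0] * global_size

# Sequential stand-in for the parallel kernel launch.
for work_item in range(global_size):
    kernel_sum(work_item, a, b, c)

print(c)
```

Because a and b hold the same values, each element of c is twice its index.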

`stepi`

The stepi command advances by one machine instruction instead of one source line. You can pair it with the x/i $pc command, which prints the instruction about to be executed.

$ NUMBA_OPT=0 gdb-oneapi -q python
(gdb) set breakpoint pending on
(gdb) break simple_dppy_func.py:29
(gdb) run simple_dppy_func.py
...
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0-7], __main__::kernel_sum () at simple_dppy_func.py:29
29          c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) stepi
0x00000000fffeb630      29          c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) stepi
[Switching to Thread 1.1073742080 lane 0]
Thread 2.3 hit Breakpoint 1, with SIMD lanes [0-1], __main__::kernel_sum () at simple_dppy_func.py:29
29          c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) continue
...
Done...

`next`

The next command behaves like step, but it steps over nested function calls instead of entering them.

$ NUMBA_OPT=0 gdb-oneapi -q python
(gdb) set breakpoint pending on
(gdb) break simple_dppy_func.py:29
(gdb) run simple_dppy_func.py
...
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0-7], __main__::kernel_sum () at simple_dppy_func.py:29
29          c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) next
[Switching to Thread 1.1073742080 lane 0]
Thread 2.3 hit Breakpoint 1, with SIMD lanes [0-1], __main__::kernel_sum () at simple_dppy_func.py:29
29          c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) next
...
Done...

`set scheduler-locking step`

In the sessions above, the first line of the kernel and functions is reported twice. This happens because you are debugging a multi-threaded program, so the debugger may receive stop events from several threads that reach the same line. This is the default behavior, but you can change it for more efficient debugging. To make the current thread execute a single line without interference from other threads, set scheduler-locking to on or step:

$ NUMBA_OPT=0 gdb-oneapi -q python
(gdb) set breakpoint pending on
(gdb) break simple_dppy_func.py:29
(gdb) run simple_dppy_func.py
...
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0-7], __main__::kernel_sum () at simple_dppy_func.py:29
29          c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) set scheduler-locking step
(gdb) step
__main__::func_sum () at simple_dppy_func.py:22
22          result = a_in_func + b_in_func
(gdb) step
23          return result
(gdb) continue
...
[Switching to Thread 1.1073742080 lane 0]
Thread 2.3 hit Breakpoint 1, with SIMD lanes [0-1], __main__::kernel_sum () at simple_dppy_func.py:29
29          c_in_kernel[i] = func_sum(a_in_kernel[i], b_in_kernel[i])
(gdb) continue
...
Done...
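
The two stops reported in the sessions, with SIMD lanes [0-7] and [0-1], are consistent with the 10 work-items being split across threads of 8 lanes each. The sketch below reproduces that arithmetic. The SIMD width of 8 is an assumption inferred from the debugger output, not a documented property of the device, and partition_lanes is a hypothetical helper rather than a numba-dpex or gdb-oneapi function.

```python
def partition_lanes(global_size, simd_width):
    """Return the active lane range (first, last) of each thread needed to cover global_size work-items."""
    lane_ranges = []
    remaining = global_size
    while remaining > 0:
        # A thread uses all lanes unless fewer work-items are left.
        active = min(simd_width, remaining)
        lane_ranges.append((0, active - 1))
        remaining -= active
    return lane_ranges


# Assumption: SIMD width 8, inferred from "SIMD lanes [0-7]" in the sessions.
ranges = partition_lanes(10, 8)
for first, last in ranges:
    print(f"SIMD lanes [{first}-{last}]")
```

With two threads reaching the breakpoint, the debugger has two stop events to report, which matches the repeated breakpoint line seen when scheduler locking is off.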
