Writing SYCL Kernels
Introduction
Numba-dpex offers a way to write data-parallel kernels directly in Python. This compiler extension to Numba has a programming model similar to that of the SYCL C++ domain-specific language. By providing abstractions similar to SYCL's, it lets Python programmers express data parallelism using a hierarchical syntax. Note that numba-dpex does not currently support all SYCL concepts.
The explicit kernel programming mode of numba-dpex bears similarities to
Numba’s other GPU backends, numba.cuda and numba.roc. This documentation
serves as a guide to the kernel programming features currently
available in numba-dpex.
Kernel declaration
A kernel function is a device function that is meant to be called from host code, where a device can be any SYCL-supported device such as a GPU, a CPU, or an FPGA. Development of numba-dpex currently focuses mainly on Intel’s GPU hardware. The main characteristics of a kernel function are:
kernels cannot explicitly return a value; all result data must be written to an array passed to the function (to compute a scalar, you will typically pass a one-element array)
kernels explicitly declare their thread hierarchy when called, that is, the number of thread blocks and the number of threads per block (work-groups and work-items per work-group in SYCL terminology). Note that while a kernel is compiled once, it can be called multiple times with different block sizes or grid sizes.
Example
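As a rough illustration of the two characteristics listed above, the following sketch emulates the kernel contract in plain Python. It does not use the numba-dpex API: the names scale_kernel and max_kernel are hypothetical, and the explicit argument i is an assumption standing in for the work-item index that a real kernel obtains from the runtime.

```python
# Plain-Python emulation of the kernel contract; this is NOT numba-dpex code.

def scale_kernel(i, src, dst, factor):
    """Body executed by one work-item; `i` is that work-item's index."""
    dst[i] = src[i] * factor  # the result is written to the output array
    # There is no return statement: a kernel does not return a value.


def max_kernel(i, src, result):
    """A scalar result is delivered through a one-element output array."""
    # On a real device the work-items run concurrently, so this update would
    # need an atomic operation or a reduction. The emulation is sequential.
    if src[i] > result[0]:
        result[0] = src[i]


src = [1.0, 2.0, 3.0, 4.0]
dst = [0.0] * len(src)
result = [float("-inf")]  # one-element array holding the scalar result

# Stand-in for the device executing every work-item of the launch.
for i in range(len(src)):
    scale_kernel(i, src, dst, 2.0)
    max_kernel(i, src, result)

print(dst)
print(result[0])
```

The caller reads the results from dst and result after the loop finishes, in the same way that host code reads back the arrays it passed to a kernel.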
Kernel invocation
A kernel is typically launched in the following way:
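What a launch does can be sketched in plain Python, without the numba-dpex API. The launch helper below is hypothetical: it takes a global size (total number of work-items) and a local size (work-items per group), and runs the kernel body once per work-item. The divisibility check mirrors the SYCL nd_range rule that the global range be a multiple of the local range; whether a given numba-dpex version enforces it in the same way is an assumption.

```python
# Plain-Python emulation of a kernel launch; this is NOT numba-dpex code.

def launch(kernel, global_size, local_size, *args):
    """Hypothetical launcher: run `kernel` once per work-item, group by group."""
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    for group in range(global_size // local_size):
        for local in range(local_size):
            # Each work-item receives its global index.
            kernel(group * local_size + local, *args)


def add_kernel(i, a, b, c):
    """Kernel body for one work-item: element-wise addition."""
    c[i] = a[i] + b[i]


a = list(range(8))
b = [10] * 8
c1 = [0] * 8
c2 = [0] * 8

# The same kernel is launched twice with different group sizes.
launch(add_kernel, 8, 4, a, b, c1)  # 2 groups of 4 work-items
launch(add_kernel, 8, 2, a, b, c2)  # 4 groups of 2 work-items

print(c1)
print(c2)
```

Both launches produce the same output because the kernel body depends only on the global index; the group size changes how the work is partitioned, not what is computed.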
Indexing functions
Numba-dpex provides the following indexing functions that have OpenCL-like semantics:
numba_dpex.get_local_id
numba_dpex.get_local_size
numba_dpex.get_group_id
numba_dpex.get_num_groups
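The relationship between these values can be sketched in plain Python for a one-dimensional launch. The work_items generator and its dictionary keys are hypothetical and only mimic what get_local_id, get_local_size, get_group_id, and get_num_groups would report under OpenCL-style semantics; the global index is derived as group_id * local_size + local_id, assuming a zero global offset.

```python
# Plain-Python emulation of OpenCL-style work-item indexing (1-D case);
# this is NOT numba-dpex code.

def work_items(global_size, local_size):
    """Yield the index values each work-item would observe."""
    num_groups = global_size // local_size
    for group_id in range(num_groups):
        for local_id in range(local_size):
            yield {
                "local_id": local_id,      # position inside the group
                "local_size": local_size,  # work-items per group
                "group_id": group_id,      # which group this item is in
                "num_groups": num_groups,  # total number of groups
                # Derived global index, assuming a zero global offset.
                "global_id": group_id * local_size + local_id,
            }


items = list(work_items(global_size=6, local_size=3))
for item in items:
    print(item)
```

With a global size of 6 and a local size of 3 there are two groups, and the derived global indices cover 0 through 5 exactly once.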