Writing SYCL Kernels
====================

Introduction
-------------

``numba-dpex`` offers a way of programming SYCL-supporting devices using
Python. Similar to SYCL's C++ programming model for heterogeneous computing,
the extension offers Python abstractions for expressing data-parallelism using
a hierarchical syntax. Note that not all SYCL concepts are currently supported
in the extension, and some of the concepts may not be a good fit for Python.

The explicit kernel programming mode closely resembles Numba's other GPU
backends: ``numba.cuda`` and ``numba.roc``. Readers who are familiar with
either of the existing Numba backends, or with OpenCL, CUDA, or SYCL
programming in general, should find writing kernels in ``numba-dpex``
intuitive. Irrespective of the reader's level of familiarity with GPU
programming frameworks, this documentation should serve as a guide to the
features currently available in the extension.

Kernel declaration
------------------

A kernel function is a device function that is meant to be called from host
code, where a device can be any SYCL-supported device such as a GPU, a CPU, or
an FPGA. The present focus of development is mainly on Intel's GPU hardware.
The main characteristics of a kernel function are:

- Kernels cannot explicitly return a value. All result data must be written to
  an array passed to the function (when computing a scalar, you will probably
  pass a one-element array).
- Kernels explicitly declare their thread hierarchy when called, i.e. the
  number of thread blocks and the number of threads per block. Note that while
  a kernel is compiled once, it can be called multiple times with different
  block sizes or grid sizes.

Example
~~~~~~~~~

.. literalinclude:: ../../../numba_dpex/examples/sum.py

Kernel invocation
------------------

A kernel is typically launched in the following way:

.. literalinclude:: ../../../numba_dpex/examples/sum.py
   :pyobject: driver

Indexing functions
------------------

Currently, the following indexing functions are supported. They have the same
semantics as their OpenCL counterparts.

- ``numba_dpex.get_local_id``
- ``numba_dpex.get_local_size``
- ``numba_dpex.get_group_id``
- ``numba_dpex.get_num_groups``
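
To make the OpenCL-style semantics of these functions concrete, the following is a minimal pure-Python sketch for the one-dimensional case. It does not call ``numba-dpex`` at all: ``WorkItem`` and ``launch_1d`` are hypothetical helpers introduced only for illustration, and the loop runs work items sequentially on the host. It assumes a zero global offset, in which case OpenCL defines the global index as ``group_id * local_size + local_id``. The sketch also shows a kernel writing its result into an output array rather than returning a value.

```python
# Pure-Python illustration of OpenCL-style work-item indexing (1-D only).
# WorkItem and launch_1d are hypothetical helpers, not numba-dpex APIs.
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    local_id: int    # index of the work item inside its group
    local_size: int  # number of work items per group
    group_id: int    # index of the group
    num_groups: int  # total number of groups

    def global_id(self) -> int:
        # OpenCL relationship, assuming a zero global offset.
        return self.group_id * self.local_size + self.local_id


def launch_1d(kernel, global_size, local_size, *args):
    """Call `kernel` once per work item, sequentially, on the host."""
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    num_groups = global_size // local_size
    for group_id in range(num_groups):
        for local_id in range(local_size):
            item = WorkItem(local_id, local_size, group_id, num_groups)
            kernel(item, *args)


def vector_sum(item, a, b, c):
    # A kernel returns nothing; it writes its result into the output array.
    i = item.global_id()
    c[i] = a[i] + b[i]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
c = [0] * len(a)

# 8 work items split into 2 groups of 4.
launch_1d(vector_sum, len(a), 4, a, b, c)
print(c)  # [11, 22, 33, 44, 55, 66, 77, 88]
```

In this sketch the work item with ``group_id`` 1 and ``local_id`` 3 handles element 7, the last one. On a real device the work items execute in parallel and in no guaranteed order, which is why each one writes only to its own output element.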