Writing SYCL Kernels

Introduction

Numba-dpex lets you write data-parallel kernels directly in Python. This compiler extension to Numba has a programming model similar to that of the SYCL C++ domain-specific language. Because it provides abstractions similar to SYCL's, Python programmers can use the compiler to express data parallelism with a hierarchical syntax. Note that numba-dpex does not currently support all SYCL concepts.

The explicit kernel programming mode of numba-dpex is similar to that of Numba’s other GPU backends, numba.cuda and numba.roc. This documentation serves as a guide to the kernel programming features currently available in numba-dpex.

Kernel declaration

A kernel function is a device function that is meant to be called from host code, where a device can be any SYCL-supported device such as a GPU, a CPU, or an FPGA. Development of numba-dpex currently focuses mainly on Intel’s GPU hardware. The main characteristics of a kernel function are:

  • kernels cannot explicitly return a value; all result data must be written to an array passed to the function (if computing a scalar, you will probably pass a one-element array)

  • kernels explicitly declare their thread hierarchy when called, i.e. the number of thread blocks and the number of threads per block (in SYCL terms, the number of work-groups and the number of work-items per work-group). A kernel is compiled once, but it can be called multiple times with different block sizes or grid sizes.

Example
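The two characteristics listed above can be illustrated without a device. What follows is a minimal pure-Python sketch that emulates the kernel model; it is not numba-dpex code. The helper `launch` and the kernel names are hypothetical, and work items run one after another here, whereas a real device runs them in parallel.

```python
# Pure-Python emulation of the kernel model (assumption: illustration only,
# no numba-dpex import; `launch` is a hypothetical stand-in for a launcher).

def vector_add(i, a, b, c):
    """Kernel body for work item `i`: writes its result, returns nothing."""
    c[i] = a[i] + b[i]


def first_plus_last(i, a, out):
    """Scalar result: a single work item writes into a one-element array."""
    if i == 0:
        out[0] = a[0] + a[-1]


def launch(kernel, global_size, *args):
    """Call `kernel` once per work item, sequentially."""
    for i in range(global_size):
        result = kernel(i, *args)
        # A kernel communicates only through the arrays it is given.
        assert result is None, "kernels must not return a value"


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
c = [0.0] * len(a)   # output array, filled in by the kernel
out = [0.0]          # one-element array that holds a scalar result

launch(vector_add, len(a), a, b, c)
launch(first_plus_last, 1, a, out)

print(c)    # [11.0, 22.0, 33.0, 44.0]
print(out)  # [5.0]
```

In an actual numba-dpex kernel the work-item index would come from an indexing function rather than from an explicit argument; the explicit `i` is a simplification made for this sketch.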

Kernel invocation

A kernel is typically launched in the following way:
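As a rough illustration of a launch configuration, the sketch below emulates a bracket-style launch in plain Python. The class `EmulatedKernel` is hypothetical, and the `kernel[global_size, local_size](args)` form is assumed here by analogy with Numba's other GPU backends; consult the numba-dpex API for the exact launch arguments it accepts.

```python
# Pure-Python emulation of a launch configuration (assumption: the bracket
# syntax mirrors Numba's GPU backends; `EmulatedKernel` is hypothetical).

class EmulatedKernel:
    def __init__(self, body):
        self.body = body  # defined once, launched many times

    def __getitem__(self, config):
        global_size, local_size = config
        # SYCL requires the global size to be a multiple of the local size.
        if global_size % local_size != 0:
            raise ValueError("global size must be a multiple of local size")

        def run(*args):
            # Sequential stand-in for the parallel execution on a device.
            for global_id in range(global_size):
                self.body(global_id, *args)

        return run


def scale(global_id, data, factor):
    """Kernel body: multiply one element in place."""
    data[global_id] = data[global_id] * factor


scale_kernel = EmulatedKernel(scale)
data = [1, 2, 3, 4, 5, 6, 7, 8]

scale_kernel[8, 4](data, 2)    # 8 work items in 2 groups of 4
scale_kernel[4, 2](data, 10)   # same kernel, 4 work items in 2 groups of 2

print(data)  # [20, 40, 60, 80, 10, 12, 14, 16]
```

The second launch touches only the first four elements, which shows that the launch configuration, not the kernel body, decides how many work items run.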

Indexing functions

Numba-dpex provides the following indexing functions that have OpenCL-like semantics:

  • numba_dpex.get_local_id

  • numba_dpex.get_local_size

  • numba_dpex.get_group_id

  • numba_dpex.get_num_groups
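With OpenCL-like semantics, the values these functions return for a one-dimensional launch are related by global id = group id × local size + local id. The plain-Python sketch below enumerates those values for a small launch; the helper `work_item_indices` is hypothetical and does not call numba-dpex.

```python
# Pure-Python enumeration of 1-D work-item indices (assumption: illustration
# only; `work_item_indices` is a hypothetical helper, not a numba-dpex API).

def work_item_indices(global_size, local_size):
    """Yield (global_id, group_id, local_id) for every work item."""
    num_groups = global_size // local_size      # what get_num_groups reports
    for group_id in range(num_groups):          # what get_group_id reports
        for local_id in range(local_size):      # what get_local_id reports
            global_id = group_id * local_size + local_id
            yield global_id, group_id, local_id


rows = list(work_item_indices(global_size=6, local_size=3))
for global_id, group_id, local_id in rows:
    print(global_id, group_id, local_id)
# 0 0 0
# 1 0 1
# 2 0 2
# 3 1 0
# 4 1 1
# 5 1 2
```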