A range kernel represents the simplest form of parallelism that can be expressed in numba-dpex using kapi. Such a kernel represents a data-parallel execution over a set of work-items, with each work-item representing a logical thread of execution. The following example, vector addition using a range kernel, shows a range kernel written in numba-dpex.
import dpnp
import numba_dpex as dpex
from numba_dpex import kernel_api as kapi


# Data parallel kernel implementing vector sum
@dpex.kernel
def vecadd(item: kapi.Item, a, b, c):
    i = item.get_id(0)
    c[i] = a[i] + b[i]


N = 1024
a = dpnp.ones(N)
b = dpnp.ones_like(a)
c = dpnp.zeros_like(a)
dpex.call_kernel(vecadd, kapi.Range(N), a, b, c)
The example demonstrates the definition of the execution range in the
call_kernel call on line 17 and the extraction of each work-item's id, or
index position, via the item.get_id call on line 9. An execution range
comprising 1024 work-items is defined when calling the kernel, and each
work-item then executes a single addition.
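As a rough mental model, the launch above behaves like a loop that calls the kernel body once per work-item id. The sketch below is plain Python with lists and does not use numba-dpex at all; the function name `vecadd_work_item` is made up for illustration. On a device the work-items may run concurrently and in no particular order, so the sequential loop order here is incidental.

```python
# Pure-Python sketch of the range-kernel semantics (no numba-dpex involved).
# The name vecadd_work_item is illustrative only.


def vecadd_work_item(i, a, b, c):
    # Body executed by the single work-item whose id is i
    c[i] = a[i] + b[i]


N = 1024
a = [1.0] * N
b = [1.0] * N
c = [0.0] * N

# A 1-dimensional range of N work-items: one call per id.
for i in range(N):
    vecadd_work_item(i, a, b, c)
```

Every element of `c` is written by exactly one work-item, which is what makes the computation safe to run in parallel without synchronization.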
There are a few semantic rules that have to be adhered to when writing a range kernel:
Analogous to the SYCL API, a range kernel can execute only over a 1-, 2-, or 3-dimensional set of work-items.
Every range kernel requires its first argument to be an instance of the numba_dpex.kernel_api.Item class. The Item object is an abstraction encapsulating the index position (id) of a single work-item in the global execution range. The id will be a 1-, 2-, or 3-tuple depending on the dimensionality of the execution range.
A range kernel cannot return any value. Note that this rule is enforced only in compiled mode and not in the pure Python execution of a kapi kernel.
A kernel can accept both array and scalar arguments. Array arguments currently can be either a dpnp.ndarray or a dpctl.tensor.usm_ndarray. Scalar values can be of any Python numeric type. Array arguments are passed by reference, i.e., changes to an array in a kernel are visible outside the kernel. Scalar values are always passed by value.
At least one argument of a kernel should be an array. This requirement exists so that the kernel launcher (numba_dpex.core.kernel_launcher.call_kernel()) can determine the execution queue on which to launch the kernel. Refer to the Launching a kernel section for more details.
A range kernel has to be executed via the numba_dpex.core.kernel_launcher.call_kernel() function by passing in an instance of the numba_dpex.kernel_api.Range class. Refer to the Launching a kernel section for more details on how to launch a range kernel.
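The rules above can be modelled in a few lines of plain Python. This is only a sketch under stated assumptions: `SketchItem`, `launch_over_range`, and `scale_add` are hypothetical names invented for this illustration, not part of numba-dpex, and the code runs sequentially on the host using nested lists instead of device arrays. It shows a 2-dimensional range, an item passed as the first argument, a kernel that returns nothing, and the difference between an array argument (changes are visible to the caller) and a scalar argument (passed by value).

```python
from itertools import product


class SketchItem:
    """Hypothetical stand-in for an item: holds one work-item's id tuple."""

    def __init__(self, index):
        self._index = tuple(index)

    def get_id(self, dim):
        # Index position of this work-item along dimension `dim`
        return self._index[dim]


def launch_over_range(kernel_fn, extents, *args):
    """Hypothetical sequential launcher; not the numba-dpex call_kernel."""
    if not 1 <= len(extents) <= 3:
        raise ValueError("a range must have 1, 2, or 3 dimensions")
    # Visit every id in the range once; a device would do this in parallel.
    for index in product(*(range(n) for n in extents)):
        result = kernel_fn(SketchItem(index), *args)
        # This sketch chooses to reject return values on every call.
        if result is not None:
            raise TypeError("a range kernel cannot return a value")


def scale_add(item, a, b, out, alpha):
    # The item is the first argument; arrays and a scalar follow.
    i = item.get_id(0)
    j = item.get_id(1)
    out[i][j] = alpha * a[i][j] + b[i][j]
    alpha = 0.0  # rebinds only the local name; the caller's alpha is unchanged


rows, cols = 2, 3
a = [[1.0] * cols for _ in range(rows)]
b = [[1.0] * cols for _ in range(rows)]
out = [[0.0] * cols for _ in range(rows)]
alpha = 2.0

launch_over_range(scale_add, (rows, cols), a, b, out, alpha)
```

After the launch, `out` holds the results written by the work-items, while `alpha` in the calling scope keeps its original value.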
A range kernel is meant to express a basic parallel-for calculation that is ideally suited for embarrassingly parallel kernels such as element-wise computations over n-dimensional arrays (ndarrays). The API for expressing a range kernel does not allow advanced features such as synchronization of work-items and fine-grained control over memory allocation on a device. For such advanced features, an nd-range kernel should be used.
See also
Refer to the API documentation for numba_dpex.kernel_api.Range, numba_dpex.kernel_api.Item, and numba_dpex.core.kernel_launcher.call_kernel() for more details.