Overview#
Data Parallel Extension for Numba* (numba-dpex) is an extension to
the Numba* Python JIT compiler adding an architecture-agnostic kernel
programming API and a new front-end to compile the Data Parallel Extension
for Numpy* (dpnp) library. The dpnp Python library is a data-parallel
implementation of NumPy*’s API using the SYCL* language.
numba-dpex is an open-source project and can be installed as part of the Intel
AI Analytics Toolkit or the Intel Distribution for Python*. The package is
also available on Anaconda cloud and as a Docker image on GitHub. Please refer
to the Getting Started page to learn more.
Main Features#
Portable Kernel Programming#
The numba-dpex kernel programming API has a design similar to Numba’s
cuda.jit sub-module. The API is modeled after the SYCL* language and uses
the DPC++ SYCL runtime. Currently, compilation of kernels is supported for
SPIR-V-based OpenCL and oneAPI Level Zero devices, covering both CPUs and GPUs. In the
future, compilation support for other types of hardware that are supported by
DPC++ will be added.
The following example illustrates a vector addition kernel written using the
numba-dpex kernel API.
import dpnp
import numba_dpex as dpex


@dpex.kernel
def vecadd_kernel(a, b, c):
    i = dpex.get_global_id(0)
    c[i] = a[i] + b[i]


a = dpnp.ones(1024, device="gpu")
b = dpnp.ones(1024, device="gpu")
c = dpnp.empty_like(a)

vecadd_kernel[dpex.Range(1024)](a, b, c)
print(c)
In the above example, three arrays are allocated on a default gpu device
using the dpnp library. The arrays are then passed as input arguments to the
kernel function. The compilation target and the subsequent execution of the
kernel are determined by the input arguments and follow the
“compute-follows-data” programming model as specified in the Python* Array API
Standard. To change the execution target to a CPU, the device keyword needs to
be changed to cpu when allocating the dpnp arrays. It is also possible
to leave the device keyword undefined and let the dpnp library select a
default device based on environment flag settings. Refer to the
Kernel Programming Basics page for further details.
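The compute-follows-data rule can be sketched in plain Python. The sketch below is an illustrative model only, not numba-dpex internals: the `Array` class and the `infer_execution_device` helper are hypothetical stand-ins for `dpnp.ndarray` and the queue-inference step that happens at kernel launch.

```python
from dataclasses import dataclass


@dataclass
class Array:
    """Hypothetical stand-in for dpnp.ndarray, keeping only the
    device attribute that compute-follows-data inspects."""
    data: list
    device: str  # e.g. "gpu" or "cpu", mimicking dpnp's device keyword


def infer_execution_device(*arrays):
    """Return the common device of all arguments.

    Compute-follows-data requires every argument to live on the same
    device; mixing devices is an error rather than an implicit copy.
    """
    devices = {a.device for a in arrays}
    if len(devices) != 1:
        raise ValueError(f"arguments span multiple devices: {sorted(devices)}")
    return devices.pop()
```

For example, two arrays allocated with `device="gpu"` yield `"gpu"` as the execution target, while mixing a `"gpu"` array with a `"cpu"` array raises an error, which mirrors the behavior a kernel launch enforces on its dpnp arguments.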
dpjit decorator#
The numba-dpex package provides a new decorator dpjit that extends
Numba’s njit decorator. The new decorator is equivalent to
numba.njit(parallel=True), but additionally supports compiling dpnp
functions, prange loops, and array expressions that use dpnp.ndarray
objects.
Unlike Numba’s NumPy parallelization, which only supports CPUs, dpnp
expressions are first converted to data-parallel kernels and can then be
offloaded to different types of devices. As dpnp implements the same API
as NumPy*, an existing numba.njit decorated function that uses
numpy.ndarray may be refactored to use dpnp.ndarray and decorated with
dpjit. Such a refactoring can allow the parallel regions to be offloaded
to a supported GPU device, providing users an additional option to execute their
code in parallel.
The vector addition example depicted using the kernel API can also be
expressed in several different ways using dpjit.
import dpnp
from numba import prange

import numba_dpex as dpex


@dpex.dpjit
def vecadd_v1(a, b):
    return a + b


@dpex.dpjit
def vecadd_v2(a, b):
    return dpnp.add(a, b)


@dpex.dpjit
def vecadd_v3(a, b):
    c = dpnp.empty_like(a)
    for i in prange(a.shape[0]):
        c[i] = a[i] + b[i]
    return c
As with the kernel API example, a dpjit-decorated function, if invoked with
dpnp input arguments, follows the compute-follows-data programming model.
Refer to user_manual/dpnp_offload/index for further details.
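As a rough mental model for what dpjit does with an array expression such as `a + b`, the expression is decomposed into an independent per-element loop that can then be offloaded as a data-parallel kernel. The pure-Python sketch below illustrates that lowering only; the real compiler generates a SYCL kernel rather than a Python loop, and `vecadd_lowered` is a hypothetical name.

```python
def vecadd_lowered(a, b):
    """Illustrative lowering of the array expression "a + b" into an
    explicit elementwise loop, the form vecadd_v3 writes by hand."""
    c = [0.0] * len(a)
    # Each iteration is independent of the others, so the compiler is
    # free to execute them in parallel across the device.
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c
```

Because every iteration touches a distinct index, the loop body maps directly onto the one-work-item-per-element kernel shown earlier in the kernel API example.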