Overview#

Data Parallel Extension for Numba* (numba-dpex) is an extension to the Numba* Python JIT compiler that adds an architecture-agnostic kernel programming API and a new front-end to compile the Data Parallel Extension for NumPy* (dpnp) library. The dpnp Python library is a data-parallel implementation of NumPy*’s API using the SYCL* language.

numba-dpex is an open-source project and can be installed as part of the Intel AI Analytics Toolkit or the Intel Distribution for Python*. The package is also available on the Anaconda cloud and as a Docker image on GitHub. Please refer to the Getting Started page to learn more.

Main Features#

Portable Kernel Programming#

The numba-dpex kernel programming API has a design similar to Numba’s cuda.jit sub-module. The API is modeled after the SYCL* language and uses the DPC++ SYCL runtime. Currently, kernel compilation is supported for SPIR-V-based OpenCL and oneAPI Level Zero devices, covering both CPUs and GPUs. In the future, compilation support will be added for other types of hardware that are supported by DPC++.

The following example illustrates a vector addition kernel written with the numba-dpex kernel API.

import dpnp
import numba_dpex as dpex


@dpex.kernel
def vecadd_kernel(a, b, c):
    i = dpex.get_global_id(0)
    c[i] = a[i] + b[i]


a = dpnp.ones(1024, device="gpu")
b = dpnp.ones(1024, device="gpu")
c = dpnp.empty_like(a)

vecadd_kernel[dpex.Range(1024)](a, b, c)
print(c)

In the above example, three arrays are allocated on a default gpu device using the dpnp library. The arrays are then passed as input arguments to the kernel function. The compilation target and the subsequent execution of the kernel are determined by the input arguments, following the “compute-follows-data” programming model as specified in the Python* Array API Standard. To change the execution target to a CPU, the device keyword needs to be changed to cpu when allocating the dpnp arrays. It is also possible to leave the device keyword undefined and let the dpnp library select a default device based on environment flag settings. Refer to the Kernel Programming Basics for further details.

dpjit decorator#

The numba-dpex package provides a new decorator dpjit that extends Numba’s njit decorator. The new decorator is equivalent to numba.njit(parallel=True), but additionally supports compiling dpnp functions, prange loops, and array expressions that use dpnp.ndarray objects.

Unlike Numba’s NumPy parallelization, which only targets CPUs, dpnp expressions are first converted to data-parallel kernels that can then be offloaded to different types of devices. As dpnp implements the same API as NumPy*, an existing numba.njit-decorated function that uses numpy.ndarray can be refactored to use dpnp.ndarray and decorated with dpjit. Such a refactoring allows the parallel regions to be offloaded to a supported GPU device, giving users an additional option to execute their code in parallel.

The vector addition example shown earlier using the kernel API can also be expressed in several different ways using dpjit.

import dpnp
import numba_dpex as dpex


@dpex.dpjit
def vecadd_v1(a, b):
    return a + b


@dpex.dpjit
def vecadd_v2(a, b):
    return dpnp.add(a, b)


@dpex.dpjit
def vecadd_v3(a, b):
    c = dpnp.empty_like(a)
    for i in dpex.prange(a.shape[0]):
        c[i] = a[i] + b[i]
    return c

As with the kernel API example, a dpjit-decorated function, when invoked with dpnp input arguments, follows the compute-follows-data programming model. Refer to user_manual/dpnp_offload/index for further details.