Welcome to numba-dpex’s documentation!

Numba data-parallel extension (numba-dpex) is an Intel®-developed extension to the Numba JIT compiler. The extension adds kernel programming and automatic offload capabilities to the Numba compiler. Numba-dpex is part of the Intel oneAPI Base Toolkit and is distributed with the Intel Distribution for Python*. The goal of the extension is to make it easy for Python programmers to write efficient and portable code for a mix of architectures across CPUs, GPUs, FPGAs, and other accelerators.

Numba-dpex provides an API to write data-parallel kernels directly in Python and compiles them to lower-level kernels that are executed using a SYCL runtime library. Presently, only Intel’s DPC++ SYCL runtime is supported, via the dpctl package, and only OpenCL and Level Zero devices are supported. Support for other SYCL runtime libraries and hardware may be added in the future.
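
The examples on this page select a device with strings such as "opencl:gpu" and "opencl:gpu:0". As a rough illustration only, the sketch below splits such a string into a backend, a device type, and an optional device number, and rejects backends other than the two named above. It assumes a backend:device_type[:device_number] layout and the backend spellings "opencl" and "level_zero"; `parse_device_string` and `SUPPORTED_BACKENDS` are hypothetical names and are not part of dpctl or numba-dpex.

```python
# Hypothetical helper for illustration; not part of dpctl or numba-dpex.
SUPPORTED_BACKENDS = {"opencl", "level_zero"}  # assumed spellings


def parse_device_string(text):
    """Split 'backend:device_type[:device_number]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 fields, got {len(parts)}: {text!r}")
    backend, device_type = parts[0], parts[1]
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    # The device number is optional; None means no specific device was requested.
    number = int(parts[2]) if len(parts) == 3 else None
    return backend, device_type, number


print(parse_device_string("opencl:gpu"))    # prints ('opencl', 'gpu', None)
print(parse_device_string("opencl:gpu:0"))  # prints ('opencl', 'gpu', 0)
```

The sketch handles only the string forms that appear on this page. The real device selection is done by dpctl and may accept other forms.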

Along with the kernel programming API, an auto-offload feature is provided. It enables automatic generation of kernels from data-parallel NumPy library calls and array expressions, Numba prange loops, and other “data-parallel by construction” expressions that Numba is able to parallelize. The following two examples demonstrate the two ways in which kernels may be written using numba-dpex.

  • Defining a data-parallel kernel explicitly.

    import numpy as np
    import numba_dpex as dpex
    import dpctl
    
    
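    @dpex.kernel
    def sum(a, b, c):
        # Each work item handles one element, indexed by its global ID.
        i = dpex.get_global_id(0)
        c[i] = a[i] + b[i]
    
    
    a = np.array(np.random.random(20), dtype=np.float32)
    b = np.array(np.random.random(20), dtype=np.float32)
    c = np.ones_like(a)  # output buffer, overwritten by the kernel
    
    # Launch 20 work items on an OpenCL GPU device with the default local size.
    with dpctl.device_context("opencl:gpu"):
        sum[20, dpex.DEFAULT_LOCAL_SIZE](a, b, c)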
    
  • Writing implicitly data-parallel expressions in the fashion of Numba parallel loops.

    from numba import njit
    import numpy as np
    import dpctl
    
    
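    @njit
    def f1(a, b):
        # A whole-array expression; no explicit kernel or index is written.
        c = a + b
        return c
    
    
    global_size = 64
    local_size = 32
    N = global_size * local_size  # total number of elements
    a = np.ones(N, dtype=np.float32)
    b = np.ones(N, dtype=np.float32)
    # Calls made inside the device context are candidates for auto-offload.
    with dpctl.device_context("opencl:gpu:0"):
        c = f1(a, b)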
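
Both styles above compute one output element per index. As a mental model only, the following pure-Python sketch emulates a launch over a global range of 20 work items with a sequential loop and checks that the result matches the whole-array form. It does not use numba-dpex or a SYCL device; `launch` and `vector_add` are hypothetical names introduced for illustration.

```python
# Pure-Python emulation of the kernel launch model; this is NOT numba-dpex.
# `launch` and `vector_add` are hypothetical names used only for illustration.


def vector_add(i, a, b, c):
    """Kernel body: work item `i` writes one output element."""
    c[i] = a[i] + b[i]


def launch(kernel, global_size, *args):
    """Run `kernel` once per work item, sequentially.

    A real SYCL runtime may run work items concurrently and in any order;
    the sequential loop is only a stand-in for the index space.
    """
    for i in range(global_size):
        kernel(i, *args)


a = [float(x) for x in range(20)]
b = [2.0 * x for x in range(20)]
c = [1.0] * 20  # placeholder values, overwritten by the kernel

launch(vector_add, 20, a, b, c)

# The implicit, whole-array form computes the same result.
expected = [x + y for x, y in zip(a, b)]
print(c == expected)  # prints True
```

Because each work item writes only its own element of `c`, the order in which work items run does not affect the result. This is why such a loop is safe to run in parallel.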
    

Contributing

Refer to the contributing guide for information on the coding style and standards used in numba-dpex.

License

Numba-dpex is licensed under the Apache License 2.0, which can be found in LICENSE. All usage of and contributions to the project are subject to the terms and conditions of this license.
