Welcome to numba-dpex's documentation!
======================================

Numba data-parallel extension (`numba-dpex `_) is an Intel |reg|-developed
extension to the `Numba `_ JIT compiler. The extension adds kernel programming
and automatic offload capabilities to the Numba compiler. Numba-dpex is part of
the `Intel oneAPI Base Toolkit `_ and is distributed with the
`Intel Distribution for Python* `_. The goal of the extension is to make it
easy for Python programmers to write efficient and portable code for a mix of
architectures, including CPUs, GPUs, FPGAs and other accelerators.

Numba-dpex provides an API to write data-parallel kernels directly in Python
and compiles those kernels to lower-level kernels that are executed using a
`SYCL `_ runtime library. Presently, only Intel's `DPC++ `_ SYCL runtime is
supported, via the `dpctl `_ package, and only OpenCL and Level Zero devices
are supported. Support for other SYCL runtime libraries and hardware may be
added in the future.

Along with the kernel programming API, an auto-offload feature is also
provided. The feature enables automatic generation of kernels from
data-parallel NumPy library calls and array expressions, Numba ``prange``
loops, and `other "data-parallel by construction" expressions `_ that Numba is
able to parallelize.

The following two examples demonstrate the two ways in which kernels may be
written using numba-dpex.

- Defining a data-parallel kernel explicitly.

  .. code-block:: python

      import numpy as np
      import numba_dpex as dpex
      import dpctl


      @dpex.kernel
      def vecadd(a, b, c):
          # Each work item adds one pair of elements, selected by its
          # global id in dimension 0.
          i = dpex.get_global_id(0)
          c[i] = a[i] + b[i]


      a = np.array(np.random.random(20), dtype=np.float32)
      b = np.array(np.random.random(20), dtype=np.float32)
      c = np.ones_like(a)

      # Launch 20 work items on an OpenCL GPU device. The default local size
      # leaves the choice of work-group size to the SYCL runtime.
      with dpctl.device_context("opencl:gpu"):
          vecadd[20, dpex.DEFAULT_LOCAL_SIZE](a, b, c)

- Writing implicitly data-parallel expressions in the fashion of
  `Numba parallel loops `_.

  .. code-block:: python

      from numba import njit
      import numpy as np
      import dpctl


      @njit
      def f1(a, b):
          # A NumPy array expression: a data-parallel operation that
          # numba-dpex can turn into a kernel.
          c = a + b
          return c


      global_size = 64
      local_size = 32
      N = global_size * local_size
      a = np.ones(N, dtype=np.float32)
      b = np.ones(N, dtype=np.float32)

      # Calling the jitted function inside a device context offloads the
      # array expression to the selected device.
      with dpctl.device_context("opencl:gpu:0"):
          c = f1(a, b)

.. toctree::
   :maxdepth: 1
   :caption: Core Features

   CoreFeatures

.. toctree::
   :maxdepth: 1
   :caption: User Guides

   Getting Started
   Direct kernel programming
   Debugging with GDB
   Docker
   numba-dpex for numba.cuda Programmers

.. toctree::
   :maxdepth: 1
   :caption: Developer Guides

   developer_guides/dpnp_integration
   developer_guides/tools

Contributing
============

Refer to the `contributing guide `_ for information on the coding style and
standards used in numba-dpex.

License
=======

Numba-dpex is licensed under the Apache License 2.0, which can be found in
`LICENSE `_. All usage of and contributions to the project are subject to the
terms and conditions of this license.

Indices and tables
==================

.. only:: builder_html

   * :ref:`genindex`
   * :ref:`modindex`
   * :ref:`search`

.. only:: not builder_html

   * :ref:`modindex`

.. |reg| unicode:: U+000AE .. REGISTERED SIGN