Welcome to ``numba-dpex``'s documentation!
===========================================

`numba-dpex `_ is an Intel |reg|-developed extension to the `Numba `_ JIT
compiler that adds "XPU" programming capabilities to it. The `XPU vision `_
is to make it extremely easy for programmers to write efficient and portable
code for a mix of architectures across CPUs, GPUs, FPGAs and other
accelerators.

To provide XPU programming capabilities, the extension relies on `SYCL `_, an
industry standard for writing cross-platform code using standard C++. Using a
SYCL runtime library, the extension can launch data-parallel kernels generated
directly from Python bytecode on supported data-parallel architectures.
Currently, support for SYCL is restricted to Intel's `DPC++ `_ via the
`dpctl `_ package. Support for other SYCL runtime libraries may be added in
the future.

The main feature of the extension is to let programmers write data-parallel
kernels directly in Python. Such kernels can be written in two different ways:
an explicit API superficially similar to OpenCL, and an implicit API that
generates kernels from NumPy library calls, Numba's ``prange`` statement, and
`other "data-parallel by construction" expressions `_ that Numba is able to
parallelize. The following two examples demonstrate the two ways in which
kernels may be written using the extension.

- Defining a data-parallel kernel explicitly.

  .. code-block:: python

     import numpy as np
     import numba_dpex as dppy
     import dpctl


     @dppy.kernel
     def sum(a, b, c):
         i = dppy.get_global_id(0)
         c[i] = a[i] + b[i]


     a = np.array(np.random.random(20), dtype=np.float32)
     b = np.array(np.random.random(20), dtype=np.float32)
     c = np.ones_like(a)

     with dpctl.device_context("opencl:gpu"):
         sum[20, dppy.DEFAULT_LOCAL_SIZE](a, b, c)

- Writing implicitly data-parallel expressions in the fashion of
  `Numba parallel loops `_.

  .. code-block:: python

     from numba import njit
     import numpy as np
     import dpctl


     @njit
     def f1(a, b):
         c = a + b
         return c


     global_size = 64
     local_size = 32
     N = global_size * local_size

     a = np.ones(N, dtype=np.float32)
     b = np.ones(N, dtype=np.float32)

     with dpctl.device_context("opencl:gpu:0"):
         c = f1(a, b)

.. toctree::
   :maxdepth: 1
   :caption: Core Features

   CoreFeatures

.. toctree::
   :maxdepth: 1
   :caption: User Guides

   Getting Started
   Programming SYCL Kernels
   Debugging with GDB
   For numba.cuda Programmers

.. toctree::
   :maxdepth: 1
   :caption: Developer Guides

   developer_guides/dpnp_integration
   developer_guides/tools

About
=====

``numba-dpex`` is developed by Intel and is part of the
`Intel Distribution for Python `_.

Contributing
============

Refer to the `contributing guide `_ for information on the coding style and
standards used in the project.

License
=======

The code is licensed under the Apache License 2.0, which can be found in
`LICENSE `_. All usage and contributions to the project are subject to the
terms and conditions of this license.

Indices and tables
==================

.. only:: builder_html

   * :ref:`genindex`
   * :ref:`modindex`
   * :ref:`search`

.. only:: not builder_html

   * :ref:`modindex`

.. |reg| unicode:: U+000AE .. REGISTERED SIGN