oneAPI for the Scientific Python Community

Diptorup Deb and Oleksandr Pavlyk

The scientific Python software ecosystem for heterogeneous computing is highly fragmented, with different sets of libraries and modules for different architectures. This fragmentation makes it harder to write portable Python code and runs counter to one of the central tenets of Python: "Simple is better than complex".

No need to reimplement code

This poster presents our ongoing work on a programming model and a set of packages that let a scientific Python programmer target different types of devices, such as CPUs, GPUs, and other accelerators, without having to reimplement their code.

Use oneAPI directly from Python

The work interfaces oneAPI, an open standard for unified device programming, with Python, so that programmers can be more productive by using the oneAPI programming model efficiently and directly from Python.

What is oneAPI?

oneAPI is an open standard for a unified application programming interface (API) that delivers a common developer experience across accelerator architectures, including multi-core CPUs, GPUs, and FPGAs.

oneAPI unified programming across multiple architectures

A free implementation of the standard is available through the Intel® oneAPI Toolkits. The Intel® oneAPI Base Toolkit features an industry-leading C++ compiler that implements SYCL*, an evolution of C++ for heterogeneous computing. It also includes a suite of performance libraries, such as the Intel® oneAPI Math Kernel Library (oneMKL), as well as the Intel® Distribution for Python*.

Data-parallel extensions for Python

The data-parallel extensions for Python are a set of Python packages that interface Python with oneAPI and bring the oneAPI programming model to Python programmers.



dpctl gives Python users access to the data-parallel computing resources targeted by oneAPI: device selection, queue construction (a queue specifies the offload target), Unified Shared Memory (USM) allocation, and a USM-based ND-array object. The dpctl.tensor submodule lets Python users get their job done using tensor operations powered by generic kernels written in pure SYCL for portability.

dpctl provides the necessary Python bindings to make a SYCL library into a Python native extension and subsequently use it from Python.
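The device selection, queue construction, and USM-backed array steps described above can be sketched as follows. This is a minimal illustration rather than a reference usage: it assumes a recent dpctl release in which dpctl.tensor provides asarray, sum, and asnumpy, and the helper names list_sycl_devices and device_sum are hypothetical. If dpctl is not installed, the sketch reports no devices instead of failing.

```python
def list_sycl_devices():
    """Return (name, device type, backend) for every SYCL device dpctl can see."""
    try:
        import dpctl  # assumption: installed, e.g. with the conda command on this poster
    except ImportError:
        return []  # dpctl is unavailable: report no devices rather than raise
    return [(dev.name, str(dev.device_type), str(dev.backend))
            for dev in dpctl.get_devices()]


def device_sum(values):
    """Copy values into a USM-backed array, reduce it on the device, return a float."""
    import dpctl
    import dpctl.tensor as dpt

    # A queue names the offload target. With no argument the default device is
    # selected; a filter string such as "gpu" would request a device class.
    queue = dpctl.SyclQueue()
    x = dpt.asarray(values, dtype="float32", sycl_queue=queue)  # USM allocation
    total = dpt.sum(x)                   # the reduction runs on the queue's device
    return float(dpt.asnumpy(total))     # copy the 0-d result back to the host


devices = list_sycl_devices()
for name, kind, backend in devices:
    print(f"{name}: {kind} via {backend}")

if devices:
    print("sum computed on the default device:", device_sum([1.0, 2.0, 3.0]))
else:
    print("no SYCL devices visible (or dpctl is not installed)")
```

The array keeps a reference to the queue it was allocated on, so later operations on it run on the same device without the target being named again.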




numba-dpex is a standalone extension to the Numba JIT compiler. Numba-dpex adds two features to Numba:

An OpenCL-style compute API to write oneAPI kernels directly in Python.

An extension to Numba's parallelizer that generates kernels from the data-parallel code regions identified by Numba and offloads them to a user-specified device.
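To make "OpenCL-style" concrete: in this model a kernel is a function body that runs once per work item, and each work item uses its global id to pick the data element it owns. The sketch below emulates that execution model in plain Python. It deliberately does not use the numba-dpex API; the launch helper and the way the global id is passed in are hypothetical stand-ins, and the work items run sequentially here rather than in parallel on a device.

```python
def vector_add_kernel(global_id, a, b, out):
    """Kernel body: executed once per work item, identified by its global id."""
    out[global_id] = a[global_id] + b[global_id]


def launch(kernel, global_size, *args):
    """Hypothetical host-side launcher that runs the kernel for ids 0..global_size-1.

    A real device would execute these work items in parallel; this emulation
    runs them one after another so the indexing logic is easy to follow.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)

launch(vector_add_kernel, len(a), a, b, out)
print(out)  # [11.0, 22.0, 33.0, 44.0]
```

With numba-dpex the kernel function would instead be JIT-compiled and submitted to a device, but the division of work by global id is the same idea.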




dpnp is a NumPy-like library built using the oneAPI DPC++ compiler and oneAPI performance libraries such as oneMKL and the oneAPI DPC++ Library. dpnp will provide cross-architecture performance portability to NumPy users without requiring changes to their existing NumPy code.

dpnp builds on dpctl.tensor and extends it with a richer API surface, as well as with linear algebra, fast Fourier transform, and random number generation submodules powered by oneMKL.
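A minimal sketch of what the NumPy-like interface means in practice: the same function body runs against dpnp or NumPy, and only the import changes. The function name standardize is hypothetical, and the fallback to NumPy is an assumption made here so that the sketch also runs on a machine without dpnp.

```python
try:
    import dpnp as xp   # arrays are allocated on the default SYCL device
except ImportError:
    import numpy as xp  # assumption: fall back to NumPy on a host-only install


def standardize(x):
    """Shift an array to zero mean and scale it to unit standard deviation."""
    return (x - xp.mean(x)) / xp.std(x)


# float32 is used because some devices do not support double precision.
data = xp.arange(10, dtype=xp.float32)
z = standardize(data)

print("array module in use:", xp.__name__)
print("mean after standardizing:", float(xp.mean(z)))
print("std after standardizing:", float(xp.std(z)))
```

The printed mean and standard deviation should be close to 0 and 1 respectively, up to float32 rounding, whichever array module is in use.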


Install extensions using conda

$ conda install -c dppy/label/dev dpctl dpnp numba-dpex

Contributions welcome!

All projects use a pull request contribution workflow on GitHub. New users and contributors are always welcome!


Find us on Gitter,
create issues for dpctl, numba-dpex, or dpnp,
or get help on the Intel Developer Zone.