DPNP integration

Integration with the DPNP backend library.

Inside functions compiled with @njit, supported NumPy function calls are replaced with calls to the corresponding DPNP functions.

import numpy as np
from numba import njit
import dpctl

@njit
def foo(a):
  return np.sum(a)  # this call will be replaced with DPNP function

a = np.arange(42)

with dpctl.device_context():
  result = foo(a)

print(result)

np.sum(a) will be replaced with dpnp_sum_c<int, int>(…).

Repository map

Integration code mostly resides in numba_dpex/dpnp_iface.
Tests reside in numba_dpex/tests/njit_tests/dpnp.
The helper pass resides in numba_dpex/rename_numpy_functions_pass.py.

Architecture

The default Numba compiler pipeline is extended with the DPPYRewriteOverloadedNumPyFunctions pass.

The main work is performed by RewriteNumPyOverloadedFunctions, which the pass uses. It rewrites a NumPy function call as follows:

np.sum(a) -> numba_dpex.dpnp.sum(a)
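
The effect of this rewrite can be illustrated at the source level. The sketch below is only an analogy: it uses Python's ast module on source text, whereas the actual pass works on Numba IR. The names SUPPORTED, NUMPY_ALIASES, RewriteNumPyCalls and rewrite are hypothetical and exist only for this illustration.

```python
import ast

# Assumption: a tiny stand-in for the real list of supported functions.
SUPPORTED = {"sum"}
NUMPY_ALIASES = {"np", "numpy"}


class RewriteNumPyCalls(ast.NodeTransformer):
    """Replace np.<name> with numba_dpex.dpnp.<name> for supported names."""

    def visit_Attribute(self, node):
        self.generic_visit(node)
        is_supported_numpy_attr = (
            isinstance(node.value, ast.Name)
            and node.value.id in NUMPY_ALIASES
            and node.attr in SUPPORTED
        )
        if not is_supported_numpy_attr:
            return node  # leave everything else untouched
        replacement = ast.parse(
            f"numba_dpex.dpnp.{node.attr}", mode="eval"
        ).body
        return ast.copy_location(replacement, node)


def rewrite(source):
    """Return the source text with supported NumPy attributes rewritten."""
    tree = RewriteNumPyCalls().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(rewrite("result = np.sum(a)"))   # rewritten
print(rewrite("result = np.mean(a)"))  # not in SUPPORTED, left as is
```

Only names listed in the map are rewritten, which mirrors why a new function has to be registered before the pass can pick it up.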

numba_dpex.dpnp contains stub functions (defined as classes) such as the following:

# numba_dpex/dpnp_iface/stubs.py - imported in numba_dpex.__init__.py

class dpnp(Stub):

  class sum(Stub):  # stub function
    pass
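
A stub carries no behavior of its own; it only gives the compiler a name to attach an overload to. The following minimal sketch shows the pattern in plain Python. The Stub base class here is an assumption written for this example, and the real base class used by numba_dpex may behave differently.

```python
class Stub:
    """Hypothetical base class: a placeholder that only the compiler understands."""

    def __new__(cls, *args, **kwargs):
        # Calling a stub from regular Python code is always an error.
        raise NotImplementedError(
            f"{cls.__qualname__} cannot be called from regular Python code"
        )


class dpnp(Stub):
    class sum(Stub):  # stub function, used only as a key for an overload
        pass


try:
    dpnp.sum([1, 2, 3])
except NotImplementedError as exc:
    message = str(exc)

print(message)
```

Because the stub is a class, dpnp.sum is a unique object that can be passed to a registration decorator without ever being executed.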

For the stub function call to be lowered by the Numba compiler pipeline, an overload is defined in numba_dpex/dpnp_iface/dpnp_transcendentalsimpl.py:

@overload(stubs.dpnp.sum)
def dpnp_sum_impl(a):
  ...

The overload implementation knows about DPNP functions. It obtains a function pointer from DPNP and uses the signature known from the DPNP headers. The implementation calls the DPNP function by creating a Numba ExternalFunctionPointer.
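
The idea of turning a raw address plus a known signature into a callable can be shown with the standard ctypes module. This is an analogy only and does not use Numba or DPNP: the address below comes from a Python callback instead of a native library, and all names are made up for the example.

```python
import ctypes

# C-like signature: int64_t add(int64_t, int64_t)
PROTOTYPE = ctypes.CFUNCTYPE(ctypes.c_int64, ctypes.c_int64, ctypes.c_int64)


def _add(x, y):
    return x + y


# Keep a reference to the callback so the address below stays valid.
_callback = PROTOTYPE(_add)

# A raw integer address, comparable to what a native library would hand out.
address = ctypes.cast(_callback, ctypes.c_void_p).value

# Address + known signature = callable function.
add_by_address = PROTOTYPE(address)

print(add_by_address(40, 2))
```

The signature is not checked against the code behind the address, which is why the signature in an overload has to match the DPNP header exactly.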

For more details about implementing overloads, see Writing overload for stub function.

For more details about testing the integration, see Writing DPNP integration tests.

Places to update

numba_dpex/dpnp_iface/stubs.py: Add a new class to the stubs.dpnp class.
numba_dpex/dpnp_iface/dpnp_fptr_interface.pyx: Update the items in the DPNPFuncName enum.
numba_dpex/dpnp_iface/dpnp_fptr_interface.pyx: Update the if statements in the get_DPNPFuncName_from_str() function.
Add @overload(stubs.dpnp.YOUR_FUNCTION) in one of the numba_dpex/dpnp_iface/*.py modules or create a new module.
numba_dpex/rename_numpy_functions_pass.py: Update the items in the rewrite_function_name_map dict.
numba_dpex/rename_numpy_functions_pass.py: Update the imported modules in DPPYRewriteOverloadedNumPyFunctions.__init__().
Add a test in one of the numba_dpex/tests/njit_tests/dpnp test modules or create a new module.

Writing overload for stub function

Overloads for stub functions reside in numba_dpex/dpnp_iface/*.py modules. If you need to create a new module, name it after the corresponding DPNP source file, e.g. dpnp/backend/kernels/dpnp_krnl_indexing.cpp -> numba_dpex/dpnp_iface/dpnp_indexing.py.

from numba.core.extending import overload
import numba_dpex.dpnp_iface as dpnp_lowering
...

@overload(stubs.dpnp.sum)
def dpnp_sum_impl(a):
  dpnp_lowering.ensure_dpnp("sum")

ensure_dpnp() checks that the DPNP package is available and contains the function.
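
A check of this kind can be sketched with importlib. The helper below is hypothetical and is not the actual ensure_dpnp() implementation; it only illustrates failing early with a clear message when a package or one of its functions is missing. The standard math module stands in for DPNP so that the example runs anywhere.

```python
import importlib


def ensure_function_available(package_name, function_name):
    """Raise ImportError unless the package imports and exposes the function."""
    try:
        module = importlib.import_module(package_name)
    except ImportError as exc:
        raise ImportError(f"{package_name} is not available") from exc
    if not hasattr(module, function_name):
        raise ImportError(
            f"{package_name} does not provide {function_name}()"
        )
    return module


# Succeeds: the module exists and has the function.
module = ensure_function_available("math", "sqrt")
print(module.__name__)

# Fails: the module exists but the function does not.
try:
    ensure_function_available("math", "no_such_function")
except ImportError as exc:
    print(exc)
```

Running the check at the top of the overload reports the problem at compile time instead of producing an obscure failure later.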

from numba import types
from numba.core.typing import signature
...
# continue of dpnp_sum_impl()
  """
  dpnp source:
  https://github.com/IntelPython/dpnp/blob/0.6.1dev/dpnp/backend/kernels/dpnp_krnl_reduction.cpp#L59

  Function declaration:
  void dpnp_sum_c(void* result_out,
                  const void* input_in,
                  const size_t* input_shape,
                  const size_t input_shape_ndim,
                  const long* axes,
                  const size_t axes_ndim,
                  const void* initial,
                  const long* where)

  """
  sig = signature(
      types.void,  # return type
      types.voidptr,  # void* result_out,
      types.voidptr,  # const void* input_in,
      types.voidptr,  # const size_t* input_shape,
      types.intp,  # const size_t input_shape_ndim,
      types.voidptr,  # const long* axes,
      types.intp,  # const size_t axes_ndim,
      types.voidptr,  # const void* initial,
      types.voidptr,  # const long* where)
  )

The signature sig is based on the DPNP function signature defined in the header file. It is recommended to link to the signature in the DPNP sources and copy it into a comment, as shown above.

For the mapping between C types and Numba types, see Types matching for Numba and DPNP.

import numba_dpex.dpnp_iface.dpnpimpl as dpnp_ext
...
# continue of dpnp_sum_impl()
  dpnp_func = dpnp_ext.dpnp_func("dpnp_sum", [a.dtype.name, "NONE"], sig)

dpnp_ext.dpnp_func() returns a function pointer from DPNP. It receives:

The function name (e.g. "dpnp_sum"), which is converted to a DPNPFuncName enum value in get_DPNPFuncName_from_str().
A list of input and output data type names (e.g. [a.dtype.name, "NONE"], where "NONE" means reusing the previous type name), which is converted to DPNPFuncType enum values in get_DPNPFuncType_from_str().
The signature, which is used for creating the Numba ExternalFunctionPointer.
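
The lookup by name and type names can be modelled with a small dictionary. This is a sketch under stated assumptions: REGISTRY, resolve_type_names and lookup are hypothetical, Python callables stand in for DPNP function pointers, and the handling of "NONE" follows the description above, not the DPNP sources.

```python
def resolve_type_names(type_names):
    """Replace each "NONE" entry with the closest preceding type name."""
    resolved = []
    for name in type_names:
        if name == "NONE":
            if not resolved:
                raise ValueError('"NONE" needs a preceding type name')
            name = resolved[-1]
        resolved.append(name)
    return resolved


# Hypothetical registry standing in for the function pointers DPNP provides.
REGISTRY = {
    ("dpnp_sum", "int64", "int64"): sum,
    ("dpnp_sum", "float64", "float64"): sum,
}


def lookup(func_name, type_names):
    """Return the registered callable for a function name and type names."""
    key = (func_name, *resolve_type_names(type_names))
    try:
        return REGISTRY[key]
    except KeyError:
        raise LookupError(f"no implementation registered for {key}") from None


func = lookup("dpnp_sum", ["float64", "NONE"])
print(func([1.5, 2.5]))
```

A missing entry fails at lookup time, which corresponds to forgetting one of the steps listed in Places to update.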

import numba_dpex.dpnp_iface.dpnpimpl as dpnp_ext
...
# continue of dpnp_sum_impl()
  PRINT_DEBUG = dpnp_lowering.DEBUG

  def dpnp_impl(a):
      out = np.empty(1, dtype=a.dtype)
      common_impl(a, out, dpnp_func, PRINT_DEBUG)

      return out[0]

  return dpnp_impl

This code creates the implementation function and returns it from the overload function.

PRINT_DEBUG controls the printing of debug information. Tests rely on this output to check that the DPNP implementation was used. See Writing DPNP integration tests.

dpnp_impl() creates the output array with the size and data type of the DPNP function's output array.

dpnp_impl() can call NumPy functions supported by Numba and other stub functions (e.g. numba_dpex.dpnp.dot()).

The implementation function usually reuses a common function such as common_impl(), which avoids code duplication. Check the common functions already available at the top of the file before creating a new one.

from numba.core.extending import register_jitable
from numba_dpex import dpctl_functions
import numba_dpex.dpnp_iface.dpnpimpl as dpnp_ext
...

@register_jitable
def common_impl(a, out, dpnp_func, print_debug):
    if a.size == 0:
        raise ValueError("Passed Empty array")

    sycl_queue = dpctl_functions.get_current_queue()
    a_usm = dpctl_functions.malloc_shared(a.size * a.itemsize, sycl_queue)  # 1
    dpctl_functions.queue_memcpy(sycl_queue, a_usm, a.ctypes, a.size * a.itemsize)  # 2

    out_usm = dpctl_functions.malloc_shared(a.itemsize, sycl_queue)  # 1

    axes, axes_ndim = 0, 0
    initial = 0
    where = 0

    dpnp_func(out_usm, a_usm, a.shapeptr, a.ndim, axes, axes_ndim, initial, where)  # 3

    dpctl_functions.queue_memcpy(
        sycl_queue, out.ctypes, out_usm, out.size * out.itemsize
    )  # 4

    dpctl_functions.free_with_queue(a_usm, sycl_queue)  # 5
    dpctl_functions.free_with_queue(out_usm, sycl_queue)  # 5

    dpnp_ext._dummy_liveness_func([a.size, out.size])  # 6

    if print_debug:
        print("dpnp implementation")  # 7

Key parts of any common function (the numbers match the comments in the code above):

1. Allocate input and output USM arrays
2. Copy input array to input USM array
3. Call dpnp_func()
4. Copy output USM array to output array
5. Deallocate USM arrays
6. Disable dead code elimination for input and output arrays
7. Print debug information used for testing
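
The same sequence can be traced in plain Python, with ctypes arrays standing in for USM allocations. This is a sketch under stated assumptions: no dpctl or DPNP call is involved, fake_dpnp_sum and common_impl_sketch are hypothetical, and step 6 has no counterpart because CPython keeps the objects alive on its own.

```python
import ctypes


def fake_dpnp_sum(out_buf, in_buf, size):
    """Stand-in for the DPNP function pointer (not a real DPNP call)."""
    out_buf[0] = sum(in_buf[i] for i in range(size))


def common_impl_sketch(values, func, print_debug=False):
    size = len(values)
    if size == 0:
        raise ValueError("Passed Empty array")

    in_buf = (ctypes.c_int64 * size)()  # 1: allocate the input buffer
    out_buf = (ctypes.c_int64 * 1)()    # 1: allocate the output buffer
    in_buf[:] = values                  # 2: copy the input into its buffer

    func(out_buf, in_buf, size)         # 3: call the function

    out = [out_buf[0]]                  # 4: copy the result out of the buffer

    del in_buf, out_buf                 # 5: release the buffers

    # 6: nothing to do here; see the note above the code.

    if print_debug:
        print("dpnp implementation")    # 7: debug output used by tests

    return out[0]


print(common_impl_sketch([1, 2, 3], fake_dpnp_sum, print_debug=True))
```

Keeping the allocation, copy and cleanup steps in one shared function means each new overload only has to supply its function pointer and output array.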

Types matching for Numba and DPNP

[const] T* -> types.voidptr
size_t -> types.intp
long -> types.int64

We use void * in place of size_t * because Numba currently has no type that represents size_t *. Both are pointer types, so as long as the compiler gives all data pointers the same size, the storage used to hold them should not differ.
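
The mapping can be written down as a small helper. This is a sketch only: it returns the names of the Numba types as strings so that it runs without Numba installed, the helper name numba_type_name is hypothetical, and the void entry is taken from the return type used in the signature example earlier in this document.

```python
# Scalar C types and the Numba type names used for them in this document.
SCALAR_TYPES = {
    "void": "types.void",
    "size_t": "types.intp",
    "long": "types.int64",
}


def numba_type_name(c_type):
    """Map a C parameter type such as "const size_t*" to a Numba type name."""
    c_type = c_type.strip()
    if c_type.endswith("*"):
        return "types.voidptr"  # every pointer is passed as void *
    base = " ".join(tok for tok in c_type.split() if tok != "const")
    return SCALAR_TYPES[base]


# Parameter types of dpnp_sum_c, in declaration order.
dpnp_sum_c_params = [
    "void*",
    "const void*",
    "const size_t*",
    "const size_t",
    "const long*",
    "const size_t",
    "const void*",
    "const long*",
]

for c_type in dpnp_sum_c_params:
    print(f"{c_type:15} -> {numba_type_name(c_type)}")
```

The printed column reproduces the argument list of the signature built for dpnp_sum_c.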

Writing DPNP integration tests

See all DPNP integration tests in numba_dpex/tests/njit_tests/dpnp.

Usually, adding a new test is as easy as adding the function name to the corresponding list of function names. Each item in the list is used as a test parameter. Find the tests for the category of functions similar to yours and update the list of function names, such as list_of_unary_ops or list_of_nan_ops.

@pytest.mark.parametrize("filter_str", filter_strings)
def test_unary_ops(filter_str, unary_op, input_array, get_shape, capfd):
  a = input_array  # 1
  a = np.reshape(a, get_shape)
  op, name = unary_op  # 2
  if (name == "cumprod" or name == "cumsum") and (
      filter_str == "opencl:cpu:0" or is_gen12(filter_str)
  ):
      pytest.skip()
  actual = np.empty(shape=a.shape, dtype=a.dtype)
  expected = np.empty(shape=a.shape, dtype=a.dtype)

  f = njit(op)  # 3
  with dpctl.device_context(filter_str), dpnp_debug():  # 7
      actual = f(a)  # 4
      captured = capfd.readouterr()
      assert "dpnp implementation" in captured.out  # 8

  expected = op(a)  # 5
  max_abs_err = np.max(np.abs(actual - expected))
  assert max_abs_err < 1e-4  # 6

Test function names start with test_ (see the pytest docs), and all input parameters are provided by fixtures.

In the example above, unary_op contains a tuple (FUNCTION, FUNCTION_NAME); see the fixture unary_op().
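
Turning a list of names into (FUNCTION, FUNCTION_NAME) tuples can be sketched without pytest. This is an illustration only: the actual fixtures are parametrized by pytest and resolve NumPy functions, and how they build the function is not shown here. The list and the make_op helper below are hypothetical and use Python builtins so that the example has no dependencies.

```python
import builtins

# Hypothetical list of function names; adding a name adds a test case.
list_of_ops = ["sum", "max", "min"]


def make_op(name):
    """Return a (FUNCTION, FUNCTION_NAME) tuple for the given name."""
    return getattr(builtins, name), name


results = {}
for op, name in map(make_op, list_of_ops):
    results[name] = op([3, 1, 2])

print(results)
```

Keeping the name next to the function lets a test skip or special-case individual functions, as the cumprod and cumsum check in the example above does.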

Key parts of any test (the numbers match the comments in the code above):

1. Receive the input array from the fixture input_array
2. Receive the tested function from the fixture unary_op
3. Compile the tested function with njit()
4. Call the compiled function inside the device_context() context and receive the actual result
5. Call the original tested function and receive the expected result
6. Compare the actual and expected results
7. Run the compiled function inside the debug context dpnp_debug()
8. Check that DPNP was used, that is, that debug information was printed to the output
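
The debug-output check can be reproduced with the standard library alone. The sketch below is an assumption-laden model: debug_enabled() is a hypothetical stand-in for dpnp_debug(), fast_sum() stands in for the compiled function, and contextlib.redirect_stdout replaces the pytest capfd fixture.

```python
import contextlib
import io

DEBUG = False  # hypothetical module-level debug flag


@contextlib.contextmanager
def debug_enabled():
    """Turn the debug flag on for the duration of the with block."""
    global DEBUG
    previous = DEBUG
    DEBUG = True
    try:
        yield
    finally:
        DEBUG = previous


def fast_sum(values):
    """Stand-in for the compiled function that uses the fast implementation."""
    if DEBUG:
        print("dpnp implementation")
    return sum(values)


def check_fast_sum():
    data = [1.0, 2.0, 3.5]
    captured = io.StringIO()
    with debug_enabled(), contextlib.redirect_stdout(captured):
        actual = fast_sum(data)
    # The marker proves that the fast path was taken.
    assert "dpnp implementation" in captured.getvalue()
    expected = sum(data)
    assert abs(actual - expected) < 1e-4
    return actual


print(check_fast_sum())
```

Comparing results alone would not reveal a silent fallback to the regular implementation, which is why the tests also look for the debug marker.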

Troubleshooting

Do not forget to rebuild the Python extensions against the currently installed version of DPNP. The Cython files (e.g. numba_dpex/dpnp_iface/dpnp_fptr_interface.pyx) depend on the DPNP headers.
Do not forget to add each array to dpnp_ext._dummy_liveness_func([YOUR_ARRAY.size]). Dead code elimination could delete temporary variables before they are used in the DPNP function call. As a result, wrong data could be passed to the DPNP function.