Reduction on SYCL-supported Devices

Numba-dpex does not yet provide a dedicated decorator for implementing reduction kernels. However, a reduction can be written explicitly as a kernel. This section presents two approaches for writing a reduction as a numba_dpex.kernel function.

Example 1

This example demonstrates a summation reduction on a one-dimensional array.

The full example can be found at numba_dpex/examples/sum_reduction.py.

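In this example, the array is reduced by invoking the kernel multiple times. Each invocation adds the upper half of the remaining elements to the lower half, so the number of elements left to sum is halved until a single value remains.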

import math

import numpy as np

import numba_dpex as ndpx


@ndpx.kernel
def sum_reduction_kernel(A, R, stride):
    i = ndpx.get_global_id(0)
    # Sum two elements that are `stride` apart.
    R[i] = A[i] + A[i + stride]
    # Store the sum in A so that the next invocation can use it.
    A[i] = R[i]


def sum_reduce(A):
    """Sum the elements of A. The size of A should be a power of two.

    A is overwritten with intermediate partial sums.
    """
    total = len(A)
    # The first invocation produces the most partial sums: half the size of A.
    R = np.empty(math.ceil(total / 2), dtype=A.dtype)

    while total > 1:
        global_size = total // 2
        sum_reduction_kernel[ndpx.Range(global_size)](A, R, global_size)
        total = total // 2

    return R[0]
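
To make the access pattern of the repeated kernel invocations easier to follow, the sketch below emulates them on the host with plain NumPy. It is not part of numba-dpex and does not run on a SYCL device: emulated_kernel_launch and emulated_sum_reduce are hypothetical helper names introduced only for illustration, and the sequential loop over i merely stands in for the work-items of one launch.

```python
import numpy as np


def emulated_kernel_launch(A, R, stride, global_size):
    """Host-side stand-in for one kernel launch (hypothetical helper)."""
    for i in range(global_size):  # each i plays the role of one work-item
        # Same body as sum_reduction_kernel.
        R[i] = A[i] + A[i + stride]
        A[i] = R[i]


def emulated_sum_reduce(A):
    """Mirror the launch loop of sum_reduce.

    Assumes len(A) is a power of two and at least 2.
    """
    A = A.copy()  # work on a copy so the caller's array is left intact
    total = len(A)
    R = np.empty(total // 2, dtype=A.dtype)
    while total > 1:
        global_size = total // 2
        emulated_kernel_launch(A, R, global_size, global_size)
        total //= 2
    return R[0]


data = np.arange(8, dtype=np.float32)  # 0 + 1 + ... + 7 = 28
result = emulated_sum_reduce(data)
print(result)
```

For the eight-element input, the three emulated launches use strides 4, 2, and 1. Within a launch, work-item i writes only A[i] and R[i] and reads only A[i] and A[i + stride], where i is smaller than stride, so no work-item reads a location that another one writes. This is why the device version needs no synchronization inside a single kernel invocation.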

Example 2

The full example can be found at numba_dpex/examples/sum_reduction_recursive_ocl.py.

Note

Numba-dpex does not yet provide an analogue of the numba.cuda.reduce decorator for writing reduction kernels. Such a decorator will be added in a future release.

Full examples

  • numba_dpex/examples/sum_reduction_recursive_ocl.py

  • numba_dpex/examples/sum_reduction_ocl.py

  • numba_dpex/examples/sum_reduction.py