Supported Address Space Qualifiers

An address space qualifier specifies the region of memory in which an object is allocated.

Numba-dpex supports three disjoint named address spaces:

Global Address Space

Global Address Space refers to memory objects that are allocated from the global memory pool and shared among all work-items. Arguments passed to a kernel are allocated in the global address space. In the example below, arguments a, b and c are allocated in the global address space:

# SPDX-FileCopyrightText: 2020 - 2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

import dpnp
import numpy.testing as testing

import numba_dpex as ndpx


# Data parallel kernel implementing vector sum
@ndpx.kernel
def kernel_vector_sum(a, b, c):
    i = ndpx.get_global_id(0)
    c[i] = a[i] + b[i]


# Utility function that launches the kernel and checks the result
def driver(a, b, c, global_size):
    kernel_vector_sum[ndpx.Range(global_size)](a, b, c)
    a_np = dpnp.asnumpy(a)  # Copy dpnp array a to NumPy array a_np
    b_np = dpnp.asnumpy(b)  # Copy dpnp array b to NumPy array b_np
    c_np = dpnp.asnumpy(c)  # Copy dpnp array c to NumPy array c_np
    testing.assert_equal(c_np, a_np + b_np)


# Main function
def main():
    N = 10
    global_size = N
    print("Vector size N", N)

    # Create random vectors on the default device
    a = dpnp.random.random(N)
    b = dpnp.random.random(N)
    c = dpnp.ones_like(a)

    print("Using device ...")
    print(a.device)
    driver(a, b, c, global_size)
    print("Done...")


if __name__ == "__main__":
    main()

Local Address Space

Local Address Space refers to memory objects that are allocated from the local memory pool and shared by all work-items of a work-group. Numba-dpex does not support passing arguments allocated in the local address space to a @numba_dpex.kernel function. Users can, however, allocate static arrays in the local address space inside the kernel by calling numba_dpex.local.array(shape, dtype).
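A minimal sketch of a kernel that uses a local array might look like the following. It assumes the same launch API as the other examples on this page (ndpx.kernel, ndpx.NdRange, ndpx.barrier) and assumes that local.array accepts a Numba type such as float32 for dtype. The names reverse_add_kernel, reversed_sum_reference and run_local_memory_sketch are illustrative and are not part of numba-dpex. The device imports are deferred into the function so that the NumPy reference can be checked on a machine without a SYCL device.

```python
import numpy as np

BLOCK_SIZE = 10  # work-group size; also the length of the local array


def reversed_sum_reference(a):
    """Host-side NumPy model of the kernel result: out[i] = a[i] + a[n-1-i]."""
    return a + a[::-1]


def run_local_memory_sketch():
    # Deferred imports: the reference function above stays usable on
    # machines without numba-dpex, dpctl, or a SYCL device.
    import dpctl.tensor as dpt
    import numba_dpex as ndpx
    from numba import float32

    @ndpx.kernel
    def reverse_add_kernel(A):
        # Static array in the local address space, shared by the work-group.
        # The literal shape 10 must match BLOCK_SIZE.
        lm = ndpx.local.array(shape=10, dtype=float32)
        # Only one work-group is launched, so the global id is also a
        # valid index into the local array.
        i = ndpx.get_global_id(0)

        lm[i] = A[i]  # each work-item stores one element
        # Wait until every work-item in the group has written its element.
        ndpx.barrier(ndpx.LOCAL_MEM_FENCE)
        # Read an element written by a different work-item.
        A[i] += lm[BLOCK_SIZE - 1 - i]

    host = np.arange(BLOCK_SIZE, dtype=np.float32)
    arr = dpt.asarray(host)

    # A single work-group: global range == local range.
    rng = ndpx.NdRange(ndpx.Range(BLOCK_SIZE), ndpx.Range(BLOCK_SIZE))
    reverse_add_kernel[rng](arr)

    out = dpt.asnumpy(arr)
    np.testing.assert_allclose(out, reversed_sum_reference(host))
    return out


if __name__ == "__main__":
    try:
        print(run_local_memory_sketch())
    except ImportError:
        print("numba-dpex or dpctl not installed; skipping the device run")
```

The barrier is what makes the local array useful here: each work-item reads an element that another work-item in the same work-group wrote, so all writes must complete before any read.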

Private Address Space

Private Address Space refers to memory objects that are local to each work-item and are not shared with any other work-item. In the example below, numba_dpex.private.array(shape, dtype) is the API used to allocate a static array in the private address space:

# SPDX-FileCopyrightText: 2020 - 2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

import dpctl
import dpctl.tensor as dpt
import numpy as np
from numba import float32

import numba_dpex as ndpx


def private_memory():
    """
    This example demonstrates the usage of numba_dpex's `private.array`
    intrinsic function. The function is used to create a static array
    allocated in the device's private address space.
    """

    @ndpx.kernel
    def private_memory_kernel(A):
        memory = ndpx.private.array(shape=1, dtype=np.float32)
        i = ndpx.get_global_id(0)

        # preload
        memory[0] = i
        ndpx.barrier(ndpx.LOCAL_MEM_FENCE)  # local mem fence

        # `memory` holds a correct, deterministic result only if it is
        # private to each work-item.
        A[i] = memory[0] * 2

    N = 4
    device = dpctl.select_default_device()

    arr = dpt.zeros(N, dtype=dpt.float32, device=device)
    orig = np.arange(N).astype(np.float32)

    print("Using device ...")
    device.print_device_info()

    global_range = ndpx.Range(N)
    local_range = ndpx.Range(N)
    private_memory_kernel[ndpx.NdRange(global_range, local_range)](arr)

    arr_out = dpt.asnumpy(arr)
    np.testing.assert_allclose(orig * 2, arr_out)
    # the output should be `orig[i] * 2`, i.e. [0, 2, 4, ..]
    print(arr_out)


def main():
    private_memory()

    print("Done...")


if __name__ == "__main__":
    main()