Supported Address Space Qualifiers
An address space qualifier specifies the region of memory in which an object is allocated.
Numba-dpex supports three disjoint named address spaces:
- Global Address Space
Global Address Space refers to memory objects allocated from the global memory pool, which are shared among all work-items. Arguments passed to a kernel are allocated in the global address space. In the example below, the arguments a, b, and c are allocated in the global address space:
    # SPDX-FileCopyrightText: 2020 - 2023 Intel Corporation
    #
    # SPDX-License-Identifier: Apache-2.0

    import dpnp
    import numpy.testing as testing

    import numba_dpex as ndpx


    # Data parallel kernel implementing vector sum
    @ndpx.kernel
    def kernel_vector_sum(a, b, c):
        i = ndpx.get_global_id(0)
        c[i] = a[i] + b[i]


    # Utility function for printing and testing
    def driver(a, b, c, global_size):
        kernel_vector_sum[ndpx.Range(global_size)](a, b, c)
        a_np = dpnp.asnumpy(a)  # Copy dpnp array a to NumPy array a_np
        b_np = dpnp.asnumpy(b)  # Copy dpnp array b to NumPy array b_np
        c_np = dpnp.asnumpy(c)  # Copy dpnp array c to NumPy array c_np
        testing.assert_equal(c_np, a_np + b_np)


    # Main function
    def main():
        N = 10
        global_size = N
        print("Vector size N", N)

        # Create random vectors on the default device
        a = dpnp.random.random(N)
        b = dpnp.random.random(N)
        c = dpnp.ones_like(a)

        print("Using device ...")
        print(a.device)
        driver(a, b, c, global_size)
        print("Done...")


    if __name__ == "__main__":
        main()
- Local Address Space
Local Address Space refers to memory objects allocated from the local memory pool and shared by all work-items of a work-group. Numba-dpex does not support passing arguments allocated in the local address space to a @numba_dpex.kernel function. Users can instead allocate static arrays in the local address space inside the kernel, using the numba_dpex.local.array(shape, dtype) API.
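The sharing rule for local memory can be illustrated without a device. The following is a minimal pure-Python sketch, not numba-dpex code: it models a launch as a loop over work-groups, gives each group its own scratch list to stand in for a local-address-space array, and splits the kernel body into two phases to stand in for a work-group barrier. The names `emulate_group_reverse`, `local_size`, and `scratch` are hypothetical and chosen only for this illustration.

```python
def emulate_group_reverse(data, local_size):
    """Reverse `data` within each work-group using a per-group scratch list."""
    if len(data) % local_size != 0:
        raise ValueError("global size must be a multiple of local_size")

    out = [None] * len(data)
    num_groups = len(data) // local_size

    for group_id in range(num_groups):
        # One scratch list per work-group: models a local-address-space array.
        scratch = [None] * local_size
        base = group_id * local_size

        # Phase 1: every work-item stores its element at its local id.
        for local_id in range(local_size):
            scratch[local_id] = data[base + local_id]

        # The end of phase 1 models a work-group barrier: all writes to
        # `scratch` are complete before any work-item reads from it.

        # Phase 2: every work-item reads a slot written by another work-item
        # of the same group.
        for local_id in range(local_size):
            out[base + local_id] = scratch[local_size - 1 - local_id]

    return out


result = emulate_group_reverse(list(range(8)), local_size=4)
print(result)  # [3, 2, 1, 0, 7, 6, 5, 4]
```

Each group of four elements is reversed independently, because a work-item can only see the scratch values written by work-items of its own work-group.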
- Private Address Space
Private Address Space refers to memory objects that are local to each work-item and are not shared with any other work-item. In the example below, numba_dpex.private.array(shape, dtype) is the API used to allocate a static array in the private address space:
    # SPDX-FileCopyrightText: 2020 - 2023 Intel Corporation
    #
    # SPDX-License-Identifier: Apache-2.0

    import dpctl
    import dpctl.tensor as dpt
    import numpy as np

    import numba_dpex as ndpx


    def private_memory():
        """
        This example demonstrates the usage of numba_dpex's `private.array`
        intrinsic function. The function is used to create a static array
        allocated in the device's private address space.
        """

        @ndpx.kernel
        def private_memory_kernel(A):
            memory = ndpx.private.array(shape=1, dtype=np.float32)
            i = ndpx.get_global_id(0)

            # preload
            memory[0] = i
            ndpx.barrier(ndpx.LOCAL_MEM_FENCE)  # local mem fence

            # memory will not hold correct deterministic result if it is not
            # private to each thread.
            A[i] = memory[0] * 2

        N = 4
        device = dpctl.select_default_device()

        arr = dpt.zeros(N, dtype=dpt.float32, device=device)
        orig = np.arange(N).astype(np.float32)

        print("Using device ...")
        device.print_device_info()

        global_range = ndpx.Range(N)
        local_range = ndpx.Range(N)
        private_memory_kernel[ndpx.NdRange(global_range, local_range)](arr)

        arr_out = dpt.asnumpy(arr)
        np.testing.assert_allclose(orig * 2, arr_out)
        # the output should be `orig[i] * 2`, i.e. [0, 2, 4, ...]
        print(arr_out)


    def main():
        private_memory()
        print("Done...")


    if __name__ == "__main__":
        main()
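The comment in the kernel above notes that the result would not be deterministic if the scratch array were shared. A small pure-Python sketch can make that concrete. It is not numba-dpex code; the function `emulate_kernel` and its `private` flag are hypothetical, and the sequential loops only approximate a parallel launch. On a real device the write that survives in a shared cell is unspecified, whereas this emulation always keeps the last one.

```python
def emulate_kernel(n, private):
    """Emulate the kernel body above for one work-group of n work-items."""
    if private:
        # One single-element cell per work-item: models a private array.
        scratch = [[0.0] for _ in range(n)]
    else:
        # One cell aliased by every work-item: models a shared array.
        shared_cell = [0.0]
        scratch = [shared_cell for _ in range(n)]

    # Phase 1: each work-item stores its global id in element 0.
    for i in range(n):
        scratch[i][0] = float(i)

    # The phase boundary models the barrier in the kernel.

    # Phase 2: each work-item reads element 0 back and doubles it.
    return [scratch[i][0] * 2 for i in range(n)]


private_result = emulate_kernel(4, private=True)
shared_result = emulate_kernel(4, private=False)
print(private_result)  # [0.0, 2.0, 4.0, 6.0]
print(shared_result)   # [6.0, 6.0, 6.0, 6.0]
```

With private storage every work-item reads back its own id, which matches the `orig * 2` check in the example. With a shared cell every work-item reads whichever write happened last.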