Supported Address Space Qualifiers
An address space qualifier specifies the region of memory in which an object is allocated.
Three disjoint named address spaces are supported:
- Global Address Space
The Global Address Space refers to memory objects allocated from the global memory pool, which are shared among all work-items. Arguments passed to any kernel are allocated in the global address space. In the example below, arguments a, b and c are allocated in the global address space:
    #! /usr/bin/env python
    # Copyright 2020, 2021 Intel Corporation
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    import dpctl
    import numpy as np
    import numpy.testing as testing

    import numba_dpex as dppy


    @dppy.kernel
    def data_parallel_sum(a, b, c):
        """
        Vector addition using the ``kernel`` decorator.
        """
        i = dppy.get_global_id(0)
        c[i] = a[i] + b[i]


    def driver(a, b, c, global_size):
        print("A : ", a)
        print("B : ", b)
        data_parallel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)
        print("A + B = ")
        print("C ", c)
        testing.assert_equal(c, a + b)


    def main():
        global_size = 10
        N = global_size
        print("N", N)

        a = np.array(np.random.random(N), dtype=np.float32)
        b = np.array(np.random.random(N), dtype=np.float32)
        c = np.ones_like(a)

        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()

        with dpctl.device_context(device):
            driver(a, b, c, global_size)

        print("Done...")


    if __name__ == "__main__":
        main()
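To make the sharing behaviour of the global address space concrete, the following is a minimal host-side sketch in plain Python. It does not use numba-dpex or a device. The `launch` helper is hypothetical and only stands in for the `kernel[global_size, local_size](...)` launch syntax, and the explicit index `i` plays the role of `get_global_id(0)`. It illustrates that every emulated work-item receives the same argument objects, so a write by one work-item is visible in the one shared output array.

```python
def data_parallel_sum(i, a, b, c):
    # One emulated work-item; i stands in for get_global_id(0).
    c[i] = a[i] + b[i]


def launch(kernel, global_size, *args):
    # Hypothetical stand-in for kernel[global_size, local_size](*args).
    # Every work-item is handed the very same argument objects, which is
    # the property the global address space provides.
    for i in range(global_size):
        kernel(i, *args)


global_size = 10
a = [float(i) for i in range(global_size)]
b = [2.0 * i for i in range(global_size)]
c = [0.0] * global_size

launch(data_parallel_sum, global_size, a, b, c)
print(c)
```

On a real device the work-items run concurrently rather than in a loop, but each one still writes to a distinct element of the same `c`.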
- Local Address Space
The Local Address Space refers to memory objects allocated from the local memory pool, which are shared by all work-items of a work-group. Passing arguments that are allocated in the local address space to a @numba_dpex.kernel function is not supported. Users can, however, allocate static arrays in the local address space inside a @numba_dpex.kernel function. In the example below, numba_dpex.local.array(shape, dtype) is the API used to allocate a static array in the local address space:
    # Imports assumed from the other examples on this page.
    import dpctl
    import numpy as np
    from numba import float32

    import numba_dpex as dppy


    def local_memory():
        """
        This example demonstrates the usage of numba-dpex's `local.array`
        intrinsic function. The function is used to create a static array
        allocated on the device's local address space.
        """
        blocksize = 10

        @dppy.kernel
        def reverse_array(A):
            lm = dppy.local.array(shape=10, dtype=float32)
            i = dppy.get_global_id(0)

            # preload
            lm[i] = A[i]
            # barrier local or global will both work as we only have one work group
            dppy.barrier(dppy.CLK_LOCAL_MEM_FENCE)  # local mem fence
            # write
            A[i] += lm[blocksize - 1 - i]

        arr = np.arange(blocksize).astype(np.float32)
        print(arr)

        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()

        with dpctl.device_context(device):
            reverse_array[blocksize, dppy.DEFAULT_LOCAL_SIZE](arr)

        # the output should be `orig[::-1] + orig`, i.e. [9, 9, 9, ...]
        print(arr)
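The role of the barrier and the per-work-group scope of local memory can be sketched on the host in plain Python. This is an emulation under stated assumptions, not numba-dpex code: the function name `reverse_add_by_group` and the `local_size` parameter are hypothetical, each work-group gets its own scratch list `lm`, and the barrier is modelled by finishing the preload loop before the write loop starts.

```python
def reverse_add_by_group(A, local_size):
    """Emulate a kernel that stages data in per-work-group local memory."""
    assert len(A) % local_size == 0
    for group_start in range(0, len(A), local_size):
        # One scratch array per work-group, visible to all of its work-items
        # and to no work-item of any other group.
        lm = [0.0] * local_size

        # Phase 1 (before the barrier): each work-item preloads one element.
        for lid in range(local_size):
            lm[lid] = A[group_start + lid]

        # The barrier sits here: phase 2 may only start once every work-item
        # of the group has finished its preload.

        # Phase 2: each work-item reads an element written by another one.
        for lid in range(local_size):
            A[group_start + lid] += lm[local_size - 1 - lid]
    return A


# One work-group of 10 items, matching the single-group case described above.
one_group = reverse_add_by_group([float(i) for i in range(10)], local_size=10)
# Two work-groups of 4 items: each group reverses only its own tile.
two_groups = reverse_add_by_group([float(i) for i in range(8)], local_size=4)
print(one_group)
print(two_groups)
```

With two groups, the second tile never sees values staged by the first, which is why the two halves of `two_groups` differ.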
- Private Address Space
The Private Address Space refers to memory objects that are local to each work-item and are not shared with any other work-item. In the example below, numba_dpex.private.array(shape, dtype) is the API used to allocate a static array in the private address space:
    # Copyright 2020, 2021 Intel Corporation
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    import dpctl
    import numpy as np
    from numba import float32

    import numba_dpex


    def private_memory():
        """
        This example demonstrates the usage of numba-dpex's `private.array`
        intrinsic function. The function is used to create a static array
        allocated on the device's private address space.
        """

        @numba_dpex.kernel
        def private_memory_kernel(A):
            memory = numba_dpex.private.array(shape=1, dtype=np.float32)
            i = numba_dpex.get_global_id(0)

            # preload
            memory[0] = i
            numba_dpex.barrier(numba_dpex.CLK_LOCAL_MEM_FENCE)  # local mem fence

            # memory will not hold correct deterministic result if it is not
            # private to each thread.
            A[i] = memory[0] * 2

        N = 4
        arr = np.zeros(N).astype(np.float32)
        orig = np.arange(N).astype(np.float32)

        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()

        with numba_dpex.offload_to_sycl_device(device):
            private_memory_kernel[N, N](arr)

        np.testing.assert_allclose(orig * 2, arr)
        # the output should be `orig[i] * 2`, i.e. [0, 2, 4, ...]
        print(arr)


    def main():
        private_memory()
        print("Done...")


    if __name__ == "__main__":
        main()
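The comment in the kernel about a non-deterministic result can be illustrated with a small host-side emulation in plain Python. The helpers `run_private` and `run_shared` are hypothetical and do not call numba-dpex. The two loops model the code before and after the barrier, and the shared variant only shows one possible outcome (the last writer wins), since a sequential loop cannot reproduce a real data race.

```python
def run_private(n):
    # Each emulated work-item owns a separate one-element scratch array.
    scratch = [[0.0] for _ in range(n)]
    out = [0.0] * n
    for i in range(n):  # code before the barrier
        scratch[i][0] = float(i)
    for i in range(n):  # code after the barrier
        out[i] = scratch[i][0] * 2
    return out


def run_shared(n):
    # Counter-example: a single scratch array shared by all work-items.
    scratch = [0.0]
    out = [0.0] * n
    for i in range(n):  # every work-item overwrites the same slot
        scratch[0] = float(i)
    for i in range(n):  # every work-item reads whatever was written last
        out[i] = scratch[0] * 2
    return out


private_result = run_private(4)
shared_result = run_shared(4)
print(private_result)
print(shared_result)
```

Only the private variant reproduces the `orig[i] * 2` result that the kernel above asserts.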