Supported Address Space Qualifiers

An address space qualifier specifies the region of memory in which an object is allocated.

Three disjoint named address spaces are supported:

  1. Global Address Space

    The global address space holds memory objects allocated from the global memory pool; these objects are shared among all work-items. Arguments passed to a kernel are allocated in the global address space. In the example below, the arguments a, b, and c are allocated in the global address space:

    #! /usr/bin/env python
    # Copyright 2020, 2021 Intel Corporation
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #      http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    import dpctl
    import numpy as np
    import numpy.testing as testing
    
    import numba_dpex as dppy
    
    
    @dppy.kernel
    def data_parallel_sum(a, b, c):
        """
        Vector addition using the ``kernel`` decorator.
        """
        i = dppy.get_global_id(0)
        c[i] = a[i] + b[i]
    
    
    def driver(a, b, c, global_size):
        print("A : ", a)
        print("B : ", b)
        data_parallel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)
        print("A + B = ")
        print("C ", c)
        testing.assert_equal(c, a + b)
    
    
    def main():
        global_size = 10
        N = global_size
        print("N", N)
    
        a = np.array(np.random.random(N), dtype=np.float32)
        b = np.array(np.random.random(N), dtype=np.float32)
        c = np.ones_like(a)
    
        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()
    
        with dpctl.device_context(device):
            driver(a, b, c, global_size)
    
        print("Done...")
    
    
    if __name__ == "__main__":
        main()
    
  2. Local Address Space

    The local address space holds memory objects allocated from the local memory pool; these objects are shared by all work-items of a work-group. Passing arguments allocated in the local address space to a function decorated with @numba_dpex.kernel is not supported. A kernel may, however, allocate static arrays in the local address space inside its body. In the example below, numba_dpex.local.array(shape, dtype) allocates a static array in the local address space:

    import dpctl
    import numpy as np
    from numba import float32
    
    import numba_dpex as dppy
    
    
    def local_memory():
        """
        This example demonstrates the usage of numba-dpex's `local.array`
        intrinsic function. The function is used to create a static array
        allocated in the device's local address space.
        """
        blocksize = 10
    
        @dppy.kernel
        def reverse_array(A):
            lm = dppy.local.array(shape=10, dtype=float32)
            i = dppy.get_global_id(0)
    
            # preload
            lm[i] = A[i]
            # either a local or a global barrier works here, because the
            # example assumes all work-items belong to a single work-group
            dppy.barrier(dppy.CLK_LOCAL_MEM_FENCE)  # local mem fence
            # write
            A[i] += lm[blocksize - 1 - i]
    
        arr = np.arange(blocksize).astype(np.float32)
        print(arr)
    
        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()
    
        with dpctl.device_context(device):
            reverse_array[blocksize, dppy.DEFAULT_LOCAL_SIZE](arr)
    
        # the output should be `orig[::-1] + orig`, i.e. [9, 9, 9, ...]
        print(arr)
    
  3. Private Address Space

    The private address space holds memory objects that are local to a single work-item and are not shared with any other work-item. In the example below, numba_dpex.private.array(shape, dtype) allocates a static array in the private address space:

    import dpctl
    import numpy as np
    from numba import float32
    
    import numba_dpex
    
    
    def private_memory():
        """
        This example demonstrates the usage of numba-dpex's `private.array`
        intrinsic function. The function is used to create a static array
        allocated in the device's private address space.
        """
    
        @numba_dpex.kernel
        def private_memory_kernel(A):
            memory = numba_dpex.private.array(shape=1, dtype=np.float32)
            i = numba_dpex.get_global_id(0)
    
            # preload
            memory[0] = i
            numba_dpex.barrier(numba_dpex.CLK_LOCAL_MEM_FENCE)  # local mem fence
    
            # memory would not hold a deterministic result if it were not
            # private to each work-item.
            A[i] = memory[0] * 2
    
        N = 4
        arr = np.zeros(N).astype(np.float32)
        orig = np.arange(N).astype(np.float32)
    
        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()
    
        with numba_dpex.offload_to_sycl_device(device):
            private_memory_kernel[N, N](arr)
    
        np.testing.assert_allclose(orig * 2, arr)
        # the output should be `orig[i] * 2`, i.e. [0, 2, 4, ...]
        print(arr)
    
    
    def main():
        private_memory()
    
        print("Done...")
    
    
    if __name__ == "__main__":
        main()
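
The sharing rules of the three address spaces can also be pictured with a small host-side model. The sketch below is plain NumPy and does not call numba-dpex or run on a device; the function name emulate_launch and its parameters are hypothetical and exist only for this illustration. It assumes a one-dimensional launch whose global size is a multiple of the work-group size, and it stands in for a barrier by finishing one loop over the work-group before starting the next.

```python
import numpy as np


def emulate_launch(global_size, local_size):
    """Host-side model of a 1-D kernel launch (hypothetical helper, not numba-dpex API).

    Each work-item stores its global id in its work-group's local array, then
    reads the mirrored slot of that array and writes the value to global memory.
    """
    assert global_size % local_size == 0, "global size must be a multiple of local size"

    # Global address space: a single buffer visible to every work-item.
    out = np.zeros(global_size, dtype=np.float32)

    for group_id in range(global_size // local_size):
        # Local address space: one array per work-group, shared by its work-items.
        local_mem = np.zeros(local_size, dtype=np.float32)

        # Phase 1: every work-item of the group fills its own slot.
        for local_id in range(local_size):
            global_id = group_id * local_size + local_id
            local_mem[local_id] = global_id

        # Completing the loop above plays the role of a work-group barrier.

        # Phase 2: every work-item reads a slot written by another work-item.
        for local_id in range(local_size):
            global_id = group_id * local_size + local_id
            # Private address space: a fresh array for each work-item.
            private_mem = np.zeros(1, dtype=np.float32)
            private_mem[0] = local_mem[local_size - 1 - local_id]
            out[global_id] = private_mem[0]

    return out


result = emulate_launch(global_size=8, local_size=4)
print(result)
```

With two work-groups of four work-items, the ids are reversed within each group but never across groups, because each group only sees its own local array.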