Defining the execution queue for a kernel function
There are two ways to specify the queue where a kernel is executed. The first follows the notion of “compute follows data” (CFD). The second is for a programmer to specify the execution queue explicitly using a dpctl.device_context context manager.
In the CFD style of programming kernels, the execution queue is determined based on the input arguments passed to a kernel function. Currently, the kernel API supports CFD-style programming only for array arguments that provide the __sycl_usm_array_interface__ (SUAI) attribute. The SUAI attribute encodes the queue where the array was defined.
We also allow passing arrays and data types that do not provide SUAI. For such cases, programmers need to specify the queue using the dpctl.device_context context manager. Do note that the use of dpctl.device_context is deprecated and slated for removal in a future release.
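As a minimal sketch of this usage, the snippet below launches a kernel on numpy.ndarray arguments under dpctl.device_context. The "opencl:gpu" filter string is an assumption about the available hardware.

import dpctl
import numpy as np

import numba_dpex


@numba_dpex.kernel
def twice(a, b):
    i = numba_dpex.get_global_id(0)
    b[i] = 2 * a[i]


a = np.arange(8, dtype=np.float32)
b = np.empty_like(a)

# The queue is taken from the context manager, not from the arguments.
with dpctl.device_context("opencl:gpu"):
    twice[8, 1](a, b)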
Users are not allowed to pass mixed types of arrays to a @numba_dpex.kernel. For example, if the first array argument to a @numba_dpex.kernel is of type numpy.ndarray, the rest of the array arguments will also have to be of type numpy.ndarray.
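The sketch below illustrates the rule; sum_kernel refers to the kernel defined in the example at the end of this section, and the exact error raised for a mixed call depends on the numba-dpex version.

import dpctl.tensor as dpt
import numpy as np

np_a = np.ones(8, dtype=np.float32)
np_b = np.ones(8, dtype=np.float32)
np_out = np.empty_like(np_a)

usm_a = dpt.ones(8, dtype="f4")
usm_b = dpt.ones(8, dtype="f4")
usm_out = dpt.empty(8, dtype="f4")

# OK: every argument is a numpy.ndarray (queue supplied via a context manager):
#     sum_kernel[8, 1](np_a, np_b, np_out)
# OK: every argument provides __sycl_usm_array_interface__:
#     sum_kernel[8, 1](usm_a, usm_b, usm_out)
# Not allowed: mixing numpy.ndarray and SUAI arrays in a single call:
#     sum_kernel[8, 1](np_a, usm_b, usm_out)  # rejected by numba-dpex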
The following are the ways in which users can specify the device to which they want to offload their computation:

- numpy.ndarray: Use the context manager, with dpctl.device_context(SYCL_device). Please look at the method select_device_ndarray() in the example below.
- Array with the __sycl_usm_array_interface__ attribute: In this case the Compute Follows Data semantics are supported. Compute Follows Data stipulates that computation must be offloaded to the device where the data is resident. Please look at the method select_device_SUAI() in the example below.
- Expected behavior in different cases:
  - Users are allowed to mix arrays created using equivalent SYCL queues, where two SYCL queues are equivalent if they have the same SYCL context, the same SYCL device, and the same queue properties.
  - All usm-types are accessible from the device. Users can mix arrays with different usm-types as long as they were allocated using equivalent SYCL queues, as sketched below.
  - Using the context manager to specify a queue when passing SUAI arguments has no effect on queue selection, and a warning is printed to inform the user.
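Here is a minimal sketch of mixing usm-types on one queue. It assumes an "opencl:gpu" device is present; because all three arrays are allocated on the same (and therefore equivalent) queue, they may be passed together even though their usm-types differ.

import dpctl
import dpctl.tensor as dpt

q = dpctl.SyclQueue("opencl:gpu")

# Different usm-types, identical queue: a legal combination.
a = dpt.ones(8, dtype="f4", usm_type="device", sycl_queue=q)
b = dpt.ones(8, dtype="f4", usm_type="shared", sycl_queue=q)
c = dpt.empty(8, dtype="f4", usm_type="device", sycl_queue=q)

# sum_kernel[8, 1](a, b, c)  # executes on q, inferred from the data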
Example
# Copyright 2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import dpctl
import dpctl.tensor as dpt
import numpy as np

import numba_dpex
"""
We support passing arrays of two types to a @numba_dpex.kernel decorated
function.
1. numpy.ndarray
2. Any array with __sycl_usm_array_interface__ (SUAI) attribute.
Users are not allowed to mix the type of arrays passed as arguments. As in, all
the arguments passed to a @numba_dpex.kernel has to have the same type. For
example, if the first array argument is of type numpy.ndarray the rest of the
array arguments will also have to be of type numpy.ndarray.
The following are how users can specify in which device they want to offload
their computation.
1. numpy.ndarray
Using context manager provided by numba-dpex. Please look at method:
select_device_ndarray()
2. Array with __sycl_usm_array_interface__ attribute
We follow compute follows data which states that the device where the
data resides will be selected for computation. Please look at method:
select_device_SUAI()
Users can mix SUAI arrays created using equivalent SYCL queues.
Two SYCL queues are equivalent if they have the same:
1. SYCL context
2. SYCL device
3. Same queue properties
"""
@numba_dpex.kernel
def sum_kernel(a, b, c):
    i = numba_dpex.get_global_id(0)
    c[i] = a[i] + b[i]


def allocate_SUAI_data(a, b, got, usm_type, queue):
    da = dpt.usm_ndarray(
        a.shape,
        dtype=a.dtype,
        buffer=usm_type,
        buffer_ctor_kwargs={"queue": queue},
    )
    da.usm_data.copy_from_host(a.reshape((-1)).view("|u1"))

    db = dpt.usm_ndarray(
        b.shape,
        dtype=b.dtype,
        buffer=usm_type,
        buffer_ctor_kwargs={"queue": queue},
    )
    db.usm_data.copy_from_host(b.reshape((-1)).view("|u1"))

    dc = dpt.usm_ndarray(
        got.shape,
        dtype=got.dtype,
        buffer=usm_type,
        buffer_ctor_kwargs={"queue": queue},
    )

    return da, db, dc
# ==========================================================================


def select_device_ndarray(N):
    a = np.array(np.random.random(N), np.float32)
    b = np.array(np.random.random(N), np.float32)
    got = np.ones_like(a)

    # This context manager specifies that the OpenCL GPU should be used.
    with numba_dpex.offload_to_sycl_device("opencl:gpu"):
        sum_kernel[N, 1](a, b, got)

    expected = a + b
    assert np.array_equal(got, expected)
    print("Correct result when numpy.ndarray is passed!")


def select_device_SUAI(N):
    usm_type = "device"

    a = np.array(np.random.random(N), np.float32)
    b = np.array(np.random.random(N), np.float32)
    got = np.ones_like(a)

    device = dpctl.SyclDevice("opencl:gpu")
    queue = dpctl.SyclQueue(device)

    # We are allocating the data on the OpenCL GPU, so this device
    # will be selected for compute.
    da, db, dc = allocate_SUAI_data(a, b, got, usm_type, queue)

    # Users do not need to specify where the computation will
    # take place. It will be inferred from the data.
    sum_kernel[N, 1](da, db, dc)

    dc.usm_data.copy_to_host(got.reshape((-1)).view("|u1"))

    expected = a + b
    assert np.array_equal(got, expected)
    print(
        "Correct result when an array with __sycl_usm_array_interface__ is passed!"
    )


if __name__ == "__main__":
    select_device_ndarray(10)
    select_device_SUAI(10)