numba_dpex.kernel_api

The kernel_api module provides a set of Python classes and functions that are analogous to the C++ SYCL API. The kernel_api module is meant to allow prototyping SYCL-like kernels in pure Python before compiling them using numba_dpex.

Overview

Classes

AtomicRef

Analogue to the sycl::atomic_ref class.

Group

Analogue to the sycl::group class.

Item

Analogue to the sycl::item class.

NdItem

Analogue to the sycl::nd_item class.

LocalAccessor

Analogue to the sycl::local_accessor class.

AddressSpace

Analogue of SYCL address space classes.

MemoryOrder

Analogue of sycl::memory_order enumeration.

MemoryScope

Analogue of sycl::memory_scope enumeration.

PrivateArray

An array that gets allocated on the private memory of a work-item.

NdRange

Analogue to the sycl::nd_range class.

Range

Analogue to the sycl::range class.

Function

atomic_fence(memory_order, memory_scope)

Performs a memory fence operations across all work-items.

group_barrier(group, fence_scope)

Performs a barrier operation across all work-items in a work-group.

call_kernel(kernel_fn, index_range, *kernel_args)

Mocks the launching of a kernel function over either a Range or NdRange.

Classes

class AtomicRef(ref, index, memory_order=MemoryOrder.RELAXED, memory_scope=MemoryScope.DEVICE, address_space=None)

Analogue to the sycl::atomic_ref class.

An atomic reference is a view into a data container that can be then updated atomically using any of the fetch_* member functions of the class.

Overview

Methods

fetch_add(val)

Adds the operand val to the object referenced by the AtomicRef

fetch_sub(val)

Subtracts the operand val to the object referenced by the

fetch_min(val)

Calculates the minimum value of the operand val and the object

fetch_max(val)

Calculates the maximum value of the operand val and the object

fetch_and(val)

Calculates the bitwise AND of the operand val and the object

fetch_or(val)

Calculates the bitwise OR of the operand val and the object

fetch_xor(val)

Calculates the bitwise XOR of the operand val and the object

load()

Loads the value of the object referenced by the AtomicRef.

store(val)

Stores operand val to the object referenced by the AtomicRef.

exchange(val)

Replaces the value of the object referenced by the AtomicRef

compare_exchange(expected, desired, expected_idx)

Compares the value of the object referenced by the AtomicRef

Members

fetch_add(val)

Adds the operand val to the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.

Parameters:

val – Value to be added to the object referenced by the AtomicRef.

Returns: The original value of the object referenced by the AtomicRef.

fetch_sub(val)

Subtracts the operand val to the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.

Parameters:

val – Value to be subtracted from the object referenced by the AtomicRef.

Returns: The original value of the object referenced by the AtomicRef.

fetch_min(val)

Calculates the minimum value of the operand val and the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.

Parameters:

val – Value to be compared against the object referenced by the AtomicRef.

Returns: The original value of the object referenced by the AtomicRef.

fetch_max(val)

Calculates the maximum value of the operand val and the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.

Parameters:

val – Value to be compared against the object referenced by the AtomicRef.

Returns: The original value of the object referenced by the AtomicRef.

fetch_and(val)

Calculates the bitwise AND of the operand val and the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.

Parameters:

val – Value to be bitwise ANDed against the object referenced by the AtomicRef.

Returns: The original value of the object referenced by the AtomicRef.

fetch_or(val)

Calculates the bitwise OR of the operand val and the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.

Parameters:

val – Value to be bitwise ORed against the object referenced by the AtomicRef.

Returns: The original value of the object referenced by the AtomicRef.

fetch_xor(val)

Calculates the bitwise XOR of the operand val and the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.

Parameters:

val – Value to be bitwise XORed against the object referenced by the AtomicRef.

Returns: The original value of the object referenced by the AtomicRef.

load()

Loads the value of the object referenced by the AtomicRef.

Returns: The value of the object referenced by the AtomicRef.

store(val)

Stores operand val to the object referenced by the AtomicRef.

Parameters:

val – Value to be stored in the object referenced by the AtomicRef.

exchange(val)

Replaces the value of the object referenced by the AtomicRef with value of val. Returns the original value of the referenced object.

Parameters:

val – Value to be exchanged against the object referenced by the AtomicRef.

Returns: The original value of the object referenced by the AtomicRef.

compare_exchange(expected, desired, expected_idx=0)

Compares the value of the object referenced by the AtomicRef against the value of expected[expected_idx]. If the values are equal, replaces the value of the referenced object with the value of desired. Otherwise assigns the original value of the referenced object to expected[expected_idx].

Parameters:
  • expected – Array containing the expected value of the object referenced by the AtomicRef.

  • desired – Value that replaces the value of the object referenced by the AtomicRef.

  • expected_idx – Offset in expected array where the expected

  • present. (value of the object referenced by the AtomicRef is)

Returns: True if the comparison operation and replacement operation

were successful.

class Group(global_range: Range, local_range: Range, group_range: Range, index: list)

Analogue to the sycl::group class.

Represents a particular work-group within a parallel execution and provides API to extract various properties of the work-group. An instance of the class is not user-constructible. Users should use numba_dpex.kernel_api.NdItem.get_group() to access the Group to which a work-item belongs.

Overview

Methods

get_group_id(dim)

Returns a specific coordinate of the multi-dimensional index of a group.

get_group_linear_id()

Returns a linearized version of the work-group index.

get_group_range(dim)

Returns the extent of the range of groups in an nd-range for given dimension.

get_group_linear_range()

Returns the total number of work-groups in the nd_range.

get_local_range(dim)

Returns the extent of the range of work-items in a work-group for given dimension.

get_local_linear_range()

Return the total number of work-items in the work-group.

Members

get_group_id(dim)

Returns a specific coordinate of the multi-dimensional index of a group.

Since the work-items in a work-group have a defined position within the global nd-range, the returned group id can be used along with the local id to uniquely identify the work-item in the global nd-range.

Parameters:

dim (int) – An integral value between (1..3) for which the group index is returned.

Returns:

The coordinate for the dim dimension for the group’s multi-dimensional index within an nd-range.

Return type:

int

Raises:

ValueError – If the dim argument is not in the (1..3) interval.

get_group_linear_id()

Returns a linearized version of the work-group index.

Returns:

The linearized index for the group’s position within an nd-range.

Return type:

int

get_group_range(dim)

Returns the extent of the range of groups in an nd-range for given dimension.

Parameters:

dim (int) – An integral value between (1..3) for which the group index is returned.

Returns:

The extent of group range for the specified dimension.

Return type:

int

get_group_linear_range()

Returns the total number of work-groups in the nd_range.

Returns:

Returns the number of groups in a parallel execution of an nd-range kernel.

Return type:

int

get_local_range(dim)

Returns the extent of the range of work-items in a work-group for given dimension.

Parameters:

dim (int) – An integral value between (1..3) for which the group index is returned.

Returns:

The extent of the local work-item range for the specified dimension.

Return type:

int

get_local_linear_range()

Return the total number of work-items in the work-group.

Returns:

Returns the linearized size of the local range inside an nd-range.

Return type:

int

class Item(extent: Range, index: list)

Analogue to the sycl::item class.

Identifies the work-item in a parallel execution of a kernel launched with the Range index-space class.

Overview

Methods

get_linear_id()

Returns the linear id associated with this item for all dimensions.

get_id(idx)

Get the id for a specific dimension.

get_linear_range()

Return the total number of work-items in the work-group.

get_range(idx)

Get the range size for a specific dimension.

Members

get_linear_id()

Returns the linear id associated with this item for all dimensions.

Returns:

The linear id of the work item in the global range.

Return type:

int

get_id(idx)

Get the id for a specific dimension.

Returns:

The id

Return type:

int

get_linear_range()

Return the total number of work-items in the work-group.

get_range(idx)

Get the range size for a specific dimension.

Returns:

The size

Return type:

int

class NdItem(global_item: Item, local_item: Item, group: Group)

Analogue to the sycl::nd_item class.

Identifies an instance of the function object executing at each point in an NdRange.

Overview

Methods

get_global_id(idx)

Get the global id for a specific dimension.

get_global_linear_id()

Get the linearized global id for the item for all dimensions.

get_local_id(idx)

Get the local id for a specific dimension.

get_local_linear_id()

Get the local linear id associated with this item for all

get_global_range(idx)

Get the global range size for a specific dimension.

get_local_range(idx)

Get the local range size for a specific dimension.

get_local_linear_range()

Return the total number of work-items in the work-group.

get_global_linear_range()

Return the total number of work-items in the work-group.

get_group()

Returns the group.

Members

get_global_id(idx)

Get the global id for a specific dimension.

Returns:

The global id

Return type:

int

get_global_linear_id()

Get the linearized global id for the item for all dimensions.

Returns:

The global linear id.

Return type:

int

get_local_id(idx)

Get the local id for a specific dimension.

Returns:

The local id

Return type:

int

get_local_linear_id()

Get the local linear id associated with this item for all dimensions.

Returns:

The local linear id.

Return type:

int

get_global_range(idx)

Get the global range size for a specific dimension.

Returns:

The size

Return type:

int

get_local_range(idx)

Get the local range size for a specific dimension.

Returns:

The size

Return type:

int

get_local_linear_range()

Return the total number of work-items in the work-group.

get_global_linear_range()

Return the total number of work-items in the work-group.

get_group()

Returns the group.

Returns:

A group object.

class LocalAccessor(shape, dtype)

Analogue to the sycl::local_accessor class.

The class acts as a proxy to allocating device local memory and accessing that memory from within a numba_dpex.kernel() decorated function.

class AddressSpace

Bases: numba_dpex.kernel_api.flag_enum.FlagEnum

Analogue of SYCL address space classes.

The integer values of the enums is kept consistent with the corresponding implementation in dpcpp.

Overview

Attributes

PRIVATE

-

GLOBAL

-

CONSTANT

-

LOCAL

-

GENERIC

-

Members

PRIVATE = 0
GLOBAL = 1
CONSTANT = 2
LOCAL = 3
GENERIC = 4
class MemoryOrder

Bases: numba_dpex.kernel_api.flag_enum.FlagEnum

Analogue of sycl::memory_order enumeration.

The integer values of the enums is kept consistent with the corresponding implementation in dpcpp.

Overview

Members

RELAXED = 0
ACQUIRE = 1
CONSUME_UNSUPPORTED = 2
RELEASE = 3
ACQ_REL = 4
SEQ_CST = 5
class MemoryScope

Bases: numba_dpex.kernel_api.flag_enum.FlagEnum

Analogue of sycl::memory_scope enumeration.

The integer values of the enums is kept consistent with the corresponding implementation in dpcpp.

Overview

Members

WORK_ITEM = 0
SUB_GROUP = 1
WORK_GROUP = 2
DEVICE = 3
SYSTEM = 4
class PrivateArray(shape, dtype, fill_zeros=False)

An array that gets allocated on the private memory of a work-item.

The class should be used to allocate small arrays on the private per-work-item memory for fast accesses inside a kernel. It is similar in intent to the sycl::private_memory class but is not a direct analogue.

class NdRange(global_size, local_size)

Analogue to the sycl::nd_range class.

The NdRange defines the index space for a work group as well as the global index space. It is passed to parallel_for to execute a kernel on a set of work items.

This class basically contains two Range object, one for the global_range and the other for the local_range. The global_range parameter contains the global index space and the local_range parameter contains the index space of a work group. This class mimics the behavior of sycl::nd_range class.

Overview

Methods

get_global_range()

Returns a Range defining the index space.

get_local_range()

Returns a Range defining the index space of a work group.

Members

get_global_range()

Returns a Range defining the index space.

Returns:

A Range object defining the index space.

Return type:

Range

get_local_range()

Returns a Range defining the index space of a work group.

Returns:

A Range object to specify index space of a work group.

Return type:

Range

class Range

Bases: tuple

Analogue to the sycl::range class.

The range is an abstraction that describes the number of elements in each dimension of buffers and index spaces. It can contain 1, 2, or 3 numbers, depending on the dimensionality of the object it describes.

This is just a wrapper class on top of a 3-tuple. The kernel launch parameter is consisted of three int’s. This class basically mimics the behavior of sycl::range.

Overview

Methods

get(index)

Returns the range of a single dimension.

size()

Returns the size of a range.

Members

get(index)

Returns the range of a single dimension.

Parameters:

index (int) – The index of the dimension, i.e. [0,2]

Returns:

The range of the dimension indexed by index.

Return type:

int

size()

Returns the size of a range.

Returns the size of a range by multiplying the range of the individual dimensions.

Returns:

The size of a range.

Return type:

int

Functions

atomic_fence(memory_order: MemoryOrder, memory_scope: MemoryScope)

Performs a memory fence operations across all work-items.

The function is equivalent to the sycl::atomic_fence function and controls the order of memory accesses (loads and stores) by individual work-items.

Important

The function is a no-op during CPython execution and only available in JIT compiled mode of execution.

Parameters:
  • memory_order (MemoryOrder) – The memory synchronization order.

  • memory_scope (MemoryScope) – The set of work-items and devices to which the memory ordering constraints apply.

group_barrier(group: Group, fence_scope: MemoryScope = MemoryScope.WORK_GROUP)

Performs a barrier operation across all work-items in a work-group.

The function is equivalent to the sycl::group_barrier function. It synchronizes work within a group of work-items. All the work-items of the group must execute the barrier call before any work-item continues execution beyond the barrier.

The group_barrier performs a memory fence operation ensuring that memory accesses issued before the barrier are not re-ordered with those issued after the barrier. All work-items in group G execute a release fence prior to synchronizing at the barrier, all work-items in group G execute an acquire fence afterwards, and there is an implicit synchronization of these fences as if provided by an explicit atomic operation on an atomic object.

Important

The function is not implemented yet for pure CPython execution and is only supported in JIT compiled mode of execution.

Parameters:
  • group (Group) – Indicates the work-group inside which the barrier is to be executed.

  • fence_scope (MemoryScope) (optional) – scope of any memory consistency operations that are performed by the barrier.

Raises:

NotImplementedError – When the function is called directly from Python.

call_kernel(kernel_fn, index_range: Range | NdRange, *kernel_args)

Mocks the launching of a kernel function over either a Range or NdRange.

Important

The function is meant to be used only during prototyping a kernel_api function in Python. To launch a JIT compiled kernel, the numba_dpex.core.kernel_launcher.call_kernel() function should be used.

Parameters:
  • kernel_fn – A callable function object written using numba_dpex.kernel_api.

  • index_range (Range|NdRange) – An instance of a Range or an NdRange object

  • kernel_args (List) – The expanded list of actual arguments with which to launch the kernel execution.

Raises:
  • ValueError – If the first positional argument is not callable.

  • ValueError – If the second positional argument is not a Range or an Ndrange object