numba_dpex.kernel_api¶
The kernel_api module provides a set of Python classes and functions that are analogous to the C++ SYCL API. The kernel_api module is meant to allow prototyping SYCL-like kernels in pure Python before compiling them using numba_dpex.
Overview¶
Analogue to the sycl::atomic_ref class. |
|
Analogue to the sycl::group class. |
|
Analogue to the sycl::item class. |
|
Analogue to the sycl::nd_item class. |
|
Analogue to the sycl::local_accessor class. |
|
Analogue of SYCL address space classes. |
|
Analogue of sycl::memory_order enumeration. |
|
Analogue of sycl::memory_scope enumeration. |
|
An array that gets allocated on the private memory of a work-item. |
|
Analogue to the sycl::nd_range class. |
|
Analogue to the sycl::range class. |
|
Performs a memory fence operations across all work-items. |
|
Performs a barrier operation across all work-items in a work-group. |
|
Mocks the launching of a kernel function over either a Range or NdRange. |
Classes¶
- class AtomicRef(ref, index, memory_order=MemoryOrder.RELAXED, memory_scope=MemoryScope.DEVICE, address_space=None)¶
Analogue to the sycl::atomic_ref class.
An atomic reference is a view into a data container that can be then updated atomically using any of the
fetch_*
member functions of the class.Overview
¶ fetch_add
(val)Adds the operand
val
to the object referenced by the AtomicReffetch_sub
(val)Subtracts the operand
val
to the object referenced by thefetch_min
(val)Calculates the minimum value of the operand
val
and the objectfetch_max
(val)Calculates the maximum value of the operand
val
and the objectfetch_and
(val)Calculates the bitwise AND of the operand
val
and the objectfetch_or
(val)Calculates the bitwise OR of the operand
val
and the objectfetch_xor
(val)Calculates the bitwise XOR of the operand
val
and the objectload
()Loads the value of the object referenced by the AtomicRef.
store
(val)Stores operand
val
to the object referenced by the AtomicRef.exchange
(val)Replaces the value of the object referenced by the AtomicRef
compare_exchange
(expected, desired, expected_idx)Compares the value of the object referenced by the AtomicRef
Members
- fetch_add(val)¶
Adds the operand
val
to the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.- Parameters:
val – Value to be added to the object referenced by the AtomicRef.
Returns: The original value of the object referenced by the AtomicRef.
- fetch_sub(val)¶
Subtracts the operand
val
to the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.- Parameters:
val – Value to be subtracted from the object referenced by the AtomicRef.
Returns: The original value of the object referenced by the AtomicRef.
- fetch_min(val)¶
Calculates the minimum value of the operand
val
and the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.- Parameters:
val – Value to be compared against the object referenced by the AtomicRef.
Returns: The original value of the object referenced by the AtomicRef.
- fetch_max(val)¶
Calculates the maximum value of the operand
val
and the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.- Parameters:
val – Value to be compared against the object referenced by the AtomicRef.
Returns: The original value of the object referenced by the AtomicRef.
- fetch_and(val)¶
Calculates the bitwise AND of the operand
val
and the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.- Parameters:
val – Value to be bitwise ANDed against the object referenced by the AtomicRef.
Returns: The original value of the object referenced by the AtomicRef.
- fetch_or(val)¶
Calculates the bitwise OR of the operand
val
and the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.- Parameters:
val – Value to be bitwise ORed against the object referenced by the AtomicRef.
Returns: The original value of the object referenced by the AtomicRef.
- fetch_xor(val)¶
Calculates the bitwise XOR of the operand
val
and the object referenced by the AtomicRef and assigns the result to the value of the referenced object. Returns the original value of the object.- Parameters:
val – Value to be bitwise XORed against the object referenced by the AtomicRef.
Returns: The original value of the object referenced by the AtomicRef.
- load()¶
Loads the value of the object referenced by the AtomicRef.
Returns: The value of the object referenced by the AtomicRef.
- store(val)¶
Stores operand
val
to the object referenced by the AtomicRef.- Parameters:
val – Value to be stored in the object referenced by the AtomicRef.
- exchange(val)¶
Replaces the value of the object referenced by the AtomicRef with value of
val
. Returns the original value of the referenced object.- Parameters:
val – Value to be exchanged against the object referenced by the AtomicRef.
Returns: The original value of the object referenced by the AtomicRef.
- compare_exchange(expected, desired, expected_idx=0)¶
Compares the value of the object referenced by the AtomicRef against the value of
expected[expected_idx]
. If the values are equal, replaces the value of the referenced object with the value ofdesired
. Otherwise assigns the original value of the referenced object toexpected[expected_idx]
.- Parameters:
expected – Array containing the expected value of the object referenced by the AtomicRef.
desired – Value that replaces the value of the object referenced by the AtomicRef.
expected_idx – Offset in expected array where the expected
present. (value of the object referenced by the AtomicRef is)
- Returns:
True
if the comparison operation and replacement operation were successful.
- class Group(global_range: Range, local_range: Range, group_range: Range, index: list)¶
Analogue to the sycl::group class.
Represents a particular work-group within a parallel execution and provides API to extract various properties of the work-group. An instance of the class is not user-constructible. Users should use
numba_dpex.kernel_api.NdItem.get_group()
to access the Group to which a work-item belongs.Overview
¶ get_group_id
(dim)Returns a specific coordinate of the multi-dimensional index of a group.
Returns a linearized version of the work-group index.
get_group_range
(dim)Returns the extent of the range of groups in an nd-range for given dimension.
Returns the total number of work-groups in the nd_range.
get_local_range
(dim)Returns the extent of the range of work-items in a work-group for given dimension.
Return the total number of work-items in the work-group.
Members
- get_group_id(dim)¶
Returns a specific coordinate of the multi-dimensional index of a group.
Since the work-items in a work-group have a defined position within the global nd-range, the returned group id can be used along with the local id to uniquely identify the work-item in the global nd-range.
- Parameters:
dim (int) – An integral value between (1..3) for which the group index is returned.
- Returns:
The coordinate for the
dim
dimension for the group’s multi-dimensional index within an nd-range.- Return type:
int
- Raises:
ValueError – If the
dim
argument is not in the (1..3) interval.
- get_group_linear_id()¶
Returns a linearized version of the work-group index.
- Returns:
The linearized index for the group’s position within an nd-range.
- Return type:
int
- get_group_range(dim)¶
Returns the extent of the range of groups in an nd-range for given dimension.
- Parameters:
dim (int) – An integral value between (1..3) for which the group index is returned.
- Returns:
The extent of group range for the specified dimension.
- Return type:
int
- get_group_linear_range()¶
Returns the total number of work-groups in the nd_range.
- Returns:
Returns the number of groups in a parallel execution of an nd-range kernel.
- Return type:
int
- get_local_range(dim)¶
Returns the extent of the range of work-items in a work-group for given dimension.
- Parameters:
dim (int) – An integral value between (1..3) for which the group index is returned.
- Returns:
The extent of the local work-item range for the specified dimension.
- Return type:
int
- get_local_linear_range()¶
Return the total number of work-items in the work-group.
- Returns:
Returns the linearized size of the local range inside an nd-range.
- Return type:
int
- class Item(extent: Range, index: list)¶
Analogue to the sycl::item class.
Identifies the work-item in a parallel execution of a kernel launched with the
Range
index-space class.Overview
¶ Returns the linear id associated with this item for all dimensions.
get_id
(idx)Get the id for a specific dimension.
Return the total number of work-items in the work-group.
get_range
(idx)Get the range size for a specific dimension.
Members
- get_linear_id()¶
Returns the linear id associated with this item for all dimensions.
- Returns:
The linear id of the work item in the global range.
- Return type:
int
- get_id(idx)¶
Get the id for a specific dimension.
- Returns:
The id
- Return type:
int
- get_linear_range()¶
Return the total number of work-items in the work-group.
- get_range(idx)¶
Get the range size for a specific dimension.
- Returns:
The size
- Return type:
int
- class NdItem(global_item: Item, local_item: Item, group: Group)¶
Analogue to the sycl::nd_item class.
Identifies an instance of the function object executing at each point in an
NdRange
.Overview
¶ get_global_id
(idx)Get the global id for a specific dimension.
Get the linearized global id for the item for all dimensions.
get_local_id
(idx)Get the local id for a specific dimension.
Get the local linear id associated with this item for all
get_global_range
(idx)Get the global range size for a specific dimension.
get_local_range
(idx)Get the local range size for a specific dimension.
Return the total number of work-items in the work-group.
Return the total number of work-items in the work-group.
Returns the group.
Members
- get_global_id(idx)¶
Get the global id for a specific dimension.
- Returns:
The global id
- Return type:
int
- get_global_linear_id()¶
Get the linearized global id for the item for all dimensions.
- Returns:
The global linear id.
- Return type:
int
- get_local_id(idx)¶
Get the local id for a specific dimension.
- Returns:
The local id
- Return type:
int
- get_local_linear_id()¶
Get the local linear id associated with this item for all dimensions.
- Returns:
The local linear id.
- Return type:
int
- get_global_range(idx)¶
Get the global range size for a specific dimension.
- Returns:
The size
- Return type:
int
- get_local_range(idx)¶
Get the local range size for a specific dimension.
- Returns:
The size
- Return type:
int
- get_local_linear_range()¶
Return the total number of work-items in the work-group.
- get_global_linear_range()¶
Return the total number of work-items in the work-group.
- get_group()¶
Returns the group.
- Returns:
A group object.
- class LocalAccessor(shape, dtype)¶
Analogue to the sycl::local_accessor class.
The class acts as a proxy to allocating device local memory and accessing that memory from within a
numba_dpex.kernel()
decorated function.
- class AddressSpace¶
Bases:
numba_dpex.kernel_api.flag_enum.FlagEnum
Analogue of SYCL address space classes.
The integer values of the enums is kept consistent with the corresponding implementation in dpcpp.
Overview
Members
- PRIVATE = 0¶
- GLOBAL = 1¶
- CONSTANT = 2¶
- LOCAL = 3¶
- GENERIC = 4¶
- class MemoryOrder¶
Bases:
numba_dpex.kernel_api.flag_enum.FlagEnum
Analogue of sycl::memory_order enumeration.
The integer values of the enums is kept consistent with the corresponding implementation in dpcpp.
Overview
Members
- RELAXED = 0¶
- ACQUIRE = 1¶
- CONSUME_UNSUPPORTED = 2¶
- RELEASE = 3¶
- ACQ_REL = 4¶
- SEQ_CST = 5¶
- class MemoryScope¶
Bases:
numba_dpex.kernel_api.flag_enum.FlagEnum
Analogue of sycl::memory_scope enumeration.
The integer values of the enums is kept consistent with the corresponding implementation in dpcpp.
Overview
Members
- WORK_ITEM = 0¶
- SUB_GROUP = 1¶
- WORK_GROUP = 2¶
- DEVICE = 3¶
- SYSTEM = 4¶
- class PrivateArray(shape, dtype, fill_zeros=False)¶
An array that gets allocated on the private memory of a work-item.
The class should be used to allocate small arrays on the private per-work-item memory for fast accesses inside a kernel. It is similar in intent to the sycl::private_memory class but is not a direct analogue.
- class NdRange(global_size, local_size)¶
Analogue to the sycl::nd_range class.
The NdRange defines the index space for a work group as well as the global index space. It is passed to parallel_for to execute a kernel on a set of work items.
This class basically contains two Range object, one for the global_range and the other for the local_range. The global_range parameter contains the global index space and the local_range parameter contains the index space of a work group. This class mimics the behavior of sycl::nd_range class.
Overview
¶ Returns a Range defining the index space.
Returns a Range defining the index space of a work group.
Members
- get_global_range()¶
Returns a Range defining the index space.
- Returns:
A Range object defining the index space.
- Return type:
- class Range¶
Bases:
tuple
Analogue to the sycl::range class.
The range is an abstraction that describes the number of elements in each dimension of buffers and index spaces. It can contain 1, 2, or 3 numbers, depending on the dimensionality of the object it describes.
This is just a wrapper class on top of a 3-tuple. The kernel launch parameter is consisted of three int’s. This class basically mimics the behavior of sycl::range.
Overview
Members
- get(index)¶
Returns the range of a single dimension.
- Parameters:
index (int) – The index of the dimension, i.e. [0,2]
- Returns:
The range of the dimension indexed by index.
- Return type:
int
- size()¶
Returns the size of a range.
Returns the size of a range by multiplying the range of the individual dimensions.
- Returns:
The size of a range.
- Return type:
int
Functions¶
- atomic_fence(memory_order: MemoryOrder, memory_scope: MemoryScope)¶
Performs a memory fence operations across all work-items.
The function is equivalent to the
sycl::atomic_fence
function and controls the order of memory accesses (loads and stores) by individual work-items.Important
The function is a no-op during CPython execution and only available in JIT compiled mode of execution.
- Parameters:
memory_order (MemoryOrder) – The memory synchronization order.
memory_scope (MemoryScope) – The set of work-items and devices to which the memory ordering constraints apply.
- group_barrier(group: Group, fence_scope: MemoryScope = MemoryScope.WORK_GROUP)¶
Performs a barrier operation across all work-items in a work-group.
The function is equivalent to the
sycl::group_barrier
function. It synchronizes work within a group of work-items. All the work-items of the group must execute the barrier call before any work-item continues execution beyond the barrier.The
group_barrier
performs a memory fence operation ensuring that memory accesses issued before the barrier are not re-ordered with those issued after the barrier. All work-items in group G execute a release fence prior to synchronizing at the barrier, all work-items in group G execute an acquire fence afterwards, and there is an implicit synchronization of these fences as if provided by an explicit atomic operation on an atomic object.Important
The function is not implemented yet for pure CPython execution and is only supported in JIT compiled mode of execution.
- Parameters:
group (Group) – Indicates the work-group inside which the barrier is to be executed.
fence_scope (MemoryScope) (optional) – scope of any memory consistency operations that are performed by the barrier.
- Raises:
NotImplementedError – When the function is called directly from Python.
- call_kernel(kernel_fn, index_range: Range | NdRange, *kernel_args)¶
Mocks the launching of a kernel function over either a Range or NdRange.
Important
The function is meant to be used only during prototyping a kernel_api function in Python. To launch a JIT compiled kernel, the
numba_dpex.core.kernel_launcher.call_kernel()
function should be used.- Parameters:
kernel_fn – A callable function object written using
numba_dpex.kernel_api
.index_range (Range|NdRange) – An instance of a Range or an NdRange object
kernel_args (List) – The expanded list of actual arguments with which to launch the kernel execution.
- Raises:
ValueError – If the first positional argument is not callable.
ValueError – If the second positional argument is not a Range or an Ndrange object