Heterogeneous Systems and Programming Concepts
This section introduces the basic concepts defined by the SYCL standard for programming heterogeneous systems and used by dpctl.
Note
For SYCL-level details, refer to a more topical SYCL reference, such as the SYCL 2020 spec.
Definitions
- Heterogeneous computing
Refers to computing on multiple devices in a program.
- Host
Every program starts by running on a host, and most of the lines of code in a program, in particular lines of code implementing the Python interpreter itself, are usually for the host. Hosts are customarily CPUs.
- Device
A device is a processing unit connected to a host that is programmable with a specific device driver. Different types of devices can have different architectures (CPUs, GPUs, FPGAs, ASICs, DSPs) but are programmable using the same oneAPI programming model.
- Platform
A platform is an abstraction that represents a collection of devices addressable by the same lower-level framework. As multiple devices of the same type can be programmed by the same framework, a platform may contain multiple devices. The same physical hardware (for example, a GPU) may be programmable by different lower-level frameworks, and hence be enumerated as part of different platforms. For example, the same GPU hardware can be listed as an OpenCL* GPU device and a Level-Zero* GPU device.
- Context
Holds the runtime information needed to operate on a device or a group of devices from the same platform. Contexts are relatively expensive to create and should be reused as much as possible.
- Queue
A queue is needed to schedule the execution of any computation or data copying on the device. Queue construction requires specifying a device and a context targeting that device, as well as additional properties, such as whether profiling information should be collected or whether submitted tasks should be executed in the order in which they were submitted.
- Event
An event holds information related to a computation or data movement operation scheduled for execution on a queue, such as its execution status, as well as profiling information if the queue the task was submitted to allowed for collection of such information. Events can be used to specify task dependencies as well as to synchronize host and devices.
- Unified Shared Memory
Unified Shared Memory (USM) refers to pointer-based device memory management. USM allocations are bound to a context: a pointer representing a USM allocation can be unambiguously mapped to the data it represents only if the associated context is known. USM allocations are accessible by computational kernels that are executed on a device, provided that the allocation is bound to the same context that is used to construct the queue where the kernel is scheduled for execution.
Depending on the capability of the device, USM allocations can be:
Name | Host accessible | Device accessibility
---|---|---
Device allocation | No | Refers to an allocation in device memory.
Shared allocation | Yes | Accessible by both the host and device.
Host allocation | Yes | Refers to an allocation in host memory that is accessible from a device.
The runtime manages synchronization of the host’s and device’s views of shared allocations. The initial placement of shared allocations is not defined.
- Backend
Refers to the implementation of the oneAPI programming model using a lower-level heterogeneous programming API. Examples of backends include “cuda”, “hip”, “level_zero”, and “opencl”. In particular, a backend implements a platform abstraction.
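As an illustration of the Unified Shared Memory entry above, the following minimal sketch allocates one buffer of each USM kind that the default device reports support for, using the `dpctl.memory` classes. The helper name `allocate_usm` and the 64-byte size are illustrative assumptions, and the import guard is there only so that the sketch loads on a machine without dpctl.

```python
try:
    import dpctl
    import dpctl.memory as dpmem
except ImportError:  # sketch still loads when dpctl is not installed
    dpctl = dpmem = None


def allocate_usm(nbytes=64):
    """Allocate one buffer per USM kind supported by the default device.

    Hypothetical helper: returns a dict mapping the USM kind to the
    allocated memory object, or an empty dict if no device is available.
    """
    if dpctl is None:
        return {}
    try:
        q = dpctl.SyclQueue()  # queue on the default-selected device
    except dpctl.SyclQueueCreationError:
        return {}
    dev = q.sycl_device
    candidates = {
        "device": (dev.has_aspect_usm_device_allocations, dpmem.MemoryUSMDevice),
        "shared": (dev.has_aspect_usm_shared_allocations, dpmem.MemoryUSMShared),
        "host": (dev.has_aspect_usm_host_allocations, dpmem.MemoryUSMHost),
    }
    # Each allocation is bound to the context of the queue it was created with.
    return {
        kind: cls(nbytes, queue=q)
        for kind, (supported, cls) in candidates.items()
        if supported
    }


allocations = allocate_usm()
for kind, mem in allocations.items():
    # get_usm_type() reports the kind recorded by the runtime
    print(kind, mem.get_usm_type(), mem.nbytes)
```

Checking the device aspects first reflects that the available kinds depend on the capability of the device.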
Platform
A platform abstracts one or more SYCL devices that are connected to a host and can be programmed by the same underlying framework.
The dpctl.SyclPlatform class represents a platform and abstracts the sycl::platform SYCL runtime class.
To list all platforms available on a system programmatically, use the dpctl.lsplatform() function. Refer to Enumerating available devices for more information.
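To see how devices, platforms, and backends relate on a given machine, a minimal sketch along the following lines groups the root devices returned by `dpctl.get_devices()` by the platform each one reports. The helper name `devices_by_platform` is an illustrative assumption, and the output depends entirely on the installed drivers.

```python
from collections import defaultdict

try:
    import dpctl
except ImportError:  # sketch still loads when dpctl is not installed
    dpctl = None


def devices_by_platform():
    """Group root device names by (platform name, backend name).

    Hypothetical helper; returns an empty dict when dpctl or devices
    are unavailable.
    """
    groups = defaultdict(list)
    if dpctl is None:
        return dict(groups)
    for dev in dpctl.get_devices():
        platform = dev.sycl_platform
        groups[(platform.name, dev.backend.name)].append(dev.name)
    return dict(groups)


for (platform_name, backend), names in devices_by_platform().items():
    print(f"{platform_name} [{backend}]: {names}")
```

The same physical device may appear under more than one key if it is exposed by several backends.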
It is possible to select devices from a specific backend, and hence belonging to the same platform, by using the ONEAPI_DEVICE_SELECTOR environment variable or by using a filter selector string.
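A filter selector string can be passed directly to the `dpctl.SyclDevice` constructor. The sketch below tries a few example strings and records which ones resolve to a device. The helper name `select_device` and the particular strings are illustrative assumptions, and the sketch assumes that a well-formed string matching no device raises `dpctl.SyclDeviceCreationError`. The `ONEAPI_DEVICE_SELECTOR` variable is not exercised here, since it has to be set in the environment before the SYCL runtime is loaded.

```python
try:
    import dpctl
except ImportError:  # sketch still loads when dpctl is not installed
    dpctl = None


def select_device(filter_string):
    """Return the device matching a filter selector string, or None."""
    if dpctl is None:
        return None
    try:
        return dpctl.SyclDevice(filter_string)
    except dpctl.SyclDeviceCreationError:
        # No device on this system matches the requested backend/type.
        return None


found = {}
for fs in ("opencl:cpu", "level_zero:gpu", "gpu"):
    dev = select_device(fs)
    found[fs] = None if dev is None else dev.name
    print(fs, "->", found[fs])
```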
Context
A context is an entity that is associated with the state of a device as managed by the backend. The context is required to unambiguously map a unified address space pointer to the device where it was allocated.
In order for two DPC++-based Python extensions to share USM allocations, e.g. as part of DLPack exchange, they must each use the same SYCL context when submitting programs that access this allocation for execution.
Since a sycl::context is dynamically constructed by each extension, sharing a USM allocation in general requires sharing the sycl::context along with the USM pointer, as is done in the __sycl_usm_array_interface__ attribute.
Since DLPack itself does not provide for storing the sycl::context, the dpctl.tensor.from_dlpack() function is supported only for devices of platforms that implement the default platform context SYCL extension sycl_ext_oneapi_default_platform_context, and only for allocations that are bound to this default context.
To query whether a particular device dev belongs to a platform that implements the default context, check whether dev.sycl_platform.default_context returns an instance of dpctl.SyclContext or raises an exception.
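The check described above could be wrapped as in the following minimal sketch. The helper name `has_default_context` is an illustrative assumption, and because the exact exception type is not stated here, the sketch deliberately catches any exception.

```python
try:
    import dpctl
except ImportError:  # sketch still loads when dpctl is not installed
    dpctl = None


def has_default_context(dev):
    """Return True if the platform of ``dev`` exposes a default context."""
    try:
        ctx = dev.sycl_platform.default_context
    except Exception:  # exception type is not assumed in this sketch
        return False
    return isinstance(ctx, dpctl.SyclContext)


devices = dpctl.get_devices() if dpctl is not None else []
support = [(dev.name, has_default_context(dev)) for dev in devices]
for name, supported in support:
    print(f"{name}: default platform context available = {supported}")
```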
Queue
A SYCL queue is an entity associated with scheduling computational tasks for execution on a targeted SYCL device, using a specific SYCL context.
The queue constructor generally requires both to be specified. For platforms that support the default platform context, a shortcut constructor call that specifies only a device uses the default context of the platform that the given device is a part of.
>>> import dpctl
>>> d = dpctl.SyclDevice("gpu")
>>> q1 = dpctl.SyclQueue(d)
>>> q2 = dpctl.SyclQueue("gpu")
>>> q1.sycl_context == q2.sycl_context, q1.sycl_device == q2.sycl_device
(True, True)
>>> q1 == q2
False
Even though the q1 and q2 instances of dpctl.SyclQueue target the same device and use the same context, they do not compare equal, since they correspond to two independent scheduling entities.
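For the general form of the constructor, where both a context and a device are given, a minimal sketch might look as follows. The helper name `make_queues` is an illustrative assumption, and the first enumerated device is used only for convenience. The `"enable_profiling"` property can be requested through the same `property` keyword, on devices that support profiling.

```python
try:
    import dpctl
except ImportError:  # sketch still loads when dpctl is not installed
    dpctl = None


def make_queues():
    """Create two queues that share one explicitly constructed context.

    Hypothetical helper; returns None when no device is available.
    """
    if dpctl is None:
        return None
    devices = dpctl.get_devices()
    if not devices:
        return None
    dev = devices[0]
    ctx = dpctl.SyclContext(dev)  # context targeting this device
    q_default = dpctl.SyclQueue(ctx, dev)  # out-of-order by default
    q_in_order = dpctl.SyclQueue(ctx, dev, property="in_order")
    return q_default, q_in_order


queues = make_queues()
if queues is not None:
    q_default, q_in_order = queues
    print(q_default.sycl_context == q_in_order.sycl_context)
    print(q_default.is_in_order, q_in_order.is_in_order)
```

Constructing the context once and passing it to every queue follows the earlier advice that contexts are relatively expensive to create and should be reused.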
Note
Two dpctl.tensor.usm_ndarray objects, one associated with q1 and another associated with q2, cannot be combined in a call to the same function that implements the compute-follows-data programming model in dpctl.tensor.
Event
A SYCL event is an entity created when a task is submitted to a SYCL queue for execution. Events are used by the DPC++ runtime to order the execution of computational tasks. They may also contain profiling information associated with the submitted task, provided the queue was created with the “enable_profiling” property.
A SYCL event can be used to synchronize execution of the associated task with execution on the host by calling dpctl.SyclEvent.wait().
The methods dpctl.SyclQueue.submit_async() and dpctl.SyclQueue.memcpy_async() return dpctl.SyclEvent instances.
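As a minimal sketch of host synchronization through an event, the following copies bytes between two USM-shared buffers with `memcpy_async()` and waits on the returned event before reading the result back. The helper name `async_copy` and the payload are illustrative assumptions, and the sketch assumes the default device supports shared allocations (otherwise it returns `None`).

```python
try:
    import dpctl
    import dpctl.memory as dpmem
except ImportError:  # sketch still loads when dpctl is not installed
    dpctl = dpmem = None


def async_copy(payload=b"heterogeneous"):
    """Copy ``payload`` between two USM buffers and wait on the event.

    Hypothetical helper; returns the copied bytes, or None when no
    suitable device is available.
    """
    if dpctl is None:
        return None
    try:
        q = dpctl.SyclQueue()
    except dpctl.SyclQueueCreationError:
        return None
    if not q.sycl_device.has_aspect_usm_shared_allocations:
        return None
    n = len(payload)
    src = dpmem.MemoryUSMShared(n, queue=q)
    dst = dpmem.MemoryUSMShared(n, queue=q)
    src.copy_from_host(bytearray(payload))  # blocking host-to-USM copy
    ev = q.memcpy_async(dst, src, n)  # returns a dpctl.SyclEvent
    ev.wait()  # block the host until the copy has completed
    return bytes(dst.copy_to_host())


result = async_copy()
print(result)
```

Both buffers are created with the same queue, so they are bound to the same context as the queue that performs the copy.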
Note
At this point, dpctl.tensor does not provide a public API for accessing the SYCL events associated with the submission of computational tasks that implement operations on dpctl.tensor.usm_ndarray objects.
Backend
The Intel(R) oneAPI Data Parallel C++ compiler ships with two backends:
- OpenCL* backend
- Level-Zero backend
Additional backends can be added to the compiler by installing Codeplay’s plugins:
- CUDA backend: provided by oneAPI for NVIDIA(R) GPUs from Codeplay
- HIP backend: provided by oneAPI for AMD GPUs from Codeplay
When building the open-source Intel LLVM compiler from source, the project can be configured to enable different backends (see the Get Started Guide for further details).