Queue
You need a queue to schedule the execution of any computation or data copying on a device.
The queue construction requires specifying:
Device
Context targeting the device
Additional properties, such as:
  * Whether profiling information should be collected
  * Whether submitted tasks are executed in the order in which they are submitted
The dpctl.SyclQueue class represents a queue and abstracts the sycl::queue SYCL runtime class.
Types of Queues
SYCL has a task-based execution model. The order in which a SYCL runtime executes a task on a target device is determined by the sequence of events that must complete before the task is allowed to run.
Submission of a task returns an event that you can use to further grow the graph of computational tasks. A SYCL queue stores the needed data to manage the scheduling operations.
There are two types of queues:
Out-of-order. Unless specified otherwise during the construction of a queue, a SYCL runtime executes tasks whose dependencies are met in an unspecified order, and some of those tasks may be executed concurrently.
In-order. You can create a SYCL queue that requires the runtime to execute tasks in the order in which they are submitted. Tasks submitted to such a queue are never executed concurrently.
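The difference between the two queue types can be illustrated with a toy model. The sketch below does not use dpctl or SYCL. It assumes a hypothetical set of task names and dependencies and groups them into "waves" using the standard-library graphlib module (Python 3.9 or later). Tasks in the same wave have all of their dependencies met, so an out-of-order queue would be free to run them in any order or concurrently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def out_of_order_waves(deps):
    """Group tasks into waves of tasks whose dependencies are all met.

    `deps` maps a task name to the names of the tasks it depends on.
    Tasks within one wave could run in any order or concurrently.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


def in_order_waves(submission_order):
    """An in-order queue runs one task at a time, in submission order."""
    return [[task] for task in submission_order]


# Hypothetical task graph: two independent copies feed one kernel.
deps = {
    "copy_a": [],
    "copy_b": [],
    "kernel": ["copy_a", "copy_b"],
    "copy_back": ["kernel"],
}

print(out_of_order_waves(deps))
print(in_order_waves(list(deps)))
```

In this model the two copies share a wave under out-of-order scheduling, while the in-order variant places every task in its own step.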
Creating a New Queue
dpctl.SyclQueue(ctx, dev, property=None) creates a new queue instance for the given compatible context and device.
To create an in-order queue, set the keyword parameter property to "in_order".
To dynamically collect task execution statistics in the returned event once the associated task completes, set the keyword parameter property to "enable_profiling".
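As a quick check of how the property keyword affects a queue, the following sketch creates queues on the default device and reads back their is_in_order and has_enable_profiling attributes. The helper name queue_flags is hypothetical. The guards that return None when dpctl is not installed or no default device is available are additions made for this illustration only.

```python
try:
    import dpctl
except ImportError:  # dpctl is not installed in this environment
    dpctl = None


def queue_flags(properties):
    """Return (is_in_order, has_enable_profiling) for a new queue.

    Returns None when dpctl or a default SYCL device is unavailable.
    """
    if dpctl is None:
        return None
    try:
        # No context or device given: the default device is targeted.
        q = dpctl.SyclQueue(property=properties)
    except dpctl.SyclQueueCreationError:
        return None
    return q.is_in_order, q.has_enable_profiling


print(queue_flags(None))                              # default, out-of-order queue
print(queue_flags("in_order"))                        # single property as a string
print(queue_flags(("in_order", "enable_profiling")))  # several properties as a tuple
```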
    import dpctl


    def create_queue_from_subdevice_multidevice_context():
        """
        Create a queue from a sub-device.
        """
        cpu_d = dpctl.SyclDevice("opencl:cpu:0")
        try:
            sub_devs = cpu_d.create_sub_devices(partition=2)
        except dpctl.SyclSubDeviceCreationError:
            print("Could not create sub device.")
            print(f"{cpu_d} has {cpu_d.max_compute_units} compute units")
            return
        # A single context shared by all sub-devices
        ctx = dpctl.SyclContext(sub_devs)
        # Queue properties are passed through the `property` keyword
        q = dpctl.SyclQueue(ctx, sub_devs[0], property="enable_profiling")
        print(
            "Number of devices in SyclContext " "associated with the queue: ",
            q.sycl_context.device_count,
        )
A possible output for the Constructing SyclQueue from context and device example:
INFO: Executing example create_queue_from_subdevice_multidevice_context
Could not create sub device.
<dpctl.SyclDevice [backend_type.opencl, device_type.cpu, Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz] at 0x7f110e1bdbf0> has 2 compute units
INFO: ===========================
When a context is not specified, the sycl::queue constructor from a device instance is called. Instead of an instance of dpctl.SyclDevice, the argument dev can be a valid filter selector string. In this case, the sycl::queue constructor with the corresponding sycl::ext::oneapi::filter_selector is called.
    import dpctl


    def create_queue_from_filter_selector():
        """Create queue for a GPU device or,
        if it is not available, for a CPU device.

        Create in-order queue with profiling enabled.
        """
        q = dpctl.SyclQueue("gpu,cpu", property=("in_order", "enable_profiling"))
        print("Queue {} is in order: {}".format(q, q.is_in_order))
        # display the device used
        print("Device targeted by the queue:")
        q.sycl_device.print_device_info()
A possible output for the Constructing SyclQueue from filter selector example:
INFO: Executing example create_queue_from_filter_selector
Queue <dpctl.SyclQueue at 0x7f4006692700, property=['in_order', 'enable_profiling']> is in order: True
Device targeted by the queue:
Name Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Driver version 2023.16.7.0.21_160000
Vendor Intel(R) Corporation
Filter string opencl:cpu:0
INFO: ===========================
Profiling a Task Submitted to a Queue
The result of scheduling the execution of a task on a queue is an event. You can use an event for several purposes:
Query for the status of the task execution
Order execution of future tasks after it is completed
Wait for execution to complete
Carry information to profile the task execution
The profiling information is populated only if the queue used was created with the enable_profiling property, and it becomes available only after the task execution is complete.
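A minimal sketch of reading profiling data from an event follows. It assumes that submitting a barrier with SyclQueue.submit_barrier is an acceptable stand-in for a real task, and it reads the timestamps only after waiting on the event. The helper name barrier_event_timestamps and the guards that return None when dpctl or a device is unavailable are hypothetical additions for this illustration.

```python
try:
    import dpctl
except ImportError:  # dpctl is not installed in this environment
    dpctl = None


def barrier_event_timestamps():
    """Return (submit, start, end) profiling timestamps of a barrier event.

    Returns None when dpctl or a default SYCL device is unavailable.
    """
    if dpctl is None:
        return None
    try:
        # Profiling data is recorded only for queues created with this property.
        q = dpctl.SyclQueue(property="enable_profiling")
    except dpctl.SyclQueueCreationError:
        return None
    ev = q.submit_barrier()  # scheduling a task returns an event
    ev.wait()                # profiling data is available after completion
    return (
        ev.profiling_info_submit,
        ev.profiling_info_start,
        ev.profiling_info_end,
    )


print(barrier_event_timestamps())
```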
The dpctl.SyclTimer class implements a Python context manager. You can use this context manager to collect cumulative profiling information for all the tasks submitted to the queue of interest by functions executed within the context:
    import dpctl
    import dpctl.tensor as dpt

    q = dpctl.SyclQueue(property="enable_profiling")
    timer_ctx = dpctl.SyclTimer()
    with timer_ctx(q):
        X = dpt.arange(10**6, dtype=float, sycl_queue=q)

    host_dt, device_dt = timer_ctx.dt
The timer leverages the oneAPI enqueue_barrier SYCL extension: it submits a barrier at context entrance and another at context exit, and records the associated events. The elapsed device time is computed as e_exit.profiling_info_start - e_enter.profiling_info_end.
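The arithmetic behind the elapsed device time can be shown without a device. The sketch below uses a hypothetical BarrierEvent stand-in class with made-up timestamps, and it assumes the profiling timestamps are expressed in nanoseconds, as SYCL event profiling information is. It is not the implementation used by dpctl.SyclTimer.

```python
from dataclasses import dataclass


@dataclass
class BarrierEvent:
    """Hypothetical stand-in for an event with profiling timestamps (ns)."""

    profiling_info_start: int
    profiling_info_end: int


def elapsed_device_seconds(e_enter, e_exit):
    """Time between the end of the entry barrier and the start of the exit barrier."""
    elapsed_ns = e_exit.profiling_info_start - e_enter.profiling_info_end
    return elapsed_ns * 1e-9  # nanoseconds to seconds (assumed unit)


# Made-up timestamps: the entry barrier ends at 1_500 ns,
# the exit barrier starts at 4_001_500 ns.
e_enter = BarrierEvent(profiling_info_start=1_000, profiling_info_end=1_500)
e_exit = BarrierEvent(profiling_info_start=4_001_500, profiling_info_end=4_002_000)

print(elapsed_device_seconds(e_enter, e_exit))
```

Measuring from the end of the first barrier to the start of the second excludes the barriers' own execution time, so the interval covers only the work submitted between them.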