dpctl.utils
- dpctl.utils.get_execution_queue(qs, /)
Get the execution queue from the queues associated with the input arrays.
- Parameters:
qs (List[dpctl.SyclQueue], Tuple[dpctl.SyclQueue]) – a list or a tuple of dpctl.SyclQueue objects corresponding to the arrays being combined.
- Returns:
The execution queue under the compute-follows-data paradigm, or None if the queues are not equal.
- Return type:
dpctl.SyclQueue or None
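The queue-equality rule can be sketched in plain Python. This is a hedged illustration of the semantics only: the function name pick_execution_queue is hypothetical, plain strings stand in for dpctl.SyclQueue objects, and the real dpctl implementation may differ.

```python
# Hypothetical sketch of the compute-follows-data rule described above:
# all queues must compare equal, otherwise there is no unambiguous
# execution queue and None is returned.
def pick_execution_queue(qs):
    """Return the common queue if all entries are equal, else None."""
    if not qs:
        return None
    first = qs[0]
    for q in qs[1:]:
        if q != first:
            return None
    return first

# Strings stand in for dpctl.SyclQueue objects in this sketch.
print(pick_execution_queue(["gpu_q", "gpu_q"]))  # gpu_q
print(pick_execution_queue(["gpu_q", "cpu_q"]))  # None
```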
- dpctl.utils.get_coerced_usm_type(usm_types, /)
Get USM type of the output array for a function combining arrays of given USM types using compute-follows-data execution model.
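The coercion rule can be illustrated with a small sketch. Hedged assumptions: the ordering host < shared < device mirrors the usual compute-follows-data convention ("device" takes precedence over "shared", which takes precedence over "host"), and coerce_usm_types is a hypothetical name, not dpctl's actual implementation.

```python
# Assumed precedence of USM allocation types (hedged; not taken from
# the dpctl source): device > shared > host.
_ORDER = {"host": 0, "shared": 1, "device": 2}

def coerce_usm_types(usm_types):
    """Return the highest-precedence USM type among the inputs."""
    return max(usm_types, key=_ORDER.__getitem__)

print(coerce_usm_types(["host", "shared"]))           # shared
print(coerce_usm_types(["device", "host", "shared"])) # device
```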
- dpctl.utils.validate_usm_type(usm_type, allow_none=True)
Raises an exception if usm_type is invalid.
- Parameters:
usm_type – Specification of the USM allocation type. Valid specifications are "device", "shared", and "host". If the allow_none keyword argument is set, a value of None is also permitted.
allow_none (bool, optional) – Whether a usm_type value of None is considered valid. Default: True.
- Raises:
ValueError – if usm_type is not a recognized string.
TypeError – if usm_type is not a string, unless it is None and allow_none is True.
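The validation rules above can be restated as a short sketch. This is a hypothetical re-implementation for illustration only (check_usm_type is an invented name); the real function is dpctl.utils.validate_usm_type and its internals may differ.

```python
def check_usm_type(usm_type, allow_none=True):
    """Hypothetical sketch of the validation rules documented above."""
    if usm_type is None and allow_none:
        return  # None is explicitly permitted in this case
    if not isinstance(usm_type, str):
        # Covers None when allow_none is False, and any non-string value.
        raise TypeError(f"Expected a string, got {type(usm_type).__name__}")
    if usm_type not in ("device", "shared", "host"):
        raise ValueError(f"Unrecognized usm_type: {usm_type!r}")

check_usm_type("device")  # valid
check_usm_type(None)      # valid, since allow_none defaults to True
```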
- dpctl.utils.onetrace_enabled()
Enable onetrace collection for kernels executed in this context.
Note
Proper working of this utility assumes that the Python interpreter has been launched by the onetrace or unitrace tool from the intel/pti-gpu project.
- Example:
Launch the Python interpreter using the onetrace tool:
$ onetrace --conditional-collection -v -t --demangle python app.py
Using the context manager in the Python session now enables data collection and its output for every offloaded kernel:
import dpctl.tensor as dpt
from dpctl.utils import onetrace_enabled

# onetrace output reporting on execution of the kernel
# should be seen, starting with "Device Timeline"
with onetrace_enabled():
    a = dpt.arange(100, dtype='int16')
Sample output:
>>> with onetrace_enabled():
...     a = dpt.arange(100, dtype='int16')
...
Device Timeline (queue: 0x555aee86bed0):
dpctl::tensor::kernels::constructors::linear_sequence_step_kernel<short>[SIMD32 {1; 1; 1} {100; 1; 1}]<1.1> [ns] = 44034325658 (append) 44034816544 (submit) 44036296752 (start) 44036305918 (end)
>>>
- dpctl.utils.intel_device_info(sycl_device)
For Intel(R) GPU devices, returns a dictionary with device architectural details; for other devices, returns an empty dictionary. The dictionary contains the following keys:
- device_id:
32-bit device PCI identifier
- gpu_eu_count:
Total number of execution units
- gpu_hw_threads_per_eu:
Number of thread contexts in EU
- gpu_eu_simd_width:
Physical SIMD width of EU
- gpu_slices:
Total number of slices
- gpu_subslices_per_slice:
Number of sub-slices per slice
- gpu_eu_count_per_subslice:
Number of EUs in subslice
- max_mem_bandwidth:
Maximum memory bandwidth in bytes/second
- free_memory:
Global memory available on the device in units of bytes
Unsupported descriptors are omitted from the dictionary.
Descriptors other than the PCI identifier are supported only for SyclDevices with the Level-Zero backend.
Note
The environment variable ZES_ENABLE_SYSMAN may need to be set to 1 for the "free_memory" key to be reported.
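Because unsupported descriptors are omitted from the dictionary, callers should not index into it unconditionally. The sketch below uses a hand-written sample dictionary purely for illustration (the values are invented, not measured); in real use the dictionary comes from dpctl.utils.intel_device_info(sycl_device).

```python
# Hypothetical sample of what intel_device_info might return; the
# values here are invented for illustration only.
sample_info = {
    "device_id": 0x9A49,
    "gpu_eu_count": 96,
    "gpu_eu_simd_width": 8,
}

# Since unsupported descriptors are omitted, prefer dict.get with a
# default over direct indexing.
eu_count = sample_info.get("gpu_eu_count", 0)
free_mem = sample_info.get("free_memory")  # None if not reported
print(eu_count, free_mem)
```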
- exception dpctl.utils.ExecutionPlacementError
Exception raised when the execution placement target cannot be unambiguously determined from the input arrays.
Make sure that input arrays are associated with the same dpctl.SyclQueue, or migrate data to the same dpctl.SyclQueue using the dpctl.tensor.usm_ndarray.to_device() method.
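The catch-and-migrate pattern can be sketched without dpctl installed. Everything here is a hedged pure-Python analogue: PlacementError, FakeArray, and binary_op are invented stand-ins for ExecutionPlacementError, dpctl.tensor.usm_ndarray, and a compute-follows-data operation; only the to_device() migration pattern mirrors the real API.

```python
class PlacementError(ValueError):
    """Stand-in for dpctl.utils.ExecutionPlacementError."""

class FakeArray:
    """Stand-in for dpctl.tensor.usm_ndarray; a string models the queue."""
    def __init__(self, data, queue):
        self.data, self.sycl_queue = data, queue
    def to_device(self, queue):
        # Models usm_ndarray.to_device(): a copy bound to another queue.
        return FakeArray(self.data, queue)

def binary_op(x, y):
    # Refuse to guess when inputs carry different queues.
    if x.sycl_queue != y.sycl_queue:
        raise PlacementError("inputs bound to different queues")
    return FakeArray(x.data + y.data, x.sycl_queue)

a = FakeArray(1, "q0")
b = FakeArray(2, "q1")
try:
    result = binary_op(a, b)
except PlacementError:
    # Migrate b to a's queue, then retry.
    result = binary_op(a, b.to_device(a.sycl_queue))
print(result.data)  # 3
```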