dpctl.utils
- dpctl.utils.get_execution_queue(qs, /)
Get execution queue from queues associated with input arrays.
- Parameters:
qs (List[dpctl.SyclQueue], Tuple[dpctl.SyclQueue]) – a list or a tuple of dpctl.SyclQueue objects corresponding to arrays that are being combined.
- Returns:
execution queue under the compute-follows-data paradigm, or None if the queues are not equal.
- Return type:
dpctl.SyclQueue or None
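A minimal usage sketch (the array construction below is only illustrative; get_execution_queue itself is the part of this module being shown):
import dpctl
import dpctl.tensor as dpt
from dpctl.utils import get_execution_queue

q = dpctl.SyclQueue()
x = dpt.ones(100, sycl_queue=q)
y = dpt.zeros(100, sycl_queue=q)

# both arrays are associated with the same queue, so that queue is returned
exec_q = get_execution_queue([x.sycl_queue, y.sycl_queue])
assert exec_q is not None

# an array allocated on a separately created queue is expected to yield None
other = dpt.ones(100, sycl_queue=dpctl.SyclQueue())
assert get_execution_queue([x.sycl_queue, other.sycl_queue]) is None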
- dpctl.utils.get_coerced_usm_type(usm_types, /)
Get USM type of the output array for a function combining arrays of given USM types using the compute-follows-data execution model.
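A short usage sketch; the coercion order shown in the comments ("device" taking precedence over "shared", and "shared" over "host") is an assumption stated here for illustration:
from dpctl.utils import get_coerced_usm_type

# "device" is expected to take precedence over "shared" and "host"
print(get_coerced_usm_type(["device", "shared", "host"]))  # "device"
print(get_coerced_usm_type(["shared", "host"]))            # "shared"
print(get_coerced_usm_type(["host", "host"]))              # "host"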
- dpctl.utils.validate_usm_type(usm_type, allow_none=True)
Raises an exception if usm_type is invalid.
- Parameters:
usm_type – Specification for USM allocation type. Valid specifications are: "device", "shared", "host". If the allow_none keyword argument is set, a value of None is also permitted.
allow_none (bool, optional) – Whether a usm_type value of None is considered valid. Default: True.
- Raises:
ValueError – if usm_type is not a recognized string.
TypeError – if usm_type is not a string, and usm_type is not None provided allow_none is True.
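A sketch illustrating the behaviors described above:
from dpctl.utils import validate_usm_type

validate_usm_type("device")   # recognized string, returns without raising
validate_usm_type(None)       # permitted, since allow_none defaults to True

try:
    validate_usm_type("pinned")
except ValueError:
    pass  # not one of "device", "shared", "host"

try:
    validate_usm_type(None, allow_none=False)
except TypeError:
    pass  # None is rejected when allow_none is False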
- dpctl.utils.onetrace_enabled()
Enable onetrace collection for kernels executed in this context.
Note
Proper working of this utility assumes that the Python interpreter has been launched by the onetrace or unitrace tool from the intel/pti-gpu project.
- Example:
Launch the Python interpreter using the onetrace tool:
$ onetrace --conditional-collection -v -t --demangle python app.py
Now using the context manager in the Python session enables data collection and its output for every offloaded kernel:
import dpctl.tensor as dpt
from dpctl.utils import onetrace_enabled

# onetrace output reporting on execution of the kernel
# should be seen, starting with "Device Timeline"
with onetrace_enabled():
    a = dpt.arange(100, dtype='int16')
Sample output:
>>> with onetrace_enabled():
...     a = dpt.arange(100, dtype='int16')
...
Device Timeline (queue: 0x555aee86bed0): dpctl::tensor::kernels::constructors::linear_sequence_step_kernel<short>[SIMD32 {1; 1; 1} {100; 1; 1}]<1.1> [ns] = 44034325658 (append) 44034816544 (submit) 44036296752 (start) 44036305918 (end)
>>>
- dpctl.utils.intel_device_info(sycl_device)
For Intel(R) GPU devices returns a dictionary with device architectural details, and an empty dictionary otherwise. The dictionary contains the following keys:
- device_id:
32-bit device PCI identifier
- gpu_eu_count:
Total number of execution units
- gpu_hw_threads_per_eu:
Number of thread contexts per EU
- gpu_eu_simd_width:
Physical SIMD width of EU
- gpu_slices:
Total number of slices
- gpu_subslices_per_slice:
Number of sub-slices per slice
- gpu_eu_count_per_subslice:
Number of EUs per sub-slice
- max_mem_bandwidth:
Maximum memory bandwidth in bytes/second
- free_memory:
Global memory available on the device in units of bytes
Unsupported descriptors are omitted from the dictionary.
Descriptors other than the PCI identifier are supported only for SyclDevices with the Level-Zero backend.
Note
The environment variable ZES_ENABLE_SYSMAN may need to be set to 1 for the "free_memory" key to be reported.
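A usage sketch (selecting the default device is just one way to obtain a dpctl.SyclDevice):
import dpctl
from dpctl.utils import intel_device_info

dev = dpctl.select_default_device()
info = intel_device_info(dev)
if info:
    # dictionary of architectural details for Intel(R) GPU devices;
    # unsupported descriptors are omitted, so .get() is used here
    print("EU count:", info.get("gpu_eu_count"))
else:
    print("No architectural details reported for", dev.name)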
- exception dpctl.utils.ExecutionPlacementError
Exception raised when the execution placement target cannot be unambiguously determined from input arrays.
Make sure that input arrays are associated with the same dpctl.SyclQueue, or migrate data to the same dpctl.SyclQueue using the dpctl.tensor.usm_ndarray.to_device() method, as in the sketch below.
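A sketch of how the error can arise and one way to resolve it; the assumption here is that dpctl.tensor binary operations raise this exception when the input queues differ:
import dpctl
import dpctl.tensor as dpt
from dpctl.utils import ExecutionPlacementError

# two separately constructed queues do not compare equal
x = dpt.ones(10, sycl_queue=dpctl.SyclQueue())
y = dpt.ones(10, sycl_queue=dpctl.SyclQueue())

try:
    z = x + y
except ExecutionPlacementError:
    # migrate y to the device/queue of x, then retry the operation
    z = x + y.to_device(x.device)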