DLPack exchange of USM allocated arrays

DLPack overview

DLPack is a commonly used C-ABI compatible data structure that allows data exchange between major frameworks. DLPack strives to be minimal, intentionally leaves allocators API and device API out of scope.

Data shared via DLPack are owned by the producer who provides a deleter function stored in the DLManagedTensor, and are only accessed by consumer. Python semantics of using the structure is explained in dlpack docs.

DLPack specifies data location in memory via void * data field of DLTensor struct, and via DLDevice device field. The DLDevice struct has two members: an enumeration device_type and an integer device_id.

DLPack recognizes enumeration value DLDeviceType::kDLOneAPI reserved for sharing SYCL USM allocations. It is not kDLSycl since importing USM-allocated tensor with this device type relies on oneAPI SYCL extensions sycl_ext_oneapi_filter_selector and sycl_ext_oneapi_default_platform_context to operate.

Exporting USM allocation to DLPack

When sharing USM allocation (of any sycl::usm::kind) with void * ptr bound to sycl::context ctx:

Protocol for exporting USM allocation as DLPack
// Input: void *ptr:
//             USM allocation pointer
//        sycl::context ctx:
//             context the pointer is bound to

// Get device where allocation was originally made
// Keep in mind, the device may be a sub-device
const sycl::device &ptr_dev = sycl::get_pointer_device(ptr, ctx);

#if SYCL_EXT_ONEAPI_DEFAULT_CONTEXT
const sycl::context &default_ctx = ptr_dev.get_platform().ext_oneapi_get_default_context();
#else
static_assert(false, "ext_oneapi_default_context extension is required");
#endif

// Assert that ctx is the default platform context, or throw
if (ctx != default_ctx) {
    throw pybind11::type_error(
        "Can not export USM allocations not "
        "bound to default platform context."
    );
}

// Find parent root device if ptr_dev is a sub-device
const sycl::device &parent_root_device = get_parent_root_device(ptr_dev);

// find position of parent_root_device in sycl::get_devices
const auto &all_root_devs = sycl::device::get_devices();
auto beg = std::begin(all_root_devs);
auto end = std::end(all_root_devs);
auto selectot_fn = [parent_root_device](const sycl::device &root_d) -> bool {
    return parent_root_device == root_d;
};
auto pos = find_if(beg, end, selector_fn);

if (pos == end) {
    throw pybind11::type_error("Could not produce DLPack: failed finding device_id");
}
std::ptrdiff_t dev_idx = std::distance(beg, pos);

// check that dev_idx can fit into int32_t if needed
int32_t device_id = static_cast<int32_t>(dev_idx);

// populate DLTensor with DLDeviceType::kDLOneAPI and computed device_id

Importing DLPack with device_type == kDLOneAPI

Protocol for recognizing DLPack as a valid USM allocation
// Input: ptr = dlm_tensor->dl_tensor.data
//        device_id = dlm_tensor->dl_tensor.device.device_id

// Get root_device from device_id
const auto &device_vector = sycl::get_device();
const sycl::device &root_device = device_vector.at(device_id);

// Check if the backend of the device is supported by consumer
//    Perhaps for certain backends (CUDA, hip, etc.) we should dispatch
//    different dlpack importers

// alternatively
// sycl::device root_device = sycl::device(
//       sycl::ext::oneapi::filter_selector{ std::to_string(device_id)}
// );

// Get default platform context
#if SYCL_EXT_ONEAPI_DEFAULT_CONTEXT
const sycl::context &default_ctx = root_device.get_platform().ext_oneapi_get_default_context();
#else
static_assert(false, "ext_oneapi_default_context extension is required");
#endif

// Check that pointer is known in the context
const sycl::usm::kind &alloc_type = sycl::get_pointer_type(ptr, ctx);

if (alloc_type == sycl::usm::kind::unknown) {
    throw pybind11::type_error(
        "Data pointer in DLPack is not bound to the "
        "default platform context of specified device"
    );
}

// Perform check that USM allocation type is supported by consumer if needed

// Get sycl::device where the data was allocated
const sycl::device &ptr_dev = sycl::get_pointer_device(ptr, ctx);

// Create object of consumer's library from ptr, ptr_dev, ctx

Support of DLPack with kDLOneAPI device type

dpctl supports DLPack v0.8. Exchange of USM allocations made using Level-Zero backend is supported with torch.Tensor(device='xpu') for PyTorch when using intel-extension-for-pytorch, as well as for TensorFlow when intel-extension-for-tensorflow is used.