dpctl.memory

Data Parallel Control Memory

dpctl.memory provides Python objects that represent untyped containers of bytes in USM memory, one for each kind of USM pointer: shared, device, and host.

Shared and host pointers are accessible from both the host and a device, while device pointers are accessible only from the device.

Python objects corresponding to shared and host pointers implement the Python buffer protocol. These objects can therefore be used to manipulate USM memory through NumPy arrays or through bytearray, memoryview, or array.array objects.
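The following minimal sketch illustrates what buffer-protocol access means in practice. The helper fill_ramp is hypothetical (it is not part of dpctl) and writes through a memoryview into any writable buffer exporter. A bytearray serves as a stand-in so the sketch runs without a SYCL device; the guarded part assumes dpctl is installed and a default device is available, and is skipped otherwise.

```python
def fill_ramp(buf):
    """Write 0, 1, 2, ... (mod 256) into any writable buffer-protocol object."""
    view = memoryview(buf).cast("B")  # flat view of unsigned bytes, no copy
    for i in range(len(view)):
        view[i] = i % 256


# Stand-in exporter: a plain bytearray behaves like any other writable buffer.
host_stand_in = bytearray(8)
fill_ramp(host_stand_in)
ramp = bytes(host_stand_in)

# Assumption: dpctl is installed and a default SYCL device can be selected.
# If either is missing, this part is skipped and usm_ramp stays None.
try:
    import dpctl.memory as dpmem

    usm = dpmem.MemoryUSMShared(8)  # shared USM exposes the buffer protocol
    fill_ramp(usm)                  # same helper, now writing into USM memory
    usm_ramp = usm.tobytes()
except Exception:
    usm_ramp = None
```

Because device pointers are not accessible from the host, the same helper is not expected to work on a MemoryUSMDevice instance; the copy methods described below are the route for that case.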

Classes

class dpctl.memory.MemoryUSMDevice

MemoryUSMDevice(nbytes, alignment=0, queue=None, copy=False) allocates nbytes of USM device memory.

If alignment is not positive, it is ignored and malloc_device is used instead. If queue=None, dpctl.SyclQueue() is used to allocate memory.

MemoryUSMDevice(usm_obj) creates an instance from usm_obj, which is expected to implement the __sycl_usm_array_interface__ protocol and to expose a contiguous block of USM memory of device type. Use copy=True to perform a copy if the USM type is other than ‘device’.
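As a hedged illustration of the two constructor forms, the sketch below allocates device memory by size and then builds a second device allocation from a shared one using copy=True. The function name device_allocation_demo is hypothetical, and the function returns None when dpctl is not installed or no queue can be created.

```python
def device_allocation_demo(nbytes=64):
    """Sketch of both MemoryUSMDevice constructor forms.

    Returns None if dpctl or a usable SYCL device is unavailable.
    """
    try:
        import dpctl
        import dpctl.memory as dpmem

        queue = dpctl.SyclQueue()  # default-selected device
    except Exception:
        return None

    # Form 1: allocate nbytes of device memory on the given queue.
    fresh = dpmem.MemoryUSMDevice(nbytes, queue=queue)

    # Form 2: start from an object exposing __sycl_usm_array_interface__.
    # The source here is shared memory, so copy=True requests a device copy.
    shared = dpmem.MemoryUSMShared(nbytes, queue=queue)
    copied = dpmem.MemoryUSMDevice(shared, copy=True)

    return fresh.get_usm_type(), copied.get_usm_type(), copied.nbytes


demo = device_allocation_demo()
```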

copy_from_device()

Copy SYCL memory underlying the argument object into the memory of the instance

copy_from_host()

Copy content of Python buffer provided by obj to instance memory.

copy_to_host()

Copy content of instance’s memory into memory of obj, or allocate a NumPy array if obj is None
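The three copy methods can be combined into a round trip, sketched below under the same assumptions (dpctl installed, a default SYCL device available). The function name roundtrip_through_device is hypothetical, and the payload is assumed to be non-empty.

```python
def roundtrip_through_device(payload):
    """Copy host bytes to device USM and back again.

    Returns None if dpctl or a usable SYCL device is unavailable.
    """
    try:
        import dpctl.memory as dpmem

        device_mem = dpmem.MemoryUSMDevice(len(payload))
        shared_mem = dpmem.MemoryUSMShared(len(payload))
    except Exception:
        return None

    # Host buffer -> device allocation.
    device_mem.copy_from_host(payload)

    # Device allocation -> caller-provided writable host buffer.
    host_out = bytearray(len(payload))
    device_mem.copy_to_host(host_out)

    # Device allocation -> another USM allocation.
    shared_mem.copy_from_device(device_mem)

    return bytes(host_out), device_mem.tobytes(), shared_mem.tobytes()


result = roundtrip_through_device(b"usm-bytes")
```

If the round trip ran, each element of result should equal the original payload.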

get_usm_type()
nbytes
reference_obj
size
tobytes()

Constructs bytes object populated with copy of USM memory

class dpctl.memory.MemoryUSMHost

MemoryUSMHost(nbytes, alignment=0, queue=None, copy=False) allocates nbytes of USM host memory.

If alignment is not positive, it is ignored and malloc_host is used instead. If queue=None, dpctl.SyclQueue() is used to allocate memory.

MemoryUSMHost(usm_obj) creates an instance from usm_obj, which is expected to implement the __sycl_usm_array_interface__ protocol and to expose a contiguous block of USM memory of host type. Use copy=True to perform a copy if the USM type is other than ‘host’.

copy_from_device()

Copy SYCL memory underlying the argument object into the memory of the instance

copy_from_host()

Copy content of Python buffer provided by obj to instance memory.

copy_to_host()

Copy content of instance’s memory into memory of obj, or allocate a NumPy array if obj is None

get_usm_type()
nbytes
reference_obj
size
tobytes()

Constructs bytes object populated with copy of USM memory

class dpctl.memory.MemoryUSMShared

MemoryUSMShared(nbytes, alignment=0, queue=None, copy=False) allocates nbytes of USM shared memory.

If alignment is not positive, it is ignored and malloc_shared is used instead. If queue=None, dpctl.SyclQueue() is used to allocate memory.

MemoryUSMShared(usm_obj) creates an instance from usm_obj, which is expected to implement the __sycl_usm_array_interface__ protocol and to expose a contiguous block of USM memory of shared type. Use copy=True to perform a copy if the USM type is other than ‘shared’.

copy_from_device()

Copy SYCL memory underlying the argument object into the memory of the instance

copy_from_host()

Copy content of Python buffer provided by obj to instance memory.

copy_to_host()

Copy content of instance’s memory into memory of obj, or allocate a NumPy array if obj is None

get_usm_type()
nbytes
reference_obj
size
tobytes()

Constructs bytes object populated with copy of USM memory

Comparison with RAPIDS Memory Manager (RMM)

RMM implements DeviceBuffer, a Cython native class that wraps a C++ class called device_buffer. That class is similar to std::vector<unsigned char, custom_cuda_allocator>, where the custom allocator calls the resource manager.

DeviceBuffer stores a unique pointer to an instance of this class. DeviceBuffer implements __cuda_array_interface__. Direct constructors always allocate new memory and copy provided inputs into the newly allocated array.

Zero-copy construction is possible from a unique_ptr<device_buffer>, with the ownership being moved to the Cython extension instance.

DeviceBuffer provides a __reduce__ method to support pickling (which works by copying the content of the device buffer to the host) and provides the following set of routines, among others:

  • copy_to_host(host_buf_obj) to copy content of the underlying device_buffer to a host buffer

  • copy_from_host(host_buf_obj) to copy content of the host buffer into memory of underlying device_buffer

  • copy_from_device(cuda_ary_obj) to copy device memory underlying cuda_ary_obj Python object implementing __cuda_array_interface__ to the memory underlying DeviceBuffer instance.

RMM’s methods are declared nogil.