Devices¶

PyTorch can create tensors, modules and optimizers on CPU or GPU devices, or move them there after creation. Internally, PyTorch supports a wider set of devices than is generally used from the Python or C++ interface; as of version 2.0.1, the main device choices are CPU or Nvidia GPUs with compute capability >= 3.7.

Recent versions of PyTorch also support MPS (Metal Performance Shaders), with a subset of the operations and datatypes that are implemented for CPU and CUDA devices. Progress on MPS is detailed in this tracking issue.

The k interface attempts to discover available devices during initialization; the available device(s) mapped to their random seeds can be displayed via the help() function.

q)help`device  /macbook with M2 chip
cpu  | 6921113472458783870
mps  | -5399364027288748321
mps:0| -5399364027288748321

q)help`device  /linux with 2 nvidia gpus
cpu   | 6755511332966579566
cuda  | 6868070473537856
cuda:0| 6868070473537856
cuda:1| 7372682086670203

q)seed 123

q)help`device
cpu   | 123
cuda  | 123
cuda:0| 123
cuda:1| 123

CUDA devices can be specified with an optional device index.

Specifying `cuda without a device index implies the default CUDA device: typically `cuda:0, but `cuda:1 if the default CUDA device has been switched to the second GPU.

From a k session, the following functions deal with CPU and CUDA devices:

device - query the device for the session or allocated object, e.g. tensor, vector, module, etc.
cudadevice - query or set the default CUDA device, if any are available.
cudadevices - query the count or names of the available CUDA devices.
to - move previously allocated object to a different device.

Device¶

device() → sym¶

Returns:: Given an empty or null argument, returns `cuda if any CUDA device is available, `mps if an Apple MPS device is found, else `cpu.

device(ptr) → sym

Parameters:: ptr (pointer) – a previously allocated api-pointer to a PyTorch object, e.g. a tensor, module, etc.
Returns:: sym indicating the specific device where object’s memory resides.

q)device() / on machine with 2 GPUs
`cuda

q)t:tensor 1 2 3
q)device t
`cpu

q)to(t;`cuda:1)
q)device t
`cuda:1

q)a:tensor(1 2;`cuda)
q)b:tensor(3 4;`cuda:1)
q)c:tensor(5 6 7;`cpu)
q)d:dict `a`b`c!(a;b;c)

q)device d
a| cuda:0
b| cuda:1
c| cpu
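
The two forms of device() can be combined to allocate on whatever the session reports as its preferred device and then confirm where the object landed. This is a minimal sketch using only the calls shown above; it assumes the symbol returned by device() is accepted as the device argument of tensor(), which the examples above show for `cuda and `cpu but not for `mps.

```q
q)d:device()            / session device: `cuda, `mps or `cpu
q)t:tensor(1 2 3;d)     / assumption: tensor() accepts any symbol returned by device()
q)device t              / specific device where t resides, e.g. with an index for CUDA
```

The last call is left without output because the result depends on the hardware of the host.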

Default CUDA device¶

cudadevice() → sym¶

Returns:: Given an empty or null argument, returns the specific CUDA device that is used when the generic symbol `cuda is specified.

cudadevice(device) → null

Parameters:: device (sym) – a specific cuda device with index specified, e.g. `cuda:0
Returns:: null

q)cudadevice()
`cuda:0

q)cudadevice`cuda
'unrecognized CUDA device, expecting cuda with valid device number, e.g. `cuda:0
  [0]  cudadevice`cuda
       ^

q)cudadevice `cuda:1

q)t:tensor(1 2 3;`cuda)
q)device t
`cuda:1
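
Because cudadevice() both queries and sets the default, the current default can be saved and restored around a block of work. The sketch below assumes a host with at least two GPUs; the variable name old is only illustrative.

```q
q)old:cudadevice()          / remember the current default, e.g. `cuda:0
q)cudadevice`cuda:1         / make the second GPU the default
q)t:tensor(1 2 3;`cuda)     / generic `cuda now resolves to `cuda:1
q)cudadevice old            / restore the previous default
q)device t                  / t is expected to remain on the device where it was created
```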

Available CUDA devices¶

cudadevices() → syms¶

cudadevices(::) → long

Returns:: Given an empty list, returns a list of symbols naming the available CUDA devices, both generic and specific. Given a null argument, returns the number of CUDA devices.

q)cudadevices[]     / on host with 2 GPUs
2

q)cudadevices()
`cuda`cuda:0`cuda:1
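
The list form can drive per-device allocation. The following is a sketch only, assuming at least one CUDA device is present and that the generic `cuda symbol is always listed first, as in the output above.

```q
q)n:cudadevices[]                  / number of CUDA devices
q)g:1_cudadevices()                / assumption: drop the leading generic `cuda, keep `cuda:0, `cuda:1, ..
q)v:{tensor(1 2 3;x)}each g        / one tensor allocated on each specific device
q)device each v                    / one device symbol per tensor
```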

Moving to device¶

Once a PyTorch object is allocated on a device, it can be moved with to(). The typical case is to create a tensor or module on the host, then move it to a CUDA device. This k interface function is designed to behave somewhat like PyTorch’s tensor.to() and module.to() methods.

to(ptr;options) → null¶

to(ptr;options;async) → null

Parameters:

ptr (ptr) – a previously allocated api-pointer to a tensor, vector, dictionary or module.
options (sym) – one or more symbols for device, data type and other tensor attributes.
async (bool) – asynchronous flag, default is false. If true, will attempt to perform host to CUDA device transfer without blocking.

Returns:

Null return, supplied pointer now has specified data type, memory, device, etc.

An alternate form uses an example tensor instead of specified options to define the target device and data type.

to(ptr;example) → null

to(ptr;example;async) → null

Parameters:

ptr (tensor) – a previously allocated api-pointer to a tensor.
example (tensor) – an api-pointer to a previously allocated tensor whose device and datatype will be used.
async (bool) – asynchronous flag, default is false. If true, will attempt to perform host to CUDA device transfer without blocking.

Returns:

null, supplied pointer now has same device and data type as given example tensor.

q)a:options t:tensor 1 2 3    / create tensor of longs on cpu
q)ptr t                       / get internal PyTorch shared pointer to tensor
60520816

q)to(t;`cuda`double`grad)     / convert to CUDA tensor on default GPU, type double
q)ptr t                       / new internal pointer, k interface ptr is unchanged
1814122272

q)(a;options t)               / compare options to verify the change
device dtype  layout  gradient pin      memory
--------------------------------------------------
cpu    long   strided nograd   unpinned contiguous
cuda:0 double strided grad     unpinned contiguous

q)to(t;`cuda`double`grad)     / call to() again
q)ptr t                       / same internal ptr -- tensor attributes unchanged
1814122272

q)e:tensor()  / empty tensor
q)to(e;t)     / use t as an example tensor

q)options e
device  | cuda:0     / device changed
dtype   | double     / data type changed
layout  | strided
gradient| nograd     / gradient unset (only device & dtype from example tensor)
pin     | unpinned
memory  | contiguous
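
The examples above use the two-argument form. A sketch of the three-argument form with the async flag follows, assuming at least one CUDA device; pairing it with sync() (described under Synchronize) is one way to make sure the transfer has finished before the result is relied on.

```q
q)t:tensor(1 2 3e;`cpu)     / tensor of reals on the host
q)to(t;`cuda;1b)            / request a non-blocking transfer to the default CUDA device
q)sync`cuda                 / wait for outstanding work on that device
q)device t                  / specific CUDA device now holding t
```

In PyTorch itself, a non-blocking host-to-device copy generally only overlaps with other work when the host memory is pinned; whether the same holds through this interface is an assumption.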

Copy to device¶

For tensors only, copyto() makes a copy of the current tensor with a new data type and/or new device and other given characteristics (this is somewhat equivalent to PyTorch’s tensor.to method with copy=True).

copyto(ptr;options) → new tensor pointer¶

copyto(ptr;options;async) → new tensor pointer

Parameters:

ptr (ptr) – a previously allocated api-pointer to a tensor.
options (sym) – one or more symbols for device, data type and other tensor attributes.
async (bool) – asynchronous flag, default is false. If true, will attempt to perform host to CUDA device transfer without blocking.

Returns:

An api-pointer to the new tensor.

An alternate form uses an example tensor instead of specified options to define the target device and data type.

copyto(ptr;example) → ptr

copyto(ptr;example;async) → ptr

Parameters:

ptr (pointer) – a previously allocated api-pointer to a tensor.
example (pointer) – an api-pointer to a previously allocated tensor whose device and datatype will be used to create the new copy of the input tensor.
async (bool) – asynchronous flag, default is false. If true, will attempt to perform host to CUDA device transfer without blocking.

Returns:

An api-pointer to the new tensor.

q)t:tensor 1 2 3 4#til 24

q)r:copyto(t; `cuda`float`channel2d`grad)

q)options each(t;r)
device dtype layout  gradient pin      memory
-------------------------------------------------
cpu    long  strided nograd   unpinned contiguous
cuda:0 float strided grad     unpinned channel2d
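
The example-tensor form of copyto() is not shown above; a minimal sketch, assuming one CUDA device is available, might look like this. The names e and r are illustrative.

```q
q)t:tensor 1 2 3              / longs on cpu
q)e:tensor(0.5 1.5;`cuda)     / doubles on the default CUDA device
q)r:copyto(t;e)               / new tensor taking device and data type from e
q)options each(t;r)           / t should be unchanged; r should report e's device and dtype
```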

Synchronize¶

PyTorch provides a synchronize call to wait for all kernels in all streams on a CUDA device to complete.

The k api function sync() provides a similar capability:

sync() → null¶

sync(device) → null

sync(index) → null

Parameters:

device (symbol) – A valid CUDA device, e.g. `cuda or `cuda:1
index (long) – A valid CUDA device index, e.g. 1 for `cuda:1

Returns:

Waits for all streams to complete on given device/device index. If null or empty argument, uses default CUDA device if available. If a valid but non-CUDA device supplied, no action is taken. Returns null.

In addition to the default device or a device name/index, it is also possible to pass a tensor or collection of tensors, a module or a model as the argument to sync(). For a collection of tensors, the first CUDA device found among the tensors is used for the synchronization. For a module or model, the first parameter stored on a CUDA device provides the device for the synchronize step.

sync(tensor) → null

sync(vector) → null

sync(dictionary) → null

sync(module) → null

sync(model) → null

q)sync()
q)sync`cuda
q)sync`cuda:1
q)sync 1

q)t:tensor(1 2 3e;`cuda:1)
q)sync t

q)m:module enlist(`linear;64;10)
q)to(m;`cuda)
q)sync m
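
The collection forms are listed above without an example. The sketch below builds a dictionary with one host tensor and one CUDA tensor, assuming at least one CUDA device; according to the description above, sync() should then use the first CUDA device found among the tensors.

```q
q)a:tensor(1 2;`cpu)
q)b:tensor(3 4;`cuda)
q)d:dict`a`b!(a;b)
q)sync d        / expected to synchronize on the device of b, the only CUDA tensor
```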
