Devices

PyTorch can create or move tensors, modules and optimizers onto CPU or GPU devices. Internally, PyTorch supports a wider set of devices than are generally used from the Python or C++ interface; as of version 2.0.1, the main device choices are CPU or NVIDIA GPUs with compute capability >= 3.7.

Recent versions of PyTorch also support MPS (Metal Performance Shaders), with a subset of the operations and datatypes that are implemented for CPU and CUDA devices. Progress on MPS is detailed in this tracking issue.

The k interface attempts to discover available devices during initialization; the available devices, mapped to their random seeds, can be displayed via the help() function.

q)help`device  /macbook with M2 chip
cpu  | 6921113472458783870
mps  | -5399364027288748321
mps:0| -5399364027288748321

q)help`device  /linux with 2 nvidia gpus
cpu   | 6755511332966579566
cuda  | 6868070473537856
cuda:0| 6868070473537856
cuda:1| 7372682086670203

q)seed 123

q)help`device
cpu   | 123
cuda  | 123
cuda:0| 123
cuda:1| 123

CUDA devices can be specified with an optional device index.

Specifying `cuda without a device index implies the default CUDA device. This is typically `cuda:0, but would mean `cuda:1 if the default CUDA device were switched to the second GPU.

From a k session, the following functions deal with CPU and CUDA devices:

  • device - query the device for the session or allocated object, e.g. tensor, vector, module, etc.

  • cudadevice - query or set the default CUDA device if any available.

  • cudadevices - query the count or names of the available CUDA devices.

  • to - move previously allocated object to a different device.

Device

device() sym
Given an empty or null argument, returns `cuda if any CUDA devices are available, `mps if an Apple MPS device is found, else `cpu.
device(ptr) sym
Parameters:

ptr (pointer) – a previously allocated api-pointer to a PyTorch object, e.g. a tensor, module, etc.

Returns:

sym indicating the specific device where object’s memory resides.

q)device() / on machine with 2 GPUs
`cuda

q)t:tensor 1 2 3
q)device t
`cpu

q)to(t;`cuda:1)
q)device t
`cuda:1

q)a:tensor(1 2;`cuda)
q)b:tensor(3 4;`cuda:1)
q)c:tensor(5 6 7;`cpu)
q)d:dict `a`b`c!(a;b;c)

q)device d
a| cuda:0
b| cuda:1
c| cpu
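
For scripts that should run unchanged on hosts with or without a GPU, one option is to ask device() for the session default and allocate there. The sketch below assumes tensor() accepts the `mps or `cpu symbol returned by device() in the same way it accepts `cuda in the examples above; outputs are omitted because they depend on the host.

```q
q)d:device()              / `cuda, `mps or `cpu, depending on the host
q)t:tensor(1 2 3e;d)      / allocate directly on that device (assumes d is accepted here)
q)device t                / reports the specific device, e.g. `cuda:0 rather than `cuda
```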

Default CUDA device

cudadevice() sym
Returns:

Given an empty or null argument, returns the specific CUDA device that is used when the generic symbol `cuda is specified.

cudadevice(device) null
Parameters:

device (sym) – a specific cuda device with index specified, e.g. `cuda:0

Returns:

null

q)cudadevice()
`cuda:0

q)cudadevice`cuda
'unrecognized CUDA device, expecting cuda with valid device number, e.g. `cuda:0
  [0]  cudadevice`cuda
       ^

q)cudadevice `cuda:1

q)t:tensor(1 2 3;`cuda)
q)device t
`cuda:1
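
A minimal sketch of switching the default temporarily and restoring it afterwards, assuming a host with at least two GPUs. It relies only on cudadevice() returning a specific device such as `cuda:0, which is itself a valid argument for setting the default.

```q
q)d:cudadevice()             / remember the current default, e.g. `cuda:0
q)cudadevice`cuda:1          / make the second GPU the default
q)t:tensor(1 2 3;`cuda)      / generic `cuda now resolves to `cuda:1
q)cudadevice d               / restore the original default
q)device t                   / t is expected to stay where it was allocated
```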

Available CUDA devices

cudadevices() syms
cudadevices(::) long
Given an empty list, the function returns the symbols of the available CUDA devices, both generic and specific. Given a null argument, it returns the number of CUDA devices.

q)cudadevices[]     / on host with 2 GPUs
2

q)cudadevices()
`cuda`cuda:0`cuda:1
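
The device list can drive per-GPU allocation. The sketch below assumes the generic `cuda symbol is always the first item of the list, as in the output above, and drops it to leave only the specific devices. The behaviour on a host without GPUs is not shown in this section, so the example assumes at least one CUDA device.

```q
q)g:1_cudadevices()                  / specific devices only, e.g. `cuda:0`cuda:1
q)v:{tensor(1 2 3e;x)}each g         / one tensor allocated on each GPU
q)device each v                      / each tensor should report its own device
```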

Moving to device

Once a PyTorch object is established on a device, it can be moved with to(). The typical case is to create a tensor or module on the host, then move it to a CUDA device. This k interface function is designed to behave somewhat like PyTorch’s tensor.to() and module.to() methods.

to(ptr;options) null
to(ptr;options;async) null
Parameters:
  • ptr (ptr) – a previously allocated api-pointer to a tensor, vector, dictionary or module.

  • options (sym) – one or more symbols for device, data type and other tensor attributes.

  • async (bool) – asynchronous flag, default is false. If true, will attempt to perform host to CUDA device transfer without blocking.

Returns:

null. The supplied pointer now has the specified data type, memory, device, etc.

An alternate form uses an example tensor instead of specified options to define the target device and data type.

to(ptr;example) null
to(ptr;example;async) null
Parameters:
  • ptr (tensor) – a previously allocated api-pointer to a tensor.

  • example (tensor) – an api-pointer to a previously allocated tensor whose device and datatype will be used.

  • async (bool) – asynchronous flag, default is false. If true, will attempt to perform host to CUDA device transfer without blocking.

Returns:

null, supplied pointer now has same device and data type as given example tensor.

q)a:options t:tensor 1 2 3    / create tensor of longs on cpu
q)ptr t                       / get internal PyTorch shared pointer to tensor
60520816

q)to(t;`cuda`double`grad)     / convert to CUDA tensor on default GPU, type double
q)ptr t                       / new internal pointer, k interface ptr is unchanged
1814122272

q)(a;options t)               / compare options to verify the change
device dtype  layout  gradient pin      memory
--------------------------------------------------
cpu    long   strided nograd   unpinned contiguous
cuda:0 double strided grad     unpinned contiguous

q)to(t;`cuda`double`grad)     / call to() again
q)ptr t                       / same internal ptr -- tensor attributes unchanged
1814122272

q)e:tensor()  / empty tensor
q)to(e;t)     / use t as an example tensor

q)options e
device  | cuda:0     / device changed
dtype   | double     / data type changed
layout  | strided
gradient| nograd     / gradient unset (only device & dtype from example tensor)
pin     | unpinned
memory  | contiguous
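
The examples above use the blocking form. A minimal sketch of the three-argument form follows, pairing the asynchronous flag with sync() so that later steps do not depend on a transfer that may still be in flight. In PyTorch, a non-blocking host-to-device copy can generally only overlap with host work when the source tensor is in pinned memory; how to request pinned memory through this interface is not covered in this section, so the flag here is best read as a request rather than a guarantee.

```q
q)t:tensor 1 2 3e         / tensor of reals on the host
q)to(t;`cuda;1b)          / third argument 1b requests a non-blocking transfer
q)sync t                  / wait for the CUDA device holding t to finish
q)device t                / default CUDA device, e.g. `cuda:0
```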

Copy to device

For tensors only, copyto() makes a copy of the current tensor with a new data type and/or new device and other given characteristics (this is somewhat equivalent to PyTorch’s tensor.to method with copy=True).

copyto(ptr;options) new tensor pointer
copyto(ptr;options;async) new tensor pointer
Parameters:
  • ptr (ptr) – a previously allocated api-pointer to a tensor.

  • options (sym) – one or more symbols for device, data type and other tensor attributes.

  • async (bool) – asynchronous flag, default is false. If true, will attempt to perform host to CUDA device transfer without blocking.

Returns:

An api-pointer to the new tensor.

An alternate form uses an example tensor instead of specified options to define the target device and data type.

copyto(ptr;example) ptr
copyto(ptr;example;async) ptr
Parameters:
  • ptr (pointer) – a previously allocated api-pointer to a tensor.

  • example (pointer) – an api-pointer to a previously allocated tensor whose device and datatype will be used to create the new copy of the input tensor.

  • async (bool) – asynchronous flag, default is false. If true, will attempt to perform host to CUDA device transfer without blocking.

Returns:

An api-pointer to the new tensor.

q)t:tensor 1 2 3 4#til 24

q)r:copyto(t; `cuda`float`channel2d`grad)

q)options each(t;r)
device dtype layout  gradient pin      memory
-------------------------------------------------
cpu    long  strided nograd   unpinned contiguous
cuda:0 float strided grad     unpinned channel2d
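
The example-tensor form of copyto() is not exercised above. A minimal sketch, using only the signature copyto(ptr;example) given earlier: the variable names are arbitrary, and the expectation that the source tensor is left unchanged follows from the description of copyto() as making a copy.

```q
q)t:tensor 1 2 3                  / long tensor on cpu
q)e:tensor(0 0e;`cuda)            / example tensor: reals on the default GPU
q)r:copyto(t;e)                   / new tensor taking device and data type from e
q)options each(t;r)               / compare: t should be unchanged, r should follow e
```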

Synchronize

PyTorch provides a synchronize call to wait for all kernels in all streams on a CUDA device to complete.

The k api function sync() provides a similar capability:

sync() null
sync(device) null
sync(index) null
Parameters:
  • device (symbol) – A valid CUDA device, e.g. `cuda or `cuda:1

  • index (long) – A valid CUDA device index, e.g. 1 for `cuda:1

Returns:

Waits for all streams to complete on given device/device index. If null or empty argument, uses default CUDA device if available. If a valid but non-CUDA device supplied, no action is taken. Returns null.

In addition to using the default device or a device name/index, it is also possible to specify a tensor or collection of tensors, a module or a model as an argument to sync(). For a collection of tensors, the first CUDA device found for the tensors is used for the synchronization. For a module or model, the first parameter stored on a CUDA device is used to provide the device for the synchronize step.

sync(tensor) null
sync(vector) null
sync(dictionary) null
sync(module) null
sync(model) null
q)sync()
q)sync`cuda
q)sync`cuda:1
q)sync 1

q)t:tensor(1 2 3e;`cuda:1)
q)sync t

q)m:module enlist(`linear;64;10)
q)to(m;`cuda)
q)sync m
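
The collection forms can be sketched the same way, reusing the dict() construction from the device examples earlier. Per the description above, the first CUDA device found among the tensors determines which device is synchronized, so the cpu tensor in this dictionary is expected to be skipped.

```q
q)a:tensor(1 2;`cuda)        / on the default CUDA device
q)c:tensor(5 6 7;`cpu)       / on the host
q)d:dict `a`c!(a;c)
q)sync d                     / synchronizes the first CUDA device found, here that of a
```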
