Initializing parameters
When modules are created, their parameters are given initial values. These default initializations can be overwritten using other probability distributions or heuristics that make for more stable training or quicker convergence. The k api implements the PyTorch initialization routines and some of the probability distributions used to reset initial parameter values.
zeros: reset tensor to zeros.
ones: reset tensor to ones.
fill: fill tensor with a single value.
eye: set 2d tensor to the identity matrix.
dirac: fill 3d, 4d or 5d tensor with the Dirac delta function.
orthogonal: fill tensor with a semi-orthogonal matrix.
knormal: Kaiming initialization using a normal distribution.
kuniform: Kaiming initialization using a uniform distribution.
snormal: fill 2d matrix as sparse, with the non-zero elements drawn from a normal distribution.
xnormal: Xavier initialization using a normal distribution.
xuniform: Xavier initialization using a uniform distribution.
The normal & uniform probability distributions belong to the module initialization group and are also part of a broader group of distributions:
normal: fills tensor with values from the normal distribution, with optional mean & standard deviation.
uniform: fills tensor with values drawn from the uniform distribution with optional lower & upper bounds.
Additional distributions implemented in the k-api:
cauchy: samples from a Cauchy (Lorentz) distribution given median and half-width.
exponential: samples from an exponential distribution parameterized by rate.
geometric: samples from a geometric distribution given probability of success of Bernoulli trials.
lognormal: samples from a log-normal distribution given mean & standard deviation of log of the distribution.
random: fills tensor with numbers sampled from the discrete uniform distribution with optional low & high limits.
Utility to calculate the recommended gain() value for given nonlinearity:
gain: given non-linearity name, e.g. `relu, and optional parameter, returns factor used to scale standard deviation.
Note
The above initialization routines reset existing tensors with gradient calculations disabled.
Syntax
The initialization functions have a common syntax and accept a tensor or a collection of tensors, along with other arguments that typically override mean, standard deviation or some other property of the underlying distribution:
- fn(tensors) null
- fn(tensors; options..) null
- Parameters:
tensors (pointer) – an api-pointer to an existing tensor or vector/dictionary of tensors.
options (scalar) – typically numeric scalars, with symbol scalars used for Kaiming initialization.
- Returns:
The input tensor or collection of tensors is modified in place, null return.
q)t:tensor 3 4#0e
q)normal(t;2;.5)
q)tensor t
1.990077 1.99109 2.886662 1.551937
2.027538 1.058909 1.429988 1.673985
2.082229 3.062976 2.641823 2.190108
q)d:dict `a`b!(10#0e;4 2#0)
q)random(d;0;10)
q)dict d
a| 3 5 9 6 9 9 3 9 2 1e
b| (7 8;3 4;0 1;6 5)
q)v:vector(1 2 3e; 4 5#0e)
q)uniform(v)
q)vector v
0.02381 0.2233 0.7237e
(0.3632 0.4588 0.7514 0.4138 0.4747e;0.5428 0.1709 0.5703 0.04745 0.5007e;0.6..
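One way to confirm that the options took effect is to compare sample statistics with the requested values. The sketch below assumes the library is loaded and that a one-dimensional tensor behaves like the 3x4 example above; the size of 10000 is an arbitrary choice, and avg and dev are the built-in q mean and standard deviation.

```q
t:tensor 10000#0e        / larger tensor so that sample statistics are meaningful
normal(t;2;.5)           / same call as above: mean 2, standard deviation 0.5
x:tensor t               / retrieve the values as a k array
(avg x;dev x)            / expect values close to 2 and 0.5
```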
Tensor/vector indices
Numeric indices (longs) can be used with a non-scalar tensor or a vector of tensors:
- fn(tensors; indices) null
- fn(tensors; indices; options..) null
- Parameters:
tensors (pointer) – an api-pointer to an existing tensor or vector of tensors.
indices (long) – a single index or set of indices into the first dimension of a tensor or a vector of tensors.
options (scalar) – typically numeric scalars, with symbol scalars used for Kaiming initialization.
- Returns:
The input tensor or vector of tensors is modified in place using the given indices, null return.
q)t:tensor 3 4#0e
q)normal(t;0 2) / normal(0,1) distribution for first & final rows
q)tensor t
0.6594 -0.5249 1.596 -0.19
0 0 0 0
0.5576 0.6255 -0.2015 -0.6794
Using indices with a vector of tensors:
q)v:vector(98 99 100;10#0)
q)random(v;1;0;10) / random integers over interval [0,10)
q)vector v
98 99 100
3 5 5 0 7 5 2 7 5 6
Note
There is some ambiguity in an argument list with a single index or an index and partially specified distribution options: the initial scalar(s) are interpreted as distribution options unless given as a 1-element list.
q)v:vector(98 99 100.0; 10#.0; 5#.0)
q)random(v;2;5) / scalars 2 & 5 are used as the range for the random sample
q)vector v
4 2 4f
3 4 4 2 2 3 2 2 3 4f
3 2 2 2 2f
q)v:vector(98 99 100.0; 10#.0; 5#.0)
q)random(v; 1#2; 5) /index is enlisted to distinguish from option
q)vector v
98 99 100f
0 0 0 0 0 0 0 0 0 0f
2 4 0 2 1f
No ambiguity with a single index if all the distribution options are also specified:
q)v:vector(98 99 100.0; 10#.0; 5#.0)
q)random(v; 2; 0; 5)
q)vector v
98 99 100f
0 0 0 0 0 0 0 0 0 0f
3 3 4 0 3f
Tensor names
Tensor names can be used to index a subset of a dictionary of tensors. Parameter or buffer names must be supplied if a module or model is given as the leading argument:
- fn(tensors; names) null
- fn(tensors; names; options..) null
- Parameters:
tensors (pointer) – an api-pointer to an existing dictionary, module, model or optimizer.
names (symbol) – keys into the given dictionary or names of parameters/buffers in the supplied module.
options (scalar) – typically numeric scalars, with symbol scalars used for Kaiming initialization.
- Returns:
The named tensors, parameters or buffers are modified in place, null return.
q)p:parms m:module enlist(`linear;2;2)
q)dict p
weight| 0.5732 -0.2588 0.4686 0.398
bias | 0.4718 -0.6752
q)normal(p;`weight;0;.01)
q)dict p
weight| -0.01368 0.007652 -0.01319 -0.0006103
bias | 0.4718 -0.6752
q)zeros m / modules require parameter or buffer names
'zeros: not implemented for single module argument
[0] zeros m
^
q)zeros(m;`bias)
q)dict p
weight| -0.01368 0.007652 -0.01319 -0.0006103
bias | 0 0
Note
If a module has both a parameter and a buffer with the same name, only the parameter will be reset. Access to the buffer in this case will have to be via functions buffer() or buffers(), which search only the buffer namespaces.
Kaiming initialization
The Kaiming initialization functions, knormal() and kuniform(), accept up to three options: the name of the non-linearity, the fan mode, and the slope of the rectifier (typically `leakyrelu).
- knormal(tensors) null
- kuniform(tensors) null
- knormal(tensors; nonlinearity; fanmode; slope) null
- kuniform(tensors; nonlinearity; fanmode; slope) null
- Parameters:
tensors (pointer) – an api-pointer to an existing tensor, vector or dictionary of tensors.
nonlinearity (symbol) – name of the non-linear function, e.g. `relu or `leakyrelu, used to calculate standard deviation (normal distribution) or bounds (uniform distribution).
fanmode (symbol) – one of `fanin or `fanout to preserve the magnitude of the variance of the weights in the forward (in) or backwards (out) pass.
slope (double) – the negative slope of the rectifier used after this layer, e.g. for `leakyrelu.
- Returns:
The tensors are modified in place, null return.
Note
The symbol and double scalar options may be given in any order following the initial tensor specification.
q)t:tensor 3 4#0e
q)kuniform(t)
q)tensor t
0.7619 -0.2598 -0.8482 -1.068
0.4993 -0.8645 -0.2984 0.9116
-0.8443 0.6993 0.4329 1.043
q)kuniform(t;`leakyrelu;`fanout)
q)tensor t
-0.2935 0.7956 -1.237 -0.6511
-0.8029 1.043 -1.293 -0.9753
0.3985 0.8391 -0.6392 -0.0994
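For reference, PyTorch documents the Kaiming scaling in terms of the gain and the fan: the uniform variant samples from (-b, b) with b = gain * sqrt(3/fan), and the normal variant uses standard deviation gain / sqrt(fan). The plain q sketch below reproduces that arithmetic for a 3x4 tensor. The names kbound and kstd are hypothetical helpers, not part of the api, and the sketch assumes, as in PyTorch, that the fan-in of a 2d tensor is its column count and the fan-out its row count.

```q
kbound:{[g;fan] g*sqrt 3%fan}     / bound of the uniform variant
kstd:{[g;fan] g%sqrt fan}         / standard deviation of the normal variant
g:sqrt 2%1+.01*.01                / leaky ReLU gain with negative slope 0.01
kbound[g;4]                       / bound using the fan-in of a 3x4 tensor
kbound[g;3]                       / bound using the fan-out of a 3x4 tensor: 1.414143
kstd[g;4]                         / standard deviation using the fan-in
```

Under these assumptions every value drawn by kuniform(t;`leakyrelu;`fanout) for a 3x4 tensor lies within about 1.414 of zero, which is consistent with the sample output above.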
The Kaiming initialization functions may also be used with indices as the 2nd argument:
- knormal(tensors; indices) null
- knormal(tensors; indices; options..) null
- Parameters:
tensors (pointer) – an api-pointer to an existing tensor or vector of tensors; if tensor, indices select on 1st dimension.
indices (long) – the index or indices into the vector or 1st dimension of a given tensor, enlist scalar index to avoid confusion with other numeric argument.
q)t:tensor 2 3 4#0e
q)kuniform(t;1#1)
q)tensor(t;0)
0 0 0 0
0 0 0 0
0 0 0 0
q)tensor(t;1)
-0.8719 -1.073 -1.144 -0.6039
-0.3201 0.1402 -0.8489 -0.6861
0.151 -0.9593 0.02821 -0.191
The Kaiming initialization functions may also be used with parameter/buffer names as the 2nd argument:
- knormal(tensors; names) null
- knormal(tensors; names; options..) null
- Parameters:
tensors (pointer) – an api-pointer to an existing dictionary or module.
names (symbol) – the name or names of dictionary tensors or module parameters/buffers, scalar names can be enlisted to avoid confusion with other scalar symbol arguments.
q)m:module(`sequential; enlist(`linear;`fc;2;2); enlist(`leakyrelu;`fn;.01))
q)p:parms m
q)dict p
fc.weight| 0.211 -0.5037 0.2513 0.03965
fc.bias | 0.08189 -0.04078
q)knormal(m;`fc.weight;`fanout;`relu;.01)
q)dict p
fc.weight| 1.063 0.9703 -0.1206 1.102
fc.bias | 0.08189 -0.04078
q)d:dict`fanout`relu!(2 3#0e;1 4#0e)
q)kuniform(d;`fanout)
q)dict d / both tensors reset, name interpreted as option
fanout| (0.8979 0.1735 1.717e;1.669 -0.468 1.427e)
relu | ,-1.817 1.026 2.206 0.0301e
q)d:dict`fanout`relu!(2 3#0e;1 4#0e)
q)kuniform(d;1#`fanout) / enlist to treat as key
q)dict d
fanout| (-0.2362 1.208 -0.4897e;-0.3338 -0.6536 -0.01446e)
relu | ,0 0 0 0e
Using k arrays
The initialization routines also accept k arrays as input, returning k arrays after the initialization is applied:
- fn(input) output
- fn(input; options..) output
- Parameters:
input (k-array) – a scalar, list or n-dim array
options (scalar) – typically numeric scalars, with symbol scalars used for Kaiming initialization.
- Returns:
An output array of the same shape and type as input, with initialization applied.
q)normal(3 4#0e)
1.144 0.03057 0.9454 -0.3712
-0.8005 0.4368 -0.2662 0.03962
2.15 -0.503 1.133 -0.2594
q)normal(3 4#0e;0;.01)
-0.002944 -0.01436 0.002313 -0.005135
-0.009178 -0.002622 0.01533 -0.01474
-0.002925 0.002028 0.01697 -0.002394
Scalar inputs may require enlisting to distinguish from scalar options:
q)random(0;0;9) /arg is read as a single 3-element list
6317635140054588591 5831672079708576995 3983176133206258450
q)random((1#0);0;9) /enlist value to interpret other args as lower & upper bounds
,3
q)random((1#0);0;9)
,6
q)random(0e;5) / scalar type of real distinguishes input from upper bound
4e
Calculating gain
Return the recommended gain value for the given nonlinearity function; this is the factor used to scale standard deviation.
- gain(nonlinearity) value
- gain(nonlinearity; factor) value
- Parameters:
nonlinearity (symbol) – name of the non-linear function, e.g. `relu, `leakyrelu, `linear, etc.
factor (double) – optional parameter or factor, e.g. negative slope for `leakyrelu.
- Returns:
The recommended gain value (scalar double) for the given nonlinearity function.
q)s:`conv1d`conv2d`conv3d`convtranspose1d`convtranspose2d`convtranspose3d
q)s,:`linear`sigmoid`tanh`relu`leakyrelu
q)s!gain each s
conv1d | 1
conv2d | 1
conv3d | 1
convtranspose1d| 1
convtranspose2d| 1
convtranspose3d| 1
linear | 1
sigmoid | 1
tanh | 1.666667
relu | 1.414214
leakyrelu | 1.414143
q)gain(`leakyrelu;.5)
1.264911
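The leaky ReLU values above agree with the formula PyTorch documents for this nonlinearity, sqrt(2/(1+slope^2)), and the tanh value with 5/3. A minimal check in plain q follows; lgain is a hypothetical helper name and not part of the api.

```q
lgain:{sqrt 2%1+x*x}   / gain for a leaky ReLU with negative slope x
lgain .01              / 1.414143, matches the leakyrelu entry above
lgain .5               / 1.264911, matches gain(`leakyrelu;.5)
lgain 0                / 1.414214, a slope of zero gives the relu gain
5%3                    / 1.666667, the tanh gain
```

The match at slope .01 suggests that this is the default slope applied when gain is called for `leakyrelu without a factor.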