Module: deep_learning
Deep learning nodes.
These nodes can be used to implement deep learning workflows, including training and inference. The main node here is the Deep Model node, which is a simple "one-stop shop" for controlling the training and use of a neural network in a workflow, much like conventional machine learning nodes such as LinearDiscriminantAnalysis. This node receives a network (composed of, among others, Layer nodes and other math operations) and may also receive an optimizer step. However, one may also build deep learning workflows from scratch using the Net nodes, Step nodes, and the Gradient and/or Jacobian nodes (found in the optimization category); a conceptual sketch of this pattern is shown after the list below. The following sets of nodes are provided:
- Nodes ending in Layer: the traditional neural network layers, which are characterized by containing implicit trainable parameters.
- Nodes ending in Initializer: these can be used to initialize the parameters of a layer, but they are less frequently needed in practice, since the initializer can also be specified from a drop-down menu per node (as a string in Python).
- Nodes ending in Norm: normalization stages that can be interspersed between layers. Some have (non-trainable) state, which needs to be explicitly managed when using low-level optimization primitives.
- Nodes starting with Net: the high-level network management nodes, which act on a whole network module (i.e., a set of layers). These are used to define a module, materialize or share it in a larger computational graph, and to obtain initialization and forward-pass functions to perform training.
- Nodes ending in Step: the optimization steps that can be used to train a network. There are two categories: end-to-end optimizer steps such as AdamStep, and partial gradient-processing steps such as CenteringStep.
- Nodes with Core in the name: these pertain to recurrent cores, that is, the portions of networks that receive (part of) their past output as input.
- Nodes ending in Schedule: used to schedule the learning rate and other hyperparameters during training, which are typically annealed.
- Other nodes: stateless (pure math) operations that are frequently used in neural networks, e.g., pooling, activation functions, gradient, and so forth.
Note that many other nodes from other categories, especially any mathematical operation nodes that have a "backend" parameter, can be used in neural nets.
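To make the from-scratch workflow concrete, the following is a minimal sketch in plain JAX/Python of the pattern that the Net, Gradient, Step, and Add nodes implement graphically; the function and variable names are purely illustrative and do not refer to the node API.

```python
import jax
import jax.numpy as jnp

def forward(weights, x):
    # a tiny one-layer "network", standing in for a module built from Layer nodes
    return jnp.tanh(x @ weights["w"] + weights["b"])

def loss(weights, x, y):
    return jnp.mean((forward(weights, x) - y) ** 2)

# what an Initializer node would produce
weights = {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}
x, y = jnp.ones((8, 4)), jnp.ones((8, 1))

grads = jax.grad(loss)(weights, x, y)                                   # role of the Gradient node
updates = jax.tree_util.tree_map(lambda g: -0.01 * g, grads)            # role of a Step node (already negated)
weights = jax.tree_util.tree_map(lambda w, u: w + u, weights, updates)  # role of the Add node
```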
AMSGradStep
The AMSGrad optimizer step.
Based on Reddi et al., 2018, AMSGrad is a modification of the popular Adam optimizer that restores convergence guarantees by keeping a long-term memory of past gradients. If Adam fails on a problem (e.g., diverges or explodes), AMSGrad is worth trying. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node, which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
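For reference, a minimal NumPy sketch of the AMSGrad update rule for a single parameter tensor is shown below (one common formulation with Adam-style bias correction); the actual node operates on whole gradient structures and manages its state internally.

```python
import numpy as np

def amsgrad_update(g, m, v, vhat, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g        # first-moment estimate
    v = beta2 * v + (1 - beta2) * g**2     # second-moment estimate
    vhat = np.maximum(vhat, v)             # long-term memory: never decreases
    m_hat = m / (1 - beta1**t)             # bias correction
    update = -lr * m_hat / (np.sqrt(vhat) + eps)
    return update, m, v, vhat              # 'update' is applied to the weights with Add
```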
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
Activation
Apply an elementwise non-linear neural network activation function to the given data.
This node supports a number of commonly used activation functions; which one performs best on the given data may have to be determined by experimentation. The most traditional functions are sigmoid and tanh, and the former is commonly used in the output layer of a network to produce class probabilities. The relu (rectified linear unit) function is extremely compute-efficient (it is max(0,x)) and can work quite well in hidden layers, but it has a number of potential shortcomings that limit its use as a general-purpose activation function (some of these are partially addressed in functions like leaky_relu and relu6). Some of the more recently developed activation functions are designed to work well with deep networks and are smoother than relu; these include silu (aka swish), celu, selu, gelu, and softplus, and one can generally not go wrong with any of these. The softsign and elu functions are negative for negative inputs, which can have interesting effects on the network's learning behavior. The hard_* functions are simple (3-part) piecewise linear approximations of the activation functions they model. The 'linear' option is a pass-through that bypasses the activation function. A few functions have a tunable parameter, which can be specified via the alpha parameter.
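The underlying functions correspond to those documented at the jax.nn link below; outside of a graph they can be compared directly on the same input, for example:

```python
import jax.numpy as jnp
from jax import nn

x = jnp.linspace(-3.0, 3.0, 7)
print(nn.relu(x))                              # max(0, x)
print(nn.leaky_relu(x, negative_slope=0.01))
print(nn.silu(x))                              # aka swish: x * sigmoid(x)
print(nn.gelu(x))
print(nn.celu(x, alpha=1.0))                   # alpha corresponds to this node's alpha port
```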
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
activation
Activation function to apply. See https://jax.readthedocs.io/en/latest/jax.nn.html#activation-functions for details.- verbose name: Activation Function
- default value: relu
- port type: EnumPort
- value type: str (can be None)
-
alpha
Alpha value for elu, leaky_relu, and celu.- verbose name: Alpha (Elu, Leaky_relu, Celu)
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdaBeliefStep
The AdaBelief optimizer step.
Based on Zhuang et al, 2020. This is a modified version of the popular Adam optimizer, and focuses on fast convergence, generalization, and stability. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
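For reference, a minimal NumPy sketch of the AdaBelief rule is shown below; it matches Adam except that the second moment tracks the deviation of the gradient from its running mean (the "belief"), i.e., (g - m)^2 rather than g^2. The actual node manages its state and bias correction internally.

```python
import numpy as np

def adabelief_update(g, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-16):
    m = beta1 * m + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * (g - m)**2 + eps   # deviation from the running mean
    m_hat = m / (1 - beta1**t)
    s_hat = s / (1 - beta2**t)
    return -lr * m_hat / (np.sqrt(s_hat) + eps), m, s
```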
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. Must be provided unless a learning rate schedule is wired in, in which case this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-16
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 1e-16
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdaFactorStep
The AdaFactor optimizer step.
Based on Shazeer and Stern, 2018, this is an adaptive optimizer that is designed for fast training of large-scale networks (it may be overkill for the small networks usually used with biosignals). The approach saves memory by using a factored representation of the second-moment gradient estimates; the factorization only applies to matrix/tensor-shaped parameters that meet a minimum axis size (see the min_size_to_factor setting). Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed.
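The memory saving comes from the factored second-moment representation; a minimal NumPy sketch of this idea for a single matrix-shaped parameter is given below (update scaling, clipping, and learning-rate handling are omitted).

```python
import numpy as np

def factored_rms(g, row_acc, col_acc, decay=0.8, eps=1e-30):
    sq = g**2 + eps
    row_acc = decay * row_acc + (1 - decay) * sq.mean(axis=1)   # one value per row
    col_acc = decay * col_acc + (1 - decay) * sq.mean(axis=0)   # one value per column
    v_approx = np.outer(row_acc, col_acc) / row_acc.mean()      # rank-1 estimate of the full moment
    return g / np.sqrt(v_approx), row_acc, col_acc
```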
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
weight_decay_mask
Mask structure for the weight decay (optional).- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
learning_rate
Learning rate. Must be provided unless a learning rate schedule is wired in, in which case this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
min_size_to_factor
Only factor statistics if two dimensions of some weight are larger than this value.- verbose name: Min Size To Factor
- default value: 128
- port type: IntPort
- value type: int (can be None)
-
decay_rate
Controls second-moment exponential decay schedule.- verbose name: Decay Rate
- default value: 0.8
- port type: FloatPort
- value type: float (can be None)
-
decay_offset
Starting step when the fine-tuning phase begins.- verbose name: Decay Offset
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
multiply_by_parameter_scale
Scale learning rate by parameter norm. If False, provided learning rate is absolute step size.- verbose name: Multiply By Parameter Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
clipping_threshold
Optional gradient clipping threshold (norm). If set to None, this is disabled. This is per parameter vector/matrix.- verbose name: Optional Clipping Threshold
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
momentum
Optional exponential decay rate for momentum.- verbose name: Optional Momentum
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
momentum_precision
Numeric precision for the momentum buffer. Keep resolves to the precision of the inputs.- verbose name: Momentum Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
weight_decay
Optional rate at which to decay weights. This is usually a small number, like e.g., 1e-4.- verbose name: Optional Weight Decay
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Regularization constant for RMS gradient.- verbose name: Epsilon
- default value: 1e-30
- port type: FloatPort
- value type: float (can be None)
-
factored
Whether to use factored second-moment estimates. This can be turned off to disable the factorization (e.g., to mimic a simpler optimizer).- verbose name: Factored
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdagradStep
The Adagrad optimizer step.
Based on Duchi et al, 2011, Adagrad is one of the early successful optimizers for deep learning, and anneals the learning rate for each parameter over the course of training. One issue is that the learning rate eventually becomes so small that the optimizer stops learning. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
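For reference, a minimal NumPy sketch of the Adagrad rule is shown below; because the accumulator only grows, the effective per-parameter step size shrinks over the course of training, which is the behavior noted above.

```python
import numpy as np

def adagrad_update(g, acc, lr=1e-3, eps=1e-7):
    acc = acc + g**2                # acc starts at initial_accumulator_value
    return -lr * g / np.sqrt(acc + eps), acc
```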
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
initial_accumulator_value
Initial value for the accumulator.- verbose name: Initial Accumulator Value
- default value: 0.1
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-07
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdamStep
The Adam optimizer step.
Based on Kingma and Ba, 2014, Adam is one of the most popular optimizers for deep learning due to its effectiveness given a wide variety of network topologies and training regimes, making it a good initial choice. Note that for best accuracy, the learning rate is often adapted using a schedule, which all optimizers support via the learning_rate_schedule port. Adam can suffer from failure to converge in some cases, which is addressed by some close relatives like AMSGrad and Yogi. Another potential issue is instability or large variance during early training, which can be addressed by using a warmup schedule for the learning rate, or by using a different optimizer, such as Novograd and RAdam. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
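For reference, a minimal NumPy sketch of the bias-corrected Adam rule is shown below; when a schedule is wired in, the constant lr is simply replaced by a step-dependent value lr(t).

```python
import numpy as np

def adam_update(g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    return -lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```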
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdamWStep
The AdamW optimizer step (adam with weight decay).
Based on Loshchilov et al, 2019 (but see the note about the weight decay parameter), AdamW is a variant of the popular Adam optimizer that additionally regularizes weights to have small l2 norm, which helps with generalization, especially when the amount of training data is low relative to the number of parameters. This is analogous to l2 regularization terms on the weights, but works better with adaptive gradient algorithms like Adam. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed.
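A minimal NumPy sketch of the decoupled weight decay is given below, following the PyTorch/Optax convention described under the weight_decay port (the decay term is multiplied by the learning rate and added to the Adam update, rather than being folded into the gradient).

```python
import numpy as np

def adamw_update(g, w, m, v, t, lr=1e-3, weight_decay=1e-4,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    adam_term = m_hat / (np.sqrt(v_hat) + eps)
    return -lr * (adam_term + weight_decay * w), m, v   # decay acts on the weights themselves
```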
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
weight_decay_mask
Mask structure for the weight decay.- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
weight_decay
Strength of the weight decay. This is multiplied by the learning rate as in e.g., PyTorch and Optax, but differs from the paper, where it is only multiplied by the schedule multiplier but not the base learning rate.- verbose name: Weight Decay
- default value: 0.0001
- port type: FloatPort
- value type: float (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdamaxStep
The Adamax optimizer step.
Based on Kingma and Ba, 2014, Adamax is a variant of the popular Adam optimizer that uses the infinity norm (max norm) for scaling, and which is therefore more conservative at controlling the gradients for unstable subsets of the weights. See also documentation on Adam for some additional considerations when using this family of optimizers. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdamaxWStep
The AdamaxW optimizer step (adamax with weight decay).
Based on Loshchilov et al, 2019 (but see the note about the weight decay parameter), AdamaxW is a variant of the Adamax optimizer that additionally regularizes weights to have small l2 norm, which helps with generalization, especially when the amount of training data is low relative to the number of parameters. This is analogous to l2 regularization terms on the weights, but works better with adaptive gradient algorithms like Adam. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
weight_decay_mask
Mask structure for the weight decay.- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
weight_decay
Strength of the weight decay. This is multiplied by the learning rate as in e.g., PyTorch and Optax, but differs from the paper, where it is only multiplied by the schedule multiplier but not the base learning rate.- verbose name: Weight Decay
- default value: 0.0001
- port type: FloatPort
- value type: float (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdditiveNoiseAugmentation
Add univariate noise drawn from a given distribution to the data.
This simulates random noise that is drawn independently and identically for every channel, sample, instance, and so forth. A reasonable starting point is a normal distribution with a standard deviation of 25uV, but check the unit of your data to be sure that it is matched appropriately. As a special case, if your data is standardized or whitened, the noise values should be divided by at least 100. The optimal noise likely depends to a large extent on the nature and amount of available training data, so be prepared to experiment with a range of at least 10-50uV. Like most augmentation nodes, this node does not by itself amplify the amount of data, which therefore has to be done beforehand using, for example, the RepeatAlongAxis node. Also as with most augmentation nodes, you need to wire in a random seed (for example using the DrawRandomSeed node, see docs for more info) to ensure reproducibility. You also need to wire a distribution to the dist input to specify the distribution of interest (e.g., NormalDistribution).
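A minimal NumPy sketch of the effect on the data is shown below (the node itself takes its distribution from the dist input and its seed from the seed input; the shapes and units here are illustrative).

```python
import numpy as np

rng = np.random.default_rng(12345)           # role of the wired-in random seed
data = rng.standard_normal((32, 8, 250))     # e.g., (trials, channels, samples), assumed in uV
noise_sd = 25.0                              # ~25 uV is a reasonable starting point
augmented = data + rng.normal(0.0, noise_sd, size=data.shape)
```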
Version 0.8.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: Packet (can be None)
- data direction: INOUT
-
seed
Random seed for deterministic results.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
dist
Distribution to use.- verbose name: Dist
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
bypass
Whether to bypass the augmentation and pass the input data through unchanged.- verbose name: Bypass
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdditiveNoiseStep
Chainable step that adds Gaussian noise to the gradients.
This can improve convergence and mitigate overfitting in some deep networks.
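A minimal NumPy sketch of the annealed gradient noise is shown below: Gaussian noise whose variance starts at eta and decays as (1 + t)^-gamma over successive updates.

```python
import numpy as np

def noisy_gradients(g, t, rng, eta=0.01, gamma=0.55):
    sigma = np.sqrt(eta / (1.0 + t)**gamma)   # standard deviation at update t
    return g + rng.normal(0.0, sigma, size=g.shape)
```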
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
eta
Initial variance for the Gaussian noise added to gradients.- verbose name: Eta
- default value: 0.01
- port type: FloatPort
- value type: float (can be None)
-
gamma
A parameter controlling the annealing of noise over time, the variance decays according to (1+t)^-gamma.- verbose name: Gamma
- default value: 0.55
- port type: FloatPort
- value type: float (can be None)
-
seed
A seed for the pseudo-random number generation.- verbose name: Seed
- default value: 12345
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AggregateStep
A modifier for the step node that is wired into it, which accumulates k successive gradient evaluations and passes them to the optimizer as one summed (or averaged) update.
This can be used for things like multi-task learning (if the k successive gradients stem from multiple tasks that are visited round-robin), or for varying the batch size over the course of training (since k can be controlled by a schedule), or for simulating large-batch training with batch sizes that otherwise would not fit in memory. Note this is not simply chained after another step using the ChainedStep node, but rather it is a modifier of the step node that is wired into it.
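A minimal sketch of the accumulate-then-step pattern is given below; the inner_step function stands in for the step node wired into modify_step, and gradients are treated as plain arrays for simplicity.

```python
import numpy as np

def aggregate(grad_evaluations, inner_step, reduction="sum"):
    acc = sum(grad_evaluations)               # k successive gradient evaluations
    if reduction == "mean":
        acc = acc / len(grad_evaluations)
    return inner_step(acc)                    # one update produced by the wired-in step
```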
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
modify_step
Step to modify.- verbose name: Modify Step
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
num_ministeps_schedule
Optional schedule for the k parameter.- verbose name: Num Ministeps Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
num_ministeps
Number of successive updates to accumulate. If not provided, a schedule must be wired in. This is the k parameter in the documentation.- verbose name: Num Ministeps
- default value: None
- port type: IntPort
- value type: int (can be None)
-
reduction
Whether to sum or mean the updates.- verbose name: Reduction Operation
- default value: sum
- port type: EnumPort
- value type: str (can be None)
-
skip_ministep_if
Optional criteria under which to skip mini-steps as if they did not happen. Note that using this on multi-task problems may cause the stepping to go out of sync between tasks.- verbose name: Skip Mini-Step If
- default value: never
- port type: ComboPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ApplyIfFiniteStep
A modifier of the step node that is wired into it, which prevents NaN or infinite updates from going through unless max_consecutive_errors has been exceeded, in which case the update goes through.
Note this is not simply chained after another step using the ChainedStep node, but rather it is a modifier of the step node that is wired into it. This should usually not be used in a chain since it may cause division by zero errors in subsequent steps (depending on the step type).
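A minimal NumPy sketch of the guard logic is shown below; non-finite updates are replaced by zeros until the consecutive-error budget is exhausted, after which the update is allowed through.

```python
import numpy as np

def apply_if_finite(update, n_bad, max_consecutive_errors=2):
    if np.all(np.isfinite(update)) or n_bad > max_consecutive_errors:
        return update, 0                       # pass through and reset the error count
    return np.zeros_like(update), n_bad + 1    # drop this update
```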
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
modify_step
Step to modify.- verbose name: Modify Step
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
max_consecutive_errors
Maximum number of consecutive errors to tolerate before letting the update go through.- verbose name: Max Consecutive Errors
- default value: 2
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
BatchFlatten
Flatten all axes of the input tensor except for the batch dimension (or more generally n leading dimensions).
This simplifies flattening the input to a layer without having to explicitly preserve the batch dimension.
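A minimal NumPy sketch of the array-level behavior is given below (the node additionally handles named packet axes as described under preserve_dims):

```python
import numpy as np

def batch_flatten(x, preserve_dims=1):
    return x.reshape(x.shape[:preserve_dims] + (-1,))

x = np.zeros((16, 8, 250))          # (batch, channels, samples)
print(batch_flatten(x).shape)       # (16, 2000)
```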
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
preserve_dims
Number of leading dimensions to preserve from the input shape. When using packet data as inputs, this will ensure that any instance axes are at the beginning of the data and, when left unspecified, will automatically preserve those axes; otherwise it will preserve the selected number of dimensions. The flattened axis will be a generic feature axis. For plain array inputs, this will instead default to 1, which preserves the usual leading batch dimension.- verbose name: Number Of Dimensions To Preserve
- default value: None
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
BatchNorm
Apply batch normalization to the given data.
Like most normalization nodes, batch norm is usually applied right before the activation function in a neural network, and may follow, for example, a convolutional layer or a dense layer. This will z-score the data using statistics aggregated over the selected axes, which notably include the instance (aka "batch") axis and usually all other axes except the feature axis. Consequently, this operation will normalize each feature independently by statistics that go across all instances in the batch. Like most normalizations, batch normalization typically includes a learned scale and bias parameter, separately per feature, and these can be optionally overridden with externally generated values (note that, in the rare case that there is more than one feature axis in the input, these learned parameters apply only per element of the trailing feature axis; you can use the Fold Into Axis node beforehand to combine multiple feature axes into one to avoid this). At test time, the node will by default use statistics that were accumulated during training using an exponential moving average (controlled by the decay rate parameter); this is the recommended default, but one may optionally set the local_stats_on_test option, in which case the statistics from the current test batch are used. Note that the effect of batch norm varies with the batch size, and the normalization can become noisy if the batch size is small. In such cases, one may consider using layer or group normalization, which instead aggregate over the feature axes but not over the instance axes in a batch. These alternative norms are particularly common in conjunction with RNNs or when processing large-format image or spatio-temporal data. If packet data is given, this node ensures that the instance axes come first and the feature axes come last.
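A minimal NumPy sketch of the training-time computation for channels-last arrays is given below; statistics are taken over all non-feature axes, the learned scale and bias are applied per feature, and moving statistics are tracked for use at test time.

```python
import numpy as np

def batchnorm_train(x, gamma, beta, moving_mean, moving_var, decay=0.99, eps=1e-5):
    axes = tuple(range(x.ndim - 1))                  # everything except the trailing feature axis
    mean, var = x.mean(axis=axes), x.var(axis=axes)
    y = gamma * (x - mean) / np.sqrt(var + eps) + beta
    moving_mean = decay * moving_mean + (1 - decay) * mean
    moving_var = decay * moving_var + (1 - decay) * var
    return y, moving_mean, moving_var
```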
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
scale_init
Initializer for the trainable scale.- verbose name: Scale Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
bias_init
Initializer for the trainable bias.- verbose name: Bias Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scale_prior
Optional prior distribution for the scale.- verbose name: Scale Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
bias_prior
Optional prior distribution for the bias.- verbose name: Bias Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
axes
Optional comma-separated list of axis names or indices over which to accumulate the normalization statistics. If unspecified, the statistics will be accumulated over all except the feature axis ("channels" axis in classic deep learning nomenclature). If an axis shall occur more than once, one may list it multiple times, or prefix the axis name with an asterisk to apply to all axes of this type. This parameter is not limited to the predefined options.- verbose name: Axes
- default value: (non-feature)
- port type: ComboPort
- value type: str (can be None)
-
decay_rate
Decay rate / momentum across subsequent mini-batches, for accumulating test-time statistics. Values close to 1 will result in slower decay and thus more stable statistics across the whole training set.- verbose name: Decay Rate
- default value: 0.99
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value to add to the variance to avoid division by zero.- verbose name: Epsilon
- default value: 1e-05
- port type: FloatPort
- value type: float (can be None)
-
learnable_scale
Whether to learn a trainable scale parameter. Batch normalization typically includes such a parameter in order to drive the subsequent activation function in a regime that is desirable for downstream computations (e.g., saturating or linear).- verbose name: Learnable Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
learnable_bias
Whether to learn a trainable bias parameter. See the learnable scale for more details.- verbose name: Learnable Bias
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
scale_initializer
Choice of scale initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Scale Initializer
- default value: ones
- port type: ComboPort
- value type: str (can be None)
-
bias_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
local_stats_on_test
At test time, whether to scale the data by stats of the test batch only, or using the global statistics accumulated during training. Classic batch normalization will use training statistics during testing, and this is the recommended default; local test statistics may be used when performing predictions in sufficiently large batches (e.g., same size as the training batches) and when there is specific concern of covariate shift between training and test data. The value 'default' translates to 'false'.- verbose name: Use Local Stats On Test Data
- default value: false
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of the trainable parameters.- verbose name: Layer Name
- default value: batchnorm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
BatchReshape
Reshape input tensor preserving the batch dimension.
This simplifies reshaping the input to a layer without having to explicitly preserve the batch dimension.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
output_shape
Desired shape for the input tensor while preserving the first preserve_dims dimensions, given as a list of integers. If -1 is given for a dimension, the size is inferred from the input data. If the input data is a packet, then you can also specify the shape as a list of axis names (e.g., space, time, frequency, feature, axis) along with the special tokens singletonaxis, flattenedaxis, and ..., which stand for inserted singleton dimensions, flattened dimensions, and unspecified dimensions, respectively. The generic axis name "axis" can be used to refer to any axis in the input. Any named axes must be already present in the input data, and will be moved to the desired place; be aware that any flattened axes are by default of type Axis (i.e., they are not feature axes). If a plain integer shape is given, unspecific axes (of type axis) are generated. If last_as_feature is checked, then the last axis is made a feature axis regardless of what it was before. If plain array data is passed in, only integer shapes are supported, and the data is reshaped to the desired integer shape.- verbose name: Output Shape
- default value: flattenedaxis
- port type: ComboPort
- value type: str (can be None)
-
last_as_feature
If True, the last axis of the output shape will be made a feature axis, regardless of what it was before. This only applies if packet data is provided.- verbose name: Last Axis As Feature
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
preserve_dims
Number of leading dimensions to preserve from the input shape. When using packet data as inputs, this will ensure that any instance axes are at the beginning of the data and, when left unspecified, will automatically preserve those axes; otherwise it will preserve the selected number of dimensions. For plain array inputs, this will instead default to 1, which preserves the usual leading batch dimension.- verbose name: Number Of Dimensions To Preserve
- default value: None
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
CenteringStep
Chainable step that centers the gradients (subtracts their mean).
This has been explored in Yong et al. (2020).
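A simplified NumPy sketch of the operation is shown below; this version subtracts each gradient tensor's overall mean, whereas the exact axes used may differ in the implementation.

```python
import numpy as np

def center(g):
    return g - g.mean()
```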
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ChainedStep
Compose a sequence of gradient processing steps into a single update step.
This can be used to build either custom optimizers or to augment existing optimizer steps with additional steps (e.g. gradient clipping). IMPORTANT: note that, by convention, a full end-to-end optimizer step (e.g., as implemented by AdamStep) negates the gradient before returning it, yielding an additive weight update, so that it can be applied to weights using the Add node. Therefore, when you build a custom optimizer from raw gradient processing steps, you need to include a negation step (scale by -1) near or at the end of the chain so as to obtain a full end-to-end optimizer step that is swappable with other optimizers. If your chain includes one of the end-to-end steps (which already do the negation), however, this will not be necessary. The scaling typically happens last (usually implemented using ScalingStep, which additionally can also take a learning rate), unless you are also using a ConstraintStep, in which case that must be the last step in the chain. Tip: Almost all canonical end-to-end optimizers (those that are not described as chainable, e.g., AdamStep, RMSPropStep, etc.), consist of a core gradient scaling rule followed by optional additional steps such as weight decay, momentum, scaling by a learning rate (schedule), etc. and generally a negation. For some optimizers, the core rule is available as a separate node, but for others, it is not -- for these optimizers, you can generally obtain a step analogous to the core scaling rule by setting the learning rate to -1.0 and disabling all optional features of the optimizer (like momentum, weight decay, etc). This rule can then be combined with other steps in a sensible order and a final negation to form a customized optimizer.
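The chaining logic can be pictured as function composition over the gradients; the sketch below is purely conceptual Python (scalar gradient, hypothetical step functions), not the NeuroPype node API:

    def chained_step(gradient, steps):
        # Apply each gradient-processing step in order. In a custom chain,
        # the last step (or one near the end) should scale by -learning_rate
        # so that the result is an additive update (weights + update).
        for step in steps:
            gradient = step(gradient)
        return gradient

    clip = lambda g: max(min(g, 1.0), -1.0)   # e.g., gradient clipping
    scale = lambda g: -0.01 * g               # negation + learning rate 0.01
    chained_step(2.5, [clip, scale])          # -> -0.01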
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
step1
Step 1.- verbose name: Step1
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step2
Step 2.- verbose name: Step2
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step3
Step 3.- verbose name: Step3
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step4
Step 4.- verbose name: Step4
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step5
Step 5.- verbose name: Step5
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step6
Step 6.- verbose name: Step6
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step7
Step 7.- verbose name: Step7
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step8
Step 8.- verbose name: Step8
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step9
Step 9.- verbose name: Step9
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step0
Step 0.- verbose name: Step0
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ConstantInitializer
An initializer that always returns the same constant value.
Can be used to initialize neural net weights to e.g., zeros or ones.
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
constant
Constant value.- verbose name: Constant
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ConstantSchedule
A constant parameter schedule.
This is the simplest of all schedule nodes in NeuroPype, as it simply emits a constant value. This can be used as a simple drop-in alternative when testing a more complicated schedule. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Constant value to output.- verbose name: Constant Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. All schedule nodes have this parameter, but in case of a constant schedule it does nothing.- verbose name: Step Multiplier (Ignored)
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
ConstraintStep
Chainable step that allows parameters (or in some cases, gradients) to be constrained by projecting them into the desired form.
This node can be used to apply different types of constraints, including non-negativity and zeroing out NaNs. Generally this node should be the last one in the chain, and follow any negation step.
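As a rough illustration (not the node's actual implementation), the built-in constraints amount to simple elementwise projections:

    import numpy as np

    def project_nonnegative(weights):
        # nonnegative: zero out any negative parameter values.
        return np.maximum(weights, 0.0)

    def zero_nans(gradients):
        # zero-nans: replace NaN entries in the gradients with zeros.
        return np.where(np.isnan(gradients), 0.0, gradients)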
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
projection
Optional projection operation.- verbose name: Projection
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
constraint
Constraint to apply. Non-negative maintains non-negative parameters by zeroing out any negative values. Zero-nans zeros out any NaN values in the gradients. General-projection can be used to implement projected gradient descent, but in this case this node must be the final step in the chain and must be preceded by the negating rescaling step (scale by -1).- verbose name: Constraint
- default value: nonnegative
- port type: ComboPort
- value type: str (can be None)
-
param
Parameter of the constraint, if any.- verbose name: Param
- default value: None
- port type: Port
- value type: object (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ConvolutionLayer
A 1/2/3/N-D standard convolution layer.
A convolution sweeps a learned kernel array over the input data, along one or more dimensions, using a given step size. For each valid position of the kernel, the node computes a "matching score" between the data and the kernel as an inner product, i.e., an elementwise product of the two arrays followed by a sum over all product terms. This way, each kernel effectively learns a translation-invariant pattern in the data, and a typical convolution layer will learn multiple such kernels, each yielding a different feature (these are collected in an output feature axis). The output array is arranged to have the same spatial dimensions as the input, but becomes downsampled if the step size was greater than one. Furthermore, the kernel may be allowed to run off the edge of the input data, in which case the missing data is assumed to be zero, and this behavior is controlled via the padding argument. As a special case, a step size of 1 and padding of 'same' will result in an output array of the same size as the input. The operation generally does not care about the presence or absence of an instance ("batch") axis, and will process any such axes, or any other extra dimensions, independently with the same kernels. Another aspect of kernels is that the input data may (and likely will) already have a feature axis, and this axis is generally treated specially by convolution operations (also known as the "channels axis" in some frameworks, after color channels in image processing). Crucially, in full convolution the kernel array has an implicit feature axis, which allows the convolution to capture patterns that extend across both the swept spatial axes and any input feature axis (if any). Alternatively, this node can also implement grouped convolution (an in-between of full and depthwise convolution), where the input and output features are each partitioned into N equally-sized groups, and each group of output features is derived from only the corresponding group of input features; consequently, the lengths of the input and output feature axes must both be divisible by N. N must be given as the feature_group_count parameter. Another feature supported by this node is dilated convolution, in which the kernel is implicitly scaled by an integer factor along each spatial axis, which allows for using a smaller kernel to cover larger patterns, and which results in higher-resolution spatial output compared to the more common alternative of applying a convolution after pooling and downsampling of the data. Generally, if there are multiple feature axes in the input, they will be flattened into a single feature axis placed at the end. The full output shape is first the instance axes if any, then any unspecified non-feature dimensions, then the specified "spatial" (i.e., swept-over) dimensions in the order specified, followed by a single feature axis. In image processing, each convolution is usually followed by a non-linearity, which makes the operation an actual feature detector rather than just a linear transformation. However, in signal processing, convolutions are often used to learn spatial (e.g., ICA-like) and/or temporal (e.g., FIR-type) filters, mimicking traditional signal processing pipelines, and in such uses the subsequent non-linearity, and also the optional bias parameter of the convolution, are often omitted.
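The relationship between input length, kernel size, stride, dilation, and padding along a swept axis can be sketched as follows (plain Python; the formula is the standard convolution arithmetic, assumed to match this node's behavior):

    import math

    def conv_output_length(in_len, kernel, stride=1, dilation=1, padding="valid"):
        eff_k = dilation * (kernel - 1) + 1      # effective kernel size after dilation
        if padding == "same":
            return math.ceil(in_len / stride)    # output length matches input (per stride)
        return (in_len - eff_k) // stride + 1    # 'valid': kernel stays fully inside

    conv_output_length(100, kernel=3)                  # -> 98
    conv_output_length(100, kernel=3, stride=2)        # -> 49
    conv_output_length(100, kernel=3, padding="same")  # -> 100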
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
mask
Mask to apply to the weights.- verbose name: Mask
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
sweep_axes
List and order of axes over which the convolution filter kernel is swept. If the input data are packets, this determines the order of these axes in the output data, and the order of the axes in the kernel (for plain array inputs, see end of tooltip). A kernel is a learned array that is shifted over all possible positions in the data (optionally with step size in each dimension, and optionally going past the edges of the data by half the kernel size if padding=same). For each position, the kernel is multiplied by the data in the region covered by the kernel and the resulting (elementwise) product is integrated (summed) to produce a single output score (a measure of match between the kernel and the data in that region). If the input data has an extra feature axis, the kernel will usually have an implicit extra axis to hold weights for each input feature. If the data has an instance axis, each instance will be processed separately (using the same kernels). If the input data are plain arrays, this merely determines the number of spatial axes and the names are just mnemonic and not otherwise used. This can alternatively be given as just a number to set the number of spatial dimensions, corresponding to the N in N-D convolution; for packet data, this will resolve to the last N axes in the data that are neither feature nor instance axes. This parameter is not limited to the predefined options.- verbose name: Axes To Sweep Kernel Over (Convolve)
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
output_features
Number of filter kernels (and features) to learn. This value determines the length of the feature axis in the output data (each kernel yields one output feature, representing the raw feature detection score produced by that kernel). In classic deep learning, this is also called the number of output channels -- analogous to RGB color channels in a raw image, or generally meant to be an unspecific feature axis in a data array (not to be confused with spatial channels in multi-channel time series, which are more commonly treated like the vertical axis in 2d image data).- verbose name: Number Of Filters To Learn
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
kernel_shape
Shape of the convolution filter kernel. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the kernel is the same size along all of the given spatial dimensions. Note: if you make the kernel as large as the data along some axis, there is only a single valid position for the kernel along that axis (if padding=valid), and consequently the result is an inner product between the data and the kernel, or a matrix multiplication when more kernels are learned. Conversely, if you give the kernel a shape of 1 along an axis, the result is equivalent to processing each element along that axis separately using the same kernel. The latter is the same as not listing the axis in sweep axes, except that the output axis order can be controlled when specifying a 1-sized axis in sweep_axes. Which is more efficient depends on the implementation.- verbose name: Kernel Shape
- default value: [3]
- port type: ListPort
- value type: list (can be None)
-
strides
Step size with which the kernel is swept over the data. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the same step size is used along all of the specified spatial dimensions. A step size greater than 1 means that the kernel will be shifted by this amount between successive positions; as a result, the amount of compute is lower by this factor, and the output data along this axis will also be shorter by this factor (matching the number of positions at which the kernel is applied).- verbose name: Step Size (Strides)
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
padding
Padding strategy for the data. This can be either 'valid' or 'same', or a custom list of padding amounts. 'valid' means no padding (i.e., the kernel will not run off the edges of the data, but the output data will be shorter along each axis according to the number of valid positions of the kernel along that axis), and 'same' means that the output will have the same shape as the input (aside from dilation and striding). Can be customized by giving a list of (low, high) pairs, one per axis, where low is the padding to apply before the data along each axis, and high is the padding to apply after the data along each axis. low and high can also be negative to trim the data instead of padding. If a single [(low, high)] pair is given, it is applied to all axes.- verbose name: Padding
- default value: valid
- port type: ComboPort
- value type: str (can be None)
-
with_bias
Whether to include a bias term. If given, then for each output feature, a bias term is learned and added to the output of the convolution. This increases the flexibility of the learned model, but note that the result is no longer strictly equivalent to e.g., a learned FIR filter applied to time-series data or a learned spatial filter / matrix multiplication applied to spatial data.- verbose name: Learn Bias Term(S)
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
dilations
Dilation (scaling) of the convolution kernel. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the same dilation factor is used for all of the specified spatial dimensions. This causes the kernel array to be "stretched" by this factor (or factors if multiple axes) to cover a larger region in the data without having to learn a higher-resolution (larger size) kernel. This is an alternative to the more traditional approach of first pooling the data with a step size greater than one before applying a regular (non-dilated) convolution, and can be used to, e.g., preserve the original shape and resolution of the data.- verbose name: Dilation (Kernel Scaling)
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
feature_group_count
Number of feature groups to use. This will partition features into N equal-sized groups before processing. The length of the input and output feature axes must be divisible by this number. This is mainly an optimization to reduce the number of parameters and computation, at the expense of learning kernels that integrate only a subset of the input features, perhaps at earlier stages of the network.- verbose name: Optional Feature Partitions
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: conv
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
CosineDecaySchedule
A cosine decay schedule.
This is a smooth falloff transition from the initial value down to alpha*initial_value that follows the shape of the cosine function (from its initial peak to its first trough) over the course of transition_steps steps. After that, the parameter is held at alpha*initial_value. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
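Written out, the schedule's shape can be sketched as below (mirroring the formula given under transition_steps; illustrative, not necessarily the exact implementation):

    import math

    def cosine_decay(step, init_value=1.0, transition_steps=100, alpha=0.0):
        # Raised-cosine falloff from init_value to alpha*init_value,
        # held constant after transition_steps.
        t = min(step, transition_steps) / transition_steps
        raised_cosine = 0.5 * (1 + math.cos(math.pi * t))
        return init_value * ((1 - alpha) * raised_cosine + alpha)

    cosine_decay(0), cosine_decay(50), cosine_decay(100)   # approx. (1.0, 0.5, 0.0)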
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
transition_steps
The number of steps over which the cosine decay takes place. This is a soft transition following a raised-cosine function from a maximum scale of 1 (times initial_value) to a minimum scale of alpha (times initial_value). The formula is: initial_value * ((1 - alpha) * raised_cosine(step / transition_steps) + alpha)- verbose name: Transition Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
alpha
The minimum scale of the cosine decay. This is the multiplier applied to the initial value at the end of the decay schedule (bottom of cosine function).- verbose name: Final Value Ratio
- default value: 0.0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
CosineOneCycleSchedule
A one-cycle cosine ramp up/down parameter schedule.
This schedule is a cosine-shaped ramp up to a peak value, followed by a cosine-shaped ramp down to the final value (at transition_steps). The parameter is then held at that value until the end of the schedule. Only the peak value is specified directly, while the initial value is given as a ratio to the peak value, as is the final value. The upslope duration is given as a fraction of the total transition_steps, while the downslope is the remainder of the transition steps. This schedule is inspired by Smith and Topin's 2018 paper, "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates" (see URL). Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
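A hedged sketch of the resulting shape (using the same raised-cosine transition as above; parameter names mirror the ports below, but the exact implementation may differ):

    import math

    def raised_cosine(t):
        return 0.5 * (1 + math.cos(math.pi * t))

    def cosine_one_cycle(step, peak_value=1.0, transition_steps=100,
                         peak_initial_ratio=25, peak_final_ratio=10000.0,
                         upslope_fraction=0.3):
        init_value = peak_value / peak_initial_ratio
        final_value = peak_value / peak_final_ratio
        up = transition_steps * upslope_fraction
        if step <= up:
            # cosine-shaped ramp from init_value up to peak_value
            return init_value + (peak_value - init_value) * (1 - raised_cosine(step / up))
        # cosine-shaped ramp down to final_value, then hold
        t = min(step - up, transition_steps - up) / (transition_steps - up)
        return final_value + (peak_value - final_value) * raised_cosine(t)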
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
peak_value
Parameter value at peak. This is the maximum value that the parameter will attain over the course of the schedule.- verbose name: Peak Value
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
transition_steps
Step count at which to end the transition from the initial value to the final value. The parameter is held at the final value after this step count is reached. This is the duration of the full scaling cycle.- verbose name: Transition Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
peak_initial_ratio
Ratio of the peak to the initial value. This is the ratio of the peak value to the initial value, i.e., the initial value is this many times smaller than the peak value.- verbose name: Peak Initial Ratio
- default value: 25
- port type: FloatPort
- value type: float (can be None)
-
peak_final_ratio
Ratio of the peak to the final value at the end of the cycle. The final value is this many times smaller than the peak value, and the parameter will be held at this value after transition_steps have been reached.- verbose name: Peak Final Ratio
- default value: 10000.0
- port type: FloatPort
- value type: float (can be None)
-
upslope_fraction
Fraction of the transition_steps that will be used for the cosine-shaped upslope. That is, the peak is reached after transition_steps * upslope_fraction steps. After this, the value slopes down in a single cosine-shaped transition to the final value.- verbose name: Upslope Step Fraction
- default value: 0.3
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
CustomInitializer
A custom initializer that is defined by a computational graph that takes in a shape and data type (string) and returns an array of the given shape/type.
An example graph has a Placeholder node with (slot)name "shape" and another one with name "dtype" (matching the signature in the signature parameter), and wires these into the "shape" and "precision" inputs of a RandomNormal node; that node's "This" output is then wired into the "graph" input of the CustomInitializer node (this node). Since the RandomNormal node also needs a random seed, which should be derived from the root seed governing the overall neural network computation to ensure reproducibility, it is best practice to obtain that seed from a "Draw Random Seed" node and to check the "from haiku context" option to source it from the neural network-associated random sequence. For Experts: the graph can also define a third positional input (i.e., a placeholder node whose name appears at the end of the signature), which receives a random seed (for more on managing random seeds, see the documentation of the "Create Random Seed" node).
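Functionally, the example graph described above computes something like the following NumPy sketch (the actual graph is built from Placeholder, RandomNormal, and Draw Random Seed nodes rather than Python code):

    import numpy as np

    def custom_initializer(shape, dtype, seed=0):
        # Return standard-normal values of the requested shape and dtype.
        rng = np.random.default_rng(seed)
        return rng.standard_normal(shape).astype(dtype)

    w = custom_initializer((64, 32), "float32")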
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
graph
Generator graph.- verbose name: Graph
- default value: None
- port type: GraphPort
- value type: Graph
-
graph__signature
Signature for the "graph" input. This represents the signature for the subgraph that is wired into the "graph" port. This is formatted as in (a,b,c) where a,b,c are names of placeholders that are expected in the subgraph that goes into the "graph" port. Alternatively, it can also be provided in data structure form as a list of lists, as in: [['a','b','c']].- verbose name: Graph [Signature]
- default value: (shape,dtype)
- port type: Port
- value type: object (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
CustomSchedule
A custom schedule that is defined by a graph function that takes in a step count and returns a parameter value.
A simple example is a Placeholder node with (slot)name "step" followed by a "Scale by Constant Factor" (Scaling) node that scales the step count by some factor, e.g., -0.01, and an "Add Constant Value" (Shift) node, which adds an offset (e.g., 1.0). Typically this would then be followed by a clipping node that ensures that the parameter value remains in a specified limit range, e.g., [0, 1]. This could be accomplished by ending the graph with a Clamp node. Finally, you wire that node's "This" output port into the "graph" port of the CustomSchedule node to complete the setup. The step_multiplier effectively stretches out or compresses the schedule; the graph will receive an adjusted step count that is divided by the step_multiplier.
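The example graph described above corresponds to roughly the following function (illustrative Python only; in NeuroPype it is built from Placeholder, Scaling, Shift, and Clamp nodes):

    def custom_schedule(step, step_multiplier=1.0):
        # Scale the (adjusted) step by -0.01, add 1.0, and clamp to [0, 1].
        adjusted = step / step_multiplier
        return min(max(1.0 - 0.01 * adjusted, 0.0), 1.0)

    custom_schedule(0), custom_schedule(50), custom_schedule(200)   # -> 1.0, 0.5, 0.0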
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
graph
Schedule function.- verbose name: Graph
- default value: None
- port type: GraphPort
- value type: Graph
-
graph__signature
Signature for the "graph" input. This represents the signature for the subgraph that is wired into the "graph" port. This is formatted as in (a,b,c) where a,b,c are names of placeholders that are expected in the subgraph that goes into the "graph" port. Alternatively, it can also be provided in data structure form as a list of lists, as in: [['a','b','c']].- verbose name: Graph [Signature]
- default value: (step)
- port type: Port
- value type: object (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
CustomStep
A custom optimizer step that is defined by a graph applied to gradients and, optionally, current weights.
The graph must have either one or two placeholders (one for the gradients and one for the current weights), which must be listed in the graph's positional signature in that order. You can then implement any computation that processes and returns the updated gradients and, if you are building an end-to-end optimizer step, also converts the gradients into additive updates (by negating them), since the result of end-to-end steps is expected to be applied additively. Your graph may use any of the Step nodes (e.g., AdamStep, GradientClippingStep, etc.) or stateless operations (e.g., Add, etc.). If you are optimizing parameters that are in the form of a Packet (by setting prefer_packets to True in NetTransform), you can also use stateless nodes that operate on Packets such as ExtractStreams, MergeStreams, and so forth, which is useful when processing subsets of the parameters (e.g., corresponding to different layers) differently. There is a small set of stateful nodes that can also be used safely here; these are the nodes that have a "state" input and output port. Your graph may also expose additional hyperparameters (e.g., the weight decay parameters), which can be defined using ParameterPort nodes and which need to have default values. These parameters can be overridden by passing name/value pairs into the CustomStep node.
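As a minimal example of the convention described above, an end-to-end step built from a clipping stage followed by negation and learning-rate scaling might behave like this (conceptual Python on scalar gradients; not the node API):

    def my_custom_step(gradients, learning_rate=0.001, clip=1.0):
        updates = []
        for g in gradients:
            g = max(min(g, clip), -clip)        # gradient clipping
            updates.append(-learning_rate * g)  # negate => additive update
        return updates

    weights = [0.5, -0.2]
    weights = [w + u for w, u in zip(weights, my_custom_step([2.0, -0.1]))]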
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
graph
Optimizer step returning additive updates.- verbose name: Graph
- default value: None
- port type: GraphPort
- value type: Graph
-
graph__signature
Signature for the "graph" input. This represents the signature for the subgraph that is wired into the "graph" port. This is formatted as in (a,b,c) where a,b,c are names of placeholders that are expected in the subgraph that goes into the "graph" port. Alternatively, it can also be provided in data structure form as a list of lists, as in: [['a','b','c']].- verbose name: Graph [Signature]
- default value: (gradients)
- port type: Port
- value type: object (can be None)
-
name1
Override 1.- verbose name: Name1
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val1
Value 1.- verbose name: Val1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name2
Override 2.- verbose name: Name2
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val2
Value 2.- verbose name: Val2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name3
Override 3.- verbose name: Name3
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val3
Value 3.- verbose name: Val3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name4
Override 4.- verbose name: Name4
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val4
Value 4.- verbose name: Val4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name5
Override 5.- verbose name: Name5
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val5
Value 5.- verbose name: Val5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name6
Override 6.- verbose name: Name6
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val6
Value 6.- verbose name: Val6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name7
Override 7.- verbose name: Name7
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val7
Value 7.- verbose name: Val7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name8
Override 8.- verbose name: Name8
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val8
Value 8.- verbose name: Val8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name9
Override 9.- verbose name: Name9
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val9
Value 9.- verbose name: Val9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
nameN
Additional overridden names.- verbose name: Namen
- default value: None
- port type: ListPort
- value type: list (can be None)
-
valN
Additional overridden values.- verbose name: Valn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
name0
Override 0.- verbose name: Name0
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val0
Value 0.- verbose name: Val0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
CyclicCosineDecaySchedule
A cyclic linear warmup followed by cosine decay schedule.
This is essentially a repeated application of the "Linear warmup cosine decay" schedule, where each parameter is a list of values, one for each cycle. See also the "Linear Warmup Cosine Decay Schedule" schedule node for how a single cycle behaves. This schedule is more commonly known as the SGDR (SGD with Restarts) schedule, following a paper by Loshchilov and Hutter, 2017 (see also URL). The basic idea is that the optimization can get stuck in a local optimum, and the subsequent cycle can "shake out" the current solution from that optimum and find a better one. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
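The cyclic behavior can be sketched as below (illustrative Python; each parameter is a per-cycle list, matching the ports that follow):

    import math

    def sgdr_schedule(step, init_values, peak_values, final_values,
                      warmup_steps, decay_steps):
        # Walk through the cycles; hold the last final value afterwards.
        for init_v, peak_v, final_v, warm, decay in zip(
                init_values, peak_values, final_values, warmup_steps, decay_steps):
            if step < warm:                                   # linear warmup
                return init_v + (peak_v - init_v) * step / warm
            step -= warm
            if step < decay:                                  # cosine decay
                cos = 0.5 * (1 + math.cos(math.pi * step / decay))
                return final_v + (peak_v - final_v) * cos
            step -= decay
        return final_values[-1]

    sgdr_schedule(150, [0.0, 0.0], [1.0, 0.5], [0.0, 0.0], [100, 100], [100, 100])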
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_values
Initial parameter values. This is the initial value at the beginning of each successive cycle.- verbose name: Cycle Initial Values
- default value: [0.0]
- port type: ListPort
- value type: list
-
peak_values
Peak parameter values. This is the value at the peak following the initial warmup, before it is lowered again following a cosine function, for each cycle.- verbose name: Peak Values
- default value: [1.0]
- port type: ListPort
- value type: list (can be None)
-
final_values
Final parameter values. The cosine decay reduces the value from the peak down to the final value; given for each cycle.- verbose name: Cycle Final Values
- default value: [0.0]
- port type: ListPort
- value type: list
-
warmup_steps
Number of steps over which to ramp up from the initial value to the peak value. After this, the parameter is lowered again following the shape of a cosine function down to the desired final value; given for each cycle.- verbose name: Cycle Warmup Steps
- default value: [100]
- port type: ListPort
- value type: list (can be None)
-
decay_steps
The number of steps over which the cosine decay takes place. This is a soft transition following a raised-cosine function from the peak value down to the final value; given for each cycle.- verbose name: Cycle Decay Steps
- default value: [100]
- port type: ListPort
- value type: list (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
DPSGDStep
The differentially private SGD (DPSGD) optimizer step.
Based on Abadi et al., 2016, this optimizer can be used to reduce the sensitivity of the model to individual training samples or groups thereof, and can thus be used to train models on sensitive data. The optimizer has a number of parameters that are potentially data dependent, and must be provided by the user. IMPORTANT: this optimizer, unlike the others, requires access to the per-example gradients; thus, the gradients should have a leading "batch" dimension. This can be accomplished by using the VectorizedMap node on the gradient pipeline. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
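The core of the algorithm from Abadi et al. (2016) can be sketched as follows (rough NumPy illustration, not this node's implementation): clip each per-example gradient to the given L2 norm, average, add calibrated Gaussian noise, and return a negated, learning-rate-scaled update:

    import numpy as np

    def dpsgd_update(per_example_grads, l2_norm_clip, noise_multiplier,
                     learning_rate, rng):
        # per_example_grads has a leading batch dimension (one row per example).
        clipped = []
        for g in per_example_grads:
            norm = max(np.linalg.norm(g), 1e-12)
            clipped.append(g * min(1.0, l2_norm_clip / norm))   # clip to L2 ball
        mean_grad = np.mean(clipped, axis=0)
        noise = rng.normal(0.0, noise_multiplier * l2_norm_clip / len(clipped),
                           size=mean_grad.shape)
        return -learning_rate * (mean_grad + noise)             # additive update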
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
l2_norm_clip
L2 norm clipping value. Maximum l2-norm of the per-example parameter updates. Must be provided.- verbose name: L2 Norm Clip
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
noise_multiplier
Noise multiplier. Ratio of standard deviation to the clipping norm. Must be provided.- verbose name: Noise Multiplier
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
randseed
Integer random seed. Must be provided.- verbose name: Randseed
- default value: None
- port type: IntPort
- value type: int (can be None)
-
momentum
Optional exponential decay rate for momentum.- verbose name: Optional Momentum
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
nesterov
Whether to use Nesterov acceleration.- verbose name: Use Nesterov Acceleration
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
DeepModel
A deep-learning based machine learning model.
This can be used as a drop-in replacement for off-the-shelf ML nodes such as the "Linear Discriminant Analysis" node. The node is configured by wiring a network graph, starting with an input placeholder, and containing some mix of Layer nodes, other deep learning nodes (in the deep learning category) and pure math nodes, and any other data restructuring nodes. Stateful nodes like filters (e.g., in the signal processing category) are not currently supported. The last node of the network graph should produce an output (i.e., a prediction) for each instance (trial) in the original input data. The output can be a packet or a plain array. The input placeholder name should then be listed in the network signature of the deep model node. Alternatively, you can also end the network graph with a Define Net node, which gives the network (and its weights) a name prefix; in this case the input placeholder needs to be listed in the network signature of the Define Net node (and you will need to use a (*) signature for the network graph in the deep model node). Optionally your network may also end in a NetTransform node, but this is not necessary (the deep model will automatically do the equivalent of that if needed). The predictions are then passed into the selected loss function along with the target labels; the loss can either be chosen from among a set of predefined losses, or you can specify a custom loss graph. The latter needs to have two input placeholders, both listed in the loss signature (defaulting to preds and targs). The first listed input must receive the predictions and the second must be the one for the target values. Typically the graph will then use one of the predefined loss nodes (nodes ending in Loss), but it can also implement, for example, custom per-class weights or per-instance weights. This node then takes the gradient of the provided loss with respect to the weights, and uses the selected optimizer to update the weights. The optimizer can be chosen from a set of predefined optimizers, or you can wire in a custom optimizer step node (node ending in Step). Of these, the ChainedStep and CustomStep nodes allow you to build fully custom optimizer graphs, which can be used to implement techniques such as learning rate schedules, gradient clipping, weight decay, weight constraints via projected gradient descent, or different optimizers for different weights (e.g., excluding layers or treating biases differently, etc.). The full step will optionally be compiled to run efficiently, although not every combination of nodes used may be compilable. The learning behavior of the node can be configured in terms of the used batch size, number of steps (measured in instances, batches, or epochs), data shuffling behavior, and what kind of input data shall be used for training (e.g., only offline data or both offline and streaming data) vs. only for prediction. You can also pass a 2-element list of training and validation data to the node, in which case a validation step will be executed on the validation data at some specified frequency, evaluating the specified metric on predictions vs. targets. You can also provide a custom validation step as a graph, which can be used for custom progress tracking such as specific output requirements, custom metrics, custom validation frequency, and so forth. It is recommended to benchmark your GPU utilization and memory usage using a tool like nvitop while running, in order to ensure that you're using the GPU(s) optimally.
There are pros and cons to moving the data to the GPU using the MoveToBackend node before passing it to this node; this may use more GPU memory (especially with larger datasets) but will typically achieve higher utilization. Alternatively, you can keep the data on the host but place two (or rarely more) tasks on the same GPU, for example by setting your cross validation node to run in parallel; this can restore any lost utilization. Also, while validation data is a protection against overfitting, it can be a good idea to instead limit the epoch count to the point where the model starts to overfit and to disable the validation split. This will typically be more efficient and use less GPU memory, often allowing you to place more than one task on the GPU. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, the trainable state of this node can be saved to a model file and later loaded for continued use.
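Conceptually, the training loop managed by this node looks roughly like the following JAX sketch (heavily simplified and hypothetical; the node assembles the equivalent from the wired network, loss, and optimizer step graphs and typically compiles it):

    import jax
    import jax.numpy as jnp

    def net_apply(w, x):                 # stand-in for the wired network graph
        return x @ w
    def loss_fn(preds, targs):           # e.g., a squared loss
        return jnp.mean((preds - targs) ** 2)

    def train_step(w, x, y, lr=0.01):
        grads = jax.grad(lambda w_: loss_fn(net_apply(w_, x), y))(w)
        update = -lr * grads             # end-to-end optimizer steps return negated updates
        return w + update                # applied additively (the Add node)

    w = jnp.zeros((3, 1))
    x, y = jnp.ones((8, 3)), jnp.ones((8, 1))
    w = train_step(w, x, y)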
Version 0.5.1
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process. This can be a packet or a two-element list of training and validation data.- verbose name: Data
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
net
Neural network graph.- verbose name: Net
- default value: None
- port type: GraphPort
- value type: Graph
-
net__signature
Argument names of network graph. This is a listing of the names of input arguments of your neural network, which is typically a single argument conventionally named "inputs" (although you can choose your own name). Your network is then a graph that begins with a Placeholder node whose slotname must match the name listed here, and which is followed by a series of NN nodes (e.g., Layers, normalization, etc), possibly interleaved with other mathematical operations and/or data formatting nodes. The final output of your network is expected to be predictions given the inputs (without a link function applied, i.e., if you are doing classification, the final output should be logits as produced by a Dense layer without a trailing Activation node, instead of probabilities). This final output node is then wired into the DeepModel's "net" input port. Note that in graphical UIs, the edge that goes into the "net" input will be drawn in dotted style to indicate that this is not normal forward data flow, but that the network graph runs under the control of the DeepModel node. The DeepModel node will act on your network in various ways, among others taking derivatives to optimize the weights, and invoking it to make predictions. The final predictions are ideally in a form that can be directly wired into one of the Loss nodes (e.g., SquaredLoss) without any further processing; a Dense node trivially satisfies this. Your network graph may also optionally contain an additional Placeholder node with slotname set to is_training, which can also be listed here in the signature. This placeholder will receive True if the network is called on training data and False if it is called on prediction data.- verbose name: Net [Signature]
- default value: (inputs)
- port type: Port
- value type: object (can be None)
-
loss
Optional custom loss function graph.- verbose name: Loss
- default value: None
- port type: GraphPort
- value type: Graph
-
loss__signature
Optional argument names accepted by loss function graph. This is only used if you aim to fully override the loss function by a completely custom graph; in all other cases you can simply select the desired loss in the loss_function port. This is a graph with usually at least two placeholders (one for predictions and one for targets) whose slotnames match those listed here, and which usually ends in one of the Loss nodes (note that the loss returned by this graph is per prediction rather than a single overall scalar). The default loss is structured like this, where the Loss node is chosen and configured according to the loss_function setting. Your graph may also accept a third input that receives per-sample weights. For certain unsupervised losses you may also build your graph to expect targets that are passed in as all-zeros. The final output of the loss graph is then wired into the "loss" input of the DeepModel node. Note that in graphical UIs, the edge that goes into the "loss" input will be drawn in dotted style to indicate that this is not normal forward data flow, but that the loss graph runs under the control of the DeepModel node.- verbose name: Loss [Signature]
- default value: (preds,targs)
- port type: Port
- value type: object (can be None)
-
optstep
Optional optimizer step node.- verbose name: Optstep
- default value: None
- port type: GraphPort
- value type: Graph
-
valstep
Optional validation step graph.- verbose name: Valstep
- default value: None
- port type: GraphPort
- value type: Graph
-
valstep__signature
Arguments for the optional validation step graph. This is a graph with usually three placeholders (iteration count, predictions, and optionally targets) that receive their input from the supplied validation data. This graph can do anything (e.g., compute and print a performance metric), may re-batch the data (e.g., to compute a running average), could optionally only emit outputs once every few calls, and may invoke a Break or Continue node to either stop the optimization (e.g., as a form of early stopping) or to reject the current optimizer update (e.g. for backtracking to the state at the previous validation update), which can be used if loss spikes or other signs of divergence occur. The final output of the graph should be a performance measure, e.g., as obtained from the PerformanceMetric node. The default graph has three placeholders named iter, preds, and targs that are wired into the update, preds, and targs ports of the PerformanceMetric node. The node's perf_metric setting defaults to the DeepModel's eval_metric setting. To replace this graph, you can start with a custom graph that replicates this recipe and then modify it to your needs.- verbose name: Valstep [Signature]
- default value: (iter,preds,targs)
- port type: Port
- value type: object (can be None)
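As an illustration, what such a custom validation step amounts to is sketched below in plain Python; the metric (plain accuracy), the patience-based early-stopping rule, and all names are hypothetical choices, not the node's built-in behavior.

    import numpy as np

    class EarlyStoppingValidator:
        """Hypothetical stand-in for a custom valstep graph: score each validation
        round and request a stop (akin to a Break node) once the score has not
        improved for `patience` consecutive rounds."""
        def __init__(self, patience=5):
            self.best, self.stale, self.patience = -np.inf, 0, patience

        def __call__(self, it, preds, targs):
            # plain accuracy as a stand-in for a PerformanceMetric-style score
            score = float(np.mean(np.argmax(preds, axis=1) == targs))
            if score > self.best:
                self.best, self.stale = score, 0
            else:
                self.stale += 1
            return score, self.stale >= self.patience  # (metric, stop requested?)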
-
trainfeed
Optional training feed graph.- verbose name: Trainfeed
- default value: None
- port type: GraphPort
- value type: Graph
-
trainfeed__signature
Arguments for the optional training feed graph. The purpose of this graph is to emit shuffled mini-batches of training data, and it can be overridden to realize highly customized input pipelines, which is however rarely necessary. The basic function of the graph is to take the inputs (e.g., a Packet), and according to the iter parameter, which is the 0-based iteration number, emit a mini-batch of data; the default behavior is fairly complex and works as follows: first, an integer range is created that indexes all trials in the input dataset. Then, in a collecting fold loop over the number of epochs (passes through the training set), the range is successively shuffled and the shuffled ranges are then concatenated and flattened into a single index array. Then, a length-based index range starting at the current iter and of length batch size is selected from that index sequence and then used to index the input dataset, which is then returned. Due to the need to shuffle each range differently, the fold loop uses the split random seed node to successively advance the current random key while splitting off a key for use with the permutation. Note that the graph only sees data meant for training and neither test-only nor validation data, and that all inputs except for the iter are constant for the duration of the training process. The total_insts argument is the total number of instances to emit across the training process, i.e., this will be larger than the number of instances in the dataset if multiple passes over the dataset are to be made.- verbose name: Trainfeed [Signature]
- default value: (inputs,randkey,batch_size,total_insts,iter)
- port type: Port
- value type: object (can be None)
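For orientation, the default feed behavior described above roughly corresponds to the following plain-numpy sketch (an approximation only: the integer randkey stands in for the splittable random seed mechanism, inputs is assumed to be indexable along a leading instance axis, and iter is read here as a 0-based offset into the concatenated index sequence):

    import numpy as np

    def default_trainfeed(inputs, randkey, batch_size, total_insts, it):
        n = len(inputs)                                  # number of trials
        n_epochs = int(np.ceil(total_insts / n))         # passes through the data
        rng = np.random.default_rng(randkey)             # stand-in for split keys
        # per-epoch shuffled index ranges, concatenated into one index sequence
        order = np.concatenate([rng.permutation(n) for _ in range(n_epochs)])
        sel = order[it:it + batch_size]                  # "starting at the current iter"
        return inputs[sel]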
-
augmentations
Optional data augmentation graph.- verbose name: Augmentations
- default value: None
- port type: GraphPort
- value type: Graph
-
augmentations__signature
Arguments for the optional data augmentation graph. This graph applies to the output of the training feed, i.e., a single mini-batch. The graph may increase/decrease or otherwise recombine the instances in the batch, and may also add noise or other perturbations. One technique is to first use the RepeatAlongAxis node to replicate the batch a number of times, then to apply the augmentations, and finally to select a random subset (driven by the randkey) to bring the batch back to its original size. Another variant defines a larger than usual incoming batch size (e.g., 5x) in the DeepModel node, augments this, and finally reduces the output to a normal batch size. This allows the augmentation to operate on a greater diversity of data than it otherwise would; however, this has the side effect that a pass through a dataset will only use a relatively small percentage of the trials, so the epoch count has to also be increased accordingly (e.g., 5x). The graph may also have a third argument that receives the is_training flag, which is True when the graph is called on training data and False when it is called on test data; if this is not present, the graph is only applied to training data and never test or validation data. This is mostly useful for augmentations that produce biased results.- verbose name: Augmentations [Signature]
- default value: (batch,randkey,~)
- port type: Port
- value type: object (can be None)
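A minimal numpy sketch of the first recipe above (replicate the batch, perturb each copy, then subsample back to the original size) might look as follows; the Gaussian-noise perturbation and all names are hypothetical:

    import numpy as np

    def jitter_augmentation(batch, randkey, repeats=4, noise_sd=0.1):
        rng = np.random.default_rng(randkey)
        big = np.repeat(batch, repeats, axis=0)             # akin to RepeatAlongAxis
        big = big + rng.normal(0.0, noise_sd, big.shape)    # perturb each copy
        keep = rng.choice(len(big), size=len(batch), replace=False)
        return big[keep]                                    # back to the original batch size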
-
loss_function
Loss function to optimize for. The two cross-entropy losses are the default losses to use for classification problems. The sigmoid loss is for two-class classification and assumes that your pipeline typically ends in a single Dense Layer node (or equivalent) that has one output unit and no trailing Activation node (the loss function applies the sigmoid). The softmax loss is for multi-class classification and instead assumes that you use a Dense Layer node with as many output units as there are classes. Again, the Dense Layer should not be followed by an Activation node, as a softmax non-linearity is applied by the loss function. The hinge loss is a robust but non-probabilistic loss for two-class classification (used in support vector machines). The squared, huber, and log_cosh are all losses for use in regression problems, i.e., if a continuous-valued quantity with normally-distributed (in case of squared loss) or heavy-tailed noise (with huber and log_cosh) is being predicted. The latter two losses are therefore usable for robust regression, but note that the robustness of Huber critically depends on a parameter. The cosine_distance loss is usable for vector-valued predictions where the correlation against some target vectors is being optimized. Additional information can also be found in the node of similar name ending in Loss. You may also specify an entirely custom loss via a graph, in which case the loss function may be set to 'custom' for clarity. To specify a custom loss, wire a graph into the "loss" port that has a 'preds' and 'targs' placeholder node and which emits a single scalar loss for the given batch, which in the simplest case is one of the existing Loss nodes followed by a Sum node (that drops the summed-over axis). Overriding the loss allows you to use, for example, class-weighted losses or instance-weighted losses (for the latter your loss graph needs to accept a third positional argument that receives the per-instance weights).- verbose name: Loss Function
- default value: sigmoid_binary_crossentropy
- port type: ComboPort
- value type: str (can be None)
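For reference, the two default classification losses map logits to per-instance losses roughly as in the following numpy sketch (illustrative only, not the node's implementation):

    import numpy as np

    def sigmoid_binary_crossentropy(logits, labels):
        # logits: (N,) from a 1-unit Dense layer; labels: (N,) in {0, 1}
        # numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
        return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

    def softmax_crossentropy(logits, labels):
        # logits: (N, C) from a C-unit Dense layer; labels: (N,) integer classes
        z = logits - logits.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels]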
-
optimizer
Optimizer to use. Optimizers differ in a number of characteristics, most importantly in the speed of convergence and the tendency to either diverge ("blow up", resulting in NaNs or infinites) or to fail to learn anything. Other characteristics are how much memory they use, and to some extent how likely they are to overfit the training data. Good starting choices are adam and adamw, or one of adabelief, amsgrad, fromage or yogi if adam fails (e.g., diverges or fails to learn), but all optimizers have scenarios where they outperform all others. More advanced users may experiment with rmsprop, novograd, sgd, or noisysgd. For very large batch sizes, lamb and lars are well-adapted. Also adafactor and sm3 can be useful for very large models to conserve memory. For information on the available optimizers, see the individual Step nodes in NeuroPype's deep learning category. Optimizers can be configured with parameters here, by appending them in parentheses after the name and separated by commas, in the order of appearance in the respective step node. For example, 'adam(0.001, 0.9, 0.999)' specifies the Adam optimizer with a learning rate of 0.001, beta1 of 0.9, and beta2 of 0.999. Note that in most optimizers the learning rate is potentially problem-specific, and may have to be tuned. Beyond basic scenarios and to achieve maximum accuracy, the learning rate would typically be decayed over time following a schedule to improve robustness, convergence and final-solution accuracy. To specify such a learning rate schedule you need to use a custom optimizer. This is done by choosing the "custom" optimizer setting and then wiring one of the Step nodes corresponding to your optimizer into the "optstep" port (using the step node's "this" output port). Finally you wire one of the Schedule nodes (e.g., "Linear Warmup Exponential Decay Schedule" (WarmupExponentialDecaySchedule)) into the step node's 'learning_rate_schedule' port. This also allows you to override parameters of the optimizer in your pipeline graph rather than in textual form. When doing so note that not all Step nodes are end-to-end optimizers - if in doubt, go by the optimizers in this listing or read the optimizer's documentation text. It is also possible to use a fully custom optimizer using either the ChainedStep node or the CustomStep node, which gives fine-grained control over how different layers are updated, weight-decay regularization and constraints, and other aspects of the optimization process. (see documentation of these two nodes for more details on their use)- verbose name: Optimizer
- default value: adam(0.001)
- port type: ComboPort
- value type: str (can be None)
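To illustrate how a spelling such as 'adam(0.001, 0.9, 0.999)' relates to the update rule, one Adam step can be sketched in numpy as below; this is illustrative only, and it folds the application of the update into the same function (in a Step-node pipeline, the update would instead be applied via the Add node or by the DeepModel itself):

    import numpy as np

    def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
        # m, v: running first/second moments of the gradient; t: 1-based step count
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)                  # bias correction
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v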
-
batch_size
Number of samples to process in each (mini-)batch. This is the number of instances that will be processed at a time by the network. The batch size is a tradeoff between computational speed (larger batches can be faster on sufficiently parallel hardware) vs stochastic noise in the gradient estimate (smaller batches are more noisy) and memory usage (large batches require more memory). Note that noisy gradients are not necessarily a bad thing, since they tend to act as a form of regularization, which may protect against overfitting. The batch size also interacts with the learning rate, so after changing one the other may have to be readjusted. For very large batch sizes (1000s), you may also need to select a different type of optimizer step, for example LAMB or LARS.- verbose name: Batch Size
- default value: 32
- port type: IntPort
- value type: int (can be None)
-
maxsteps
Maximum number of training steps to perform, measured in the given step unit (see stepunit). Must be specified. If set to 0, this will suppress training.- verbose name: Maxsteps
- default value: None
- port type: IntPort
- value type: int (can be None)
-
stepunit
Unit in which maxsteps is given. Important: the 'batches' and 'instances' units are the total number of steps to train, even if training on streaming data or on successively provided offline datasets, or if the data has more instances than used as per maxsteps. In contrast, the 'epochs' unit always refers to the training data on the current invocation of the node; that is, if epochs is set to N, then the node will make exactly N passes through each given training data packet, whether that is the first and only packet or whether this is the m'th such packet received during, e.g., training on streaming data or on successive offline data (as governed by the train_on setting).- verbose name: Maxsteps Unit
- default value: epochs
- port type: EnumPort
- value type: str (can be None)
-
shuffle
Whether and when to shuffle the training data. Shuffling is necessary when training on time-series data (or other serially correlated data) to implement stochastic gradient descent. If set to "none", no shuffling is performed, which usually requires that your training data is preshuffled. If set to "once", then the training dataset is shuffled once at the start of training. If set to "per-epoch", then the training data is shuffled at the start of each epoch (this behaves the same way regardless of whether your maxsteps is counted in terms of epochs or iterations). This option is ignored and may be set to 'custom' when a custom trainfeed graph is specified, which takes over shuffling.- verbose name: Shuffle
- default value: per-epoch
- port type: ComboPort
- value type: str (can be None)
-
train_on
Update the model on the specified data. Note that generally the model will output predictions on any data that it receives whether it is training on it or not. In the "initial offline" mode - and only if the model is not already trained (i.e., if no previously learned model was loaded in) - the model is trained on the first non-streaming packet that it receives, which should thus be the training set. This can be used when the first packet is the training data and any subsequent packets are test data only, OR when a pretrained model is loaded that should not be further updated. In the "successive offline" mode, the model is trained on any non-streaming packet that it receives, whether it is already pretrained or not (in such case it will be further fine-tuned given the new training data). Any streaming data is taken as test-only data; this is a typical scenario for real-time processing when the model is first trained or fine-tuned on some pre-recorded data, but it can also be used to just train a model on multiple successive datasets. If max_steps is given in epochs, it means that each successive dataset will be passed through the model for this many epochs. The last mode, "offline and streaming" will train on any data, whether it is offline or streaming. This can be used for either real-time training or fine-tuning, i.e., while data is being collected, or for "out-of-core" training on a very large dataset that would not fit in memory if loaded up-front, and which is therefore streamed through the model in chunks. The parameter can also be changed at runtime, to switch from one mode to another.- verbose name: Train On
- default value: initial offline
- port type: EnumPort
- value type: str (can be None)
-
dont_reset_model
Do not reset the model when the input data graph is changed. Normally, when certain parameters of preceding nodes are being changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.- verbose name: Do Not Reset Model
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
validation_split
Fraction of the training data to use for validation. Setting this to a number between 0 and 1 will split off the specified fraction from the training data (always from the end of the data) and use it as validation data. This data will not be used for training but to support the early-stopping technique, wherein training stops when the validation performance no longer improves. This can also be omitted (set to zero) to disable validation and early stopping, in which case the maximum number of steps needs to be set more carefully. As an alternative to using this parameter, one may also provide validation data along with training data through the data input.- verbose name: Validation Split
- default value: 0.2
- port type: FloatPort
- value type: float (can be None)
-
eval_metric
Metric to use for reporting progress. This can be either a built-in validation metric or a fully custom validator step (which must be wired into the valstep port). The metric is evaluated both on the training data and on the validation data if any was passed in. Note that ONLY the validation metric will track generalization performance, while the training-set metric mainly serves to spot potential optimizer pathologies or model inadequacy. One training-set diagnostic is whether the model learned anything at all from the training data vs flatlining at chance performance level; another is that it provides an upper bound on the best-case generalization error under ideal circumstances, where an unsatisfactory score would indicate that the model capacity or architecture or learning process failed to adequately model the training data. Generally some of the metrics are suitable for classification problems (accuracy, precision/recall/f1, and roc metrics) while others are for use in regression problems (r2, explained_variance, max_error and the neg_error metrics). These metrics are all available in the scikit-learn package, and the documentation for them can be found on the web. In quick summary, balanced_accuracy is a good choice for generic classification problems on balanced or unbalanced data, precision/recall are sometimes asked for in detection problems of a relatively rare class (e.g., clinical diagnosis), and the roc metrics are useful both for measuring performance on unbalanced data (although on extremely unbalanced data, the measures can break down), and for measuring the performance in detection problems across a range of decision thresholds, e.g., when the relative importance of type-1 and type-2 errors is not fixed at the time the ML work is done. The F1 scores are a specific blend of precision and recall across classes that can be used as a general-purpose (albeit perhaps not very interpretable) performance measure on some signal detection tasks. For regression, the r2 score is a fair choice for generic signal regression and is relative to the scale of the target variable; explained variance is similar but specifically does not score systematic offset (i.e., bias) in the predictions (usually a deficiency of the metric but sometimes useful, e.g. while working on a model that will still receive a bias correction later). The neg_error metrics are the most common tools in regression problems; of those, the mean_absolute and median_absolute variants are perhaps the most interpretable as the errors are directly in the same units as the target variable (e.g., meters), and the latter is robust if there are outliers in either the targets or predictions. The mean_squared error is the default choice under a Gaussian noise assumption, and the mean_squared_log error is useful if the data are log-normally distributed (e.g., data from exponential growth processes). Somewhat similarly, the neg_mean_absolute_percentage_error is also adequate for measuring performance on data where the target variable ranges across several orders of magnitude in scale, and the error is relative to each individual target value (but note that the score is not measured in percent but is a fraction). Note that in evaluation scores, higher values are better (unlike loss measures). Also unlike the loss measures, most of the metrics are not differentiable, so they cannot be directly plugged into the optimizer.
However, model hyper-parameters can be optimized with respect to these metrics when using the Parameter Optimization node, at significant computational overhead (requiring many model training runs for different hyper parameter values).- verbose name: Evaluation Metric
- default value: balanced_accuracy
- port type: ComboPort
- value type: str (can be None)
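For orientation, the default balanced_accuracy metric is simply the mean of the per-class recalls; a minimal numpy sketch (matching the spirit of scikit-learn's balanced_accuracy_score) is:

    import numpy as np

    def balanced_accuracy(preds, targs):
        # preds, targs: integer class labels per instance
        classes = np.unique(targs)
        recalls = [np.mean(preds[targs == c] == c) for c in classes]
        return float(np.mean(recalls))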
-
verbosity
Verbosity level. Higher numbers will produce more extensive diagnostic output.- verbose name: Verbosity Level
- default value: 1
- port type: EnumPort
- value type: str (can be None)
-
random_seed
Seed for any pseudo-random choices during training. This can be either a splittable seed as generated by Create Random Seed or a plain integer seed. If left unspecified, this will resolve to either the current 0-based "task number" if used in a context where such a thing is defined (for example cross-validation or a parallel for loop), or 12345 as a generic default.- verbose name: Random Seed
- default value: None
- port type: Port
- value type: AnyNumeric (can be None)
-
compile_model
When to compile the model used for training. This can be set to 'never' to run the model in what is known as "eager mode", which is slow but allows for potentially easier debugging. The difference between always and cached is that the latter is a hint to attempt to save the compiled model to disk and reuse it later when possible. However, whether and in what situations that is actually done depends on DeepModel and the underlying implementation.- verbose name: Compile Model
- default value: cached
- port type: EnumPort
- value type: str (can be None)
-
skip_incomplete_batch
Whether to skip the last batch encountered during training if it is smaller than the batch size. This is applicable when the dataset size is not a multiple of the batch size. Processing this batch will require a separate compilation for the odd batch size, but leaves no data samples unprocessed. When training for a typical number of epochs and trials, the trials in this batch tend to be insignificant to the overall training progress, which typically involves thousands of batches, so this option should generally be left enabled.- verbose name: Skip Incomplete Batch
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
retain_model
What model to retain for output. This can be either the model with the best loss or validation score, or the model that corresponds to the last training step. If no validation data are provided, this will always be the last model.- verbose name: Retain Model
- default value: lowest-loss
- port type: EnumPort
- value type: str (can be None)
-
canonicalize_output_axes
Whether to canonicalize the output axes of the model to match the expected output axes of other machine-learning nodes. This can be turned off if your model emits a handcrafted feature or statistic axis to describe its predictions that you'd like to retain. Note though that some downstream nodes, like MeasureLoss, might not work as expected.- verbose name: Canonicalize Output Axes
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
no_compile_in_debug
Do not compile the model when running in debug mode.- verbose name: No Compile In Debug
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
conserve_memory_every
Conserve memory by clearing caches etc. every this many steps (in the respective stepunit). This is a hint to the system that may not actually be honored, depending on the implementation and settings such as whether the model is compiled or not. The default is to NOT clear any caches. A good choice is to clear caches every 3-5 epochs for heavy models. For extremely memory-constrained setups, this can also be set to -1, in which case the cache will be cleared before training a new epoch and before predictions for a new epoch.- verbose name: Conserve Memory Every
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
DenseLayer
Dense neural network layer.
This is a fully-connected layer, where each input feature is connected to each output feature. Optionally includes a bias term. As with all built-in layers, you can override the initializer for the weights and/or the bias, which default to Lecun Normal and zero, respectively.
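For orientation, a Dense layer amounts to a single matrix multiplication plus an optional bias; a minimal numpy sketch (using an untruncated Lecun-normal draw with stddev sqrt(1/fan_in); the actual initializer may use a truncated variant) is:

    import numpy as np

    def dense_forward(x, units, seed=0):
        # x: (batch, in_features) -> (batch, units)
        fan_in = x.shape[-1]
        rng = np.random.default_rng(seed)
        W = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, units))  # lecun_normal-style
        b = np.zeros(units)                                               # zero-initialized bias
        return x @ W + b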
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
units
Number of units (i.e., output features).- verbose name: Units
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
with_bias
Whether to include a bias term.- verbose name: With Bias
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
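For orientation, the variance-scaling family mentioned above can be sketched as follows for a 2-D weight matrix (untruncated normal variant for brevity; the actual initializers may draw from a truncated normal or uniform distribution): lecun corresponds to scale 1 with fan_in, he/kaiming to scale 2 with fan_in, and glorot/xavier to scale 1 with fan_avg.

    import numpy as np

    def variance_scaling(shape, scale=1.0, mode="fan_in", seed=0):
        fan_in, fan_out = shape[0], shape[-1]
        fan = {"fan_in": fan_in, "fan_out": fan_out,
               "fan_avg": (fan_in + fan_out) / 2.0}[mode]
        std = np.sqrt(scale / fan)
        return np.random.default_rng(seed).normal(0.0, std, size=shape)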
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: dense
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
DepthwiseConvolutionLayer
A 1/2/3/N-D depthwise convolution layer.
See the "Convolution Layer" node for a general overview of convolutions. In depthwise convolution, and in contrast to regular ("full") convolution, instead of learning a kernel that goes across all input features, the node learns N smaller kernels, each of which sees only one of the input features. As a result, this variant does not learn any cross-feature patterns, but it saves both computation and parameters when it is applicable, and can thus reduce the chance of overfitting. A notable difference to regular convolution is that the output feature count is not given as a total but as a multiplier of the input features, since the node can also learn multiple kernel per input feature, each resulting in a separate output feature. Depthwise convolution can also be used to implement other kinds of factorized processing across features depending on the application.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
mask
Mask to apply to the weights.- verbose name: Mask
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
sweep_axes
List and order of axes over which the convolution filter kernel is swept. If the input data are packets, this determines the order of these axes in the output data, and the order of the axes in the kernel (for plain array inputs, see end of tooltip). A kernel is a learned array that is shifted over all possible positions in the data (optionally with step size in each dimension, and optionally going past the edges of the data by half the kernel size if padding=same). For each position, the kernel is multiplied by the data in the region covered by the kernel and the resulting (elementwise) product is integrated (summed) to produce a single output score (a measure of match between the kernel and the data in that region). If the input data has an extra feature axis, the kernel will usually have an implicit extra axis to hold weights for each input feature. If the data has an instance axis, each instance will be processed separately (using the same kernels). If the input data are plain arrays, this merely determines the number of spatial axes and the names are just mnemonic and not otherwise used. This can alternatively be given as just a number to set the number of spatial dimensions, corresponding to the N in N-D convolution; for packet data, this will resolve to the last N axes in the data that are neither feature nor instance axes. This parameter is not limited to the predefined options.- verbose name: Axes To Sweep Kernel Over (Convolve)
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
output_features
Number of filter kernels to learn for each input feature. This is a multiplier applied to the number of input features rather than the total number of output features. This value generally determines the length of the feature axis in the output data (each kernel yields one output feature per input feature, representing the raw feature detection score produced by that kernel). In classic deep learning, this is also called the number of output channels -- analogous to RGB color channels in a raw image, or generally meant to be an unspecific feature axis in a data array (not to be confused with spatial channels in multi-channel time series, which are more commonly treated like the vertical axis in 2d image data).- verbose name: Filters Per Input Feature
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
kernel_shape
Shape of the convolution filter kernel. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the kernel is the same size along all of the given spatial dimensions. Note: if you make the kernel as large as the data along some axis, there is only a single valid position for the kernel along that axis (if padding=valid), and consequently the result is an inner product between the data and the kernel, or a matrix multiplication when more kernels are learned. Conversely, if you give the kernel a shape of 1 along an axis, the result is equivalent to processing each element along that axis separately using the same kernel. The latter is the same as not listing the axis in sweep axes, except that the output axis order can be controlled when specifying a 1-sized axis in sweep_axes. Which is more efficient depends on the implementation.- verbose name: Kernel Shape
- default value: [3]
- port type: ListPort
- value type: list (can be None)
-
strides
Step size with which the kernel is swept over the data. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the same step size is used along all of the specified spatial dimensions. A step size greater than 1 means that the kernel will be shifted by this amount between successive positions; as a result, the amount of compute is lower by this factor, and the output data along this axis will also be shorter by this factor (matching the number of positions at which the kernel is applied).- verbose name: Step Size (Strides)
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
padding
Padding strategy for the data. This can be either 'valid' or 'same', or a custom list of padding amounts. 'valid' means no padding (i.e., the kernel will not run off the edges of the data, but the output data will be shorter along each axis according to the number of valid positions of the kernel along that axis), and 'same' means that the output will have the same shape as the input (aside from dilation and striding). Can be customized by giving a list of [(low, high), ...] pairs, where low is the padding to apply before the data along each axis, and high is the padding to apply after the data along each axis. low and high can also be negative to trim the data instead of padding. If a single [(low, high)] pair is given, it is applied to all axes.- verbose name: Padding
- default value: valid
- port type: ComboPort
- value type: str (can be None)
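As a rule of thumb, the output length along one swept axis follows the usual convolution arithmetic, sketched below (assuming 'same' pads by a total of kernel-1 elements, which yields ceil(in_len/stride) kernel positions):

    def conv_output_length(in_len, kernel, stride=1, padding="valid"):
        pad = kernel - 1 if padding == "same" else 0   # total padding across both ends
        return (in_len + pad - kernel) // stride + 1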
-
with_bias
Whether to include a bias term. If given, then for each output feature, a bias term is learned and added to the output of the convolution. This increases the flexibility of the learned model, but note that the result is no longer strictly equivalent to e.g., a learned FIR filter applied to time-series data or a learned spatial filter / matrix multiplication applied to spatial data.- verbose name: Learn Bias Term(S)
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: depthwise_conv
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
DepthwiseSeparableConvolutionLayer
A 1/2/3/N-D depthwise separable convolution layer.
See the "Convolution Layer" node for a general overview of convolutions and the "Depthwise Convolution Layer" node for a description of the depthwise variant. The depthwise separable variant is a further restriction of depthwise and is a highly factorized model. Besides being limited to 2 swept axes, a kernel is not a full learnable matrix but can be viewed as being generated as a product of a learnable horizontal and a learnable vertical weight vector, the result of which is applied like an (implicitly defined) highly redundant matrix (of rank 1). Note a potential point of confusion: Keras has a layer named Depthwise Separable Convolution, which is not a factorized convolution but a regular Depthwise Convolution followed by a 1x1 Convolution.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
mask
Mask to apply to the weights.- verbose name: Mask
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
sweep_axes
List and order of axes over which the convolution filter kernel is swept. If the input data are packets, this determines the order of these axes in the output data, and the order of the axes in the kernel (for plain array inputs, see end of tooltip). A kernel is a learned array that is shifted over all possible positions in the data (optionally with step size in each dimension, and optionally going past the edges of the data by half the kernel size if padding=same). For each position, the kernel is multiplied by the data in the region covered by the kernel and the resulting (elementwise) product is integrated (summed) to produce a single output score (a measure of match between the kernel and the data in that region). If the input data has an extra feature axis, the kernel will usually have an implicit extra axis to hold weights for each input feature. If the data has an instance axis, each instance will be processed separately (using the same kernels). If the input data are plain arrays, this merely determines the number of spatial axes and the names are just mnemonic and not otherwise used. This can alternatively be given as just a number to set the number of spatial dimensions, corresponding to the N in N-D convolution; for packet data, this will resolve to the last N axes in the data that are neither feature nor instance axes. This parameter is not limited to the predefined options.- verbose name: Axes To Sweep Kernel Over (Convolve)
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
output_features
Number of filter kernels to learn for each input feature. This is a multiplier applied to the number of input features rather than the total number of output features. This value generally determines the length of the feature axis in the output data (each kernel yields one output feature per input feature, representing the raw feature detection score produced by that kernel). In classic deep learning, this is also called the number of output channels -- analogous to RGB color channels in a raw image, or generally meant to be an unspecific feature axis in a data array (not to be confused with spatial channels in multi-channel time series, which are more commonly treated like the vertical axis in 2d image data).- verbose name: Filters Per Input Feature
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
kernel_shape
Shape of the convolution filter kernel. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the kernel is the same size along all of the given spatial dimensions. Note: if you make the kernel as large as the data along some axis, there is only a single valid position for the kernel along that axis (if padding=valid), and consequently the result is an inner product between the data and the kernel, or a matrix multiplication when more kernels are learned. Conversely, if you give the kernel a shape of 1 along an axis, the result is equivalent to processing each element along that axis separately using the same kernel. The latter is the same as not listing the axis in sweep axes, except that the output axis order can be controlled when specifying a 1-sized axis in sweep_axes. Which is more efficient depends on the implementation.- verbose name: Kernel Shape
- default value: [3]
- port type: ListPort
- value type: list (can be None)
-
strides
Step size with which the kernel is swept over the data. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the same step size is used along all of the specified spatial dimensions. A step size greater than 1 means that the kernel will be shifted by this amount between successive positions; as a result, the amount of compute is lower by this factor, and the output data along this axis will also be shorter by this factor (matching the number of positions at which the kernel is applied).- verbose name: Step Size (Strides)
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
padding
Padding strategy for the data. This can be either 'valid' or 'same', or a custom list of padding amounts. 'valid' means no padding (i.e., the kernel will not run off the edges of the data, but the output data will be shorter along each axis according to the number of valid positions of the kernel along that axis), and 'same' means that the output will have the same shape as the input (aside from dilation and striding). Can be customized by giving a list of [(low, high), ...] pairs, where low is the padding to apply before the data along each axis, and high is the padding to apply after the data along each axis. low and high can also be negative to trim the data instead of padding. If a single [(low, high)] pair is given, it is applied to all axes.- verbose name: Padding
- default value: valid
- port type: ComboPort
- value type: str (can be None)
-
with_bias
Whether to include a bias term. If given, then for each output feature, a bias term is learned and added to the output of the convolution. This increases the flexibility of the learned model, but note that the result is no longer strictly equivalent to e.g., a learned FIR filter applied to time-series data or a learned spatial filter / matrix multiplication applied to spatial data.- verbose name: Learn Bias Term(S)
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: separable_conv
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
Dropout
Apply dropout regularization to the data.
This will randomly drop out a given element with probability given by the dropout rate. As a result, the downstream nodes will see a different input each time, and the network will be forced to learn robust representations that are not overly dependent on any one element.
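As a minimal numpy sketch of the common (inverted) dropout formulation, where surviving elements are rescaled by 1/(1-rate) during training so that the expected activation is unchanged (the rescaling is an assumption about this node's exact behavior):

    import numpy as np

    def dropout(x, rate=0.5, is_training=True, seed=0):
        if not is_training or rate == 0.0:
            return x                                   # pass through at prediction time
        keep = np.random.default_rng(seed).random(x.shape) >= rate
        return np.where(keep, x / (1.0 - rate), 0.0)   # rescale survivors, zero the rest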
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
random
Random number key to use.- verbose name: Random
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
rate
Dropout rate. The probability of dropping out a given element.- verbose name: Dropout Rate
- default value: 0.5
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
EmbeddingLayer
A trainable layer for mapping categorical (integer) data to low-dimensional vectors.
This is used because neural networks are not inherently well suited to mapping categorical data to continuous representations; for small to mid-sized categorical data, one typically converts such data to a one-hot encoding, which can then be used with a Dense layer. However, with many categories this becomes inefficient, and if the categories inherently share separable features, this node provides a scalable alternative that works for tens of thousands of categories (e.g., when the inputs are indices into a large tabulated vocabulary of words or other tokens, such as event markers) by mapping each index to a learned lower-dimensional feature vector. The maximum integer that can be encountered, as well as the dimensionality of the embedding vectors, are pre-specified as parameters. The node has a few options for performance tuning (op_precision and internal_mapping), and can also be initialized with a pretrained embedding matrix (w_pretrained) as an alternative to random initialization. In this case the input data may only contain one stream. For Packet data, the output of this node has the same axes as the input, except for an additional feature axis (of length embedding_features) at the end. Note that this node does not automatically work on event marker data in packets, since this node, like all other DNN nodes, ignores marker streams. Instead, you will have to define a fixed tabulated vocabulary of event markers and convert any markers you want to process into indices into this vocabulary, which would go into the data array of a regular non-marker stream (which can and most likely should still be indexed by an instance axis, etc.).
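Conceptually, the forward pass amounts to a table lookup into a trainable matrix; a minimal sketch (illustrative only; variable names are hypothetical):

    import jax.numpy as jnp
    from jax import random

    num_categories, embedding_features = 1000, 100
    # trainable embedding table; a pretrained matrix (cf. w_pretrained) could be used instead
    W = 0.01 * random.normal(random.PRNGKey(0), (num_categories, embedding_features))

    indices = jnp.array([[3, 17, 999], [0, 42, 7]])   # integer-coded inputs
    embedded = W[indices]                              # shape (2, 3, embedding_features)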
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_pretrained
Optionally a pretrained embedding matrix.- verbose name: W Pretrained
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
num_categories
Number of discrete categories. This is equivalent to the largest integer index that the node may receive plus 1. This is also known as the vocabulary size if the inputs are indices into a dictionary when processing text data.- verbose name: Num Categories
- default value: 1000
- port type: IntPort
- value type: int (can be None)
-
embedding_features
Number of features in the embedding.- verbose name: Embedding Features
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
internal_mapping
How the node internally implements the lookup operation. The external behavior of the node is the same in either case, but performance characteristics may differ depending on the implementation and the other settings. The default is currently indices, but this may change in the future.- verbose name: Internal Mapping
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: truncated_normal
- port type: ComboPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: embedding
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ExponentialDecaySchedule
An exponential decay (or growth) schedule.
This schedule holds the parameter at the initial value until the current step count reaches the value set via decay_begin, and then applies exponential decay until the given final value is reached, at which point the parameter is held at the final value. The falloff is further parameterized by the decay_steps parameter, which determines over how many steps the parameter decays by the specified decay rate (e.g., if this is set to 100 and the decay rate is 0.9, it will take 100 steps (after decay_begin) for the parameter to reach 0.9 * initial_value, and another 100 steps to reach 0.81 * initial_value, and so forth). The node can also operate in "staircase" mode, where the transition is not smooth but is held constant for each block of decay_steps steps, and then changes abruptly by the given decay rate. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
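The resulting schedule can be summarized with the following sketch (assuming a decay rate below 1, so the final value acts as a lower bound; for a growth schedule it would clip from above):

    import numpy as np

    def exponential_decay(step, init_value, final_value, decay_rate,
                          decay_begin=0, decay_steps=1, staircase=False):
        t = max(step - decay_begin, 0) / decay_steps
        if staircase:
            t = np.floor(t)
        value = init_value * decay_rate ** t
        return max(value, final_value)

    [exponential_decay(s, 1.0, 0.1, 0.9, decay_steps=100) for s in (0, 100, 200)]
    # -> approximately [1.0, 0.9, 0.81]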
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. The parameter is held at this value until the current step count reaches the value set via decay_begin.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
final_value
Final parameter value. Once the schedule reaches this value, it will remain at this value for the remainder of the optimization process. (if the decay rate is < 1, this is effectively a lower bound on the parameter value, and if the decay rate is > 1, this is an upper bound)- verbose name: Final Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
decay_rate
Decay rate. The parameter value decays by this factor for every decay_steps. This can be between 0 and 1 for a regular decay schedule, or greater than 1 for an exponential growth schedule.- verbose name: Decay (Or Growth) Rate
- default value: 0.99
- port type: FloatPort
- value type: float
-
decay_begin
Step count at which to begin the transition from the initial value to the final value. The parameter is held at the initial value until this step count is reached.- verbose name: Decay Begin
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
decay_steps
The number of steps over which the parameter decays by decay_rate. The basic formula is value = initial_value * decay_rate ^ (count_since_decay_begin / decay_steps), followed by clipping according to the final_value.- verbose name: Decay Steps
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
staircase
If True, the parameter value is decayed in a staircase fashion, i.e., the parameter changes by exactly decay_rate every decay_steps steps. If False, the parameter value is decayed in a continuous fashion according to the formula given in the docs for decay_steps.- verbose name: Staircase
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to e.g., 1000 steps and then wire a formula that divides the number of steps done by some process by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
FromageStep
The Fromage (Frobenius Matched Gradient Descent) optimizer step.
Based on Bernstein et al, 2020, this optimizer requires little or no learning rate tuning or scheduling, and can work across a range of neural network topologies, including transformers and GANs, with the same setting. A minimal degree of adaptation, such as dividing the learning rate by 10 when the loss plateaus, can be helpful, however. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop.
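For orientation, the published per-layer rule can be paraphrased roughly as follows (a hedged sketch of the rule from the paper, not necessarily the node's exact implementation; the sketch combines the step and the weight update for readability, whereas the node itself only emits the update):

    import jax.numpy as jnp

    def fromage_apply(w, g, lr=0.01, min_norm=1e-6):
        # Rescale the gradient to the weight norm, then shrink to counteract norm growth.
        w_norm = jnp.maximum(jnp.linalg.norm(w), min_norm)
        g_norm = jnp.maximum(jnp.linalg.norm(g), min_norm)
        return (w - lr * g * (w_norm / g_norm)) / jnp.sqrt(1.0 + lr ** 2)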
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate
Learning rate.- verbose name: Learning Rate
- default value: 0.01
- port type: FloatPort
- value type: float (can be None)
-
min_norm
Minimum gradient norm. This can be used to avoid dividing by zero when rescaling; small gradients are rescaled to at least this value.- verbose name: Min Norm
- default value: 1e-06
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
GatedRecurrentUnitLayer
A gated recurrent unit (GRU) recurrent core, based on Chung et al.
(2014). This is a recurrent layer that conceptually retains internal network activations across successive time steps when the node is used to process time series data (see below for more details). In contrast to LSTMs, GRUs have a simpler structure and their output size (length of the emitted feature axis) equals the number of hidden units. The node is highly competitive in practice in its ability to learn long-term dependencies, using a gating mechanism that controls how much of the previous hidden state and of the new input enters the updated state. Like all recurrent layers, the node does not usually store and carry over these activations itself, but either depends on a special loop node (the RecurrentLoop node, see docs for more details) to step across the time axis of some given data array, or requires that the user manually passes in and retrieves the carry state via the carry in/out port. Also like all recurrent nodes, this node will move the instance axis of the input first, optionally retain any axes listed in the parallel_axes port, and flatten any other axes into a single feature axis at the end. When managing the carry state manually, the state for this node can be obtained via get_initial() or be constructed as either an array or packet (depending on the data that it is used with), where the first axis is of size 2, the last axis is of size units, and the middle axis is sized to be the product of the instance axes and other parallel axes of the data. Also, as with most built-in layers, you can override the initializer for the input and hidden weights and/or the bias, which default to Lecun Normal and zero, respectively.
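For reference, a single GRU step in its standard form looks roughly as follows (a didactic sketch; gate ordering and the exact placement of the reset gate differ between implementations, so this should not be read as the node's precise weight layout):

    import jax
    import jax.numpy as jnp

    def gru_step(params, h, x):
        Wi, Wh, b = params                  # input weights, hidden weights, bias (3 gates each)
        gx = x @ Wi + b                     # shape (..., 3*units)
        gh = h @ Wh
        zx, rx, ax = jnp.split(gx, 3, axis=-1)
        zh, rh, ah = jnp.split(gh, 3, axis=-1)
        z = jax.nn.sigmoid(zx + zh)         # update gate
        r = jax.nn.sigmoid(rx + rh)         # reset gate
        a = jnp.tanh(ax + r * ah)           # candidate activation
        return (1.0 - z) * h + z * a        # new carry / output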
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process at current time step.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
carry
Carried-over activations from previous time step.- verbose name: Carry
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
wi_init
Initializer for the input weights.- verbose name: Wi Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
wh_init
Initializer for the hidden weights.- verbose name: Wh Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
wi_prior
Optional prior distribution for the input weights.- verbose name: Wi Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
wh_prior
Optional prior distribution for the hidden weights.- verbose name: Wh Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
units
Number of hidden units.- verbose name: Hidden Units
- default value: 256
- port type: IntPort
- value type: int (can be None)
-
wi_initializer
Choice of input weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Input Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
wh_initializer
Choice of hidden weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Hidden Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: gru
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
parallel_axes
Optionally an axis or comma-separated list of axes that shall be processed in parallel by the layer, using the same learned weights. This is useful if you have multiple parallel processes that you wish to process separately, but assume that they are all ultimately governed by the same rules, so that a single kernel can be learned for all of them. Note that for packet data the instance axis (if present), and for plain-array data the first axis, is always treated as a parallel axis, so you don't need to list it here.- verbose name: Parallel Axes
- default value:
- port type: ComboPort
- value type: str (can be None)
GradientClippingStep
Chainable step that clips incoming gradients based on their norm, ensuring that the gradient norm does not exceed a provided threshold.
This implements several variants, including elementwise clipping, layerwise clipping, global clipping (as in Pascanu et al, 2012), and adaptive clipping (relative to the prior parameter norm). The adaptive clipping follows Brock, Smith, De, and Simonyan (2021), "High-Performance Large-Scale Image Recognition Without Normalization". Unlike the end-to-end steps (named after specific published algorithms), this is a chainable step (to be used with the ChainedStep node) that takes in a gradient and outputs a modified gradient, and it would usually be combined with other steps (like scaling by the learning rate) to yield a full optimizer step.
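As an illustration, the global variant amounts to rescaling the whole gradient structure so that its overall norm does not exceed the threshold (a minimal sketch; the node's other variants operate per weight or per layer instead):

    import jax
    import jax.numpy as jnp

    def clip_by_global_norm(grads, threshold=1.0):
        leaves = jax.tree_util.tree_leaves(grads)
        global_norm = jnp.sqrt(sum(jnp.sum(g ** 2) for g in leaves))
        scale = jnp.minimum(1.0, threshold / (global_norm + 1e-12))
        return jax.tree_util.tree_map(lambda g: g * scale, grads)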
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
variant
Variant of gradient clipping to apply. Per-weight means that each individual weight update is limited to +/- threshold, per-layer means that the gradient for each parameter vector/matrix is scaled such that its norm is at most threshold, and global means that the gradient for the entire parameter set is scaled such that its norm is at most threshold. Per-weight-relative means that each individual weight update is limited to +/- threshold times the norm of the corresponding parameter before the update (requires prior parameter values to be provided to the StepUpdate node).- verbose name: Variant
- default value: per-parameter
- port type: EnumPort
- value type: str (can be None)
-
threshold
Threshold value. A typical value is 1.0, but depending on the network and data, other values may be explored.- verbose name: Threshold
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value to prevent clipping of zero-initialized parameters. Only used for per-weight-relative.- verbose name: Epsilon (If Per-Weight-Relative)
- default value: 0.001
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
GroupNorm
Apply group normalization to the given data.
This splits the feature axis into groups, and z-scores the data in each group separately, computing statistics typically across the feature axis within the group and any spatial axes. This can be a useful alternative to other norms such as batch norm, layer norm, or instance norm, when there are enough features to split into groups and ideally when those features have some groupwise structure (for example outputs of a grouped convolution). The normalization is applied per instance and no cross-instance statistics are taken. Like most normalizations, group normalization typically includes a learned scale and bias parameter, whose shape (and thus dimensionality) can be configured; these can also be optionally overridden with externally generated values. If packet data is given, this node ensures that the instance axes come first and the feature axes come last and are flattened into a single feature axis.
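The grouping can be pictured as follows for channels-last array data (an illustrative sketch only; the learned scale and bias that would normally follow are omitted):

    import jax.numpy as jnp

    def group_norm(x, num_groups=32, eps=1e-5):
        # x has shape (batch, *spatial, features); features must be divisible by num_groups
        *lead, features = x.shape
        g = x.reshape(*lead, num_groups, features // num_groups)
        # statistics per instance and per group: over spatial axes and within-group features
        axes = tuple(range(1, g.ndim - 2)) + (g.ndim - 1,)
        mean = g.mean(axis=axes, keepdims=True)
        var = g.var(axis=axes, keepdims=True)
        return ((g - mean) / jnp.sqrt(var + eps)).reshape(x.shape)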
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
scale_init
Initializer for the trainable scale.- verbose name: Scale Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
bias_init
Initializer for the trainable bias.- verbose name: Bias Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scale_prior
Optional prior distribution for the scale.- verbose name: Scale Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
bias_prior
Optional prior distribution for the bias.- verbose name: Bias Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
num_groups
Number of feature groups to use for the normalization. If this is set to 1, this becomes nearly equivalent to layer normalization. If this is set to the number of features, this becomes equivalent to instance normalization.- verbose name: Number Of Feature Groups
- default value: 32
- port type: IntPort
- value type: int (can be None)
-
epsilon
Small value to add to the variance to avoid division by zero.- verbose name: Epsilon
- default value: 1e-05
- port type: FloatPort
- value type: float (can be None)
-
learnable_scale
Whether to learn a trainable scale parameter. Normalizations typically include such a parameter in order to drive the subsequent activation function in a regime that is desirable for downstream computations (e.g., saturating or linear). Note the shape (and thus dimensionality) of the learned parameter is governed by the param_axes (learn scale/bias across axes) parameter.- verbose name: Learnable Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
learnable_bias
Whether to learn a trainable bias parameter. See the learnable scale for more details.- verbose name: Learnable Bias
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
scale_initializer
Choice of scale initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Scale Initializer
- default value: ones
- port type: ComboPort
- value type: str (can be None)
-
bias_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of the trainable parameters.- verbose name: Layer Name
- default value: groupnorm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
IdentityInitializer
An initializer that initializes the array to an identity matrix or stacks thereof (where the last two dimensions are the identity matrix).
The array must have at least two dimensions.
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
scale
Scale multiplier to apply to the identity matrix.- verbose name: Scale
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
InstanceNorm
Apply instance normalization to the given data.
This z-scores data across the spatial (non-instance/feature) dimensions of a given instance only. This is like batch norm without going over the batch dimension, and is suitable for data with large spatial dimensions but possibly small batch sizes; see also layer and group norm for variants that offer more control. Like most normalizations, instance normalization typically includes a learned scale and bias parameter per feature. These can also be optionally overridden with externally generated values. If packet data is given, this node ensures that the instance axes come first and the feature axes come last.
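In array terms, this reduces to z-scoring over the spatial axes only (a minimal channels-last sketch, with the optional per-feature scale/bias omitted):

    import jax.numpy as jnp

    def instance_norm(x, eps=1e-5):
        # x has shape (batch, *spatial, features); normalize over the spatial axes only
        axes = tuple(range(1, x.ndim - 1))
        mean = x.mean(axis=axes, keepdims=True)
        var = x.var(axis=axes, keepdims=True)
        return (x - mean) / jnp.sqrt(var + eps)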
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
scale_init
Initializer for the trainable scale.- verbose name: Scale Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
bias_init
Initializer for the trainable bias.- verbose name: Bias Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scale_prior
Optional prior distribution for the scale.- verbose name: Scale Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
bias_prior
Optional prior distribution for the bias.- verbose name: Bias Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
epsilon
Small value to add to the variance to avoid division by zero.- verbose name: Epsilon
- default value: 1e-05
- port type: FloatPort
- value type: float (can be None)
-
learnable_scale
Whether to learn a trainable scale parameter. Normalizations typically include such a parameter in order to drive the subsequent activation function in a regime that is desirable for downstream computations (e.g., saturating or linear). Note the shape (and thus dimensionality) of the learned parameter is governed by the param_axes (learn scale/bias across axes) parameter.- verbose name: Learnable Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
learnable_bias
Whether to learn a trainable bias parameter. See the learnable scale for more details.- verbose name: Learnable Bias
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
scale_initializer
Choice of scale initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Scale Initializer
- default value: ones
- port type: ComboPort
- value type: str (can be None)
-
bias_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
fast_variance
If True, use a faster but less accurate variance calculation.- verbose name: Fast Variance
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
layername
Name of the layer. Used for naming of the trainable parameters.- verbose name: Layer Name
- default value: instancenorm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
LAMBStep
The LAMB optimizer step.
Based on You et al, 2019, LAMB is an optimizer that combines the versatility of Adam with the ability to operate on both small and very large batch sizes (e.g., 16k examples), using the strategy employed by LARS, which is a precursor to LAMB. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed.
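Roughly, LAMB computes an Adam-style direction per layer and then rescales it by a layerwise trust ratio (a hedged paraphrase of the published rule, not the node's exact implementation; t is the 1-based step count):

    import jax.numpy as jnp

    def lamb_layer_update(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.0):
        m = b1 * m + (1 - b1) * g                      # first-moment estimate
        v = b2 * v + (1 - b2) * g ** 2                 # second-moment estimate
        u = (m / (1 - b1 ** t)) / (jnp.sqrt(v / (1 - b2 ** t)) + eps) + wd * w
        w_norm, u_norm = jnp.linalg.norm(w), jnp.linalg.norm(u)
        trust = jnp.where((w_norm > 0) & (u_norm > 0), w_norm / u_norm, 1.0)
        return -lr * trust * u, m, v                   # update to be added to w, plus new state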
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
weight_decay_mask
Mask structure for the weight decay.- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-06
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
weight_decay
Strength of the weight decay. This is multiplied by the learning rate as in e.g., PyTorch and Optax, but differs from the paper, where it is only multiplied by the schedule multiplier but not the base learning rate.- verbose name: Optional Weight Decay
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
LARSStep
The LARS optimizer step.
Based on You et al, 2017, LARS is a layer-wise adaptive optimizer that enables the use of very large batch sizes (e.g., 16k examples) with SGD. This is a precursor to the more recent LAMB optimizer. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
weight_decay_mask
Mask structure for the weight decay.- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
trust_ratio_mask
Mask structure indicating where to apply the trust ratio step.- verbose name: Trust Ratio Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
weight_decay
Strength of the weight decay. This is multiplied by the learning rate as in e.g., PyTorch and Optax, but differs from the paper, where it is only multiplied by the schedule multiplier but not the base learning rate.- verbose name: Weight Decay
- default value: 0.0001
- port type: FloatPort
- value type: float (can be None)
-
trust_coefficient
Multiplier for the trust ratio.- verbose name: Trust Coefficient
- default value: 0.001
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Optional additive constant in the trust ratio denominator.- verbose name: Epsilon
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
momentum
Optional exponential decay rate for momentum.- verbose name: Optional Momentum
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
nesterov
Whether to use Nesterov acceleration.- verbose name: Use Nesterov Acceleration
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
LayerNorm
Apply layer normalization to the given data.
This z-scores data across the spatial and typically also the feature dimensions, but separately per instance (train or test example). Layer normalization can be the best choice when dealing with data that has a small batch size, or when used inside an RNN (both cases rendering the batch norm potentially inapplicable) and/or when no additional large spatial axes are present (rendering the instance norm inapplicable). Also consider the group norm for a variant that can partition the norm across feature groups, and the RMS norm for a variant that does not affect the mean of the data and is in that sense less computationally costly. Like most normalizations, layer normalization typically includes a learned scale and bias parameter, whose shape (and thus dimensionality) can be configured, and which in the case of layer norm varies between different NN libraries. These can also be optionally overridden with externally generated values. If packet data is given, this node ensures that the instance axes come first and the feature axes come last.
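With the default settings (statistics accumulated over all non-instance axes, scale/bias learned per feature), the computation can be sketched as follows (illustration only, not the node's code):

    import jax.numpy as jnp

    def layer_norm(x, scale, bias, eps=1e-5):
        # x has shape (batch, ..., features); scale/bias are shaped like the feature axis
        axes = tuple(range(1, x.ndim))                # all non-instance axes
        mean = x.mean(axis=axes, keepdims=True)
        var = x.var(axis=axes, keepdims=True)
        return (x - mean) / jnp.sqrt(var + eps) * scale + bias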
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
scale_init
Initializer for the trainable scale.- verbose name: Scale Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
bias_init
Initializer for the trainable bias.- verbose name: Bias Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scale_prior
Optional prior distribution for the scale.- verbose name: Scale Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
bias_prior
Optional prior distribution for the bias.- verbose name: Bias Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
axes
"Optional comma-separated list of axis names or indices over which to accumulate the normalization statistics. If unspecified, the statistics will be accumulated over all except the instance axes. This parameter is not limited to the predefined choices.- verbose name: Accumulate Across Axes
- default value: (non-instance)
- port type: ComboPort
- value type: str (can be None)
-
param_axes
List of axis names/indices across which to learn separate per-element scale and bias parameters. Since the layer norm as originally introduced does not discuss additional spatial (non-feature) axes, different NN libraries use different conventions for this parameter. Haiku and Sonnet use the feature axis (or last axis) by default, meaning that each feature is post-scaled independently as in the batch norm, but some other ML libraries set this to the same value as the axes parameter, which causes a separate scale/bias to be learned also across all entries of the spatial axes. Like axes, this parameter is not limited to the predefined choices.- verbose name: Learn Scale/bias Across Axes
- default value: feature
- port type: ComboPort
- value type: str (can be None)
-
epsilon
Small value to add to the variance to avoid division by zero.- verbose name: Epsilon
- default value: 1e-05
- port type: FloatPort
- value type: float (can be None)
-
learnable_scale
Whether to learn a trainable scale parameter. Normalizations typically include such a parameter in order to drive the subsequent activation function in a regime that is desirable for downstream computations (e.g., saturating or linear). Note the shape (and thus dimensionality) of the learned parameter is governed by the param_axes (learn scale/bias across axes) parameter.- verbose name: Learnable Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
learnable_bias
Whether to learn a trainable bias parameter. See the learnable scale for more details.- verbose name: Learnable Bias
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
scale_initializer
Choice of scale initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Scale Initializer
- default value: ones
- port type: ComboPort
- value type: str (can be None)
-
bias_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
fast_variance
If True, use a faster but less accurate variance calculation.- verbose name: Fast Variance
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
layername
Name of the layer. Used for naming of the trainable parameters.- verbose name: Layer Name
- default value: layernorm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
LinearOneCycleSchedule
A one-cycle linear ramp up/down parameter schedule.
This schedule is a linear ramp up to a peak value, followed by a linear ramp down to the initial value, and finally a linear ramp down to the final value, which is reached at the end of the schedule (transition_steps) and held thereafter. Only the peak value is specified directly, while the initial value (aka base value) is given as a ratio to the peak value, as is the final value. Also, the upslope and downslope durations are given as fractions of the total transition_steps. The duration of the remaining final slope is the remainder of the transition_steps after the upslope and downslope durations have been subtracted. This schedule is inspired by Smith and Topin's 2018 paper, "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates" (see URL). Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
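The shape of the schedule can be sketched as a piecewise-linear interpolation (an illustrative reading of the parameterization described above, not the node's exact code):

    import numpy as np

    def linear_onecycle(step, peak_value=1.0, transition_steps=100,
                        peak_base_ratio=25.0, peak_final_ratio=1e4,
                        upslope_fraction=0.3, downslope_fraction=0.55):
        base, final = peak_value / peak_base_ratio, peak_value / peak_final_ratio
        up_end = transition_steps * upslope_fraction
        down_end = up_end + transition_steps * downslope_fraction
        xp = [0.0, up_end, down_end, float(transition_steps)]
        fp = [base, peak_value, base, final]
        return float(np.interp(step, xp, fp))   # held at `final` beyond transition_steps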
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
peak_value
Parameter value at peak. This is the maximum value that the parameter will attain over the course of the schedule.- verbose name: Peak Value
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
transition_steps
Step count at which to end the transition from the initial value to the final value. The parameter is held at the final value after this step count is reached. This is the duration of the full scaling cycle.- verbose name: Transition Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
peak_base_ratio
Ratio of the peak value to the base (initial) value. The base value is also the value attained at the end of the downslope. Note that a final slope then reduces the parameter to the final value, which is usually considerably lower than the base.- verbose name: Peak Base Ratio
- default value: 25
- port type: FloatPort
- value type: float (can be None)
-
peak_final_ratio
Ratio of the peak to the final value at the end of the cycle. The parameter will be held at the final value after transition_steps have been reached.- verbose name: Peak Final Ratio
- default value: 10000.0
- port type: FloatPort
- value type: float (can be None)
-
upslope_fraction
Fraction of the transition_steps that will be used for the upslope. That is, the peak is reached after transition_steps * upslope_fraction steps.- verbose name: Upslope Step Fraction
- default value: 0.3
- port type: FloatPort
- value type: float (can be None)
-
downslope_fraction
Fraction of the transition_steps that will be used for the downslope back to the initial value. After this, there is a final slope going to the final value that continues over transition_steps * (1 - upslope_fraction - downslope_fraction) steps. Note this is parameterized slightly less confusingly than the underlying optax linear_onecycle_schedule function.- verbose name: Downslope Step Fraction
- default value: 0.55
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
LinearSchedule
A linear parameter schedule.
This schedule holds the parameter at the initial value until the current step count reaches the value set via transition_begin, and then uses linear interpolation to the final value over a number of steps set via transition_steps; after that, the parameter is held at the final value. The interpolation polynomial is evaluated over the range 0.0 (at the beginning) to 1.0 (at the end). This is a special case of the "Polynomial Schedule" node. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
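As a rough illustration (ordinary Python, assumed semantics rather than the node's implementation), the schedule value at a given step under the documented ports can be computed as follows:

```python
# Illustrative sketch of the documented LinearSchedule behavior.
def linear_schedule(step, init_value=1.0, final_value=0.0,
                    transition_begin=0, transition_steps=100):
    frac = (step - transition_begin) / transition_steps
    frac = min(max(frac, 0.0), 1.0)   # hold at init before, and at final after, the transition
    return init_value + (final_value - init_value) * frac
```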
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. The parameter is held at this value until the current step count reaches the value set via transition_begin.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
final_value
Final parameter value. This is the value at the end of the schedule.- verbose name: Final Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
transition_begin
Step count at which to begin the transition from the initial value to the final value. The parameter is held at the initial value until this step count is reached.- verbose name: Transition Begin
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
transition_steps
Step count at which to end the transition from the initial value to the final value. The parameter is held at the final value after this step count is reached.- verbose name: Transition Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
LongShortTermMemoryLayer
A long short-term memory (LSTM) recurrent layer, based on Hochreiter and Schmidhuber (1997).
This is a recurrent layer that conceptually retains internal network activations across successive time steps when the node is used to process time series data (see below for more details). In contrast to earlier recurrent cores, LSTMs have an improved ability to learn long-term dependencies using a gating mechanism that can selectively retain or forget past information. This implementation follows Zaremba et al. 2015, which is a minor modification that reduces forgetting during initial training. Like all recurrent layers, the node does not usually store and carry over these activations itself, but relies either on a special loop node (the RecurrentLoop node, see docs for more details) to step across the time axis of some given data array, or on the user manually passing in and retrieving the carry state via the carry in/out port. Also like all recurrent nodes, this node will move the instance axis of the input first, optionally retain any axes listed in the parallel_axes port, and flatten any other axes into a single feature axis at the end. When managing the carry state manually, the state for this node can be obtained via get_initial() or be constructed as either an array or packet (depending on the data that it is used with), where the first axis is of size 2, the last axis is of size units, and the middle axis is sized to be the product of the instance axes and other parallel axes of the data.
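For the plain-array case, the carry layout described above can be sketched as follows (NumPy, purely illustrative; the batch size and the zero initialization are assumptions of this example, not guaranteed behavior of the node):

```python
import numpy as np

# Hypothetical carry layout for manual state management (array case).
batch, units = 32, 256                  # product of instance/parallel axes, and hidden units
carry = np.zeros((2, batch, units))     # first axis of size 2: (hidden state, cell state)
hidden, cell = carry[0], carry[1]
```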
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process at current time step.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
carry
Carried-over activations from previous time step.- verbose name: Carry
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
units
Number of hidden units.- verbose name: Hidden Units
- default value: 256
- port type: IntPort
- value type: int (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: lstm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
parallel_axes
Optionally an axis or comma-separated list of axes that shall be processed in parallel by the layer, using the same learned weights. This is useful if you have multiple parallel processes that you wish to process separately but assume are all ultimately governed by the same rules; in that case, a single kernel can be learned for all of them. Note that for packet data the instance axis (if present), and for plain-array data the first axis, is always treated as a parallel axis, so you don't need to list it here.- verbose name: Parallel Axes
- default value:
- port type: ComboPort
- value type: str (can be None)
MixupAugmentation
Interpolate between training exemplars, including class labels.
This data augmentation interpolates linearly between pairs of training exemplars, where the interpolation parameter is drawn from a Beta distribution with the given mixup parameter. This is a general-purpose augmentation that can be used on any dataset and domain. Warning: if your data has more than 2 classes, the labels must be one-hot encoded. Like most augmentation nodes, this node does not by itself amplify the amount of data, which therefore has to be done beforehand using, for example, the RepeatAlongAxis node. Note that, unless you shuffle the data immediately following RepeatAlongAxis, the shuffle option in this node must remain enabled, even if your mini-batches prior to RepeatAlongAxis are already shuffled. When shuffling is enabled, you need to wire in a random seed (for example using the DrawRandomSeed node, see docs for more info) to ensure reproducibility.
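The core interpolation can be sketched as follows (NumPy, one-hot labels assumed, data assumed to be already shuffled and duplicated upstream; this only illustrates the mixup idea and is not the node's exact implementation):

```python
import numpy as np

# Hypothetical mixup sketch: blend each exemplar with its neighbor in the batch.
def mixup(x, y_onehot, rng, alpha=0.2):
    lam = rng.beta(alpha, alpha, size=(x.shape[0],))       # one blend factor per instance
    lam_x = lam.reshape((-1,) + (1,) * (x.ndim - 1))
    x2 = np.roll(x, 1, axis=0)                              # consecutive pairs (wrapped around)
    y2 = np.roll(y_onehot, 1, axis=0)
    x_mix = lam_x * x + (1 - lam_x) * x2
    y_mix = lam[:, None] * y_onehot + (1 - lam[:, None]) * y2
    return x_mix, y_mix

# Example usage with toy data:
# rng = np.random.default_rng(0)
# x_mix, y_mix = mixup(np.ones((8, 4)), np.eye(4)[[0, 1, 2, 3, 0, 1, 2, 3]], rng)
```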
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: Packet (can be None)
- data direction: INOUT
-
seed
Random seed for deterministic results.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
alpha
Mixup parameter. This parameter controls how far generated samples may deviate from the original data; the blend value is drawn from a Beta distribution with parameter (alpha, alpha). A value of 0 means no deviation from training exemplars, small positive values between 0.2 and 0.4 have been shown to work well in practice, and large positive values (up to infinity) can work on some datasets but generally tend to lead to underfitting.- verbose name: Mixup Parameter
- default value: 0.2
- port type: FloatPort
- value type: float (can be None)
-
shuffle
Whether to shuffle the data before applying the augmentation. This can be disabled if the data has already been shuffled. Note, however, that the common pattern of duplicating shuffled data and then augmenting via this node does NOT qualify as a proper shuffle since the MixUp operation acts on consecutive pairs of data (wrapped around at the end). If shuffling is enabled, a random seed must be provided.- verbose name: Shuffle
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
bypass
Whether to bypass the augmentation and pass the input data through unchanged.- verbose name: Bypass
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
MomentumStep
Chainable step that adds momentum based on one of several formulations, including classic momentum, Nesterov acceleration, and exponential moving average.
This can improve the convergence behavior of the optimizer, including the convergence rate (esp. when Nesterov acceleration is used) and the ability to escape saddle points.
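The three documented variants can be sketched as the following gradient transformation over a single parameter's gradient (plain Python; the names and details here are assumptions for illustration, not the node's actual code):

```python
# Hypothetical sketch of the momentum variants applied to one gradient array.
def momentum_step(grad, trace, decay=0.9, kind="momentum"):
    if kind == "ema":                          # exponential moving average of gradients
        trace = decay * trace + (1 - decay) * grad
        update = trace
    else:                                      # classic momentum accumulator
        trace = decay * trace + grad
        # Nesterov acceleration "looks ahead" by re-applying the decayed trace.
        update = decay * trace + grad if kind == "nesterov" else trace
    return update, trace                       # the update is applied to the weights elsewhere
```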
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
decay
Exponential decay rate for past values.- verbose name: Decay
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
type
Type of momentum to use. Momentum is classic momentum, Nesterov is Nesterov acceleration, and EMA is exponential moving average.- verbose name: Type
- default value: momentum
- port type: EnumPort
- value type: str (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
MultiHeadAttentionLayer
A multi-head attention layer.
This layer enables information retrieval using an array of query patterns that are matched against a memory of key patterns and their associated value patterns, using a "soft" (0 to 1) matching score. This can be interpreted as a content-based memory lookup, or as a "soft" dictionary lookup where keys and values are patterns, and where each query pattern is matched against all keys to retrieve (a blend of) the respectively matching values. What represents a key, value, or query in the input data is learned by the layer in the form of learnable projections of the input data into a key, value, or query space. A triplet of these projections is called an "attention head", and the layer is usually used with multiple heads (e.g., 8-32). Due to the learnable nature, all parts of the attention process are ultimately neurally driven and thus learned, including the input data to the layer. Attention is the core building block of the Transformer architecture, where it is combined with a few other layers (LayerNorm and Dense) and then repeated, but is not specific to it. The layer can either be used with three input packets (or arrays) wired into query, key, and value, respectively, or with a single packet/array wired into just the query port, which is then used for all three inputs (this is called self-attention). All three inputs are expected to have an axis along which the data is interpreted as a sequence of items across which attention acts (e.g., time for temporal data or space for image data), and another axis along which the data is interpreted as patterns. These axes generally need not have the same length among the three inputs, although key and value need to have the same sequence length. The inputs may have additional axes (e.g., an instance axis holding a batch of data), and the operation proceeds independently in parallel across these axes; however, the extra axes must be the same for all inputs. It is possible to omit either the key or value (in which case the other of the two will be used for both key and value); if key and value are omitted, the query data is used for all three (self-attention). Self-attention can be used as a drop-in replacement for either recurrent or convolutional layers for processing fixed-length or variable-length sequences. For data with detailed time structure, the input data is often augmented with synthetic features representing a "positional" encoding, e.g., sine/cosine waves of different frequencies, to enable the attention to learn to attend to specific positions in the sequence. The basic operation of the layer is, separately for each attention head, as follows: for each entry of the query data (along the sequence axis), the pattern of the k'th query along the pattern axis is passed through a linear layer (a learnable projection corresponding to the current query head) to produce a pattern in the "query space" (of dimensionality key_dim). Likewise, for each entry in the key data along its sequence axis, the pattern of the n'th key (along the respective pattern axis) is passed through another linear projection (also learnable) to produce a pattern in the "key space" (also of dimensionality key_dim). The query and key patterns are then matched against each other using a dot product (i.e., the sum of elementwise products), which yields a sort of matching score. After possibly some normalization, the scores are then passed through a softmax to produce 0-to-1 "weights" for that query pattern relative to each of the key patterns.
The weight is then applied to the corresponding n'th value pattern, and the lookup result for the k'th query is simply the weighted sum of all the value patterns. This is an adaptive selection, which can act across long distances in the data (namely across the entire sequence length of the key/query data). The operation is performed in parallel for all attention heads (using their separate projections) and the retrieved value patterns are stacked. The resulting stacked patterns are then passed through a final linear projection (again learnable) to reduce dimensionality to the desired output dimensionality.
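A single attention head as described above can be sketched as follows (NumPy, no masking or batching; the projection matrices stand in for the layer's learnable parameters and are assumptions of this example). In the multi-head case, the per-head outputs are stacked and passed through one more learnable projection to output_dim.

```python
import numpy as np

# Hypothetical single-head scaled dot-product attention sketch.
def attention_head(queries, keys, values, Wq, Wk, Wv):
    Q = queries @ Wq                                  # [q_len, key_dim] query-space patterns
    K = keys @ Wk                                     # [kv_len, key_dim] key-space patterns
    V = values @ Wv                                   # [kv_len, value_dim] value-space patterns
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # dot-product matching scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over key positions
    return weights @ V                                # weighted sum of value patterns
```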
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
queries
Query data.- verbose name: Queries
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: IN
-
keys
Key data.- verbose name: Keys
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: IN
-
values
Value data.- verbose name: Values
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: IN
-
mask
Optional mask data. If specified, this must amount to a boolean array with a shape that's compatible with the attention weights.- verbose name: Mask
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: IN
-
data
Retrieved data.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: OUT
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
num_heads
Number of parallel heads. Each head represents a different learnable projection of the input data into a projected query, key, and value space, and the operation is usually used with multiple parallel heads whose outputs are stacked before being reduced to the final output dimensionality by another learnable projection.- verbose name: Number Of Attention Heads
- default value: 16
- port type: IntPort
- value type: int (can be None)
-
sequence_axis
The axis across which to attend. The lookup acts along this axis in the key and value data and treats each element along it as a different item that can be retrieved ("attended to"). The query data is also interpreted as a sequence of multiple queries along this axis, and therefore the output will inherit this axis from the query. Like pattern_axis, this is not limited to the predefined choices (see pattern_axis for an example).- verbose name: Sequence Axis
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
pattern_axis
Axis along which data is interpreted as query, key, or value patterns. This can be a single axis or comma-separated list of axes in the query, key, and value data along which data will be treated as patterns that are mapped into the key/query or value space. This can also be the index of an axis. This parameter is not limited to the predefined choices; for example, you can refer to an axis by its custom label, e.g., feature.mylabel.- verbose name: Pattern Axis
- default value: feature
- port type: ComboPort
- value type: str (can be None)
-
mask_axes
The trailing axes of the mask, if a mask is given (otherwise ignored). This should correspond to (optionally) an axis indexing the heads (whose length is either 1, or the number of heads), followed by an axis of the same length as the query's sequence axis, and lastly an axis of the same length as the key's sequence axis. The special value singletonaxis means that a dummy axis of length 1 will be inserted at the respective position (relative to the end of the mask array).- verbose name: Trailing Mask Axes
- default value: singletonaxis, time, time
- port type: ComboPort
- value type: str (can be None)
-
key_dim
Size of the key and query vectors. The key and query patterns are each projected into a space of this dimensionality before the matching is performed between them.- verbose name: Key/query Dimensionality
- default value: 32
- port type: IntPort
- value type: int (can be None)
-
value_dim
Size of the value vectors. The value pattern retrieved by each attention head is projected into a space of this dimensionality.- verbose name: Value Dimensionality
- default value: 32
- port type: IntPort
- value type: int (can be None)
-
output_dim
Size of the output vectors. The stacked outputs produced by the attention heads are projected into a space of this dimensionality and represent the final output of the node. Note that the output axis will always be of type feature.- verbose name: Output Dimensionality
- default value: None
- port type: IntPort
- value type: int (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky; otherwise, be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Biases are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note the reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: attention
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
MultiplicativeNoiseAugmentation
Scale the given data by noise drawn from some provided distribution.
The node can optionally use the same scale across all elements along an axis, or draw an independent scale. It is recommended that the distribution be centered around 1, although this is not enforced. A good starting point is a normal or truncated normal distribution with mean of 1 and standard deviation of 0.1, yielding scales mostly in the 0.9 to 1.1 range, but the actual range should be experimented with to find a good regime that reflects the variability due to the sensors used. Like most augmentation nodes, this node does not by itself amplify the amount of data, which therefore has to be done beforehand using, for example, the RepeatAlongAxis node. As with most augmentation nodes, you need to wire in a random seed (for example using the DrawRandomSeed node, see docs for more info) to ensure reproducibility. You also need to wire a distribution to the dist input to specify the distribution of interest (e.g., NormalDistribution).
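A rough sketch of the operation (NumPy, normal noise with mean 1, one scale per instance and channel as in the default separate_across setting; illustrative only, not the node's implementation):

```python
import numpy as np

# Hypothetical multiplicative-noise sketch for data shaped [instances, channels, ...].
def multiplicative_noise(x, rng, mean=1.0, std=0.1):
    scales = rng.normal(mean, std, size=x.shape[:2] + (1,) * (x.ndim - 2))
    return x * scales                          # broadcast one scale per (instance, channel)
```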
Version 1.0.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: Packet (can be None)
- data direction: INOUT
-
seed
Random seed for deterministic results.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
dist
Distribution to use.- verbose name: Dist
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
separate_across
A comma-separated list of axis name(s) over which to create individual scale values. For example, 'space' will create a random scale value drawn from the input distribution for each channel. If given 'space, feature', a random scale value will be created for each space-by-feature element (i.e., total values = space elements * feature elements). It is recommended to always include the instance axis, since each instance in a mini-batch should have a different random draw of scales.- verbose name: Separate Scales Across Axes
- default value: space,instance
- port type: ComboPort
- value type: str (can be None)
-
bypass
Whether to bypass the augmentation and pass the input data through unchanged.- verbose name: Bypass
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetDefine
Define a neural network module (subnet).
A subnet is a portion of a neural network that contains implicit parameters that are optimized during training. A subnet may also contain instances of other subnets. The easiest way to use a subnet is to wire its "this" output into a DeepModel node, which can be used like any other machine-learning node as part of a data pipeline (e.g., for classification or regression). The normally implicit parameters of a subnet can also be managed explicitly and by hand by invoking the subnet's initialize function (by wiring it into the Call node) to obtain an initial set of parameters, and then passing the parameters into the subnet's forward function (again by wiring it into the Call node) to compute the outputs of the subnet. Typically one would then apply the Calculate Gradient node to the forward function to obtain a function that yields gradients, invoke the resulting gradient function given some data and parameters, and then use the resulting gradients to update the parameters (i.e., performing gradient descent). This node is used by wiring a subgraph into its graph port. The subgraph must contain one or more Placeholder nodes for the network inputs, and the network signature must name all the placeholders in some order, which is given in the "Network [signature]" (or network__signature in code) port.
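For readers more familiar with code, the define / initialize / forward / gradient-descent workflow described above roughly corresponds to the following functional pattern, shown here with Haiku and JAX purely for illustration (this is not NeuroPype code; the layer sizes, loss, and learning rate are assumptions of this sketch):

```python
import jax
import jax.numpy as jnp
import haiku as hk

def net(x):                                    # the subnet definition ("NetDefine")
    return hk.Linear(1)(jax.nn.relu(hk.Linear(32)(x)))

transformed = hk.transform(net)                # functional form with weights threaded out
x, y = jnp.ones((8, 4)), jnp.zeros((8, 1))
params = transformed.init(jax.random.PRNGKey(0), x)   # initial weights ("initialize")

def loss(params, x, y):                        # forward pass ("forward") plus a loss
    pred = transformed.apply(params, None, x)
    return jnp.mean((pred - y) ** 2)

grads = jax.grad(loss)(params, x, y)           # analogous to the Calculate Gradient node
params = jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)  # one descent step
```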
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
network
Graph that defines the network.- verbose name: Network
- default value: None
- port type: GraphPort
- value type: Graph
-
network__signature
Optional argument names of the network being defined. Your network is specified as a graph with one placeholder node for each argument name specified here (whose slotname must match that argument name). Those placeholders then feed into any number of neural network or other mathematical operations, and the final output of your network is wired into the "network" input of the NetDefine node. This is analogous to the Function Declaration node that defines a function of some arguments in a similar manner. Note that in graphical UIs, the edge that goes into the "network" input will be drawn in dotted style to indicate that this is not normal forward data flow, but that a graph (here your network definition) is being passed verbatim to the NetDefine node.- verbose name: Network [Inputs]
- default value: (input)
- port type: Port
- value type: object (can be None)
-
netname
Name of the network module. This name is inherited by the parameters of the network and can be used to group parameters logically. When the network is materialized (instantiated) multiple times, the name will be suffixed with a number.- verbose name: Name
- default value: mynetwork
- port type: StringPort
- value type: str (can be None)
-
desc
Description of the function. The first sentence is taken as the executive summary and should not exceed 60 characters. The next paragraph is the essential description, and any following paragraphs are considered additional description text. This should not list the arguments, but can give a high-level overview of what the network can accept and what it does. It is possible to use limited amounts of HTML formatting, for instance for emphasis.- verbose name: Description
- default value: None
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetForward
Apply a network that has been transformed to functional form to some inputs, producing the network's outputs (forward pass).
This node, along with "Net Initialize" (NetInitialize), offers full control over the gradient descent process in a given neural net. Tip: for simple use cases you will not need to use this node directly but may instead use a higher-level node such as "Deep Model" (DeepModel), which behaves like a regular self-contained machine-learning node in NeuroPype. This node needs a NetTransform node wired into the "transformed" input (see documentation), the network weights (as obtained from NetInitialize or from a subsequent update step), optionally the network state and a random seed, and one or more positional and/or named inputs for the computational graph that was wired into the NetTransform node (the positional inputs are those that are listed in the signature that goes with the NetTransform's graph port). The output port then yields the result of the network's forward pass. To perform gradient descent manually, one typically applies the Calculate Gradient node to this forward function to obtain a function that yields gradients, invokes the resulting gradient function given some data and weights, and then uses the resulting gradients to update the weights.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
transformed
Network definition to use.- verbose name: Transformed
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
seed
Optional random number seed.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
weights
Network parameters.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Optional network state (in/out).- verbose name: State
- default value: None
- port type: DataPort
- value type: dict (can be None)
- data direction: INOUT
-
output
Output of the network.- verbose name: Output
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
arg1
Argument 1.- verbose name: Arg1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg2
Argument 2.- verbose name: Arg2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg3
Argument 3.- verbose name: Arg3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg4
Argument 4.- verbose name: Arg4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg5
Argument 5.- verbose name: Arg5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg6
Argument 6.- verbose name: Arg6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg7
Argument 7.- verbose name: Arg7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg8
Argument 8.- verbose name: Arg8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg9
Argument 9.- verbose name: Arg9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
argN
Additional arguments.- verbose name: Argn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
name1
Name 1.- verbose name: Name1
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val1
Value 1.- verbose name: Val1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name2
Name 2.- verbose name: Name2
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val2
Value 2.- verbose name: Val2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name3
Name 3.- verbose name: Name3
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val3
Value 3.- verbose name: Val3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name4
Name 4.- verbose name: Name4
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val4
Value 4.- verbose name: Val4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name5
Name 5.- verbose name: Name5
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val5
Value 5.- verbose name: Val5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name6
Name 6.- verbose name: Name6
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val6
Value 6.- verbose name: Val6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name7
Name 7.- verbose name: Name7
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val7
Value 7.- verbose name: Val7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name8
Name 8.- verbose name: Name8
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val8
Value 8.- verbose name: Val8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name9
Name 9.- verbose name: Name9
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val9
Value 9.- verbose name: Val9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
nameN
Additional argument names.- verbose name: Namen
- default value: None
- port type: ListPort
- value type: list (can be None)
-
valN
Additional named argument values.- verbose name: Valn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
arg0
Argument 0.- verbose name: Arg0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name0
Name 0.- verbose name: Name0
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val0
Value 0.- verbose name: Val0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetInitialize
Get initial weights and optionally state for a network that has been transformed to functional form.
This node, along with "Net Forward Pass" (NetForward), offers full control over the gradient descent process in a given neural net. Tip: for simple use cases you will not need to use this node directly but may instead use a higher-level node such as "Deep Model" (DeepModel), which behaves like a regular self-contained machine-learning node in NeuroPype. This node needs a NetTransform node wired into the "transformed" input (see documentation), optionally a random key, and one or more positional and/or named inputs for the computational graph that was wired into the NetTransform node (the positional inputs are those that are listed in the signature that goes with the NetTransform's graph port). The initial weights are typically a dictionary of named weights (naming depends on the hierarchy of subnets within which the weights are contained).
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
transformed
Transformed network to use.- verbose name: Transformed
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
seed
Random seed to use.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
weights
Initialized parameters.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
state
Initialized state.- verbose name: State
- default value: None
- port type: DataPort
- value type: dict (can be None)
- data direction: OUT
-
arg1
Argument 1.- verbose name: Arg1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg2
Argument 2.- verbose name: Arg2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg3
Argument 3.- verbose name: Arg3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg4
Argument 4.- verbose name: Arg4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg5
Argument 5.- verbose name: Arg5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg6
Argument 6.- verbose name: Arg6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg7
Argument 7.- verbose name: Arg7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg8
Argument 8.- verbose name: Arg8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg9
Argument 9.- verbose name: Arg9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
argN
Additional arguments.- verbose name: Argn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
name1
Name 1.- verbose name: Name1
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val1
Value 1.- verbose name: Val1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name2
Name 2.- verbose name: Name2
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val2
Value 2.- verbose name: Val2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name3
Name 3.- verbose name: Name3
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val3
Value 3.- verbose name: Val3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name4
Name 4.- verbose name: Name4
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val4
Value 4.- verbose name: Val4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name5
Name 5.- verbose name: Name5
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val5
Value 5.- verbose name: Val5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name6
Name 6.- verbose name: Name6
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val6
Value 6.- verbose name: Val6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name7
Name 7.- verbose name: Name7
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val7
Value 7.- verbose name: Val7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name8
Name 8.- verbose name: Name8
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val8
Value 8.- verbose name: Val8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name9
Name 9.- verbose name: Name9
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val9
Value 9.- verbose name: Val9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
nameN
Additional argument names.- verbose name: Namen
- default value: None
- port type: ListPort
- value type: list (can be None)
-
valN
Additional named argument values.- verbose name: Valn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
arg0
Argument 0.- verbose name: Arg0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name0
Name 0.- verbose name: Name0
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val0
Value 0.- verbose name: Val0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetMaterialize
Materialize a network module (subnet), which allows it to be wired into a larger computational graph along with input placeholders and other nodes (incl.
stateless math ops and other network nodes). Materialized subnets have implicit parameters (weights) that first need to be optimized before the network can be used. For this reason, a materialized subnet, along with its containing computational graph, is not directly callable with data; instead, the computational graph needs to be wired into either a DeepModel or other high-level NN training node, OR it needs to be transformed into a functional form, which can be done by wiring the graph into a NetTransform node. Note that NetMaterialize also has a companion node NetShare, which allows you to use a materialized network in multiple places in a graph, and all of those places will share the same parameters.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
definition
Network definition to use.- verbose name: Definition
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
output
Output of the network.- verbose name: Output
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
arg1
Argument 1.- verbose name: Arg1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg2
Argument 2.- verbose name: Arg2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg3
Argument 3.- verbose name: Arg3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg4
Argument 4.- verbose name: Arg4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg5
Argument 5.- verbose name: Arg5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg6
Argument 6.- verbose name: Arg6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg7
Argument 7.- verbose name: Arg7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg8
Argument 8.- verbose name: Arg8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg9
Argument 9.- verbose name: Arg9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
argN
Additional arguments.- verbose name: Argn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
name1
Name 1.- verbose name: Name1
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val1
Value 1.- verbose name: Val1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name2
Name 2.- verbose name: Name2
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val2
Value 2.- verbose name: Val2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name3
Name 3.- verbose name: Name3
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val3
Value 3.- verbose name: Val3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name4
Name 4.- verbose name: Name4
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val4
Value 4.- verbose name: Val4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name5
Name 5.- verbose name: Name5
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val5
Value 5.- verbose name: Val5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name6
Name 6.- verbose name: Name6
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val6
Value 6.- verbose name: Val6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name7
Name 7.- verbose name: Name7
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val7
Value 7.- verbose name: Val7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name8
Name 8.- verbose name: Name8
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val8
Value 8.- verbose name: Val8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name9
Name 9.- verbose name: Name9
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val9
Value 9.- verbose name: Val9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
nameN
Additional argument names.- verbose name: Namen
- default value: None
- port type: ListPort
- value type: list (can be None)
-
valN
Additional named argument values.- verbose name: Valn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
arg0
Argument 0.- verbose name: Arg0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name0
Name 0.- verbose name: Name0
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val0
Value 0.- verbose name: Val0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetShare
Reuse a materialized network in a computational graph.
This node allows you to use the same network in multiple places in a computational graph, sharing the same weights. To use this, you need to first materialize a network definition using the "Materialize Net" (NetMaterialize) node. You can then wire the "this" output of that node into the "materialized" input of this node. The node can be used in two equivalent styles: 1) have a main instantiation of the network that is used somewhere in your graph (in the form of a NetMaterialize node) and then have one or more secondary uses in the graph that use NetShare nodes referencing the main copy, or 2) have a materialize node that is not directly embedded in a computation, but which merely serves as a node to which several NetShare nodes refer, which are then embedded in some computation.
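In functional terms, weight sharing as described here corresponds to using one module instance in several places, as in this Haiku sketch (purely illustrative, not NeuroPype code; the layer size and input shapes are assumptions of this example):

```python
import jax
import jax.numpy as jnp
import haiku as hk

# Hypothetical sketch: the same "materialized" module used twice shares one set of weights.
def net(a, b):
    shared = hk.Linear(16, name="shared")   # analogous to a materialized subnet
    return shared(a) + shared(b)            # the second use is analogous to NetShare

fwd = hk.transform(net)
params = fwd.init(jax.random.PRNGKey(0), jnp.ones((1, 4)), jnp.ones((1, 4)))
# params contains a single "shared" parameter group used by both applications.
```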
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
materialized
Materialized network to use.- verbose name: Materialized
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
output
Output of the network.- verbose name: Output
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
arg1
Argument 1.- verbose name: Arg1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg2
Argument 2.- verbose name: Arg2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg3
Argument 3.- verbose name: Arg3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg4
Argument 4.- verbose name: Arg4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg5
Argument 5.- verbose name: Arg5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg6
Argument 6.- verbose name: Arg6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg7
Argument 7.- verbose name: Arg7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg8
Argument 8.- verbose name: Arg8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg9
Argument 9.- verbose name: Arg9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
argN
Additional arguments.- verbose name: Argn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
name1
Name 1.- verbose name: Name1
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val1
Value 1.- verbose name: Val1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name2
Name 2.- verbose name: Name2
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val2
Value 2.- verbose name: Val2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name3
Name 3.- verbose name: Name3
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val3
Value 3.- verbose name: Val3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name4
Name 4.- verbose name: Name4
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val4
Value 4.- verbose name: Val4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name5
Name 5.- verbose name: Name5
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val5
Value 5.- verbose name: Val5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name6
Name 6.- verbose name: Name6
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val6
Value 6.- verbose name: Val6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name7
Name 7.- verbose name: Name7
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val7
Value 7.- verbose name: Val7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name8
Name 8.- verbose name: Name8
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val8
Value 8.- verbose name: Val8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name9
Name 9.- verbose name: Name9
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val9
Value 9.- verbose name: Val9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
nameN
Additional argument names.- verbose name: Namen
- default value: None
- port type: ListPort
- value type: list (can be None)
-
valN
Additional named argument values.- verbose name: Valn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
arg0
Argument 0.- verbose name: Arg0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name0
Name 0.- verbose name: Name0
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val0
Value 0.- verbose name: Val0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetTransform
Transform a computational graph that involves neural net nodes into a functional form, with weights threaded out.
This node is for full manual control over NN optimization. The computational graph is your overall neural network forward pass, and starts with one or more placeholder nodes for the inputs, followed by some combination of neural net nodes and plain math ops as needed, and ends ultimately with the NetTransform node. Your graph can have positional inputs, which are Placeholder nodes whose names are listed in the desired order in the signature that goes with this node's graph port ("Graph [Signature]" in the UI or graph__signature in scripts). Additionally, you may have named (keyword-only) inputs, which are placeholders whose names are not listed in the signature; in this case your signature must look like "()" or "(myarg1, myarg2, )" for two positional arguments here named myarg1 and myarg2 (note in a script you can also pass this as a tuple as documented in GraphPort). The "this" output of this node can be wired into the NetInitialize and NetForward nodes to obtain the initial weights, and to apply those weights to some data. These two functions give full control over the gradient descent process, simply by applying the Calculate Gradient node to the NetForward function to obtain a function that yields gradients, invoking the resulting gradient function given some data and parameters, and then using the resulting gradients to update the parameters or parts thereof.
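For orientation, the following is a minimal, self-contained sketch (plain Python/NumPy, not the actual NeuroPype API) of the functional pattern that this workflow produces: an init function that returns the threaded-out parameters, a pure forward function of (parameters, data), a gradient function standing in for the Calculate Gradient node, and a manual update loop. All names (init_fn, forward_fn, grad_fn) are illustrative only.
    import numpy as np

    def init_fn(rng, n_features):
        # parameter dictionary with the weights "threaded out" of the graph
        return {"w": rng.standard_normal(n_features) * 0.01, "b": 0.0}

    def forward_fn(params, inputs):
        # pure function of (parameters, data) -> predictions
        return inputs @ params["w"] + params["b"]

    def grad_fn(params, inputs, targets):
        # analytic gradient of a mean-squared-error loss (stand-in for Calculate Gradient)
        err = forward_fn(params, inputs) - targets
        return {"w": 2.0 * inputs.T @ err / len(err), "b": 2.0 * err.mean()}

    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((32, 4)), rng.standard_normal(32)
    params = init_fn(rng, 4)
    for _ in range(200):  # manual gradient-descent loop over the threaded-out parameters
        g = grad_fn(params, X, y)
        params = {k: params[k] - 0.1 * g[k] for k in params}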
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
graph
Computational graph to transform.- verbose name: Graph
- default value: None
- port type: GraphPort
- value type: Graph
-
graph__signature
Argument names of function to transform. The NetTransform node is used for manually performing gradient descent on a graph (representing an application of a neural network to some data and typically containing NN Layer nodes). The graph represents a function of some argument and therefore must contain at least one Placeholder node whose slotname must match the listed argument name. The remainder of the graph is an application of a network, either implemented in-place by chaining a series of NN layers after the placeholder, or by invoking a previously defined network using NetMaterialize and/or NetShare. The final output of your operation is taken to be the predictions of the network given the inputs. Note that in graphical UIs, the edge that goes into the "graph" input will be drawn in dotted style to indicate that this is not normal forward data flow, but that a graph (here your network application) is being passed verbatim to the NetTransform node.- verbose name: Graph [Signature]
- default value: (inputs)
- port type: Port
- value type: object (can be None)
-
prefer_packets
Prefer to use packets to represent parameter sets instead of dicts.- verbose name: Prefer Packets
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetWeightArray
Define network weight array.
This node can only occur inside the network graph wired into a Define Net (NetDefine) node. The node is used to request a weight array whose value can be used as part of a neural network computation. The weight array is implicit in the containing subnet, and is optimized during training, as in the built-in NN nodes (e.g., convolution, etc). As with all implicit parameters, if you want to optimize them manually (i.e., using the Calculate Gradient node) rather than using one of the higher-level nodes such as "Deep Model" (DeepModel), you first need to transform your subnet (or rather a computational graph in which it is materialized) into functional style, which is done using the NetTransform node.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
value
Value of the weights.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
name
Name of the weight array. Must be unique within the subnet where it is used.- verbose name: Name
- default value: weights
- port type: StringPort
- value type: str (can be None)
-
shape
Shape (dimensions) of the weight array.- verbose name: Shape
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
dtype
Data type of the weight array. Float32 is the easiest to use, but the 16-bit data types can be more efficient -- however, float16 has limited range, and can cause numeric problems, which typically manifest as the network failing to train. Some hardware supports bfloat16, which is as efficient as float16, but has a wider range and is therefore an easier drop-in replacement for float32 when it is supported.- verbose name: Data Type
- default value: float32
- port type: EnumPort
- value type: str (can be None)
-
initializer
Choice of initializer. This can either be one of the named initializers, or the value "custom", in which case the initializer must be wired into the initializer port. For some initializers that take arguments, you can also specify these positionally as in "truncated_normal(1.0,0.0)" (note order of stddev, mean).- verbose name: Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NoisySGDStep
The NoisySGD optimizer step.
Based on Neelakantan et al, 2014, noisy SGD is a variant of stochastic gradient descent (SGD) that adds Gaussian-distributed noise to the gradients. This can help generalization in particularly deep networks. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
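As a rough sketch of the gradient transformation (an assumed formulation based on the eta and gamma descriptions below, not the node's actual implementation):
    import numpy as np

    def noisy_sgd_step(grad, step, rng, learning_rate=0.01, eta=0.01, gamma=0.55):
        variance = eta / (1.0 + step) ** gamma     # annealed noise variance (see eta/gamma below)
        noise = rng.normal(0.0, np.sqrt(variance), size=grad.shape)
        return -learning_rate * (grad + noise)     # update to be added to the weights

    rng = np.random.default_rng(12345)
    w = np.ones(3)
    g = 2.0 * w                                    # e.g., gradient of sum(w**2)
    w = w + noisy_sgd_step(g, step=0, rng=rng)     # apply manually, cf. the Add node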
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
eta
Initial variance for the Gaussian noise added to gradients.- verbose name: Eta
- default value: 0.01
- port type: FloatPort
- value type: float (can be None)
-
gamma
A parameter controlling the annealing of noise over time; the variance decays according to (1+t)^-gamma.- verbose name: Gamma
- default value: 0.55
- port type: FloatPort
- value type: float (can be None)
-
seed
A seed for the pseudo-random number generation.- verbose name: Seed
- default value: 12345
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NormalInitializer
An initializer that draws initial weights from a Gaussian normal distribution with a given mean and standard deviation.
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
mean
Distribution mean.- verbose name: Mean
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
stddev
Distribution standard deviation.- verbose name: Stddev
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NovoGradStep
The Novograd optimizer step.
Based on Ginsburg et al, 2019, Novograd is more robust to initial learning rate and weight initialization than other optimizers, and can for instance be used without learning-rate warm-up. The optimizer also works very well for large batch sizes, and has been shown to be effective for batches of up to 32k exemplars. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.25
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-06
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, eg for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
weight_decay
Strength of the weight decay. This is multiplied by the learning rate as in e.g., PyTorch and Optax, but differs from the paper, where it is only multiplied by the schedule multiplier but not the base learning rate.- verbose name: Weight Decay
- default value: 0.0001
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NullStep
A no-op gradient update step that can be used to explicitly freeze (a subset of) weights, e.g., in combination with the PartitionedStep node.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
OptimisticGDStep
The Optimistic gradient descent optimizer step.
Based on Mokhtari et al, 2019, this is an advanced optimizer that was originally proposed in the context of saddle-point problems, and has strong convergence for min-max games, where standard gradient descent can oscillate or diverge. Note that this optimizer can be used with schedulers for not only the learning rate but also the alpha and beta parameters, by wiring the appropriate schedule nodes into the respective ports, without having to use a CustomStep node. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
alpha_schedule
Optional alpha schedule.- verbose name: Alpha Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
beta_schedule
Optional beta schedule.- verbose name: Beta Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
alpha
Alpha coefficient for generalized OGD.- verbose name: Alpha
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
beta
Beta coefficient for generalized OGD negative momentum.- verbose name: Beta
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
OrthogonalInitializer
An initializer that generates a random matrix of orthogonal vectors.
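A common recipe for such an initializer is QR decomposition of a random Gaussian matrix; the NumPy sketch below illustrates the idea (how the actual node handles non-square or higher-dimensional shapes and the axis parameter may differ):
    import numpy as np

    def orthogonal_init(shape, scale=1.0, rng=None):
        rng = rng or np.random.default_rng()
        rows, cols = shape
        a = rng.standard_normal((max(rows, cols), min(rows, cols)))
        q, r = np.linalg.qr(a)
        q = q * np.sign(np.diag(r))     # fix the sign ambiguity of the decomposition
        if rows < cols:
            q = q.T
        return scale * q[:rows, :cols]

    w = orthogonal_init((4, 4))
    print(np.allclose(w.T @ w, np.eye(4)))   # columns are orthonormal -> True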
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
scale
Scale factor.- verbose name: Scale
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
axis
Axis that corresponds to the output dimension of the tensor. The array will be row orthonormal along this axis (where -1 refers to the last dimension), unless the product of the remaining dimensions is larger along the axis, in which case it will be made column orthonormal along the axis.- verbose name: Axis
- default value: -1
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
PartitionedStep
Apply a set of steps to different labeled subsets of the parameters, using a separately provided labeling dictionary, yielding a composite update step.
Using this node involves grouping your parameters into different groups, each identified by a specific group label. First, you list your group labels in the label0 to label9 entries. Then, you wire different steps (e.g., Adam, SGD, nothing, etc) into the corresponding step0 to step9 ports to configure which step shall run on which group of parameters. And finally you need to describe what parameters of your model, which are assumed to live in a parameters dictionary, belong to which group. This is done by specifying a labeling dictionary that has the same form as your model parameters dictionary, but instead of the actual tensors it contains the group labels (see help for that setting for more details). The update step will then run the specific labeled steps on parameters with the corresponding labels. Note that, if you want to run the same step on a heterogeneous data structure (e.g., nested dictionary) of parameters, you do not need this node since all step nodes already work on nested data structures.
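The following illustrative snippet shows what such a labeling scheme might look like for a hypothetical two-group model (keys and labels are made up for illustration):
    # Hypothetical parameter tree (values would be weight tensors in practice):
    params = {
        "encoder": {"w": "...", "b": "..."},
        "head":    {"w": "...", "b": "..."},
    }
    # Labeling dict mirroring that tree, with group labels instead of tensors; the
    # "encoder" entry is a prefix, labeling everything underneath it at once:
    labeling = {
        "encoder": "frozen",
        "head":    {"w": "train", "b": "train"},
    }
    # label0 = "train"  -> wire e.g. an AdamStep into step0
    # label1 = "frozen" -> wire a NullStep into step1 to freeze the encoder weights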
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
labeling
A nested data structure (dictionary) of labels for individual parameters. This parameter can be used to assign different groups of parameters in your model different labels, by mirroring the data structure that holds your model parameters, but containing strings rather than the actual weight tensors. These labels are then used to select which of the wired-in steps to apply to which parameters. For example, if your model weights are stored in a dictionary with keys {'layer0', 'layer1', 'layer2', 'layer3'}, and you want to apply step0 to only the weights under 'layer1' and 'layer3', and step1 to the weights under 'layer0' and 'layer2', you might set the labeling port to {'layer0': 'a', 'layer1': 'b', 'layer2': 'a', 'layer3': 'b'}, and then set the label0 and label1 ports to 'b' and 'a' respectively. If your parameters dictionary contains a subtree and you want to assign the same label to all parameters in it, you can simply omit the subtree and include a string label in its place in your labeling dict (i.e., the labeling can be a prefix tree, not necessarily a full tree). You can also wire the output of CreateDict into this port to generate the dictionary programmatically.- verbose name: Parameter Labeling Scheme
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
label1
Apply step1 to parameters with this label.- verbose name: Apply Step1 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step1
Step 1.- verbose name: Step1
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label2
Apply step2 to parameters with this label.- verbose name: Apply Step2 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step2
Step 2.- verbose name: Step2
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label3
Apply step3 to parameters with this label.- verbose name: Apply Step3 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step3
Step 3.- verbose name: Step3
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label4
Apply step4 to parameters with this label.- verbose name: Apply Step4 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step4
Step 4.- verbose name: Step4
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label5
Apply step5 to parameters with this label.- verbose name: Apply Step5 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step5
Step 5.- verbose name: Step5
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label6
Apply step6 to parameters with this label.- verbose name: Apply Step6 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step6
Step 6.- verbose name: Step6
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label7
Apply step7 to parameters with this label.- verbose name: Apply Step7 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step7
Step 7.- verbose name: Step7
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label8
Apply step8 to parameters with this label.- verbose name: Apply Step8 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step8
Step 8.- verbose name: Step8
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label9
Apply step9 to parameters with this label.- verbose name: Apply Step9 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
labelN
Additional labels corresponding to additional steps.- verbose name: Additional Steps
- default value: []
- port type: ListPort
- value type: list (can be None)
-
step9
Step 9.- verbose name: Step9
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
stepN
Additional Steps.- verbose name: Stepn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
label0
Apply step0 to parameters with this label.- verbose name: Apply Step0 To Parameters With This Label
- default value: first_label
- port type: StringPort
- value type: str (can be None)
-
step0
Step 0.- verbose name: Step0
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
PiecewiseConstantSchedule
A piecewise-constant parameter schedule.
This schedule starts with the initial value, and whenever one of the step boundaries is crossed, the parameter is multiplied by the respective scale factor, and any scale factors that came before it. As a result, at step k, the value is the initial value times the product of all scale factors whose associated step boundaries preceded step k. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
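A minimal sketch of this rule (whether the boundary step itself already counts as crossed is an assumption here):
    def piecewise_constant(step, init_value=1.0, boundaries=(100, 200), scales=(0.9, 0.9)):
        value = init_value
        for b, s in zip(boundaries, scales):
            if step >= b:          # assumed: reaching the boundary counts as crossing it
                value *= s
        return value

    print([piecewise_constant(s) for s in (0, 99, 100, 250)])   # 1.0, 1.0, 0.9, ~0.81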
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. The parameter is successively multiplied by scale factors when the respective step boundaries are crossed.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
step_boundaries
Step boundaries at which to multiply the parameter by the respective scale factor.- verbose name: Step Boundaries
- default value: [100, 200]
- port type: ListPort
- value type: list (can be None)
-
scale_factors
Scale factors to multiply the parameter by when the respective step boundaries are crossed.- verbose name: Scale Factors
- default value: [0.9, 0.9]
- port type: ListPort
- value type: list (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process, but note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
PiecewiseInterpolatedSchedule
A piecewise interpolated parameter schedule.
This schedule has a series of boundary steps and associated scale factors. The parameter starts with the initial value and whenever the step count reaches a boundary step, the parameter becomes multiplied by the respective scale factor times all prior scale factors. In between boundaries, the parameter is interpolated according to the given interpolation type. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
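For the linear interpolation case, the behavior can be sketched as interpolation between knot values formed by the cumulative product of the scale factors (a simplified illustration, not the node's implementation):
    import numpy as np

    def piecewise_interpolated(step, init_value=1.0, boundaries=(100, 200), scales=(0.9, 0.9)):
        knots_x = np.concatenate([[0], boundaries])
        knots_y = init_value * np.concatenate([[1.0], np.cumprod(scales)])
        return float(np.interp(step, knots_x, knots_y))

    print(piecewise_interpolated(50))    # halfway between 1.0 and 0.9 -> 0.95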
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. The parameter interpolates from the prior value to the prior value times the first boundary scale factor by the time the first boundary step is reached, and so forth, where successively reached scale factors multiply on top of each other.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
step_boundaries
Step boundaries associated with scale factors. When the step reaches the boundary, the parameter becomes the initial value times all scale factors whose boundaries have so far been reached.- verbose name: Step Boundaries
- default value: [100, 200]
- port type: ListPort
- value type: list (can be None)
-
scale_factors
Scale factors to multiply the parameter by when the respective step boundaries are reached.- verbose name: Scale Factors
- default value: [0.9, 0.9]
- port type: ListPort
- value type: list (can be None)
-
interpolation
Interpolation type. Cosine follows the shape of the cosine function from its peak to its successive trough.- verbose name: Interpolation
- default value: linear
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process, but note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
PolynomialSchedule
A polynomial parameter schedule.
This schedule holds the parameter at the initial value until the current step count reaches the value set via transition_begin, and then uses polynomial interpolation to the final value over a number of steps set via transition_steps; after that, the parameter is held at the final value. The interpolation polynomial is evaluated over the range 0.0 (at the beginning) to 1.0 (at the end). Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
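A sketch of the resulting curve, assuming the common (1 - fraction)**power parameterization of the interpolation (the exact polynomial form used by the node may differ):
    import numpy as np

    def polynomial_schedule(step, init_value=1.0, final_value=0.0, power=1.0,
                            transition_begin=0, transition_steps=100):
        frac = np.clip((step - transition_begin) / transition_steps, 0.0, 1.0)
        return float(final_value + (init_value - final_value) * (1.0 - frac) ** power)

    print([polynomial_schedule(s) for s in (0, 50, 100, 200)])   # 1.0, 0.5, 0.0, 0.0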
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. The parameter is held at this value until the current step count reaches the value set via transition_begin.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
final_value
Final parameter value. This is the value at the end of the schedule.- verbose name: Final Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
power
Power of the polynomial. A value of 1.0 results in a linear schedule, while a value of 2.0 results in a quadratic schedule. Fractional values are allowed, as well.- verbose name: Power
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
transition_begin
Step count at which to begin the transition from the initial value to the final value. The parameter is held at the initial value until this step count is reached.- verbose name: Transition Begin
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
transition_steps
Step count at which to end the transition from the initial value to the final value. The parameter is held at the final value after this step count is reached.- verbose name: Transition Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process, but note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
Pooling
Perform an N-dimensional spatial pooling operation (average or max) on the given data.
This will sweep a window over the data, using the given strides as the step size (and respecting padding at the borders), and compute the average or max value within the window. The output size is the size of the grid of valid positions for the window (given padding and strides). If there are multiple feature axes in the input, they will be flattened into a single feature axis placed at the end. The full output shape is first the instance axes if any, then any unspecified non-feature dimensions, then the specified "spatial" (i.e., swept-over) dimensions in the order specified, followed by a single feature axis. Beware that there is a difference between the semantics of Keras and Haiku pooling: Keras pooling only applies to spatial dimensions, while Haiku pooling windows are specified in terms of all (non-batch) dimensions, including the channels axis. This node defaults to the Keras semantics, but allows you to perform Haiku-style pooling by specifying a window size that is greater than the number of spatial dimensions in the data. Also as in Keras, the default strides equal the window size, meaning that pooling is by default a downsampling operation if no strides are specified.
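As a worked example of the output size along one pooled axis, assuming the standard conventions for 'valid' and 'same' padding:
    import math

    def pooled_length(length, window, stride, padding="valid"):
        if padding == "valid":
            return (length - window) // stride + 1    # number of full window positions
        return math.ceil(length / stride)             # 'same': only strides shrink the axis

    print(pooled_length(10, window=3, stride=3))                   # -> 3
    print(pooled_length(10, window=3, stride=3, padding="same"))   # -> 4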
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
pool_axes
List and order of axes over which to pool adjacent values. If the input data are packets, this determines the order of these axes in the output data, and the order in which the size of the pooling window etc is given. Alternatively one may also give just an integer number of axes to pool over, in which case the last N axes in the data (that are neither feature nor instance) will be pooled over. If the input data are plain arrays and named axes are listed, this merely determines the number of spatial axes and the names are just mnemonic and not otherwise used. Conceptually, the pooling operation sweeps a window over the data (with a step size given by the strides parameter), and for each position of the window, either the average or the maximum of the values is taken to produce a single number for each window position. The results are arranged in an array that has the same order of spatial dimensions, but which is (typically) smaller in each dimension by a factor depending on the step size. The window can either be limited to not run past the edges of the data (padding=valid), or it can be allowed to run past the edges of the data by half the window size, which results in an output size that is exactly the input size divided by the respective step sizes (padding=same). Note that this operation will not pool over feature axes, which will be arranged into a single feature axis at the end, but instead the pooling is applied separately for each feature. Likewise, the operation does not pool over instances, and any instance axes are arranged at the beginning. If plain array data is given, then the node needs to be told, using the feature_axis parameter, where the feature axis is assumed to be located (either at the end or before the spatial axes). This is automatic when packet data is provided. This parameter is not limited to the predefined options.- verbose name: Axes To Pool Over
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
pool_type
Type of pooling to perform. Either the maximum or the average of the values in the pooling window is taken. Both pooling operations have their uses, with max pooling being more suitable for detecting local features in a translation-invariant manner, and average pooling can be used as a lowpass filter on features.- verbose name: Pooling Type
- default value: max
- port type: EnumPort
- value type: str (can be None)
-
window
Size of the pooling window. This is a list of integers, one for each dimension as given in pool axes. Can also be given as a single-element list, in which case the window is the same size along all of the given spatial dimensions. For pooling along a single dimension one may either just name the axis to pool over and give a single value for the window size, or one may list all spatial axes and describe the window size as in e.g., [1, 3, 1] to pool over the second axis. Which is more efficient depends on the implementation.- verbose name: Window Size
- default value: [3]
- port type: ListPort
- value type: list (can be None)
-
strides
Step size with which the window is swept over the data. If not given, defaults to the same as the window size. This is a list of integers, one for each dimension as given in pool axes. Can also be given as a single-element list, in which case the same step size is used along all of the specified spatial dimensions. A step size greater than 1 means that the window will be shifted by this amount between successive positions; as a result, the output data along this axis will be shorter by this factor (matching the number of positions at which the window is applied).- verbose name: Step Size (Strides)
- default value: None
- port type: ListPort
- value type: list (can be None)
-
padding
Padding strategy for the data. This can be either 'valid' or 'same'. 'valid' means no padding (i.e., the window will not run off the edges of the data, but the output data will be shortened along each axis according to the number of valid positions of the window along that axis), and 'same' means that the output will have the same shape as the input (aside from downsampling due to strides).- verbose name: Padding
- default value: valid
- port type: EnumPort
- value type: str (can be None)
-
feature_axis
Dimension to exclude from pooling. This is typically the feature dimension (channels in traditional deep learning nomenclature), which is often the last axis but can be the axis just prior to the spatial dimensions. If data is provided in packet format, then this is automatically inferred from the data and should not be specified. If plain array data is given, this defaults to -1, meaning the last axis in the data. This is ignored if you provide a longer list for windows/strides than there are spatial dimensions in the data.- verbose name: Feature Axis
- default value: None
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RAdamStep
The Rectified Adam optimizer step.
Based on Liu et al., 2020, Rectified Adam addresses a shortcoming in the popular Adam optimizer, where during initial stages of training, the gradients exhibit a large variance due to the limited number of training samples used to estimate the optimizer's statistics, which typically is addressed using warm-up schedules. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, eg for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
threshold
Threshold for variance tractability.- verbose name: Threshold
- default value: 5
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RMSNorm
Apply RMS normalization to the given data.
This standardizes data across the spatial and/or feature dimensions, separately per instance (example), but does not remove the mean (unlike LayerNorm, which does remove the mean). RMS normalization can be the best choice when dealing with data that has a small batch size, or when used inside an RNN (both cases rendering the batch norm potentially inapplicable) and/or when no additional large spatial axes are present (rendering the instance norm inapplicable). Like most normalizations, RMS normalization typically includes a learned scale parameter, whose shape (and thus dimensionality) can be configured, and which in the case of the RMS norm varies between different NN suites. This can also be optionally overridden with an externally generated value. If packet data is given, this node ensures that the instance axes come first and the feature axes come last.
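A minimal NumPy sketch of the normalization itself (statistics over the non-instance axes, per-feature scale; a simplified illustration, not the node's implementation):
    import numpy as np

    def rms_norm(x, scale, axes=(1, 2), eps=1e-5):
        # scale by the root-mean-square over the given axes; no mean removal
        rms = np.sqrt(np.mean(np.square(x), axis=axes, keepdims=True) + eps)
        return x / rms * scale

    x = np.random.randn(8, 100, 16)          # (instances, time, features)
    y = rms_norm(x, scale=np.ones(16))       # per-feature learned scale (ones-initialized)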
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
scale_init
Initializer for the trainable scale.- verbose name: Scale Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scale_prior
Optional prior distribution for the scale.- verbose name: Scale Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
axes
"Optional comma-separated list of axis names or indices over which to accumulate the normalization statistics. If unspecified, the statistics will be accumulated over all except the instance axes. This parameter is not limited to the predefined choices.- verbose name: Accumulate Across Axes
- default value: (non-instance)
- port type: ComboPort
- value type: str (can be None)
-
param_axes
List of axis names/indices across which to learn separate per-element scale and bias parameters. Like with the layer norm, different NN libraries use different conventions for this parameter. Haiku and Sonnet use the feature axis (or last axis) by default, meaning that each feature is post-scaled independently as in the batch norm, but some other ML libraries may set this to the same as the axes parameter, which causes a separate scale/bias to be learned also across all entries of the spatial axes. Like axis, this parameter is not limited to the predefined choices.- verbose name: Learn Scale/bias Across Axes
- default value: feature
- port type: ComboPort
- value type: str (can be None)
-
epsilon
Small value to add to the variance to avoid division by zero.- verbose name: Epsilon
- default value: 1e-05
- port type: FloatPort
- value type: float (can be None)
-
learnable_scale
Whether to learn a trainable scale parameter. Normalizations typically include such a parameter in order to drive the subsequent activation function in a regime that is desirable for downstream computations (e.g., saturating or linear). Note the shape (and thus dimensionality) of the learned parameter is governed by the param_axes (learn scale/bias across axes) parameter.- verbose name: Learnable Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
scale_initializer
Choice of scale initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Scale Initializer
- default value: ones
- port type: ComboPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of the trainable parameters.- verbose name: Layer Name
- default value: rmsnorm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RMSPropStep
The RMSProp optimizer step.
Based on Tieleman and Hinton, 2012 and Graves et al., 2013, this optimizer was one of the first successful deep learning optimizers, and remains popular today. This implementation supports several options that it is sometimes used with, including momentum and Nesterov acceleration. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
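A sketch of the basic (uncentered, momentum-free) variant of the gradient transform, for orientation only:
    import numpy as np

    def rmsprop_step(grad, state, learning_rate=0.001, decay=0.9, eps=1e-8):
        state = decay * state + (1.0 - decay) * grad ** 2    # EMA of squared gradients
        update = -learning_rate * grad / (np.sqrt(state) + eps)
        return update, state

    w = np.ones(3)
    state = np.zeros(3)                      # cf. the initial_scale port
    g = 2.0 * w                              # e.g., gradient of sum(w**2)
    update, state = rmsprop_step(g, state)
    w = w + update                           # apply manually, cf. the Add node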
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
decay
Exponential decay rate for the first moment estimates.- verbose name: Decay
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
initial_scale
Initial scale. Initial value of accumulators tracking the magnitude of previous updates. Note that PyTorch uses 0.0 here while TensorFlow 1 uses 1.0. When reproducing results from a paper, verify the value used by the authors.- verbose name: Initial Scale
- default value: 0.0
- port type: FloatPort
- value type: float (can be None)
-
centered
If True, use the centered version of RMSProp.- verbose name: Centered
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
momentum
Optional exponential decay rate for momentum.- verbose name: Optional Momentum
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
nesterov
Whether to use Nesterov acceleration.- verbose name: Use Nesterov Acceleration
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RandomCapRotationAugmentation
Simulate small rotations of the cap montage to augment neural data.
This node draws random rotations in degrees from a given distribution, scaled by some provided factors, and then simulates a rotation of the montage (e.g., EEG cap) about these angles by interpolating the data to rotated positions. Note that this is best applied to time-domain data and may not give entirely correct results when used with e.g., precomputed frequency spectra or powers. A good starting point is a unit normal or truncated normal distribution, and using the provided scale factors, which simulate a typical level of inaccuracy in cap placement. For data that was collected under highly controlled conditions, you may want to use smaller scale factors. This node requires the data to have a space axis with x/y/z coordinates correctly assigned (relative to head center) for all channels; this can be achieved using the Assign Channel Locations node beforehand, and removing any unlocalized channels using the Remove Unlocalized Channels node. Like most augmentation nodes, this node does not by itself amplify the amount of data, which therefore has to be done beforehand using, for example, the RepeatAlongAxis node. Also, as with most augmentation nodes, you need to wire in a random seed (for example using the DrawRandomSeed node, see docs for more info) to ensure reproducibility. You also need to wire a distribution to the dist input to specify the distribution of interest (e.g., NormalDistribution).
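The rotation-drawing part can be sketched as follows (illustration only; the interpolation of channel data to the rotated sensor positions, which the node performs, is omitted, and the composition order is an assumption):
    import numpy as np

    def random_rotation(rng, xyz_scale=(2.5, 2.5, 1.0)):
        # draw one rotation: unit-normal draws scaled to degrees per axis, then converted
        ax, ay, az = np.deg2rad(rng.standard_normal(3) * xyz_scale)
        rx = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
        ry = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
        rz = np.array([[np.cos(az), -np.sin(az), 0], [np.sin(az), np.cos(az), 0], [0, 0, 1]])
        return rz @ ry @ rx                  # assumed composition order (pitch, roll, yaw)

    rng = np.random.default_rng(12345)
    positions = rng.standard_normal((3, 64))            # hypothetical x/y/z channel locations
    rotated = random_rotation(rng) @ positions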
Version 0.1.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: Packet (can be None)
- data direction: INOUT
-
seed
Random seed for deterministic results.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
dist
Distribution to use.- verbose name: Dist
- default value: None
- port type: GraphPort
- value type: Graph
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
xyz_rot_scale
A comma-separated list of scale factors to apply to the drawn rotation amounts for the x, y, and z axes. The resulting scaled values are then assumed to be in a unit of degrees; therefore, you may either use a distribution with unit standard deviation and specify the scale factors here in degrees, or set your distribution's standard deviation in degrees and use [1,1,1] here (or smaller values to reduce or disable the rotation applied about a given axis). A value of 3.0 degrees corresponds to a maximum movement of approx. half a centimeter on the scalp, but when rotating about multiple axes, you may use a slightly lower value such that the different movements combine to approx. the same total. When using non-uniform scaling, verify that x, y, and z in your data correspond to the axes that you expect. The NeuroPype default is x=right, y=front, z=up, so that the corresponding rotations are pitch, roll, and yaw (applied in that order).- verbose name: Scale Random Numbers For X,y,z (Degrees)
- default value: [2.5, 2.5, 1.0]
- port type: ListPort
- value type: list (can be None)
-
num_rotations
Number of rotations to draw from the distribution. This node will initially draw and cache a fixed set of rotations. Then, every single instance in the data will be transformed by a randomly chosen rotation. Note that an excessively large number will result in longer initialization time and more memory use, incl. GPU memory when applied to GPU data.- verbose name: Num Rotations
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
rot_seed
Random seed (int or None) for precomputing candidate rotations. Note that this seed should usually not be driven by a wire but left fixed since it's a precomputation.- verbose name: Rot Seed
- default value: 12345
- port type: IntPort
- value type: int (can be None)
-
bypass
Whether to bypass the augmentation and pass the input data through unchanged.- verbose name: Bypass
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RandomTimeSliceAugmentation
Extract a randomly offset time slice of the given length from the (already-segmented) input data.
The node adds a random offset drawn from the specified distribution (in the same unit as the length) to the position of the extracted segment, separately per instance, so that the segments are jittered. Note that offsets that result in out-of-bounds access are clamped at the edges, i.e., if your distribution has too high a variance, you will get an excess of segments that are stuck at the start or end of your original segmented time series (this can be mitigated with a truncated distribution). The slice can either be anchored at the start of the time slice (plus positive jitter offsets) or at the center of the time slice (plus or minus jitter offsets). A good starting point is a centered anchor combined with a zero-mean normal or truncated normal distribution and a standard deviation of +/- 50ms (i.e., 0.05 when using seconds as the time unit). However, the actual range should be adjusted based on the nature of the EEG phenomenon of interest (e.g., an analysis of late EEG responses may benefit from a larger jitter of as much as 100ms, while an analysis of early EEG responses should likely use jitters of no more than 10-20ms). Like most augmentation nodes, this node does not by itself amplify the amount of data, which therefore has to be done beforehand using, for example, the RepeatAlongAxis node. As with most augmentation nodes, you need to wire in a random seed (for example using the DrawRandomSeed node, see docs for more info) to ensure reproducibility. You also need to wire a distribution to the dist input to specify the distribution of interest (e.g., NormalDistribution).
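A minimal sketch of the slicing logic described above (illustrative only; the names, sampling rate, and array layout are assumptions, and the actual node operates on Packet axes rather than raw arrays):

```python
import numpy as np

rng = np.random.default_rng(42)
srate = 100.0                                    # assumed sampling rate in Hz
data = rng.standard_normal((50, 300))            # 50 instances x 300 samples (3 s segments)

length = 1.0                                     # slice length in seconds
jitter = rng.normal(0.0, 0.05, size=len(data))   # per-instance offsets, ~50 ms std dev

n_len = int(round(length * srate))
centers = data.shape[1] // 2 + np.round(jitter * srate).astype(int)   # 'center' anchor
starts = np.clip(centers - n_len // 2, 0, data.shape[1] - n_len)      # clamp out-of-bounds offsets
slices = np.stack([d[s:s + n_len] for d, s in zip(data, starts)])     # shape (50, 100)
```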
Version 0.8.1
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: Packet (can be None)
- data direction: INOUT
-
seed
Random seed for deterministic results.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
dist
Distribution to use.- verbose name: Dist
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
length
Length of the time slice to take.- verbose name: Length
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
time_unit
Unit of the jitter and length of the time slice.- verbose name: Time Unit
- default value: seconds
- port type: EnumPort
- value type: str (can be None)
-
anchor
Anchor location relative to which the offset is applied. Can be the start of the time slice (for non-negative distributions) or the center of the time slice.- verbose name: Anchor
- default value: center
- port type: EnumPort
- value type: str (can be None)
-
bypass
Whether to bypass the augmentation and pass the input data through unchanged.- verbose name: Bypass
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RecurrentLoop
Loop a graph representing a recurrent neural network across an input array (e.g., a time series).
This treats one of the axes in the given input data as the "traversal" axis (e.g., time) along which to step, and applies the loop body (i.e., the network) to the successive elements of the data along that axis, while carrying over the network activations of any recurrent layers contained in the loop body to the next iteration. This node can be thought of as a "layer" that sweeps across some traversal axis in the input, while carrying over recurrent state across elements. Alternatively, the node can be viewed as a convenience variant of the collecting Fold Loop node (FoldCollect, see documentation), where the loop body is the network, and the state is the carry state of all the recurrent nodes contained in it. In practice, this node is indeed equivalent to such a Fold loop, where the initial state is a dictionary of the initial states of all the recurrent nodes in the body (keyed by the layername of each of the nodes, which must therefore be unique), and the graph has an implicit extra "state" input placeholder taking in that dictionary, splitting it using BreakStructure, and wiring the states to the respective recurrent nodes' carry input. Likewise, the nodes' carry outputs are wired into a CreateDict node, which forms the output state, and the body is made to return a 2-element list of that state dictionary and the regular result of the loop body. Therefore, this node could be replaced by such a setup without loss of functionality, and that can in some cases be useful for additional control, for example if you want to loop other state across the traversal axis, or your graph contains nested recurrent loops, or uses network submodules in it (which are currently not supported by this node).
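Conceptually, the node behaves like the following fold. This is a plain-Python sketch of the semantics described above, not the node's implementation; the body, the "gru1" state key, and the gru_layer function are hypothetical stand-ins.

```python
def recurrent_loop(body, xs, initial_state=None):
    """Sweep `body` over the leading (traversal) axis of `xs`, threading a dict of
    per-layer carry states between iterations and collecting the per-step outputs."""
    state = initial_state   # None lets each recurrent layer emit its initial carry on the first call
    outputs = []
    for x in xs:                      # successive slices along the traversal axis (e.g., time)
        state, y = body(x, state)     # body returns (new carry dict keyed by layer name, result)
        outputs.append(y)
    return outputs, state             # per-step results and the final carry state

# usage sketch: a body wrapping a single (hypothetical) recurrent layer
def body(x, state):
    carry = None if state is None else state["gru1"]
    y, new_carry = gru_layer(x, carry)    # gru_layer stands in for, e.g., a GatedRecurrentUnitLayer
    return {"gru1": new_carry}, y
```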
Version 1.0.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
net
Recurrent network.- verbose name: Net
- default value: None
- port type: GraphPort
- value type: Graph
-
net__signature
Arguments accepted by the recurrent network. If you wish to loop over multiple pieces of data (e.g., arrays, packets) simultaneously (which all must have the same length), you can list additional items here, using any name of your choosing. A recurrent network is specified in NeuroPype quite similarly to a loop body in, e.g., a For Each loop. You start with a Placeholder node whose slotname matches the name listed here, and then follow that with one or more successive NN nodes, which may include any of the recurrent nodes (e.g., GatedRecurrentUnitLayer, LongShortTermMemoryLayer), but you can also use non-recurrent layers, other mathematical operations, or stateless normalizations such as LayerNorm. Conceptually, this loop body is then given the content of a single slice of your data being looped over (along the chosen axis, e.g., time) and generates some outputs for that slice. When the loop body is called on the next slice, the RecurrentLoop node will implicitly pass in the carry state of any recurrent nodes contained in the body from the previous loop iteration. This is a convenience feature that is exactly equivalent to using a FoldCollect node and manually threading through the state of each recurrent layer. The initial state is obtained by running the loop body once on the first slice of the data, in which case each recurrent layer will emit its initial state over the respective carry output (therefore, the manual equivalent would be to perform this pre-pass and then construct a dictionary of initial states from the carry outputs of the loop body's recurrent nodes).- verbose name: Net [Signature]
- default value: (input)
- port type: Port
- value type: object (can be None)
-
data1
Dataset 1 to iterate over.- verbose name: Data1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
data2
Dataset 2 to iterate over.- verbose name: Data2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
data3
Dataset 3 to iterate over.- verbose name: Data3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
data4
Dataset 4 to iterate over.- verbose name: Data4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
data5
Dataset 5 to iterate over.- verbose name: Data5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
dataN
Additional datasets.- verbose name: Datan
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
initial
Optional initial carry state.- verbose name: Initial
- default value: None
- port type: DataPort
- value type: dict (can be None)
- data direction: IN
-
result
Processed data.- verbose name: Result
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
final
Final carry state after last operation.- verbose name: Final
- default value: None
- port type: DataPort
- value type: dict (can be None)
- data direction: OUT
-
traversal_axis
The axis over which to loop while carrying over the activations in the recurrent nodes. For packet data, this can be one of the named axes, time being the default (while instance is usually reserved as the batch axis). If there are multiple axes of the same type, this will resolve to the first such axis, but you can also write, e.g., time.mylabel to specify an axis based not just on its type but additionally on its custom label (if any). You may also specify a numeric axis index, although this is more commonly used if you only process numeric arrays. If you process numeric arrays and time is listed, the first axis will be used. In any case, the selected axis will be omitted from the data that the loop body sees on any given call (i.e., the loop body only sees successive slices of the packet along that axis).- verbose name: Traversal Axis
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
reverse
If True, process the iterables in reverse order. The output will then be the state after having processed the leftmost element of the iterables.- verbose name: Process In Reverse Order
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
log_errors
If True, log exceptions occurring in the loop body.- verbose name: Log Errors
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
compile
Whether and how to compile the loop body for more efficient execution. Auto will compile the loop only if it occurs in a context where compilation is necessary, for example inside a model passed to nodes such as DeepLearning, ConvexModel, or one of the Inference nodes. The 'jax' option will generally attempt to compile the loop to run on the jax backend; this comes with a series of limitations -- all nodes in the body have to work with jax data types (numeric values only) and operations, and break/continue nodes cannot be used. 'Off' is the appropriate setting for anything except for the most performance-critical computations consisting of mainly math operations, possibly with some data reformatting. Off can also be useful in a situation where the loop occurs in a compiled context, but the iterable is a static constant (e.g., fixed list) and relatively short; in this case, the loop will be completely unrolled, which can be more efficient than actually looping.- verbose name: Compile
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
unroll
Optionally the unrolling factor for this loop, if compiling. Typically this is a small power of two such as 4 or 8. This is mainly an efficiency improvement for extremely tight loops that perform very cheap math operations.- verbose name: Unroll
- default value: None
- port type: IntPort
- value type: int (can be None)
SGDStep
The Stochastic Gradient Descent (SGD) optimizer step.
Popularized in its modern incarnation by Sutskever et al., 2013, SGD is a simple yet powerful optimizer that can both serve as a baseline and sometimes outperform more complex optimizers, e.g., on reasonably benign network topologies. This implementation includes optional support for momentum and Nesterov acceleration, which are standard practice when optimizing DNNs. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
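The gradient transform this step applies can be sketched as follows. This is a simplified NumPy illustration under the assumption that the emitted update already carries the negative sign (so it can be added to the weights); the node itself operates on arbitrary parameter trees and manages state and schedules for you.

```python
import numpy as np

def sgd_step(grad, state, learning_rate=0.001, momentum=None, nesterov=False):
    """Return (update, new_state); the update is then applied to the weights (e.g., with Add)."""
    if momentum is None:
        return -learning_rate * grad, state
    mu = np.zeros_like(grad) if state is None else state
    mu = momentum * mu + grad                              # momentum accumulator
    step_dir = grad + momentum * mu if nesterov else mu    # Nesterov look-ahead variant
    return -learning_rate * step_dir, mu

# usage: w += update
w, g = np.ones(4), np.array([0.1, -0.2, 0.3, 0.0])
update, state = sgd_step(g, None, learning_rate=0.01, momentum=0.9)
w = w + update
```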
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
momentum
Optional exponential decay rate for momentum.- verbose name: Optional Momentum
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
nesterov
Whether to use Nesterov acceleration.- verbose name: Use Nesterov Acceleration
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. The 'keep' setting resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
SM3Step
The SM3 optimizer step.
Based on Anil et al., 2019, SM3 mainly addresses memory usage for large or very large models, and has rigorous convergence guarantees. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for momentum.- verbose name: Momentum
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates. It may not be necessary to tune this.- verbose name: Beta2
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ScalingStep
Chainable step that scales the gradients by a fixed factor and/or a factor that varies on a schedule.
If both are given, the product of the two is applied. This node can be used to apply things like a fixed learning rate, a learning rate schedule, and/or the sign flip at the end of the chain to turn the processed gradient into an additive update.
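The effect of this step amounts to a simple multiplication, sketched below (illustrative only; the function and schedule names are hypothetical):

```python
def scaling_step(grad, step_count, factor=1.0, factor_schedule=None):
    """Scale a gradient by a fixed factor and/or a scheduled factor (their product if both are given)."""
    scale = factor * (factor_schedule(step_count) if factor_schedule else 1.0)
    return scale * grad

# e.g., a fixed learning rate plus the final sign flip that turns a gradient into an additive update
update = scaling_step(0.5, step_count=10, factor=-0.001)
```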
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
factor_schedule
Optional schedule for the factor.- verbose name: Factor Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
factor
Scaling factor to apply to the gradients.- verbose name: Factor
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
SequenceSchedule
A composite schedule that is a sequence of multiple provided schedules, each with a given starting step count.
This node is used by wiring one or more base schedule nodes, which can be any of the nodes ending in Schedule, into the schedule0, schedule1, etc. inputs (using the "this" output of the respective schedule node). The first provided schedule starts immediately (at step 0), and each subsequent schedule starts at the step count specified by the respective start1, start2, etc. inputs, and its timing is relative to that start point. However, note that the transition durations of the individual steps are not adjusted to match the difference in successive start points, so you need to make sure that the timing of each step is reasonable given the start point of the next step. The most common use case of this node is to chain a specific warmup curve (e.g., linear) with a plateau and/or falloff step (e.g., cosine decay), although note that there are also ready-made nodes for some of the most common scenarios of this type. Another use case is to chain multiple decaying schedules to get cyclical behavior, which is sometimes used to prevent the optimization from getting stuck in a suboptimal local minimum. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
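A plain-Python sketch of the piecewise behavior described above (illustrative only; the helper and the example schedules are hypothetical):

```python
def sequence_schedule(schedules, starts, step):
    """Evaluate a list of schedules, each active from its start step; timing is relative to that start."""
    value = None
    for sched, start in zip(schedules, starts):
        if step >= start:
            value = sched(step - start)    # later schedules take over once their start step is reached
    return value

warmup = lambda s: min(s / 100, 1.0) * 1e-3       # linear warmup to 1e-3 over 100 steps
decay = lambda s: 1e-3 * 0.9 ** (s / 50)          # exponential falloff after the warmup
lr = sequence_schedule([warmup, decay], [0, 100], step=250)
```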
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
schedule1
Schedule 1.- verbose name: Schedule1
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
schedule2
Schedule 2.- verbose name: Schedule2
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
schedule3
Schedule 3.- verbose name: Schedule3
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
schedule4
Schedule 4.- verbose name: Schedule4
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
schedule5
Schedule 5.- verbose name: Schedule5
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scheduleN
Additional schedules.- verbose name: Schedulen
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
start1
Starting step for 2nd schedule.- verbose name: Start1
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
start2
Starting step for 3rd schedule.- verbose name: Start2
- default value: None
- port type: IntPort
- value type: int (can be None)
-
start3
Starting step for 4th schedule.- verbose name: Start3
- default value: None
- port type: IntPort
- value type: int (can be None)
-
start4
Starting step for 5th schedule.- verbose name: Start4
- default value: None
- port type: IntPort
- value type: int (can be None)
-
start5
Starting step for 6th schedule.- verbose name: Start5
- default value: None
- port type: IntPort
- value type: int (can be None)
-
startN
Starting steps for additional schedules.- verbose name: Startn
- default value: []
- port type: ListPort
- value type: list (can be None)
-
schedule0
Schedule 0.- verbose name: Schedule0
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process, but note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
StepApply
Apply an optimizer step to given gradients, prior state, and optionally prior weights.
This node, along with "Init Step" (StepInit) provides a purely functional programming interface to the optimization steps. This is an alternative way of using steps: the simplest way of using a step is to wire data into a step node, and it will behave statefully like any other filter node (e.g., FIRFilter). However, when exactly replicating Python code that uses optax, you may need to follow the functional programming pattern where the state is explicitly passed around. One concrete difference is that providing data to a step node for the first time will first initialize it, and then also update it on the given data before returning it and the processed outputs, whereas "Init Step" will return just the initial state before the first update was applied. Note that the Call function, when given a graph that implements an optimizer step, can stand in for StepApply. The input state then is the graph itself, which is wired into the "function" input of Call, and the output state is the "snapshot" output of Call, which is the graph after it has processed the inputs. The initial state can likewise be obtained by using Call on some initial data (i.e., gradients), and keeping its "snapshot" output as the initial state. Both the StepApply and Call methods can also be used with the "Gradient" node to differentiate the optimizer with respect to one or more of its hyper-parameters (e.g., learning rate) at some data point; for this, one would create a graph that accepts an initial state via a GraphPort and which returns a measure of the optimizer's performance along with the updated state (Call "snapshot" output).
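The functional pattern this mirrors is the standard optax init/update loop; a minimal Python sketch follows (assuming the optax and jax packages are available; this is the Python analogue of the pattern, not NeuroPype graph wiring):

```python
import jax.numpy as jnp
import optax

params = {"w": jnp.ones((3,)), "b": jnp.zeros(())}
grads = {"w": jnp.array([0.1, -0.2, 0.3]), "b": jnp.array(0.5)}

opt = optax.adam(learning_rate=1e-3)
state = opt.init(params)                            # "Init Step": initial state before any update
updates, state = opt.update(grads, state, params)   # "Apply Step": transform gradients, advance state
params = optax.apply_updates(params, updates)       # apply the resulting updates to the weights
```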
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Step to apply.- verbose name: Step
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the step.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
StepInit
Get initial state for an optimization step.
This node, along with "Apply Step" (StepApply) provides a purely functional programming interface to the optimization steps. This is an alternative way of using steps: the simplest way of using a step is to wire data into a step node, and it will behave statefully like any other filter node (e.g., FIRFilter). However, when exactly replicating Python code that uses optax, you may need to follow the functional programming pattern where the state is explicitly passed around. One concrete difference is that providing data to a step node for the first time will first initialize it, and then also update it on the given data before returning it and the processed outputs, whereas "Init Step" will return just the initial state before the first update was applied. The Init/Apply paradigm makes state explicit, but note that the same can also be accomplished with the plain Call node (see StepApply for a discussion).
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Step to initialize.- verbose name: Step
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
params
Example gradients or weights.- verbose name: Params
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Initialized state.- verbose name: State
- default value: None
- port type: DataPort
- value type: dict (can be None)
- data direction: OUT
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
TransposedConvolutionLayer
A 1/2/3/N-D "transposed" (upscaling) convolution layer.
See the "Convolution Layer" node for a general overview of regular convolution operations. In contrast to regular convolution, the upscaling (aka "transposed" convolution or "deconvolution") reverses the interpretation of strides and padding, and generates an output array that is correspondingly larger than the input array rather than smaller. Specifically, the output size is the size that would be necessary to generate the given input size when applying the given kernel size, strides, and padding using a normal "forward" convolution. This is useful for, e.g., upsampling a feature map to a higher resolution or reversing the effect (on sizes) of an equivalent downsampling convolution operation. Kernel dilation is not supported in this context. Note that this is not a true "deconvolution" but merely a special case of a convolution with reversed padding and fractional (1/N) strides. This node will not subsample the spatial input axes, but instead rewrite them with dummy data.
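The output-size relationship can be sketched as follows. This is a hedged illustration of the commonly used formulas for a single swept axis; the actual node derives sizes per axis from the kernel shape, strides, and padding.

```python
def conv_transpose_output_length(n_in, kernel, stride, padding="valid"):
    """Length of one spatial axis after a transposed convolution (common convention)."""
    if padding == "same":
        return n_in * stride
    return (n_in - 1) * stride + kernel   # 'valid': inverts a forward 'valid' convolution

# a forward 'valid' conv with kernel 3 and stride 2 maps length 21 -> 10; the transpose maps 10 -> 21
assert conv_transpose_output_length(10, kernel=3, stride=2) == 21
```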
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
mask
Mask to apply to the weights.- verbose name: Mask
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
sweep_axes
List and order of axes over which the convolution filter kernel is swept. If the input data are packets, this determines the order of these axes in the output data, and the order of the axes in the kernel (for plain array inputs, see end of tooltip). A kernel is a learned array that is shifted over all possible positions in the data (optionally with step size in each dimension, and optionally going past the edges of the data by half the kernel size if padding=same). For each position, the kernel is multiplied by the data in the region covered by the kernel and the resulting (elementwise) product is integrated (summed) to produce a single output score (a measure of match between the kernel and the data in that region). If the input data has an extra feature axis, the kernel will usually have an implicit extra axis to hold weights for each input feature. If the data has an instance axis, each instance will be processed separately (using the same kernels). If the input data are plain arrays, this merely determines the number of spatial axes and the names are just mnemonic and not otherwise used. This can alternatively be given as just a number to set the number of spatial dimensions, corresponding to the N in N-D convolution; for packet data, this will resolve to the last N axes in the data that are neither feature nor instance axes. This parameter is not limited to the predefined options.- verbose name: Axes To Sweep Kernel Over (Convolve)
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
output_features
Number of filter kernels (and features) to learn. This value generally determines the length of the feature axis in the output data (each kernel yields one output feature, representing the raw feature detection score produced by that kernel). In classic deep learning, this is also called the number of output channels -- analogous to RGB color channels in a raw image, or generally meant to be an unspecific feature axis in a data array (not to be confused with spatial channels in multi-channel time series, which are more commonly treated like the vertical axis in 2D image data).- verbose name: Number Of Filters To Learn
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
kernel_shape
Shape of the convolution filter kernel. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the kernel is the same size along all of the given spatial dimensions. Note: if you make the kernel as large as the data along some axis, there is only a single valid position for the kernel along that axis (if padding=valid), and consequently the result is an inner product between the data and the kernel, or a matrix multiplication when more kernels are learned. Conversely, if you give the kernel a shape of 1 along an axis, the result is equivalent to processing each element along that axis separately using the same kernel. The latter is the same as not listing the axis in sweep axes, except that the output axis order can be controlled when specifying a 1-sized axis in sweep_axes. Which is more efficient depends on the implementation.- verbose name: Kernel Shape
- default value: [3]
- port type: ListPort
- value type: list (can be None)
-
strides
Step size with which the kernel is swept over the data. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the same step size is used along all of the specified spatial dimensions. A step size greater than 1 means that the kernel will be shifted by this amount between successive positions; as a result, the amount of compute is lower by this factor, and the output data along this axis will also be shorter by this factor (matching the number of positions at which the kernel is applied).- verbose name: Step Size (Strides)
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
padding
Padding strategy for the data. This can be either 'valid' or 'same', or a custom list of padding amounts. 'valid' means no padding (i.e., the kernel will not run off the edges of the data, but the output data will be shorter along each axis according to the number of valid positions of the kernel along that axis), and 'same' means that the output will have the same shape as the input (aside from dilation and striding). Can be customized by giving a list of [(low, high), ...] pairs, where low is the padding to apply before the data along each axis, and high is the padding to apply after the data along each axis. low and high can also be negative to trim the data instead of padding. If a single [(low, high)] pair is given, it is applied to all axes.- verbose name: Padding
- default value: valid
- port type: ComboPort
- value type: str (can be None)
-
with_bias
Whether to include a bias term. If given, then for each output feature, a bias term is learned and added to the output of the convolution. This increases the flexibility of the learned model, but note that the result is no longer strictly equivalent to e.g., a learned FIR filter applied to time-series data or a learned spatial filter / matrix multiplication applied to spatial data.- verbose name: Learn Bias Term(S)
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: transposed_conv
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
TruncatedNormalInitializer
An initializer that draws initial weights from a truncated Gaussian distribution with a given mean and standard deviation.
The truncation is always at ±2 standard deviations.
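A hedged sketch of drawing such weights with SciPy (the truncation bounds of the truncnorm distribution are expressed in units of the standard deviation; the helper name and shapes are illustrative only):

```python
import numpy as np
from scipy.stats import truncnorm

def truncated_normal_init(shape, mean=0.0, stddev=1.0, seed=0):
    """Draw initial weights from a normal(mean, stddev) truncated at +/-2 standard deviations."""
    rng = np.random.default_rng(seed)
    return truncnorm.rvs(-2, 2, loc=mean, scale=stddev, size=shape, random_state=rng)

w = truncated_normal_init((64, 32), mean=0.0, stddev=0.05)
```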
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
mean
Distribution mean.- verbose name: Mean
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
stddev
Distribution standard deviation.- verbose name: Stddev
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
TrustRatioScalingStep
A chainable step that scales gradients by the trust ratio (ratio of parameter norm to update norm).
This is the underlying raw scaling rule of the Fromage, LARS, and LAMB optimizers, and not an end-to-end optimizer by itself. See also You et al. (2020) for an analysis.
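The scaling rule can be sketched as follows. This is a simplified per-parameter-group illustration in the spirit of LARS/LAMB; exact details such as norm handling and safeguards vary between implementations, so treat it as an assumption-laden sketch rather than the node's implementation.

```python
import numpy as np

def trust_ratio_scale(grad, weights, trust_coefficient=1.0, min_norm=0.0, epsilon=0.0):
    """Scale a gradient by trust_coefficient * ||weights|| / ||grad|| (with safeguards)."""
    w_norm = np.linalg.norm(weights)
    g_norm = max(np.linalg.norm(grad), min_norm)
    ratio = trust_coefficient * w_norm / (g_norm + epsilon) if g_norm > 0 else 1.0
    return ratio * grad

scaled = trust_ratio_scale(np.array([0.1, -0.2]), np.array([1.0, 2.0]), trust_coefficient=0.9)
```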
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
min_norm
Minimum gradient norm. This can be used to avoid dividing by zero when rescaling; small gradients are rescaled to at least this value.- verbose name: Min Norm
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
trust_coefficient
Trust coefficient. A multiplier applied to the trust ratio, can be used to scale the update size.- verbose name: Trust Coefficient
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value added to the denominator to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
UniformInitializer
An initializer that draws initial weights from a uniform distribution with a given minimum and maximum.
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
min
Minimum value of the uniform distribution.- verbose name: Min
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
max
Maximum value of the uniform distribution.- verbose name: Max
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
VarianceScalingInitializer
Initialize weights from a distribution whose scale is adapted to the shape of the initialized array.
This can be configured to obtain a variety of standard initializers, including Glorot (scale=1, mode=fan_avg), LeCun (scale=1, mode=fan_in), and He (scale=2, mode=fan_in), in combination with distribution=uniform or truncated_normal.
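A hedged sketch of the scale computation (following the common variance-scaling convention also described under the distribution parameter below; the helper name, fan computation, and shapes are illustrative assumptions):

```python
import numpy as np

def variance_scaling_init(shape, scale=1.0, mode="fan_in", distribution="truncated_normal", seed=0):
    """Draw weights whose variance is adapted to the fan-in/fan-out of the array."""
    fan_in, fan_out = int(np.prod(shape[:-1])), shape[-1]   # default: all but the last axis count as inputs
    n = {"fan_in": fan_in, "fan_out": fan_out, "fan_avg": (fan_in + fan_out) / 2}[mode]
    s = scale / n                                           # target variance
    rng = np.random.default_rng(seed)
    if distribution == "uniform":
        limit = np.sqrt(3 * s)
        return rng.uniform(-limit, limit, size=shape)
    stddev = np.sqrt(s)          # a true truncated normal would additionally correct for the truncation
    return rng.normal(0.0, stddev, size=shape)

w = variance_scaling_init((128, 64), scale=2.0, mode="fan_in")   # He-style initialization
```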
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
scale
Scale to multiply the variance by. This is most commonly 1 (e.g., Glorot and LeCun initialization), but can also be 2 (e.g., for He initialization).- verbose name: Scale Multiplier
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
mode
Scale the variance based on one of: the number of input units (fan_in, used in LeCun or He initialization), number of output units (fan_out), or the average of the number of input and output units (fan_avg, used in Glorot initialization).- verbose name: Scale By
- default value: fan_in
- port type: EnumPort
- value type: str (can be None)
-
distribution
Distribution from which to draw the initial weights. Typically, this is either truncated normal or uniform. The parameters of the distribution are computed as follows. First, the variance s is computed as s = scale / n, where n is the number of input units for fan_in, the number of output units for fan_out, or the average of the number of input and output units for fan_avg. Then the mean is 0 and the standard deviation is sqrt(s) for normal, and adj*sqrt(s) for truncated normal, where adj is an adjustment factor to compensate for the truncation. For uniform, the bounds are [-sqrt(3*s), sqrt(3*s)].- verbose name: Distribution Type
- default value: truncated_normal
- port type: EnumPort
- value type: str (can be None)
-
fan_in_axes
Axes to use for computing the number of input units. If None, all but the last dimension are used (default for e.g., convolutional kernels). Otherwise this may need to be set to a list of axis indices (counting from 0). The number of output units is always the remaining axes.- verbose name: Fan-In Axis Indices
- default value: None
- port type: ListPort
- value type: list (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
WarmupCosineDecaySchedule
A linear warmup followed by cosine decay schedule.
Similarly to the "Linear Warmup Exponential Decay" Schedule, this schedule begins with a linear ramp from the initial value to the peak value over the course of warmup_steps steps, and then follows a cosine function (from initial peak to first trough) down to the final value over the course of decay_steps steps. This is one of the most robust learning rate schedules and represents the state of the art along with the linear warmup then exponential decay schedule. However, as for all schedules, note that none of the defaults should be used without either making good educated guesses, experimentation, or consulting the literature that you are aiming to replicate. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
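A sketch of the schedule's shape (a common formulation of linear warmup followed by cosine decay; the exact clamping behavior and default values here are assumptions, not the node's exact implementation):

```python
import math

def warmup_cosine_decay(step, init_value=0.0, peak_value=1e-3, final_value=1e-5,
                        warmup_steps=100, decay_steps=1000):
    """Linear ramp from init_value to peak_value, then a half-cosine down to final_value."""
    if step < warmup_steps:
        return init_value + (peak_value - init_value) * step / warmup_steps
    t = min((step - warmup_steps) / decay_steps, 1.0)    # progress through the decay phase
    cos = 0.5 * (1 + math.cos(math.pi * t))              # goes from 1 to 0 over the decay
    return final_value + (peak_value - final_value) * cos

lr = [warmup_cosine_decay(s) for s in range(0, 1100, 100)]
```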
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule.- verbose name: Initial Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
peak_value
Peak parameter value. This is the value at the peak following the initial warmup, before it is lowered again following a cosine function.- verbose name: Peak Value
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
final_value
Final parameter value. Once the schedule reaches this value, it will remain at this value for the remainder of the optimization process.- verbose name: Final Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
warmup_steps
Number of steps over which to ramp up from the initial value to the peak value. After this, the parameter is lowered again following the shape of a cosine function down to the desired final value.- verbose name: Warmup Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
decay_steps
The number of steps over which the cosine decay takes place. This is a soft transition following a raised-cosine function from the peak value down to the final value.- verbose name: Decay Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process, but note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
WarmupExponentialDecaySchedule
A linear warmup followed by an exponential decay parameter schedule.
This schedule is a combination of a linear ramp (from initial value to peak value over the first warmup_steps), followed by an optional constant plateau (if plateau_steps > 0), followed by an exponential decay (from peak value to final value at a rate governed by decay_rate and decay_steps, where decay_steps is the number of steps over which the value decays by a factor of decay_rate). This is one of the most robust schedules for training neural networks, and is the current state of the art, but note that none of the default values should be considered anywhere near optimal for a given setup -- these are just example settings that help show how the schedule is configured. The linear warmup ensures that the model does not diverge during early training where weights can be assumed to be random, the peak portion helps the model escape local minima (as in simulated annealing), and the exponential decay phase helps the model converge to a good solution with high precision. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
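A sketch of the described shape (hedged; the staircase variant, clipping details, and default values here are assumptions and may differ from the node's implementation):

```python
def warmup_exponential_decay(step, init_value=0.0, peak_value=1e-3, final_value=1e-5,
                             warmup_steps=100, plateau_steps=0, decay_steps=10,
                             decay_rate=0.9, staircase=False):
    """Linear warmup, optional plateau, then exponential decay clipped at final_value."""
    if step < warmup_steps:
        return init_value + (peak_value - init_value) * step / warmup_steps
    since_decay = step - warmup_steps - plateau_steps
    if since_decay <= 0:
        return peak_value                                   # constant plateau at the peak
    exponent = since_decay // decay_steps if staircase else since_decay / decay_steps
    return max(peak_value * decay_rate ** exponent, final_value)   # decay_rate < 1: final value is a floor

lr = [warmup_exponential_decay(s, plateau_steps=50) for s in range(0, 500, 50)]
```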
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. Note that the default may be application specific. The parameter is then linearly ramped up over warmup_steps and held at the peak value for plateau_steps.- verbose name: Initial Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
peak_value
Peak parameter value. This is the value at the peak plateau of the warmup schedule, before it is annealed again exponentially.- verbose name: Peak Value
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
warmup_steps
Number of steps over which to ramp up from the initial value to the peak value. After this, the parameter is held constant for plateau_steps steps, after which it is exponentially annealed until it reaches the final value.- verbose name: Warmup Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
plateau_steps
Number of steps for which the parameter is held at the peak value after the warmup, before the transition from the peak value to the final value begins.- verbose name: Plateau Steps
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
decay_steps
The number of steps over which the parameter decays by a factor of decay_rate. Note that this is not the total duration of the decay portion; the decay only finishes once the value has reached the specified final value. The basic formula is value = peak_value * decay_rate ^ (count_since_decay_begin / decay_steps), followed by clipping according to final_value.- verbose name: Decay Steps
- default value: 10
- port type: IntPort
- value type: int (can be None)
-
decay_rate
Decay rate. The parameter value decays by this factor for every decay_steps. This can be between 0 and 1 for a regular decay schedule, or greater than 1 for an exponential growth schedule.- verbose name: Decay (Or Growth) Rate
- default value: 0.9
- port type: FloatPort
- value type: float
-
final_value
Final parameter value. Once the schedule reaches this value, it will remain at this value for the remainder of the optimization process. (If the decay rate is < 1, this is effectively a lower bound on the parameter value; if the decay rate is > 1, it is an upper bound.)- verbose name: Final Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
staircase
If True, the parameter value is decayed in a staircase fashion, i.e., the parameter is changed by exactly a factor of decay_rate every decay_steps steps. If False, the parameter value is decayed in a continuous fashion according to the formula given in the docs for decay_steps.- verbose name: Staircase
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When the schedule defines an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps performed by some node, you may normalize your schedule to e.g. 1000 steps and then wire a formula that divides the steps performed by that process by 1000 into this port.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
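The following is a minimal Python sketch of how the value of this schedule could be computed at a given step count, assuming the decay starts from the peak value as described above; the parameter names mirror the node's ports, but this is an illustration rather than the node's actual implementation:

    import math

    def warmup_exponential_decay(step, init_value=0.0, peak_value=1.0,
                                 warmup_steps=100, plateau_steps=0,
                                 decay_steps=10, decay_rate=0.9,
                                 final_value=0.0, staircase=False,
                                 step_multiplier=1.0):
        # Uniformly speed up or slow down the schedule (see step_multiplier).
        step = step * step_multiplier
        if step < warmup_steps:
            # Linear ramp from init_value to peak_value.
            return init_value + (peak_value - init_value) * (step / warmup_steps)
        step -= warmup_steps
        if step < plateau_steps:
            # Hold at the peak value for plateau_steps.
            return peak_value
        step -= plateau_steps
        # Exponential decay from the peak value; one factor of decay_rate
        # per decay_steps, either continuously or in staircase fashion.
        exponent = math.floor(step / decay_steps) if staircase else step / decay_steps
        value = peak_value * decay_rate ** exponent
        # final_value acts as a lower bound for decay_rate < 1
        # (and as an upper bound for decay_rate > 1).
        if decay_rate < 1.0:
            return max(value, final_value)
        return min(value, final_value)

With the defaults above (warmup over 100 steps, no plateau, decay by 0.9 every 10 steps), the value ramps to 1.0 by step 100 and then decays toward 0.0, e.g. to roughly 0.9 at step 110 and 0.35 at step 200.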
WeightDecayStep
Chainable step that applies weight decay (analogous to L2 regularization) to the parameters.
This is typically applied after gradients have been rescaled based on criteria such as past gradient updates, but before scaling by the learning rate, so that the learning rate does not change the ratio of the weight decay to the gradient update. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed. A minimal sketch of this transformation is given after the port list below.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weight_decay_mask
Mask structure for the weight decay.- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
decay_rate
Weight decay rate. This is typically a small value, such as 1e-4.- verbose name: Decay Rate
- default value: 0.0001
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
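As a rough illustration, the transformation performed by this step could be sketched in Python as follows, assuming (purely for brevity) that the gradients, weights, and optional mask are flat dicts keyed by parameter name rather than the arbitrary nested structures the node actually handles:

    def weight_decay_step(gradients, weights, decay_rate=1e-4, mask=None):
        # Add decay_rate * weight to each gradient entry; where a mask is
        # given, only entries flagged True are decayed.
        updated = {}
        for name, g in gradients.items():
            decay = decay_rate * weights[name]
            if mask is not None and not mask[name]:
                decay = 0.0
            updated[name] = g + decay
        return updated

The decayed gradients are then scaled by the learning rate and applied to the weights as usual (e.g. via the Add node or a StepSolver).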
YogiStep
The Yogi optimizer step.
Based on Zaheer et al., 2018, Yogi is a modification of the popular Adam optimizer, which addresses a convergence issue of Adam in which the effective learning rate can increase over time, potentially causing the optimization to blow up. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node, which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. A minimal sketch of the underlying update rule is given after the port list below.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value added to the denominator outside the square root to avoid division by zero when rescaling.- verbose name: Epsilon
- default value: 0.001
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
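For reference, a minimal NumPy sketch of the Yogi update rule as described in Zaheer et al., 2018 (and as commonly implemented) is given below; this is an illustration rather than the node's actual code, and the bias-correction terms follow the Adam convention:

    import numpy as np

    def yogi_update(g, m, v, t, learning_rate=1e-3,
                    beta1=0.9, beta2=0.999, epsilon=1e-3):
        # First-moment estimate, as in Adam. t is the 1-based step count.
        m = beta1 * m + (1.0 - beta1) * g
        # Yogi's second-moment update: unlike Adam, the estimate can change
        # by at most (1 - beta2) * g**2 per step in either direction.
        g2 = g ** 2
        v = v - (1.0 - beta2) * np.sign(v - g2) * g2
        # Bias-corrected estimates (Adam convention).
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # Epsilon is added to the denominator outside the square root.
        update = -learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
        return update, m, v

Compared to Adam, the only change is the second-moment update, which limits how quickly the estimate can change and thereby keeps the effective learning rate from growing uncontrollably.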