Module: deep_learning
Deep learning nodes.
These nodes can be used to implement deep learning workflows, including training and inference. The main node here is the Deep Model node, which is a simple "one-stop shop" for controlling the training and use of a neural network in a workflow, much like conventional machine learning nodes such as LinearDiscriminantAnalysis. This node receives a network (composed of, among others, Layer nodes and other math operations) and may also receive an optimizer step. However, one may also build deep learning workflows from scratch using the Net nodes, Step nodes, and the Gradient and/or Jacobian nodes (found in the optimization category); a conceptual sketch of this pattern is shown after the list below. The following sets of nodes are provided:
- Nodes ending in Layer: the traditional neural network layers, which are characterized by containing implicit trainable parameters.
- Nodes ending in Initializer: these can be used to initialize the parameters of a layer, but they are less frequently needed in practice, since the initializer can also be specified from a drop-down menu per node (as a string in Python).
- Nodes ending in Norm: normalization stages that can be interspersed between layers. Some have (non-trainable) state, which needs to be explicitly managed when using low-level optimization primitives.
- Nodes starting with Net: the high-level network management nodes, which act on a whole network module (i.e., a set of layers). These are used to define a module, materialize or share it in a larger computational graph, and to obtain initialization and forward-pass functions to perform training.
- Nodes ending in Step: the optimization steps that can be used to train a network. There are two categories: end-to-end optimizer steps such as AdamStep, and partial gradient-processing steps such as CenteringStep.
- Nodes with Core in the name: these pertain to recurrent cores, that is, the portions of networks that receive (part of) their past output as input.
- Nodes ending in Schedule: used to schedule the learning rate and other hyperparameters during training, which are typically annealed.
- Other nodes: stateless (pure math) operations that are frequently used in neural networks, e.g., pooling, activation functions, gradient, and so forth.
Note that many other nodes from other categories, especially any mathematical operation nodes that have a "backend" parameter, can be used in neural nets.
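To make the from-scratch workflow concrete, the following is a minimal sketch in plain JAX/Python of the pattern that the Net, Gradient, Step, and Add nodes implement graphically; the function and variable names are purely illustrative and do not refer to the node API.

```python
import jax
import jax.numpy as jnp

def forward(weights, x):
    # a tiny one-layer "network", standing in for a module built from Layer nodes
    return jnp.tanh(x @ weights["w"] + weights["b"])

def loss(weights, x, y):
    return jnp.mean((forward(weights, x) - y) ** 2)

# what an Initializer node would produce
weights = {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}
x, y = jnp.ones((8, 4)), jnp.ones((8, 1))

grads = jax.grad(loss)(weights, x, y)                                   # role of the Gradient node
updates = jax.tree_util.tree_map(lambda g: -0.01 * g, grads)            # role of a Step node (already negated)
weights = jax.tree_util.tree_map(lambda w, u: w + u, weights, updates)  # role of the Add node
```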
AMSGradStep
The AMSGrad optimizer step.
Based on Reddi et al., 2018, AMSGrad is a modification of the popular Adam optimizer that restores convergence guarantees by keeping a long-term memory of past gradients. If Adam fails on a problem (e.g., diverges or explodes), AMSGrad is worth trying. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node, which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
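For reference, a minimal NumPy sketch of the AMSGrad update rule for a single parameter tensor is shown below (one common formulation with Adam-style bias correction); the actual node operates on whole gradient structures and manages its state internally.

```python
import numpy as np

def amsgrad_update(g, m, v, vhat, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g        # first-moment estimate
    v = beta2 * v + (1 - beta2) * g**2     # second-moment estimate
    vhat = np.maximum(vhat, v)             # long-term memory: never decreases
    m_hat = m / (1 - beta1**t)             # bias correction
    update = -lr * m_hat / (np.sqrt(vhat) + eps)
    return update, m, v, vhat              # 'update' is applied to the weights with Add
```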
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
Activation
Apply an elementwise non-linear neural network activation function to the given data.
This node supports a number of commonly used activation functions; which one performs best on the given data may have to be determined by experimentation. The most traditional functions are sigmoid and tanh, and the former is commonly used in the output layer of a network to produce class probabilities. The relu (rectified linear unit) function is extremely compute-efficient (it is max(0,x)) and can work quite well in hidden layers, but it has a number of potential shortcomings that limit its use as a general-purpose activation function (some of these are partially addressed in functions like leaky_relu and relu6). Some of the more recently developed activation functions are designed to work well with deep networks and are smoother than relu; these include silu (aka swish), celu, selu, gelu, and softplus, and one can generally not go wrong with any of these. The softsign and elu functions are negative for negative inputs, which can have interesting effects on the network's learning behavior. The hard_* functions are simple (3-part) piecewise linear approximations of the activation functions they model. The 'linear' option is a pass-through that bypasses the activation function. A few functions have a tunable parameter, which can be specified via the alpha parameter.
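The underlying functions correspond to those documented at the jax.nn link below; outside of a graph they can be compared directly on the same input, for example:

```python
import jax.numpy as jnp
from jax import nn

x = jnp.linspace(-3.0, 3.0, 7)
print(nn.relu(x))                              # max(0, x)
print(nn.leaky_relu(x, negative_slope=0.01))
print(nn.silu(x))                              # aka swish: x * sigmoid(x)
print(nn.gelu(x))
print(nn.celu(x, alpha=1.0))                   # alpha corresponds to this node's alpha port
```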
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
activation
Activation function to apply. See https://jax.readthedocs.io/en/latest/jax.nn.html#activation-functions for details.- verbose name: Activation Function
- default value: relu
- port type: EnumPort
- value type: str (can be None)
-
alpha
Alpha value for elu, leaky_relu, and celu.- verbose name: Alpha (Elu, Leaky_relu, Celu)
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdaBeliefStep
The AdaBelief optimizer step.
Based on Zhuang et al, 2020. This is a modified version of the popular Adam optimizer, and focuses on fast convergence, generalization, and stability. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
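For reference, a minimal NumPy sketch of the AdaBelief rule is shown below; it matches Adam except that the second moment tracks the deviation of the gradient from its running mean (the "belief"), i.e., (g - m)^2 rather than g^2. The actual node manages its state and bias correction internally.

```python
import numpy as np

def adabelief_update(g, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-16):
    m = beta1 * m + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * (g - m)**2 + eps   # deviation from the running mean
    m_hat = m / (1 - beta1**t)
    s_hat = s / (1 - beta2**t)
    return -lr * m_hat / (np.sqrt(s_hat) + eps), m, s
```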
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. Must be provided unless a learning rate schedule is wired in, in which case this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-16
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 1e-16
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdaFactorStep
The AdaFactor optimizer step.
Based on Shazeer and Stern, 2018, this is an adaptive optimizer that is designed for fast training of large-scale networks (it may be overkill for the small networks usually used with biosignals). The approach saves memory by using a factored representation of the second-moment gradient estimates; the factorization only applies to matrix/tensor-shaped parameters that meet a minimum axis size (see the min_size_to_factor setting). Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed.
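The memory saving comes from the factored second-moment representation; a minimal NumPy sketch of this idea for a single matrix-shaped parameter is given below (update scaling, clipping, and learning-rate handling are omitted).

```python
import numpy as np

def factored_rms(g, row_acc, col_acc, decay=0.8, eps=1e-30):
    sq = g**2 + eps
    row_acc = decay * row_acc + (1 - decay) * sq.mean(axis=1)   # one value per row
    col_acc = decay * col_acc + (1 - decay) * sq.mean(axis=0)   # one value per column
    v_approx = np.outer(row_acc, col_acc) / row_acc.mean()      # rank-1 estimate of the full moment
    return g / np.sqrt(v_approx), row_acc, col_acc
```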
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
weight_decay_mask
Mask structure for the weight decay (optional).- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
learning_rate
Learning rate. Must be provided unless a learning rate schedule is wired in, in which case this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
min_size_to_factor
Only factor statistics if two dimensions of some weight are larger than this value.- verbose name: Min Size To Factor
- default value: 128
- port type: IntPort
- value type: int (can be None)
-
decay_rate
Controls second-moment exponential decay schedule.- verbose name: Decay Rate
- default value: 0.8
- port type: FloatPort
- value type: float (can be None)
-
decay_offset
Starting step when the fine-tuning phase begins.- verbose name: Decay Offset
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
multiply_by_parameter_scale
Scale learning rate by parameter norm. If False, provided learning rate is absolute step size.- verbose name: Multiply By Parameter Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
clipping_threshold
Optional gradient clipping threshold (norm). If set to None, this is disabled. This is per parameter vector/matrix.- verbose name: Optional Clipping Threshold
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
momentum
Optional exponential decay rate for momentum.- verbose name: Optional Momentum
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
momentum_precision
Numeric precision for the momentum buffer. Keep resolves to the precision of the inputs.- verbose name: Momentum Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
weight_decay
Optional rate at which to decay weights. This is usually a small number, like e.g., 1e-4.- verbose name: Optional Weight Decay
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Regularization constant for RMS gradient.- verbose name: Epsilon
- default value: 1e-30
- port type: FloatPort
- value type: float (can be None)
-
factored
Whether to use factored second-moment estimates. This can be turned off to disable the factorization (e.g., to mimic a simpler optimizer).- verbose name: Factored
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdagradStep
The Adagrad optimizer step.
Based on Duchi et al, 2011, Adagrad is one of the early successful optimizers for deep learning, and anneals the learning rate for each parameter over the course of training. One issue is that the learning rate eventually becomes so small that the optimizer stops learning. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
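For reference, a minimal NumPy sketch of the Adagrad rule is shown below; because the accumulator only grows, the effective per-parameter step size shrinks over the course of training, which is the behavior noted above.

```python
import numpy as np

def adagrad_update(g, acc, lr=1e-3, eps=1e-7):
    acc = acc + g**2                # acc starts at initial_accumulator_value
    return -lr * g / np.sqrt(acc + eps), acc
```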
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
initial_accumulator_value
Initial value for the accumulator.- verbose name: Initial Accumulator Value
- default value: 0.1
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-07
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdamStep
The Adam optimizer step.
Based on Kingma and Ba, 2014, Adam is one of the most popular optimizers for deep learning due to its effectiveness given a wide variety of network topologies and training regimes, making it a good initial choice. Note that for best accuracy, the learning rate is often adapted using a schedule, which all optimizers support via the learning_rate_schedule port. Adam can suffer from failure to converge in some cases, which is addressed by some close relatives like AMSGrad and Yogi. Another potential issue is instability or large variance during early training, which can be addressed by using a warmup schedule for the learning rate, or by using a different optimizer, such as Novograd and RAdam. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
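For reference, a minimal NumPy sketch of the bias-corrected Adam rule is shown below; when a schedule is wired in, the constant lr is simply replaced by a step-dependent value lr(t).

```python
import numpy as np

def adam_update(g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    return -lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```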
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdamWStep
The AdamW optimizer step (adam with weight decay).
Based on Loshchilov et al, 2019 (but see the note about the weight decay parameter), AdamW is a variant of the popular Adam optimizer that additionally regularizes weights to have small l2 norm, which helps with generalization, especially when the amount of training data is low relative to the number of parameters. This is analogous to l2 regularization terms on the weights, but works better with adaptive gradient algorithms like Adam. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed.
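A minimal NumPy sketch of the decoupled weight decay is given below, following the PyTorch/Optax convention described under the weight_decay port (the decay term is multiplied by the learning rate and added to the Adam update, rather than being folded into the gradient).

```python
import numpy as np

def adamw_update(g, w, m, v, t, lr=1e-3, weight_decay=1e-4,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    adam_term = m_hat / (np.sqrt(v_hat) + eps)
    return -lr * (adam_term + weight_decay * w), m, v   # decay acts on the weights themselves
```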
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
weight_decay_mask
Mask structure for the weight decay.- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
weight_decay
Strength of the weight decay. This is multiplied by the learning rate as in e.g., PyTorch and Optax, but differs from the paper, where it is only multiplied by the schedule multiplier but not the base learning rate.- verbose name: Weight Decay
- default value: 0.0001
- port type: FloatPort
- value type: float (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdamaxStep
The Adamax optimizer step.
Based on Kingma and Ba, 2014, Adamax is a variant of the popular Adam optimizer that uses the infinity norm (max norm) for scaling, and which is therefore more conservative at controlling the gradients for unstable subsets of the weights. See also documentation on Adam for some additional considerations when using this family of optimizers. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdamaxWStep
The AdamaxW optimizer step (adamax with weight decay).
Based on Loshchilov et al, 2019 (but see the note about the weight decay parameter), AdamaxW is a variant of the Adamax optimizer that additionally regularizes weights to have small l2 norm, which helps with generalization, especially when the amount of training data is low relative to the number of parameters. This is analogous to l2 regularization terms on the weights, but works better with adaptive gradient algorithms like Adam. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
weight_decay_mask
Mask structure for the weight decay.- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling. Note that larger epsilon values have been explored in the literature.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
weight_decay
Strength of the weight decay. This is multiplied by the learning rate as in e.g., PyTorch and Optax, but differs from the paper, where it is only multiplied by the schedule multiplier but not the base learning rate.- verbose name: Weight Decay
- default value: 0.0001
- port type: FloatPort
- value type: float (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdditiveNoiseAugmentation
Add univariate noise drawn from a given distribution to the data.
This simulates random noise that is drawn independently and identically for every channel, sample, instance, and so forth. A reasonable starting point is a normal distribution with a standard deviation of 25uV, but check the unit of your data to be sure that it is matched appropriately. As a special case, if your data is standardized or whitened, the noise values should be divided by at least 100. The optimal noise likely depends to a large extent on the nature and amount of available training data, so be prepared to experiment with a range of at least 10-50uV. Like most augmentation nodes, this node does not by itself amplify the amount of data, which therefore has to be done beforehand using, for example, the RepeatAlongAxis node. Also as with most augmentation nodes, you need to wire in a random seed (for example using the DrawRandomSeed node, see docs for more info) to ensure reproducibility. You also need to wire a distribution to the dist input to specify the distribution of interest (e.g., NormalDistribution).
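A minimal NumPy sketch of the effect on the data is shown below (the node itself takes its distribution from the dist input and its seed from the seed input; the shapes and units here are illustrative).

```python
import numpy as np

rng = np.random.default_rng(12345)           # role of the wired-in random seed
data = rng.standard_normal((32, 8, 250))     # e.g., (trials, channels, samples), assumed in uV
noise_sd = 25.0                              # ~25 uV is a reasonable starting point
augmented = data + rng.normal(0.0, noise_sd, size=data.shape)
```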
Version 0.8.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: Packet (can be None)
- data direction: INOUT
-
seed
Random seed for deterministic results.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
dist
Distribution to use.- verbose name: Dist
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
bypass
Whether to bypass the augmentation and pass the input data through unchanged.- verbose name: Bypass
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AdditiveNoiseStep
Chainable step that adds Gaussian noise to the gradients.
This can improve convergence and mitigate overfitting in some deep networks.
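A minimal NumPy sketch of the annealed gradient noise is shown below: Gaussian noise whose variance starts at eta and decays as (1 + t)^-gamma over successive updates.

```python
import numpy as np

def noisy_gradients(g, t, rng, eta=0.01, gamma=0.55):
    sigma = np.sqrt(eta / (1.0 + t)**gamma)   # standard deviation at update t
    return g + rng.normal(0.0, sigma, size=g.shape)
```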
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
eta
Initial variance for the Gaussian noise added to gradients.- verbose name: Eta
- default value: 0.01
- port type: FloatPort
- value type: float (can be None)
-
gamma
A parameter controlling the annealing of noise over time, the variance decays according to (1+t)^-gamma.- verbose name: Gamma
- default value: 0.55
- port type: FloatPort
- value type: float (can be None)
-
seed
A seed for the pseudo-random number generation.- verbose name: Seed
- default value: 12345
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
AggregateStep
A modifier for the step node that is wired into it, which accumulates k successive gradient evaluations and passes them to the optimizer as one summed (or averaged) update.
This can be used for things like multi-task learning (if the k successive gradients stem from multiple tasks that are visited round-robin), or for varying the batch size over the course of training (since k can be controlled by a schedule), or for simulating large-batch training with batch sizes that otherwise would not fit in memory. Note this is not simply chained after another step using the ChainedStep node, but rather it is a modifier of the step node that is wired into it.
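A minimal sketch of the accumulate-then-step pattern is given below; the inner_step function stands in for the step node wired into modify_step, and gradients are treated as plain arrays for simplicity.

```python
import numpy as np

def aggregate(grad_evaluations, inner_step, reduction="sum"):
    acc = sum(grad_evaluations)               # k successive gradient evaluations
    if reduction == "mean":
        acc = acc / len(grad_evaluations)
    return inner_step(acc)                    # one update produced by the wired-in step
```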
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
modify_step
Step to modify.- verbose name: Modify Step
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
num_ministeps_schedule
Optional schedule for the k parameter.- verbose name: Num Ministeps Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
num_ministeps
Number of successive updates to accumulate. If not provided, a schedule must be wired in. This is the k parameter in the documentation.- verbose name: Num Ministeps
- default value: None
- port type: IntPort
- value type: int (can be None)
-
reduction
Whether to sum or mean the updates.- verbose name: Reduction Operation
- default value: sum
- port type: EnumPort
- value type: str (can be None)
-
skip_ministep_if
Optional criteria under which to skip mini-steps as if they did not happen. Note that using this on multi-task problems may cause the stepping to go out of sync between tasks.- verbose name: Skip Mini-Step If
- default value: never
- port type: ComboPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ApplyIfFiniteStep
A modifier of the step node that is wired into it, which prevents NaN or infinite updates from going through unless max_consecutive_errors has been exceeded, in which case the update goes through.
Note this is not simply chained after another step using the ChainedStep node, but rather it is a modifier of the step node that is wired into it. This should usually not be used in a chain since it may cause division by zero errors in subsequent steps (depending on the step type).
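A minimal NumPy sketch of the guard logic is shown below; non-finite updates are replaced by zeros until the consecutive-error budget is exhausted, after which the update is allowed through.

```python
import numpy as np

def apply_if_finite(update, n_bad, max_consecutive_errors=2):
    if np.all(np.isfinite(update)) or n_bad > max_consecutive_errors:
        return update, 0                       # pass through and reset the error count
    return np.zeros_like(update), n_bad + 1    # drop this update
```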
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
modify_step
Step to modify.- verbose name: Modify Step
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
max_consecutive_errors
Maximum number of consecutive errors to tolerate before letting the update go through.- verbose name: Max Consecutive Errors
- default value: 2
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
BatchFlatten
Flatten all axes of the input tensor except for the batch dimension (or more generally n leading dimensions).
This simplifies flattening the input to a layer without having to explicitly preserve the batch dimension.
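A minimal NumPy sketch of the array-level behavior is given below (the node additionally handles named packet axes as described under preserve_dims):

```python
import numpy as np

def batch_flatten(x, preserve_dims=1):
    return x.reshape(x.shape[:preserve_dims] + (-1,))

x = np.zeros((16, 8, 250))          # (batch, channels, samples)
print(batch_flatten(x).shape)       # (16, 2000)
```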
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
preserve_dims
Number of leading dimensions to preserve from the input shape. When using packet data as inputs, this will ensure that any instance axes are at the beginning of the data and, when left unspecified, will automatically preserve those axes; otherwise it will preserve the selected number of dimensions. The flattened axis will be a generic feature axis. For plain array inputs, this will instead default to 1, which preserves the usual leading batch dimension.- verbose name: Number Of Dimensions To Preserve
- default value: None
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
BatchNorm
Apply batch normalization to the given data.
Like most normalization nodes, batch norm is usually applied right before the activation function in a neural network, and may follow, for example, a convolutional layer or a dense layer. This will z-score the data using statistics aggregated over the selected axes, which notably include the instance (aka "batch") axis and usually all other axes except the feature axis. Consequently, this operation will normalize each feature independently by statistics that go across all instances in the batch. Like most normalizations, batch normalization typically includes a learned scale and bias parameter, separately per feature, and these can be optionally overridden with externally generated values (note that, in the rare case that there is more than one feature axis in the input, these learned parameters apply only per element of the trailing feature axis; you can use the Fold Into Axis node beforehand to combine multiple feature axes into one to avoid this). At test time, the node will by default use statistics that were accumulated during training using an exponential moving average (controlled by the decay rate parameter); this is the recommended default, but one may optionally set the local_stats_on_test option, in which case the statistics from the current test batch are used. Note that the effect of batch norm varies with the batch size, and the normalization can become noisy if the batch size is small. In such cases, one may consider using layer or group normalization, which instead aggregate over the feature axes but not over the instance axes in a batch. These alternative norms are particularly common in conjunction with RNNs or when processing large-format image or spatio-temporal data. If packet data is given, this node ensures that the instance axes come first and the feature axes come last.
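A minimal NumPy sketch of the training-time computation for channels-last arrays is given below; statistics are taken over all non-feature axes, the learned scale and bias are applied per feature, and moving statistics are tracked for use at test time.

```python
import numpy as np

def batchnorm_train(x, gamma, beta, moving_mean, moving_var, decay=0.99, eps=1e-5):
    axes = tuple(range(x.ndim - 1))                  # everything except the trailing feature axis
    mean, var = x.mean(axis=axes), x.var(axis=axes)
    y = gamma * (x - mean) / np.sqrt(var + eps) + beta
    moving_mean = decay * moving_mean + (1 - decay) * mean
    moving_var = decay * moving_var + (1 - decay) * var
    return y, moving_mean, moving_var
```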
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
scale_init
Initializer for the trainable scale.- verbose name: Scale Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
bias_init
Initializer for the trainable bias.- verbose name: Bias Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scale_prior
Optional prior distribution for the scale.- verbose name: Scale Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
bias_prior
Optional prior distribution for the bias.- verbose name: Bias Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
axes
Optional comma-separated list of axis names or indices over which to accumulate the normalization statistics. If unspecified, the statistics will be accumulated over all except the feature axis ("channels" axis in classic deep learning nomenclature). If an axis shall occur more than once, one may list it multiple times, or prefix the axis name with an asterisk to apply to all axes of this type. This parameter is not limited to the predefined options.- verbose name: Axes
- default value: (non-feature)
- port type: ComboPort
- value type: str (can be None)
-
decay_rate
Decay rate / momentum across subsequent mini-batches, for accumulating test-time statistics. Values close to 1 will result in slower decay and thus more stable statistics across the whole training set.- verbose name: Decay Rate
- default value: 0.99
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value to add to the variance to avoid division by zero.- verbose name: Epsilon
- default value: 1e-05
- port type: FloatPort
- value type: float (can be None)
-
learnable_scale
Whether to learn a trainable scale parameter. Batch normalization typically includes such a parameter in order to drive the subsequent activation function in a regime that is desirable for downstream computations (e.g., saturating or linear).- verbose name: Learnable Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
learnable_bias
Whether to learn a trainable bias parameter. See the learnable scale for more details.- verbose name: Learnable Bias
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
scale_initializer
Choice of scale initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Scale Initializer
- default value: ones
- port type: ComboPort
- value type: str (can be None)
-
bias_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
local_stats_on_test
At test time, whether to scale the data by stats of the test batch only, or using the global statistics accumulated during training. Classic batch normalization will use training statistics during testing, and this is the recommended default; local test statistics may be used when performing predictions in sufficiently large batches (e.g., same size as the training batches) and when there is specific concern of covariate shift between training and test data. The value 'default' translates to 'false'.- verbose name: Use Local Stats On Test Data
- default value: false
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of the trainable parameters.- verbose name: Layer Name
- default value: batchnorm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
BatchReshape
Reshape input tensor preserving the batch dimension.
This simplifies reshaping the input to a layer without having to explicitly preserve the batch dimension.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
output_shape
Desired shape for the input tensor while preserving the first preserve_dims dimensions, given as a list of integers. If -1 is given for a dimension, the size is inferred from the input data. If the input data is a packet, then you can also specify the shape as a list of axis names (e.g., space, time, frequency, feature, axis) along with the special tokens singletonaxis, flattenedaxis, and ..., which stand for inserted singleton dimensions, flattened dimensions, and unspecified dimensions, respectively. The generic axis name "axis" can be used to refer to any axis in the input. Any named axes must be already present in the input data, and will be moved to the desired place; be aware that any flattened axes are by default of type Axis (i.e., they are not feature axes). If a plain integer shape is given, unspecific axes (of type axis) are generated. If last_as_feature is checked, then the last axis is made a feature axis regardless of what it was before. If plain array data is passed in, only integer shapes are supported, and the data is reshaped to the desired integer shape.- verbose name: Output Shape
- default value: flattenedaxis
- port type: ComboPort
- value type: str (can be None)
-
last_as_feature
If True, the last axis of the output shape will be made a feature axis, regardless of what it was before. This only applies if packet data is provided.- verbose name: Last Axis As Feature
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
preserve_dims
Number of leading dimensions to preserve from the input shape. When using packet data as inputs, this will ensure that any instance axes are at the beginning of the data and, when left unspecified, will automatically preserve those axes; otherwise it will preserve the selected number of dimensions. For plain array inputs, this will instead default to 1, which preserves the usual leading batch dimension.- verbose name: Number Of Dimensions To Preserve
- default value: None
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
CenteringStep
Chainable step that centers the gradients (subtracts their mean).
This has been explored in Yong et al. (2020).
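A simplified NumPy sketch of the operation is shown below; this version subtracts each gradient tensor's overall mean, whereas the exact axes used may differ in the implementation.

```python
import numpy as np

def center(g):
    return g - g.mean()
```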
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ChainedStep
Compose a sequence of gradient processing steps into a single update step.
This can be used to build either custom optimizers or to augment existing optimizer steps with additional steps (e.g. gradient clipping). IMPORTANT: note that, by convention, a full end-to-end optimizer step (e.g., as implemented by AdamStep) negates the gradient before returning it, yielding an additive weight update, so that it can be applied to weights using the Add node. Therefore, when you build a custom optimizer from raw gradient processing steps, you need to include a negation step (scale by -1) near or at the end of the chain so as to obtain a full end-to-end optimizer step that is swappable with other optimizers. If your chain includes one of the end-to-end steps (which already do the negation), however, this will not be necessary. The scaling typically happens last (usually implemented using ScalingStep, which additionally can also take a learning rate), unless you are also using a ConstraintStep, in which case that must be the last step in the chain. Tip: Almost all canonical end-to-end optimizers (those that are not described as chainable, e.g., AdamStep, RMSPropStep, etc.), consist of a core gradient scaling rule followed by optional additional steps such as weight decay, momentum, scaling by a learning rate (schedule), etc. and generally a negation. For some optimizers, the core rule is available as a separate node, but for others, it is not -- for these optimizers, you can generally obtain a step analogous to the core scaling rule by setting the learning rate to -1.0 and disabling all optional features of the optimizer (like momentum, weight decay, etc). This rule can then be combined with other steps in a sensible order and a final negation to form a customized optimizer.
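The chaining logic can be pictured as function composition over the gradients; the sketch below is purely conceptual Python (scalar gradient, hypothetical step functions), not the NeuroPype node API:

    def chained_step(gradient, steps):
        # Apply each gradient-processing step in order. In a custom chain,
        # the last step (or one near the end) should scale by -learning_rate
        # so that the result is an additive update (weights + update).
        for step in steps:
            gradient = step(gradient)
        return gradient

    clip = lambda g: max(min(g, 1.0), -1.0)   # e.g., gradient clipping
    scale = lambda g: -0.01 * g               # negation + learning rate 0.01
    chained_step(2.5, [clip, scale])          # -> -0.01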
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
step1
Step 1.- verbose name: Step1
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step2
Step 2.- verbose name: Step2
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step3
Step 3.- verbose name: Step3
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step4
Step 4.- verbose name: Step4
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step5
Step 5.- verbose name: Step5
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step6
Step 6.- verbose name: Step6
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step7
Step 7.- verbose name: Step7
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step8
Step 8.- verbose name: Step8
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step9
Step 9.- verbose name: Step9
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
step0
Step 0.- verbose name: Step0
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ConstantInitializer
An initializer that always returns the same constant value.
Can be used to initialize neural net weights to e.g., zeros or ones.
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
constant
Constant value.- verbose name: Constant
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ConstantSchedule
A constant parameter schedule.
This is the simplest of all schedule nodes in NeuroPype, as it simply emits a constant value. This can be used as a simple drop-in alternative when testing a more complicated schedule. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Constant value to output.- verbose name: Constant Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. All schedule nodes have this parameter, but in case of a constant schedule it does nothing.- verbose name: Step Multiplier (Ignored)
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
ConstraintStep
Chainable step that allows parameters (or in some cases, gradients) to be constrained by projecting them into the desired form.
This node can be used to apply different types of constraints, including non-negativity and zeroing out NaNs. Generally this node should be the last one in the chain, and follow any negation step.
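As a rough illustration (not the node's actual implementation), the built-in constraints amount to simple elementwise projections:

    import numpy as np

    def project_nonnegative(weights):
        # nonnegative: zero out any negative parameter values.
        return np.maximum(weights, 0.0)

    def zero_nans(gradients):
        # zero-nans: replace NaN entries in the gradients with zeros.
        return np.where(np.isnan(gradients), 0.0, gradients)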
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
projection
Optional projection operation.- verbose name: Projection
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
constraint
Constraint to apply. Non-negative maintains non-negative parameters by zeroing out any negative values. Zero-nans zeros out any NaN values in the gradients. General-projection can be used to implement projected gradient descent, but in this case this node must be the final step in the chain and must be preceded by the negating rescaling step (scale by -1).- verbose name: Constraint
- default value: nonnegative
- port type: ComboPort
- value type: str (can be None)
-
param
Parameter of the constraint, if any.- verbose name: Param
- default value: None
- port type: Port
- value type: object (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ConvolutionLayer
A 1/2/3/N-D standard convolution layer.
A convolution sweeps a learned kernel array over the input data, along one or more dimensions, using a given step size. For each valid position of the kernel, the node computes a "matching score" between the data and the kernel as an inner product, i.e., an elementwise product of the two arrays followed by a sum over all product terms. This way, each kernel effectively learns a translation-invariant pattern in the data, and a typical convolution layer will learn multiple such kernels, each yielding a different feature (these are collected in an output feature axis). The output array is arranged to have the same spatial dimensions as the input, but becomes downsampled if the step size was greater than one. Furthermore, the kernel may be allowed to run off the edge of the input data, in which case the missing data is assumed to be zero, and this behavior is controlled via the padding argument. As a special case, a step size of 1 and padding of 'same' will result in an output array of the same size as the input. The operation generally does not care about the presence or absence of an instance ("batch") axis, and will process any such axes, or any other extra dimensions, independently with the same kernels. Another aspect of kernels is that the input data may (and likely will) already have a feature axis, and this axis is generally treated specially by convolution operations (also known as the "channels axis" in some frameworks, after color channels in image processing). Crucially, in full convolution the kernel array has an implicit feature axis, which allows the convolution to capture patterns that extend across both the swept spatial axes and any input feature axis (if any). Alternatively, this node can also implement grouped convolution (an in-between of full and depthwise convolution), where the input and output features are each partitioned into N equally-sized groups, and each group of output features is derived from only the corresponding group of input features; consequently, the lengths of the input and output feature axes must both be divisible by N. N must be given as the feature_group_count parameter. Another feature supported by this node is dilated convolution, in which the kernel is implicitly scaled by an integer factor along each spatial axis, which allows for using a smaller kernel to cover larger patterns, and which results in higher-resolution spatial output compared to the more common alternative of applying a convolution after pooling and downsampling of the data. Generally, if there are multiple feature axes in the input, they will be flattened into a single feature axis placed at the end. The full output shape is first the instance axes if any, then any unspecified non-feature dimensions, then the specified "spatial" (i.e., swept-over) dimensions in the order specified, followed by a single feature axis. In image processing, each convolution is usually followed by a non-linearity, which makes the operation an actual feature detector rather than just a linear transformation. However, in signal processing, convolutions are often used to learn spatial (e.g., ICA-like) and/or temporal (e.g., FIR-type) filters, mimicking traditional signal processing pipelines, and in such uses the subsequent non-linearity, and also the optional bias parameter of the convolution, are often omitted.
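The relationship between input length, kernel size, stride, dilation, and padding along a swept axis can be sketched as follows (plain Python; the formula is the standard convolution arithmetic, assumed to match this node's behavior):

    import math

    def conv_output_length(in_len, kernel, stride=1, dilation=1, padding="valid"):
        eff_k = dilation * (kernel - 1) + 1      # effective kernel size after dilation
        if padding == "same":
            return math.ceil(in_len / stride)    # output length matches input (per stride)
        return (in_len - eff_k) // stride + 1    # 'valid': kernel stays fully inside

    conv_output_length(100, kernel=3)                  # -> 98
    conv_output_length(100, kernel=3, stride=2)        # -> 49
    conv_output_length(100, kernel=3, padding="same")  # -> 100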
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
mask
Mask to apply to the weights.- verbose name: Mask
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
sweep_axes
List and order of axes over which the convolution filter kernel is swept. If the input data are packets, this determines the order of these axes in the output data, and the order of the axes in the kernel (for plain array inputs, see end of tooltip). A kernel is a learned array that is shifted over all possible positions in the data (optionally with step size in each dimension, and optionally going past the edges of the data by half the kernel size if padding=same). For each position, the kernel is multiplied by the data in the region covered by the kernel and the resulting (elementwise) product is integrated (summed) to produce a single output score (a measure of match between the kernel and the data in that region). If the input data has an extra feature axis, the kernel will usually have an implicit extra axis to hold weights for each input feature. If the data has an instance axis, each instance will be processed separately (using the same kernels). If the input data are plain arrays, this merely determines the number of spatial axes and the names are just mnemonic and not otherwise used. This can alternatively be given as just a number to set the number of spatial dimensions, corresponding to the N in N-D convolution; for packet data, this will resolve to the last N axes in the data that are neither feature nor instance axes. This parameter is not limited to the predefined options.- verbose name: Axes To Sweep Kernel Over (Convolve)
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
output_features
Number of filter kernels (and features) to learn. This value determines the length of the feature axis in the output data (each kernel yields one output feature, representing the raw feature detection score produced by that kernel). In classic deep learning, this is also called the number of output channels -- analogous to RGB color channels in a raw image, or generally meant to be an unspecific feature axis in a data array (not to be confused with spatial channels in multi-channel time series, which are more commonly treated like the vertical axis in 2d image data).- verbose name: Number Of Filters To Learn
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
kernel_shape
Shape of the convolution filter kernel. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the kernel is the same size along all of the given spatial dimensions. Note: if you make the kernel as large as the data along some axis, there is only a single valid position for the kernel along that axis (if padding=valid), and consequently the result is an inner product between the data and the kernel, or a matrix multiplication when more kernels are learned. Conversely, if you give the kernel a shape of 1 along an axis, the result is equivalent to processing each element along that axis separately using the same kernel. The latter is the same as not listing the axis in sweep axes, except that the output axis order can be controlled when specifying a 1-sized axis in sweep_axes. Which is more efficient depends on the implementation.- verbose name: Kernel Shape
- default value: [3]
- port type: ListPort
- value type: list (can be None)
-
strides
Step size with which the kernel is swept over the data. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the same step size is used along all of the specified spatial dimensions. A step size greater than 1 means that the kernel will be shifted by this amount between successive positions; as a result, the amount of compute is lower by this factor, and the output data along this axis will also be shorter by this factor (matching the number of positions at which the kernel is applied).- verbose name: Step Size (Strides)
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
padding
Padding strategy for the data. This can be either 'valid' or 'same', or a custom list of padding amounts. 'valid' means no padding (i.e., the kernel will not run off the edges of the data, but the output data will be shorter along each axis according to the number of valid positions of the kernel along that axis), and 'same' means that the output will have the same shape as the input (aside from dilation and striding). Can be customized by giving a list of (low, high) pairs, one per axis, where low is the padding to apply before the data along each axis, and high is the padding to apply after the data along each axis. low and high can also be negative to trim the data instead of padding. If a single [(low, high)] pair is given, it is applied to all axes.- verbose name: Padding
- default value: valid
- port type: ComboPort
- value type: str (can be None)
-
with_bias
Whether to include a bias term. If given, then for each output feature, a bias term is learned and added to the output of the convolution. This increases the flexibility of the learned model, but note that the result is no longer strictly equivalent to e.g., a learned FIR filter applied to time-series data or a learned spatial filter / matrix multiplication applied to spatial data.- verbose name: Learn Bias Term(S)
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
dilations
Dilation (scaling) of the convolution kernel. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the same dilation factor is used for all of the specified spatial dimensions. This causes the kernel array to be "stretched" by this factor (or factors if multiple axes) to cover a larger region in the data without having to learn a higher-resolution (larger size) kernel. This is an alternative to the more traditional approach of first pooling the data with a step size greater than one before applying a regular (non-dilated) convolution, and can be used to, e.g., preserve the original shape and resolution of the data.- verbose name: Dilation (Kernel Scaling)
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
feature_group_count
Number of feature groups to use. This will partition features into N equal-sized groups before processing. The length of the input and output feature axes must be divisible by this number. This is mainly an optimization to reduce the number of parameters and computation, at the expense of learning kernels that integrate only a subset of the input features, perhaps at earlier stages of the network.- verbose name: Optional Feature Partitions
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: conv
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
CosineDecaySchedule
A cosine decay schedule.
This is a smooth falloff transition from the initial value down to alpha*initial_value that follows the shape of the cosine function (from its initial peak to its first trough) over the course of transition_steps steps. After that, the parameter is held at alpha*initial_value. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
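Written out, the schedule's shape can be sketched as below (mirroring the formula given under transition_steps; illustrative, not necessarily the exact implementation):

    import math

    def cosine_decay(step, init_value=1.0, transition_steps=100, alpha=0.0):
        # Raised-cosine falloff from init_value to alpha*init_value,
        # held constant after transition_steps.
        t = min(step, transition_steps) / transition_steps
        raised_cosine = 0.5 * (1 + math.cos(math.pi * t))
        return init_value * ((1 - alpha) * raised_cosine + alpha)

    cosine_decay(0), cosine_decay(50), cosine_decay(100)   # approx. (1.0, 0.5, 0.0)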
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
transition_steps
The number of steps over which the cosine decay takes place. This is a soft transition following a raised-cosine function from a maximum scale of 1 (times initial_value) to a minimum scale of alpha (times initial_value). The formula is: initial_value * ((1 - alpha) * raised_cosine(step / transition_steps) + alpha)- verbose name: Transition Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
alpha
The minimum scale of the cosine decay. This is the multiplier applied to the initial value at the end of the decay schedule (bottom of cosine function).- verbose name: Final Value Ratio
- default value: 0.0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
CosineOneCycleSchedule
A one-cycle cosine ramp up/down parameter schedule.
This schedule is a cosine-shaped ramp up to a peak value, followed by a cosine-shaped ramp down to the final value (at transition_steps). The parameter is then held at that value until the end of the schedule. Only the peak value is specified directly, while the initial value is given as a ratio to the peak value, as is the final value. The upslope duration is given as a fraction of the total transition_steps, while the downslope is the remainder of the transition steps. This schedule is inspired by Smith and Topin's 2018 paper, "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates" (see URL). Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
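A hedged sketch of the resulting shape (using the same raised-cosine transition as above; parameter names mirror the ports below, but the exact implementation may differ):

    import math

    def raised_cosine(t):
        return 0.5 * (1 + math.cos(math.pi * t))

    def cosine_one_cycle(step, peak_value=1.0, transition_steps=100,
                         peak_initial_ratio=25, peak_final_ratio=10000.0,
                         upslope_fraction=0.3):
        init_value = peak_value / peak_initial_ratio
        final_value = peak_value / peak_final_ratio
        up = transition_steps * upslope_fraction
        if step <= up:
            # cosine-shaped ramp from init_value up to peak_value
            return init_value + (peak_value - init_value) * (1 - raised_cosine(step / up))
        # cosine-shaped ramp down to final_value, then hold
        t = min(step - up, transition_steps - up) / (transition_steps - up)
        return final_value + (peak_value - final_value) * raised_cosine(t)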
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
peak_value
Parameter value at peak. This is the maximum value that the parameter will attain over the course of the schedule.- verbose name: Peak Value
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
transition_steps
Step count at which to end the transition from the initial value to the final value. The parameter is held at the final value after this step count is reached. This is the duration of the full scaling cycle.- verbose name: Transition Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
peak_initial_ratio
Ratio of the peak to the initial value. This is the ratio of the peak value to the initial value, i.e., the initial value is this many times smaller than the peak value.- verbose name: Peak Initial Ratio
- default value: 25
- port type: FloatPort
- value type: float (can be None)
-
peak_final_ratio
Ratio of the peak to the final value at the end of the cycle. The final value is this many times smaller than the peak value, and the parameter will be held at this value after transition_steps have been reached.- verbose name: Peak Final Ratio
- default value: 10000.0
- port type: FloatPort
- value type: float (can be None)
-
upslope_fraction
Fraction of the transition_steps that will be used for the cosine-shaped upslope. That is, the peak is reached after transition_steps * upslope_fraction steps. After this, the value slopes down in a single cosine-shaped transition to the final value.- verbose name: Upslope Step Fraction
- default value: 0.3
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
CustomInitializer
A custom initializer that is defined by a computational graph that takes in a shape and data type (string) and returns an array of the given shape/type.
An example graph has a Placeholder node with (slot)name "shape" and another one with name "dtype" (matching the signature in the signature parameter), and wires these into the "shape" and "precision" inputs of a RandomNormal node; that node's "This" output is then wired into the "graph" input of the CustomInitializer node (this node). Since the RandomNormal node also needs a random seed, which should be derived from the root seed governing the overall neural network computation to ensure reproducibility, it is best practice to obtain that seed from a "Draw Random Seed" node and to check the "from haiku context" option to source it from the neural network-associated random sequence. For Experts: the graph can also define a third positional input (i.e., a placeholder node whose name appears at the end of the signature), which receives a random seed (for more on managing random seeds, see the documentation of the "Create Random Seed" node).
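Functionally, the example graph described above computes something like the following NumPy sketch (the actual graph is built from Placeholder, RandomNormal, and Draw Random Seed nodes rather than Python code):

    import numpy as np

    def custom_initializer(shape, dtype, seed=0):
        # Return standard-normal values of the requested shape and dtype.
        rng = np.random.default_rng(seed)
        return rng.standard_normal(shape).astype(dtype)

    w = custom_initializer((64, 32), "float32")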
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
graph
Generator graph.- verbose name: Graph
- default value: None
- port type: GraphPort
- value type: Graph
-
graph__signature
Signature for the "graph" input. This represents the signature for the subgraph that is wired into the "graph" port. This is formatted as in (a,b,c) where a,b,c are names of placeholders that are expected in the subgraph that goes into the "graph" port. Alternatively, it can also be provided in data structure form as a list of lists, as in: [['a','b','c']].- verbose name: Graph [Signature]
- default value: (shape,dtype)
- port type: Port
- value type: object (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
CustomSchedule
A custom schedule that is defined by a graph function that takes in a step count and returns a parameter value.
A simple example is a Placeholder node with (slot)name "step" followed by a "Scale by Constant Factor" (Scaling) node that scales the step count by some factor, e.g., -0.01, and an "Add Constant Value" (Shift) node, which adds an offset (e.g., 1.0). Typically this would then be followed by a clipping node that ensures that the parameter value remains in a specified limit range, e.g., [0, 1]. This could be accomplished by ending the graph with a Clamp node. Finally, you wire that node's "This" output port into the "graph" port of the CustomSchedule node to complete the setup. The step_multiplier effectively stretches out or compresses the schedule; the graph will receive an adjusted step count that is divided by the step_multiplier.
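The example graph described above corresponds to roughly the following function (illustrative Python only; in NeuroPype it is built from Placeholder, Scaling, Shift, and Clamp nodes):

    def custom_schedule(step, step_multiplier=1.0):
        # Scale the (adjusted) step by -0.01, add 1.0, and clamp to [0, 1].
        adjusted = step / step_multiplier
        return min(max(1.0 - 0.01 * adjusted, 0.0), 1.0)

    custom_schedule(0), custom_schedule(50), custom_schedule(200)   # -> 1.0, 0.5, 0.0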
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
graph
Schedule function.- verbose name: Graph
- default value: None
- port type: GraphPort
- value type: Graph
-
graph__signature
Signature for the "graph" input. This represents the signature for the subgraph that is wired into the "graph" port. This is formatted as in (a,b,c) where a,b,c are names of placeholders that are expected in the subgraph that goes into the "graph" port. Alternatively, it can also be provided in data structure form as a list of lists, as in: [['a','b','c']].- verbose name: Graph [Signature]
- default value: (step)
- port type: Port
- value type: object (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
CustomStep
A custom optimizer step that is defined by a graph applied to gradients and, optionally, current weights.
The graph must have either one or two placeholders (one for the gradients and one for the current weights), which must be listed in the graph's positional signature in that order. You can then implement any computation that processes and returns the updated gradients and, if you are building an end-to-end optimizer step, also converts the gradients into additive updates (by negating them), since the result of end-to-end steps is expected to be applied additively. Your graph may use any of the Step nodes (e.g., AdamStep, GradientClippingStep, etc.) or stateless operations (e.g., Add, etc.). If you are optimizing parameters that are in the form of a Packet (by setting prefer_packets to True in NetTransform), you can also use stateless nodes that operate on Packets such as ExtractStreams, MergeStreams, and so forth, which is useful when processing subsets of the parameters (e.g., corresponding to different layers) differently. There is a small set of stateful nodes that can also be used safely here; these are the nodes that have a "state" input and output port. Your graph may also expose additional hyperparameters (e.g., the weight decay parameters), which can be defined using ParameterPort nodes and which need to have default values. These parameters can be overridden by passing name/value pairs into the CustomStep node.
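As a minimal example of the convention described above, an end-to-end step built from a clipping stage followed by negation and learning-rate scaling might behave like this (conceptual Python on scalar gradients; not the node API):

    def my_custom_step(gradients, learning_rate=0.001, clip=1.0):
        updates = []
        for g in gradients:
            g = max(min(g, clip), -clip)        # gradient clipping
            updates.append(-learning_rate * g)  # negate => additive update
        return updates

    weights = [0.5, -0.2]
    weights = [w + u for w, u in zip(weights, my_custom_step([2.0, -0.1]))]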
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
graph
Optimizer step returning additive updates.- verbose name: Graph
- default value: None
- port type: GraphPort
- value type: Graph
-
graph__signature
Signature for the "graph" input. This represents the signature for the subgraph that is wired into the "graph" port. This is formatted as in (a,b,c) where a,b,c are names of placeholders that are expected in the subgraph that goes into the "graph" port. Alternatively, it can also be provided in data structure form as a list of lists, as in: [['a','b','c']].- verbose name: Graph [Signature]
- default value: (gradients)
- port type: Port
- value type: object (can be None)
-
name1
Override 1.- verbose name: Name1
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val1
Value 1.- verbose name: Val1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name2
Override 2.- verbose name: Name2
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val2
Value 2.- verbose name: Val2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name3
Override 3.- verbose name: Name3
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val3
Value 3.- verbose name: Val3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name4
Override 4.- verbose name: Name4
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val4
Value 4.- verbose name: Val4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name5
Override 5.- verbose name: Name5
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val5
Value 5.- verbose name: Val5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name6
Override 6.- verbose name: Name6
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val6
Value 6.- verbose name: Val6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name7
Override 7.- verbose name: Name7
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val7
Value 7.- verbose name: Val7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name8
Override 8.- verbose name: Name8
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val8
Value 8.- verbose name: Val8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name9
Override 9.- verbose name: Name9
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val9
Value 9.- verbose name: Val9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
nameN
Additional overridden names.- verbose name: Namen
- default value: None
- port type: ListPort
- value type: list (can be None)
-
valN
Additional overridden values.- verbose name: Valn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
name0
Override 0.- verbose name: Name0
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val0
Value 0.- verbose name: Val0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
CyclicCosineDecaySchedule
A cyclic linear warmup followed by cosine decay schedule.
This is essentially a repeated application of the "Linear warmup cosine decay" schedule, where each parameter is a list of values, one for each cycle. See also the "Linear Warmup Cosine Decay Schedule" schedule node for how a single cycle behaves. This schedule is more commonly known as the SGDR (SGD with Restarts) schedule, following a paper by Loshchilov and Hutter, 2017 (see also URL). The basic idea is that the optimization can get stuck in a local optimum, and the subsequent cycle can "shake out" the current solution from that optimum and find a better one. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
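The cyclic behavior can be sketched as below (illustrative Python; each parameter is a per-cycle list, matching the ports that follow):

    import math

    def sgdr_schedule(step, init_values, peak_values, final_values,
                      warmup_steps, decay_steps):
        # Walk through the cycles; hold the last final value afterwards.
        for init_v, peak_v, final_v, warm, decay in zip(
                init_values, peak_values, final_values, warmup_steps, decay_steps):
            if step < warm:                                   # linear warmup
                return init_v + (peak_v - init_v) * step / warm
            step -= warm
            if step < decay:                                  # cosine decay
                cos = 0.5 * (1 + math.cos(math.pi * step / decay))
                return final_v + (peak_v - final_v) * cos
            step -= decay
        return final_values[-1]

    sgdr_schedule(150, [0.0, 0.0], [1.0, 0.5], [0.0, 0.0], [100, 100], [100, 100])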
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_values
Initial parameter values. This is the initial value at the beginning of each successive cycle.- verbose name: Cycle Initial Values
- default value: [0.0]
- port type: ListPort
- value type: list
-
peak_values
Peak parameter values. This is the value at the peak following the initial warmup, before it is lowered again following a cosine function, for each cycle.- verbose name: Peak Values
- default value: [1.0]
- port type: ListPort
- value type: list (can be None)
-
final_values
Final parameter values. The cosine decay reduces the value from the peak down to the final value; given for each cycle.- verbose name: Cycle Final Values
- default value: [0.0]
- port type: ListPort
- value type: list
-
warmup_steps
Number of steps over which to ramp up from the initial value to the peak value. After this, the parameter is lowered again following the shape of a cosine function down to the desired final value; given for each cycle.- verbose name: Cycle Warmup Steps
- default value: [100]
- port type: ListPort
- value type: list (can be None)
-
decay_steps
The number of steps over which the cosine decay takes place. This is a soft transition following a raised-cosine function from the peak value down to the final value; given for each cycle.- verbose name: Cycle Decay Steps
- default value: [100]
- port type: ListPort
- value type: list (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
DPSGDStep
The differentially private SGD (DPSGD) optimizer step.
Based on Abadi et al., 2016, this optimizer can be used to reduce the sensitivity of the model to individual training samples or groups thereof, and can thus be used to train models on sensitive data. The optimizer has a number of parameters that are potentially data dependent, and must be provided by the user. IMPORTANT: this optimizer, unlike the others, requires access to the per-example gradients; thus, the gradients should have a leading "batch" dimension. This can be accomplished by using the VectorizedMap node on the gradient pipeline. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
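The core of the algorithm from Abadi et al. (2016) can be sketched as follows (rough NumPy illustration, not this node's implementation): clip each per-example gradient to the given L2 norm, average, add calibrated Gaussian noise, and return a negated, learning-rate-scaled update:

    import numpy as np

    def dpsgd_update(per_example_grads, l2_norm_clip, noise_multiplier,
                     learning_rate, rng):
        # per_example_grads has a leading batch dimension (one row per example).
        clipped = []
        for g in per_example_grads:
            norm = max(np.linalg.norm(g), 1e-12)
            clipped.append(g * min(1.0, l2_norm_clip / norm))   # clip to L2 ball
        mean_grad = np.mean(clipped, axis=0)
        noise = rng.normal(0.0, noise_multiplier * l2_norm_clip / len(clipped),
                           size=mean_grad.shape)
        return -learning_rate * (mean_grad + noise)             # additive update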
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
l2_norm_clip
L2 norm clipping value. Maximum l2-norm of the per-example parameter updates. Must be provided.- verbose name: L2 Norm Clip
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
noise_multiplier
Noise multiplier. Ratio of standard deviation to the clipping norm. Must be provided.- verbose name: Noise Multiplier
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
randseed
Integer random seed. Must be provided.- verbose name: Randseed
- default value: None
- port type: IntPort
- value type: int (can be None)
-
momentum
Optional exponential decay rate for momentum.- verbose name: Optional Momentum
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
nesterov
Whether to use Nesterov acceleration.- verbose name: Use Nesterov Acceleration
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
DeepModel
A deep-learning based machine learning model.
This can be used as a drop-in replacement for off-the-shelf ML nodes such as the "Linear Discriminant Analysis" node. The node is configured by wiring a network graph, starting with an input placeholder, and containing some mix of Layer nodes, other deep learning nodes (in the deep learning category) and pure math nodes, and any other data restructuring nodes. Stateful nodes like filters (e.g., in the signal processing category) are not currently supported. The last node of the network graph should produce an output (i.e., a prediction) for each instance (trial) in the original input data. The output can be a packet or a plain array. The input placeholder name should then be listed in the network signature of the deep model node. Alternatively, you can also end the network graph with a Define Net node, which gives the network (and its weights) a name prefix; in this case the input placeholder needs to be listed in the network signature of the Define Net node (and you will need to use a (*) signature for the network graph in the deep model node). Optionally your network may also end in a NetTransform node, but this is not necessary (the deep model will automatically do the equivalent of that if needed). The predictions are then passed into the selected loss function along with the target labels; the loss can either be chosen from among a set of predefined losses, or you can specify a custom loss graph. The latter needs to have two input placeholders, both listed in the loss signature (defaulting to preds and targs). The first listed input must receive the predictions and the second must be the one for the target values. Typically the graph will then use one of the predefined loss nodes (nodes ending in Loss), but it can also implement, for example, custom per-class weights or per-instance weights. This node then takes the gradient of the provided loss with respect to the weights, and uses the selected optimizer to update the weights. The optimizer can be chosen from a set of predefined optimizers, or you can wire in a custom optimizer step node (node ending in Step). Of these, the ChainedStep and CustomStep nodes allow you to build fully custom optimizer graphs, which can be used to implement techniques such as learning rate schedules, gradient clipping, weight decay, weight constraints via projected gradient descent, or different optimizers for different weights (e.g., excluding layers or treating biases differently, etc.). The full step will optionally be compiled to run efficiently, although not every combination of nodes used may be compilable. The learning behavior of the node can be configured in terms of the used batch size, number of steps (measured in instances, batches, or epochs), data shuffling behavior, and what kind of input data shall be used for training (e.g., only offline data or both offline and streaming data) vs. only for prediction. You can also pass a 2-element list of training and validation data to the node, in which case a validation step will be executed on the validation data at some specified frequency, evaluating the specified metric on predictions vs. targets. You can also provide a custom validation step as a graph, which can be used for custom progress tracking such as specific output requirements, custom metrics, custom validation frequency, and so forth. It is recommended to benchmark your GPU utilization and memory usage using a tool like nvitop while running, in order to ensure that you're using the GPU(s) optimally.
There are pros and cons to moving the data to the GPU using the MoveToBackend node before passing it to this node; this may use more GPU memory (especially with larger datasets) but will typically achieve higher utilization. Alternatively, you can keep the data on the host but place two (or rarely more) tasks on the same GPU, for example by setting your cross validation node to run in parallel; this can restore any lost utilization. Also, while validation data is a protection against overfitting, it can be a good idea to instead limit the epoch count to the point where the model starts to overfit and to disable the validation split. This will typically be more efficient and use less GPU memory, often allowing you to place more than one task on the GPU. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, the trainable state of this node can be saved to a model file and later loaded for continued use.
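Conceptually, the training loop managed by this node looks roughly like the following JAX sketch (heavily simplified and hypothetical; the node assembles the equivalent from the wired network, loss, and optimizer step graphs and typically compiles it):

    import jax
    import jax.numpy as jnp

    def net_apply(w, x):                 # stand-in for the wired network graph
        return x @ w
    def loss_fn(preds, targs):           # e.g., a squared loss
        return jnp.mean((preds - targs) ** 2)

    def train_step(w, x, y, lr=0.01):
        grads = jax.grad(lambda w_: loss_fn(net_apply(w_, x), y))(w)
        update = -lr * grads             # end-to-end optimizer steps return negated updates
        return w + update                # applied additively (the Add node)

    w = jnp.zeros((3, 1))
    x, y = jnp.ones((8, 3)), jnp.ones((8, 1))
    w = train_step(w, x, y)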
Version 0.5.1
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process. This can be a packet or a two-element list of training and validation data.- verbose name: Data
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
net
Neural network graph.- verbose name: Net
- default value: None
- port type: GraphPort
- value type: Graph
-
net__signature
Argument names of network graph. This is a listing of the names of input arguments of your neural network, which is typically a single argument conventionally named "inputs" (although you can choose your own name). Your network is then a graph that begins with a Placeholder node whose slotname must match the name listed here, and which is followed by a series of NN nodes (e.g., Layers, normalization, etc), possibly interleaved with other mathematical operations and/or data formatting nodes. The final output of your network is expected to be predictions given the inputs (without a link function applied, i.e., if you are doing classification, the final output should be logits as produced by a Dense layer without a trailing Activation node, instead of probabilities). This final output node is then wired into the DeepModel's "net" input port. Note that in graphical UIs, the edge that goes into the "net" input will be drawn in dotted style to indicate that this is not normal forward data flow, but that the network graph runs under the control of the DeepModel node. The DeepModel node will act on your network in various ways, among others taking derivatives to optimize the weights, and invoking it to make predictions. The final predictions are ideally in a form that can be directly wired into one of the Loss nodes (e.g., SquaredLoss) without any further processing; a Dense node trivially satisfies this. Your network graph may also optionally contain an additional Placeholder node with slotname set to is_training, which can also be listed here in the signature. This placeholder will receive True if the network is called on training data and False if it is called on prediction data.- verbose name: Net [Signature]
- default value: (inputs)
- port type: Port
- value type: object (can be None)
-
loss
Optional custom loss function graph.- verbose name: Loss
- default value: None
- port type: GraphPort
- value type: Graph
-
loss__signature
Optional argument names accepted by loss function graph. This is only used if you aim to fully override the loss function by a completely custom graph; in all other cases you can simply select the desired loss in the loss_function port. This is a graph with usually at least two placeholders (one for predictions and one for targets) whose slotnames match those listed here, and which usually ends in one of the Loss nodes (note that the loss returned by this graph is per prediction rather than a single overall scalar). The default loss is structured like this, where the Loss node is chosen and configured according to the loss_function setting. Your graph may also accept a third input that receives per-sample weights. For certain unsupervised losses you may also build your graph to expect targets that are passed in as all-zeros. The final output of the loss graph is then wired into the "loss" input of the DeepModel node. Note that in graphical UIs, the edge that goes into the "loss" input will be drawn in dotted style to indicate that this is not normal forward data flow, but that the loss graph runs under the control of the DeepModel node.- verbose name: Loss [Signature]
- default value: (preds,targs)
- port type: Port
- value type: object (can be None)
-
optstep
Optional optimizer step node.- verbose name: Optstep
- default value: None
- port type: GraphPort
- value type: Graph
-
valstep
Optional validation step graph.- verbose name: Valstep
- default value: None
- port type: GraphPort
- value type: Graph
-
valstep__signature
Arguments for the optional validation step graph. This is a graph with usually three placeholders (iteration count, predictions, and optionally targets) that receive their input from the supplied validation data. This graph can do anything (e.g., compute and print a performance metric), may re-batch the data (e.g., to compute a running average), could optionally only emit outputs once every few calls, and may invoke a Break or Continue node to either stop the optimization (e.g., as a form of early stopping) or to reject the current optimizer update (e.g. for backtracking to the state at the previous validation update), which can be used if loss spikes or other signs of divergence occur. The final output of the graph should be a performance measure, e.g., as obtained from the PerformanceMetric node. The default graph has three placeholders named iter, preds, and targs that are wired into the update, preds, and targs ports of the PerformanceMetric node. The node's perf_metric setting defaults to the DeepModel's eval_metric setting. To replace this graph, you can start with a custom graph that replicates this recipe and then modify it to your needs.- verbose name: Valstep [Signature]
- default value: (iter,preds,targs)
- port type: Port
- value type: object (can be None)
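As an illustration, what such a custom validation step amounts to is sketched below in plain Python; the metric (plain accuracy), the patience-based early-stopping rule, and all names are hypothetical choices, not the node's built-in behavior.

    import numpy as np

    class EarlyStoppingValidator:
        """Hypothetical stand-in for a custom valstep graph: score each validation
        round and request a stop (akin to a Break node) once the score has not
        improved for `patience` consecutive rounds."""
        def __init__(self, patience=5):
            self.best, self.stale, self.patience = -np.inf, 0, patience

        def __call__(self, it, preds, targs):
            # plain accuracy as a stand-in for a PerformanceMetric-style score
            score = float(np.mean(np.argmax(preds, axis=1) == targs))
            if score > self.best:
                self.best, self.stale = score, 0
            else:
                self.stale += 1
            return score, self.stale >= self.patience  # (metric, stop requested?)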
-
trainfeed
Optional training feed graph.- verbose name: Trainfeed
- default value: None
- port type: GraphPort
- value type: Graph
-
trainfeed__signature
Arguments for the optional training feed graph. The purpose of this graph is to emit shuffled mini-batches of training data, and it can be overridden to realize highly customized input pipelines, which is however rarely necessary. The basic function of the graph is to take the inputs (e.g., a Packet), and according to the iter parameter, which is the 0-based iteration number, emit a mini-batch of data; the default behavior is fairly complex and works as follows: first, an integer range is created that indexes all trials in the input dataset. Then, in a collecting fold loop over the number of epochs (passes through the training set), the range is successively shuffled and the shuffled ranges are then concatenated and flattened into a single index array. Then, a length-based index range starting at the current iter and of length batch size is selected from that index sequence and then used to index the input dataset, which is then returned. Due to the need to shuffle each range differently, the fold loop uses the split random seed node to successively advance the current random key while splitting off a key for use with the permutation. Note that the graph only sees data meant for training and neither test-only nor validation data, and that all inputs except for the iter are constant for the duration of the training process. The total_insts argument is the total number of instances to emit across the training process, i.e., this will be larger than the number of instances in the dataset if multiple passes over the dataset are to be made.- verbose name: Trainfeed [Signature]
- default value: (inputs,randkey,batch_size,total_insts,iter)
- port type: Port
- value type: object (can be None)
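For orientation, the default feed behavior described above roughly corresponds to the following plain-numpy sketch (an approximation only: the integer randkey stands in for the splittable random seed mechanism, inputs is assumed to be indexable along a leading instance axis, and iter is read here as a 0-based offset into the concatenated index sequence):

    import numpy as np

    def default_trainfeed(inputs, randkey, batch_size, total_insts, it):
        n = len(inputs)                                  # number of trials
        n_epochs = int(np.ceil(total_insts / n))         # passes through the data
        rng = np.random.default_rng(randkey)             # stand-in for split keys
        # per-epoch shuffled index ranges, concatenated into one index sequence
        order = np.concatenate([rng.permutation(n) for _ in range(n_epochs)])
        sel = order[it:it + batch_size]                  # "starting at the current iter"
        return inputs[sel]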
-
augmentations
Optional data augmentation graph.- verbose name: Augmentations
- default value: None
- port type: GraphPort
- value type: Graph
-
augmentations__signature
Arguments for the optional data augmentation graph. This graph applies to the output of the training feed, i.e., a single mini-batch. The graph may increase/decrease or otherwise recombine the instances in the batch, and may also add noise or other perturbations. One technique is to first use the RepeatAlongAxis node to replicate the batch a number of times, then to apply the augmentations, and finally to select a random subset (driven by the randkey) to bring the batch back to its original size. Another variant defines a larger than usual incoming batch size (e.g., 5x) in the DeepModel node, augments this, and finally reduces the output to a normal batch size. This allows the augmentation to operate on a greater diversity of data than it otherwise would; however, this has the side effect that a pass through a dataset will only use a relatively small percentage of the trials, so the epoch count has to also be increased accordingly (e.g., 5x). The graph may also have a third argument that receives the is_training flag, which is True when the graph is called on training data and False when it is called on test data; if this is not present, the graph is only applied to training data and never test or validation data. This is mostly useful for augmentations that produce biased results.- verbose name: Augmentations [Signature]
- default value: (batch,randkey,~)
- port type: Port
- value type: object (can be None)
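A minimal numpy sketch of the first recipe above (replicate the batch, perturb each copy, then subsample back to the original size) might look as follows; the Gaussian-noise perturbation and all names are hypothetical:

    import numpy as np

    def jitter_augmentation(batch, randkey, repeats=4, noise_sd=0.1):
        rng = np.random.default_rng(randkey)
        big = np.repeat(batch, repeats, axis=0)             # akin to RepeatAlongAxis
        big = big + rng.normal(0.0, noise_sd, big.shape)    # perturb each copy
        keep = rng.choice(len(big), size=len(batch), replace=False)
        return big[keep]                                    # back to the original batch size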
-
loss_function
Loss function to optimize for. The two cross-entropy losses are the default losses to use for classification problems. The sigmoid loss is for two-class classification and assumes that your pipeline typically ends in a single Dense Layer node (or equivalent) that has one output unit and no trailing Activation node (the loss function applies the sigmoid). The softmax loss is for multi-class classification and instead assumes that you use a Dense Layer node with as many output units as there are classes. Again, the Dense Layer should not be followed by an Activation node, as a softmax non-linearity is applied by the loss function. The hinge loss is a robust but non-probabilistic loss for two-class classification (used in support vector machines). The squared, huber, and log_cosh are all losses for use in regression problems, i.e., if a continuous-valued quantity with normally-distributed (in case of squared loss) or heavy-tailed noise (with huber and log_cosh) is being predicted. The latter two losses are therefore usable for robust regression, but note that the robustness of Huber critically depends on a parameter. The cosine_distance loss is usable for vector-valued predictions where the correlation against some target vectors is being optimized. Additional information can also be found in the node of similar name ending in Loss. You may also specify an entirely custom loss via a graph, in which case the loss function may be set to 'custom' for clarity. To specify a custom loss, wire a graph into the "loss" port that has a 'preds' and 'targs' placeholder node and which emits a single scalar loss for the given batch, which in the simplest case is one of the existing Loss nodes followed by a Sum node (that drops the summed-over axis). Overriding the loss allows you to use, for example, class-weighted losses or instance-weighted losses (for the latter your loss graph needs to accept a third positional argument that receives the per-instance weights).- verbose name: Loss Function
- default value: sigmoid_binary_crossentropy
- port type: ComboPort
- value type: str (can be None)
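For reference, the two default classification losses map logits to per-instance losses roughly as in the following numpy sketch (illustrative only, not the node's implementation):

    import numpy as np

    def sigmoid_binary_crossentropy(logits, labels):
        # logits: (N,) from a 1-unit Dense layer; labels: (N,) in {0, 1}
        # numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
        return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

    def softmax_crossentropy(logits, labels):
        # logits: (N, C) from a C-unit Dense layer; labels: (N,) integer classes
        z = logits - logits.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels]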
-
optimizer
Optimizer to use. Optimizers differ in a number of characteristics, most importantly in the speed of convergence and the tendency to either diverge ("blow up", resulting in NaNs or infinites) or to fail to learn anything. Other characteristics are how much memory they use, and to some extent how likely they are to overfit the training data. Good starting choices are adam and adamw, or one of adabelief, amsgrad, fromage or yogi if adam fails (e.g., diverges or fails to learn), but all optimizers have scenarios where they outperform all others. More advanced users may experiment with rmsprop, novograd, sgd, or noisysgd. For very large batch sizes, lamb and lars are well-adapted. Also adafactor and sm3 can be useful for very large models to conserve memory. For information on the available optimizers, see the individual Step nodes in NeuroPype's deep learning category. Optimizers can be configured with parameters here, by appending them in parentheses after the name and separated by commas, in the order of appearance in the respective step node. For example, 'adam(0.001, 0.9, 0.999)' specifies the Adam optimizer with a learning rate of 0.001, beta1 of 0.9, and beta2 of 0.999. Note that in most optimizers the learning rate is potentially problem-specific, and may have to be tuned. Beyond basic scenarios and to achieve maximum accuracy, the learning rate would typically be decayed over time following a schedule to improve robustness, convergence and final-solution accuracy. To specify such a learning rate schedule you need to use a custom optimizer. This is done by choosing the "custom" optimizer setting and then wiring one of the Step nodes corresponding to your optimizer into the "optstep" port (using the step node's "this" output port). Finally you wire one of the Schedule nodes (e.g., "Linear Warmup Exponential Decay Schedule" (WarmupExponentialDecaySchedule)) into the step node's 'learning_rate_schedule' port. This also allows you to override parameters of the optimizer in your pipeline graph rather than in textual form. When doing so note that not all Step nodes are end-to-end optimizers - if in doubt, go by the optimizers in this listing or read the optimizer's documentation text. It is also possible to use a fully custom optimizer using either the ChainedStep node or the CustomStep node, which gives fine-grained control over how different layers are updated, weight-decay regularization and constraints, and other aspects of the optimization process. (see documentation of these two nodes for more details on their use)- verbose name: Optimizer
- default value: adam(0.001)
- port type: ComboPort
- value type: str (can be None)
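To illustrate how a spelling such as 'adam(0.001, 0.9, 0.999)' relates to the update rule, one Adam step can be sketched in numpy as below; this is illustrative only, and it folds the application of the update into the same function (in a Step-node pipeline, the update would instead be applied via the Add node or by the DeepModel itself):

    import numpy as np

    def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
        # m, v: running first/second moments of the gradient; t: 1-based step count
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)                  # bias correction
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v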
-
batch_size
Number of samples to process in each (mini-)batch. This is the number of instances that will be processed at a time by the network. The batch size is a tradeoff between computational speed (larger batches can be faster on sufficiently parallel hardware) vs stochastic noise in the gradient estimate (smaller batches are more noisy) and memory usage (large batches require more memory). Note that noisy gradients are not necessarily a bad thing, since they tend to act as a form of regularization, which may protect against overfitting. The batch size also interacts with the learning rate, so after changing one the other may have to be readjusted. For very large batch sizes (1000s), you may also need to select a different type of optimizer step, for example LAMB or LARS.- verbose name: Batch Size
- default value: 32
- port type: IntPort
- value type: int (can be None)
-
maxsteps
Maximum number of training steps to perform, measured in the given step unit (see stepunit). Must be specified. If set to 0, this will suppress training.- verbose name: Maxsteps
- default value: None
- port type: IntPort
- value type: int (can be None)
-
stepunit
Unit in which maxsteps is given. Important: the 'batches' and 'instances' units are the total number of steps to train, even if training on streaming data or on successively provided offline datasets, or if the data has more instances than used as per maxsteps. In contrast, the 'epochs' unit always refers to the training data on the current invocation of the node; that is, if epochs is set to N, then the node will make exactly N passes through each given training data packet, whether that is the first and only packet or whether this is the m'th such packet received during, e.g., training on streaming data or on successive offline data (as governed by the train_on setting).- verbose name: Maxsteps Unit
- default value: epochs
- port type: EnumPort
- value type: str (can be None)
-
shuffle
Whether and when to shuffle the training data. Shuffling is necessary when training on time-series data (or other serially correlated data) to implement stochastic gradient descent. If set to "none", no shuffling is performed, which usually requires that your training data is preshuffled. If set to "once", then the training dataset is shuffled once at the start of training. If set to "per-epoch", then the training data is shuffled at the start of each epoch (this behaves the same way regardless of whether your maxsteps is counted in terms of epochs or iterations). This option is ignored and may be set to 'custom' when a custom trainfeed graph is specified, which takes over shuffling.- verbose name: Shuffle
- default value: per-epoch
- port type: ComboPort
- value type: str (can be None)
-
train_on
Update the model on the specified data. Note that generally the model will output predictions on any data that it receives whether it is training on it or not. In the "initial offline" mode - and only if the model is not already trained (i.e., if no previously learned model was loaded in) - the model is trained on the first non-streaming packet that it receives, which should thus be the training set. This can be used when the first packet is the training data and any subsequent packets are test data only, OR when a pretrained model is loaded that should not be further updated. In the "successive offline" mode, the model is trained on any non-streaming packet that it receives, whether it is already pretrained or not (in such case it will be further fine-tuned given the new training data). Any streaming data is taken as test-only data; this is a typical scenario for real-time processing when the model is first trained or fine-tuned on some pre-recorded data, but it can also be used to just train a model on multiple successive datasets. If max_steps is given in epochs, it means that each successive dataset will be passed through the model for this many epochs. The last mode, "offline and streaming" will train on any data, whether it is offline or streaming. This can be used for either real-time training or fine-tuning, i.e., while data is being collected, or for "out-of-core" training on a very large dataset that would not fit in memory if loaded up-front, and which is therefore streamed through the model in chunks. The parameter can also be changed at runtime, to switch from one mode to another.- verbose name: Train On
- default value: initial offline
- port type: EnumPort
- value type: str (can be None)
-
dont_reset_model
Do not reset the model when the input data graph is changed. Normally, when certain parameters of preceding nodes are being changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.- verbose name: Do Not Reset Model
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
validation_split
Fraction of the training data to use for validation. Setting this to a number between 0 and 1 will split off the specified fraction from the training data (always from the end of the data) and use it as validation data. This data will not be used for training but to support the early-stopping technique, wherein training stops when the validation performance no longer improves. This can also be omitted (set to zero) to disable validation and early stopping, in which case the maximum number of steps needs to be set more carefully. As an alternative to using this parameter, one may also provide validation data along with training data through the data input.- verbose name: Validation Split
- default value: 0.2
- port type: FloatPort
- value type: float (can be None)
-
eval_metric
Metric to use for reporting progress. This can be either a built-in validation metric or a fully custom validator step (which must be wired into the valstep port). The metric is evaluated both on the training data and on the validation data if any was passed in. Note that ONLY the validation metric will track generalization performance, while the training-set metric mainly serves to spot potential optimizer pathologies or model inadequacy. One training-set diagnostic is whether the model learned anything at all from the training data vs flatlining at chance performance level; another is that it provides an upper bound on the best-case generalization error under ideal circumstances, where an unsatisfactory score would indicate that the model capacity or architecture or learning process failed to adequately model the training data. Generally some of the metrics are suitable for classification problems (accuracy, precision/recall/f1, and roc metrics) while others are for use in regression problems (r2, explained_variance, max_error and the neg_error metrics). These metrics are all available in the scikit-learn package, and the documentation for them can be found on the web. In quick summary, balanced_accuracy is a good choice for generic classification problems on balanced or unbalanced data, precision/recall are sometimes asked for in detection problems of a relatively rare class (e.g., clinical diagnosis), and the roc metrics are useful both for measuring performance on unbalanced data (although on extremely unbalanced data, the measures can break down), and for measuring the performance in detection problems across a range of decision thresholds, e.g., when the relative importance of type-1 and type-2 errors is not fixed at the time the ML work is done. The F1 scores are a specific blend of precision and recall across classes that can be used as a general-purpose (albeit perhaps not very interpretable) performance measure on some signal detection tasks. For regression, the r2 score is a fair choice for generic signal regression and is relative to the scale of the target variable; explained variance is similar but specifically does not score systematic offset (i.e., bias) in the predictions (usually a deficiency of the metric but sometimes useful, e.g. while working on a model that will still receive a bias correction later). The neg_error metrics are the most common tools in regression problems; of those, the mean_absolute and median_absolute variants are perhaps the most interpretable as the errors are directly in the same units as the target variable (e.g., meters), and the latter is robust if there are outliers in either the targets or predictions. The mean_squared error is the default choice under a Gaussian noise assumption, and the mean_squared_log error is useful if the data are log-normally distributed (e.g., data from exponential growth processes). Somewhat similarly, the neg_mean_absolute_percentage_error is also adequate for measuring performance on data where the target variable ranges across several orders of magnitude in scale, and the error is relative to each individual target value (but note that the score is not measured in percent but is a fraction). Note that in evaluation scores, higher values are better (unlike loss measures). Also unlike the loss measures, most of the metrics are not differentiable, so they cannot be directly plugged into the optimizer.
However, model hyper-parameters can be optimized with respect to these metrics when using the Parameter Optimization node, at significant computational overhead (requiring many model training runs for different hyper parameter values).- verbose name: Evaluation Metric
- default value: balanced_accuracy
- port type: ComboPort
- value type: str (can be None)
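For orientation, the default balanced_accuracy metric is simply the mean of the per-class recalls; a minimal numpy sketch (matching the spirit of scikit-learn's balanced_accuracy_score) is:

    import numpy as np

    def balanced_accuracy(preds, targs):
        # preds, targs: integer class labels per instance
        classes = np.unique(targs)
        recalls = [np.mean(preds[targs == c] == c) for c in classes]
        return float(np.mean(recalls))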
-
verbosity
Verbosity level. Higher numbers will produce more extensive diagnostic output.- verbose name: Verbosity Level
- default value: 1
- port type: EnumPort
- value type: str (can be None)
-
random_seed
Seed for any pseudo-random choices during training. This can be either a splittable seed as generated by Create Random Seed or a plain integer seed. If left unspecified, this will resolve to either the current 0-based "task number" if used in a context where such a thing is defined (for example cross-validation or a parallel for loop), or 12345 as a generic default.- verbose name: Random Seed
- default value: None
- port type: Port
- value type: AnyNumeric (can be None)
-
compile_model
When to compile the model used for training. This can be set to 'never' to run the model in what is known as "eager mode", which is slow but allows for potentially easier debugging. The difference between always and cached is that the latter is a hint to attempt to save the compiled model to disk and reuse it later when possible. However, whether and in what situations that is actually done depends on DeepModel and the underlying implementation.- verbose name: Compile Model
- default value: cached
- port type: EnumPort
- value type: str (can be None)
-
skip_incomplete_batch
Whether to skip the last batch encountered during training if it is smaller than the batch size. This is applicable when the dataset size is not a multiple of the batch size. Processing this batch will require a separate compilation for the odd batch size, but leaves no data samples unprocessed. When training for a typical number of epochs and trials, the trials in this batch tend to be insignificant to the overall training progress, which typically involves thousands of batches, so this option should generally be left enabled.- verbose name: Skip Incomplete Batch
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
retain_model
What model to retain for output. This can be either the model with the best loss or validation score, or the model that corresponds to the last training step. If no validation data are provided, this will always be the last model.- verbose name: Retain Model
- default value: lowest-loss
- port type: EnumPort
- value type: str (can be None)
-
canonicalize_output_axes
Whether to canonicalize the output axes of the model to match the expected output axes of other machine-learning nodes. This can be turned off if your model emits a handcrafted feature or statistic axis to describe its predictions that you'd like to retain. Note though that some downstream nodes, like MeasureLoss, might not work as expected.- verbose name: Canonicalize Output Axes
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
no_compile_in_debug
Do not compile the model when running in debug mode.- verbose name: No Compile In Debug
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
conserve_memory_every
Conserve memory by clearing caches etc. every this many steps (in the respective stepunit). This is a hint to the system that may not actually be honored, depending on the implementation and settings such as whether the model is compiled or not. The default is to NOT clear any caches. A good choice is to clear caches every 3-5 epochs for heavy models. For extremely memory-constrained setups, this can also be set to -1, in which case the cache will be cleared before training a new epoch and before predictions for a new epoch.- verbose name: Conserve Memory Every
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
DenseLayer
Dense neural network layer.
This is a fully-connected layer, where each input feature is connected to each output feature. Optionally includes a bias term. As with all built-in layers, you can override the initializer for the weights and/or the bias, which default to Lecun Normal and zero, respectively.
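For orientation, a Dense layer amounts to a single matrix multiplication plus an optional bias; a minimal numpy sketch (using an untruncated Lecun-normal draw with stddev sqrt(1/fan_in); the actual initializer may use a truncated variant) is:

    import numpy as np

    def dense_forward(x, units, seed=0):
        # x: (batch, in_features) -> (batch, units)
        fan_in = x.shape[-1]
        rng = np.random.default_rng(seed)
        W = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, units))  # lecun_normal-style
        b = np.zeros(units)                                               # zero-initialized bias
        return x @ W + b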
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
units
Number of units (i.e., output features).- verbose name: Units
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
with_bias
Whether to include a bias term.- verbose name: With Bias
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
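For orientation, the variance-scaling family mentioned above can be sketched as follows for a 2-D weight matrix (untruncated normal variant for brevity; the actual initializers may draw from a truncated normal or uniform distribution): lecun corresponds to scale 1 with fan_in, he/kaiming to scale 2 with fan_in, and glorot/xavier to scale 1 with fan_avg.

    import numpy as np

    def variance_scaling(shape, scale=1.0, mode="fan_in", seed=0):
        fan_in, fan_out = shape[0], shape[-1]
        fan = {"fan_in": fan_in, "fan_out": fan_out,
               "fan_avg": (fan_in + fan_out) / 2.0}[mode]
        std = np.sqrt(scale / fan)
        return np.random.default_rng(seed).normal(0.0, std, size=shape)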
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: dense
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
DepthwiseConvolutionLayer
A 1/2/3/N-D depthwise convolution layer.
See the "Convolution Layer" node for a general overview of convolutions. In depthwise convolution, and in contrast to regular ("full") convolution, instead of learning a kernel that goes across all input features, the node learns N smaller kernels, each of which sees only one of the input features. As a result, this variant does not learn any cross-feature patterns, but it saves both computation and parameters when it is applicable, and can thus reduce the chance of overfitting. A notable difference to regular convolution is that the output feature count is not given as a total but as a multiplier of the input features, since the node can also learn multiple kernel per input feature, each resulting in a separate output feature. Depthwise convolution can also be used to implement other kinds of factorized processing across features depending on the application.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
mask
Mask to apply to the weights.- verbose name: Mask
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
sweep_axes
List and order of axes over which the convolution filter kernel is swept. If the input data are packets, this determines the order of these axes in the output data, and the order of the axes in the kernel (for plain array inputs, see end of tooltip). A kernel is a learned array that is shifted over all possible positions in the data (optionally with step size in each dimension, and optionally going past the edges of the data by half the kernel size if padding=same). For each position, the kernel is multiplied by the data in the region covered by the kernel and the resulting (elementwise) product is integrated (summed) to produce a single output score (a measure of match between the kernel and the data in that region). If the input data has an extra feature axis, the kernel will usually have an implicit extra axis to hold weights for each input feature. If the data has an instance axis, each instance will be processed separately (using the same kernels). If the input data are plain arrays, this merely determines the number of spatial axes and the names are just mnemonic and not otherwise used. This can alternatively be given as just a number to set the number of spatial dimensions, corresponding to the N in N-D convolution; for packet data, this will resolve to the last N axes in the data that are neither feature nor instance axes. This parameter is not limited to the predefined options.- verbose name: Axes To Sweep Kernel Over (Convolve)
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
output_features
Number of filter kernels to learn for each input feature. This is a multiplier applied to the number of input features rather than the total number of output features. This value generally determines the length of the feature axis in the output data (each kernel yields one output feature per input feature, representing the raw feature detection score produced by that kernel). In classic deep learning, this is also called the number of output channels -- analogous to RGB color channels in a raw image, or generally meant to be an unspecific feature axis in a data array (not to be confused with spatial channels in multi-channel time series, which are more commonly treated like the vertical axis in 2d image data).- verbose name: Filters Per Input Feature
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
kernel_shape
Shape of the convolution filter kernel. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the kernel is the same size along all of the given spatial dimensions. Note: if you make the kernel as large as the data along some axis, there is only a single valid position for the kernel along that axis (if padding=valid), and consequently the result is an inner product between the data and the kernel, or a matrix multiplication when more kernels are learned. Conversely, if you give the kernel a shape of 1 along an axis, the result is equivalent to processing each element along that axis separately using the same kernel. The latter is the same as not listing the axis in sweep axes, except that the output axis order can be controlled when specifying a 1-sized axis in sweep_axes. Which is more efficient depends on the implementation.- verbose name: Kernel Shape
- default value: [3]
- port type: ListPort
- value type: list (can be None)
-
strides
Step size with which the kernel is swept over the data. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the same step size is used along all of the specified spatial dimensions. A step size greater than 1 means that the kernel will be shifted by this amount between successive positions; as a result, the amount of compute is lower by this factor, and the output data along this axis will also be shorter by this factor (matching the number of positions at which the kernel is applied).- verbose name: Step Size (Strides)
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
padding
Padding strategy for the data. This can be either 'valid' or 'same', or a custom list of padding amounts. 'valid' means no padding (i.e., the kernel will not run off the edges of the data, but the output data will be shorter along each axis according to the number of valid positions of the kernel along that axis), and 'same' means that the output will have the same shape as the input (aside from dilation and striding). Can be customized by giving a list of [(low, high), ...] pairs, where low is the padding to apply before the data along each axis, and high is the padding to apply after the data along each axis. low and high can also be negative to trim the data instead of padding. If a single [(low, high)] pair is given, it is applied to all axes.- verbose name: Padding
- default value: valid
- port type: ComboPort
- value type: str (can be None)
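As a rule of thumb, the output length along one swept axis follows the usual convolution arithmetic, sketched below (assuming 'same' pads by a total of kernel-1 elements, which yields ceil(in_len/stride) kernel positions):

    def conv_output_length(in_len, kernel, stride=1, padding="valid"):
        pad = kernel - 1 if padding == "same" else 0   # total padding across both ends
        return (in_len + pad - kernel) // stride + 1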
-
with_bias
Whether to include a bias term. If given, then for each output feature, a bias term is learned and added to the output of the convolution. This increases the flexibility of the learned model, but note that the result is no longer strictly equivalent to e.g., a learned FIR filter applied to time-series data or a learned spatial filter / matrix multiplication applied to spatial data.- verbose name: Learn Bias Term(S)
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: depthwise_conv
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
DepthwiseSeparableConvolutionLayer
A 1/2/3/N-D depthwise separable convolution layer.
See the "Convolution Layer" node for a general overview of convolutions and the "Depthwise Convolution Layer" node for a description of the depthwise variant. The depthwise separable variant is a further restriction of depthwise and is a highly factorized model. Besides being limited to 2 swept axes, a kernel is not a full learnable matrix but can be viewed as being generated as a product of a learnable horizontal and a learnable vertical weight vector, the result of which is applied like an (implicitly defined) highly redundant matrix (of rank 1). Note a potential point of confusion: Keras has a layer named Depthwise Separable Convolution, which is not a factorized convolution but a regular Depthwise Convolution followed by a 1x1 Convolution.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
mask
Mask to apply to the weights.- verbose name: Mask
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
sweep_axes
List and order of axes over which the convolution filter kernel is swept. If the input data are packets, this determines the order of these axes in the output data, and the order of the axes in the kernel (for plain array inputs, see end of tooltip). A kernel is a learned array that is shifted over all possible positions in the data (optionally with step size in each dimension, and optionally going past the edges of the data by half the kernel size if padding=same). For each position, the kernel is multiplied by the data in the region covered by the kernel and the resulting (elementwise) product is integrated (summed) to produce a single output score (a measure of match between the kernel and the data in that region). If the input data has an extra feature axis, the kernel will usually have an implicit extra axis to hold weights for each input feature. If the data has an instance axis, each instance will be processed separately (using the same kernels). If the input data are plain arrays, this merely determines the number of spatial axes and the names are just mnemonic and not otherwise used. This can alternatively be given as just a number to set the number of spatial dimensions, corresponding to the N in N-D convolution; for packet data, this will resolve to the last N axes in the data that are neither feature nor instance axes. This parameter is not limited to the predefined options.- verbose name: Axes To Sweep Kernel Over (Convolve)
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
output_features
Number of filter kernels to learn for each input feature. This is a multiplier applied to the number of input features rather than the total number of output features. This value generally determines the length of the feature axis in the output data (each kernel yields one output feature per input feature, representing the raw feature detection score produced by that kernel). In classic deep learning, this is also called the number of output channels -- analogous to RGB color channels in a raw image, or generally meant to be an unspecific feature axis in a data array (not to be confused with spatial channels in multi-channel time series, which are more commonly treated like the vertical axis in 2d image data).- verbose name: Filters Per Input Feature
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
kernel_shape
Shape of the convolution filter kernel. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the kernel is the same size along all of the given spatial dimensions. Note: if you make the kernel as large as the data along some axis, there is only a single valid position for the kernel along that axis (if padding=valid), and consequently the result is an inner product between the data and the kernel, or a matrix multiplication when more kernels are learned. Conversely, if you give the kernel a shape of 1 along an axis, the result is equivalent to processing each element along that axis separately using the same kernel. The latter is the same as not listing the axis in sweep axes, except that the output axis order can be controlled when specifying a 1-sized axis in sweep_axes. Which is more efficient depends on the implementation.- verbose name: Kernel Shape
- default value: [3]
- port type: ListPort
- value type: list (can be None)
-
strides
Step size with which the kernel is swept over the data. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the same step size is used along all of the specified spatial dimensions. A step size greater than 1 means that the kernel will be shifted by this amount between successive positions; as a result, the amount of compute is lower by this factor, and the output data along this axis will also be shorter by this factor (matching the number of positions at which the kernel is applied).- verbose name: Step Size (Strides)
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
padding
Padding strategy for the data. This can be either 'valid' or 'same', or a custom list of padding amounts. 'valid' means no padding (i.e., the kernel will not run off the edges of the data, but the output data will be shorter along each axis according to the number of valid positions of the kernel along that axis), and 'same' means that the output will have the same shape as the input (aside from dilation and striding). Can be customized by giving a list of [(low, high), ...] pairs, where low is the padding to apply before the data along each axis, and high is the padding to apply after the data along each axis. low and high can also be negative to trim the data instead of padding. If a single [(low, high)] pair is given, it is applied to all axes.- verbose name: Padding
- default value: valid
- port type: ComboPort
- value type: str (can be None)
-
with_bias
Whether to include a bias term. If given, then for each output feature, a bias term is learned and added to the output of the convolution. This increases the flexibility of the learned model, but note that the result is no longer strictly equivalent to e.g., a learned FIR filter applied to time-series data or a learned spatial filter / matrix multiplication applied to spatial data.- verbose name: Learn Bias Term(S)
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: separable_conv
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
Dropout
Apply dropout regularization to the data.
This will randomly drop out a given element with probability given by the dropout rate. As a result, the downstream nodes will see a different input each time, and the network will be forced to learn robust representations that are not overly dependent on any one element.
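As a minimal numpy sketch of the common (inverted) dropout formulation, where surviving elements are rescaled by 1/(1-rate) during training so that the expected activation is unchanged (the rescaling is an assumption about this node's exact behavior):

    import numpy as np

    def dropout(x, rate=0.5, is_training=True, seed=0):
        if not is_training or rate == 0.0:
            return x                                   # pass through at prediction time
        keep = np.random.default_rng(seed).random(x.shape) >= rate
        return np.where(keep, x / (1.0 - rate), 0.0)   # rescale survivors, zero the rest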
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
random
Random number key to use.- verbose name: Random
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
rate
Dropout rate. The probability of dropping out a given element.- verbose name: Dropout Rate
- default value: 0.5
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
EmbeddingLayer
A trainable layer for mapping categorical (integer) data to low-dimensional vectors.
This is used because neural networks are not inherently well suited to mapping categorical data to continuous representations; for small to mid-sized categorical data, one typically converts such data to a one-hot encoding, which can then be used with a Dense layer. However, with many categories this becomes inefficient, and if the categories inherently share separable features, this node provides a scalable alternative that works for tens of thousands of categories (e.g., when the inputs are indices into a large tabulated vocabulary of words or other tokens, such as event markers) by mapping each index to a learned lower-dimensional feature vector. The maximum integer that can be encountered, as well as the dimensionality of the embedding vectors, are pre-specified as parameters. The node has a few options for performance tuning (op_precision and internal_mapping), and can also be initialized with a pretrained embedding matrix (w_pretrained) as an alternative to random initialization. In this case the input data may only contain one stream. For Packet data, the output of this node has the same axes as the input, except for an additional feature axis (of length embedding_features) at the end. Note that this node does not automatically work on event marker data in packets, since this node, like all other DNN nodes, ignores marker streams. Instead, you will have to define a fixed tabulated vocabulary of event markers and convert any markers you want to process into indices into this vocabulary, which would go into the data array of a regular non-marker stream (which can and most likely should still be indexed by an instance axis, etc.).
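Conceptually, the forward pass amounts to a table lookup into a trainable matrix; a minimal sketch (illustrative only; variable names are hypothetical):

    import jax.numpy as jnp
    from jax import random

    num_categories, embedding_features = 1000, 100
    # trainable embedding table; a pretrained matrix (cf. w_pretrained) could be used instead
    W = 0.01 * random.normal(random.PRNGKey(0), (num_categories, embedding_features))

    indices = jnp.array([[3, 17, 999], [0, 42, 7]])   # integer-coded inputs
    embedded = W[indices]                              # shape (2, 3, embedding_features)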
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_pretrained
Optionally a pretrained embedding matrix.- verbose name: W Pretrained
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
num_categories
Number of discrete categories. This is equivalent to the largest integer index that the node may receive plus 1. This is also known as the vocabulary size if the inputs are indices into a dictionary when processing text data.- verbose name: Num Categories
- default value: 1000
- port type: IntPort
- value type: int (can be None)
-
embedding_features
Number of features in the embedding.- verbose name: Embedding Features
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
internal_mapping
How the node internally implements the lookup operation. The external behavior of the node is the same in either case, but performance characteristics may differ depending on the implementation and the other settings. The default is currently indices, but this may change in the future.- verbose name: Internal Mapping
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: truncated_normal
- port type: ComboPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: embedding
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ExponentialDecaySchedule
An exponential decay (or growth) schedule.
This schedule holds the parameter at the initial value until the current step count reaches the value set via decay_begin, and then applies exponential decay until the given final value is reached, at which point the parameter is held at the final value. The falloff is further parameterized by the decay_steps parameter, which determines over how many steps the parameter decays by the specified decay rate (e.g., if this is set to 100 and the decay rate is 0.9, it will take 100 steps (after decay_begin) for the parameter to reach 0.9 * initial_value, and another 100 steps to reach 0.81 * initial_value, and so forth). The node can also operate in "staircase" mode, where the transition is not smooth but is held constant for each block of decay_steps steps, and then changes abruptly by the given decay rate. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
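The resulting schedule can be summarized with the following sketch (assuming a decay rate below 1, so the final value acts as a lower bound; for a growth schedule it would clip from above):

    import numpy as np

    def exponential_decay(step, init_value, final_value, decay_rate,
                          decay_begin=0, decay_steps=1, staircase=False):
        t = max(step - decay_begin, 0) / decay_steps
        if staircase:
            t = np.floor(t)
        value = init_value * decay_rate ** t
        return max(value, final_value)

    [exponential_decay(s, 1.0, 0.1, 0.9, decay_steps=100) for s in (0, 100, 200)]
    # -> approximately [1.0, 0.9, 0.81]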
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. The parameter is held at this value until the current step count reaches the value set via decay_begin.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
final_value
Final parameter value. Once the schedule reaches this value, it will remain at this value for the remainder of the optimization process. (if the decay rate is < 1, this is effectively a lower bound on the parameter value, and if the decay rate is > 1, this is an upper bound)- verbose name: Final Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
decay_rate
Decay rate. The parameter value decays by this factor for every decay_steps. This can be between 0 and 1 for a regular decay schedule, or greater than 1 for an exponential growth schedule.- verbose name: Decay (Or Growth) Rate
- default value: 0.99
- port type: FloatPort
- value type: float
-
decay_begin
Step count at which to begin the transition from the initial value to the final value. The parameter is held at the initial value until this step count is reached.- verbose name: Decay Begin
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
decay_steps
The number of steps over which the parameter decays by decay_rate. The basic formula is value = initial_value * decay_rate ^ (count_since_decay_begin / decay_steps), followed by clipping according to the final_value.- verbose name: Decay Steps
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
staircase
If True, the parameter value is decayed in a staircase fashion, i.e., the parameter changes by exactly decay_rate every decay_steps steps. If False, the parameter value is decayed in a continuous fashion according to the formula given in the docs for decay_steps.- verbose name: Staircase
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to e.g., 1000 steps and then wire a formula that divides the number of steps done by some process by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
FromageStep
The Fromage (Frobenius Matched Gradient Descent) optimizer step.
Based on Bernstein et al, 2020, this optimizer requires little or no learning rate tuning or scheduling, and can work across a range of neural network topologies, including transformers and GANs, with the same setting. A minimal degree of adaptation, such as dividing the learning rate by 10 when the loss plateaus, can be helpful, however. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop.
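For orientation, the published per-layer rule can be paraphrased roughly as follows (a hedged sketch of the rule from the paper, not necessarily the node's exact implementation; the sketch combines the step and the weight update for readability, whereas the node itself only emits the update):

    import jax.numpy as jnp

    def fromage_apply(w, g, lr=0.01, min_norm=1e-6):
        # Rescale the gradient to the weight norm, then shrink to counteract norm growth.
        w_norm = jnp.maximum(jnp.linalg.norm(w), min_norm)
        g_norm = jnp.maximum(jnp.linalg.norm(g), min_norm)
        return (w - lr * g * (w_norm / g_norm)) / jnp.sqrt(1.0 + lr ** 2)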
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate
Learning rate.- verbose name: Learning Rate
- default value: 0.01
- port type: FloatPort
- value type: float (can be None)
-
min_norm
Minimum gradient norm. This can be used to avoid dividing by zero when rescaling; small gradients are rescaled to at least this value.- verbose name: Min Norm
- default value: 1e-06
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
GatedRecurrentUnitLayer
A gated recurrent unit (GRU) recurrent core, based on Chung et al.
(2014). This is a recurrent layer that conceptually retains internal network activations across successive time steps when the node is used to process time series data (see below for more details). In contrast to LSTMs, GRUs have a simpler structure and their output size (length of the emitted feature axis) equals the number of hidden units. The node is highly competitive in practice in its ability to learn long-term dependencies, using a gating mechanism that controls how much of the previous hidden state and of the new input enters the updated state. Like all recurrent layers, the node does not usually store and carry over these activations itself, but either depends on a special loop node (the RecurrentLoop node, see docs for more details) to step across the time axis of some given data array, or requires that the user manually passes in and retrieves the carry state via the carry in/out port. Also like all recurrent nodes, this node will move the instance axis of the input first, optionally retain any axes listed in the parallel_axes port, and flatten any other axes into a single feature axis at the end. When managing the carry state manually, the state for this node can be obtained via get_initial() or be constructed as either an array or packet (depending on the data that it is used with), where the first axis is of size 2, the last axis is of size units, and the middle axis is sized to be the product of the instance axes and other parallel axes of the data. Also, as with most built-in layers, you can override the initializer for the input and hidden weights and/or the bias, which default to Lecun Normal and zero, respectively.
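For reference, a single GRU step in its standard form looks roughly as follows (a didactic sketch; gate ordering and the exact placement of the reset gate differ between implementations, so this should not be read as the node's precise weight layout):

    import jax
    import jax.numpy as jnp

    def gru_step(params, h, x):
        Wi, Wh, b = params                  # input weights, hidden weights, bias (3 gates each)
        gx = x @ Wi + b                     # shape (..., 3*units)
        gh = h @ Wh
        zx, rx, ax = jnp.split(gx, 3, axis=-1)
        zh, rh, ah = jnp.split(gh, 3, axis=-1)
        z = jax.nn.sigmoid(zx + zh)         # update gate
        r = jax.nn.sigmoid(rx + rh)         # reset gate
        a = jnp.tanh(ax + r * ah)           # candidate activation
        return (1.0 - z) * h + z * a        # new carry / output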
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process at current time step.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
carry
Carried-over activations from previous time step.- verbose name: Carry
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
wi_init
Initializer for the input weights.- verbose name: Wi Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
wh_init
Initializer for the hidden weights.- verbose name: Wh Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
wi_prior
Optional prior distribution for the input weights.- verbose name: Wi Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
wh_prior
Optional prior distribution for the hidden weights.- verbose name: Wh Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
units
Number of hidden units.- verbose name: Hidden Units
- default value: 256
- port type: IntPort
- value type: int (can be None)
-
wi_initializer
Choice of input weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Input Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
wh_initializer
Choice of hidden weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Hidden Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: gru
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
parallel_axes
Optionally an axis or comma-separated list of axes that shall be processed in parallel by the layer, using the same learned weights. This is useful if you have multiple parallel processes that you wish to process separately, but assume that they are all ultimately governed by the same rules, so that a single kernel can be learned for all of them. Note that for packet data the instance axis (if present), and for plain-array data the first axis, is always treated as a parallel axis, so you don't need to list it here.- verbose name: Parallel Axes
- default value:
- port type: ComboPort
- value type: str (can be None)
GradientClippingStep
Chainable step that clips incoming gradients based on their norm, ensuring that the gradient norm does not exceed a provided threshold.
This implements several variants, including elementwise clipping, layerwise clipping, global clipping (as in Pascanu et al, 2012), and adaptive clipping (relative to the prior parameter norm). The adaptive clipping follows Brock, Smith, De, and Simonyan (2021), "High-Performance Large-Scale Image Recognition Without Normalization". Unlike the end-to-end steps (named after specific published algorithms), this is a chainable step (to be used with the ChainedStep node) that takes in a gradient and outputs a modified gradient, and it would usually be combined with other steps (like scaling by the learning rate) to yield a full optimizer step.
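As an illustration, the global variant amounts to rescaling the whole gradient structure so that its overall norm does not exceed the threshold (a minimal sketch; the node's other variants operate per weight or per layer instead):

    import jax
    import jax.numpy as jnp

    def clip_by_global_norm(grads, threshold=1.0):
        leaves = jax.tree_util.tree_leaves(grads)
        global_norm = jnp.sqrt(sum(jnp.sum(g ** 2) for g in leaves))
        scale = jnp.minimum(1.0, threshold / (global_norm + 1e-12))
        return jax.tree_util.tree_map(lambda g: g * scale, grads)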
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
variant
Variant of gradient clipping to apply. Per-weight means that each individual weight update is limited to +/- threshold, per-layer means that the gradient for each parameter vector/matrix is scaled such that its norm is at most threshold, and global means that the gradient for the entire parameter set is scaled such that its norm is at most threshold. Per-weight-relative means that each individual weight update is limited to +/- threshold times the norm of the corresponding parameter before the update (requires prior parameter values to be provided to the StepUpdate node).- verbose name: Variant
- default value: per-parameter
- port type: EnumPort
- value type: str (can be None)
-
threshold
Threshold value. A typical value is 1.0, but depending on the network and data, other values may be explored.- verbose name: Threshold
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value to prevent clipping of zero-initialized parameters. Only used for per-weight-relative.- verbose name: Epsilon (If Per-Weight-Relative)
- default value: 0.001
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
GroupNorm
Apply group normalization to the given data.
This splits the feature axis into groups, and z-scores the data in each group separately, computing statistics typically across the feature axis within the group and any spatial axes. This can be a useful alternative to other norms such as batch norm, layer norm, or instance norm, when there are enough features to split into groups and ideally when those features have some groupwise structure (for example outputs of a grouped convolution). The normalization is applied per instance and no cross-instance statistics are taken. Like most normalizations, group normalization typically includes a learned scale and bias parameter, whose shape (and thus dimensionality) can be configured; these can also be optionally overridden with externally generated values. If packet data is given, this node ensures that the instance axes come first and the feature axes come last and are flattened into a single feature axis.
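The grouping can be pictured as follows for channels-last array data (an illustrative sketch only; the learned scale and bias that would normally follow are omitted):

    import jax.numpy as jnp

    def group_norm(x, num_groups=32, eps=1e-5):
        # x has shape (batch, *spatial, features); features must be divisible by num_groups
        *lead, features = x.shape
        g = x.reshape(*lead, num_groups, features // num_groups)
        # statistics per instance and per group: over spatial axes and within-group features
        axes = tuple(range(1, g.ndim - 2)) + (g.ndim - 1,)
        mean = g.mean(axis=axes, keepdims=True)
        var = g.var(axis=axes, keepdims=True)
        return ((g - mean) / jnp.sqrt(var + eps)).reshape(x.shape)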
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
scale_init
Initializer for the trainable scale.- verbose name: Scale Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
bias_init
Initializer for the trainable bias.- verbose name: Bias Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scale_prior
Optional prior distribution for the scale.- verbose name: Scale Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
bias_prior
Optional prior distribution for the bias.- verbose name: Bias Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
num_groups
Number of feature groups to use for the normalization. If this is set to 1, this becomes nearly equivalent to layer normalization. If this is set to the number of features, this becomes equivalent to instance normalization.- verbose name: Number Of Feature Groups
- default value: 32
- port type: IntPort
- value type: int (can be None)
-
epsilon
Small value to add to the variance to avoid division by zero.- verbose name: Epsilon
- default value: 1e-05
- port type: FloatPort
- value type: float (can be None)
-
learnable_scale
Whether to learn a trainable scale parameter. Normalizations typically include such a parameter in order to drive the subsequent activation function in a regime that is desirable for downstream computations (e.g., saturating or linear). Note the shape (and thus dimensionality) of the learned parameter is governed by the param_axes (learn scale/bias across axes) parameter.- verbose name: Learnable Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
learnable_bias
Whether to learn a trainable bias parameter. See the learnable scale for more details.- verbose name: Learnable Bias
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
scale_initializer
Choice of scale initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Scale Initializer
- default value: ones
- port type: ComboPort
- value type: str (can be None)
-
bias_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of the trainable parameters.- verbose name: Layer Name
- default value: groupnorm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
IdentityInitializer
An initializer that initializes the array to an identity matrix or stacks thereof (where the last two dimensions are the identity matrix).
The array must have at least two dimensions.
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
scale
Scale multiplier to apply to the identity matrix.- verbose name: Scale
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
InstanceNorm
Apply instance normalization to the given data.
This z-scores data across the spatial (non-instance/feature) dimensions of a given instance only. This is like batch norm without going over the batch dimension, and is suitable for data with large spatial dimensions but possibly small batch sizes; see also layer and group norm for variants that offer more control. Like most normalizations, instance normalization typically includes a learned scale and bias parameter per feature. These can also be optionally overridden with externally generated values. If packet data is given, this node ensures that the instance axes come first and the feature axes come last.
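In array terms, this reduces to z-scoring over the spatial axes only (a minimal channels-last sketch, with the optional per-feature scale/bias omitted):

    import jax.numpy as jnp

    def instance_norm(x, eps=1e-5):
        # x has shape (batch, *spatial, features); normalize over the spatial axes only
        axes = tuple(range(1, x.ndim - 1))
        mean = x.mean(axis=axes, keepdims=True)
        var = x.var(axis=axes, keepdims=True)
        return (x - mean) / jnp.sqrt(var + eps)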
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
scale_init
Initializer for the trainable scale.- verbose name: Scale Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
bias_init
Initializer for the trainable bias.- verbose name: Bias Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scale_prior
Optional prior distribution for the scale.- verbose name: Scale Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
bias_prior
Optional prior distribution for the bias.- verbose name: Bias Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
epsilon
Small value to add to the variance to avoid division by zero.- verbose name: Epsilon
- default value: 1e-05
- port type: FloatPort
- value type: float (can be None)
-
learnable_scale
Whether to learn a trainable scale parameter. Normalizations typically include such a parameter in order to drive the subsequent activation function in a regime that is desirable for downstream computations (e.g., saturating or linear). Note the shape (and thus dimensionality) of the learned parameter is governed by the param_axes (learn scale/bias across axes) parameter.- verbose name: Learnable Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
learnable_bias
Whether to learn a trainable bias parameter. See the learnable scale for more details.- verbose name: Learnable Bias
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
scale_initializer
Choice of scale initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Scale Initializer
- default value: ones
- port type: ComboPort
- value type: str (can be None)
-
bias_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
fast_variance
If True, use a faster but less accurate variance calculation.- verbose name: Fast Variance
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
layername
Name of the layer. Used for naming of the trainable parameters.- verbose name: Layer Name
- default value: instancenorm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
LAMBStep
The LAMB optimizer step.
Based on You et al, 2019, LAMB is an optimizer that combines the versatility of Adam with the ability to operate on both small and very large batch sizes (e.g., 16k examples), using the strategy employed by LARS, which is a precursor to LAMB. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed.
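Roughly, LAMB computes an Adam-style direction per layer and then rescales it by a layerwise trust ratio (a hedged paraphrase of the published rule, not the node's exact implementation; t is the 1-based step count):

    import jax.numpy as jnp

    def lamb_layer_update(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.0):
        m = b1 * m + (1 - b1) * g                      # first-moment estimate
        v = b2 * v + (1 - b2) * g ** 2                 # second-moment estimate
        u = (m / (1 - b1 ** t)) / (jnp.sqrt(v / (1 - b2 ** t)) + eps) + wd * w
        w_norm, u_norm = jnp.linalg.norm(w), jnp.linalg.norm(u)
        trust = jnp.where((w_norm > 0) & (u_norm > 0), w_norm / u_norm, 1.0)
        return -lr * trust * u, m, v                   # update to be added to w, plus new state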
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
weight_decay_mask
Mask structure for the weight decay.- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-06
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, e.g., for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
weight_decay
Strength of the weight decay. This is multiplied by the learning rate as in e.g., PyTorch and Optax, but differs from the paper, where it is only multiplied by the schedule multiplier but not the base learning rate.- verbose name: Optional Weight Decay
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
LARSStep
The LARS optimizer step.
Based on You et al, 2017, LARS is a layer-wise adaptive optimizer that enables the use of very large batch sizes (e.g., 16k examples) with SGD. This is a precursor to the more recent LAMB optimizer. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
weight_decay_mask
Mask structure for the weight decay.- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
trust_ratio_mask
Mask structure indicating where to apply the trust ratio step.- verbose name: Trust Ratio Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
weight_decay
Strength of the weight decay. This is multiplied by the learning rate as in e.g., PyTorch and Optax, but differs from the paper, where it is only multiplied by the schedule multiplier but not the base learning rate.- verbose name: Weight Decay
- default value: 0.0001
- port type: FloatPort
- value type: float (can be None)
-
trust_coefficient
Multiplier for the trust ratio.- verbose name: Trust Coefficient
- default value: 0.001
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Optional additive constant in the trust ratio denominator.- verbose name: Epsilon
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
momentum
Optional exponential decay rate for momentum.- verbose name: Optional Momentum
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
nesterov
Whether to use Nesterov acceleration.- verbose name: Use Nesterov Acceleration
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
LayerNorm
Apply layer normalization to the given data.
This z-scores data across the spatial and typically also the feature dimensions, but separately per instance (train or test example). Layer normalization can be the best choice when dealing with data that has a small batch size, or when used inside an RNN (both cases rendering the batch norm potentially inapplicable) and/or when no additional large spatial axes are present (rendering the instance norm inapplicable). Also consider the group norm for a variant that can partition the norm across feature groups, and the RMS norm for a variant that does not affect the mean of the data and is in that sense less computationally costly. Like most normalizations, layer normalization typically includes a learned scale and bias parameter, whose shape (and thus dimensionality) can be configured, and which in the case of layer norm varies between different NN libraries. These can also be optionally overridden with externally generated values. If packet data is given, this node ensures that the instance axes come first and the feature axes come last.
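With the default settings (statistics accumulated over all non-instance axes, scale/bias learned per feature), the computation can be sketched as follows (illustration only, not the node's code):

    import jax.numpy as jnp

    def layer_norm(x, scale, bias, eps=1e-5):
        # x has shape (batch, ..., features); scale/bias are shaped like the feature axis
        axes = tuple(range(1, x.ndim))                # all non-instance axes
        mean = x.mean(axis=axes, keepdims=True)
        var = x.var(axis=axes, keepdims=True)
        return (x - mean) / jnp.sqrt(var + eps) * scale + bias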
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
scale_init
Initializer for the trainable scale.- verbose name: Scale Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
bias_init
Initializer for the trainable bias.- verbose name: Bias Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scale_prior
Optional prior distribution for the scale.- verbose name: Scale Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
bias_prior
Optional prior distribution for the bias.- verbose name: Bias Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
axes
"Optional comma-separated list of axis names or indices over which to accumulate the normalization statistics. If unspecified, the statistics will be accumulated over all except the instance axes. This parameter is not limited to the predefined choices.- verbose name: Accumulate Across Axes
- default value: (non-instance)
- port type: ComboPort
- value type: str (can be None)
-
param_axes
List of axis names/indices across which to learn separate per-element scale and bias parameters. Since the layer norm as originally introduced does not discuss additional spatial (non-feature) axes, different NN libraries use different conventions for this parameter. Haiku and Sonnet use the feature axis (or last axis) by default, meaning that each feature is post-scaled independently as in the batch norm, but some other ML libraries set this to the same value as the axes parameter, which causes a separate scale/bias to be learned also across all entries of the spatial axes. Like axes, this parameter is not limited to the predefined choices.- verbose name: Learn Scale/bias Across Axes
- default value: feature
- port type: ComboPort
- value type: str (can be None)
-
epsilon
Small value to add to the variance to avoid division by zero.- verbose name: Epsilon
- default value: 1e-05
- port type: FloatPort
- value type: float (can be None)
-
learnable_scale
Whether to learn a trainable scale parameter. Normalizations typically include such a parameter in order to drive the subsequent activation function in a regime that is desirable for downstream computations (e.g., saturating or linear). Note the shape (and thus dimensionality) of the learned parameter is governed by the param_axes (learn scale/bias across axes) parameter.- verbose name: Learnable Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
learnable_bias
Whether to learn a trainable bias parameter. See the learnable scale for more details.- verbose name: Learnable Bias
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
scale_initializer
Choice of scale initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Scale Initializer
- default value: ones
- port type: ComboPort
- value type: str (can be None)
-
bias_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
fast_variance
If True, use a faster but less accurate variance calculation.- verbose name: Fast Variance
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
layername
Name of the layer. Used for naming of the trainable parameters.- verbose name: Layer Name
- default value: layernorm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
LinearOneCycleSchedule
A one-cycle linear ramp up/down parameter schedule.
This schedule is a linear ramp up to a peak value, followed by a linear ramp down to the initial value, and finally a linear ramp down to the final value, which is reached at the end of the schedule (transition_steps) and held thereafter. Only the peak value is specified directly, while the initial value (aka base value) is given as a ratio to the peak value, as is the final value. Also, the upslope and downslope durations are given as fractions of the total transition_steps. The duration of the remaining final slope is the remainder of the transition_steps after the upslope and downslope durations have been subtracted. This schedule is inspired by Smith and Topin's 2018 paper, "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates" (see URL). Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
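The shape of the schedule can be sketched as a piecewise-linear interpolation (an illustrative reading of the parameterization described above, not the node's exact code):

    import numpy as np

    def linear_onecycle(step, peak_value=1.0, transition_steps=100,
                        peak_base_ratio=25.0, peak_final_ratio=1e4,
                        upslope_fraction=0.3, downslope_fraction=0.55):
        base, final = peak_value / peak_base_ratio, peak_value / peak_final_ratio
        up_end = transition_steps * upslope_fraction
        down_end = up_end + transition_steps * downslope_fraction
        xp = [0.0, up_end, down_end, float(transition_steps)]
        fp = [base, peak_value, base, final]
        return float(np.interp(step, xp, fp))   # held at `final` beyond transition_steps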
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
peak_value
Parameter value at peak. This is the maximum value that the parameter will attain over the course of the schedule.- verbose name: Peak Value
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
transition_steps
Step count at which to end the transition from the initial value to the final value. The parameter is held at the final value after this step count is reached. This is the duration of the full scaling cycle.- verbose name: Transition Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
peak_base_ratio
Ratio of the peak value to the base (initial) value. The base value is also the value attained at the end of the downslope. Note that a final slope then reduces the parameter to the final value, which is usually considerably lower than the base.- verbose name: Peak Base Ratio
- default value: 25
- port type: FloatPort
- value type: float (can be None)
-
peak_final_ratio
Ratio of the peak to the final value at the end of the cycle. The parameter will be held at the final value after transition_steps have been reached.- verbose name: Peak Final Ratio
- default value: 10000.0
- port type: FloatPort
- value type: float (can be None)
-
upslope_fraction
Fraction of the transition_steps that will be used for the upslope. That is, the peak is reached after transition_steps * upslope_fraction steps.- verbose name: Upslope Step Fraction
- default value: 0.3
- port type: FloatPort
- value type: float (can be None)
-
downslope_fraction
Fraction of the transition_steps that will be used for the downslope back to the initial value. After this, there is a final slope going to the final value that continues over transition_steps * (1 - upslope_fraction - downslope_fraction) steps. Note this is parameterized slightly less confusingly than the underlying optax linear_onecycle_schedule function.- verbose name: Downslope Step Fraction
- default value: 0.55
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
LinearSchedule
A linear parameter schedule.
This schedule holds the parameter at the initial value until the current step count reaches the value set via transition_begin, and then uses linear interpolation to the final value over a number of steps set via transition_steps; after that, the parameter is held at the final value. The interpolation polynomial is evaluated over the range 0.0 (at the beginning) to 1.0 (at the end). This is a special case of the "Polynomial Schedule" node. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
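As a rough illustration (ordinary Python, assumed semantics rather than the node's implementation), the schedule value at a given step under the documented ports can be computed as follows:

```python
# Illustrative sketch of the documented LinearSchedule behavior.
def linear_schedule(step, init_value=1.0, final_value=0.0,
                    transition_begin=0, transition_steps=100):
    frac = (step - transition_begin) / transition_steps
    frac = min(max(frac, 0.0), 1.0)   # hold at init before, and at final after, the transition
    return init_value + (final_value - init_value) * frac
```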
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. The parameter is held at this value until the current step count reaches the value set via transition_begin.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
final_value
Final parameter value. This is the value at the end of the schedule.- verbose name: Final Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
transition_begin
Step count at which to begin the transition from the initial value to the final value. The parameter is held at the initial value until this step count is reached.- verbose name: Transition Begin
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
transition_steps
Step count at which to end the transition from the initial value to the final value. The parameter is held at the final value after this step count is reached.- verbose name: Transition Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
LongShortTermMemoryLayer
A long short-term memory (LSTM) recurrent layer, based on Hochreiter and Schmidhuber (1997).
This is a recurrent layer that conceptually retains internal network activations across successive time steps when the node is used to process time series data (see below for more details). In contrast to earlier recurrent cores, LSTMs have an improved ability to learn long-term dependencies using a gating mechanism that can selectively retain or forget past information. This implementation follows Zaremba et al. 2015, which is a minor modification that reduces forgetting during initial training. Like all recurrent layers, the node does not usually store and carry over these activations itself, but relies either on a special loop node (the RecurrentLoop node, see docs for more details) to step across the time axis of some given data array, or on the user manually passing in and retrieving the carry state via the carry in/out port. Also like all recurrent nodes, this node will move the instance axis of the input first, optionally retain any axes listed in the parallel_axes port, and flatten any other axes into a single feature axis at the end. When managing the carry state manually, the state for this node can be obtained via get_initial() or be constructed as either an array or packet (depending on the data that it is used with), where the first axis is of size 2, the last axis is of size units, and the middle axis is sized to be the product of the instance axes and other parallel axes of the data.
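For the plain-array case, the carry layout described above can be sketched as follows (NumPy, purely illustrative; the batch size and the zero initialization are assumptions of this example, not guaranteed behavior of the node):

```python
import numpy as np

# Hypothetical carry layout for manual state management (array case).
batch, units = 32, 256                  # product of instance/parallel axes, and hidden units
carry = np.zeros((2, batch, units))     # first axis of size 2: (hidden state, cell state)
hidden, cell = carry[0], carry[1]
```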
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process at current time step.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
carry
Carried-over activations from previous time step.- verbose name: Carry
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
units
Number of hidden units.- verbose name: Hidden Units
- default value: 256
- port type: IntPort
- value type: int (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: lstm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
parallel_axes
Optionally an axis or comma-separated list of axes that shall be processed in parallel by the layer, using the same learned weights. This is useful if you have multiple parallel processes that you wish to process separately but assume are all ultimately governed by the same rules; in that case, a single kernel can be learned for all of them. Note that for packet data the instance axis (if present), and for plain-array data the first axis, is always treated as a parallel axis, so you don't need to list it here.- verbose name: Parallel Axes
- default value:
- port type: ComboPort
- value type: str (can be None)
MixupAugmentation
Interpolate between training exemplars, including class labels.
This data augmentation interpolates linearly between pairs of training exemplars, where the interpolation parameter is drawn from a Beta distribution with the given mixup parameter. This is a general-purpose augmentation that can be used on any dataset and domain. Warning: if your data has more than 2 classes, the labels must be one-hot encoded. Like most augmentation nodes, this node does not by itself amplify the amount of data, which therefore has to be done beforehand using, for example, the RepeatAlongAxis node. Note that, unless you shuffle the data immediately following RepeatAlongAxis, the shuffle option in this node must remain enabled, even if your mini-batches prior to RepeatAlongAxis are already shuffled. When shuffling is enabled, you need to wire in a random seed (for example using the DrawRandomSeed node, see docs for more info) to ensure reproducibility.
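The core interpolation can be sketched as follows (NumPy, one-hot labels assumed, data assumed to be already shuffled and duplicated upstream; this only illustrates the mixup idea and is not the node's exact implementation):

```python
import numpy as np

# Hypothetical mixup sketch: blend each exemplar with its neighbor in the batch.
def mixup(x, y_onehot, rng, alpha=0.2):
    lam = rng.beta(alpha, alpha, size=(x.shape[0],))       # one blend factor per instance
    lam_x = lam.reshape((-1,) + (1,) * (x.ndim - 1))
    x2 = np.roll(x, 1, axis=0)                              # consecutive pairs (wrapped around)
    y2 = np.roll(y_onehot, 1, axis=0)
    x_mix = lam_x * x + (1 - lam_x) * x2
    y_mix = lam[:, None] * y_onehot + (1 - lam[:, None]) * y2
    return x_mix, y_mix

# Example usage with toy data:
# rng = np.random.default_rng(0)
# x_mix, y_mix = mixup(np.ones((8, 4)), np.eye(4)[[0, 1, 2, 3, 0, 1, 2, 3]], rng)
```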
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: Packet (can be None)
- data direction: INOUT
-
seed
Random seed for deterministic results.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
alpha
Mixup parameter. This parameter controls how far generated samples may deviate from the original data; the blend value is drawn from a Beta distribution with parameter (alpha, alpha). A value of 0 means no deviation from training exemplars, small positive values between 0.2 and 0.4 have been shown to work well in practice, and large positive values (up to infinity) can work on some datasets but generally tend to lead to underfitting.- verbose name: Mixup Parameter
- default value: 0.2
- port type: FloatPort
- value type: float (can be None)
-
shuffle
Whether to shuffle the data before applying the augmentation. This can be disabled if the data has already been shuffled. Note, however, that the common pattern of duplicating shuffled data and then augmenting via this node does NOT qualify as a proper shuffle since the MixUp operation acts on consecutive pairs of data (wrapped around at the end). If shuffling is enabled, a random seed must be provided.- verbose name: Shuffle
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
bypass
Whether to bypass the augmentation and pass the input data through unchanged.- verbose name: Bypass
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
MomentumStep
Chainable step that adds momentum based on one of several formulations, including classic momentum, Nesterov acceleration, and exponential moving average.
This can improve the convergence behavior of the optimizer, including the convergence rate (esp. when Nesterov acceleration is used) and the ability to escape saddle points.
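The three documented variants can be sketched as the following gradient transformation over a single parameter's gradient (plain Python; the names and details here are assumptions for illustration, not the node's actual code):

```python
# Hypothetical sketch of the momentum variants applied to one gradient array.
def momentum_step(grad, trace, decay=0.9, kind="momentum"):
    if kind == "ema":                          # exponential moving average of gradients
        trace = decay * trace + (1 - decay) * grad
        update = trace
    else:                                      # classic momentum accumulator
        trace = decay * trace + grad
        # Nesterov acceleration "looks ahead" by re-applying the decayed trace.
        update = decay * trace + grad if kind == "nesterov" else trace
    return update, trace                       # the update is applied to the weights elsewhere
```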
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
decay
Exponential decay rate for past values.- verbose name: Decay
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
type
Type of momentum to use. Momentum is classic momentum, Nesterov is Nesterov acceleration, and EMA is exponential moving average.- verbose name: Type
- default value: momentum
- port type: EnumPort
- value type: str (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. Keep resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
MultiHeadAttentionLayer
A multi-head attention layer.
This layer enables information retrieval using an array of query patterns that are matched against a memory of key patterns and their associated value patterns, using a "soft" (0 to 1) matching score. This can be interpreted as a content-based memory lookup, or as a "soft" dictionary lookup where keys and values are patterns, and where each query pattern is matched against all keys to retrieve (a blend of) the respectively matching values. What represents a key, value, or query in the input data is learned by the layer in the form of learnable projections of the input data into a key, value, or query space. A triplet of these projections is called an "attention head", and the layer is usually used with multiple heads (e.g., 8-32). Due to the learnable nature, all parts of the attention process are ultimately neurally driven and thus learned, including the input data to the layer. Attention is the core building block of the Transformer architecture, where it is combined with a few other layers (LayerNorm and Dense) and then repeated, but is not specific to it. The layer can either be used with three input packets (or arrays) wired into query, key, and value, respectively, or with a single packet/array wired into just the query port, which is then used for all three inputs (this is called self-attention). All three inputs are expected to have an axis along which the data is interpreted as a sequence of items across which attention acts (e.g., time for temporal data or space for image data), and another axis along which the data is interpreted as patterns. These axes generally need not have the same length among the three inputs, although key and value need to have the same sequence length. The inputs may have additional axes (e.g., an instance axis holding a batch of data), and the operation proceeds independently in parallel across these axes; however, the extra axes must be the same for all inputs. It is possible to omit either the key or value (in which case the other of the two will be used for both key and value); if key and value are omitted, the query data is used for all three (self-attention). Self-attention can be used as a drop-in replacement for either recurrent or convolutional layers for processing fixed-length or variable-length sequences. For data with detailed time structure, the input data is often augmented with synthetic features representing a "positional" encoding, e.g., sine/cosine waves of different frequencies, to enable the attention to learn to attend to specific positions in the sequence. The basic operation of the layer is, separately for each attention head, as follows: for each entry of the query data (along the sequence axis), the pattern of the k'th query along the pattern axis is passed through a linear layer (a learnable projection corresponding to the current query head) to produce a pattern in the "query space" (of dimensionality key_dim). Likewise, for each entry in the key data along its sequence axis, the pattern of the n'th key (along the respective pattern axis) is passed through another linear projection (also learnable) to produce a pattern in the "key space" (also of dimensionality key_dim). The query and key patterns are then matched against each other using a dot product (i.e., the sum of elementwise products), which yields a sort of matching score. After possibly some normalization, the scores are then passed through a softmax to produce 0-to-1 "weights" for that query pattern relative to each of the key patterns.
The weight is then applied to the corresponding n'th value pattern, and the lookup result for the k'th query is simply the weighted sum of all the value patterns. This is an adaptive selection, which can act across long distances in the data (namely across the entire sequence length of the key/query data). The operation is performed in parallel for all attention heads (using their separate projections) and the retrieved value patterns are stacked. The resulting stacked patterns are then passed through a final linear projection (again learnable) to reduce dimensionality to the desired output dimensionality.
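A single attention head as described above can be sketched as follows (NumPy, no masking or batching; the projection matrices stand in for the layer's learnable parameters and are assumptions of this example). In the multi-head case, the per-head outputs are stacked and passed through one more learnable projection to output_dim.

```python
import numpy as np

# Hypothetical single-head scaled dot-product attention sketch.
def attention_head(queries, keys, values, Wq, Wk, Wv):
    Q = queries @ Wq                                  # [q_len, key_dim] query-space patterns
    K = keys @ Wk                                     # [kv_len, key_dim] key-space patterns
    V = values @ Wv                                   # [kv_len, value_dim] value-space patterns
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # dot-product matching scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over key positions
    return weights @ V                                # weighted sum of value patterns
```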
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
queries
Query data.- verbose name: Queries
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: IN
-
keys
Key data.- verbose name: Keys
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: IN
-
values
Value data.- verbose name: Values
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: IN
-
mask
Optional mask data. If specified, this must amount to a boolean array with a shape that's compatible with the attention weights.- verbose name: Mask
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: IN
-
data
Retrieved data.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: OUT
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
num_heads
Number of parallel heads. Each head represents a different learnable projection of the input data into a projected query, key, and value space, and the operation is usually used with multiple parallel heads whose outputs are stacked before being reduced to the final output dimensionality by another learnable projection.- verbose name: Number Of Attention Heads
- default value: 16
- port type: IntPort
- value type: int (can be None)
-
sequence_axis
The axis across which to attend. The lookup acts along this axis in the key and value data and treats each element along it as a different item that can be retrieved ("attended to"). The query data is also interpreted as a sequence of multiple queries along this axis, and therefore the output will inherit this axis from the query. Like pattern_axis, this is not limited to the predefined choices (see pattern_axis for an example).- verbose name: Sequence Axis
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
pattern_axis
Axis along which data is interpreted as query, key, or value patterns. This can be a single axis or comma-separated list of axes in the query, key, and value data along which data will be treated as patterns that are mapped into the key/query or value space. This can also be the index of an axis. This parameter is not limited to the predefined choices; for example, you can refer to an axis by its custom label, e.g., feature.mylabel.- verbose name: Pattern Axis
- default value: feature
- port type: ComboPort
- value type: str (can be None)
-
mask_axes
The trailing axes of the mask, if a mask is given (otherwise ignored). This should correspond to (optionally) an axis indexing the heads (whose length is either 1, or the number of heads), followed by an axis of the same length as the query's sequence axis, and lastly an axis of the same length as the key's sequence axis. The special value singletonaxis means that a dummy axis of length 1 will be inserted at the respective position (relative to the end of the mask array).- verbose name: Trailing Mask Axes
- default value: singletonaxis, time, time
- port type: ComboPort
- value type: str (can be None)
-
key_dim
Size of the key and query vectors. The key and query patterns are each projected into a space of this dimensionality before the matching is performed between them.- verbose name: Key/query Dimensionality
- default value: 32
- port type: IntPort
- value type: int (can be None)
-
value_dim
Size of the value vectors. The value pattern retrieved by each attention head is projected into a space of this dimensionality.- verbose name: Value Dimensionality
- default value: 32
- port type: IntPort
- value type: int (can be None)
-
output_dim
Size of the output vectors. The stacked outputs produced by the attention heads are projected into a space of this dimensionality and represent the final output of the node. Note that the output axis will always be of type feature.- verbose name: Output Dimensionality
- default value: None
- port type: IntPort
- value type: int (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky; otherwise, be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Biases are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note the reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: attention
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
MultiplicativeNoiseAugmentation
Scale the given data by noise drawn from some provided distribution.
The node can optionally use the same scale across all elements along an axis, or draw an independent scale. It is recommended that the distribution be centered around 1, although this is not enforced. A good starting point is a normal or truncated normal distribution with mean of 1 and standard deviation of 0.1, yielding scales mostly in the 0.9 to 1.1 range, but the actual range should be experimented with to find a good regime that reflects the variability due to the sensors used. Like most augmentation nodes, this node does not by itself amplify the amount of data, which therefore has to be done beforehand using, for example, the RepeatAlongAxis node. As with most augmentation nodes, you need to wire in a random seed (for example using the DrawRandomSeed node, see docs for more info) to ensure reproducibility. You also need to wire a distribution to the dist input to specify the distribution of interest (e.g., NormalDistribution).
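A rough sketch of the operation (NumPy, normal noise with mean 1, one scale per instance and channel as in the default separate_across setting; illustrative only, not the node's implementation):

```python
import numpy as np

# Hypothetical multiplicative-noise sketch for data shaped [instances, channels, ...].
def multiplicative_noise(x, rng, mean=1.0, std=0.1):
    scales = rng.normal(mean, std, size=x.shape[:2] + (1,) * (x.ndim - 2))
    return x * scales                          # broadcast one scale per (instance, channel)
```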
Version 1.0.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: Packet (can be None)
- data direction: INOUT
-
seed
Random seed for deterministic results.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
dist
Distribution to use.- verbose name: Dist
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
separate_across
A comma-separated list of axis name(s) over which to create individual scale values. For example, 'space' will create a random scale value drawn from the input distribution for each channel. If given 'space, feature', a random scale value will be created for each space-by-feature element (i.e., total values = space elements * feature elements). It is recommended to always include the instance axis, since each instance in a mini-batch should have a different random draw of scales.- verbose name: Separate Scales Across Axes
- default value: space,instance
- port type: ComboPort
- value type: str (can be None)
-
bypass
Whether to bypass the augmentation and pass the input data through unchanged.- verbose name: Bypass
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetDefine
Define a neural network module (subnet).
A subnet is a portion of a neural network that contains implicit parameters that are optimized during training. A subnet may also contain instances of other subnets. The easiest way to use a subnet is to wire its "this" output into a DeepModel node, which can be used like any other machine-learning node as part of a data pipeline (e.g., for classification or regression). The normally implicit parameters of a subnet can also be managed explicitly and by hand by invoking the subnet's initialize function (by wiring it into the Call node) to obtain an initial set of parameters, and then passing the parameters into the subnet's forward function (again by wiring it into the Call node) to compute the outputs of the subnet. Typically one would then apply the Calculate Gradient node to the forward function to obtain a function that yields gradients, invoke the resulting gradient function given some data and parameters, and then use the resulting gradients to update the parameters (i.e., performing gradient descent). This node is used by wiring a subgraph into its graph port. The subgraph must contain one or more Placeholder nodes for the network inputs, and the network signature must name all the placeholders in some order, which is given in the "Network [signature]" (or network__signature in code) port.
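For readers more familiar with code, the define / initialize / forward / gradient-descent workflow described above roughly corresponds to the following functional pattern, shown here with Haiku and JAX purely for illustration (this is not NeuroPype code; the layer sizes, loss, and learning rate are assumptions of this sketch):

```python
import jax
import jax.numpy as jnp
import haiku as hk

def net(x):                                    # the subnet definition ("NetDefine")
    return hk.Linear(1)(jax.nn.relu(hk.Linear(32)(x)))

transformed = hk.transform(net)                # functional form with weights threaded out
x, y = jnp.ones((8, 4)), jnp.zeros((8, 1))
params = transformed.init(jax.random.PRNGKey(0), x)   # initial weights ("initialize")

def loss(params, x, y):                        # forward pass ("forward") plus a loss
    pred = transformed.apply(params, None, x)
    return jnp.mean((pred - y) ** 2)

grads = jax.grad(loss)(params, x, y)           # analogous to the Calculate Gradient node
params = jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)  # one descent step
```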
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
network
Graph that defines the network.- verbose name: Network
- default value: None
- port type: GraphPort
- value type: Graph
-
network__signature
Optional argument names of the network being defined. Your network is specified as a graph with one placeholder node for each argument name specified here (whose slotname must match that argument name). Those placeholders then feed into any number of neural network or other mathematical operations, and the final output of your network is wired into the "network" input of the NetDefine node. This is analogous to the Function Declaration node that defines a function of some arguments in a similar manner. Note that in graphical UIs, the edge that goes into the "network" input will be drawn in dotted style to indicate that this is not normal forward data flow, but that a graph (here your network definition) is being passed verbatim to the NetDefine node.- verbose name: Network [Inputs]
- default value: (input)
- port type: Port
- value type: object (can be None)
-
netname
Name of the network module. This name is inherited by the parameters of the network and can be used to group parameters logically. When the network is materialized (instantiated) multiple times, the name will be suffixed with a number.- verbose name: Name
- default value: mynetwork
- port type: StringPort
- value type: str (can be None)
-
desc
Description of the function. The first sentence is taken as the executive summary and should not exceed 60 characters. The next paragraph is the essential description, and any following paragraphs are considered additional description text. This should not list the arguments, but can give a high-level overview of what the network can accept and what it does. It is possible to use limited amounts of HTML formatting, for instance for emphasis.- verbose name: Description
- default value: None
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetForward
Apply a network that has been transformed to functional form to some inputs, producing the network's outputs (forward pass).
This node, along with "Net Initialize" (NetInitialize), offers full control over the gradient descent process in a given neural net. Tip: for simple use cases you will not need to use this node directly but may instead use a higher-level node such as "Deep Model" (DeepModel), which behaves like a regular self-contained machine-learning node in NeuroPype. This node needs a NetTransform node wired into the "transformed" input (see documentation), the network weights (as obtained from NetInitialize or from a subsequent update step), optionally the network state and a random seed, and one or more positional and/or named inputs for the computational graph that was wired into the NetTransform node (the positional inputs are those that are listed in the signature that goes with the NetTransform's graph port). The output port then yields the result of the network's forward pass. To perform gradient descent manually, one typically applies the Calculate Gradient node to this forward function to obtain a function that yields gradients, invokes the resulting gradient function given some data and weights, and then uses the resulting gradients to update the weights.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
transformed
Network definition to use.- verbose name: Transformed
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
seed
Optional random number seed.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
weights
Network parameters.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Optional network state (in/out).- verbose name: State
- default value: None
- port type: DataPort
- value type: dict (can be None)
- data direction: INOUT
-
output
Output of the network.- verbose name: Output
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
arg1
Argument 1.- verbose name: Arg1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg2
Argument 2.- verbose name: Arg2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg3
Argument 3.- verbose name: Arg3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg4
Argument 4.- verbose name: Arg4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg5
Argument 5.- verbose name: Arg5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg6
Argument 6.- verbose name: Arg6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg7
Argument 7.- verbose name: Arg7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg8
Argument 8.- verbose name: Arg8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg9
Argument 9.- verbose name: Arg9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
argN
Additional arguments.- verbose name: Argn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
name1
Name 1.- verbose name: Name1
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val1
Value 1.- verbose name: Val1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name2
Name 2.- verbose name: Name2
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val2
Value 2.- verbose name: Val2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name3
Name 3.- verbose name: Name3
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val3
Value 3.- verbose name: Val3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name4
Name 4.- verbose name: Name4
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val4
Value 4.- verbose name: Val4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name5
Name 5.- verbose name: Name5
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val5
Value 5.- verbose name: Val5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name6
Name 6.- verbose name: Name6
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val6
Value 6.- verbose name: Val6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name7
Name 7.- verbose name: Name7
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val7
Value 7.- verbose name: Val7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name8
Name 8.- verbose name: Name8
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val8
Value 8.- verbose name: Val8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name9
Name 9.- verbose name: Name9
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val9
Value 9.- verbose name: Val9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
nameN
Additional argument names.- verbose name: Namen
- default value: None
- port type: ListPort
- value type: list (can be None)
-
valN
Additional named argument values.- verbose name: Valn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
arg0
Argument 0.- verbose name: Arg0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name0
Name 0.- verbose name: Name0
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val0
Value 0.- verbose name: Val0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetInitialize
Get initial weights and optionally state for a network that has been transformed to functional form.
This node, along with "Net Forward Pass" (NetForward), offers full control over the gradient descent process in a given neural net. Tip: for simple use cases you will not need to use this node directly but may instead use a higher-level node such as "Deep Model" (DeepModel), which behaves like a regular self-contained machine-learning node in NeuroPype. This node needs a NetTransform node wired into the "transformed" input (see documentation), optionally a random key, and one or more positional and/or named inputs for the computational graph that was wired into the NetTransform node (the positional inputs are those that are listed in the signature that goes with the NetTransform's graph port). The initial weights are typically a dictionary of named weights (naming depends on the hierarchy of subnets within which the weights are contained).
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
transformed
Transformed network to use.- verbose name: Transformed
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
seed
Random seed to use.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
weights
Initialized parameters.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
state
Initialized state.- verbose name: State
- default value: None
- port type: DataPort
- value type: dict (can be None)
- data direction: OUT
-
arg1
Argument 1.- verbose name: Arg1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg2
Argument 2.- verbose name: Arg2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg3
Argument 3.- verbose name: Arg3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg4
Argument 4.- verbose name: Arg4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg5
Argument 5.- verbose name: Arg5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg6
Argument 6.- verbose name: Arg6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg7
Argument 7.- verbose name: Arg7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg8
Argument 8.- verbose name: Arg8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg9
Argument 9.- verbose name: Arg9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
argN
Additional arguments.- verbose name: Argn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
name1
Name 1.- verbose name: Name1
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val1
Value 1.- verbose name: Val1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name2
Name 2.- verbose name: Name2
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val2
Value 2.- verbose name: Val2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name3
Name 3.- verbose name: Name3
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val3
Value 3.- verbose name: Val3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name4
Name 4.- verbose name: Name4
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val4
Value 4.- verbose name: Val4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name5
Name 5.- verbose name: Name5
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val5
Value 5.- verbose name: Val5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name6
Name 6.- verbose name: Name6
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val6
Value 6.- verbose name: Val6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name7
Name 7.- verbose name: Name7
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val7
Value 7.- verbose name: Val7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name8
Name 8.- verbose name: Name8
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val8
Value 8.- verbose name: Val8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name9
Name 9.- verbose name: Name9
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val9
Value 9.- verbose name: Val9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
nameN
Additional argument names.- verbose name: Namen
- default value: None
- port type: ListPort
- value type: list (can be None)
-
valN
Additional named argument values.- verbose name: Valn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
arg0
Argument 0.- verbose name: Arg0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name0
Name 0.- verbose name: Name0
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val0
Value 0.- verbose name: Val0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetMaterialize
Materialize a network module (subnet), which allows it to be wired into a larger computational graph along with input placeholders and other nodes (incl.
stateless math ops and other network nodes). Materialized subnets have implicit parameters (weights) that first need to be optimized before the network can be used. For this reason, a materialized subnet, along with its containing computational graph, is not directly callable with data; instead, the computational graph needs to be wired into either a DeepModel or other high-level NN training node, OR it needs to be transformed into a functional form, which can be done by wiring the graph into a NetTransform node. Note that NetMaterialize also has a companion node NetShare, which allows you to use a materialized network in multiple places in a graph, and all of those places will share the same parameters.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
definition
Network definition to use.- verbose name: Definition
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
output
Output of the network.- verbose name: Output
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
arg1
Argument 1.- verbose name: Arg1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg2
Argument 2.- verbose name: Arg2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg3
Argument 3.- verbose name: Arg3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg4
Argument 4.- verbose name: Arg4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg5
Argument 5.- verbose name: Arg5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg6
Argument 6.- verbose name: Arg6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg7
Argument 7.- verbose name: Arg7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg8
Argument 8.- verbose name: Arg8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg9
Argument 9.- verbose name: Arg9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
argN
Additional arguments.- verbose name: Argn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
name1
Name 1.- verbose name: Name1
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val1
Value 1.- verbose name: Val1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name2
Name 2.- verbose name: Name2
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val2
Value 2.- verbose name: Val2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name3
Name 3.- verbose name: Name3
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val3
Value 3.- verbose name: Val3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name4
Name 4.- verbose name: Name4
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val4
Value 4.- verbose name: Val4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name5
Name 5.- verbose name: Name5
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val5
Value 5.- verbose name: Val5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name6
Name 6.- verbose name: Name6
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val6
Value 6.- verbose name: Val6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name7
Name 7.- verbose name: Name7
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val7
Value 7.- verbose name: Val7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name8
Name 8.- verbose name: Name8
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val8
Value 8.- verbose name: Val8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name9
Name 9.- verbose name: Name9
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val9
Value 9.- verbose name: Val9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
nameN
Additional argument names.- verbose name: Namen
- default value: None
- port type: ListPort
- value type: list (can be None)
-
valN
Additional named argument values.- verbose name: Valn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
arg0
Argument 0.- verbose name: Arg0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name0
Name 0.- verbose name: Name0
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val0
Value 0.- verbose name: Val0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetShare
Reuse a materialized network in a computational graph.
This node allows you to use the same network in multiple places in a computational graph, sharing the same weights. To use this, you need to first materialize a network definition using the "Materialize Net" (NetMaterialize) node. You can then wire the "this" output of that node into the "materialized" input of this node. The node can be used in two equivalent styles: 1) have a main instantiation of the network that is used somewhere in your graph (in the form of a NetMaterialize node) and then have one or more secondary uses in the graph that use NetShare nodes referencing the main copy, or 2) have a materialize node that is not directly embedded in a computation, but which merely serves as a node to which several NetShare nodes refer, which are then embedded in some computation.
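In functional terms, weight sharing as described here corresponds to using one module instance in several places, as in this Haiku sketch (purely illustrative, not NeuroPype code; the layer size and input shapes are assumptions of this example):

```python
import jax
import jax.numpy as jnp
import haiku as hk

# Hypothetical sketch: the same "materialized" module used twice shares one set of weights.
def net(a, b):
    shared = hk.Linear(16, name="shared")   # analogous to a materialized subnet
    return shared(a) + shared(b)            # the second use is analogous to NetShare

fwd = hk.transform(net)
params = fwd.init(jax.random.PRNGKey(0), jnp.ones((1, 4)), jnp.ones((1, 4)))
# params contains a single "shared" parameter group used by both applications.
```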
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
materialized
Materialized network to use.- verbose name: Materialized
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
output
Output of the network.- verbose name: Output
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
arg1
Argument 1.- verbose name: Arg1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg2
Argument 2.- verbose name: Arg2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg3
Argument 3.- verbose name: Arg3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg4
Argument 4.- verbose name: Arg4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg5
Argument 5.- verbose name: Arg5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg6
Argument 6.- verbose name: Arg6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg7
Argument 7.- verbose name: Arg7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg8
Argument 8.- verbose name: Arg8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
arg9
Argument 9.- verbose name: Arg9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
argN
Additional arguments.- verbose name: Argn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
name1
Name 1.- verbose name: Name1
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val1
Value 1.- verbose name: Val1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name2
Name 2.- verbose name: Name2
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val2
Value 2.- verbose name: Val2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name3
Name 3.- verbose name: Name3
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val3
Value 3.- verbose name: Val3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name4
Name 4.- verbose name: Name4
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val4
Value 4.- verbose name: Val4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name5
Name 5.- verbose name: Name5
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val5
Value 5.- verbose name: Val5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name6
Name 6.- verbose name: Name6
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val6
Value 6.- verbose name: Val6
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name7
Name 7.- verbose name: Name7
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val7
Value 7.- verbose name: Val7
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name8
Name 8.- verbose name: Name8
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val8
Value 8.- verbose name: Val8
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name9
Name 9.- verbose name: Name9
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val9
Value 9.- verbose name: Val9
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
nameN
Additional argument names.- verbose name: Namen
- default value: None
- port type: ListPort
- value type: list (can be None)
-
valN
Additional named argument values.- verbose name: Valn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
arg0
Argument 0.- verbose name: Arg0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
name0
Name 0.- verbose name: Name0
- default value: None
- port type: StringPort
- value type: str (can be None)
-
val0
Value 0.- verbose name: Val0
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetTransform
Transform a computational graph that involves neural net nodes into a functional form, with weights threaded out.
This node is for full manual control over NN optimization. The computational graph is your overall neural network forward pass, and starts with one or more placeholder nodes for the inputs, followed by some combination of neural net nodes and plain math ops as needed, and ends ultimately with the NetTransform node. Your graph can have positional inputs, which are Placeholder nodes whose names are listed in the desired order in the signature that goes with this node's graph port ("Graph [Signature]" in the UI or graph__signature in scripts). Additionally, you may have named (keyword-only) inputs, which are placeholders whose names are not listed in the signature; in this case your signature must look like "()" or "(myarg1, myarg2, )" for two positional arguments here named myarg1 and myarg2 (note in a script you can also pass this as a tuple as documented in GraphPort). The "this" output of this node can be wired into the NetInitialize and NetForward nodes to obtain the initial weights, and to apply those weights to some data. These two functions give full control over the gradient descent process, simply by applying the Calculate Gradient node to the NetForward function to obtain a function that yields gradients, invoking the resulting gradient function given some data and parameters, and then using the resulting gradients to update the parameters or parts thereof.
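For orientation, the following is a minimal, self-contained sketch (plain Python/NumPy, not the actual NeuroPype API) of the functional pattern that this workflow produces: an init function that returns the threaded-out parameters, a pure forward function of (parameters, data), a gradient function standing in for the Calculate Gradient node, and a manual update loop. All names (init_fn, forward_fn, grad_fn) are illustrative only.
    import numpy as np

    def init_fn(rng, n_features):
        # parameter dictionary with the weights "threaded out" of the graph
        return {"w": rng.standard_normal(n_features) * 0.01, "b": 0.0}

    def forward_fn(params, inputs):
        # pure function of (parameters, data) -> predictions
        return inputs @ params["w"] + params["b"]

    def grad_fn(params, inputs, targets):
        # analytic gradient of a mean-squared-error loss (stand-in for Calculate Gradient)
        err = forward_fn(params, inputs) - targets
        return {"w": 2.0 * inputs.T @ err / len(err), "b": 2.0 * err.mean()}

    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((32, 4)), rng.standard_normal(32)
    params = init_fn(rng, 4)
    for _ in range(200):  # manual gradient-descent loop over the threaded-out parameters
        g = grad_fn(params, X, y)
        params = {k: params[k] - 0.1 * g[k] for k in params}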
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
graph
Computational graph to transform.- verbose name: Graph
- default value: None
- port type: GraphPort
- value type: Graph
-
graph__signature
Argument names of function to transform. The NetTransform node is used for manually performing gradient descent on a graph (representing an application of a neural network to some data and typically containing NN Layer nodes). The graph represents a function of some argument and therefore must contain at least one Placeholder node whose slotname must match the listed argument name. The remainder of the graph is an application of a network, either implemented in-place by chaining a series of NN layers after the placeholder, or by invoking a previously defined network using NetMaterialize and/or NetShare. The final output of your operation is taken to be the predictions of the network given the inputs. Note that in graphical UIs, the edge that goes into the "graph" input will be drawn in dotted style to indicate that this is not normal forward data flow, but that a graph (here your network application) is being passed verbatim to the NetTransform node.- verbose name: Graph [Signature]
- default value: (inputs)
- port type: Port
- value type: object (can be None)
-
prefer_packets
Prefer to use packets to represent parameter sets instead of dicts.- verbose name: Prefer Packets
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NetWeightArray
Define network weight array.
This node can only occur inside the network graph wired into a Define Net (NetDefine) node. The node is used to request a weight array whose value can be used as part of a neural network computation. The weight array is implicit in the containing subnet, and is optimized during training, as in the built-in NN nodes (e.g., convolution, etc). As with all implicit parameters, if you want to optimize them manually (i.e., using the Calculate Gradient node) rather than using one of the higher-level nodes such as "Deep Model" (DeepModel), you first need to transform your subnet (or rather a computational graph in which it is materialized) into functional style, which is done using the NetTransform node.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
value
Value of the weights.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
name
Name of the weight array. Must be unique within the subnet where it is used.- verbose name: Name
- default value: weights
- port type: StringPort
- value type: str (can be None)
-
shape
Shape (dimensions) of the weight array.- verbose name: Shape
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
dtype
Data type of the weight array. Float32 is the easiest to use, but the 16-bit data types can be more efficient -- however, float16 has limited range, and can cause numeric problems, which typically manifest as the network failing to train. Some hardware supports bfloat16, which is as efficient as float16, but has a wider range and is therefore an easier drop-in replacement for float32 when it is supported.- verbose name: Data Type
- default value: float32
- port type: EnumPort
- value type: str (can be None)
-
initializer
Choice of initializer. This can either be one of the named initializers, or the value "custom", in which case the initializer must be wired into the initializer port. For some initializers that take arguments, you can also specify these positionally as in "truncated_normal(1.0,0.0)" (note order of stddev, mean).- verbose name: Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NoisySGDStep
The NoisySGD optimizer step.
Based on Neelakantan et al, 2014, noisy SGD is a variant of stochastic gradient descent (SGD) that adds Gaussian-distributed noise to the gradients. This can help generalization in particularly deep networks. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
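As a rough sketch of the gradient transformation (an assumed formulation based on the eta and gamma descriptions below, not the node's actual implementation):
    import numpy as np

    def noisy_sgd_step(grad, step, rng, learning_rate=0.01, eta=0.01, gamma=0.55):
        variance = eta / (1.0 + step) ** gamma     # annealed noise variance (see eta/gamma below)
        noise = rng.normal(0.0, np.sqrt(variance), size=grad.shape)
        return -learning_rate * (grad + noise)     # update to be added to the weights

    rng = np.random.default_rng(12345)
    w = np.ones(3)
    g = 2.0 * w                                    # e.g., gradient of sum(w**2)
    w = w + noisy_sgd_step(g, step=0, rng=rng)     # apply manually, cf. the Add node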
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
eta
Initial variance for the Gaussian noise added to gradients.- verbose name: Eta
- default value: 0.01
- port type: FloatPort
- value type: float (can be None)
-
gamma
A parameter controlling the annealing of noise over time; the variance decays according to (1+t)^-gamma.- verbose name: Gamma
- default value: 0.55
- port type: FloatPort
- value type: float (can be None)
-
seed
A seed for the pseudo-random number generation.- verbose name: Seed
- default value: 12345
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NormalInitializer
An initializer that draws initial weights from a Gaussian normal distribution with a given mean and standard deviation.
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
mean
Distribution mean.- verbose name: Mean
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
stddev
Distribution standard deviation.- verbose name: Stddev
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NovoGradStep
The Novograd optimizer step.
Based on Ginsburg et al, 2019, Novograd is more robust to initial learning rate and weight initialization than other optimizers, and can for instance be used without learning-rate warm-up. The optimizer also works very well for large batch sizes, and has been shown to be effective for batches of up to 32k exemplars. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.25
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-06
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, eg for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
weight_decay
Strength of the weight decay. This is multiplied by the learning rate as in e.g., PyTorch and Optax, but differs from the paper, where it is only multiplied by the schedule multiplier but not the base learning rate.- verbose name: Weight Decay
- default value: 0.0001
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
NullStep
A no-op gradient update step that can be used to explicitly freeze (a subset of) weights, e.g., in combination with the PartitionedStep node.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
OptimisticGDStep
The Optimistic gradient descent optimizer step.
Based on Mokhtari et al, 2019, this is an advanced optimizer that was originally proposed in the context of saddle-point problems, and has strong convergence for min-max games, where standard gradient descent can oscillate or diverge. Note that this optimizer can be used with schedulers for not only the learning rate but also the alpha and beta parameters, by wiring the appropriate schedule nodes into the respective ports, without having to use a CustomStep node. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
alpha_schedule
Optional alpha schedule.- verbose name: Alpha Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
beta_schedule
Optional beta schedule.- verbose name: Beta Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
alpha
Alpha coefficient for generalized OGD.- verbose name: Alpha
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
beta
Beta coefficient for generalized OGD negative momentum.- verbose name: Beta
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
OrthogonalInitializer
An initializer that generates a random matrix of orthogonal vectors.
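A common recipe for such an initializer is QR decomposition of a random Gaussian matrix; the NumPy sketch below illustrates the idea (how the actual node handles non-square or higher-dimensional shapes and the axis parameter may differ):
    import numpy as np

    def orthogonal_init(shape, scale=1.0, rng=None):
        rng = rng or np.random.default_rng()
        rows, cols = shape
        a = rng.standard_normal((max(rows, cols), min(rows, cols)))
        q, r = np.linalg.qr(a)
        q = q * np.sign(np.diag(r))     # fix the sign ambiguity of the decomposition
        if rows < cols:
            q = q.T
        return scale * q[:rows, :cols]

    w = orthogonal_init((4, 4))
    print(np.allclose(w.T @ w, np.eye(4)))   # columns are orthonormal -> True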
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
scale
Scale factor.- verbose name: Scale
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
axis
Axis that corresponds to the output dimension of the tensor. The array will be row orthonormal along this axis (where -1 refers to the last dimension), unless the product of the remaining dimensions is larger along the axis, in which case it will be made column orthonormal along the axis.- verbose name: Axis
- default value: -1
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
PartitionedStep
Apply a set of steps to different labeled subsets of the parameters, using a separately provided labeling dictionary, yielding a composite update step.
Using this node involves grouping your parameters into different groups, each identified by a specific group label. First, you list your group labels in the label0 to label9 entries. Then, you wire different steps (e.g., Adam, SGD, nothing, etc) into the corresponding step0 to step9 ports to configure which step shall run on which group of parameters. And finally you need to describe what parameters of your model, which are assumed to live in a parameters dictionary, belong to which group. This is done by specifying a labeling dictionary that has the same form as your model parameters dictionary, but instead of the actual tensors it contains the group labels (see help for that setting for more details). The update step will then run the specific labeled steps on parameters with the corresponding labels. Note that, if you want to run the same step on a heterogeneous data structure (e.g., nested dictionary) of parameters, you do not need this node since all step nodes already work on nested data structures.
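The following illustrative snippet shows what such a labeling scheme might look like for a hypothetical two-group model (keys and labels are made up for illustration):
    # Hypothetical parameter tree (values would be weight tensors in practice):
    params = {
        "encoder": {"w": "...", "b": "..."},
        "head":    {"w": "...", "b": "..."},
    }
    # Labeling dict mirroring that tree, with group labels instead of tensors; the
    # "encoder" entry is a prefix, labeling everything underneath it at once:
    labeling = {
        "encoder": "frozen",
        "head":    {"w": "train", "b": "train"},
    }
    # label0 = "train"  -> wire e.g. an AdamStep into step0
    # label1 = "frozen" -> wire a NullStep into step1 to freeze the encoder weights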
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
labeling
A nested data structure (dictionary) of labels for individual parameters. This parameter can be used to assign different groups of parameters in your model different labels, by mirroring the data structure that holds your model parameters, but containing strings rather than the actual weight tensors. These labels are then used to select which of the wired-in steps to apply to which parameters. For example, if your model weights are stored in a dictionary with keys {'layer0', 'layer1', 'layer2', 'layer3'}, and you want to apply step0 to only the weights under 'layer1' and 'layer3', and step1 to the weights under 'layer0' and 'layer2', you might set the labeling port to {'layer0': 'a', 'layer1': 'b', 'layer2': 'a', 'layer3': 'b'}, and then set the label0 and label1 ports to 'b' and 'a' respectively. If your parameters dictionary contains a subtree and you want to assign the same label to all parameters in it, you can simply omit the subtree and include a string label in its place in your labeling dict (i.e., the labeling can be a prefix tree, not necessarily a full tree). You can also wire the output of CreateDict into this port to generate the dictionary programmatically.- verbose name: Parameter Labeling Scheme
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
label1
Apply step1 to parameters with this label.- verbose name: Apply Step1 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step1
Step 1.- verbose name: Step1
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label2
Apply step2 to parameters with this label.- verbose name: Apply Step2 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step2
Step 2.- verbose name: Step2
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label3
Apply step3 to parameters with this label.- verbose name: Apply Step3 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step3
Step 3.- verbose name: Step3
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label4
Apply step4 to parameters with this label.- verbose name: Apply Step4 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step4
Step 4.- verbose name: Step4
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label5
Apply step5 to parameters with this label.- verbose name: Apply Step5 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step5
Step 5.- verbose name: Step5
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label6
Apply step6 to parameters with this label.- verbose name: Apply Step6 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step6
Step 6.- verbose name: Step6
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label7
Apply step7 to parameters with this label.- verbose name: Apply Step7 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step7
Step 7.- verbose name: Step7
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label8
Apply step8 to parameters with this label.- verbose name: Apply Step8 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
step8
Step 8.- verbose name: Step8
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
label9
Apply step9 to parameters with this label.- verbose name: Apply Step9 To Parameters With This Label
- default value:
- port type: StringPort
- value type: str (can be None)
-
labelN
Additional labels corresponding to additional steps.- verbose name: Additional Steps
- default value: []
- port type: ListPort
- value type: list (can be None)
-
step9
Step 9.- verbose name: Step9
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
stepN
Additional Steps.- verbose name: Stepn
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
label0
Apply step0 to parameters with this label.- verbose name: Apply Step0 To Parameters With This Label
- default value: first_label
- port type: StringPort
- value type: str (can be None)
-
step0
Step 0.- verbose name: Step0
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
PiecewiseConstantSchedule
A piecewise-constant parameter schedule.
This schedule starts with the initial value, and whenever one of the step boundaries is crossed, the parameter is multiplied by the respective scale factor, and any scale factors that came before it. As a result, at step k, the value is the initial value times the product of all scale factors whose associated step boundaries preceded step k. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
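A minimal sketch of this rule (whether the boundary step itself already counts as crossed is an assumption here):
    def piecewise_constant(step, init_value=1.0, boundaries=(100, 200), scales=(0.9, 0.9)):
        value = init_value
        for b, s in zip(boundaries, scales):
            if step >= b:          # assumed: reaching the boundary counts as crossing it
                value *= s
        return value

    print([piecewise_constant(s) for s in (0, 99, 100, 250)])   # 1.0, 1.0, 0.9, ~0.81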
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. The parameter is successively multiplied by scale factors when the respective step boundaries are crossed.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
step_boundaries
Step boundaries at which to multiply the parameter by the respective scale factor.- verbose name: Step Boundaries
- default value: [100, 200]
- port type: ListPort
- value type: list (can be None)
-
scale_factors
Scale factors to multiply the parameter by when the respective step boundaries are crossed.- verbose name: Scale Factors
- default value: [0.9, 0.9]
- port type: ListPort
- value type: list (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process, but note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
PiecewiseInterpolatedSchedule
A piecewise interpolated parameter schedule.
This schedule has a series of boundary steps and associated scale factors. The parameter starts with the initial value and whenever the step count reaches a boundary step, the parameter becomes multiplied by the respective scale factor times all prior scale factors. In between boundaries, the parameter is interpolated according to the given interpolation type. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
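For the linear interpolation case, the behavior can be sketched as interpolation between knot values formed by the cumulative product of the scale factors (a simplified illustration, not the node's implementation):
    import numpy as np

    def piecewise_interpolated(step, init_value=1.0, boundaries=(100, 200), scales=(0.9, 0.9)):
        knots_x = np.concatenate([[0], boundaries])
        knots_y = init_value * np.concatenate([[1.0], np.cumprod(scales)])
        return float(np.interp(step, knots_x, knots_y))

    print(piecewise_interpolated(50))    # halfway between 1.0 and 0.9 -> 0.95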
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. The parameter interpolates from the prior value to the prior value times the first boundary scale factor by the time the first boundary step is reached, and so forth, where successively reached scale factors multiply on top of each other.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
step_boundaries
Step boundaries associated with scale factors. When the step reaches the boundary, the parameter becomes the initial value times all scale factors whose boundaries have so far been reached.- verbose name: Step Boundaries
- default value: [100, 200]
- port type: ListPort
- value type: list (can be None)
-
scale_factors
Scale factors to multiply the parameter by when the respective step boundaries are reached.- verbose name: Scale Factors
- default value: [0.9, 0.9]
- port type: ListPort
- value type: list (can be None)
-
interpolation
Interpolation type. Cosine follows the shape of the cosine function from its peak to its successive trough.- verbose name: Interpolation
- default value: linear
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process, but note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
PolynomialSchedule
A polynomial parameter schedule.
This schedule holds the parameter at the initial value until the current step count reaches the value set via transition_begin, and then uses polynomial interpolation to the final value over a number of steps set via transition_steps; after that, the parameter is held at the final value. The interpolation polynomial is evaluated over the range 0.0 (at the beginning) to 1.0 (at the end). Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
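A sketch of the resulting curve, assuming the common (1 - fraction)**power parameterization of the interpolation (the exact polynomial form used by the node may differ):
    import numpy as np

    def polynomial_schedule(step, init_value=1.0, final_value=0.0, power=1.0,
                            transition_begin=0, transition_steps=100):
        frac = np.clip((step - transition_begin) / transition_steps, 0.0, 1.0)
        return float(final_value + (init_value - final_value) * (1.0 - frac) ** power)

    print([polynomial_schedule(s) for s in (0, 50, 100, 200)])   # 1.0, 0.5, 0.0, 0.0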
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. The parameter is held at this value until the current step count reaches the value set via transition_begin.- verbose name: Initial Value
- default value: 1.0
- port type: FloatPort
- value type: float
-
final_value
Final parameter value. This is the value at the end of the schedule.- verbose name: Final Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
power
Power of the polynomial. A value of 1.0 results in a linear schedule, while a value of 2.0 results in a quadratic schedule. Fractional values are allowed, as well.- verbose name: Power
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
transition_begin
Step count at which to begin the transition from the initial value to the final value. The parameter is held at the initial value until this step count is reached.- verbose name: Transition Begin
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
transition_steps
Step count at which to end the transition from the initial value to the final value. The parameter is held at the final value after this step count is reached.- verbose name: Transition Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process, but note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
Pooling
Perform an N-dimensional spatial pooling operation (average or max) on the given data.
This will sweep a window over the data, using the given strides as the step size (and respecting padding at the borders), and compute the average or max value within the window. The output size is the size of the grid of valid positions for the window (given padding and strides). If there are multiple feature axes in the input, they will be flattened into a single feature axis placed at the end. The full output shape is first the instance axes if any, then any unspecified non-feature dimensions, then the specified "spatial" (i.e., swept-over) dimensions in the order specified, followed by a single feature axis. Beware that there is a difference between the semantics of Keras and Haiku pooling: Keras pooling only applies to spatial dimensions, while Haiku pooling windows are specified in terms of all (non-batch) dimensions, including the channels axis. This node defaults to the Keras semantics, but allows you to perform Haiku-style pooling by specifying a window size that is greater than the number of spatial dimensions in the data. Also as in Keras, the default strides equal the window size, meaning that pooling is by default a downsampling operation if no strides are specified.
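As a worked example of the output size along one pooled axis, assuming the standard conventions for 'valid' and 'same' padding:
    import math

    def pooled_length(length, window, stride, padding="valid"):
        if padding == "valid":
            return (length - window) // stride + 1    # number of full window positions
        return math.ceil(length / stride)             # 'same': only strides shrink the axis

    print(pooled_length(10, window=3, stride=3))                   # -> 3
    print(pooled_length(10, window=3, stride=3, padding="same"))   # -> 4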
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
pool_axes
List and order of axes over which to pool adjacent values. If the input data are packets, this determines the order of these axes in the output data, and the order in which the size of the pooling window etc is given. Alternatively one may also give just an integer number of axes to pool over, in which case the last N axes in the data (that are neither feature nor instance) will be pooled over. If the input data are plain arrays and named axes are listed, this merely determines the number of spatial axes and the names are just mnemonic and not otherwise used. Conceptually, the pooling operation sweeps a window over the data (with a step size given by the strides parameter), and for each position of the window, either the average or the maximum of the values is taken to produce a single number for each window position. The results are arranged in an array that has the same order of spatial dimensions, but which is (typically) smaller in each dimension by a factor depending on the step size. The window can either be limited to not run past the edges of the data (padding=valid), or it can be allowed to run past the edges of the data by half the window size, which results in an output size that is exactly the input size divided by the respective step sizes (padding=same). Note that this operation will not pool over feature axes, which will be arranged into a single feature axis at the end, but instead the pooling is applied separately for each feature. Likewise, the operation does not pool over instances, and any instance axes are arranged at the beginning. If plain array data is given, then the node needs to be told, using the feature_axis parameter, where the feature axis is assumed to be located (either at the end or before the spatial axes). This is automatic when packet data is provided. This parameter is not limited to the predefined options.- verbose name: Axes To Pool Over
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
pool_type
Type of pooling to perform. Either the maximum or the average of the values in the pooling window is taken. Both pooling operations have their uses, with max pooling being more suitable for detecting local features in a translation-invariant manner, and average pooling can be used as a lowpass filter on features.- verbose name: Pooling Type
- default value: max
- port type: EnumPort
- value type: str (can be None)
-
window
Size of the pooling window. This is a list of integers, one for each dimension as given in pool axes. Can also be given as a single-element list, in which case the window is the same size along all of the given spatial dimensions. For pooling along a single dimension one may either just name the axis to pool over and give a single value for the window size, or one may list all spatial axes and describe the window size as in e.g., [1, 3, 1] to pool over the second axis. Which is more efficient depends on the implementation.- verbose name: Window Size
- default value: [3]
- port type: ListPort
- value type: list (can be None)
-
strides
Step size with which the window is swept over the data. If not given, defaults to the same as the window size. This is a list of integers, one for each dimension as given in pool axes. Can also be given as a single-element list, in which case the same step size is used along all of the specified spatial dimensions. A step size greater than 1 means that the window will be shifted by this amount between successive positions; as a result, the output data along this axis will be shorter by this factor (matching the number of positions at which the window is applied).- verbose name: Step Size (Strides)
- default value: None
- port type: ListPort
- value type: list (can be None)
-
padding
Padding strategy for the data. This can be either 'valid' or 'same'. 'valid' means no padding (i.e., the window will not run off the edges of the data, but the output data will be shortened along each axis according to the number of valid positions of the window along that axis), and 'same' means that the output will have the same shape as the input (aside from downsampling due to strides).- verbose name: Padding
- default value: valid
- port type: EnumPort
- value type: str (can be None)
-
feature_axis
Dimension to exclude from pooling. This is typically the feature dimension (channels in traditional deep learning nomenclature), which is often the last axis but can be the axis just prior to the spatial dimensions. If data is provided in packet format, then this is automatically inferred from the data and should not be specified. If plain array data is given, this defaults to -1, meaning the last axis in the data. This is ignored if you provide a longer list for windows/strides than there are spatial dimensions in the data.- verbose name: Feature Axis
- default value: None
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RAdamStep
The Rectified Adam optimizer step.
Based on Liu et al., 2020, Rectified Adam addresses a shortcoming in the popular Adam optimizer, where during initial stages of training, the gradients exhibit a large variance due to the limited number of training samples used to estimate the optimizer's statistics, which typically is addressed using warm-up schedules. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
epsilon_inroot
Small value applied to the denominator inside the square root to avoid dividing by zero when rescaling. A case where this is needed is when differentiating the optimizer itself, eg for bilevel optimization.- verbose name: Epsilon (Inside Root)
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
threshold
Threshold for variance tractability.- verbose name: Threshold
- default value: 5
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RMSNorm
Apply RMS normalization to the given data.
This standardizes data across the spatial and/or feature dimensions, separately per instance (example), but does not remove the mean (unlike LayerNorm, which does remove the mean). RMS normalization can be the best choice when dealing with data that has a small batch size, or when used inside an RNN (both cases rendering the batch norm potentially inapplicable) and/or when no additional large spatial axes are present (rendering the instance norm inapplicable). Like most normalizations, RMS normalization typically includes a learned scale parameter, whose shape (and thus dimensionality) can be configured, and which in the case of the RMS norm varies between different NN suites. This can also be optionally overridden with an externally generated value. If packet data is given, this node ensures that the instance axes come first and the feature axes come last.
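A minimal NumPy sketch of the normalization itself (statistics over the non-instance axes, per-feature scale; a simplified illustration, not the node's implementation):
    import numpy as np

    def rms_norm(x, scale, axes=(1, 2), eps=1e-5):
        # scale by the root-mean-square over the given axes; no mean removal
        rms = np.sqrt(np.mean(np.square(x), axis=axes, keepdims=True) + eps)
        return x / rms * scale

    x = np.random.randn(8, 100, 16)          # (instances, time, features)
    y = rms_norm(x, scale=np.ones(16))       # per-feature learned scale (ones-initialized)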
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
scale_init
Initializer for the trainable scale.- verbose name: Scale Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scale_prior
Optional prior distribution for the scale.- verbose name: Scale Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
axes
"Optional comma-separated list of axis names or indices over which to accumulate the normalization statistics. If unspecified, the statistics will be accumulated over all except the instance axes. This parameter is not limited to the predefined choices.- verbose name: Accumulate Across Axes
- default value: (non-instance)
- port type: ComboPort
- value type: str (can be None)
-
param_axes
List of axis names/indices across which to learn separate per-element scale and bias parameters. Like with the layer norm, different NN libraries use different conventions for this parameter. Haiku and Sonnet use the feature axis (or last axis) by default, meaning that each feature is post-scaled independently as in the batch norm, but some other ML libraries may set this to the same as the axes parameter, which causes a separate scale/bias to be learned also across all entries of the spatial axes. Like axis, this parameter is not limited to the predefined choices.- verbose name: Learn Scale/bias Across Axes
- default value: feature
- port type: ComboPort
- value type: str (can be None)
-
epsilon
Small value to add to the variance to avoid division by zero.- verbose name: Epsilon
- default value: 1e-05
- port type: FloatPort
- value type: float (can be None)
-
learnable_scale
Whether to learn a trainable scale parameter. Normalizations typically include such a parameter in order to drive the subsequent activation function in a regime that is desirable for downstream computations (e.g., saturating or linear). Note the shape (and thus dimensionality) of the learned parameter is governed by the param_axes (learn scale/bias across axes) parameter.- verbose name: Learnable Scale
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
scale_initializer
Choice of scale initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Scale Initializer
- default value: ones
- port type: ComboPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of the trainable parameters.- verbose name: Layer Name
- default value: rmsnorm
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RMSPropStep
The RMSProp optimizer step.
Based on Tieleman and Hinton, 2012 and Graves et al., 2013, this optimizer was one of the first successful deep learning optimizers, and remains popular today. This implementation supports several options that it is sometimes used with, including momentum and Nesterov acceleration. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
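A sketch of the basic (uncentered, momentum-free) variant of the gradient transform, for orientation only:
    import numpy as np

    def rmsprop_step(grad, state, learning_rate=0.001, decay=0.9, eps=1e-8):
        state = decay * state + (1.0 - decay) * grad ** 2    # EMA of squared gradients
        update = -learning_rate * grad / (np.sqrt(state) + eps)
        return update, state

    w = np.ones(3)
    state = np.zeros(3)                      # cf. the initial_scale port
    g = 2.0 * w                              # e.g., gradient of sum(w**2)
    update, state = rmsprop_step(g, state)
    w = w + update                           # apply manually, cf. the Add node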
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
decay
Exponential decay rate for the first moment estimates.- verbose name: Decay
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator outside the square root to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
initial_scale
Initial scale. Initial value of accumulators tracking the magnitude of previous updates. Note that PyTorch uses 0.0 here while TensorFlow 1 uses 1.0. When reproducing results from a paper, verify the value used by the authors.- verbose name: Initial Scale
- default value: 0.0
- port type: FloatPort
- value type: float (can be None)
-
centered
If True, use the centered version of RMSProp.- verbose name: Centered
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
momentum
Optional exponential decay rate for momentum.- verbose name: Optional Momentum
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
nesterov
Whether to use Nesterov acceleration.- verbose name: Use Nesterov Acceleration
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RandomCapRotationAugmentation
Simulate small rotations of the cap montage to augment neural data.
This node draws random rotations in degrees from a given distribution, scaled by some provided factors, and then simulates a rotation of the montage (e.g., EEG cap) about these angles by interpolating the data to rotated positions. Note that this is best applied to time-domain data and may not give entirely correct results when used with e.g., precomputed frequency spectra or powers. A good starting point is a unit normal or truncated normal distribution, and using the provided scale factors, which simulate a typical level of inaccuracy in cap placement. For data that was collected under highly controlled conditions, you may want to use smaller scale factors. This node requires the data to have a space axis with x/y/z coordinates correctly assigned (relative to head center) for all channels; this can be achieved using the Assign Channel Locations node beforehand, and removing any unlocalized channels using the Remove Unlocalized Channels node. Like most augmentation nodes, this node does not by itself amplify the amount of data, which therefore has to be done beforehand using, for example, the RepeatAlongAxis node. Also, as with most augmentation nodes, you need to wire in a random seed (for example using the DrawRandomSeed node, see docs for more info) to ensure reproducibility. You also need to wire a distribution to the dist input to specify the distribution of interest (e.g., NormalDistribution).
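The rotation-drawing part can be sketched as follows (illustration only; the interpolation of channel data to the rotated sensor positions, which the node performs, is omitted, and the composition order is an assumption):
    import numpy as np

    def random_rotation(rng, xyz_scale=(2.5, 2.5, 1.0)):
        # draw one rotation: unit-normal draws scaled to degrees per axis, then converted
        ax, ay, az = np.deg2rad(rng.standard_normal(3) * xyz_scale)
        rx = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
        ry = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
        rz = np.array([[np.cos(az), -np.sin(az), 0], [np.sin(az), np.cos(az), 0], [0, 0, 1]])
        return rz @ ry @ rx                  # assumed composition order (pitch, roll, yaw)

    rng = np.random.default_rng(12345)
    positions = rng.standard_normal((3, 64))            # hypothetical x/y/z channel locations
    rotated = random_rotation(rng) @ positions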
Version 0.1.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: Packet (can be None)
- data direction: INOUT
-
seed
Random seed for deterministic results.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
dist
Distribution to use.- verbose name: Dist
- default value: None
- port type: GraphPort
- value type: Graph
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
xyz_rot_scale
A comma-separated list of scale factors to apply to the drawn rotation amounts for the x, y, and z axes. The resulting scaled values are then assumed to be in a unit of degrees; therefore, you may either use a distribution with unit standard deviation and specify the scale factors here in degrees, or set your distribution's standard deviation in degrees and use [1,1,1] here (or smaller values to reduce or disable the rotation applied about a given axis). A value of 3.0 degrees corresponds to a maximum movement of approx. half a centimeter on the scalp, but when rotating about multiple axes, you may use a slightly lower value such that the different movements combine to approx. the same total. When using non-uniform scaling, verify that x, y, and z in your data correspond to the axes that you expect. The NeuroPype default is x=right, y=front, z=up, so that the corresponding rotations are pitch, roll, and yaw (applied in that order).- verbose name: Scale Random Numbers For X,y,z (Degrees)
- default value: [2.5, 2.5, 1.0]
- port type: ListPort
- value type: list (can be None)
-
num_rotations
Number of rotations to draw from the distribution. This node will initially draw and cache a fixed set of rotations. Then, every single instance in the data will be transformed by a randomly chosen rotation. Note that an excessively large number will result in longer initialization time and more memory use, incl. GPU memory when applied to GPU data.- verbose name: Num Rotations
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
rot_seed
Random seed (int or None) for precomputing candidate rotations. Note that this seed should usually not be driven by a wire but left fixed since it's a precomputation.- verbose name: Rot Seed
- default value: 12345
- port type: IntPort
- value type: int (can be None)
-
bypass
Whether to bypass the augmentation and pass the input data through unchanged.- verbose name: Bypass
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RandomTimeSliceAugmentation
Extract a randomly offset time slice of the given length from the (already-segmented) input data.
The node adds a random offset drawn from the specified distribution (in the same unit as the length) to the position of the extracted segment, separately per instance, so that the segments are jittered. Note that offsets that result in out-of-bounds access are clamped at the edges, i.e., if your distribution has too high a variance, you will get an excess of segments that are stuck at the start or end of your original segmented time series (this can be mitigated with a truncated distribution). The slice can either be anchored at the start of the time slice (plus positive jitter offsets) or at the center of the time slice (plus or minus jitter offsets). A good starting point is a centered anchor combined with a zero-mean normal or truncated normal distribution and a standard deviation of +/- 50ms (i.e., 0.05 when using seconds as the time unit). However, the actual range should be adjusted based on the nature of the EEG phenomenon of interest (e.g., an analysis of late EEG responses may benefit from a larger jitter of as much as 100ms, while an analysis of early EEG responses should likely use jitters of no more than 10-20ms). Like most augmentation nodes, this node does not by itself amplify the amount of data, which therefore has to be done beforehand using, for example, the RepeatAlongAxis node. As with most augmentation nodes, you need to wire in a random seed (for example using the DrawRandomSeed node, see docs for more info) to ensure reproducibility. You also need to wire a distribution to the dist input to specify the distribution of interest (e.g., NormalDistribution).
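A minimal sketch of the slicing logic described above (illustrative only; the names, sampling rate, and array layout are assumptions, and the actual node operates on Packet axes rather than raw arrays):

```python
import numpy as np

rng = np.random.default_rng(42)
srate = 100.0                                    # assumed sampling rate in Hz
data = rng.standard_normal((50, 300))            # 50 instances x 300 samples (3 s segments)

length = 1.0                                     # slice length in seconds
jitter = rng.normal(0.0, 0.05, size=len(data))   # per-instance offsets, ~50 ms std dev

n_len = int(round(length * srate))
centers = data.shape[1] // 2 + np.round(jitter * srate).astype(int)   # 'center' anchor
starts = np.clip(centers - n_len // 2, 0, data.shape[1] - n_len)      # clamp out-of-bounds offsets
slices = np.stack([d[s:s + n_len] for d, s in zip(data, starts)])     # shape (50, 100)
```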
Version 0.8.1
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: Packet (can be None)
- data direction: INOUT
-
seed
Random seed for deterministic results.- verbose name: Seed
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
dist
Distribution to use.- verbose name: Dist
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
is_training
Whether the node is used in training mode.- verbose name: Is Training
- default value: None
- port type: DataPort
- value type: bool (can be None)
- data direction: IN
-
length
Length of the time slice to take.- verbose name: Length
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
time_unit
Unit of the jitter and length of the time slice.- verbose name: Time Unit
- default value: seconds
- port type: EnumPort
- value type: str (can be None)
-
anchor
Anchor location relative to which the offset is applied. Can be the start of the time slice (for non-negative distributions) or the center of the time slice.- verbose name: Anchor
- default value: center
- port type: EnumPort
- value type: str (can be None)
-
bypass
Whether to bypass the augmentation and pass the input data through unchanged.- verbose name: Bypass
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
RecurrentLoop
Loop a graph representing a recurrent neural network across an input array (e.g., a time series).
This treats one of the axes in the given input data as the "traversal" axis (e.g., time) along which to step, and applies the loop body (i.e., the network) to the successive elements of the data along that axis, while carrying over the network activations of any recurrent layers contained in the loop body to the next iteration. This node can be thought of as a "layer" that sweeps across some traversal axis in the input, while carrying over recurrent state across elements. Alternatively, the node can be viewed as a convenience variant of the collecting Fold Loop node (FoldCollect, see documentation), where the loop body is the network, and the state is the carry state of all the recurrent nodes contained in it. In practice, this node is indeed equivalent to such a Fold loop, where the initial state is a dictionary of the initial states of all the recurrent nodes in the body (keyed by the layername of each of the nodes, which must therefore be unique), and the graph has an implicit extra "state" input placeholder taking in that dictionary, splitting it using BreakStructure, and wiring the states to the respective recurrent nodes' carry input. Likewise, the nodes' carry outputs are wired into a CreateDict node, which forms the output state, and the body is made to return a 2-element list of that state dictionary and the regular result of the loop body. Therefore, this node could be replaced by such a setup without loss of functionality, and that can in some cases be useful for additional control, for example if you want to loop other state across the traversal axis, or your graph contains nested recurrent loops, or uses network submodules in it (which are currently not supported by this node).
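Conceptually, the node behaves like the following fold. This is a plain-Python sketch of the semantics described above, not the node's implementation; the body, the "gru1" state key, and the gru_layer function are hypothetical stand-ins.

```python
def recurrent_loop(body, xs, initial_state=None):
    """Sweep `body` over the leading (traversal) axis of `xs`, threading a dict of
    per-layer carry states between iterations and collecting the per-step outputs."""
    state = initial_state   # None lets each recurrent layer emit its initial carry on the first call
    outputs = []
    for x in xs:                      # successive slices along the traversal axis (e.g., time)
        state, y = body(x, state)     # body returns (new carry dict keyed by layer name, result)
        outputs.append(y)
    return outputs, state             # per-step results and the final carry state

# usage sketch: a body wrapping a single (hypothetical) recurrent layer
def body(x, state):
    carry = None if state is None else state["gru1"]
    y, new_carry = gru_layer(x, carry)    # gru_layer stands in for, e.g., a GatedRecurrentUnitLayer
    return {"gru1": new_carry}, y
```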
Version 1.0.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
net
Recurrent network.- verbose name: Net
- default value: None
- port type: GraphPort
- value type: Graph
-
net__signature
Arguments accepted by the recurrent network. If you wish to loop over multiple pieces of data (e.g., arrays, packets) simultaneously (which all must have the same length), you can list additional items here, using any name of your choosing. A recurrent network is specified in NeuroPype quite similarly to a loop body in, e.g., a For Each loop. You start with a Placeholder node whose slotname matches the name listed here, and then follow that with one or more successive NN nodes, which may include any of the recurrent nodes (e.g., GatedRecurrentUnitLayer, LongShortTermMemoryLayer), but you can also use non-recurrent layers, other mathematical operations, or stateless normalizations such as LayerNorm. Conceptually, this loop body is then given the content of a single slice of your data being looped over (along the chosen axis, e.g., time) and generates some outputs for that slice. When the loop body is called on the next slice, the RecurrentLoop node will implicitly pass in the carry state of any recurrent nodes contained in the body from the previous loop iteration. This is a convenience feature that is exactly equivalent to using a FoldCollect node and manually threading through the state of each recurrent layer. The initial state is obtained by running the loop body once on the first slice of the data, in which case each recurrent layer will emit its initial state over the respective carry output (therefore, the manual equivalent would be to perform this pre-pass and then construct a dictionary of initial states from the carry outputs of the loop body's recurrent nodes).- verbose name: Net [Signature]
- default value: (input)
- port type: Port
- value type: object (can be None)
-
data1
Dataset 1 to iterate over.- verbose name: Data1
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
data2
Dataset 2 to iterate over.- verbose name: Data2
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
data3
Dataset 3 to iterate over.- verbose name: Data3
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
data4
Dataset 4 to iterate over.- verbose name: Data4
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
data5
Dataset 5 to iterate over.- verbose name: Data5
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
dataN
Additional datasets.- verbose name: Datan
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
initial
Optional initial carry state.- verbose name: Initial
- default value: None
- port type: DataPort
- value type: dict (can be None)
- data direction: IN
-
result
Processed data.- verbose name: Result
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
final
Final carry state after last operation.- verbose name: Final
- default value: None
- port type: DataPort
- value type: dict (can be None)
- data direction: OUT
-
traversal_axis
The axis over which to loop while carrying over the activations in the recurrent nodes. For packet data, this can be one of the named axes, time being the default (while instance is usually reserved as the batch axis). If there are multiple axes of the same type, this will resolve to the first such axis, but you can also write, e.g., time.mylabel to specify an axis based not just on its type but additionally on its custom label (if any). You may also specify a numeric axis index, although this is more commonly used if you only process numeric arrays. If you process numeric arrays and time is listed, the first axis will be used. In any case, the selected axis will be omitted from the data that the loop body sees on any given call (i.e., the loop body only sees successive slices of the packet along that axis).- verbose name: Traversal Axis
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
reverse
If True, process the iterables in reverse order. The output will then be the state after having processed the leftmost element of the iterables.- verbose name: Process In Reverse Order
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
log_errors
If True, log exceptions occurring in the loop body.- verbose name: Log Errors
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
compile
Whether and how to compile the loop body for more efficient execution. Auto will compile the loop only if it occurs in a context where compilation is necessary, for example inside a model passed to nodes such as DeepLearning, ConvexModel, or one of the Inference nodes. The 'jax' option will generally attempt to compile the loop to run on the jax backend; this comes with a series of limitations -- all nodes in the body have to work with jax data types (numeric values only) and operations, and break/continue nodes cannot be used. 'Off' is the appropriate setting for anything except for the most performance-critical computations consisting of mainly math operations, possibly with some data reformatting. Off can also be useful in a situation where the loop occurs in a compiled context, but the iterable is a static constant (e.g., fixed list) and relatively short; in this case, the loop will be completely unrolled, which can be more efficient than actually looping.- verbose name: Compile
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
unroll
Optionally the unrolling factor for this loop, if compiling. Typically this is a small power of two such as 4 or 8. This is mainly an efficiency improvement for extremely tight loops that perform very cheap math operations.- verbose name: Unroll
- default value: None
- port type: IntPort
- value type: int (can be None)
SGDStep
The Stochastic Gradient Descent (SGD) optimizer step.
Popularized in its modern incarnation by Sutskever et al., 2013, SGD is a simple yet powerful optimizer that can both serve as a baseline and sometimes outperform more complex optimizers, e.g., on reasonably benign network topologies. This implementation includes optional support for momentum and Nesterov acceleration, which are standard practice when optimizing DNNs. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
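The gradient transform this step applies can be sketched as follows. This is a simplified NumPy illustration under the assumption that the emitted update already carries the negative sign (so it can be added to the weights); the node itself operates on arbitrary parameter trees and manages state and schedules for you.

```python
import numpy as np

def sgd_step(grad, state, learning_rate=0.001, momentum=None, nesterov=False):
    """Return (update, new_state); the update is then applied to the weights (e.g., with Add)."""
    if momentum is None:
        return -learning_rate * grad, state
    mu = np.zeros_like(grad) if state is None else state
    mu = momentum * mu + grad                              # momentum accumulator
    step_dir = grad + momentum * mu if nesterov else mu    # Nesterov look-ahead variant
    return -learning_rate * step_dir, mu

# usage: w += update
w, g = np.ones(4), np.array([0.1, -0.2, 0.3, 0.0])
update, state = sgd_step(g, None, learning_rate=0.01, momentum=0.9)
w = w + update
```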
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
momentum
Optional exponential decay rate for momentum.- verbose name: Optional Momentum
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
nesterov
Whether to use Nesterov acceleration.- verbose name: Use Nesterov Acceleration
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
mu_precision
Numeric precision for the first-order accumulator. The 'keep' setting resolves to the precision of the inputs.- verbose name: Mu Precision
- default value: keep
- port type: EnumPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
SM3Step
The SM3 optimizer step.
Based on Anil et al., 2019, SM3 mainly addresses memory usage for large or very large models, and has rigorous convergence guarantees. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for momentum.- verbose name: Momentum
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates. It may not be necessary to tune this.- verbose name: Beta2
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value applied to the denominator to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 1e-08
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
ScalingStep
Chainable step that scales the gradients by a fixed factor and/or a factor that varies on a schedule.
If both are given, the product of the two is applied. This node can be used to apply things like a fixed learning rate, a learning rate schedule, and/or the sign flip at the end of the chain to turn the processed gradient into an additive update.
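The effect of this step amounts to a simple multiplication, sketched below (illustrative only; the function and schedule names are hypothetical):

```python
def scaling_step(grad, step_count, factor=1.0, factor_schedule=None):
    """Scale a gradient by a fixed factor and/or a scheduled factor (their product if both are given)."""
    scale = factor * (factor_schedule(step_count) if factor_schedule else 1.0)
    return scale * grad

# e.g., a fixed learning rate plus the final sign flip that turns a gradient into an additive update
update = scaling_step(0.5, step_count=10, factor=-0.001)
```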
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
factor_schedule
Optional schedule for the factor.- verbose name: Factor Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
factor
Scaling factor to apply to the gradients.- verbose name: Factor
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
SequenceSchedule
A composite schedule that is a sequence of multiple provided schedules, each with a given starting step count.
This node is used by wiring one or more base schedule nodes, which can be any of the nodes ending in Schedule, into the schedule0, schedule1, etc. inputs (using the "this" output of the respective schedule node). The first provided schedule starts immediately (at step 0), and each subsequent schedule starts at the step count specified by the respective start1, start2, etc. inputs, and its timing is relative to that start point. However, note that the transition durations of the individual steps are not adjusted to match the difference in successive start points, so you need to make sure that the timing of each step is reasonable given the start point of the next step. The most common use case of this node is to chain a specific warmup curve (e.g., linear) with a plateau and/or falloff step (e.g., cosine decay), although note that there are also ready-made nodes for some of the most common scenarios of this type. Another use case is to chain multiple decaying schedules to get cyclical behavior, which is sometimes used to prevent the optimization from getting stuck in a suboptimal local minimum. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
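A plain-Python sketch of the piecewise behavior described above (illustrative only; the helper and the example schedules are hypothetical):

```python
def sequence_schedule(schedules, starts, step):
    """Evaluate a list of schedules, each active from its start step; timing is relative to that start."""
    value = None
    for sched, start in zip(schedules, starts):
        if step >= start:
            value = sched(step - start)    # later schedules take over once their start step is reached
    return value

warmup = lambda s: min(s / 100, 1.0) * 1e-3       # linear warmup to 1e-3 over 100 steps
decay = lambda s: 1e-3 * 0.9 ** (s / 50)          # exponential falloff after the warmup
lr = sequence_schedule([warmup, decay], [0, 100], step=250)
```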
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
schedule1
Schedule 1.- verbose name: Schedule1
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
schedule2
Schedule 2.- verbose name: Schedule2
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
schedule3
Schedule 3.- verbose name: Schedule3
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
schedule4
Schedule 4.- verbose name: Schedule4
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
schedule5
Schedule 5.- verbose name: Schedule5
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
scheduleN
Additional schedules.- verbose name: Schedulen
- default value: None
- port type: DataPort
- value type: list (can be None)
- data direction: IN
-
start1
Starting step for 2nd schedule.- verbose name: Start1
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
start2
Starting step for 3rd schedule.- verbose name: Start2
- default value: None
- port type: IntPort
- value type: int (can be None)
-
start3
Starting step for 4th schedule.- verbose name: Start3
- default value: None
- port type: IntPort
- value type: int (can be None)
-
start4
Starting step for 5th schedule.- verbose name: Start4
- default value: None
- port type: IntPort
- value type: int (can be None)
-
start5
Starting step for 6th schedule.- verbose name: Start5
- default value: None
- port type: IntPort
- value type: int (can be None)
-
startN
Starting steps for additional schedules.- verbose name: Startn
- default value: []
- port type: ListPort
- value type: list (can be None)
-
schedule0
Schedule 0.- verbose name: Schedule0
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process, but note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
StepApply
Apply an optimizer step to given gradients, prior state, and optionally prior weights.
This node, along with "Init Step" (StepInit) provides a purely functional programming interface to the optimization steps. This is an alternative way of using steps: the simplest way of using a step is to wire data into a step node, and it will behave statefully like any other filter node (e.g., FIRFilter). However, when exactly replicating Python code that uses optax, you may need to follow the functional programming pattern where the state is explicitly passed around. One concrete difference is that providing data to a step node for the first time will first initialize it, and then also update it on the given data before returning it and the processed outputs, whereas "Init Step" will return just the initial state before the first update was applied. Note that the Call function, when given a graph that implements an optimizer step, can stand in for StepApply. The input state then is the graph itself, which is wired into the "function" input of Call, and the output state is the "snapshot" output of Call, which is the graph after it has processed the inputs. The initial state can likewise be obtained by using Call on some initial data (i.e., gradients), and keeping its "snapshot" output as the initial state. Both the StepApply and Call methods can also be used with the "Gradient" node to differentiate the optimizer with respect to one or more of its hyper-parameters (e.g., learning rate) at some data point; for this, one would create a graph that accepts an initial state via a GraphPort and which returns a measure of the optimizer's performance along with the updated state (Call "snapshot" output).
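The functional pattern this mirrors is the standard optax init/update loop; a minimal Python sketch follows (assuming the optax and jax packages are available; this is the Python analogue of the pattern, not NeuroPype graph wiring):

```python
import jax.numpy as jnp
import optax

params = {"w": jnp.ones((3,)), "b": jnp.zeros(())}
grads = {"w": jnp.array([0.1, -0.2, 0.3]), "b": jnp.array(0.5)}

opt = optax.adam(learning_rate=1e-3)
state = opt.init(params)                            # "Init Step": initial state before any update
updates, state = opt.update(grads, state, params)   # "Apply Step": transform gradients, advance state
params = optax.apply_updates(params, updates)       # apply the resulting updates to the weights
```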
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Step to apply.- verbose name: Step
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the step.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
StepInit
Get initial state for an optimization step.
This node, along with "Apply Step" (StepApply) provides a purely functional programming interface to the optimization steps. This is an alternative way of using steps: the simplest way of using a step is to wire data into a step node, and it will behave statefully like any other filter node (e.g., FIRFilter). However, when exactly replicating Python code that uses optax, you may need to follow the functional programming pattern where the state is explicitly passed around. One concrete difference is that providing data to a step node for the first time will first initialize it, and then also update it on the given data before returning it and the processed outputs, whereas "Init Step" will return just the initial state before the first update was applied. The Init/Apply paradigm makes state explicit, but note that the same can also be accomplished with the plain Call node (see StepApply for a discussion).
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Step to initialize.- verbose name: Step
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
params
Example gradients or weights.- verbose name: Params
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Initialized state.- verbose name: State
- default value: None
- port type: DataPort
- value type: dict (can be None)
- data direction: OUT
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
TransposedConvolutionLayer
A 1/2/3/N-D "transposed" (upscaling) convolution layer.
See the "Convolution Layer" node for a general overview of regular convolution operations. In contrast to regular convolution, the upscaling (aka "transposed" convolution or "deconvolution") reverses the interpretation of strides and padding, and generates an output array that is correspondingly larger than the input array rather than smaller. Specifically, the output size is the size that would be necessary to generate the given input size when applying the given kernel size, strides, and padding using a normal "forward" convolution. This is useful for, e.g., upsampling a feature map to a higher resolution or reversing the effect (on sizes) of an equivalent downsampling convolution operation. Kernel dilation is not supported in this context. Note that this is not a true "deconvolution" but merely a special case of a convolution with reversed padding and fractional (1/N) strides. This node will not subsample the spatial input axes, but instead rewrite them with dummy data.
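The output-size relationship can be sketched as follows. This is a hedged illustration of the commonly used formulas for a single swept axis; the actual node derives sizes per axis from the kernel shape, strides, and padding.

```python
def conv_transpose_output_length(n_in, kernel, stride, padding="valid"):
    """Length of one spatial axis after a transposed convolution (common convention)."""
    if padding == "same":
        return n_in * stride
    return (n_in - 1) * stride + kernel   # 'valid': inverts a forward 'valid' convolution

# a forward 'valid' conv with kernel 3 and stride 2 maps length 21 -> 10; the transpose maps 10 -> 21
assert conv_transpose_output_length(10, kernel=3, stride=2) == 21
```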
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
data
Data to process.- verbose name: Data
- default value: None
- port type: DataPort
- value type: AnyNumeric (can be None)
- data direction: INOUT
-
mask
Mask to apply to the weights.- verbose name: Mask
- default value: None
- port type: DataPort
- value type: AnyArray (can be None)
- data direction: IN
-
w_init
Initializer for the weights.- verbose name: W Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
b_init
Initializer for the bias.- verbose name: B Init
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
w_prior
Optional prior distribution for the weights.- verbose name: W Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
b_prior
Optional prior distribution for the bias.- verbose name: B Prior
- default value: None
- port type: DataPort
- value type: Distribution (can be None)
- data direction: IN
-
sweep_axes
List and order of axes over which the convolution filter kernel is swept. If the input data are packets, this determines the order of these axes in the output data, and the order of the axes in the kernel (for plain array inputs, see end of tooltip). A kernel is a learned array that is shifted over all possible positions in the data (optionally with step size in each dimension, and optionally going past the edges of the data by half the kernel size if padding=same). For each position, the kernel is multiplied by the data in the region covered by the kernel and the resulting (elementwise) product is integrated (summed) to produce a single output score (a measure of match between the kernel and the data in that region). If the input data has an extra feature axis, the kernel will usually have an implicit extra axis to hold weights for each input feature. If the data has an instance axis, each instance will be processed separately (using the same kernels). If the input data are plain arrays, this merely determines the number of spatial axes and the names are just mnemonic and not otherwise used. This can alternatively be given as just a number to set the number of spatial dimensions, corresponding to the N in N-D convolution; for packet data, this will resolve to the last N axes in the data that are neither feature nor instance axes. This parameter is not limited to the predefined options.- verbose name: Axes To Sweep Kernel Over (Convolve)
- default value: time
- port type: ComboPort
- value type: str (can be None)
-
output_features
Number of filter kernels (and features) to learn. This value generally determines the length of the feature axis in the output data (each kernel yields one output feature, representing the raw feature detection score produced by that kernel). In classic deep learning, this is also called the number of output channels -- analogous to RGB color channels in a raw image, or generally meant to be an unspecific feature axis in a data array (not to be confused with spatial channels in multi-channel time series, which are more commonly treated like the vertical axis in 2D image data).- verbose name: Number Of Filters To Learn
- default value: 1
- port type: IntPort
- value type: int (can be None)
-
kernel_shape
Shape of the convolution filter kernel. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the kernel is the same size along all of the given spatial dimensions. Note: if you make the kernel as large as the data along some axis, there is only a single valid position for the kernel along that axis (if padding=valid), and consequently the result is an inner product between the data and the kernel, or a matrix multiplication when more kernels are learned. Conversely, if you give the kernel a shape of 1 along an axis, the result is equivalent to processing each element along that axis separately using the same kernel. The latter is the same as not listing the axis in sweep axes, except that the output axis order can be controlled when specifying a 1-sized axis in sweep_axes. Which is more efficient depends on the implementation.- verbose name: Kernel Shape
- default value: [3]
- port type: ListPort
- value type: list (can be None)
-
strides
Step size with which the kernel is swept over the data. This is a list of integers, one for each dimension as given in sweep axes. Can also be given as a single-element list, in which case the same step size is used along all of the specified spatial dimensions. A step size greater than 1 means that the kernel will be shifted by this amount between successive positions; as a result, the amount of compute is lower by this factor, and the output data along this axis will also be shorter by this factor (matching the number of positions at which the kernel is applied).- verbose name: Step Size (Strides)
- default value: [1]
- port type: ListPort
- value type: list (can be None)
-
padding
Padding strategy for the data. This can be either 'valid' or 'same', or a custom list of padding amounts. 'valid' means no padding (i.e., the kernel will not run off the edges of the data, but the output data will be shorter along each axis according to the number of valid positions of the kernel along that axis), and 'same' means that the output will have the same shape as the input (aside from dilation and striding). Can be customized by giving a list of [(low, high), ...] pairs, where low is the padding to apply before the data along each axis, and high is the padding to apply after the data along each axis. low and high can also be negative to trim the data instead of padding. If a single [(low, high)] pair is given, it is applied to all axes.- verbose name: Padding
- default value: valid
- port type: ComboPort
- value type: str (can be None)
-
with_bias
Whether to include a bias term. If given, then for each output feature, a bias term is learned and added to the output of the convolution. This increases the flexibility of the learned model, but note that the result is no longer strictly equivalent to e.g., a learned FIR filter applied to time-series data or a learned spatial filter / matrix multiplication applied to spatial data.- verbose name: Learn Bias Term(S)
- default value: True
- port type: BoolPort
- value type: bool (can be None)
-
w_initializer
Choice of weight initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Weight Initializer
- default value: lecun_normal
- port type: ComboPort
- value type: str (can be None)
-
b_initializer
Choice of bias initializer. This can either be one of the provided initializers, or the value "custom", in which case one of the Initializer nodes must be wired into the respective input port. For beginners it is recommended to stick to the defaults, since initialization of deep net layers is nuanced and can be tricky, otherwise be prepared to experiment with different choices. In general, the variance-scaling (lecun, glorot/xavier, he/kaiming) initializers are recommended, except for very simple/small layers where you may have a good default assumption as to the distribution of the weights (e.g., truncated_normal or uniform). Bias layers are typically zero-initialized. For initializers that take arguments, you can also type out the arguments positionally as in "truncated_normal(1.0,0.0)" (note reversed order of stddev, mean). The following initializers have arguments (here listed with their defaults): constant(value), those ending in normal(stddev=1, mean=0), those ending in uniform(min=0,max=1), orthogonal(scale=1,axis=-1), identity(gain=1), and variance_scaling(scale=1, "fan_in" (default)/"fan_avg"/"fan_out", "truncated_normal"(default)/"normal"/"uniform",optional-axis-indices=auto). Note that glorot and xavier are aliases for each other, and likewise he and kaiming are aliases for each other.- verbose name: Bias Initializer
- default value: zeros
- port type: ComboPort
- value type: str (can be None)
-
data_format
Format of the input data. This is only respected when working with plain arrays and is ignored for packet data, which always normalizes the data to 'channels_last' layout. If 'channels_last', the data is assumed to be in the format ({batch}, ..., channels). If 'channels_first', the data is assumed to be in the format ({batch}, channels, ...).- verbose name: Array Data Format
- default value: auto
- port type: EnumPort
- value type: str (can be None)
-
op_precision
Operation precision. This is a compute performance optimization. See jax documentation for details on these options. Note that this only applies to the operation, while the storage precision may be separately configurable depending on the node in question.- verbose name: Operation Precision
- default value: default
- port type: EnumPort
- value type: str (can be None)
-
layername
Name of the layer. Used for naming of weights.- verbose name: Layer Name
- default value: transposed_conv
- port type: StringPort
- value type: str (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
TruncatedNormalInitializer
An initializer that draws initial weights from a truncated Gaussian distribution with a given mean and standard deviation.
The truncation is always at ±2 standard deviations.
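A hedged sketch of drawing such weights with SciPy (the truncation bounds of the truncnorm distribution are expressed in units of the standard deviation; the helper name and shapes are illustrative only):

```python
import numpy as np
from scipy.stats import truncnorm

def truncated_normal_init(shape, mean=0.0, stddev=1.0, seed=0):
    """Draw initial weights from a normal(mean, stddev) truncated at +/-2 standard deviations."""
    rng = np.random.default_rng(seed)
    return truncnorm.rvs(-2, 2, loc=mean, scale=stddev, size=shape, random_state=rng)

w = truncated_normal_init((64, 32), mean=0.0, stddev=0.05)
```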
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
mean
Distribution mean.- verbose name: Mean
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
stddev
Distribution standard deviation.- verbose name: Stddev
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
TrustRatioScalingStep
A chainable step that scales gradients by the trust ratio (ratio of parameter norm to update norm).
This is the underlying raw scaling rule of the Fromage, LARS, and LAMB optimizers, and not an end-to-end optimizer by itself. See also You et al. (2020) for an analysis.
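The scaling rule can be sketched as follows. This is a simplified per-parameter-group illustration in the spirit of LARS/LAMB; exact details such as norm handling and safeguards vary between implementations, so treat it as an assumption-laden sketch rather than the node's implementation.

```python
import numpy as np

def trust_ratio_scale(grad, weights, trust_coefficient=1.0, min_norm=0.0, epsilon=0.0):
    """Scale a gradient by trust_coefficient * ||weights|| / ||grad|| (with safeguards)."""
    w_norm = np.linalg.norm(weights)
    g_norm = max(np.linalg.norm(grad), min_norm)
    ratio = trust_coefficient * w_norm / (g_norm + epsilon) if g_norm > 0 else 1.0
    return ratio * grad

scaled = trust_ratio_scale(np.array([0.1, -0.2]), np.array([1.0, 2.0]), trust_coefficient=0.9)
```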
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
min_norm
Minimum gradient norm. This can be used to avoid dividing by zero when rescaling; small gradients are rescaled to at least this value.- verbose name: Min Norm
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
trust_coefficient
Trust coefficient. A multiplier applied to the trust ratio, can be used to scale the update size.- verbose name: Trust Coefficient
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value added to the denominator to avoid dividing by zero when rescaling.- verbose name: Epsilon
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
UniformInitializer
An initializer that draws initial weights from a uniform distribution with a given minimum and maximum.
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
min
Minimum value of the uniform distribution.- verbose name: Min
- default value: 0
- port type: FloatPort
- value type: float (can be None)
-
max
Maximum value of the uniform distribution.- verbose name: Max
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
VarianceScalingInitializer
Initialize weights from a distribution whose scale is adapted to the shape of the initialized array.
This can be configured to obtain a variety of standard initializers, including Glorot (scale=1, mode=fan_avg), LeCun (scale=1, mode=fan_in), and He (scale=2, mode=fan_in), in combination with distribution=uniform or truncated_normal.
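A hedged sketch of the scale computation (following the common variance-scaling convention also described under the distribution parameter below; the helper name, fan computation, and shapes are illustrative assumptions):

```python
import numpy as np

def variance_scaling_init(shape, scale=1.0, mode="fan_in", distribution="truncated_normal", seed=0):
    """Draw weights whose variance is adapted to the fan-in/fan-out of the array."""
    fan_in, fan_out = int(np.prod(shape[:-1])), shape[-1]   # default: all but the last axis count as inputs
    n = {"fan_in": fan_in, "fan_out": fan_out, "fan_avg": (fan_in + fan_out) / 2}[mode]
    s = scale / n                                           # target variance
    rng = np.random.default_rng(seed)
    if distribution == "uniform":
        limit = np.sqrt(3 * s)
        return rng.uniform(-limit, limit, size=shape)
    stddev = np.sqrt(s)          # a true truncated normal would additionally correct for the truncation
    return rng.normal(0.0, stddev, size=shape)

w = variance_scaling_init((128, 64), scale=2.0, mode="fan_in")   # He-style initialization
```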
Version 0.5.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
scale
Scale to multiply the variance by. This is most commonly 1 (e.g., Glorot and LeCun initialization), but can also be 2 (e.g., for He initialization).- verbose name: Scale Multiplier
- default value: 1
- port type: FloatPort
- value type: float (can be None)
-
mode
Scale the variance based on one of: the number of input units (fan_in, used in LeCun or He initialization), number of output units (fan_out), or the average of the number of input and output units (fan_avg, used in Glorot initialization).- verbose name: Scale By
- default value: fan_in
- port type: EnumPort
- value type: str (can be None)
-
distribution
Distribution from which to draw the initial weights. Typically, this is either truncated normal or uniform. The parameters of the distribution are computed as follows. First, the variance s is computed as s = scale / n, where n is the number of input units for fan_in, the number of output units for fan_out, or the average of the number of input and output units for fan_avg. Then the mean is 0 and the standard deviation is sqrt(s) for normal, and adj*sqrt(s) for truncated normal, where adj is an adjustment factor to compensate for the truncation. For uniform, the bounds are [-sqrt(3*s), sqrt(3*s)].- verbose name: Distribution Type
- default value: truncated_normal
- port type: EnumPort
- value type: str (can be None)
-
fan_in_axes
Axes to use for computing the number of input units. If None, all but the last dimension are used (default for e.g., convolutional kernels). Otherwise this may need to be set to a list of axis indices (counting from 0). The number of output units is always the remaining axes.- verbose name: Fan-In Axis Indices
- default value: None
- port type: ListPort
- value type: list (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
WarmupCosineDecaySchedule
A linear warmup followed by cosine decay schedule.
Similarly to the "Linear Warmup Exponential Decay" Schedule, this schedule begins with a linear ramp from the initial value to the peak value over the course of warmup_steps steps, and then follows a cosine function (from initial peak to first trough) down to the final value over the course of decay_steps steps. This is one of the most robust learning rate schedules and represents the state of the art along with the linear warmup then exponential decay schedule. However, as for all schedules, note that none of the defaults should be used without either making good educated guesses, experimentation, or consulting the literature that you are aiming to replicate. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
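A sketch of the schedule's shape (a common formulation of linear warmup followed by cosine decay; the exact clamping behavior and default values here are assumptions, not the node's exact implementation):

```python
import math

def warmup_cosine_decay(step, init_value=0.0, peak_value=1e-3, final_value=1e-5,
                        warmup_steps=100, decay_steps=1000):
    """Linear ramp from init_value to peak_value, then a half-cosine down to final_value."""
    if step < warmup_steps:
        return init_value + (peak_value - init_value) * step / warmup_steps
    t = min((step - warmup_steps) / decay_steps, 1.0)    # progress through the decay phase
    cos = 0.5 * (1 + math.cos(math.pi * t))              # goes from 1 to 0 over the decay
    return final_value + (peak_value - final_value) * cos

lr = [warmup_cosine_decay(s) for s in range(0, 1100, 100)]
```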
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule.- verbose name: Initial Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
peak_value
Peak parameter value. This is the value at the peak following the initial warmup, before it is lowered again following a cosine function.- verbose name: Peak Value
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
final_value
Final parameter value. Once the schedule reaches this value, it will remain at this value for the remainder of the optimization process.- verbose name: Final Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
warmup_steps
Number of steps over which to ramp up from the initial value to the peak value. After this, the parameter is lowered again following the shape of a cosine function down to the desired final value.- verbose name: Warmup Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
decay_steps
The number of steps over which the cosine decay takes place. This is a soft transition following a raised-cosine function from the peak value down to the final value.- verbose name: Decay Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When used to define an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process, but note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps done by a node, you may normalize your schedule to, e.g., 1000 steps and then wire a formula that calculates the steps done by some process divided by 1000 into this node.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
WarmupExponentialDecaySchedule
A linear warmup followed by an exponential decay parameter schedule.
This schedule is a combination of a linear ramp (from initial value to peak value over the first warmup_steps), followed by an optional constant plateau (if plateau_steps > 0), followed by an exponential decay (from peak value to final value at a rate governed by decay_rate and decay_steps, where decay_steps is the number of steps over which the value decays by a factor of decay_rate). This is one of the most robust schedules for training neural networks, and is the current state of the art, but note that none of the default values should be considered anywhere near optimal for a given setup -- these are just example settings that help show how the schedule is configured. The linear warmup ensures that the model does not diverge during early training where weights can be assumed to be random, the peak portion helps the model escape local minima (as in simulated annealing), and the exponential decay phase helps the model converge to a good solution with high precision. Schedule nodes in NeuroPype are used for fine-grained control over how parameters, like the learning rate, should change over time during optimization. Most Step nodes offer a learning_rate_schedule port, into which a Schedule node can be wired to override the otherwise default constant learning rate. However, any other optimizer step parameter can be controlled by a schedule, simply by wiring the schedule node's output into the respective parameter of the Step nodes, and passing the schedule the current iteration (step) count of the optimization process.
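A sketch of the described shape (hedged; the staircase variant, clipping details, and default values here are assumptions and may differ from the node's implementation):

```python
def warmup_exponential_decay(step, init_value=0.0, peak_value=1e-3, final_value=1e-5,
                             warmup_steps=100, plateau_steps=0, decay_steps=10,
                             decay_rate=0.9, staircase=False):
    """Linear warmup, optional plateau, then exponential decay clipped at final_value."""
    if step < warmup_steps:
        return init_value + (peak_value - init_value) * step / warmup_steps
    since_decay = step - warmup_steps - plateau_steps
    if since_decay <= 0:
        return peak_value                                   # constant plateau at the peak
    exponent = since_decay // decay_steps if staircase else since_decay / decay_steps
    return max(peak_value * decay_rate ** exponent, final_value)   # decay_rate < 1: final value is a floor

lr = [warmup_exponential_decay(s, plateau_steps=50) for s in range(0, 500, 50)]
```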
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
step
Current step (iteration) count.- verbose name: Step
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
value
Schedule value at current step count.- verbose name: Value
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: OUT
-
init_value
Initial parameter value. This is the value at the beginning of the schedule. Note that the default may be application specific. The parameter is then linearly ramped up over warmup_steps and held at the peak value for plateau_steps.- verbose name: Initial Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
peak_value
Peak parameter value. This is the value at the peak plateau of the warmup schedule, before it is annealed again exponentially.- verbose name: Peak Value
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
-
warmup_steps
Number of steps over which to ramp up from the initial value to the peak value. After this, the parameter is held constant for plateau_steps steps, after which it is exponentially annealed until it reaches the final value.- verbose name: Warmup Steps
- default value: 100
- port type: IntPort
- value type: int (can be None)
-
plateau_steps
Number of steps for which the parameter is held at the peak value after the warmup, before the transition from the peak value to the final value begins.- verbose name: Plateau Steps
- default value: 0
- port type: IntPort
- value type: int (can be None)
-
decay_steps
The number of steps over which the parameter decays by a factor of decay_rate. Note that this is not the total duration of the decay portion; the decay only finishes once the value has reached the specified final value. The basic formula is value = peak_value * decay_rate ^ (count_since_decay_begin / decay_steps), followed by clipping according to final_value.- verbose name: Decay Steps
- default value: 10
- port type: IntPort
- value type: int (can be None)
-
decay_rate
Decay rate. The parameter value decays by this factor for every decay_steps. This can be between 0 and 1 for a regular decay schedule, or greater than 1 for an exponential growth schedule.- verbose name: Decay (Or Growth) Rate
- default value: 0.9
- port type: FloatPort
- value type: float
-
final_value
Final parameter value. Once the schedule reaches this value, it will remain at this value for the remainder of the optimization process. (If the decay rate is < 1, this is effectively a lower bound on the parameter value; if the decay rate is > 1, it is an upper bound.)- verbose name: Final Value
- default value: 0.0
- port type: FloatPort
- value type: float
-
staircase
If True, the parameter value is decayed in a staircase fashion, i.e., the parameter is changed by exactly a factor of decay_rate every decay_steps steps. If False, the parameter value is decayed in a continuous fashion according to the formula given in the docs for decay_steps.- verbose name: Staircase
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
-
step_multiplier
Multiplier for the step count. This value is multiplied with each of the step counts to uniformly speed up or slow down the schedule through a single parameter. When the schedule defines an optimizer used by the DeepModel node, this can also be set to 0.0, in which case the multiplier is chosen such that the schedule reaches its final value at the end of the training process; note that this is not always possible, namely for schedules that never reach a final value. Otherwise, to make a schedule dependent on the number of steps performed by some node, you may normalize your schedule to e.g. 1000 steps and then wire a formula that divides the steps performed by that process by 1000 into this port.- verbose name: Step Multiplier
- default value: 1.0
- port type: FloatPort
- value type: float (can be None)
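The following is a minimal Python sketch of how the value of this schedule could be computed at a given step count, assuming the decay starts from the peak value as described above; the parameter names mirror the node's ports, but this is an illustration rather than the node's actual implementation:

    import math

    def warmup_exponential_decay(step, init_value=0.0, peak_value=1.0,
                                 warmup_steps=100, plateau_steps=0,
                                 decay_steps=10, decay_rate=0.9,
                                 final_value=0.0, staircase=False,
                                 step_multiplier=1.0):
        # Uniformly speed up or slow down the schedule (see step_multiplier).
        step = step * step_multiplier
        if step < warmup_steps:
            # Linear ramp from init_value to peak_value.
            return init_value + (peak_value - init_value) * (step / warmup_steps)
        step -= warmup_steps
        if step < plateau_steps:
            # Hold at the peak value for plateau_steps.
            return peak_value
        step -= plateau_steps
        # Exponential decay from the peak value; one factor of decay_rate
        # per decay_steps, either continuously or in staircase fashion.
        exponent = math.floor(step / decay_steps) if staircase else step / decay_steps
        value = peak_value * decay_rate ** exponent
        # final_value acts as a lower bound for decay_rate < 1
        # (and as an upper bound for decay_rate > 1).
        if decay_rate < 1.0:
            return max(value, final_value)
        return min(value, final_value)

With the defaults above (warmup over 100 steps, no plateau, decay by 0.9 every 10 steps), the value ramps to 1.0 by step 100 and then decays toward 0.0, e.g. to roughly 0.9 at step 110 and 0.35 at step 200.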
WeightDecayStep
Chainable step that applies weight decay (analogous to L2 regularization) to the parameters.
This is typically applied after gradients have been rescaled based on criteria such as past gradient updates, but before scaling by the learning rate, so that the learning rate does not change the ratio of the weight decay to the gradient update. The weight decay can be used in conjunction with a mask data structure that has the same nested structure as the weights being optimized, but which contains booleans indicating which weights should be decayed. A minimal sketch of this transformation is given after the port list below.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weight_decay_mask
Mask structure for the weight decay.- verbose name: Weight Decay Mask
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
decay_rate
Weight decay rate. This is typically a small value, such as 1e-4.- verbose name: Decay Rate
- default value: 0.0001
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
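As a rough illustration, the transformation performed by this step could be sketched in Python as follows, assuming (purely for brevity) that the gradients, weights, and optional mask are flat dicts keyed by parameter name rather than the arbitrary nested structures the node actually handles:

    def weight_decay_step(gradients, weights, decay_rate=1e-4, mask=None):
        # Add decay_rate * weight to each gradient entry; where a mask is
        # given, only entries flagged True are decayed.
        updated = {}
        for name, g in gradients.items():
            decay = decay_rate * weights[name]
            if mask is not None and not mask[name]:
                decay = 0.0
            updated[name] = g + decay
        return updated

The decayed gradients are then scaled by the learning rate and applied to the weights as usual (e.g. via the Add node or a StepSolver).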
YogiStep
The Yogi optimizer step.
Based on Zaheer et al., 2018, Yogi is a modification of the popular Adam optimizer, which addresses a convergence issue of Adam in which the effective learning rate can increase over time, potentially causing the optimization to blow up. Like all step nodes, this node only processes gradients, and the resulting updates must be applied manually to the weights (this can be accomplished using the Add node). However, you can also pass it to the StepSolver node, which implements the full optimization loop. The learning rate can instead be given as a schedule, by wiring one of the Schedule nodes into the learning_rate_schedule port. A minimal sketch of the underlying update rule is given after the port list below.
Version 0.2.0
Ports/Properties
-
metadata
User-definable meta-data associated with the node. Usually reserved for technical purposes.- verbose name: Metadata
- default value: {}
- port type: DictPort
- value type: dict (can be None)
-
gradients
Gradients to be transformed.- verbose name: Gradients
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
weights
Optional current weights.- verbose name: Weights
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: IN
-
state
Explicit state of the node.- verbose name: State
- default value: None
- port type: DataPort
- value type: object (can be None)
- data direction: INOUT
-
learning_rate_schedule
Optional learning rate schedule.- verbose name: Learning Rate Schedule
- default value: None
- port type: DataPort
- value type: BaseNode (can be None)
- data direction: IN
-
learning_rate
Learning rate. A typical choice may be 0.001 here, but this is problem dependent. If a learning rate schedule is provided, this value should be left unspecified.- verbose name: Learning Rate
- default value: None
- port type: FloatPort
- value type: float (can be None)
-
beta1
Exponential decay rate for the first moment estimates.- verbose name: Beta1
- default value: 0.9
- port type: FloatPort
- value type: float (can be None)
-
beta2
Exponential decay rate for the second moment estimates.- verbose name: Beta2
- default value: 0.999
- port type: FloatPort
- value type: float (can be None)
-
epsilon
Small value added to the denominator outside the square root to avoid division by zero when rescaling.- verbose name: Epsilon
- default value: 0.001
- port type: FloatPort
- value type: float (can be None)
-
set_breakpoint
Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.- verbose name: Set Breakpoint (Debug Only)
- default value: False
- port type: BoolPort
- value type: bool (can be None)
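For reference, a minimal NumPy sketch of the Yogi update rule as described in Zaheer et al., 2018 (and as commonly implemented) is given below; this is an illustration rather than the node's actual code, and the bias-correction terms follow the Adam convention:

    import numpy as np

    def yogi_update(g, m, v, t, learning_rate=1e-3,
                    beta1=0.9, beta2=0.999, epsilon=1e-3):
        # First-moment estimate, as in Adam. t is the 1-based step count.
        m = beta1 * m + (1.0 - beta1) * g
        # Yogi's second-moment update: unlike Adam, the estimate can change
        # by at most (1 - beta2) * g**2 per step in either direction.
        g2 = g ** 2
        v = v - (1.0 - beta2) * np.sign(v - g2) * g2
        # Bias-corrected estimates (Adam convention).
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # Epsilon is added to the denominator outside the square root.
        update = -learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
        return update, m, v

Compared to Adam, the only change is the second-moment update, which limits how quickly the estimate can change and thereby keeps the effective learning rate from growing uncontrollably.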