Module: bayesian

Nodes for Bayesian inference.

These nodes can be used to specify statistical models and perform Bayesian inference on them using a variety of algorithms. Simply put, the main use case is to quantify and propagate uncertainty through statistical analyses. This yields distributions over resulting quantities, which may be further reduced to confidence intervals or other kinds of error bars, or propagated through most downstream computations. The main nodes to use are the Inference nodes, which perform the actual inference and are given a statistical (generative) model. The model itself is made up of Random Draw nodes interspersed with other nodes, and optionally With Stacked Variables, At Subscripts, and (rarely) Log Probability Term nodes. The Sampler and Approx nodes are only used to override specific defaults used by one's Inference node.
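
For orientation, the following is a minimal sketch of such a model and inference run, written in plain NumPyro-style Python (an open-source library whose sampler and plate concepts closely parallel the node names used here); it is an illustrative stand-in for the graph-based NeuroPype workflow, not the nodes' actual implementation:

    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import MCMC, NUTS
    import jax.numpy as jnp, jax.random as random

    def model(data):
        # latent variables (a "Random Draw" without observations)
        mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
        sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
        # observed variable (a "Random Draw" with data wired into obs)
        numpyro.sample("y", dist.Normal(mu, sigma), obs=data)

    mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
    mcmc.run(random.PRNGKey(0), jnp.array([0.3, -0.1, 0.8, 0.5]))
    posterior = mcmc.get_samples()  # dict of per-variable sample arrays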

AtSubscripts

Apply subsampling (if any) imposed by enclosing With Stacked Variables nodes (plates) to the given data.

This node is mainly for use in conjunction with plates that apply subsampling along their axis of interest (i.e., have their subsample option set). This is for use with "mini-batch" style stochastic inference, but can also be used to express that data is conceptually subscripted by the plate's index, even if the plate may not have subsampling enabled. The node will automatically detect the enclosing plate context(s), provided that it is situated inside the context as per NeuroPype's usual rules - that is, the node must be situated downstream of the plate subscript placeholder (this is often done with just an update-update edge from the Placeholder to the At Subscripts node). In case of nested plates, it is generally sufficient to connect only the innermost plate's subscript placeholder to the At Subscripts node, since the entire context is nested within the outer plates (also note that you can always check that the node is properly situated by (shift+)ctrl-clicking on the dotted edge going into the plate node in question).

As long as the axes referenced by the enclosing plates are uniquely identified within the data (e.g., the data has no duplicate axes unless they have distinct labels), the operation is guaranteed to apply correctly. The node can be used with plain array data, but this is considerably more difficult to configure: the event-space dimensionality must be set correctly (matching that of the distribution used by the Random Draw node that the data is subsequently wired into as an observation), and the axes must be in the positions (relative to event-space dimensions) expected by the enclosing plates, which can mean that the stacking dimension of those plates also needs to be set in accordance with the data. For these reasons it is recommended to use named axes (e.g., a Packet) when possible. The node also has limited support for partial mixing of data with axes passed into random draws that generate plain arrays: if the event-space dimensionality is set correctly, the node will reorder the axes to the positions expected by the plates and random draws, so that the user only needs to get the random draws and distribution parameters right.
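
As a rough illustration of the mechanics described above, the following sketch shows the analogous subsampling pattern in plain NumPyro (an assumed conceptual parallel, not the node's actual implementation):

    import numpyro
    import numpyro.distributions as dist

    def model(data):  # data: shape (1000,), one scalar observation per row
        loc = numpyro.sample("loc", dist.Normal(0.0, 1.0))
        # plate with subsampling: analogous to a With Stacked Variables
        # node with its subsample option set to a batch size of 100
        with numpyro.plate("rows", 1000, subsample_size=100):
            # analogous to At Subscripts: select the rows chosen by the
            # enclosing plate; event_dim=0 declares that there are no
            # trailing event-space dimensions (scalar observations)
            batch = numpyro.subsample(data, event_dim=0)
            numpyro.sample("y", dist.Normal(loc, 1.0), obs=batch)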

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to subscript.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: INOUT
  • num_event_space_dims
    When applying to plain array data, this is the number of event-space dimensions in the data. This setting is not needed in the common case where the statistical model is set up to use named axes in both the data and any Random Draw nodes that the data will come in contact with.

    The number determines how many dimensions at the right end of the data's array shape are set aside as event-space dimensions that are intrinsic to the distribution at hand (e.g., discrete class labels or dimensions of a multivariate distribution), while the rest are considered "batch" dimensions, to which plate subscripts are applied. When the data is a plain array, or when the resulting data comes into contact with parts of the statistical model that use plain arrays (namely the random draws and their distributions that the resulting subscripted data interacts with), this must be set; the only case where it can be inferred for plain arrays is when the plates necessarily bind all axes in the array, leaving 0 as the only possible number of event-space dimensions. See the shape sketch after this ports list for a concrete example.

    • verbose name: Event-Space Dimensionality
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • on_missing_axes
    How to handle the case where a block or packet is lacking an axis that an enclosing plate context applies to. If set to 'insert', a dummy axis will be inserted into the block or packet to match the plate's expectations; this is most likely what the user wants, since it ensures that the resulting data will broadcast correctly with other data that has been subscripted by the same plates. The option 'insert (as needed)' is the same but will not prepend any dummy axes; this will still broadcast correctly with other data but results in a "cleaner-looking" array shape. The default is 'warn and insert' not because insertion is generally unsafe, but because silently inserting axes may hide genuine errors where the user accidentally had mismatching axes between plates and the data. Setting this to 'error' can help catch cases where invalid data is somehow making its way into a model that has been set up to not generate these errors, for example in a complex pipeline or a model being used in multiple places.

    • verbose name: On Missing Axes
    • default value: warn and insert
    • port type: EnumPort
    • value type: str (can be None)
  • on_ambiguous_axes
    How to handle the case where a block or packet has an axis referenced by a plate that may be ambiguous (i.e., where the plate matches two or more axes in the data). Note that this check is a best-practice safeguard and can have false positives that behave as the user intended: for example, two plates each referencing the same axis type will resolve such that the outermost plate binds the first axis of that type in the data and the next plate binds the second, which can be as intended. Another case is two axes that are entirely interchangeable as long as the operation applies to both of them, as in a covariance matrix. A third case is where one plate binds an axis ambiguously but there is at least one other plate, and once those are accounted for, the remaining ambiguity is resolved since only one axis could be matched. The message will reflect the likely degree of ambiguity. The 'warn' option will emit a warning each time the node is executed, 'warn-once' will emit a one-time warning during the session, 'error' will raise a ValueError exception, and 'ignore' will suppress the check.

    • verbose name: On Ambiguous Axes
    • default value: warn
    • port type: EnumPort
    • value type: str (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
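
To make the event-space/batch split used by the num_event_space_dims setting concrete, here is a small shape-oriented sketch using NumPyro distribution conventions (which the terminology above matches; the mapping to NeuroPype's internals is an assumption):

    import jax.numpy as jnp
    import numpyro.distributions as dist

    d = dist.MultivariateNormal(jnp.zeros(3), jnp.eye(3))
    print(d.event_shape)  # (3,) -- intrinsic to the distribution
    print(d.batch_shape)  # ()   -- no batch dimensions

    # For data of shape (20, 5, 3) observed under this distribution, the
    # trailing 3 is the event-space dimension (num_event_space_dims=1),
    # while (20, 5) are batch dimensions to which enclosing plates
    # apply their subscripts.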

BNAFApprox

A powerful variational approximation of the posterior using block-neural autoregressive flows (BNAF).

This is a comparatively flexible, yet parameter-efficient approximation of the posterior that can capture non-Gaussian, skewed, and possibly multimodal distributions (within limits). The method works by transforming a simple distribution (e.g., a unit Gaussian) into the posterior distribution by a series of invertible transformations. Since the underlying model is a type of neural network, it might be necessary to experiment with the choice of the optimizer and its learning rate to avoid issues with convergence.
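
The num_flows and hidden_units settings map naturally onto the parameters of NumPyro's AutoBNAFNormal guide; the following is a hypothetical standalone sketch of that construction (not necessarily how this node is implemented):

    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import SVI, Trace_ELBO
    from numpyro.infer.autoguide import AutoBNAFNormal
    from numpyro.optim import Adam
    import jax.numpy as jnp, jax.random as random

    def model(data):
        mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
        numpyro.sample("y", dist.Normal(mu, 1.0), obs=data)

    # note: in NumPyro, hidden_factors are per-input-dimension
    # multipliers, which may differ from this node's exact semantics
    guide = AutoBNAFNormal(model, num_flows=1, hidden_factors=[8, 8])
    # the optimizer and its learning rate may need experimentation
    svi = SVI(model, guide, Adam(1e-3), Trace_ELBO())
    result = svi.run(random.PRNGKey(0), 5000, jnp.array([0.2, -0.4, 0.9]))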

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • num_flows
    The number of flows to use. The number is best determined experimentally, where 5-10 flows are often a good starting point. In ML settings such parameters can in principle be set using the Parameter Optimization node, but this will only be practical in cases where inference is very fast.

    • verbose name: Num Flows
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • hidden_units
    Number of hidden units per layer. The number of layers is determined by the length of the list, and the number of units per layer is determined by the values in the list. Good values are dependent on the dimensionality of the posterior N (sum of dimensions of all latent variables), and may be on the order of 2N to 10N.

    • verbose name: Hidden Units
    • default value: [8, 8]
    • port type: ListPort
    • value type: list (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

BarkerSampler

The Barker Metropolis-Hastings sampler is a useful fallback when HMC-type samplers (e.g., NUTS) diverge on some geometry, but it will typically be slower, especially on high-dimensional problems.

The sampler uses a proposal distribution that is skewed according to the gradient of the log posterior, trading off robustness against speed. As such, it is a middle ground between classic Metropolis-Hastings and Hamiltonian Monte Carlo, both in terms of sampling efficiency and sensitivity to bad geometry, but is mainly suitable for low- to medium-dimensional problems. The implementation is based on "The Barker proposal: combining robustness and efficiency in gradient-based MCMC" (2022).
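
NumPyro exposes this sampler as BarkerMH; a minimal sketch, assuming the node's options correspond to the keyword arguments shown:

    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import MCMC, BarkerMH
    import jax.numpy as jnp, jax.random as random

    def model(data):
        mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
        numpyro.sample("y", dist.Normal(mu, 1.0), obs=data)

    kernel = BarkerMH(
        model,
        step_size=1.0,           # initial step size
        adapt_step_size=True,    # dual averaging during warmup
        adapt_mass_matrix=True,  # Welford scheme during warmup
        dense_mass=False,        # False ~ 'uncorrelated', True ~ 'all-to-all'
        target_accept_prob=0.4,  # desired acceptance probability
    )
    MCMC(kernel, num_warmup=1000, num_samples=2000).run(
        random.PRNGKey(0), jnp.array([0.2, -0.4, 0.9]))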

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • mass_matrix_shape
    The shape of the inverse mass matrix. Uncorrelated uses a simplifying diagonal approximation, while all-to-all uses a full-rank ("dense") mass matrix. The latter can in principle handle correlated posterior variables more efficiently, but there are tradeoffs in terms of compute cost and required samples for convergence. Block-diagonal mass matrices as in the other samplers are not currently supported.

    • verbose name: Mass Matrix Shape
    • default value: uncorrelated
    • port type: EnumPort
    • value type: str (can be None)
  • desired_accept_prob
    The desired acceptance probability when performing step-size adaptation. The method will adjust the step size such that on average a step is accepted (within the posterior distribution) with this probability. Note that this is lower than in HMC due to the different sampling strategy. Typical values are 0.2-0.4 for high-dimensional problems and 0.4-0.7 for low-dimensional problems.

    • verbose name: Desired Accept Prob
    • default value: 0.4
    • port type: FloatPort
    • value type: float (can be None)
  • step_size
    The initial step size used by the Barker proposal. This normally does not need to be touched unless step size adaptation is disabled, or the adaptation diverges immediately even when using conservative choices for the init strategy in the inference node. The step size is often tuned based on the desired acceptance probability of the sampler (see parameter above for guidance as to what constitutes a good acceptance rate).

    • verbose name: Step Size
    • default value: 1.0
    • port type: FloatPort
    • value type: float (can be None)
  • adapt_step_size
    Whether to adapt the step size during warmup. This uses a dual averaging scheme.

    • verbose name: Adapt Step Size
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • adapt_mass_matrix
    Whether to adapt the mass matrix during warmup. This uses the Welford scheme.

    • verbose name: Adapt Mass Matrix
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

BayesNet

Instantiate a deep network inside a statistical model.

This node allows one to specify a subgraph inside a statistical model that is made up of neural network layers wired into the net input of the node (see the documentation for that port for more details). The weights of these layers can then be associated with priors, which defines a fully Bayesian neural network. Note that another commonly used combination of deep networks and Bayesian inference is to use the network as part of a variational approximation to some other (perhaps conventionally defined) model, in which case the weights are treated as variational parameters to optimize; this mode is not currently exposed since custom approximations cannot yet be wired into the variational inference node.

During network bring-up it can be useful to start with a Laplace approximation and a fairly loose prior (e.g., Normal(0,10)) for the weights. The BayesNet node is typically situated in a With Stacked Variables context with subsampling enabled and set to a typical batch size (e.g., 16 or 32), and the inputs wired into the BayesNet data input are subsampled accordingly using the At Subscripts node. Note also that there is no direct equivalent to the number of epochs to run; instead, the number of iterations is set in the variational inference node, which multiplies with the batch size to determine the total number of samples to draw (which, when divided by the training set size, gives the equivalent number of epochs). There is also currently no direct support for early stopping, but a (computationally expensive) substitute can be to make the iterations a hyper-parameter optimized via the Parameter Optimization node and evaluated on a p-holdout train/test split.
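
Conceptually, a Bayesian neural network of this kind amounts to placing priors on weight arrays and conditioning on mini-batches; the following is a minimal hand-rolled sketch in NumPyro (the layer sizes and names are hypothetical, and this is not the BayesNet node's actual mechanics):

    import numpyro
    import numpyro.distributions as dist
    import jax.numpy as jnp

    def bnn(x, y=None):  # x: (N, D) features, y: (N,) binary labels
        N, D = x.shape
        H = 16  # hidden units (hypothetical size)
        # loose bring-up priors on the weights, e.g., Normal(0, 10)
        w1 = numpyro.sample("w1", dist.Normal(0, 10).expand([D, H]).to_event(2))
        w2 = numpyro.sample("w2", dist.Normal(0, 10).expand([H]).to_event(1))
        # mini-batch plate, analogous to With Stacked Variables with
        # subsampling enabled and a batch size of 32
        with numpyro.plate("batch", N, subsample_size=32):
            xb = numpyro.subsample(x, event_dim=1)
            yb = numpyro.subsample(y, event_dim=0) if y is not None else None
            logits = jnp.tanh(xb @ w1) @ w2
            numpyro.sample("y", dist.Bernoulli(logits=logits), obs=yb)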

Version 0.8.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • net
    Neural network graph. This is a graph of nodes starting with at least an input placeholder whose slotname must be set to match the name here (e.g., inputs), and which is followed by some combination of neural net nodes (nodes in the deep learning package) and nodes for math operations and/or data restructuring (no stateful nodes). Random Draw or With Stacked Variables (Plate) nodes cannot be used inside the net graph (if you need these inside the net, you need to declare more than one net, e.g., one before and one after the Random Draw). The output of the network may be a packet or a plain array.

    • verbose name: Net
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • net__signature
    If you need to accept additional arguments in the network (e.g., is_training), you can list those in the signature.

    • verbose name: Net [Signature]
    • default value: (inputs)
    • port type: Port
    • value type: object (can be None)
  • seed
    Optional random seed. Only used if the net is applied in-place. If no seed is provided, one will be obtained from the context of the inference engine.

    • verbose name: Seed
    • default value: None
    • port type: DataPort
    • value type: AnyArray (can be None)
    • data direction: IN
  • apply
    Neural network graph that can be called with arguments after instantiation.

    • verbose name: Apply
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • result
    Result of applying the network to the arguments. Only for in-place use of the network (i.e., at the site of declaration).

    • verbose name: Result
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: OUT
  • netname
    Name under which the parameters (and optionally state) of this network appear in the statistical model.

    • verbose name: Name
    • default value: mynet
    • port type: StringPort
    • value type: str (can be None)
  • priors
    Optional dictionary of prior distributions associated with specific named weight arrays of the network. This can either be specified textually as in the example (using the same format and distributions as available in the Random Draw node), or wired in as a dictionary with distribution nodes as the values. The dictionary keys can be wildcard strings, e.g., 'mylayer_*.b', to match specific sets of weights (be sure to review their naming in the learned variables to ensure that the wildcard matches what you expect). The weights being matched are typically named as in mylayer_eeg.w or mylayer_eeg.b, where mylayer is the layer name, eeg is the name of the stream that the weights apply to, and w is one of the weight names of the layer for this stream. If the weight has more dimensions than the specified prior distribution, the leading dimensions are considered batch dimensions and a batch of samples will be drawn to match the dimensionality of the weight array. You can also use matrix- or tensor-variate prior distributions to draw full weight matrices. Note that you can also wire a prior distribution into the respective w_prior and b_prior ports of individual network layers; these take precedence over the dictionary-supplied priors and can be more explicit and less error-prone. Note also that the prior is different from the weight initialization -- the initialization of the network weights (which needs to be delicately scaled) is unaffected by the presence of the priors. If no priors are specified for a given layer, then the parameters of that layer will be optimized as variational parameters (via variational inference), and if one is specified, then a posterior over the weights of that layer will be inferred.

    • verbose name: Default Weight Priors
    • default value: {'*': 'Normal(0,1)'}
    • port type: DictPort
    • value type: dict (can be None)
  • help
    Help text for the network. This can be used to annotate the purpose/meaning of the network in the context of a statistical model.

    • verbose name: Help
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • verbose_name
    Optional verbose name for the network. Can be used for augmented human-readable output.

    • verbose name: Verbose Name
    • default value: None
    • port type: StringPort
    • value type: str (can be None)
  • usage
    Intended usage of the network. If the network is used as a model (the default), then each weight tensor in the network must have an associated prior distribution. If instead the network is used as part of a custom variational approximation, then this is not necessary, but still allowed, as long as those distributions are parameterized by variational parameters. For this reason, the priors dictionary, if used, should not use textually defined priors but wired-in distributions. Note that as of NeuroPype 2025, the variational mode is not fully supported.

    • verbose name: Used In
    • default value: model
    • port type: EnumPort
    • value type: str (can be None)
  • arg1
    Positional Argument 1.

    • verbose name: Arg1
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: IN
  • arg2
    Positional Argument 2.

    • verbose name: Arg2
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: IN
  • arg3
    Positional Argument 3.

    • verbose name: Arg3
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: IN
  • arg4
    Positional Argument 4.

    • verbose name: Arg4
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: IN
  • arg5
    Positional Argument 5.

    • verbose name: Arg5
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: IN
  • arg6
    Positional Argument 6.

    • verbose name: Arg6
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: IN
  • arg7
    Positional Argument 7.

    • verbose name: Arg7
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: IN
  • arg8
    Positional Argument 8.

    • verbose name: Arg8
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: IN
  • arg9
    Positional Argument 9.

    • verbose name: Arg9
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: IN
  • argN
    Additional positional arguments.

    • verbose name: Argn
    • default value: None
    • port type: DataPort
    • value type: list (can be None)
    • data direction: IN
  • kw_args
    Keyword arguments.

    • verbose name: Kw Args
    • default value: None
    • port type: DataPort
    • value type: dict (can be None)
    • data direction: IN
  • arg0
    Positional Argument 0.

    • verbose name: Arg0
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: IN
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

DerivedVariable

Capture a derived value in a statistical model under a given name.

This can be used to record certain derived information computed in the model as a variable that will later be available in some model outputs (e.g., model predictions).
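
In sampling-library terms this corresponds to recording a deterministic site; a minimal sketch in NumPyro (assuming the node plays the same role):

    import numpyro
    import numpyro.distributions as dist

    def model(data):
        mu = numpyro.sample("mu", dist.Normal(0.0, 1.0))
        sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
        # record a derived quantity under a name so that it appears in
        # posterior/predictive outputs alongside the sampled variables
        numpyro.deterministic("effect_size", mu / sigma)
        numpyro.sample("y", dist.Normal(mu, sigma), obs=data)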

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • value
    Value of the parameter.

    • verbose name: Value
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: INOUT
  • varname
    Name of the variable. The variable can be accessed under this name when analyzing derived distributions such as the predictive distribution.

    • verbose name: Parameter Name
    • default value: myvar
    • port type: StringPort
    • value type: str (can be None)
  • desc
    Description for the variable. This can be used to annotate the purpose/meaning of the variable in the context of a statistical model. This is often a single sentence.

    • verbose name: Description
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • verbose_name
    Optional verbose name for the variable. Can be used for augmented human-readable output.

    • verbose name: Verbose Name
    • default value: None
    • port type: StringPort
    • value type: str (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

DiscreteHMCGibbsSampler

A hybrid discrete/continuous sampler that uses Gibbs updates to sample from discrete sites along with an underlying Hamiltonian Monte Carlo (HMC)-type sampler for the continuous variables.

Along with MixedHMC, which has a similar structure, DiscreteHMC (DHMC) is one of two options for models that have both continuous and discrete variables. Of the two, DHMC is applicable to models with high-dimensional continuous variables without much tuning, since it is compatible with NUTS (which is relatively tuning-free), but compared to MixedHMC it is relatively less efficient on high-dimensional discrete states (e.g., multiple discrete variables, each with several states, which multiply to yield the total state-space size), since it uses a more "naive" (or brute-force) update scheme. In contrast, MixedHMC is likely more efficient in such settings, but typically requires careful tuning of the underlying HMC sampler, in particular its trajectory length. Due to these tradeoffs, it is recommended to compare efficiency between these different sampler types for your specific model.
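
A minimal sketch of the corresponding construction in NumPyro, where the kernel carries the same name (assuming the node wraps it in a comparable way):

    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import MCMC, NUTS, DiscreteHMCGibbs
    import jax.numpy as jnp, jax.random as random

    def model(data):
        # discrete latent (Gibbs-updated) plus continuous latent (NUTS)
        z = numpyro.sample("z", dist.Bernoulli(0.5))
        mu = numpyro.sample("mu", dist.Normal(jnp.where(z, 2.0, -2.0), 1.0))
        numpyro.sample("y", dist.Normal(mu, 1.0), obs=data)

    kernel = DiscreteHMCGibbs(NUTS(model), modified=False)
    MCMC(kernel, num_warmup=500, num_samples=1000).run(
        random.PRNGKey(0), jnp.array([1.5, 2.1, 1.8]))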

More Info...

Version 0.8.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • method
    Sampler to use for the continuous variables. If NUTS is used, the max tree depth parameters apply, and if HMC is used, the trajectory length applies.

    • verbose name: Method
    • default value: NUTS
    • port type: EnumPort
    • value type: str (can be None)
  • mass_matrix_shape
    The shape of the inverse mass matrix. This determines the efficiency with which the sampler can explore the posterior parameter space in case one or more variables have highly correlated posterior distributions. The 'uncorrelated' form uses a simplifying diagonal approximation that is computationally cheap per update but does not handle strong correlations very well, so it may require more updates, while 'all-to-all' uses a full-rank ("dense") mass matrix. The latter can in principle handle correlated posterior variables more efficiently and require fewer updates, but at increased computational cost per update, especially if the parameter space is high-dimensional.

    An in-between is the blockwise form: this allows one to express that one or more individual variables in the model that are multivariate may each have a correlated posterior distribution, without also assuming that the variables are necessarily correlated with each other. This is written as in blockwise(myvar1,myvar2) where myvar1 and myvar2 are the names of the random variables in question; all other variables will have diagonal entries in the (overall block-diagonal) mass matrix. The final form is one in which one may denote sets of variables that could have strong mutual correlations (in addition to their own internal correlations), which is written as in blockwise(myvar1/myvar2,myvar3) where myvar1 and myvar2 are assumed to be correlated with each other but not with myvar3, but myvar3 may have its own correlation structure. Any unlisted variables again receive diagonal entries in the mass matrix.

    • verbose name: Mass Matrix Shape
    • default value: uncorrelated
    • port type: ComboPort
    • value type: str (can be None)
  • max_tree_depth
    The maximum depth of the doubling scheme used in this sampler. Models with especially complex parameter spaces or posterior geometry can benefit from a larger value here.

    • verbose name: Max Tree Depth (If Nuts)
    • default value: 10
    • port type: IntPort
    • value type: int (can be None)
  • max_tree_depth_postwarmup
    The maximum depth during the post-warmup phase. If not provided, this is the same as max_tree_depth.

    • verbose name: Max Tree Depth (Post-Warmup)
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • trajectory_length
    The length of the particle trajectory to simulate. This value may need to be chosen carefully to avoid either too short (non-converging) or too long (wasteful) trajectories. The default is 2*pi (a full circle). See tips in the MCMC Inference node for how to tune this parameter using a combination of diagnostics and optionally trace plots of the sampler state history.

    • verbose name: Trajectory Length (If Hmc)
    • default value: 6.283185
    • port type: FloatPort
    • value type: float (can be None)
  • modified
    Use the modified Gibbs sampler aka Metropolised Gibbs sampler.

    • verbose name: Use Modified Gibbs Sampler
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • random_walk
    If enabled, samples are drawn uniformly from the support of the discrete variables. Otherwise the draw is conditioned on the other variables, which tends to be more efficient if the discrete variables are correlated with the continuous variables.

    • verbose name: Random Walk
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • desired_accept_prob
    The desired acceptance probability for step-size adaptation. This will adjust the step size such that on average a step is accepted (within the posterior distribution) with this probability. Increasing this value will lead to a smaller step size, thus the sampling will be slower but more robust.

    • verbose name: Desired Accept Prob
    • default value: 0.8
    • port type: FloatPort
    • value type: float (can be None)
  • num_steps
    Optionally a fixed number of steps to take.

    • verbose name: Num Steps
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • step_size
    The initial step size used by the underlying sampler. This normally does not need to be touched unless step size adaptation is disabled, or the adaptation diverges immediately even with conservative choices for the init strategy in the inference node and after trying the step size heuristic option in the sampler. The step size is often tuned based on the acceptance rate of the sampler, where 0.8 is usually a good target (if the rate is greater than 0.8, the step size could be increased, and if less than 0.8, it could be decreased).

    • verbose name: Step Size
    • default value: 1.0
    • port type: FloatPort
    • value type: float (can be None)
  • adapt_step_size
    Whether to adapt the step size during warmup. This uses the dual-averaging scheme.

    • verbose name: Adapt Step Size
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • adapt_mass_matrix
    Whether to adapt the mass matrix during warmup. This uses the Welford scheme.

    • verbose name: Adapt Mass Matrix
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • step_size_heuristic
    Whether to use a heuristic to adjust the step size at the beginning of each adaptation. This uses the doubling/halving strategy proposed in "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo" (2014). This heuristic can speed up initial convergence but introduces its own computational cost. Another use case is that it can help with instant divergence due to a bad choice of initial step size, if the model has steep gradients or other pathological features.

    • verbose name: Pretune Step Size
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • regularize_mass_matrix
    Whether to apply regularization to the mass matrix. This is generally recommended, particularly for higher-dimensional posterior distributions, since otherwise sampling may become unstable.

    • verbose name: Regularize Mass Matrix
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

HMCSampler

The classic Hamiltonian Monte-Carlo sampler (aka Hybrid Monte Carlo) for use with MCMC inference.

This sampler is the "grandfather" of many more modern samplers such as NUTS. Since the latter is typically easier to use out of the box, it is usually preferred where applicable; HMC is of interest either when replicating existing literature that employs HMC specifically, or in a hybrid sampling context that is not directly compatible with NUTS (the main example being the MixedHMC sampler). In such cases the user should expect to have to tune both the step size and the trajectory length, which can be a bit more involved than with NUTS and may require some experimentation. See also the notes in the MCMC Inference node for how to tune these.

HMC exploits gradient information from the posterior distribution to explore the parameter space more efficiently, and amounts to what is essentially a Hamiltonian dynamics simulation. This can be likened to a ball rolling in a potential energy landscape, where more probable regions of the posterior are lower-lying (i.e., lower energy); cf. also the history variables that can optionally be retrieved after inference (in the MCMC Inference node) for more context. HMC is a very powerful technique that is vastly more efficient than gradient-free samplers such as Metropolis-Hastings. Like NUTS, it is particularly suitable for models with high-dimensional continuous variables, but the efficiency comes at the price of somewhat reduced robustness to bad geometry, which manifests in the form of divergences (e.g., NaN results). In such cases, the user should consider falling back to a more robust sampler like Barker. There also exist gradient-free samplers that work for non-differentiable models, but these are not currently exposed in NeuroPype.

Note that, if the model contains subsampled plates (via the With Stacked Variables node), this sampler automatically uses an energy-conserving subsampling scheme (HMCECS). Tip: in low-dimensional models, it can be useful to use a full-rank mass matrix, which can help with sampling efficiency.
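
For reference, a sketch of the analogous kernel configuration in NumPyro (the option-to-argument correspondence is assumed):

    import math
    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import MCMC, HMC
    import jax.numpy as jnp, jax.random as random

    def model(data):
        mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
        numpyro.sample("y", dist.Normal(mu, 1.0), obs=data)

    kernel = HMC(
        model,
        step_size=1.0,                  # initial Verlet integrator step
        trajectory_length=2 * math.pi,  # the default "full circle"
        adapt_step_size=True,           # dual averaging
        adapt_mass_matrix=True,         # Welford scheme
        dense_mass=False,               # False ~ 'uncorrelated'
        target_accept_prob=0.8,
    )
    MCMC(kernel, num_warmup=1000, num_samples=2000).run(
        random.PRNGKey(0), jnp.array([0.2, -0.4, 0.9]))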

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • trajectory_length
    The length of the particle trajectory to simulate. This value may need to be chosen carefully to avoid either too short (non-converging) or too long (wasteful) trajectories. The default is 2*pi (a full circle). See tips in the MCMC Inference node for how to tune this parameter using a combination of diagnostics and optionally trace plots of the sampler state history.

    • verbose name: Trajectory Length
    • default value: 6.283185
    • port type: FloatPort
    • value type: float (can be None)
  • mass_matrix_shape
    The shape of the inverse mass matrix. This determines the efficiency with which the sampler can explore the posterior parameter space in case one or more variables have highly correlated posterior distributions. The 'uncorrelated' form uses a simplifying diagonal approximation that is computationally cheap per update but does not handle strong correlations very well, so it may require more updates, while 'all-to-all' uses a full-rank ("dense") mass matrix. The latter can in principle handle correlated posterior variables more efficiently and require fewer updates, but at increased computational cost per update, especially if the parameter space is high-dimensional.

    An in-between is the blockwise form: this allows one to express that one or more individual variables in the model that are multivariate may each have a correlated posterior distribution, without also assuming that the variables are necessarily correlated with each other. This is written as in blockwise(myvar1,myvar2) where myvar1 and myvar2 are the names of the random variables in question; all other variables will have diagonal entries in the (overall block-diagonal) mass matrix. The final form is one in which one may denote sets of variables that could have strong mutual correlations (in addition to their own internal correlations), which is written as in blockwise(myvar1/myvar2,myvar3) where myvar1 and myvar2 are assumed to be correlated with each other but not with myvar3, but myvar3 may have its own correlation structure. Any unlisted variables again receive diagonal entries in the mass matrix.

    • verbose name: Mass Matrix Shape
    • default value: uncorrelated
    • port type: ComboPort
    • value type: str (can be None)
  • desired_accept_prob
    The desired acceptance probability for step-size adaptation. This will adjust the step size such that on average a step is accepted (within the posterior distribution) with this probability. Increasing this value will lead to a smaller step size, thus the sampling will be slower but more robust.

    • verbose name: Desired Accept Prob
    • default value: 0.8
    • port type: FloatPort
    • value type: float (can be None)
  • num_steps
    Optionally a fixed number of steps to take.

    • verbose name: Num Steps
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • step_size
    The initial step size used by the underlying Verlet integrator. This normally does not need to be touched unless step size adaptation is disabled, or the adaptation diverges immediately even with conservative choices for the init strategy in the inference node and after trying the step size heuristic option in the sampler. The step size is often tuned based on the acceptance rate of the sampler, where 0.8 is usually a good target (if the rate is greater than 0.8, the step size could be increased, and if less than 0.8, it could be decreased).

    • verbose name: Step Size
    • default value: 1.0
    • port type: FloatPort
    • value type: float (can be None)
  • adapt_step_size
    Whether to adapt the step size during warmup. This uses the dual-averaging scheme.

    • verbose name: Adapt Step Size
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • adapt_mass_matrix
    Whether to adapt the mass matrix during warmup. This uses the Welford scheme.

    • verbose name: Adapt Mass Matrix
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • step_size_heuristic
    Whether to use a heuristic to adjust the step size at the beginning of each adaptation. This uses the doubling/halving strategy proposed in "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo" (2014). This heuristic can speed up initial convergence but introduces its own computational cost. Another use case is that it can help with instant divergence due to a bad choice of initial step size, if the model has steep gradients or other pathological features.

    • verbose name: Pretune Step Size
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • plate_subsampling
    Use energy-conserving subsampling of plates (HMCECS). It is recommended to leave this set to auto, in which case it will automatically apply if the model contains subsampled plates, and otherwise not. One may set it to off to ensure that a plain HMC sampler is used, where that is of interest. When used, this follows "Hamiltonian Monte Carlo with energy conserving subsampling" (2019), Algorithm 1.

    • verbose name: Plate Subsampling
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)
  • plate_subblocks
    Optionally the number of sub-blocks into which to partition any subsampled plates. This is analogous to the block pseudo-marginal sampler, and can help with increasing the acceptance rate if some of the sub-blocks are more difficult to sample from than others. If enabled, this follows "The Block Pseudo-Marginal Sampler" (2017).

    • verbose name: Plate Subblocks
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • subsample_control_variate
    The control variate to use when performing plate subsampling. Bardenet is according to "Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach" (2014) and Betancourt is per "The Fundamental Incompatibility of Scalable Hamiltonian Monte Carlo and Naive Data Subsampling" (2015). The latter is the suggested default.

    • verbose name: Subsample Control Variate
    • default value: betancourt
    • port type: EnumPort
    • value type: str (can be None)
  • regularize_mass_matrix
    Whether to apply regularization to the mass matrix. This is generally recommended, particularly for higher-dimensional posterior distributions, since otherwise sampling may become unstable.

    • verbose name: Regularize Mass Matrix
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

LogProbabilityTerm

Insert an additive term into the log probability of the ambient statistical model.

This is equivalent to a multiplicative factor in the model's probability density (or mass) function. It can be used as a shorthand for a perhaps much more complicated calculation or (set of) Random Draw statements involving distributions that may not be directly supported by NeuroPype. Implementation-wise, this is equivalent to a Random Draw from a special dummy distribution that returns the desired log-probability value.
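
In NumPyro terms this is a factor site; a minimal sketch, assuming equivalent semantics (the penalty shown is purely hypothetical):

    import numpyro
    import numpyro.distributions as dist
    import jax.numpy as jnp

    def model(data):
        theta = numpyro.sample("theta", dist.Normal(0.0, 1.0))
        # additive log-probability term: multiplies the model's density
        # by exp(log_term); here an illustrative soft penalty on theta
        numpyro.factor("myfactor", -jnp.abs(theta))
        numpyro.sample("y", dist.Normal(theta, 1.0), obs=data)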

Version 0.9.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • log_term
    Value of the term (in log space). The corresponding exponentiated value factors into the probability density (or mass) function of the model.

    • verbose name: Log Term
    • default value: None
    • port type: Port
    • value type: AnyNumeric (can be None)
  • varname
    Name of the factor, for bookkeeping during model analysis.

    • verbose name: Factor Name
    • default value: myfactor
    • port type: StringPort
    • value type: str (can be None)
  • desc
    Description for the factor. This can be used to annotate the purpose/meaning of the factor in the context of a statistical model. This is often a single sentence.

    • verbose name: Description
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • verbose_name
    Optional verbose name for the factor. Can be used for augmented human-readable output.

    • verbose name: Verbose Name
    • default value: None
    • port type: StringPort
    • value type: str (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • axis_pairing
    How to pair axes of the inputs with respect to any ambient plates. In 'positional' mode, axes are paired by their position according to a right alignment, that is, the last axis of the first operand is paired with the last axis of the second operand, and so on, while any missing axes behave as if they were unnamed axes of length 1 (this is the same way plain n-dimensional arrays pair in Python/numpy). In 'matched' mode, axes are paired by their type and optionally label, where the axis order of the first operand is preserved in the output, optionally with additional axes that only occur in the second operand prepended on the left. The other operand then has its axes reordered to match. All axis classes are treated as distinct, except for the plain axis, which is treated as a wildcard axis that can pair with any other axis. See also the 'label_handling' property for how labels are treated in this mode. The 'default' value resolves to a value that may be overridden in special contexts (mainly the ambient Inference node) and otherwise resolves to the setting of the configuration variable default_axis_pairing, which is set to 'positional' in 2024.x. Note that axis pairing can be subtle, and it is recommended to not blindly trust that the default behavior is always what the user intended.

    • verbose name: Axis Pairing
    • default value: default
    • port type: EnumPort
    • value type: str (can be None)
  • label_pairing
    How to treat axis labels when pairing axes in 'matched' mode. In 'always' mode, labels are always considered significant, and axes with different labels are always considered distinct, which means that, if the two operands each have an axis of same type but with different labels, each operand will have a singleton axis inserted to pair with the respective axis in the other operand. In 'ignore' mode, labels are entirely ignored when pairing axes; this means that, if multiple axes of the same type occur in one or more operands, the last space axis in the first operand is paired with the last space axis in the second operand, etc. as in positional mode. In 'auto' mode, labels are only considered significant if they are necessary for distinguishing two or more axes of the same type in any of the operands, or if they occur on a plain axis.

    • verbose name: Label Pairing
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)

MCLMCInference

Apply Bayesian inference given data and a wired-in statistical model, using the Micro-Canonical Langevin Monte Carlo (MCLMC) approach.

This method has a similar or potentially somewhat better performance characteristic than the MCMC Inference node in conjunction with the NUTS sampler, but the specific behavior will depend on the model and data. MCLMC may also be somewhat more GPU-friendly than the samplers offered by the MCMC node. This makes the MCLMC method a good second choice to try if the NUTS sampler diverges or is too slow, especially with high-dimensional problems and time-consuming analyses. This method cannot currently handle discrete latent variables (but discrete observed variables are fine).

The MCLMC node is relatively robust and tuning-free, and comes in two main variants: unadjusted and Metropolis-adjusted. Typically the unadjusted variant is understood to be more efficient, and it can be fine-tuned to a limited extent via the desired energy variance parameter and a few related parameters (see parameter group). The Metropolis-adjusted variant is instead tuned via the desired acceptance probability and related parameters, and is asymptotically unbiased, but can be computationally more expensive, approaching that of Metropolis-Hastings MCMC (e.g., NUTS). The node offers a range of performance options mainly for the multi-chain mode, which is primarily of interest for convergence diagnostics and monitoring (a good setting is 4 chains). MCLMC provides the usual MCMC-type diagnostics and numerous history fields that play mostly the same role as in MCMC, and which can be useful for understanding the behavior and performance of the sampler. Consider disabling the automatic retries during model development, since those can mask other issues. Additional tuning knobs are the number of warmup and posterior samples to generate, and the init strategy. For additional model troubleshooting tips, see also the documentation of the MCMC node.

Like the other Inference nodes, this node "updates" a prior distribution (as formalized by a statistical model) given observed data, yielding a (joint) posterior distribution over the latent (unobserved) variables in the model. The posterior can be accessed for further analysis via the node's samples output port. The node evaluates diagnostics by default (but keep in mind that the main diagnostics require multiple chains to be configured, which is not the default). Once a posterior has been inferred, the node can also be used for predictive inference of dependent variables when new data is passed in; these predictions are available over the data output port, so the node can be used as a drop-in replacement for other machine-learning nodes in NeuroPype. The inference result can also be saved and loaded back in via the model port. The statistical model is the portion of the graph that is wired into the "graph" input port and runs under the control of the node; see the "Graph [signature]" documentation (tooltip) for more detail on how statistical models can be built using nodes such as Random Draw or With Stacked Variables. Also like the other inference nodes, the resulting distribution is represented by a collection of samples whose distribution matches that of the posterior; these can be scatter-plotted, summarized via moment or quantile statistics, or propagated through downstream computations.

Like other GPU-compatible ML nodes in NeuroPype, the wired-in input data is slightly reorganized for more efficient compute; among other changes, the instance axis of your data will appear as a separate stream with a feature axis (indexing the instance-axis fields such as the TargetValue) and a (blank) dummy instance axis indexing the instances. This can be further configured via options in the "Input" category.
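
The diagnostics exposed below (n_eff, autocorrelation, R-hat) have standard definitions; the following sketch computes them from raw multi-chain draws using NumPyro's diagnostics utilities, purely to clarify what the node's diagnostic output ports report (the node computes these internally):

    import numpy as np
    from numpyro.diagnostics import (autocorrelation, effective_sample_size,
                                     gelman_rubin, split_gelman_rubin)

    # samples per variable: shape (num_chains, num_samples, ...); at
    # least 2 chains are needed for the R-hat diagnostics
    samples = np.random.randn(4, 1000)  # placeholder for real draws
    print(effective_sample_size(samples))       # larger is better
    print(gelman_rubin(samples))                # ~1.0 indicates convergence
    print(split_gelman_rubin(samples))
    print(autocorrelation(samples, axis=1)[0])  # chain 0's autocorrelation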

More Info...

Version 0.9.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    The data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: INOUT
  • samples
    Generated posterior samples in the desired format.

    • verbose name: Samples
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: OUT
  • history
    History trace of inference process.

    • verbose name: History
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: OUT
  • graph
    The graphical model to use.

    • verbose name: Graph
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • graph__signature
    Argument names accepted by the wired-in graphical model (statistical model). This is a listing of the names of input arguments of your statistical model, which is typically a single argument conventionally named "data" (although you can choose your own name) followed by a catch-all tilde symbol, as in (data,~). This data argument then receives the same data given to the inference node (via its data input), in most cases after minimal restructuring for efficient processing (as per the input options).

    In addition to or instead of this argument, the model may accept keyword arguments (generally listed after a + separator), which may match the name of a stream (such as 'eeg') in the input data packet (in which case a single-stream packet will be assigned to that argument), or the name of an axis field (e.g., 'TargetValue') of an axis that was promoted to a stream (see the corresponding option under Input, which defaults to promoting the instance axis). In such a case, the variable receives that stream's Block, but reduced to only the axis field in question. An example signature that does this is (data, +, TargetValue, ~) and a signature that binds only a stream named 'eeg' to a keyword argument instead of receiving the data as a whole is (+, eeg, TargetValue, ~). The model may also accept an argument conventionally named 'is_training' either as a second positional argument or as a keyword argument, which will receive False if the model is used in a prediction capacity, and True otherwise.

    Your model is generally a graph that begins with one or more Placeholder nodes whose slotname must match a name listed here, and which are followed by a series of nodes that collectively specify your graphical model. The final node of your model must be wired into the "graph" input of the inference node (note that in graphical UIs, the edge that goes into the "graph" input will be drawn in dotted style to indicate that this is not normal forward data flow, but that the graph runs under the control of the inference node). However, the return value of your model has no special meaning, since the inference node is generally only concerned with the named random variables occurring in the model (i.e., the varnames in Random Draw nodes).

    A simple strategy for building a graphical model is to work backwards from the data that you wish to explain or model. The core tool at your disposal is the Random Draw node, which specifies that the data you want to model (wired into the obs input port) is the result of a random draw from some (specifiable) distribution (see the Distributions package). Models become hierarchical by using distribution parameters that are themselves the output of a random draw. Other useful nodes are the With Stacked Variables node and the associated At Subscripts node. A wide range of other nodes may be used (e.g., math operations, formatting) to build more complex dependency structures among the variables in the model.

    • verbose name: Graph [Signature]
    • default value: (data,~)
    • port type: Port
    • value type: object (can be None)
  • n_eff
    The effective sample size n_eff of the posterior samples.

    • verbose name: N Eff
    • default value: None
    • port type: DataPort
    • value type: dict (can be None)
    • data direction: OUT
  • autocorr
    The autocorrelation of the posterior samples.

    • verbose name: Autocorr
    • default value: None
    • port type: DataPort
    • value type: dict (can be None)
    • data direction: OUT
  • gelman_rubin
    The Gelman-Rubin diagnostic R-hat of the posterior samples.

    • verbose name: Gelman Rubin
    • default value: None
    • port type: DataPort
    • value type: dict (can be None)
    • data direction: OUT
  • split_gelman_rubin
    The Split Gelman-Rubin diagnostic R-hat of the posterior samples.

    • verbose name: Split Gelman Rubin
    • default value: None
    • port type: DataPort
    • value type: dict (can be None)
    • data direction: OUT
  • num_warmup
    The number of warmup samples to draw. This is used to tune the momentum decoherence scale L and the step size epsilon. These samples are only used for initial convergence and are discarded afterwards, and there is no memory cost associated with them. Burn-in is usually quite fast per sample, and 10k samples can relatively easily be drawn on a GPU.

    • verbose name: Number Of Warmup Samples
    • default value: 10000
    • port type: IntPort
    • value type: int (can be None)
  • num_samples
    The number of samples to draw from the posterior distribution. This does not count the warmup samples. Memory utilization is proportional to this number, so high-dimensional models will be limited in how many posterior samples can be kept. Note that the Langevin process draws one sample per step, whereas HMC/NUTS take multiple steps until a sample is drawn; therefore, MCLMC is considerably faster at generating a given desired number of samples.

    • verbose name: Number Of Posterior Samples
    • default value: 1000
    • port type: IntPort
    • value type: int (can be None)
  • thinning_factor
    The factor by which to reduce the emitted samples. This can be used to reduce the data volume without compromising the quality of the approximation, since it eliminates serially correlated samples. This is also called "thinning" in MCMC terms.

    • verbose name: Thinning Factor
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • num_chains
    The number of parallel (independent) Markov chains to run. This can be used to accelerate the rate at which samples are drawn, but is also a powerful tool for monitoring convergence of all chains to the same distribution.

    • verbose name: Number Of Chains
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • metropolized
    Whether to use the Metropolis-adjusted version of the sampler, yielding a form of Metropolis-Hastings Microcanonical Hamiltonian Monte Carlo (MH-MHMC). Enabling this causes the algorithm to be asymptotically unbiased, at some (perhaps considerable) extra computational cost. Since the bias of unadjusted MCLMC is not typically very large, there is little practical advantage to enabling this step per se, except that, when enabled, an alternative tuning strategy is used that employs the desired acceptance probability, as in other MCMC samplers. That parameter can be easier to interpret or tune than the desired energy variance (which is the tunable parameter used with unadjusted MCLMC).

    • verbose name: Metropolis-Adjusted
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • tuning_fractions
    The fractions of the warmup steps to allocate to each of the three stages of parameter tuning. Note that the remaining warmup steps are used to burn in the sampler state. These fractions typically do not need adjustment.

    • verbose name: Tuning Fractions
    • default value: [0.1, 0.1, 0.1]
    • port type: ListPort
    • value type: list (can be None)
  • diagonal_preconditioning
    Whether to use diagonal preconditioning for the tuning of the step size. This is generally recommended for high-dimensional problems.

    • verbose name: Diagonal Preconditioning
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • desired_energy_variance
    The desired energy variance for MCLMC to aim for while auto-tuning. It is recommended to start with a small value (like the default) and gradually increase it (for example by doubling it) if you see the chain stalling or taking a very long time.

    • verbose name: Target Energy Variance
    • default value: 0.0005
    • port type: FloatPort
    • value type: float (can be None)
  • trust_in_estimate
    The trust in the estimate of the optimal step size. This impacts how quickly the algorithm will adjust to the estimated step size.

    • verbose name: Trust Factor
    • default value: 1.5
    • port type: FloatPort
    • value type: float (can be None)
  • num_effective_samples
    The number of effective samples to aim for when tuning the step size. A value greater than 100 is generally considered adequate for moderate-dimensional problems at reasonable fidelity. For very high-fidelity fits and post-hoc statistics or when dealing with very high-dimensional models, values as large as 1000 could be used.

    • verbose name: Num Effective Samples
    • default value: 150
    • port type: IntPort
    • value type: int (can be None)
  • desired_accept_prob
    The desired acceptance probability for step-size adaptation. The default is generally a good recommendation, but this parameter can be adapted by the user. This will adjust the step size such that on average a step is accepted (within the posterior distribution) with this probability. Increasing this value will lead to a smaller step size, thus the sampling will be slower but more robust.

    • verbose name: Target Accept Prob.
    • default value: 0.9
    • port type: FloatPort
    • value type: float (can be None)
  • random_trajectory_length
    Whether to use a randomized trajectory length for the Langevin dynamics. Recommended.

    • verbose name: Random Trajectory Length
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • adapt_to
    Whether to adapt to the average-case or worst-case posterior variance. The authors recommend the average-case adaptation, which achieves a more balanced adaptation to different aspects of the posterior distribution. However, if the inference struggles with one variable in particular, it may be worth switching to worst-case adaptation. The rule amounts to whether the average or the maximum eigenvalue of the covariance matrix is used for the adaptation.

    • verbose name: Adapt To
    • default value: average-case
    • port type: EnumPort
    • value type: str (can be None)
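
    The distinction can be illustrated with a small numpy sketch (illustrative only; the actual adaptation operates on the sampler's internal preconditioner):

        import numpy as np

        cov = np.cov(np.random.randn(4, 1000))  # stand-in posterior sample covariance
        eig = np.linalg.eigvalsh(cov)
        scale_avg = eig.mean()   # average-case: adapt to the mean eigenvalue
        scale_worst = eig.max()  # worst-case: adapt to the largest eigenvalue
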
  • num_adaptation_passes
    The number of adaptation passes ("windows") to use. Typically one is enough, although a seriously struggling model might benefit from more as a last resort.

    • verbose name: Num Adaptation Passes
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • trajectory_expansion_factor
    The factor by which the trajectory is expanded relative to the empirically estimated trajectory length. This is typically somewhat larger than 1.0 to account for the fact that the empirical estimate may be overly conservative. A larger value will explore the posterior somewhat more aggressively, but may also lead to more divergences. The default is a good starting point, and this parameter may in general not need much (if any) tuning, except as a last resort tuning knob.

    • verbose name: Trajectory Expansion Factor
    • default value: 1.3
    • port type: FloatPort
    • value type: float (can be None)
  • multiprocess_backend
    Backend to use for running multiple chains across multiple (CPU) processes. Multiprocessing is the simple Python default, which is not a bad start. Nestable is a version of multiprocessing that allows your pipeline to itself use parallel computation. If you are getting an error that "daemonic processes cannot have children", the most likely cause is that you have two nested parallel loops, and at least the outer loop needs to be set to nestable. Loky is a fast and fairly stable backend, but it does not support nested parallelism and has different limitations than multiprocessing. It can be helpful to try either if you are running into an issue trying to run something in parallel. Serial means to not run things in parallel but instead in series (even if num_procs is >1), which can help with debugging. Threading uses Python threads in the same process, but this is not recommended for most use cases due to what is known as GIL contention.

    • verbose name: Multi-Chain Backend
    • default value: serial
    • port type: EnumPort
    • value type: str (can be None)
  • num_procs
    Number of processes to use for running parallel chains. If None, the global setting NUM_PROC, which defaults to the number of CPUs on the system, will be used.

    • verbose name: Max Parallel Processes
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • num_threads_per_proc
    Number of threads to use for each process when running parallel chains. This can be used to limit the number of threads used by each process to mitigate potential churn.

    • verbose name: Threads Per Process
    • default value: 4
    • port type: IntPort
    • value type: int (can be None)
  • num_procs_per_gpu
    Number of chains to run on each GPU. This is only relevant if you have GPU compute backends enabled. If your GPU(s) are under-utilized during cross-validation, you can increase this to run this many chains on each GPU.

    • verbose name: Processes Per Gpu
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • serial_if_debugger
    If True, then if the Python debugger is detected, the node will run in serial mode, even if multiprocess_backend is set to something else. This is useful for debugging, since the debugger does not work well with parallel processes. This can be disabled if certain steps should nevertheless run in parallel (e.g., to reach a breakpoint more quickly).

    • verbose name: Serial If Debugger
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • conserve_memory
    The auto option defaults to 'on' if the multi-chain backend is set to multiprocessing or nestable, and 'off' otherwise. If enabled, memory will be cleared periodically across loop iterations. This is useful if you are running out of memory during a run, but it will slow down the computation somewhat. The aggressive mode will force memory to be cleared after each iteration, which has the highest overhead but guarantees a minimal memory footprint. However, the multiprocessing backend is known to occasionally hang in this mode, so this may need to be reverted to 'on' if you experience hangs. The off mode will leave memory reclamation to the individual nodes and scheduler behavior within the loop body, which may be further tuned using e.g. the default_conserve_memory config-file setting or per-node settings.

    • verbose name: Conserve Memory
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)
  • max_retries
    The maximum number of times to retry the fitting in case of divergence. Note that this can mask problems with the model, so it is a good idea to monitor the logs for potential retries having happened.

    • verbose name: Max Retries
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • init_strategy
    Override the initialization strategy for distribution parameters, optionally for individual (sets of) variables. Some choices accept a parameter that can also be omitted (the default is shown in the drop-down examples). Uniform(r) initializes each parameter to a uniform range within -r to +r in the "unconstrained" domain, which is very frequently a good default, although at times the system may have to make multiple attempts to find a feasible starting point. At times, models can require a more conservative choice: for example, median(n) initializes to the median of n samples drawn from the prior distribution, mean uses the expected value, sample draws a random sample, feasible initializes to a fixed trivially feasible point (e.g., 0), and value(x) initializes to a given fixed value x. This can also be specified on a per-variable basis (though this is typically not necessary for well-specified models), using a dictionary syntax where the variable name (or a wildcard pattern) is used as the key and the value is the initialization form. See also the drop-down options for concrete examples.

    • verbose name: Init Strategy
    • default value: uniform
    • port type: ComboPort
    • value type: str (can be None)
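
    A hypothetical per-variable specification following the dictionary syntax described above (keys are variable names or wildcard patterns, values are initialization forms):

        init_strategy = {
            "mu*": "uniform(2)",    # all variables whose name starts with 'mu'
            "sigma": "median(15)",  # one specific variable
            "*": "feasible",        # fallback for all remaining variables
        }
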
  • promote_axis
    If given, promote axes of the specified type to their own stream using the Promote Axis to Stream node. This adds a new stream to the data packet that holds the data of the selected axis; it has a feature axis that indexes the fields of the axis (e.g., 'TargetValue', 'times', etc.) and a (blank) dummy copy of the axis as the second dimension, and the axis "payload" (i.e., numeric content) is then accessible in the stream's two-dimensional data array like any other data modality. This representation is generally used in contexts that employ high-performance (multicore or GPU-accelerated) computation, including the Deep Model, Convex Model, and Inference nodes. This can be disabled by leaving the option blank (but note that doing so can incur heavy performance overhead each time the axis content changes from one use of the node to another, e.g., in a cross-validation). You can also disable this if you wish to manually use the Promote Axis to Stream or another formatting node beforehand. The new stream is by convention named the same as the axis (i.e., 'instance'), unless there are multiple different streams with matching axes that are not equal to each other, in which case one stream per source stream is created and is named 'streamname-axisname'. The original axis will be cleared to all-defaults for the purpose of running the model graph, but will be restored afterwards from the stream (if modified during inference) or the original axis.

    • verbose name: Promote Axis To Stream
    • default value: instance
    • port type: ComboPort
    • value type: str (can be None)
  • pass_metadata
    Graph accesses marker-stream or props metadata. If this is selected, then marker streams and stream (chunk) properties will be preserved when the input data enters the model graph. If this is deselected (the default), both any marker streams and all but a few essential chunk properties will be removed for the purposes of inference (which avoids unnecessary recompilation of the model graph, but the model will not have access to these details) and are restored afterwards.

    • verbose name: Pass Marker/props Metadata
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • data_format
    The type of statistics to output in the main data output packet (for prediction use cases). This can either be a comma-separated list of summary statistics, in which case the data output will contain a statistic axis along which the respective statistics are itemized, or it can be set to distribution, in which case the output will have a distribution axis along which the desired number of posterior samples are enumerated. The former is useful for a simple estimate-with-error-bars type of representation, which can then be used for plotting etc. If a single location statistic is used (e.g., mean or median), then the output will be largely compatible with the output of a conventional machine learning node (which tend to output a posterior mode or maximum-likelihood estimate depending on the type of model), and the inference node can be used as a drop-in replacement for such nodes. When error statistics are included (e.g., stddev, mad, or ci90), only a few successor nodes will correctly interpret or preserve this output, for example some of the plotting nodes that can display error bars; other numeric nodes will either ignore these statistics (e.g., MeasureLoss) or will act on all statistics using the same operation, which is typically not correct. Instead, if further downstream computations are to be performed with the output of the inference node, it is recommended to use the distribution output, since posterior samples will propagate through most mathematical operations correctly, i.e., the result will be a distribution over the result of the successor operation(s).

    • verbose name: Data Output Statistics
    • default value: distribution
    • port type: ComboPort
    • value type: str (can be None)
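
    The reason the distribution format propagates correctly through downstream math can be seen in a small numpy sketch: applying a nonlinear function to each posterior sample yields the distribution of the result, whereas applying it to a pre-computed summary statistic generally does not:

        import numpy as np

        samples = np.random.lognormal(size=10000)  # stand-in posterior samples
        result_dist = np.log(samples)              # distribution over log(x): propagates correctly
        print(result_dist.mean())                  # E[log(x)] ...
        print(np.log(samples.mean()))              # ... differs from log(E[x]) (Jensen gap)
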
  • num_pred_samples
    Number of samples to use when computing predictive statistics (content of the data output). Note that for machine-learning use cases, i.e., generation of test-set predictions, not many samples are typically needed, especially when the predictive distribution is reduced via summary statistics. A sensible choice may be a power of two, e.g., between 32 and 256, although this will depend on the dimensionality of the output space.

    • verbose name: Num Prediction Samples
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • data_vars
    Optionally a listing of variables for which to generate predictions. By default this will be all variables that are not included in the posterior distribution (what is returned by the dist and samples outputs), which in practice amounts to all observable variables in the model. In general, the node will name the streams in the packet based on the variable (e.g., 'y') and if the respective Random Draw node was configured via a like= input to output a Packet, then this stream name will be concatenated to the variable name (e.g., 'y_eeg'). To take control of the output stream names, one may also specify this as a dictionary, where the keys are the variable names whose data to emit and the values are the desired output stream names (e.g., {'y': 'eeg'}).

    • verbose name: Data Output Variables
    • default value: None
    • port type: Port
    • value type: object (can be None)
  • exclude_from_posterior
    Optionally a listing of latent variable names or patterns to exclude from the posterior. It is recommended to use patterns that end in a * to also cover cases where additional related variables, such as _decentered, are present in the model. Among others, this can be used with latent variables that take on a different shape at prediction time to prevent shape mismatches. Such variables will then be distributed as per their governing distribution parameters (or their posterior distributions).

    • verbose name: Exclude From Posterior
    • default value: None
    • port type: ListPort
    • value type: list (can be None)
  • vectorized_prediction
    The way in which samples are handled during prediction from new data and generation of predictive statistics. Note that both modes are compiled and will run efficiently, but the vectorized mode can be considerably faster with small (low-dimensional / few samples) models. For high-dimensional models, the memory requirements of the vectorized mode can be very large and the mode is not necessarily faster, so the serial mode is recommended in such cases.

    • verbose name: Vectorized Prediction
    • default value: serial
    • port type: EnumPort
    • value type: str (can be None)
  • canonicalize_output_axes
    Whether to canonicalize the data output axes of the node to match the expected output axes of other machine-learning nodes. This only applies to the data output (predictions) and only if the model outputs data with an instance axis. This is mainly useful if you use the Inference node in an ML workflow (MeasureLoss, Crossvalidation, Parameter Optimization) and ensures that the output has a feature axis that properly encodes what the outputs represent. If the observable variables in your model have highly custom axes (other than one instance and optionally a feature axis), you may need to do some extra reformatting between the Inference node and any downstream ML workflow node.

    • verbose name: Canonicalize Output Axes
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • seed
    Seed for any pseudo-random choices during training. This can be either a splittable seed as generated by Create Random Seed or a plain integer seed. If left unspecified, this will resolve to either the current 0-based "task number" if used in a context where such a thing is defined (for example cross-validation or a parallel for loop), or 12345 as a generic default.

    • verbose name: Random Seed
    • default value: None
    • port type: Port
    • value type: AnyNumeric (can be None)
  • update_on
    Update the model on the specified data. This setting controls how the node behaves when it is repeatedly invoked with data and can usually be left at its default. For use cases in which the node is invoked only a single time in a pipeline, all settings are equivalent. Scenarios where a difference arises include 1) real-time (streaming, aka "online") processing, where initially a non-streaming ("offline") dataset is passed in or a previously trained model is loaded in, and subsequently streaming data is passed through the node, 2) machine-learning style offline analysis, where the node is first given an offline dataset to train on, and then subsequently given a (still offline) test dataset to evaluate the trained model on, or 3) when the model is adapted over the course of multiple successive invocations with potentially different datasets. The settings are then as follows: "initial offline" will adapt the model only on the first non-streaming data that it receives (note that whether data is considered streaming or not is a matter of whether its is_streaming flag is set, rather than whether it arrives in actual real time). The "successive offline" mode will keep updating the model on any data marked non-streaming (this will use the model's prior state if none is wired in, or the wired-in state, which therefore must come from a previous invocation of the model). The "offline and streaming" mode will train on any data, regardless of whether it is offline or streaming (e.g., for adaptive real-time training). Note that not all inference nodes will support the offline and streaming mode -- specifically, MCMC Inference does not. The parameter can also be changed at runtime, to switch from one mode to another.

    • verbose name: Update On
    • default value: initial offline
    • port type: EnumPort
    • value type: str (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive printed output.

    • verbose name: Verbosity Level
    • default value: 1
    • port type: EnumPort
    • value type: str (can be None)
  • axis_pairing
    Axis pairing override to apply within the context of the model. This will cause all nodes in the model graph that have an axis_pairing option set to 'default' to use this setting instead. It is recommended to leave this at 'matched'; if set to 'positional', several options in model nodes need to be carefully chosen in correspondence with the axis position in the data, including the number of event-space dimensions in At Subscripts nodes, and the dimension index (relative to the event-space dimension) for any With Stacked Variables nodes. One also needs to be careful about whether distributions have a multivariate event space (e.g., Multivariate Normal) or not (most other distributions).

    • verbose name: Axis Pairing In Model
    • default value: matched
    • port type: EnumPort
    • value type: str (can be None)
  • check_diagnostics
    Diagnostics that should be automatically checked to ensure convergence; note the diagnostics are also available as outputs on the node for analysis by your own pipeline. General guidelines are: Gelman-Rubin (R) around 1.00 (ideally below 1.01) indicates good convergence, less than 1.05 is acceptable but careful inspection (e.g., history plots) is recommended, and greater than 1.1 indicates poor convergence or "mixing" of chains (steady-state exploration of the posterior distribution). For the split Gelman-Rubin, the criteria are similar, but additionally, if this is significantly greater than 1 while R is close to 1, this can indicate lack of stationarity. An effective sample size (ESS) of 1000 or more is excellent, greater than 100 is generally considered acceptable, and ESS less than 100 suggests high autocorrelation, slow mixing, and potential issues with the efficiency of the sampler. The relative ESS (rESS) is mostly a measure of your sampler's efficiency, where a value greater than 0.5 is excellent, greater than 0.1 is reasonable, and less than 0.1 indicates inefficient exploration of the posterior and high autocorrelation. These diagnostics can also be used to hand-tune the trajectory length when using HMC-type samplers like HMC or MixedHMC; if R is greater than 1.05 or ESS is less than 100, consider increasing the trajectory length.

    • verbose name: Check Diagnostics
    • default value: ['gelman-rubin', 'eff. sample size', 'relative ESS']
    • port type: SubsetPort
    • value type: list (can be None)
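
    If you wish to compute these diagnostics yourself (e.g., from the samples output), numpyro exposes the same quantities; a minimal sketch using stand-in draws of shape (num_chains, num_samples) for a single scalar variable:

        import numpy as np
        from numpyro.diagnostics import (
            effective_sample_size, gelman_rubin, split_gelman_rubin)

        x = np.random.randn(4, 1000)      # stand-in draws (4 chains, 1000 samples)
        rhat = gelman_rubin(x)            # ~1.00 for well-mixed chains
        split_rhat = split_gelman_rubin(x)
        n_eff = effective_sample_size(x)  # >100 generally acceptable
        ress = n_eff / x.size             # relative ESS; >0.1 is reasonable
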
  • differentiation_mode
    The differentiation mode to use. Forward is supported by all constructs that may occur in a model. Reverse can be more efficient depending on the ratio of data points to parameters, but is not supported for example when Fold Loops are used in the model (e.g., for time-series models), and will throw an error in such cases.

    • verbose name: Differentiation Mode
    • default value: reverse
    • port type: EnumPort
    • value type: str (can be None)
  • history_fields
    Additional diagnostic history to collect and output. The MCLMC sampler traverses a potential energy function (the negation of the log posterior density) and uses coordinates that live in an unconstrained transformation of the model's latent-variable space. Unlike HMC, the sampler proceeds primarily along contours of the potential energy (with random energy-preserving perturbations to explore different level sets). The associated state variables are z (the current coordinate in unconstrained space), potential_energy (the potential value at z; lower values correspond to more probable regions of the posterior), z_grad (the gradient of the potential energy with respect to z), and r (the current momentum vector). Specific to the unadjusted MCLMC sampler, the history can also include energy_change (the change in total energy) and kinetic_change (the change in kinetic energy); for the adjusted MCLMC sampler, the history can include accept_prob (the acceptance probability of the step), diverging (a boolean indicating whether the step was divergent), num_steps (the number of integration steps taken in the trajectory), and energy (the sum of potential and kinetic energy).

    The results are generally available via the output port called history in packet form, and have a leading unnamed axis of length num_chains with label 'chains' to index the chain, a time axis that indexes the (wall-clock) timeline of the history trace, and any other axes relevant for the variable in question, if it is multivariate. Note also that the k'th sample in the returned samples corresponds to the k'th history entry (along its time axis), and specifically to the k'th z value, where z lives in unconstrained space and the sample lives in the original parameter space of the model.

    • verbose name: Return History Of
    • default value: []
    • port type: SubsetPort
    • value type: list (can be None)
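
    As an example of using the history, a requested potential_energy trace can be plotted per chain to judge burn-in; the sketch below assumes the trace has already been extracted from the history packet into an array of shape (num_chains, num_steps), and uses a random-walk stand-in in place of a real trace:

        import numpy as np
        import matplotlib.pyplot as plt

        pe = np.cumsum(np.random.randn(4, 2000), axis=1)  # stand-in trace
        for c, trace in enumerate(pe):
            plt.plot(trace, label=f"chain {c}")
        # in a healthy run, the traces level off and overlap after warmup
        plt.xlabel("step"); plt.ylabel("potential energy"); plt.legend(); plt.show()
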

MCMCInference

Apply Bayesian inference given data and a wired-in statistical model, using a Markov Chain Monte-Carlo (MCMC) approach.

This is the most reliable and flexible of the inference nodes and should be the starting point of most Bayesian analyses. Like the other Inference nodes, this node "updates" a prior distribution (as formalized by a statistical model) given observed data, yielding a (joint) posterior distribution over the latent (unobserved) variables in the model. The posterior can be accessed for further analysis via the samples output port. The node prints basic diagnostics and suggestions by default, but note that the main diagnostics require multiple chains to be configured. Once a posterior has been inferred, the node can also be used for predictive inference of dependent variables when new data is passed in, and these predictions are available over the data output port, so the node can be used as a drop-in replacement for other machine-learning nodes in NeuroPype; the inference result can also be saved and loaded back in via the model port. The statistical model is the portion of the graph that is wired into the "graph" input port and runs under the control of the node; see the "Graph [signature]" documentation (tooltip) for more detail on how statistical models can be built using nodes such as Random Draw or With Stacked Variables. Also like the other inference nodes, the resulting distribution is represented by a collection of samples whose distribution matches that of the posterior; these can be scatter-plotted, summarized via moment or quantile statistics, or propagated through downstream computations. The MCMC node is quite robust and efficient and requires little tuning when used with the default "NUTS" sampler option (sampler type parameter). The sampler type may have to be changed if your model contains discrete latent variables (e.g., is a mixture model), or if you have robustness issues (e.g., NaNs). Other tuning knobs are the posterior correlations parameter, the number of warmup and posterior samples to generate, and the init strategy. You can also retrieve the convergence history to find out how well or how quickly your inference is converging via the history output port. For most uses it is recommended to set the number of chains greater than the default of 1 (e.g., 4), which enables better diagnostics. One can also wire in a separate Sampler node for more control over the options of the sampler, along with more comprehensive documentation. Other remedies for convergence issues are close inspection of the model for signs of mis-specification such as non-identifiable variables, funnel geometries (Neal's funnel), or variables that could use reparameterization (see the option in the Random Draw node), as well as using more benign distributions or generally starting with a simpler model. The node supports two alternative sampling engines (numpyro and blackjax), of which the former is more comprehensively tested and thus preferred, but one may experiment with falling back to blackjax if issues with numpyro are suspected (e.g., crashes or GPU hangs). Like other GPU-compatible ML nodes in NeuroPype, the wired-in input data is slightly reorganized for more efficient compute; among others, the instance axis of your data will appear as a separate stream with a feature axis (indexing the instance-axis fields such as the TargetValue) and a (blank) dummy instance axis indexing the instances. This can be further configured via options in the "Input" category.
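
A minimal sketch of the equivalent workflow under the default numpyro engine (the model and data below are hypothetical stand-ins; in NeuroPype the model is expressed as a node graph rather than code):

    import numpy as np
    import jax.random as random
    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import MCMC, NUTS

    def model(x, y=None):
        # a toy Bayesian linear regression
        w = numpyro.sample("w", dist.Normal(0.0, 1.0))
        b = numpyro.sample("b", dist.Normal(0.0, 1.0))
        sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
        numpyro.sample("y", dist.Normal(w * x + b, sigma), obs=y)

    x = np.random.randn(100)                        # synthetic observations
    y = 2.0 * x + 1.0 + 0.1 * np.random.randn(100)
    mcmc = MCMC(NUTS(model), num_warmup=1000, num_samples=1000, num_chains=4)
    mcmc.run(random.PRNGKey(0), x=x, y=y)
    mcmc.print_summary()                            # reports R-hat and n_eff per variable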

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    The data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: INOUT
  • samples
    Generated posterior samples in the desired format.

    • verbose name: Samples
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: OUT
  • history
    History trace of inference process.

    • verbose name: History
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: OUT
  • graph
    The graphical model to use.

    • verbose name: Graph
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • graph__signature
    Argument names accepted by the wired-in graphical model (statistical model). This is a listing of the names of input arguments of your statistical model, which is typically a single argument conventionally named "data" (although you can choose your own name) followed by a catch-all tilde symbol, as in (data,~). This data argument then receives the same data given to the inference node (via its data input), in most cases after minimal restructuring for efficient processing (as per the input options).

    In addition to or instead of this argument, the model may accept keyword arguments (generally listed after a + separator), which may match the name of a stream (such as 'eeg') in the input data packet (in which case a single-stream packet will be assigned to that argument), or the name of an axis field (e.g., 'TargetValue') of an axis that was promoted to a stream (see the corresponding option under Input, which defaults to promoting the instance axis). In such a case, the variable receives that stream's Block, but reduced to only the axis field in question. An example signature that does this is (data, +, TargetValue, ~) and a signature that binds only a stream named 'eeg' to a keyword argument instead of receiving the data as a whole is (+, eeg, TargetValue, ~). The model may also accept an argument conventionally named 'is_training' either as a second positional argument or as a keyword argument, which will receive False if the model is used in a prediction capacity, and True otherwise.

    Your model is generally a graph that begins with one or more Placeholder nodes whose slotname must match a name listed here, and which are followed by a series of nodes that collectively specify your graphical model. The final node of your network must be wired into the "graph" input of the inference node (note that in graphical UIs, the edge that goes into the "graph" input will be drawn in dotted style to indicate that this is not normal forward data flow, but that the graph runs under the control of the inference node). However, the return value of your model has no special meaning since the inference node is generally only concerned with named random variables occurring in the model (i.e., the varnames in Random Draw nodes).

    A simple strategy for building a graphical model is to work backwards from the data that you wish to explain or model, and the core tool at your disposal is the Random Draw node, which specifies that the data you want to model (wired into the obs input port) is the result of a random draw from some (specifiable) distribution (see Distributions package). Models become hierarchical by using distribution parameters that are themselves the output of a random draw. Other useful nodes are the With Stacked Variables node and the associated At Subscripts node. A wide range of other nodes may be used (e.g., math operations, formatting) to shape the dependency structure among the variables in the model.

    • verbose name: Graph [Signature]
    • default value: (data,~)
    • port type: Port
    • value type: object (can be None)
  • n_eff
    The effective sample size n_eff of the posterior samples.

    • verbose name: N Eff
    • default value: None
    • port type: DataPort
    • value type: dict (can be None)
    • data direction: OUT
  • autocorr
    The autocorrelation of the posterior samples.

    • verbose name: Autocorr
    • default value: None
    • port type: DataPort
    • value type: dict (can be None)
    • data direction: OUT
  • gelman_rubin
    The Gelman-Rubin diagnostic R-hat of the posterior samples.

    • verbose name: Gelman Rubin
    • default value: None
    • port type: DataPort
    • value type: dict (can be None)
    • data direction: OUT
  • split_gelman_rubin
    The Split Gelman-Rubin diagnostic R-hat of the posterior samples.

    • verbose name: Split Gelman Rubin
    • default value: None
    • port type: DataPort
    • value type: dict (can be None)
    • data direction: OUT
  • sampler
    Optionally the sampler to use for inference.

    • verbose name: Sampler
    • default value: None
    • port type: DataPort
    • value type: BaseNode (can be None)
    • data direction: IN
  • sampler_type
    Type of MCMC sampler (Markov transition kernel) to use. Briefly, for models with only continuous latent variables, NUTS should be the default choice as it is efficient and nearly tuning-free. For models of low to moderate dimensionality that have problematic posterior geometry (e.g., a mix of steep and flat regions), the Barker sampler is a more robust but slower fallback. For models with discrete latent variables, the DiscreteHMC sampler is a good choice, unless the number of discrete states is very high, in which case the MixedHMC sampler may be more efficient; however, this sampler, along with the plain HMC sampler, requires manual tuning of the HMC trajectory-length parameter below. All samplers can also instead be instantiated as a node (one of the Sampler nodes) and wired into the sampler port, in which case this setting should be set to "provided". The sampler nodes each expose a number of options that give complete control over the method used, along with documentation. As another option, see also the MCLMC Inference node, which has a similar performance characteristic to MCMC with NUTS, but may handle some models better and others less well.

    • verbose name: Sampler Type
    • default value: NUTS
    • port type: ComboPort
    • value type: str (can be None)
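
    Under the numpyro engine, these sampler types correspond roughly to the following kernels (a sketch with a toy model containing one discrete and one continuous latent variable):

        import numpyro
        import numpyro.distributions as dist
        from numpyro.infer import NUTS, HMC, BarkerMH, MixedHMC, DiscreteHMCGibbs

        def model():
            z = numpyro.sample("z", dist.Bernoulli(0.5))    # discrete latent
            numpyro.sample("x", dist.Normal(z * 2.0, 1.0))  # continuous latent

        kernel = NUTS(model)                    # default: continuous latents
        kernel = BarkerMH(model)                # robust fallback for difficult geometry
        kernel = DiscreteHMCGibbs(NUTS(model))  # "DiscreteHMC"; pass modified=True for modified Gibbs
        kernel = MixedHMC(HMC(model, trajectory_length=6.283185))  # "MixedHMC"
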
  • trajectory_length
    The length of the particle trajectory to simulate. Only used when choosing either the HMC or MixedHMC sampler (but not DiscreteHMC, and ignored if a sampler node is wired in). This value may need to be chosen carefully to avoid either too short (taking very long to converge to a steady state) or too long (well-mixing but generating few samples per amount of compute) trajectories. See also the documentation on diagnostics and history for tips on tuning this.

    • verbose name: Trajectory Length (If Hmc/mixed Hmc)
    • default value: 6.283185
    • port type: FloatPort
    • value type: float (can be None)
  • modified_gibbs
    Use the modified Gibbs sampler aka Metropolised Gibbs sampler to sample from discrete latent variables. This is only respected when using either the DiscreteHMC or MixedHMC methods; also ignored if a sampler node is wired in.

    • verbose name: Use Modified Gibbs (If Discrete Latents)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • num_warmup
    The number of warmup samples to draw. These samples are only used for initial convergence and are discarded from the model. The warmup phase should be set long enough so that the sampling process has settled into a steady state before samples are collected for use as the posterior. There are several diagnostics that can spot insufficiently initialized models ("poorly mixing chains"), and you can also plot the timelines of various model parameters available via the history output.

    • verbose name: Number Of Warmup Samples
    • default value: 1000
    • port type: IntPort
    • value type: int (can be None)
  • num_samples
    The number of samples to draw from the posterior distribution. This does not count the warmup samples. This may have to be increased if a sampler is not exploring the parameter space very efficiently (e.g., moving slowly), and likely needs to be higher than 1000 for samplers other than NUTS and HMC, unless you have a low-dimensional model. The resulting increase in memory usage can be counteracted by increasing the "thinning" factor, which subsamples the data.

    • verbose name: Number Of Posterior Samples
    • default value: 1000
    • port type: IntPort
    • value type: int (can be None)
  • num_chains
    The number of parallel (independent) Markov chains to run. This can be used to accelerate the rate at which samples are drawn, but is also a powerful tool for monitoring convergence of all chains to the same distribution.

    • verbose name: Number Of Chains
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • thinning_factor
    The factor by which to reduce the emitted samples. This can be used to reduce the data volume without compromising the quality of the approximation, since it eliminates serially correlated samples. This is also called "thinning" in MCMC terms.

    • verbose name: Thinning Factor
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • freeze_parameters
    Whether to freeze parameters into the model, causing the model to be recompiled each time the data changes. This can be enabled if the model fails to compile as a result of having some structural dependency on model arguments (e.g., sometimes parametric plate sizes can cause this).

    • verbose name: Freeze All Parameters
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • backend_engine
    The underlying implementation to use for the MCMC sampler. Numpyro is the recommended default as it natively supports all samplers and their options, and all parts of the modeling workflow, while blackjax serves as a fallback implementation in case of performance issues or other problems with numpyro. Currently blackjax only supports the NUTS, HMC, and Barker samplers.

    • verbose name: Sampling Engine
    • default value: numpyro
    • port type: EnumPort
    • value type: str (can be None)
  • multiprocess_backend
    Backend to use for running multiple chains across multiple (CPU) processes. Serial means to not run things in parallel but instead in series (even if num_procs is >1), which can help with debugging, memory usage, or running multiple chains inside a parallel outer loop (e.g., parallel cross-validation, parameter optimization, or parallel foreach). Parallel means to run things across multiple processes using an engine-specific default (for numpyro, this is currently jax.pmap, and for blackjax it is multiprocessing). Vectorized means to run multiple chains in parallel on a single device (CPU or GPU) using SIMD vectorization. The other backends are implementation specific, and currently do not work with both engines; of those, the numpyro engine can be used with jax.pmap, and blackjax can be used with multiprocessing and loky. jax.pmap is a multi-process backend that is mainly configured via jax settings and ignores the other performance options below. Multiprocessing is the simple Python default, but is not the most robust. Loky is a fast and fairly stable backend, but it does not support nested parallelism and has different limitations than multiprocessing. It can be helpful to try either if you are running into an issue trying to run something in parallel. Tip: if you are getting an error that "daemonic processes cannot have children", the most likely cause is that you have two nested parallel loops using multiprocessing, and at least the outer loop needs to be set to nestable.

    • verbose name: Multi-Chain Backend
    • default value: serial
    • port type: EnumPort
    • value type: str (can be None)
  • num_procs
    Number of processes to use for running parallel chains. If None, the global setting NUM_PROC, which defaults to the number of CPUs on the system, will be used.

    • verbose name: Max Parallel Processes
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • num_threads_per_proc
    Number of threads to use for each process when running parallel chains. This can be used to limit the number of threads used by each process to mitigate potential churn.

    • verbose name: Threads Per Process
    • default value: 4
    • port type: IntPort
    • value type: int (can be None)
  • num_procs_per_gpu
    Number of chains to run on each GPU. This is only relevant if you have GPU compute backends enabled. If your GPU(s) are under-utilized during cross-validation, you can increase this to run this many chains on each GPU.

    • verbose name: Processes Per Gpu
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • conserve_memory
    The auto option defaults to 'on' if the multi-chain backend is set to multiprocessing or nestable, and 'off' otherwise. If enabled, memory will be cleared periodically across loop iterations. This is useful if you are running out of memory during a run, but it will slow down the computation somewhat. The aggressive mode will force memory to be cleared after each iteration, which has the highest overhead but guarantees a minimal memory footprint. However, the multiprocessing backend is known to occasionally hang in this mode, so this may need to be reverted to 'on' if you experience hangs. The off mode will leave memory reclamation to the individual nodes and scheduler behavior within the loop body, which may be further tuned using e.g. the default_conserve_memory config-file setting or per-node settings.

    • verbose name: Conserve Memory
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)
  • serial_if_debugger
    If True, then if the Python debugger is detected, the node will run in serial mode, even if multiprocess_backend is set to something else. This is useful for debugging, since the debugger does not work well with parallel processes. This can be disabled if certain steps should nevertheless run in parallel (e.g., to reach a breakpoint more quickly).

    • verbose name: Serial If Debugger
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • posterior_correlations
    Assumed covariance structure among posterior variables. This setting is used by the inference methods to improve the tractability of posterior inference: 'uncorrelated' assumes that all posterior variables are uncorrelated (i.e., diagonal posterior covariance); 'all-to-all' does not restrict the covariance structure in any way but can be computationally costly; and 'blockwise' allows one to specify which variables in the model may be mutually correlated and/or have internal correlations (if multivariate). For example, the syntax 'blockwise(myvar1/myvar2,myvar3)' expresses that myvar1 and myvar2 are assumed to be correlated with each other but not with myvar3, but each of the three variables may have its own internal correlation structure, while any variables not listed are treated as uncorrelated. Note that, while this setting improves the efficiency of the inference (in terms of the effective sample size per unit of compute) by determining the form of the "mass matrix", it does not otherwise impact the fidelity of the final posterior once inferred.

    • verbose name: Posterior Correlations
    • default value: uncorrelated
    • port type: ComboPort
    • value type: str (can be None)
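
    Under the numpyro engine, this setting roughly maps onto the form of the NUTS mass matrix; a sketch with hypothetical variable names:

        import numpyro
        import numpyro.distributions as dist
        from numpyro.infer import NUTS

        def model():
            numpyro.sample("myvar1", dist.Normal(0.0, 1.0))
            numpyro.sample("myvar2", dist.Normal(0.0, 1.0))
            numpyro.sample("myvar3", dist.Normal(0.0, 1.0))

        kernel = NUTS(model, dense_mass=False)  # 'uncorrelated': diagonal mass matrix
        kernel = NUTS(model, dense_mass=True)   # 'all-to-all': one dense block over all variables
        kernel = NUTS(model, dense_mass=[("myvar1", "myvar2")])  # ~'blockwise(myvar1/myvar2)'
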
  • init_strategy
    Override the initialization strategy for distribution parameters, optionally for individual (sets of) variables. Some choices accept a parameter that can also be omitted (the default is shown in the drop-down examples). Uniform(r) initializes each parameter to a uniform range within -r to +r in the "unconstrained" domain, which is very frequently a good default, although at times the system may have to make multiple attempts to find a feasible starting point. At times, models can require a more conservative choice: for example, median(n) initializes to the median of n samples drawn from the prior distribution, mean uses the expected value, sample draws a random sample, feasible initializes to a fixed trivially feasible point (e.g., 0), and value(x) initializes to a given fixed value x. This can also be specified on a per-variable basis (though this is typically not necessary for well-specified models), using a dictionary syntax where the variable name (or a wildcard pattern) is used as the key and the value is the initialization form. See also the drop-down options for concrete examples.

    • verbose name: Init Strategy
    • default value: uniform
    • port type: ComboPort
    • value type: str (can be None)
  • promote_axis
    If given, promote axes of the specified type to their own stream using the Promote Axis to Stream node. This adds a new stream to the data packet that holds the data of the selected axis; it has a feature axis that indexes the fields of the axis (e.g., 'TargetValue', 'times', etc.) and a (blank) dummy copy of the axis as the second dimension, and the axis "payload" (i.e., numeric content) is then accessible in the stream's two-dimensional data array like any other data modality. This representation is generally used in contexts that employ high-performance (multicore or GPU-accelerated) computation, including the Deep Model, Convex Model, and Inference nodes. This can be disabled by leaving the option blank (but note that doing so can incur heavy performance overhead each time the axis content changes from one use of the node to another, e.g., in a cross-validation). You can also disable this if you wish to manually use the Promote Axis to Stream or another formatting node beforehand. The new stream is by convention named the same as the axis (i.e., 'instance'), unless there are multiple different streams with matching axes that are not equal to each other, in which case one stream per source stream is created and is named 'streamname-axisname'. The original axis will be cleared to all-defaults for the purpose of running the model graph, but will be restored afterwards from the stream (if modified during inference) or the original axis.

    • verbose name: Promote Axis To Stream
    • default value: instance
    • port type: ComboPort
    • value type: str (can be None)
  • pass_metadata
    Graph accesses marker-stream or props metadata. If this is selected, then marker streams and stream (chunk) properties will be preserved when the input data enters the model graph. If this is deselected (the default), both any marker streams and all but a few essential chunk properties will be removed for the purposes of inference (which avoids unnecessary recompilation of the model graph, but the model will not have access to these details) and are restored afterwards.

    • verbose name: Pass Marker/props Metadata
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • data_format
    The type of statistics to output in the main data output packet (for prediction use cases). This can either be a comma-separated list of summary statistics, in which case the data output will contain a statistic axis along which the respective statistics are itemized, or it can be set to distribution, in which case the output will have a distribution axis along which the desired number of posterior samples are enumerated. The former is useful for a simple estimate-with-error-bars type of representation, which can then be used for plotting etc. If a single location statistic is used (e.g., mean or median), then the output will be largely compatible with the output of a conventional machine learning node (which tend to output a posterior mode or maximum-likelihood estimate depending on the type of model), and the inference node can be used as a drop-in replacement for such nodes. When error statistics are included (e.g., stddev, mad, or ci90), only a few successor nodes will correctly interpret or preserve this output, for example some of the plotting nodes that can display error bars; other numeric nodes will either ignore these statistics (e.g., MeasureLoss) or will act on all statistics using the same operation, which is typically not correct. Instead, if further downstream computations are to be performed with the output of the inference node, it is recommended to use the distribution output, since posterior samples will propagate through most mathematical operations correctly, i.e., the result will be a distribution over the result of the successor operation(s).

    • verbose name: Data Output Statistics
    • default value: distribution
    • port type: ComboPort
    • value type: str (can be None)
  • num_pred_samples
    Number of samples to use when computing predictive statistics (content of the data output). Note that for machine-learning use cases, i.e., generation of test-set predictions, not many samples are typically needed, especially when the predictive distribution is reduced via summary statistics. A sensible choice may be a power of two, e.g., between 32 and 256, although this will depend on the dimensionality of the output space.

    • verbose name: Num Prediction Samples
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • data_vars
    Optionally a listing of variables for which to generate predictions. By default this will be all variables that are not included in the posterior distribution (what is returned by the dist and samples outputs), which in practice amounts to all observable variables in the model. In general, the node will name the streams in the packet based on the variable (e.g., 'y') and if the respective Random Draw node was configured via a like= input to output a Packet, then this stream name will be concatenated to the variable name (e.g., 'y_eeg'). To take control of the output stream names, one may also specify this as a dictionary, where the keys are the variable names whose data to emit and the values are the desired output stream names (e.g., {'y': 'eeg'}).

    • verbose name: Data Output Variables
    • default value: None
    • port type: Port
    • value type: object (can be None)
  • exclude_from_posterior
    Optionally a listing of latent variable names or patterns to exclude from the posterior. It is recommended to use patterns that end in a * to also cover cases where additional related variables, such as _decentered, are present in the model. Among others, this can be used with latent variables that take on a different shape at prediction time to prevent shape mismatches. Such variables will then be distributed as per their governing distribution parameters (or their posterior distributions).

    • verbose name: Exclude From Posterior
    • default value: None
    • port type: ListPort
    • value type: list (can be None)
  • vectorized_prediction
    The way in which samples are handled during prediction from new data and generation of predictive statistics. Note that both modes are compiled and will run efficiently, but the vectorized mode can be considerably faster with small (low-dimensional / few samples) models. For high-dimensional models, the memory requirements of the vectorized mode can be very large and the mode is not necessarily faster, so the serial mode is recommended in such cases.

    • verbose name: Vectorized Prediction
    • default value: serial
    • port type: EnumPort
    • value type: str (can be None)
  • canonicalize_output_axes
    Whether to canonicalize the data output axes of the node to match the expected output axes of other machine-learning nodes. This only applies to the data output (predictions) and only if the model outputs data with an instance axis. This is mainly useful if you use the Inference node in an ML workflow (MeasureLoss, Crossvalidation, Parameter Optimization) and ensures that the output has a feature axis that properly encodes what the outputs represent. If the observable variables in your model have highly custom axes (other than one instance and optionally a feature axis), you may need to do some extra reformatting between the Inference node and any downstream ML workflow node.

    • verbose name: Canonicalize Output Axes
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • seed
    Seed for any pseudo-random choices during training. This can be either a splittable seed as generated by Create Random Seed or a plain integer seed. If left unspecified, this will resolve to either the current 0-based "task number" if used in a context where such a thing is defined (for example cross-validation or a parallel for loop), or 12345 as a generic default.

    • verbose name: Random Seed
    • default value: None
    • port type: Port
    • value type: AnyNumeric (can be None)
  • update_on
    Update the model on the specified data. This setting controls how the node behaves when it is repeatedly invoked with data and can usually be left at its default. For use cases in which the node is invoked only a single time in a pipeline, all settings are equivalent. Scenarios where a difference arises include 1) real-time (streaming, aka "online") processing, where initially a non-streaming ("offline") dataset is passed in or a previously trained model is loaded in, and subsequently streaming data is passed through the node, 2) machine-learning style offline analysis, where the node is first given an offline dataset to train on, and then subsequently given a (still offline) test dataset to evaluate the trained model on, or 3) when the model is adapted over the course of multiple successive invocations with potentially different datasets. The settings are then as follows: "initial offline" will adapt the model only on the first non-streaming data that it receives (note that whether data is considered streaming or not is a matter of whether its is_streaming flag is set, rather than whether it arrives in actual real time). The "successive offline" mode will keep updating the model on any data marked non-streaming (this will use the model's prior state if none is wired in, or the wired-in state, which therefore must come from a previous invocation of the model). The "offline and streaming" mode will train on any data, regardless of whether it is offline or streaming (e.g., for adaptive real-time training). Note that not all inference nodes will support the offline and streaming mode -- specifically, MCMC Inference does not. The parameter can also be changed at runtime, to switch from one mode to another.

    • verbose name: Update On
    • default value: initial offline
    • port type: EnumPort
    • value type: str (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model becomes incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive printed output.

    • verbose name: Verbosity Level
    • default value: 1
    • port type: EnumPort
    • value type: str (can be None)
  • axis_pairing
    Axis pairing override to apply within the context of the model. This will cause all nodes in the model graph that have an axis_pairing option set to 'default' to use this setting instead. It is recommended to leave this at 'matched'; if set to 'positional', several options in model nodes need to be chosen carefully in correspondence with the axis positions in the data, including the number of event-space dimensions in At Subscripts nodes, and the dimension index (relative to the event-space dimension) for any With Stacked Variables nodes. One also needs to be careful with whether distributions have a multivariate event space (e.g., Multivariate Normal) or not (most other distributions).

    • verbose name: Axis Pairing In Model
    • default value: matched
    • port type: EnumPort
    • value type: str (can be None)
  • check_diagnostics
    Diagnostics that should be automatically checked to ensure convergence; note the diagnostics are also available as outputs on the node for analysis by your own pipeline. General guidelines are: a Gelman-Rubin statistic (R) around 1.00 (ideally below 1.01) indicates good convergence; less than 1.05 is acceptable, but careful inspection (e.g., history plots) is recommended; and greater than 1.1 indicates poor convergence or "mixing" of chains (steady-state exploration of the posterior distribution). For the split Gelman-Rubin statistic the criteria are similar, but additionally, if it is significantly greater than 1 while R is close to 1, this can indicate a lack of stationarity. An effective sample size (ESS) of 1000 or more is excellent, greater than 100 is generally considered acceptable, and an ESS of less than 100 suggests high autocorrelation, slow mixing, and potential issues with the efficiency of the sampler. The relative ESS (rESS) is mostly a measure of your sampler's efficiency, where a value greater than 0.5 is excellent, greater than 0.1 is reasonable, and less than 0.1 indicates inefficient exploration of the posterior and high autocorrelation. These diagnostics can also be used to hand-tune the trajectory length when using HMC-type samplers like HMC or MixedHMC; if R is greater than 1.05 or ESS is less than 100, consider increasing the trajectory length. A code sketch of checking these thresholds is given after this entry.

    • verbose name: Check Diagnostics
    • default value: ['gelman-rubin', 'eff. sample size', 'relative ESS']
    • port type: SubsetPort
    • value type: list (can be None)
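
    As a concrete illustration of the guidelines above, the following is a minimal Python sketch of checking such thresholds on diagnostic values retrieved from the node's outputs (the variable names and the plain-array layout are illustrative only; the actual outputs are packets as described in the node's output ports):

        import numpy as np

        # hypothetical per-variable diagnostics, one entry per latent variable
        rhat = np.array([1.003, 1.012, 1.047])   # Gelman-Rubin R
        ess = np.array([1450.0, 820.0, 95.0])    # effective sample size

        for name, r, n in zip(["mu", "sigma", "w"], rhat, ess):
            ok = (r < 1.05) and (n > 100)
            print(f"{name}: R={r:.3f}, ESS={n:.0f} -> "
                  f"{'acceptable' if ok else 'inspect chains'}")
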
  • differentiation_mode
    The differentiation mode to use. Forward is supported by all constructs that may occur in a model. Reverse can be more efficient depending on the ratio of data points to parameters, but is not supported for example when Fold Loops are used in the model (e.g., for time-series models), and will throw an error in such cases.

    • verbose name: Differentiation Mode
    • default value: reverse
    • port type: EnumPort
    • value type: str (can be None)
  • history_fields
    Additional diagnostic history to collect and output. Note that not every sampler type supports every field. All samplers (NUTS, HMC, Barker) traverse a potential energy function (the negation of the log posterior density), and do so in an unconstrained transformation of the model's latent-variable space. The associated state variables are z (the current coordinate in unconstrained space), potential_energy (the potential value at z, where lower means more probable regions of the posterior), z_grad (the gradient of the potential energy with respect to z), r (the current momentum vector, for HMC/NUTS), energy (total of potential and kinetic energy, only for HMC/NUTS; expected to be constant except for discontinuities whenever the sampler chooses a new trajectory), the trajectory length used (HMC only), the acceptance probability of the current proposal, and the mean acceptance probability from the start of warmup to the present (higher is better). For DiscreteHMC and MixedHMC, the fields (except z and accept_prob, for MixedHMC) pertain to the underlying HMC sampler's state. The diverging flag indicates whether the current trajectory is diverging.

    The results are generally available via the output port called history in packet form, and have a leading unnamed axis of length num_chains with label 'chains' to index the chain, a time axis that indexes the (wall-clock) timeline of the history trace, and any other axes relevant for the variable in question, if it is multivariate. Note also that the k'th sample in the returned samples corresponds to the k'th history entry (along its time axis) and specifically the k'th z value, where z lives in unconstrained space while the sample lives in the original parameter space of the model.

    If you are using plain HMC or MixedHMC, these state traces can be used to tune the trajectory length: if you observe chains that seem to get "stuck" in one regime of the parameter space for extended periods before moving to other regimes and back, you should increase the trajectory length. If your chains are in a steady-state regime (mixing well), you could potentially reduce the trajectory length and check whether the sampler is still mixing well with the lower setting, which will be more computationally efficient. A sketch of plotting such traces is given after this entry.

    • verbose name: Return History Of
    • default value: []
    • port type: SubsetPort
    • value type: list (can be None)
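
    The following is a minimal sketch of the kind of trace inspection described above, assuming the potential_energy history has been extracted into a plain (num_chains x num_draws) array (here filled with stand-in data; the actual history output is a packet with the layout described above):

        import numpy as np
        import matplotlib.pyplot as plt

        # stand-in for the extracted potential_energy history per chain
        hist = 10.0 + 0.1 * np.random.randn(4, 500).cumsum(axis=1)

        for c in range(hist.shape[0]):
            plt.plot(hist[c], label=f"chain {c}")  # one trace per chain
        plt.xlabel("draw"); plt.ylabel("potential energy"); plt.legend()
        plt.show()  # chains lingering in separate regimes suggest a longer trajectory
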

MeanFieldApprox

An approximation of the posterior that assumes that all variables are independent of each other and can each be approximated by a univariate Gaussian.

The analytic variant is very fast to compute due to its use of closed-form expressions for the KL divergence, and is historically the most common form of variational inference, predating the advent of stochastic variational inference. The basic variant is mathematically equivalent but does not use the closed-form expressions, and is therefore simpler and potentially more robust to numerical issues or model misspecification. However, both approximations have in common that they can be seriously biased if the posterior has strong correlations between variables, follows a skewed or otherwise non-Gaussian distribution, or has multiple modes. For this reason this method should only be used if the posterior has been verified to be well explained by these assumptions (using a sample-based method such as MCMC); otherwise the user should consider upgrading to a better approximation such as the Multivariate Normal or BNAF approximations.
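
In symbols, the mean-field family approximates the joint posterior over latent variables z_1, ..., z_d by a fully factorized Gaussian (standard notation, not specific to this node):

    q(z) = \prod_{i=1}^{d} \mathcal{N}(z_i \mid \mu_i, \sigma_i^2)

where the variational parameters (\mu_i, \sigma_i) are fit by minimizing the KL divergence from q to the true posterior. The independence assumption is exactly what makes the method fast, and also what makes it blind to posterior correlations.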

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • initial_scale
    The initial standard deviation for the variational distribution.

    • verbose name: Initial Scale
    • default value: 0.1
    • port type: FloatPort
    • value type: float (can be None)
  • variant
    The variant to use. The analytic variant is the classic textbook mean-field variational inference, which uses closed-form expressions for the KL divergence; this variant is therefore more efficient and produces somewhat better site names than the 'basic' variant. The basic variant is the simplest variational approximation; since it is mathematically equivalent but involves fewer specialized assumptions in its implementation, it is a good starting point for testing or debugging of inference issues.

    • verbose name: Variant
    • default value: analytic
    • port type: EnumPort
    • value type: str (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

MixedHMCSampler

The Mixed Hamiltonian Monte-Carlo sampler for use with MCMC inference.

This sampler is suitable for models with both continuous and discrete variables where there are potentially many unique discrete states, and will sample stochastically from those variables. The sampler uses an underlying HMC sampler whose main parameter, the trajectory length, will likely need to be tuned by the user; this may require some experimentation. For this reason it is strongly recommended to also compare with the DiscreteHMC sampler, which is compatible with the relatively tuning-free underlying NUTS sampler (the default), but which may in contrast explore the discrete variables less efficiently.

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • trajectory_length
    The length of the particle trajectory to simulate. This value may need to be chosen carefully to avoid either too short (non-converging) or too long (wasteful) trajectories. The default is 2*pi (a full circle). See tips in the MCMC Inference node for how to tune this parameter using a combination of diagnostics and optionally trace plots of the sampler state history.

    • verbose name: Trajectory Length
    • default value: 6.283185
    • port type: FloatPort
    • value type: float (can be None)
  • mass_matrix_shape
    The shape of the inverse mass matrix. This determines the efficiency with which the sampler can explore the posterior parameter space in case one or more variables may have highly correlated posterior distributions. The 'uncorrelated' form uses a simplifying diagonal approximation that is computationally cheap per update but will not handle strong correlations very well so may require more updates, while 'all-to-all' uses a full-rank ("dense") mass matrix. The latter can in principle handle correlated posterior variables more efficiently and require fewer updates, but at increased computational cost per update, especially if the parameter space is high dimensional.

    An in-between is the blockwise form: this allows one to express that one or more individual variables in the model that are multivariate may each have a correlated posterior distribution, without also assuming that the variables are necessarily correlated with each other. This is written as in blockwise(myvar1,myvar2) where myvar1 and myvar2 are the names of the random variables in question; all other variables will have diagonal entries in the (overall block-diagonal) mass matrix. The final form is one in which one may denote sets of variables that could have strong mutual correlations (in addition to their own internal correlations), which is written as in blockwise(myvar1/myvar2,myvar3) where myvar1 and myvar2 are assumed to be correlated with each other but not with myvar3, but myvar3 may have its own correlation structure. Any unlisted variables again receive diagonal entries in the mass matrix.

    • verbose name: Mass Matrix Shape
    • default value: uncorrelated
    • port type: ComboPort
    • value type: str (can be None)
  • num_discrete_updates
    The number of discrete updates to perform. The default is the number of discrete variables.

    • verbose name: Num Discrete Updates
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • modified
    Use the modified Gibbs sampler (also known as the Metropolised Gibbs sampler).

    • verbose name: Use Modified Gibbs Sampler
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • random_walk
    If enabled, samples are drawn uniformly from the support of the discrete variables. Otherwise the draw is conditioned on the other variables, which tends to be more efficient if the discrete variables are correlated with the continuous variables.

    • verbose name: Random Walk
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • desired_accept_prob
    The desired acceptance probability for step-size adaptation. This will adjust the step size such that on average a step is accepted (within the posterior distribution) with this probability. Increasing this value will lead to a smaller step size, thus the sampling will be slower but more robust.

    • verbose name: Desired Accept Prob
    • default value: 0.8
    • port type: FloatPort
    • value type: float (can be None)
  • num_steps
    Optionally a fixed number of steps to take.

    • verbose name: Num Steps
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • step_size
    The initial step size used by the underlying HMC sampler. This normally does not need to be touched unless step size adaptation is disabled, or the adaptation diverges immediately even when using conservative choices for the init strategy in the inference node and after trying the step size heuristic option in the sampler. The step size is often tuned based on the acceptance rate of the sampler, where 0.8 is usually a good target (if the observed acceptance rate is above 0.8, the step size can be increased, and if it is below 0.8, the step size should be decreased).

    • verbose name: Step Size
    • default value: 1.0
    • port type: FloatPort
    • value type: float (can be None)
  • adapt_step_size
    Whether to adapt the step size during warmup. This uses the dual-averaging scheme.

    • verbose name: Adapt Step Size
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • adapt_mass_matrix
    Whether to adapt the mass matrix during warmup. This uses the Welford scheme.

    • verbose name: Adapt Mass Matrix
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • step_size_heuristic
    Whether to use a heuristic to adjust the step size at the beginning of each adaptation. This uses the doubling/halving strategy proposed in "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo" (2014). This heuristic can speed up initial convergence but introduces its own computational cost. Another use case is that it can help with instant divergence due to a bad choice of initial step size, if the model has steep gradients or other pathological features.

    • verbose name: Pretune Step Size
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • regularize_mass_matrix
    Whether to apply regularization to the mass matrix. This is generally recommended, particularly for higher-dimensional posterior distributions, since otherwise sampling may become unstable.

    • verbose name: Regularize Mass Matrix
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

MultivariateNormalApprox

A multivariate normal variational approximation.

This approximation is the simplest one that can capture correlations between variables in the posterior distribution, but it is not as robust as the simple mean-field approximation (which uses a diagonal normal distribution) and is more computationally costly. Particularly in high dimensions, this family of distributions may be challenging to fit, and you may need to experiment with optimizers to avoid issues. For higher-dimensional data, it is recommended to consider using the low-rank approximation, which can capture the dominant correlation structure in the data; see the max rank parameter for this.

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • initial_scale
    The initial standard deviation for the variational distribution.

    • verbose name: Initial Scale
    • default value: 0.1
    • port type: FloatPort
    • value type: float (can be None)
  • max_rank
    Optionally the maximum rank of the distribution, in which case a low-rank variant will be used. If this is not set, the full-rank variant will be used. Good values may be 5-10, or as much as 20 for very high-dimensional posterior distributions. One may also consider a rank on the order of log(N), where N is the dimensionality of the posterior distribution (the sum of the dimensions of all latent variables); for example, N = 10000 gives log(10000) ≈ 9 (natural log). The covariance structure implied by the low-rank variant is sketched after this entry.

    • verbose name: Maximum Rank
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
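
    As background on what the low-rank variant does, the posterior covariance is modeled in a standard low-rank-plus-diagonal form (the exact internal parameterization may differ):

        \Sigma \approx D + W W^\top

    where D is a d x d diagonal matrix, W is a d x r matrix, and r is the maximum rank set here; this captures the r dominant directions of correlation at a cost that grows linearly rather than quadratically in the dimensionality d.
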
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

NUTSSampler

The No U-Turn Sampler (NUTS) for use with MCMC inference.

This is the main workhorse for MCMC inference in NeuroPype, especially for models that do not contain discrete variables, and is suitable for high-dimensional models. The main strength of the NUTS sampler compared to its predecessor HMC is that it does not require manual tuning of the "trajectory length" parameter, which for HMC can require some experimentation. Currently, for models that involve discrete latent variables, the user may have to use the DiscreteHMC sampler, which is a hybrid sampler that includes NUTS as a building block, or alternatively the MixedHMC sampler (which does not enjoy the relatively tuning-free nature of NUTS). Note that, if the model contains subsampled plates (via the With Stacked Variables node), this sampler automatically uses an energy-conserving subsampling scheme (HMCECS); see the subsampling parameter of WithStackedVariables for more details. Tip: in low-dimensional models, it can be useful to use a full-rank mass matrix, which can help with sampling efficiency.

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • mass_matrix_shape
    The shape of the inverse mass matrix. This determines the efficiency with which the sampler can explore the posterior parameter space in case one or more variables may have highly correlated posterior distributions. The 'uncorrelated' form uses a simplifying diagonal approximation that is computationally cheap per update but will not handle strong correlations very well so may require more updates, while 'all-to-all' uses a full-rank ("dense") mass matrix. The latter can in principle handle correlated posterior variables more efficiently and require fewer updates, but at increased computational cost per update, especially if the parameter space is high dimensional.

    An in-between is the blockwise form: this allows one to express that one or more individual variables in the model that are multivariate may each have a correlated posterior distribution, without also assuming that the variables are necessarily correlated with each other. This is written as in blockwise(myvar1,myvar2) where myvar1 and myvar2 are the names of the random variables in question; all other variables will have diagonal entries in the (overall block-diagonal) mass matrix. The final form is one in which one may denote sets of variables that could have strong mutual correlations (in addition to their own internal correlations), which is written as in blockwise(myvar1/myvar2,myvar3) where myvar1 and myvar2 are assumed to be correlated with each other but not with myvar3, but myvar3 may have its own correlation structure. Any unlisted variables again receive diagonal entries in the mass matrix.

    • verbose name: Mass Matrix Shape
    • default value: uncorrelated
    • port type: ComboPort
    • value type: str (can be None)
  • max_tree_depth
    The maximum depth of the doubling scheme used in this sampler. Models with especially complex parameter spaces or posterior geometry can benefit from a larger value here.

    • verbose name: Max Tree Depth
    • default value: 10
    • port type: IntPort
    • value type: int (can be None)
  • max_tree_depth_postwarmup
    The maximum depth during the post-warmup phase. If not provided, this is the same as max_tree_depth.

    • verbose name: Max Tree Depth (Post-Warmup)
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • desired_accept_prob
    The desired acceptance probability for step-size adaptation. This will adjust the step size such that on average a step is accepted (within the posterior distribution) with this probability. Increasing this value will lead to a smaller step size, thus the sampling will be slower but more robust.

    • verbose name: Desired Accept Prob
    • default value: 0.8
    • port type: FloatPort
    • value type: float (can be None)
  • step_size
    The initial step size used by the underlying Verlet integrator. This normally does not need to be touched unless step size adaptation is disabled, or the adaptation diverges immediately even when using conservative choices for the init strategy in the inference node and after trying the step size heuristic option in the sampler. The step size is often tuned based on the acceptance rate of the sampler, where 0.8 is usually a good target (if the observed acceptance rate is above 0.8, the step size can be increased, and if it is below 0.8, the step size should be decreased).

    • verbose name: Step Size
    • default value: 1.0
    • port type: FloatPort
    • value type: float (can be None)
  • adapt_step_size
    Whether to adapt the step size during warmup. This uses the dual-averaging scheme.

    • verbose name: Adapt Step Size
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • adapt_mass_matrix
    Whether to adapt the mass matrix during warmup. This uses the Welford scheme.

    • verbose name: Adapt Mass Matrix
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • step_size_heuristic
    Whether to use a heuristic to adjust the step size at the beginning of each adaptation. This uses the doubling/halving strategy proposed in "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo" (2014). This heuristic can speed up initial convergence but introduces its own computational cost. Another use case is that it can help with instant divergence due to a bad choice of initial step size, if the model has steep gradients or other pathological features.

    • verbose name: Pretune Step Size
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • plate_subsampling
    Use energy-conserving subsampling of plates (HMCECS). It is recommended to leave this set to auto, in which case it will automatically apply if the model contains subsampled plates, and otherwise not. One may set it to off to ensure that a plain (non-subsampling) sampler is used, where that is of interest. When used, the subsampling follows "Hamiltonian Monte Carlo with energy conserving subsampling" (2019), Algorithm 1.

    • verbose name: Plate Subsampling
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)
  • plate_subblocks
    Optionally the number of sub-blocks into which to partition any subsampled plates. This can help increase the acceptance rate if some of the sub-blocks are more difficult to sample from than others. If enabled, this follows the approach of "The Block Pseudo-Marginal Sampler" (2017).

    • verbose name: Plate Subblocks
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • subsample_control_variate
    The control variate to use when performing plate subsampling. Bardenet is according to "Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach" (2014) and Betancourt is per "The Fundamental Incompatibility of Scalable Hamiltonian Monte Carlo and Naive Data Subsampling" (2015). The latter is the suggested default.

    • verbose name: Subsample Control Variate
    • default value: betancourt
    • port type: EnumPort
    • value type: str (can be None)
  • regularize_mass_matrix
    Whether to apply regularization to the mass matrix. This is generally recommended, particularly for higher-dimensional posterior distributions, since otherwise sampling may become unstable.

    • verbose name: Regularize Mass Matrix
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

PosteriorModeApprox

An approximation of the posterior mode location (MAP), optionally along with a rough estimate of the variance (Laplace).

While these methods are not usually considered variational approximations, both variants are very efficient and robust (more so when using regularization), and can be used during model bring-up. The maximum a posteriori (MAP) estimate only returns the location of the posterior maximum (mode) without any uncertainty estimates. This has some connections to classic regularized maximum-likelihood estimators (when the regularizer can be interpreted as the negative log density of a specific prior distribution), which can be used to check equivalence when porting models between Bayesian and non-Bayesian (e.g., convex or deep learning) forms. The Laplace approximation has a long history in Bayesian analysis and can be quite useful in practice, especially for nearly Gaussian posteriors, cases where the tail of the posterior distribution is less important, or cases where the posterior mode is of primary interest. This variant returns the same mode location as MAP, but also estimates the curvature of the (log-)posterior distribution at the mode (precision matrix), which approximates the variance of the posterior distribution if it is (near-)Gaussian. However, since this quantity is derived from only the mode, if the true distribution is non-Gaussian, then this method can easily under- or over-estimate the true variance of the posterior distribution. In such cases, a mean-field or multivariate normal approximation will be more faithful.
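
In symbols, the Laplace variant approximates the posterior by a Gaussian centered at the mode (standard notation, not specific to this node):

    \hat{\theta} = \arg\max_\theta \log p(\theta \mid \text{data}), \qquad p(\theta \mid \text{data}) \approx \mathcal{N}\left(\theta \mid \hat{\theta},\ H^{-1}\right), \qquad H = -\nabla^2_\theta \log p(\theta \mid \text{data}) \big|_{\theta = \hat{\theta}}

where H is the negative Hessian of the log posterior at the mode, i.e., the curvature estimate mentioned above.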

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • variant
    The variant to use. The MAP variant yields the basic maximum a posteriori estimate, i.e., only the location of the posterior mode without any uncertainty estimates ("error bars"). This type of approximation is in many cases equivalent to what is obtained with regularized maximum-likelihood estimation (if the regularizer corresponds to a specific prior distribution). The Laplace variant finds the same posterior mode, and in addition estimates the curvature of the posterior distribution at the mode (precision matrix), which quantifies to some extent the uncertainty in the estimate, although potentially less accurately than a mean-field or multivariate normal approximation.

    • verbose name: Variant
    • default value: Laplace
    • port type: EnumPort
    • value type: str (can be None)
  • hessian_approx
    Type of Hessian matrix approximation to use when computing the Laplace approximation. Both are mathematically equivalent, but the 'hessian' mode uses a combination of forward- and reverse-mode differentiation, while the 'reverse-mode' option uses only reverse-mode differentiation. Depending on the ratio of data points to parameters, the two can have different computational costs; also, forward-mode differentiation is not universally supported for all models (namely those that may contain control flow such as If/Else or loops).

    • verbose name: Estimator (Laplace Only)
    • default value: hessian
    • port type: EnumPort
    • value type: str (can be None)
  • regularization
    Optional regularization parameter for the Hessian matrix. Only used with the Laplace variant. This can be set to a small number such as 1e-3 to robustify the computation.

    • verbose name: Regularization
    • default value: None
    • port type: FloatPort
    • value type: float (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

RandomDraw

Draw a random sample from a distribution, resulting in a random variable.

This node is used to realize a random variable in a statistical model. The role of this node is that a statistical model prescribes a generative process (a randomized "simulation") that characterizes how some data may have been generated, and this is modeled as a series of random draws from specific distributions, which can be chained, hierarchical, or depend on unmodeled observed data (sometimes called "independent" variables). So-called "dependent" variables in the data (data that is explained by the model) are wired as a side input into the 'observation' port of the respective Random Draw node that is meant to model them. The resulting overall model characterizes a "prior" distribution over both the latent random variables and the (observable) dependent variables, and is optionally parameterized by the independent variables (these are supplied from Placeholder nodes inside the model). An Inference node, when given a model and invoked with some data (via its own 'data' port), passes any independent variables through the model, and propagates information from the dependent variables back through the model to infer the most likely "posterior" distribution over latent variables given the observed data. Once a posterior has been inferred, the model can also be used to generate posterior predictions for any of the dependent variables (posterior predictive distribution) given new values for independent variables; the inference node will do this automatically whenever it is given new data to predict on.

Random variables are generally named (see the "variable name" setting of this node), and the random draw itself is done according to a distribution, which can either be specified in the node (via the "distribution" setting), or one of the Distribution nodes (in the distributions package) can be wired into the "dist" input port of this node. The latter allows one to create dependencies between random variables, by taking the output of one draw, wiring it into a parameter of a Distribution node, and then wiring that distribution into another Random Draw node. The node also offers limited support for reparameterization "tricks" via the reparameterize option, which by default will apply decentering in the most common cases. Besides the optional observation input, a random draw can take in additional side information: this includes an optional mask array (which can be used to mask out certain values in the observation, which are then imputed during inference) and an optional weight array; the weight behaves similarly to the 'sample weight' in a machine learning context and can be used to account for class imbalance or other applicable data weightings.

In many cases, a Random Draw will yield multivariate data, and in such cases it is important that unique axes (or custom labels, if an array has multiple axes of the same type) are used for different dimensions as needed to avoid ambiguities. Axes can occur in either the like input (which can be used to match the axes of some other data) or by using a distribution that was itself constructed from parameters that had axes. One may also use the shape input, but this is less commonly done (see notes in the setting); with these provisions, axes typically flow forward from the input data through the model and can be managed like in any other NeuroPype pipeline. It is also possible to imbue random draws with additional non-singleton axes using a With Stacked Variables context (see that node for more details), which is similar to Bayesian plate notation.
By default, random draws used in conjunction with inference will use 'matched' axis pairing, which guarantees that all operands from any source will have their axes mutually lined up by type and optionally by axis label. In this context, the output axis order will be dictated by the like input (if given), otherwise by applicable plates (With Stacked Variables contexts, if present), and otherwise will follow from a combination of the distribution and the shape setting (if given). The observation, mask, and weight axes do not control the order of the output axes but will be reordered to match. When the node is used with only plain arrays or undifferentiated axes, great care must consequently be taken to line up the positions of dimensions between the various inputs of the node, the distributions, and the applicable plate contexts, which can be exceedingly difficult for large models; therefore this is not recommended. Also note that, unlike other NeuroPype nodes, each random draw yields only a single array and a distribution only specifies a single array; therefore the like input, if given as a packet, may contain no more than one stream.
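
As an illustration of the generative-process view described above, a small chain of Random Draw nodes corresponds conceptually to the following NumPyro-style Python sketch (all names are illustrative; in NeuroPype the same structure is expressed with nodes and wires rather than code):

    import numpyro
    import numpyro.distributions as dist

    def model(x, y=None):
        # latent variables: Random Draw nodes without an observation
        mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
        sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
        # dependent variable: a Random Draw whose 'observation' port receives y,
        # and whose distribution parameters depend on the latent draws above
        numpyro.sample("y", dist.Normal(mu + x, sigma), obs=y)

Here mu and sigma play the role of named latent random variables, and the hierarchical dependency (the observed variable's distribution being parameterized by other draws) is what in NeuroPype is achieved by wiring one Random Draw's output into a parameter of a Distribution node, and that distribution into another Random Draw node.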

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • sample
    The generated sample.

    • verbose name: Sample
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: OUT
  • dist
    Optionally a distribution to sample from.

    • verbose name: Dist
    • default value: None
    • port type: DataPort
    • value type: Distribution (can be None)
    • data direction: IN
  • observation
    Optional observed value for this variable. Information provided by concrete observations is propagated at inference time to deduce the most likely distribution of all variables in a given statistical model. This can be used as side information to help the inference engine infer the answer to, roughly, "what is the most likely shape of distributions if we observed this value at this site?" Must have a shape compatible with what is specified via the shape parameter, but is not limited to arrays (e.g., can be a Packet).

    • verbose name: Observation
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: IN
  • mask
    Optional mask that can be used to mask out certain values in the observation. The masked values may then be imputed (estimated) during inference. Note that, if the underlying distribution has a non-trivial event space, the mask will not include the event space axes.

    • verbose name: Mask
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: IN
  • weight
    Optional weight for the random draw. Note that, if the underlying distribution has a non-trivial event space, the weight will not include the event space axes.

    • verbose name: Weight
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: IN
  • seed
    Optional random seed to force a fixed result. Must have been created with one of the RandomSeed nodes. Unlike other Random nodes, this node is normally NOT used with a fixed seed since it will receive one from the context within which it is used (e.g., one of the Inference nodes).

    • verbose name: Seed
    • default value: None
    • port type: DataPort
    • value type: AnyArray (can be None)
    • data direction: IN
  • like
    Optional object (e.g., Packet, Block, or array) whose shape and data format to adopt when generating the sample.

    • verbose name: Like
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: IN
  • varname
    Name of the resulting random variable. The variable can be accessed under this name when working with derived distributions such as the posterior distribution for a given model. Note that, if a packet is wired into the like input, the stream name in that packet will be suffixed onto the variable name with an underscore separator (e.g., 'myvariable_eeg').

    • verbose name: Random Variable Name
    • default value: myvariable
    • port type: StringPort
    • value type: str (can be None)
  • distribution
    Optionally a distribution name from which to draw the variable. If a distribution is provided via the dist input, this must be left at the default value 'provided'. You can enter the name of any of NeuroPype's Distribution nodes here (without the "Distribution" suffix), and parameters can be specified positionally in the same order as defined in the node; remaining parameters retain their defaults (see the drop-down for examples).

    • verbose name: Distribution
    • default value: provided
    • port type: ComboPort
    • value type: str (can be None)
  • shape
    Shape of the sample to draw. Can be a list of integers or axes. Note that if the distribution has a multivariate "event space" (e.g., a multivariate normal or some discrete distributions), then the sample shape is concatenated to the left of the dimensionality of that event space; for example, a sample shape of [10] drawn from a 3-dimensional multivariate normal yields data of shape (10, 3). A more commonly used approach is instead to wire an object into the like input, whose shape (and data type, i.e., array, chunk, block, or packet) will then be matched by the generated data of the random draw.

    • verbose name: Sample Shape
    • default value: []
    • port type: ListPort
    • value type: list (can be None)
  • reparameterize
    Whether and how to reparameterize the variable. This can be used to improve the convergence of the inference algorithm, for example to avoid funnel geometries (see Neal's funnel). Several inference algorithms will work better if variables have mean zero and unit standard deviation, and this can be achieved by running the algorithm on a transformed version of the variable and automatically back-transforming the result afterwards (see the sketch after this entry). The following options are available: decenter enables location/scale decentering, which is the most common reparameterization. The minimal option will by default apply reparameterization only for a select few distributions that would otherwise lead to an error, specifically the VonMises and ProjectedNormal distributions; note this option will NOT automatically detect when to perform decentering for any other distributions. The heuristic option will select either minimal or decenter based on a heuristic that aims to determine whether decentering is applicable (currently, a variable is automatically decentered if a distribution is wired in, that distribution has both a location and a scale parameter (and optionally other shape parameters) and has unbounded support, and the location or scale are driven by other random variables); note this heuristic may be refined in subsequent NeuroPype versions. The remaining options are specific to certain distributions: transformed dist. can be used if the distribution is an instance of a transformed distribution, projected normal dist. can be used if the distribution is of projected-normal type, and circular dist. can be used if the distribution is circular (i.e., VonMises). The variationally inferred decentering is not exposed in NeuroPype 2025 and behaves like fixed decentering, but this may change in future versions.

    • verbose name: Reparameterize
    • default value: heuristic
    • port type: EnumPort
    • value type: str (can be None)
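
    As a sketch of what decentering does, a draw from a location/scale family is rewritten in the standard non-centered form

        z \sim \mathcal{N}(\mu, \sigma) \quad\Longleftrightarrow\quad z_{\text{raw}} \sim \mathcal{N}(0, 1),\ \ z = \mu + \sigma \cdot z_{\text{raw}}

    so that the inference algorithm operates on the standardized variable z_raw, which avoids the pathological coupling between location/scale parameters and the variable itself (as in Neal's funnel).
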
  • desc
    Description text for the variable. This can be used to annotate the purpose/meaning of the variable in the context of a statistical model. This is often a single sentence.

    • verbose name: Description
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • verbose_name
    Optional verbose name for the variable. Can be used for augmented human-readable output.

    • verbose name: Verbose Name
    • default value: None
    • port type: StringPort
    • value type: str (can be None)
  • axis_pairing
    How to pair axes of the inputs. In 'positional' mode, axes are paired by their position according to a right alignment, that is, the last axis of the first operand is paired with the last axis of the second operand, and so on, while any missing axes behave as if they were unnamed axes of length 1 (this is the same way plain n-dimensional arrays pair in Python/numpy; see the sketch after this entry). In 'matched' mode, axes are paired by their type and optionally label, where the axis order of the first operand is preserved in the output, optionally with additional axes that only occur in the second operand prepended on the left. The other operand then has its axes reordered to match. All axis classes are treated as distinct, except for the plain axis, which is treated as a wildcard axis that can pair with any other axis. See also the 'label_handling' property for how labels are treated in this mode. The 'default' value resolves to a value that may be overridden in special contexts (mainly the ambient Inference node) and otherwise resolves to the setting of the configuration variable default_axis_pairing, which is set to 'positional' in 2024.x. Note that axis pairing can be subtle, and it is recommended not to blindly trust that the default behavior is always what the user intended.

    • verbose name: Axis Pairing
    • default value: default
    • port type: EnumPort
    • value type: str (can be None)
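
    The 'positional' mode follows NumPy's right-aligned broadcasting rules; as a quick illustration (the axis interpretations in the comments are made up for the example):

        import numpy as np

        a = np.ones((3, 1, 5))   # e.g., 3 trials x 1 channel x 5 samples
        b = np.ones((4, 5))      # e.g., 4 channels x 5 samples
        print((a + b).shape)     # (3, 4, 5): axes pair right to left,
                                 # and length-1 or missing axes broadcast
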
  • label_pairing
    How to treat axis labels when pairing axes in 'matched' mode. In 'always' mode, labels are always considered significant, and axes with different labels are always considered distinct, which means that, if the two operands each have an axis of same type but with different labels, each operand will have a singleton axis inserted to pair with the respective axis in the other operand. In 'ignore' mode, labels are entirely ignored when pairing axes; this means that, if multiple axes of the same type occur in one or more operands, the last space axis in the first operand is paired with the last space axis in the second operand, etc. as in positional mode. In 'auto' mode, labels are only considered significant if they are necessary for distinguishing two or more axes of the same type in any of the operands, or if they occur on a plain axis.

    • verbose name: Label Pairing
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

StochasticVariationalInference

Apply Bayesian inference given data and a wired-in statistical model, using Stochastic Variational Inference (SVI).

SVI fits a distribution function of a pre-specified type (e.g., Gaussian or a non-parametric form) to the posterior using a choice of stochastic gradient descent method, and subject to a criterion like the KL divergence (which can be either analytic or empirical). These amount to the main options of the node; see below for more details. SVI can be very fast compared to MCMC, but the choice of posterior approximation will introduce a bias into the results unless it is well matched to the actual posterior (which rarely has a "simple" mathematical form, except in special model types), although some flexible approximations like BNAF can mitigate this. Therefore, Bayesian analysis frequently starts with an MCMC approach and may progress to SVI once the form of the posterior is well understood, except when working with specialized models that are designed or known to be a good fit (e.g., Kalman filters, Gaussian processes, etc.) or when inference is otherwise prohibitively costly. Typical use cases are deployed settings with tight compute/memory budgets (on-device machine learning, cloud services), large-scale inference, and certain Bayesian time-series models. Due to its high performance, SVI is also practical for doing inference over deep neural network weights and predictions, when a deep network is embedded in the model (see the Bayes Net node for this). Like the other Inference nodes, SVI "updates" a prior distribution (as formalized by a statistical model) given observed data, yielding a (joint) posterior distribution over the latent (unobserved) variables in the model. The posterior can be accessed for further analysis via the samples output port. Unlike MCMC, the node does not generate much in terms of diagnostics, although the loss history can be retrieved via the history port. Once a posterior has been inferred, the node can also be used for predictive inference of dependent variables when new data is passed in, and these predictions are available on the data output port, so the node can be used as a drop-in replacement for other machine-learning nodes in NeuroPype; the inference result can also be saved and loaded back in via the model port. The statistical model is the portion of the graph that is wired into the "graph" input port and runs under the control of the node; see the "Graph [signature]" documentation (tooltip) for more detail on how statistical models can be built using nodes such as Random Draw or With Stacked Variables. Unlike the MCMC node, SVI does not currently support discrete latent variables (e.g., mixture models), but discrete dependent variables pose no problem. Like all inference nodes, the node generates an output distribution in the form of a collection of samples whose distribution matches that of the posterior; these can be scatter-plotted, summarized via moment or quantile statistics, or propagated through downstream computations. The node does not currently generate analytic posterior (predictive) probability density functions, but the variational parameters (e.g., location and scales in the case of a Gaussian approximation) can be retrieved via the params port and used for this purpose. The main configuration option is the type of posterior approximation, which can be Gaussian (in which case the posterior correlations setting applies) or the flexible BNAF type, but analytic mean-field, Laplace, and MAP approximations can also be chosen.
Among the tuning options to improve convergence or debug issues are the type of cost function ("variational objective") to optimize and the type of optimizer to use for it (which can also be wired in as a node). One can also configure the number of "particles" (a type of batch size, though not the same as the mini-batch size in deep learning; the latter is, in Bayesian models, instead configured via the subsample option when using a With Stacked Variables (Plate) node). The optimizers are generally interchangeable, but a skilled choice can help with models that do not converge, converge slowly, or result in NaNs or infinities. Another potential remedy is overriding the init strategy with more conservative settings. However, if this seems incurable after a handful of optimizers have been tried, the root cause is very likely a mis-specified or ill-conditioned model, or possibly bad data. For general model troubleshooting tips, see also the documentation of the MCMC node. If you get an error related to JAX "tracers" or "concretization", it may be that your model has a parameter that is used for some type of control flow or structural choice (e.g., the replication count in a Plate node), and you may have to provide this as part of the frozen model arguments instead (under miscellaneous). Like other GPU-compatible ML nodes in NeuroPype, the wired-in input data is slightly reorganized for more efficient compute, and among other things, the instance axis of your data will appear as a separate stream with a feature axis (indexing the instance-axis fields such as the TargetValue) and a (blank) dummy instance axis indexing the instances. This can be further configured via options in the "Input" category. A conceptual sketch of the optimization loop that this node automates is given below.
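
The procedure that the node automates can be sketched in NumPyro-style Python as follows (illustrative only; in NeuroPype the model is the wired-in graph, the approximation is chosen via the posterior approximation setting or the approx port, and the optimizer via the optimizer setting or the optstep port):

    import jax
    import jax.numpy as jnp
    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import SVI, Trace_ELBO
    from numpyro.infer.autoguide import AutoNormal
    from numpyro import optim

    def model(x, y=None):
        # a trivial regression model with one latent weight
        w = numpyro.sample("w", dist.Normal(0.0, 1.0))
        numpyro.sample("y", dist.Normal(w * x, 0.1), obs=y)

    x = jnp.linspace(-1, 1, 50)
    y = 0.7 * x                                     # toy stand-in data
    guide = AutoNormal(model)                       # Gaussian posterior approximation
    svi = SVI(model, guide, optim.Adam(0.05), Trace_ELBO())
    result = svi.run(jax.random.PRNGKey(0), 1000, x, y)  # fit, then sample the guide
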

More Info...

Version 0.9.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    The data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: INOUT
  • samples
    Generated posterior samples in the desired format.

    • verbose name: Samples
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: OUT
  • history
    History trace of inference process.

    • verbose name: History
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: OUT
  • graph
    The graphical model to use.

    • verbose name: Graph
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • graph__signature
    Argument names accepted by the wired-in graphical model (statistical model). This is a listing of the names of input arguments of your statistical model, which is typically a single argument conventionally named "data" (although you can choose your own name) followed by a catch-all tilde symbol, as in (data,~). This data argument then receives the same data given to the inference node (via its data input), in most cases after minimal restructuring for efficient processing (as per the input options).

    In addition to or instead of this argument, the model may accept keyword arguments (generally listed after a + separator), which may match the name of a stream (such as 'eeg') in the input data packet (in which case a single-stream packet will be assigned to that argument), or the name of an axis field (e.g., 'TargetValue') of an axis that was promoted to a stream (see the corresponding option under Input, which defaults to promoting the instance axis). In such a case, the variable receives that stream's Block, but reduced to only the axis field in question. An example signature that does this is (data, +, TargetValue, ~) and a signature that binds only a stream named 'eeg' to a keyword argument instead of receiving the data as a whole is (+, eeg, TargetValue, ~). The model may also accept an argument conventionally named 'is_training' either as a second positional argument or as a keyword argument, which will receive False if the model is used in a prediction capacity, and True otherwise.

    Your model is generally a graph that begins with one or more Placeholder nodes whose slotname must match a name listed here, and which are followed by a series of nodes that collectively specify your graphical model. The final node of your network must be wired into the "graph" input of the inference node (note that in graphical UIs, the edge that goes into the "graph" input will be drawn in dotted style to indicate that this is not normal forward data flow, but that the graph runs under the control of the inference node). However, the return value of your model has no special meaning, since the inference node is generally only concerned with the named random variables occurring in the model (i.e., the varnames in Random Draw nodes).

    A simple strategy for building a graphical model is to work backwards from the data that you wish to explain or model, and the core tool at your disposal is the Random Draw node, which specifies that the data you want to model (wired into its observation input port) is the result of a random draw from some specifiable distribution (see the Distributions package). Models become hierarchical by using distribution parameters that are themselves the output of a random draw. Other useful nodes are the With Stacked Variables node and the associated At Subscripts node. A wide range of other nodes may be used (e.g., math operations, formatting) to enrich the dependency structure among variables in the model.

    • verbose name: Graph [Signature]
    • default value: (data,~)
    • port type: Port
    • value type: object (can be None)
  • approx
    Approximating variational distribution family.

    • verbose name: Approx
    • default value: None
    • port type: DataPort
    • value type: BaseNode (can be None)
    • data direction: IN
  • optstep
    Optional optimizer step node. This is one of the Step nodes (nodes ending in Step). Note that not all Step nodes implement end-to-end optimizers; some implement special gradient-processing steps (such as clipping) instead. If in doubt, review the documentation of the respective step nodes. Composite steps can be defined using either the ChainedStep or CustomStep nodes (the latter for a fully custom step graph).

    • verbose name: Optstep
    • default value: None
    • port type: DataPort
    • value type: BaseNode (can be None)
    • data direction: IN
  • params
    The estimated parameters of the model.

    • verbose name: Params
    • default value: None
    • port type: DataPort
    • value type: dict (can be None)
    • data direction: OUT
  • posterior_approx
    Type of approximating distribution family to use for the posterior. The Gaussian approximation is a diagonal, full-rank, or low-rank approximation as per the posterior correlations setting. The mean-field approximation is a very fast method using analytic KL divergence calculations that is mathematically equivalent to the diagonal Gaussian approximation, save for these implementation differences. The Laplace approximation is a rough approximation that inspects only the curvature at the mode of the posterior to deduce the overall covariance (this only holds for nearly Gaussian posteriors, and the error will be larger than with the other Gaussian approximations, which account for probability mass away from the mode). This can also be given as in Laplace(1e-3) to specify a regularization parameter for additional robustness in high-dimensional settings. The MAP approximation only estimates the location of the posterior maximum (aka mode) without any Bayesian uncertainty estimates ("error bars"). The BNAF (block neural auto-regressive flow) approximation is the most flexible in this listing, and can fit non-Gaussian, skewed, and even multimodal distributions (within limits). The first parameter is the number of flows to use, and the remaining are the hidden units per layer, for an arbitrary number of layers (see also the BNAF Approximation node). The 'provided' option allows you to wire in a custom approximation family (any of the Approx nodes) using the approx port; these may offer more control over the parameters, and additional approximations beyond those listed here may be available.

    • verbose name: Posterior Approximation
    • default value: Gaussian
    • port type: ComboPort
    • value type: str (can be None)
  • objective
    Type of variational objective function to optimize. Basic ELBO ("evidence lower bound") is the simplest SVI cost function and amounts to a stochastic approximation of the KL divergence; this will work with the widest range of models. The Rao-Blackwellized formulation has lower variance but is more complex and has somewhat more limited applicability (e.g., it does not currently support networks including the Bayes Net node). The Mean-Field (Analytic) objective uses analytic (as opposed to stochastic) formulas for the KL divergence, and is consequently very efficient, but is mainly meant for use when paired with the Mean-Field (Analytic) posterior approximation. The auto setting uses Mean-Field (Analytic) if the posterior approximation is set to that, otherwise Basic ELBO if the model contains a Bayes Net node, and otherwise Rao-Blackwellized. In case of inference issues it can be useful to override this and start with Basic ELBO as a simple baseline, and only move to Rao-Blackwellized once any issues have been resolved. The identity relating these objectives to the KL divergence is given after this entry.

    • verbose name: Variational Objective
    • default value: auto
    • port type: ComboPort
    • value type: str (can be None)
  • initial_stddev
    Initial standard deviation of the posterior approximation. This should be set to a small value (e.g., 0.05 or 0.1), and only applies to the normal approximations (Gaussian and Mean-Field).

    • verbose name: Initial Stddev
    • default value: 0.1
    • port type: FloatPort
    • value type: float (can be None)
  • optimizer
    Optimizer to use for fitting the variational approximation to the posterior distribution (as per the chosen objective function). Adam or similar algorithms are generally great choices. The default learning rate for all optimizers is set relatively high here; therefore, if the optimization diverges, you may need to lower the learning rate (first parameter, e.g., by 10x). Note that while one can remedy some issues with tweaks on the optimizer side, the problem may well lie in the model (e.g., not reparameterized, badly configured distributions, model not matching the data, non-identifiable parameters, funnel geometries, etc.), the variational approximation (e.g., too flexible), or in the data (e.g., a common problem is failing to standardize the explanatory variables to reasonable scales); for these reasons it can be a good idea to first test the model with an MCMC approach and to work through tips on addressing Bayesian model mis-specification.

    If Adam diverges, one can also try one of AdaBelief, AMSGrad, Fromage, or Yogi. For models with a high level of gradient variance, one might also pass in the beta1 parameter to Adam-like optimizers (see their documentation; usually the second parameter) and set it to 0.95 instead of 0.9, although this should only rarely be necessary. However, particularly when using some of the more advanced approximating families like BNAF, or models that are extremely complex or nonlinear (like deep Bayesian neural networks), it may be necessary to experiment somewhat with different optimizers. For example, if you are running out of memory with very large models, you may test AdaFactor and SM3, and if you are using extremely large batch sizes (numbers of particles in the 100s or more, for example for models that combine variational inference with deep learning), you could try out Lamb or Lars. For information on the available optimizers, see the individual Step nodes in NeuroPype's deep learning category. Optimizers can be configured with parameters here by appending them in parentheses after the name, separated by commas, in the order of appearance in the respective step node. For example, 'Adam(0.001, 0.9, 0.999)' specifies the Adam optimizer with a learning rate of 0.001, beta1 of 0.9, and beta2 of 0.999 (a few more example strings are sketched below, after the port details).

    To achieve the highest possible accuracy in the final solution, the learning rate may be decayed over time following a schedule, which can improve robustness and convergence. To specify such a learning rate schedule one needs to use a custom optimizer. This is done by choosing the "provided" optimizer setting and then wiring one of the Step nodes corresponding to an optimizer (e.g., AdamStep) into the "optstep" port (using the step node's "this" output port). Finally, one wires one of the Schedule nodes (e.g., "Linear Warmup Exponential Decay Schedule" (WarmupExponentialDecaySchedule)) into the step node's learning rate schedule port. This also allows one to override parameters of the optimizer in one's pipeline graph rather than in textual form. When doing so, note that not all Step nodes are end-to-end optimizers - if in doubt, go by the optimizers in this listing or read the optimizer's documentation text (the icon will also have the text "opt step" in it). It is also possible to build a more customized optimizer using either the ChainedStep node or the CustomStep node, which give even finer-grained control (see the documentation of these two nodes for more details on their use).

    • verbose name: Optimizer
    • default value: Adam(0.01)
    • port type: ComboPort
    • value type: str (can be None)
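
    For concreteness, a few option strings consistent with the syntax described above (the parameter values are illustrative, not recommendations):

        # Illustrative optimizer option strings:
        optimizer_examples = [
            "Adam(0.01)",               # the default: Adam with learning rate 0.01
            "Adam(0.001)",              # 10x lower learning rate if optimization diverges
            "Adam(0.001, 0.95)",        # also overriding beta1 (see the AdamStep node docs)
            "Adam(0.001, 0.9, 0.999)",  # learning rate, beta1, beta2 (from the example above)
        ]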
  • num_particles
    The number of samples (aka particles) to use in each stochastic update. This is a type of batch size in stochastic gradient descent; larger values reduce the variance of the gradient estimate and can mitigate any divergence issues, but also increase the computational cost (although on the GPU, low values may be essentially free). Note that a separate type of batch size relates to data subsampling, which can be enabled by using a With Stacked Variables node in the model and setting its subsampling option to the desired batch size. For deep learning inside the model, it can be a good idea to experiment with the number of particles (e.g., going up to 10), while for more conventional models, 1 particle is often sufficient.

    • verbose name: Number Of Particles
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • max_iter
    Maximum number of iterations to run. 1000 is a good value; when performing inference in deployed settings, lower values can be explored on a case-by-case basis depending on the demands of the data and model, but their fidelity should be validated.

    • verbose name: Max Iterations
    • default value: 1000
    • port type: IntPort
    • value type: int (can be None)
  • use_gradient_clipping
    Whether to use an update rule that clips large gradients. This can improve robustness of the estimator.

    • verbose name: Use Gradient Clipping
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • use_stable_update
    Whether to use an update rule that backtracks from invalid values. This can fix some spurious divergence and may be a good choice in production settings, while during research it can mask issues that are best addressed using other means.

    • verbose name: Backtrack From Invalid Values
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • frozen_args
    Optional dictionary of model arguments that are "frozen", meaning that the model will be recompiled if they change. This can be used for certain arguments that cannot be handled dynamically.

    • verbose name: Frozen Model Args
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • auto_prefix
    Name prefix used for variational parameters (parameters governing the variational approximations, such as locations and scales of Gaussians). If using custom approximation nodes, this is not necessarily respected.

    • verbose name: Auto Prefix
    • default value: vari_
    • port type: StringPort
    • value type: str (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • posterior_correlations
    Assumed covariance structure among posterior variables. This setting determines the fidelity of the variational approximation (if Gaussian is selected) and affects the tractability and efficiency of posterior inference: 'uncorrelated' assumes that all posterior variables are uncorrelated (i.e., diagonal posterior covariance); 'lowrank(N)' assumes that correlations are dominated and explained well by the largest N factors (plus a diagonal term); and 'all-to-all' does not restrict the covariance structure in any way but can have tractability/performance issues on high-dimensional data. The accepted forms are sketched below, after the port details.

    • verbose name: Posterior Correlations
    • default value: uncorrelated
    • port type: ComboPort
    • value type: str (can be None)
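
    The accepted forms, with an arbitrary example rank for the low-rank case:

        # Illustrative posterior-correlations settings (the rank of 10 is a placeholder):
        correlation_examples = [
            "uncorrelated",  # diagonal posterior covariance
            "lowrank(10)",   # 10 factors plus a diagonal term
            "all-to-all",    # unrestricted covariance structure
        ]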
  • init_strategy
    Override the initialization strategy for distribution parameters, optionally for individual (sets of) variables. Some choices accept a parameter that can also be omitted (the default is shown in the drop-down examples). Uniform(r) initializes each parameter to a uniform random value within -r to +r in the "unconstrained" domain, which is very frequently a good default, although at times the system may have to make multiple attempts to find a feasible starting point. Some models require a more conservative choice: for example, median(n) initializes to the median of n samples drawn from the prior distribution, mean uses the expected value, sample draws a random sample, feasible initializes to a fixed, trivially feasible point (e.g., 0), and value(x) initializes to a given fixed value x. This can also be specified on a per-variable basis (though this is typically not necessary for well-specified models), using a dictionary syntax where the variable name (or a wildcard pattern) is the key and the value is the initialization form; a sketch is given below, after the port details. See also the drop-down options for concrete examples.

    • verbose name: Init Strategy
    • default value: uniform
    • port type: ComboPort
    • value type: str (can be None)
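
    A sketch of the per-variable dictionary form described above; the variable names and patterns ('w*', 'sigma', 'offset') are hypothetical and stand in for variables in one's own model:

        # Illustrative per-variable initialization strategies:
        init_strategy = {
            "w*": "uniform(0.1)",    # variables matching w*: uniform in [-0.1, 0.1] (unconstrained domain)
            "sigma": "median(15)",   # median of 15 samples drawn from the prior
            "offset": "value(0.0)",  # a fixed initial value
        }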
  • promote_axis
    If given, promote axes of the specified type to their own stream using the Promote Axis to Stream node. This adds a new stream to the data packet that holds the data of the selected axis; it has a feature axis that indexes the fields of the axis (e.g., 'TargetValue', 'times', etc.) and a (blank) dummy copy of the axis as the second dimension, and the axis "payload" (i.e., numeric content) is then accessible in the stream's two-dimensional data array like any other data modality. This representation is generally used in contexts that employ high-performance (multicore or GPU-accelerated) computation, including the Deep Model, Convex Model, and Inference nodes. This can be disabled by leaving the option blank (but note that doing so can incur heavy performance overhead each time the axis content changes from one use of the node to another, e.g., in a cross-validation). You can also disable this if you wish to manually apply the Promote Axis to Stream or another formatting node beforehand. The new stream is by convention named the same as the axis (e.g., 'instance'), unless there are multiple different streams with matching axes that are not equal to each other, in which case one stream per source stream is created and is named 'streamname-axisname'. The original axis will be cleared to all-defaults for the purpose of running the model graph, but will be restored afterwards from the stream (if modified during inference) or the original axis.

    • verbose name: Promote Axis To Stream
    • default value: instance
    • port type: ComboPort
    • value type: str (can be None)
  • pass_metadata
    Whether the model graph accesses marker-stream or props metadata. If this is selected, then marker streams and stream (chunk) properties will be preserved when the input data enters the model graph. If this is deselected (the default), both any marker streams and all but a few essential chunk properties will be removed for the purposes of inference (which avoids unnecessary recompilation of the model graph, but the model will not have access to these details) and are restored afterwards.

    • verbose name: Pass Marker/props Metadata
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • data_format
    The type of statistics to output in the main data output packet (for prediction use cases). This can either be a comma-separated list of summary statistics, in which case the data output will contain a statistic axis along which the respective statistics are itemized, or it can be set to distribution, in which case the output will have a distribution axis along which the desired number of posterior samples are enumerated. The former is useful for a simple estimate-with-error-bars type of representation, which can then be used for plotting etc. If a single location statistic is used (e.g., mean or median), then the output will be largely compatible with the output of a conventional machine learning node (which tends to output a posterior mode or maximum-likelihood estimate depending on the type of model), and the inference node can be used as a drop-in replacement for such nodes. When error statistics are included (e.g., stddev, mad, or ci90), only a few successor nodes will correctly interpret or preserve this output, for example some of the plotting nodes that can display error bars; other numeric nodes will either ignore these statistics (e.g., MeasureLoss) or will act on all statistics using the same operation, which is typically not correct. Instead, if further downstream computations are to be performed with the output of the inference node, it is recommended to use the distribution output, since posterior samples will propagate through most mathematical operations correctly, i.e., the result will be a distribution over the result of the successor operation(s). Typical values are sketched below, after the port details.

    • verbose name: Data Output Statistics
    • default value: distribution
    • port type: ComboPort
    • value type: str (can be None)
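
    Typical values consistent with the description above (the exact statistic names follow the drop-down options):

        # Illustrative data-format settings:
        format_examples = [
            "distribution",  # default: a distribution axis enumerating posterior samples
            "mean",          # single location statistic; ML-node-compatible output
            "mean, stddev",  # estimate with error bars
            "median, ci90",  # robust location plus a 90% credible interval
        ]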
  • num_pred_samples
    Number of samples to use when computing various kinds of post-hoc statistics. This affects all downstream statistics, including the fidelity of the posterior approximation available via the samples output, as well as any predictions available through the data output. Note that when using low-dimensional predictions in a machine-learning context, quite low values such as 32-128 may be sufficient for the desired accuracy and will reduce the compute and memory requirements in prediction workflows.

    • verbose name: Num Stats Samples
    • default value: 256
    • port type: IntPort
    • value type: int (can be None)
  • data_vars
    Optionally a listing of variables for which to generate predictions. By default this will be all variables that are not included in the posterior distribution (what is returned by the dist and samples outputs), which in practice amounts to all observable variables in the model. In general, the node will name the streams in the packet based on the variable (e.g., 'y'), and if the respective Random Draw node was configured via a like= input to output a Packet, then the stream name will be concatenated to the variable name (e.g., 'y_eeg'). To take control of the output stream names, one may also specify this as a dictionary, where the keys are the variable names whose data to emit and the values are the desired output stream names (e.g., {'y': 'eeg'}).

    • verbose name: Data Output Variables
    • default value: None
    • port type: Port
    • value type: object (can be None)
  • exclude_from_posterior
    Optionally a listing of latent variable names or patterns to exclude from the posterior. It is recommended to use patterns that end in a * to also cover cases where additional related variables, such as _decentered variants, are present in the model. Among others, this can be used with latent variables that take on a different shape at prediction time, to prevent shape mismatches. Such variables will then be distributed as per their governing distribution parameters (or their posterior distributions). A hypothetical example is sketched below, after the port details.

    • verbose name: Exclude From Posterior
    • default value: None
    • port type: ListPort
    • value type: list (can be None)
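
    A hypothetical example (the latent variable names are placeholders for names in one's own model):

        # Exclude local latents whose shape differs at prediction time; the trailing *
        # also covers related variables such as z_local_decentered.
        exclude_from_posterior = ["z_local*", "per_trial_effect*"]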
  • vectorized_prediction
    The way in which samples are handled during prediction from new data and generation of predictive statistics. Note that both modes are compiled and will run efficiently, but the vectorized mode can be considerably faster with small (low-dimensional / few samples) models. For high-dimensional models, the memory requirements of the vectorized mode can be very large and the mode is not necessarily faster, so the serial mode is recommended in such cases.

    • verbose name: Vectorized Prediction
    • default value: serial
    • port type: EnumPort
    • value type: str (can be None)
  • canonicalize_output_axes
    Whether to canonicalize the data output axes of the node to match the expected output axes of other machine-learning nodes. This only applies to the data output (predictions) and only if the model outputs data with an instance axis. This is mainly useful if you use the Inference node in an ML workflow (MeasureLoss, Crossvalidation, Parameter Optimization) and ensures that the output has a feature axis that properly encodes what the outputs represent. If the observable variables in your model have highly custom axes (other than one instance and optionally a feature axis), you may need to do some extra reformatting between the Inference node and any downstream ML workflow node.

    • verbose name: Canonicalize Output Axes
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • seed
    Seed for any pseudo-random choices during training. This can be either a splittable seed as generated by Create Random Seed or a plain integer seed. If left unspecified, this will resolve to either the current 0-based "task number" if used in a context where such a thing is defined (for example cross-validation or a parallel for loop), or 12345 as a generic default.

    • verbose name: Random Seed
    • default value: None
    • port type: Port
    • value type: AnyNumeric (can be None)
  • update_on
    Update the model on the specified data. This setting controls how the node behaves when it is repeatedly invoked with data, and can usually be left at its default. For use cases in which the node is invoked only a single time in a pipeline, all settings are equivalent. Scenarios where a difference arises include 1) real-time (streaming, aka "online") processing, where initially a non-streaming ("offline") dataset is passed in or a previously trained model is loaded in, and subsequently streaming data is passed through the node, 2) machine-learning style offline analysis, where the node is first given an offline dataset to train on, and then subsequently given a (still offline) test dataset to evaluate the trained model on, or 3) when the model is adapted over the course of multiple successive invocations with potentially different datasets. The settings are then as follows: "initial offline" will adapt the model only on the first non-streaming data that it receives (note that whether data is considered streaming or not is a matter of whether its is_streaming flag is set, rather than whether it arrives in actual real time). The "successive offline" mode will keep updating the model on any data marked non-streaming (this will use the model's prior state if none is wired in, or the wired-in state, which therefore must come from a previous invocation of the model). The "offline and streaming" mode will train on any data, regardless of whether it is offline or streaming (e.g., for adaptive real-time training). Note that not all inference nodes support the offline and streaming mode -- specifically, MCMC Inference does not. The parameter can also be changed at runtime, to switch from one mode to another.

    • verbose name: Update On
    • default value: initial offline
    • port type: EnumPort
    • value type: str (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model becomes incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive printed output.

    • verbose name: Verbosity Level
    • default value: 1
    • port type: EnumPort
    • value type: str (can be None)
  • axis_pairing
    Axis pairing override to apply within the context of the model. This will cause all nodes in the model graph that have an axis_pairing option set to 'default' to use this setting instead. It is recommended to leave this at 'matched'; if set to 'positional', several options in model nodes need to be carefully chosen in correspondence with the axis positions in the data, including the number of event-space dimensions in At Subscripts nodes, and the dimension index (relative to the event-space dimension) for any With Stacked Variables nodes. One also needs to be careful with whether distributions have a multivariate event space (e.g., Multivariate Normal) or not (most other distributions).

    • verbose name: Axis Pairing In Model
    • default value: matched
    • port type: EnumPort
    • value type: str (can be None)
  • vectorized_inference
    Whether to vectorize computation across particles. This will be more efficient, but also uses more memory during inference.

    • verbose name: Vectorized Inference
    • default value: vectorized
    • port type: EnumPort
    • value type: str (can be None)
  • differentiation_mode
    The differentiation mode to use. Forward is supported by all constructs that may occur in a model. Reverse can be more efficient depending on the ratio of data points to parameters, but is not supported for example when Fold Loops are used in the model (e.g., for time-series models), and will throw an error in such cases.

    • verbose name: Differentiation Mode
    • default value: reverse
    • port type: EnumPort
    • value type: str (can be None)

WithStackedVariables

Context inside of which each Random Draw node behaves like a stack of N independent draws, indexed by a subscript, and where each draw appears stacked along a new axis.

This is analogous to a "plate" context in graphical models (see also URL at bottom), wherein each random variable inside the plate area behaves like a stack of N subscripted variables that are conditionally independent (conditional on the parameters of the distribution). This is an alternative to relying on the replicated axis being present in either the like input of Random Draw nodes or in parameters of the underlying distributions. While the latter have the same effect of causing the draw to be "batched" or broadcast across the additional axis, the plate notation can be more explicit and make the model easier to interpret or follow. Which form is preferable depends on the specific use case: plates are very useful for implementing published models that often come already specified in plate notation, while the implicit approach can result in a more compact graph with fewer nodes or interactions.

The node is configured by specifying an axis type for the new axis (which can also have a label) and a repeat count that is either given as an integer, or one may pass data into the node's like input, whose length along the given axis is used as the count (the latter is preferred when the data has the desired axis, which is often the case).

An important feature of this node is that it optionally allows for random subsampling of the index range (along the axis) during the inference, analogous to "mini-batch" updates in stochastic gradient descent. This can greatly speed up inference when the index range is large, as with long time series or many data instances, but it can also introduce noise (variance) into the inference. The batch size is specified via the "subsample to" argument. When concrete data (independent or dependent variables) enters the plate context and interacts with random draws therein, the subsampling has to be applied to the data along the correct axis; this is conveniently done using the At Subscripts node, which automatically detects the enclosing plate contexts and applies the correct index to the right axes of the data. For this reason it is recommended to use the At Subscripts node whenever subsampling is used; for this to work, the node needs to be downstream of the placeholder that receives the subscript (from the innermost plate context if nesting is used), usually via an update-update edge. The manual alternative is to use a Select Range node that has the subscript Placeholder wired into its selection range input and that has the correct axis set (and to do so separately for each applicable nesting level), but this is more error-prone and less convenient. See also the documentation for the "plate body [subscript name]" setting in the node for more intuition on structuring the portion of the graph that is to be nested inside the plate.

The user may notice that the presence of a With Stacked Variables node causes the affected axis to move into a specific position in the output of each contained Random Draw. This is not an error but a side effect of the underlying implementation, where each such context uses a different dimension index. The index can also be overridden via the "stacking dimension" argument, but this is difficult to use correctly by hand since it is relative to the event-space dimensions of each random variable (e.g., multivariate normal or certain discrete distributions).

It is possible to use plates, random draws, distributions, and At Subscripts without axes (i.e., using plain arrays for everything), in which case the guarantees that axes will be lined up correctly do not apply and the user needs to pre-plan the shapes of all arrays and distributions and work through the respective broadcasting semantics (there is an article on "How Dimensions get Managed by the With Stacked Variables Node, Distributions, and Random Draws" with further details on this use case). This is, however, not a recommended path, since it is extremely error-prone and quite difficult to debug for all but the simplest models. A minimal sketch of the generative structure that a plate expresses is given below.
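
As a rough illustration (plain NumPy, not NeuroPype API; the normal observation model and the parameter values are assumptions for the sake of the example), a plate over an instance axis of length N corresponds to the following generative structure:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100                            # the plate's repeat count
    mu = rng.normal(0.0, 1.0)          # shared latent variable, drawn once (outside the plate)
    sigma = abs(rng.normal(0.0, 1.0))  # shared scale (half-normal via abs)
    # Inside the plate: N conditionally independent draws (given mu and sigma),
    # stacked along a new axis of length N.
    y = rng.normal(mu, sigma, size=N)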

More Info...

Version 0.9.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • like
    Data whose axis should be indexed by the plate.

    • verbose name: Like
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: IN
  • body
    Plate body.

    • verbose name: Plate Body
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • body__signature
    Name of the index variable (subscript) passed to the plate body. The usage of this node is somewhat analogous to how one would use a For Each loop node, where there is a loop body that is to be repeated N times, and which begins with a Placeholder node that receives the loop variable (e.g., i). Any nodes downstream of this placeholder implicitly become part of the loop body and will be evaluated N times (in parallel), and the result of the body (the final node's output) is wired back into the "body" input of the loop node. That node then outputs the results stacked along a new array axis.

    The fully accurate mental model is that any Random Draw nodes that are directly or indirectly downstream of this placeholder (e.g., connected to it via an update-update edge) behave as if each of them were really a stack of N independent copies of the random draw, as suggested by the node icon. These operations are vectorized, and it is still the same random draw node that now simply returns an array with an extra axis of length N at a specified position and of a specified type (and can also accept arrays with an axis of this type). This makes the plate a "batching" context that is used to model an array of (conditionally) independent random draws. This context only directly affects a small set of Bayesian nodes, including Random Draw, At Subscripts, and Optimizable Parameter -- but note that the batched nature of data carries forward through any subsequent array operations, so that the entire computation downstream of the batched draws also tends to become batched. By (shift+)control-clicking on the edge going into the "plate body" port of the node (which shows in dotted style in UIs) one can see the full set of nodes that are formally part of the plate context (and any nested contexts therein).

    See main documentation of this node for additional background and the At Subscripts node on how to handle concrete data (independent or dependent variables) that need to interact with any of the batched random variables.

    • verbose name: Plate Body [Subscript Name]
    • default value: (i)
    • port type: Port
    • value type: object (can be None)
  • values
    Stacked results of the random draws.

    • verbose name: Values
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: OUT
  • is_training
    Whether we're in a training context.

    • verbose name: Is Training
    • default value: None
    • port type: DataPort
    • value type: bool (can be None)
    • data direction: INOUT
  • axis
    Named axis to which the subscript and plate indexing applies. This uniquely associates the replicated dimension of the plate with a named axis of the given type in all data and any distribution or random draw node that interacts with the plate context. This in turn guarantees that axes line up correctly between observable data, distribution parameters, and the new dimension introduced by the plate. When using the At Subscripts node, this also ensures that the data is correctly subsampled along this axis. This can be left unspecified when the node is used with plain arrays, but then it is the user's responsibility to ensure consistent axis alignment across all nodes in the plate context (in such cases it can also help to override the stacking dimension argument to achieve the desired shape of the randomly drawn arrays, which is otherwise not necessary).

    • verbose name: Data Axis
    • default value: unspecified
    • port type: ComboPort
    • value type: str (can be None)
  • count
    Number of times the random variables will be replicated. If this value is not specified, some data must be wired into the "like" input, and an axis must be specified; the length of the data along that axis is then used as the count. Random draws occurring in the stacked body will be independently replicated this many times, conceptually yielding N subscripted instances of the random variable. This is also the length of the resulting outcome array(s) along the stacked axis.

    • verbose name: Repeat Count
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • stack_dim
    The array dimension index along which the outcome array(s) returned by any Random Draw nodes will be stacked. This option is mainly for use with statistical models that are formulated in terms of plain arrays, where the axes are not explicitly named, but note that this is an error-prone way to work with statistical models and not a recommended workflow. When working with plain arrays it is relatively easy to end up with shape conflicts between dimensions that the plates allocate, that the observed data has, and that the distributions were parameterized with or generate, or that the random draw was parameterized with (via its shape argument); in such cases this parameter allows for fine tuning of which exact dimension offsets (relative to the event space dimensions of each random draw) will be used for stacking by this plate. For additional technical details on the internal default behavior (when this is not specified), see the document on "How Dimensions get Managed by the With Stacked Variables Node, Distributions, and Random Draws".

    • verbose name: Stacking Dimension
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • subsample_to
    Batch size to which to subsample the indices along the plate dimension to speed up inference. This is useful for very large collections of indexed variables, where subsampling is analogous to mini-batch updates in stochastic inference or deep learning. However, note that this can introduce noise (i.e., additional posterior variance) into the inference process depending on the specifics of the inference algorithm. When using subsampling, any data that interacts with any random draws in the plate context must also be subsampled along the correct axis, which is conveniently done using the At Subscripts node (see node for more details).

    • verbose name: Subsample To
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • no_subsample_when_predicting
    Whether to disable subsampling at prediction time. This is typically desirable since otherwise only a random subset of the data would be used for prediction.

    • verbose name: No Subsample When Predicting
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)