Module: machine_learning

Machine learning algorithms and related nodes.

The majority of these nodes are one-stop shops that can be inserted after data has been appropriately preprocessed and labeled, and that then output predictions; the remaining nodes fulfill special management roles. The most important specialty nodes are Assign Targets, Accumulate Calibration Data, Measure Loss, and Crossvalidation. The most important classic ML techniques are Logistic Regression, Linear Discriminant Analysis, and Convex Model, along with some of the Classification and Regression nodes for categorical or continuous-valued outputs, respectively. Machine learning methods typically expect their inbound chunks to have an instance axis. Most existing nodes implement supervised machine learning algorithms: these nodes will only output predictions after they have been trained. To train a machine learning node, it needs to receive a packet that has both data and target labels. Once trained, a node will usually output predictions for every chunk that it receives, including for the initial training chunk. Also note that most of these nodes, with few exceptions, will only adapt themselves on non-streaming chunks (i.e., chunks whose is_streaming flag is not set to True). Some nodes in this module provide important related functionality, such as annotation with target labels and management of training/calibration data, as well as cross-validation, which is an essential validation tool for machine learning.
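
For readers new to this train-then-predict contract, here is a minimal sketch in plain Python, using scikit-learn's LogisticRegression as a stand-in for a supervised ML node (the actual nodes operate on Packets inside a pipeline; the synthetic data and the model choice are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "calibration chunk": training instances with associated target labels
X_calib = rng.normal(size=(100, 8))          # 100 instances x 8 features
y_calib = (X_calib[:, 0] > 0).astype(int)    # numeric target labels (0/1)

model = LogisticRegression().fit(X_calib, y_calib)

# once trained, the node emits predictions for the training chunk itself...
print(model.predict(X_calib)[:5])
# ...and for every subsequent ("streaming") chunk that it receives
X_stream = rng.normal(size=(10, 8))
print(model.predict(X_stream))
```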

AccumulateCalibrationData

Accumulate calibration data and emit it in one large chunk.

This node is for setups where you have streaming data and some of your processing nodes need to be calibrated, and you intend to collect the calibration data on the fly at the beginning of the live session, instead of using a recording of previously collected calibration data (in the latter case you would use the Inject Calibration Data node instead of this node). If all your processing nodes are capable of incremental calibration on streaming data, you do not need this node -- however, only very few nodes can do that: most adaptive processing nodes instead require a long chunk that holds all the calibration data, on which they then perform a one-step calibration calculation. This node handles the task of buffering up your streaming calibration data into one chunk, and then emits it all in one go, so that subsequent nodes can adapt themselves on it.

For this to work, the node needs to know where the calibration data begin and where they end: you are responsible for telling it by inserting a special marker in the data at the beginning of the calibration period, and another special marker at the end (the marker strings to look for can be customized). This node also supports a few options, only visible in expert mode, that decide what happens to any streaming data prior to the beginning of the calibration section: usually you want to drop such data, since your pipeline would not be able to process it yet anyway, but this can be overridden with a parameter. Another option decides what happens when there is another calibration section some time after the first one: you may choose to either ignore that section, or to trigger another recalibration.

Aside from its sheer length, the calibration chunk that this node emits is distinguished from the regular streaming data (which it also emits) in that it is marked as 'non-streaming' via the corresponding flag on the packet (since most adaptive nodes will only update themselves on non-streaming data). Limitations: while it is possible to use this node to demarcate multiple successive calibration windows, in which case it will emit a calibration chunk for each of them, this node assumes that a single input chunk is intersected by at most one such calibration time window. If that assumption is violated, the node will default to fusing the successive calibration windows into a longer one that covers all calibration data, and warnings will be generated in such cases.
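
The buffering behavior described above can be illustrated with a small, self-contained sketch (this is not the node's implementation: chunks are modeled as plain dicts rather than Packets, and buffering or streaming each chunk as a whole, instead of splitting it at the exact marker time, is a simplifying assumption):

```python
def accumulate(chunks, begin_marker='calib-begin', end_marker='calib-end'):
    buffer, in_calib = [], False
    for chunk in chunks:
        if begin_marker in chunk['markers']:
            in_calib = True
        if in_calib:
            buffer.append(chunk['samples'])
            if end_marker in chunk['markers']:
                # release everything in one go, flagged non-streaming so
                # that downstream adaptive nodes will calibrate on it
                yield {'samples': sum(buffer, []), 'is_streaming': False}
                buffer, in_calib = [], False
        else:
            # note: by default the real node withholds/drops streaming data
            # until the first calibration chunk has been emitted
            yield {'samples': chunk['samples'], 'is_streaming': True}

chunks = [{'samples': [0.1], 'markers': []},
          {'samples': [0.2], 'markers': ['calib-begin']},
          {'samples': [0.3], 'markers': []},
          {'samples': [0.4], 'markers': ['calib-end']},
          {'samples': [0.5], 'markers': []}]
for out in accumulate(chunks):
    print(out)   # one fused calibration chunk, plus the streaming chunks
```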

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • begin_marker
    Marker string that indicates that calibration data is beginning. As explained in the help text for the node, a marker is necessary to inform the node about when the data that it should accumulate begins (and ends). The recommended way to get this marker into the data is by emitting it from the same program that also generates any other calibration-relevant markers (e.g., those that are picked up in the Assign Targets Markers node). Can also be a time offset, see Marker mode argument.

    • verbose name: Begin Marker
    • default value: calib-begin
    • port type: Port
    • value type: object (can be None)
  • end_marker
    Marker string that indicates that calibration data is ending. As for the begin marker, the best way to get it embedded into the data stream is to emit it from the program that manages the calibration process (that program would usually emit markers that are used e.g., by the Segmentation or the machine learning nodes). Can also be a time offset, see Marker mode argument.

    • verbose name: End Marker
    • default value: calib-end
    • port type: Port
    • value type: object (can be None)
  • marker_mode
    How to interpret the begin_marker and end_marker. In the default setting 'markers', they are matched against event markers that are assumed to be present in the data. In the 'relative-times' mode, they are interpreted as time offsets from the beginning of the data, in seconds.

    • verbose name: Marker Mode
    • default value: markers
    • port type: EnumPort
    • value type: str (can be None)
  • print_markers
    Print markers. This prints markers during the calibration period for debugging/inspection.

    • verbose name: Print Markers
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • verbose
    Verbose output.

    • verbose name: Verbose
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • emit_calib_data
    Emit the calibration data. If set to False, the calibration data portion is dropped.

    • verbose name: Emit Calibration Data
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • emit_predict_data
    Emit the non-calibration ('streaming') data. If set to False, the data outside the calibration markers will be dropped.

    • verbose name: Emit Streaming Data
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • calibration_first
    Do not emit any streaming data before the first calibration chunk has ended. This is needed for many methods that can only predict after they have been calibrated.

    • verbose name: No Output Before Calibration Data
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • can_recalibrate
    Allow re-calibration. If false, only a single calibration period will be allowed and subsequent calibration markers will be ignored.

    • verbose name: Allow Recalibration
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • suppress_empty_packets
    Do not emit packets that contain only empty data. In cases where this node would emit a packet with all-empty data, enabling this option will cause the node to emit None instead.

    • verbose name: Suppress Empty Packets
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

AssignTargets

Select which markers contain event-related signal activity, and optionally assign numeric target values to these markers for use in machine learning.

This node is part of a standard workflow for analysis of event-related signal activity, and optionally for machine learning on that activity. First of all, this node allows you to define which of the (possibly many) markers in the data should be used for event-related analysis. Second, you can assign different numeric values to different events (since each event has an associated string, you can give a mapping that assigns different values to different strings). There is also an option to accept any marker strings that can be converted into numbers, and take these numbers as the target values (this is useful for the case where regression targets are being specified, instead of classification). Once your markers are annotated in this way in the data, subsequent nodes will act on that subset of markers (e.g., the Segmentation node will extract segments around only the target markers), and if you have assigned numeric target values to specific markers, any subsequent machine learning nodes will interpret these values as the desired output values (or "labels") that the machine learning node is supposed to predict whenever it sees data that looks like what it observed around those markers in its training data. Thus, a typical workflow is to have an Assign Target Markers node, followed by a Segmentation node, optionally followed by some segment processing, followed by a machine learning node; usually you also need a way to feed both training and test/live data into this chain of nodes, e.g., using the Inject Calibration Data node or the Accumulate Calibration Data node prior to the Assign Target Markers node. Tip: since this node only distinguishes markers based on exact string matching, you may need to preprocess your marker strings beforehand using other nodes. It can also be helpful to insert new markers in the data based on custom criteria using, e.g., the Insert Markers node, before applying this node.
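
A schematic of the value-assignment step in plain Python (the marker strings and mapping are hypothetical; the real node writes a TargetValue field onto the Packet's instance axis):

```python
import math

def assign_targets(markers, mapping):
    # markers matching a mapping key get its value; all others stay nan
    return [mapping.get(m, math.nan) for m in markers]

markers = ['left', 'right', 'left', 'rest', 'right']
print(assign_targets(markers, {'left': 0, 'right': 1}))
# -> [0, 1, 0, nan, 1]: downstream nodes segment around the labeled
# markers, and ML nodes use the assigned values as training labels
```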

Version 1.0.2

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • mapping
    Mapping of matching criteria to target values. Any instances that match a given criterion (e.g., marker name or name pattern) will be assigned the associated target value. The format of the criteria can be overridden by mapping_format. The mapping can be given either fully explicitly as a dictionary of {crit1: target-value1, crit2: target-value2, ...}, or using the shorthand list notation [crit1, crit2, crit3], which is equivalent to {crit1: 0, crit2: 1, crit3: 2, ...}. An unordered set {crit1, crit2, crit3} can be given to simply set the target value of any matching instance to 1 (see the short example after this list).

    • verbose name: Value Assignment
    • default value: {'M1': 0, 'M2': 1}
    • port type: Port
    • value type: object (can be None)
  • mapping_format
    Format of the criterion strings. If set to 'names', each instance (e.g., marker string) needs to match the provided string exactly. If set to 'wildcards', the criterion is a wildcard expression that may include * or ? characters. If set to 'conditions', the mapping can be a restricted Python expression that may refer to other instance fields (e.g., "Marker == 'left' and Duration > 4.0", provided that the instances have fields named Marker and Duration). See NeuroPype's QueryGrammar for more details on the available functions. Also, in this mode the mapping targets are allowed to be strings, which are then evaluated as formulas (possibly dependent on other instance fields) to calculate the target value. The special format 'passthrough-numbers' ignores the mapping entirely, simply converts the marker strings to numbers, and uses those as target values. The 'compat' format is primarily for backwards compatibility with the settings of some deprecated fields. It is recommended to instead always select the syntax that you are using explicitly.

    • verbose name: Mapping Format
    • default value: compat
    • port type: EnumPort
    • value type: str (can be None)
  • iv_column
    Choose which column of the instance axis data table to use for mapping, if mapping is 'names' or 'wildcards'. This will almost always be 'Marker' (the default).

    • verbose name: Default Condition Field
    • default value: Marker
    • port type: StringPort
    • value type: str (can be None)
  • is_categorical
    If set then the TargetValue column in the IV table will be marked as categorical.

    • verbose name: Is Categorical
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • also_legacy_output
    Also write the target values in the legacy location. The target values will also be written into the data tensor (block.data).

    • verbose name: Also Legacy Output
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • use_numbers
    Alternatively convert number strings to target values. If this is checked, the marker assignment is ignored, and the node will treat any marker string that can be converted to a number as a target marker, and use the corresponding number as the target value. This is useful when regression targets are encoded in marker strings.

    • verbose name: Use Numbers Instead (Regression)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • support_wildcards
    Support wildcard matching.

    • verbose name: Support Wildcards
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • verbose
    Enable verbose output.

    • verbose name: Verbose
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
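
The three notations accepted by the mapping port above can be written as ordinary Python literals (the criterion names here are illustrative):

```python
explicit  = {'left': 0, 'right': 1, 'rest': 2}  # criterion -> target value
shorthand = ['left', 'right', 'rest']  # same as {'left': 0, 'right': 1, 'rest': 2}
as_set    = {'left', 'right', 'rest'}  # every matching instance gets value 1
```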

BalanceClasses

Balance the per-class trial counts in the data.

A class here refers to a numerical group of trials/instances in some data (e.g., class 0 may be one set of experimental conditions, and class 1 may be another). The node ensures that the data contain equal proportions of trials across these classes, which is sometimes necessary to ensure that downstream statistics and/or machine learning are sound (e.g., not unfairly biased towards an over-represented class, or accurately quantifying things like the error rate of ML models when all classes are equally likely to occur in the data). This node should be applied after AssignTargets (or an equivalent node that associates instances with numeric classes), and can be used on either continuous or segmented data. Note that, if you use this node on continuous data, it will drop all event markers that do not belong to one of the designated target classes (as set via, e.g., AssignTargets); typically that is all markers whose TargetValue is set to the special value nan.
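
As an illustration, here is a minimal sketch of the 'duplicate' strategy on plain label arrays (synthetic labels; the real node duplicates whole trials/instances in the Packet):

```python
import numpy as np

def balance_duplicate(y, seed=12345):
    # oversample under-represented classes (with replacement) until every
    # class matches the largest class count; returns the kept trial indices
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        keep.extend(idx)
        keep.extend(rng.choice(idx, size=target - n, replace=True))
    return np.sort(np.asarray(keep))

y = np.array([0, 0, 0, 0, 1, 1])
print(balance_duplicate(y))   # class-1 trials are duplicated up to count 4
```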

Version 1.2.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • strategy
    Strategy to apply. Duplicate will only duplicate under-represented trials, and drop will only drop excess over-represented trials. Mixed sets the target number of trials to the mean of the class trial counts, and duplicates or drops trials for each class accordingly to reach it. With target, under-represented trials will be duplicated and over-represented trials will be dropped to meet the target trial count specified in the target_count property. Bypass causes the node to do nothing.

    • verbose name: Strategy
    • default value: duplicate
    • port type: EnumPort
    • value type: str (can be None)
  • max_factor
    Maximum factor by which to duplicate trials if the duplicate strategy is selected, or by which to reduce trials if the drop strategy is selected, after which the other class(es) are dropped or duplicated, respectively, until the classes are balanced. Ignored if None or 0, or if the mixed strategy is used.

    • verbose name: Max Factor
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • target_count
    Target number of trials per class. This only applies if the strategy is set to target. In this case, all classes will be duplicated or dropped as needed in order to reach the target count per class. Ignored if None or 0.

    • verbose name: Target Count
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • field_name
    Name of field containing the classes to rebalance. In the output, values of this field will occur approximately equally in the given instances. Only change this from TargetValue if the named field has been added to the Instance axis upstream.

    • verbose name: Field Name
    • default value: TargetValue
    • port type: StringPort
    • value type: str (can be None)
  • binning_field
    Optional name of field to bin on. This will, for each unique value in this field, perform the balancing within all instances where the field takes on the same value.

    • verbose name: Binning Field
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • randseed
    Optionally the random seed to use to get deterministic results.

    • verbose name: Randseed
    • default value: 12345
    • port type: IntPort
    • value type: int (can be None)
  • verbose
    Print info and warning messages. 0: no output; 1: print results only; 2: print errors/warnings; 3: print all.

    • verbose name: Verbose
    • default value: 3
    • port type: EnumPort
    • value type: str (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

BayesianRidgeRegression

Estimate a continuous output value from features using Bayesian Ridge Regression.

Bayesian Ridge regression is an elegant Bayesian method for learning a linear mapping between input data and desired output values from training data, and is closely related to ridge regression. The main difference is that ridge regression has a tunable parameter that controls how strongly the method is regularized; this parameter effectively controls how flexible or complex the solution may be, to prevent over-fitting to random details of the data and thereby improve generalization to new data. In ridge regression, this parameter is tuned using cross-validation, that is, by empirical testing on held-out data. The Bayesian variant has such a parameter as well, but the optimal degree of regularization is likewise estimated from the data, in a theoretically clean and principled fashion. This method assumes that both inputs and outputs are Gaussian distributed, that is, have no or very few major statistical outliers. If the output follows a radically different distribution, for instance bounded between 0 and 1, nonnegative, or discrete-valued, then different methods may be more appropriate (for instance, classification methods for discrete values). To ameliorate the issue of outliers in the data, the raw data can be cleaned of artifacts with various artifact removal methods. To the extent that the assumptions hold, this method is highly competitive with other linear methods.

Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate training instances for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method cannot be trained incrementally on streaming data, it requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node and is used for calibration. Once this node is calibrated, its trainable state can be saved to a model file and later loaded for continued use.
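
The ports above map closely onto the parameters of scikit-learn's BayesianRidge estimator; the following standalone sketch mirrors them (whether this node wraps scikit-learn internally is an assumption, and the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))             # 200 instances x 10 features
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

# alpha_1/alpha_2 and lambda_1/lambda_2 are the Gamma shape/rate hyper-
# priors, mirroring the alpha/lambda shape and rate ports; fit_intercept
# mirrors include_bias. The iteration cap (300) matches sklearn's default
# and is left unset here because its keyword name varies across versions.
model = BayesianRidge(tol=1e-3,
                      alpha_1=1e-6, alpha_2=1e-6,    # noise-precision prior
                      lambda_1=1e-6, lambda_2=1e-6,  # weight-precision prior
                      fit_intercept=True)
model.fit(X, y)
y_hat, y_std = model.predict(X[:3], return_std=True)  # predictive mean/std
print(y_hat, y_std)
```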

More Info...

Version 1.1.1

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions).

    • verbose name: Maximum Number Of Iterations
    • default value: 300
    • port type: IntPort
    • value type: int (can be None)
  • tolerance
    Convergence tolerance. This is the desired error tolerance, or acceptable inaccuracy, in the solution. Larger values give less accurate results, but lead to faster compute times. Note that, for biosignal-driven machine learning systems, one often does not need very small tolerances.

    • verbose name: Tolerance
    • default value: 0.001
    • port type: FloatPort
    • value type: float (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • include_bias
    Include a bias term. If false, your features need to be centered, or include a dummy feature set to 1.

    • verbose name: Include Bias Term
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • normalize_features
    Normalize features. Should only be disabled if the data comes with a predictable scale (e.g., normalized in some other way).

    • verbose name: Normalize Features
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • alpha_shape
    Alpha shape parameter. This is only included for completeness and usually does not have to be adjusted. This is the shape parameter for the Gamma distribution prior over the alpha parameter. By default, this is an uninformative prior.

    • verbose name: Alpha Shape Parameter
    • default value: 1e-06
    • port type: FloatPort
    • value type: float (can be None)
  • alpha_rate
    Alpha rate parameter. This is only included for completeness and usually does not have to be adjusted. This is the rate parameter for the Gamma distribution prior over the alpha parameter. By default, this is an uninformative prior.

    • verbose name: Alpha Rate Parameter
    • default value: 1e-06
    • port type: FloatPort
    • value type: float (can be None)
  • lambda_shape
    Lambda shape parameter. This is only included for completeness and usually does not have to be adjusted. This is the shape parameter for the Gamma distribution prior over the lambda parameter. By default, this is an uninformative prior.

    • verbose name: Lambda Shape Parameter
    • default value: 1e-06
    • port type: FloatPort
    • value type: float (can be None)
  • lambda_rate
    Lambda rate parameter. This is only included for completeness and usually does not have to be adjusted. This is the rate parameter for the Gamma distribution prior over the lambda parameter. By default, this is an uninformative prior.

    • verbose name: Lambda Rate Parameter
    • default value: 1e-06
    • port type: FloatPort
    • value type: float (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are being changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

ClassifierThresholdTuning

Tune the classification threshold of a predictive model so that the resulting class labels maximise a user-defined performance metric.

This can be used to, for example, balance sensitivity and specificity of a binary classifier as needed. The node performs an internal cross-validation, evaluates a series of candidate thresholds, and selects the one that gives the highest score. It can be used with any pipeline that outputs either class probabilities or decision scores. Note that this node does not produce its outputs in the form of probabilities, but rather outputs hard class labels.
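
A minimal sketch of the tuning loop using scikit-learn (a LogisticRegression base model, synthetic imbalanced data, and a 100-point threshold grid are illustrative assumptions; the node itself tunes a wired-in pipeline on Packets):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

# out-of-fold probabilities avoid train/test leakage when scoring thresholds
proba = cross_val_predict(LogisticRegression(), X, y,
                          cv=5, method='predict_proba')[:, 1]

grid = np.linspace(proba.min(), proba.max(), 100)   # candidate thresholds
scores = [balanced_accuracy_score(y, (proba >= t).astype(int)) for t in grid]
best = grid[int(np.argmax(scores))]
print(f"best threshold {best:.3f}, balanced accuracy {max(scores):.3f}")

# 'refit': retrain on the full data, then emit hard labels via the threshold
final = LogisticRegression().fit(X, y)
labels = (final.predict_proba(X)[:, 1] >= best).astype(int)
```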

More Info...

Version 0.1.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • method
    Base classifier whose decision threshold is to be tuned.

    • verbose name: Method
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • method__signature
    Argument names of the pipeline being tuned. The wired-in graph must start with a Placeholder (slotname usually data) and end with a node producing per-sample decision scores or probabilities. The pipeline is instantiated afresh inside each cross-validation split to avoid train/test leakage.

    • verbose name: Method [Signature]
    • default value: (data)
    • port type: Port
    • value type: object (can be None)
  • preds_group_field
    Method groups predictions by the specified field. This is rarely used, but can be employed if the method emits predictions not per input trial but per member of a group (e.g., a session or subject).

    • verbose name: Grouping Field (Predictions)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • enabled
    Whether to enable this node. If this is unset, the node will have essentially no effect.

    • verbose name: Enabled
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • scoring
    Performance metric that the node will maximise when choosing the threshold (e.g., balanced_accuracy, f1, roc_auc). The choice of metric determines how the trade-off between different types of classification errors is handled when selecting the optimal decision threshold from the classifier's continuous output scores (or probabilities). Available options:
    - 'accuracy': Standard accuracy. Maximizes the overall percentage of correct predictions. Can be misleading on imbalanced datasets, as it might favor thresholds that simply predict the majority class frequently.
    - 'balanced_accuracy': Accuracy adjusted for class imbalance. Averages the recall (sensitivity) for each class. Favors thresholds that perform equally well across all classes, regardless of their frequency.
    - 'top_k_accuracy': Checks if the true label is among the top k highest-scored predictions. Less common for simple threshold tuning, which usually relies on a single score per instance.
    - 'average_precision': Summarizes the precision-recall curve, focusing on the positive class. Favors thresholds achieving high precision (few false positives) across various recall levels. Good for rare positive classes.
    - 'f1': Harmonic mean of precision and recall for the positive class. Favors thresholds that balance finding positive samples (recall) and the accuracy of positive predictions (precision).
    - 'f1_macro': Unweighted average of the F1 score across all classes. Favors thresholds balancing precision/recall equally for all classes.
    - 'f1_weighted': Average F1 score across classes, weighted by class frequency. Favors thresholds balancing precision/recall, giving more weight to performance on common classes.
    - 'precision': Fraction of predicted positives that are actually positive (TP / (TP + FP)). Favors higher thresholds, making the classifier more conservative about predicting positive to minimize false alarms.
    - 'recall': Fraction of actual positives correctly identified (TP / (TP + FN)). Also known as Sensitivity or True Positive Rate. Favors lower thresholds to minimize missed positive cases (false negatives).
    - 'roc_auc': Area Under the Receiver Operating Characteristic curve (plots Recall vs. False Positive Rate). Measures overall discrimination ability across all thresholds. When used for selection, it favors a threshold corresponding to a good balance between high recall and low false positive rate.
    - 'roc_auc_ovr': For multi-class, computes AUC for each class vs. the rest, then averages (unweighted). Favors thresholds balancing TPR/FPR well on average for each class.
    - 'roc_auc_ovo': For multi-class, computes AUC for each pair of classes, then averages (unweighted). Favors thresholds discriminating well between pairs of classes.
    - 'roc_auc_ovr_weighted': Like 'roc_auc_ovr', but weighted by class frequency.
    - 'roc_auc_ovo_weighted': Like 'roc_auc_ovo', but weighted by class frequency.
    - 'gmean_sens_spec': Geometric mean of Sensitivity (Recall) and Specificity (True Negative Rate). Favors thresholds that balance the correct identification of both positive and negative classes, useful for imbalanced data.

    • verbose name: Scoring
    • default value: balanced_accuracy
    • port type: EnumPort
    • value type: str (can be None)
  • thresholds
    Either the number of candidate thresholds to test or an explicit list of thresholds. If an integer is given, the node generates this many equally spaced thresholds covering the observed score/probability range.

    • verbose name: Threshold Grid
    • default value: 100
    • port type: Port
    • value type: object (can be None)
  • refit
    Re-train the underlying model on the full data after the optimal threshold has been selected.

    • verbose name: Refit
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • num_folds
    Number of cross-validation folds. Cross-validation proceeds by splitting the data into num_folds blocks, iteratively holding out each block as validation data while training on the remaining blocks. The total runtime is therefore proportional to this value. Set to 0 to disable CV and tune on the entire dataset (not recommended – prone to over-fitting). Set to 1 for leave-one-out CV (can be slow).

    • verbose name: Number Of Cv Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
    Optional grouping field to ensure that samples from the same group do not appear in both train and validation sets (e.g., SubjectID, SessionID).

    • verbose name: Grouping Field (Cv)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Perform stratified splits so that the class proportions are similar across folds. Recommended when classes are imbalanced.

    • verbose name: Stratified Cv
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • cv_randomized
    Use randomized cross-validation.

    • verbose name: Randomized Cv
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • n_jobs
    Number of processor cores to use during the tuning process (-1 means all available cores).

    • verbose name: Parallel Jobs
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • random_seed
    Random seed controlling the CV shuffling and threshold sampling.

    • verbose name: Random Seed
    • default value: 12345
    • port type: IntPort
    • value type: int (can be None)
  • store_cv_results
    If True the node records every tested threshold and its corresponding score. Useful for diagnostics but increases memory usage.

    • verbose name: Store Cv Results
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • cond_field
    Name of the instance-level data field that contains the class labels. Ignored if a DescribeStatisticalDesign node already processed the packet.

    • verbose name: Cond Field
    • default value: TargetValue
    • port type: StringPort
    • value type: str (can be None)
  • out_format
    Output format of the predictions. If set to 'classes', the node will output a single feature that has the class index of the predicted class for each instance. If set to 'pseudo-probabilities', the node will output two features that contain the pseudo-probabilities of the predicted class and the other class; these will always be 0 or 1, depending on the predicted class. This can be useful for inserting this node into an existing pipeline that expects this format.

    • verbose name: Output Format
    • default value: classes
    • port type: EnumPort
    • value type: str (can be None)
  • initialize_once
    If False the node will re-tune itself whenever a new non-streaming data chunk with targets arrives.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    If True the model persists even if upstream parameters change. Use with caution – the model might become incompatible with the new data format.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

ClearTargets

Clear the target information from the markers in the given data.

This node undoes the operation of the Assign Target Markers node.

Version 1.1.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • subset
    Indices of previously assigned markers to clear target value from. Only non-nan markers are counted in the indexing. This may be a convenient way to eliminate bad trials.

    • verbose name: Subset
    • default value: None
    • port type: ListPort
    • value type: list (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

ConvexModel

A flexible convex optimization based machine learning model.

This node is configured by specifying a smooth training cost function of parameters w (weights), optionally b (a bias or intercept), and the provided input data D, by wiring a graph with these placeholders into the cost port (for non-smooth costs, see below). D may be any numeric data structure (typically a Packet or list thereof), and the cost may use any formula that is differentiable with respect to w and b, and may include smooth regularization terms (e.g., l2 norms). b is purely a convenience argument and can be omitted if not needed. The w parameter can also have any structure (packets/arrays, lists/dictionaries thereof), and in general you can wire in an initial value (e.g., a zero-initialized array via the weights port). If an initial value is not provided, it defaults to a single zero-initialized training instance, which is often sufficient (more precisely, it defaults to the same data structure as your training data, but with any instance axis dropped and only a single zero-initialized instance retained).

For the node to be able to make predictions on new data, a prediction function (a mapping of weights, optional bias, and data) needs to be defined; you can wire a graph with the appropriate placeholders into the "pred" input. Oftentimes, however, this is redundant, because it has the same form as what is wired as predictions into the Loss node in your cost function (e.g., SigmoidBinaryCrossEntropyLoss), with the loss replaced by the appropriate link function (e.g., Sigmoid, Softmax, or Sign for classification, or no link for regression). If this is so, and you are using a Loss node in your cost function, you can leave the pred input unspecified, in which case it will be inferred to be the subgraph of your cost as just described.

For non-smooth optimization, you may provide additional non-smooth cost-function terms in the form of one or more graphs with placeholders w and step_size, which may implement or invoke proximal operators corresponding to the desired penalties (e.g., the Sparse Penalty node to impose a sparsity-promoting l1 norm on the weights). In most cases, this is simply a graph where those two placeholders are wired into the data and step_size inputs of the chosen Penalty node, which in turn is wired into one of the prox1..N ports. Additionally or alternatively, you can specify a graph invoking one or more constraint projections (e.g., using the Non-negative Constraint node) as a function of the weights into the constraint port. If you wish to specify multiple constraints and/or proximal operators, the following consideration applies: this node may not offer a solver that natively supports more than one such term, in which case a "meta-term" is constructed that is simply the successive application of your prox operators followed by the constraint; this is benign if the operators act on separate parts of your weights, and under some conditions this will also still converge if your operators overlap on some weights, but in general you will get a warning. If you know what you are doing, you can also simply chain or otherwise compose multiple proximal operators in a single graph and wire that into the prox1 port, thereby avoiding the warning. By default, the node will use a smooth solver if there are no non-smooth terms, and otherwise falls back to the proximal gradient descent (PGD) algorithm to minimize the cost function; other solvers can be used as well.

Once weights have been optimized, the node can be used to make predictions on new data (where it will generally match the data types of the input data, e.g., Packet or a list thereof). Note also that you can configure which data the node should train on using the train_on property. For how to pass training data into the node, see the documentation of the other machine learning nodes, which are somewhat more beginner-oriented than this node.
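
For intuition about the default solver, here is a minimal proximal gradient descent loop in plain NumPy that minimizes a logistic (sigmoid binary cross-entropy) loss plus an l1 sparsity penalty (the synthetic data, fixed step size, and penalty weight are illustrative assumptions; the actual node works with wired-in graphs and uses a line search by default):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, :3].sum(axis=1) > 0).astype(float)    # labels in {0, 1}

def grad_smooth(w):
    # gradient of the mean logistic loss (the smooth cost term)
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

def prox_l1(w, step, alpha=0.05):
    # soft-thresholding: proximal operator of the l1 penalty alpha*||w||_1
    return np.sign(w) * np.maximum(np.abs(w) - step * alpha, 0.0)

w, step = np.zeros(X.shape[1]), 1.0             # zero-initialized weights
for _ in range(500):                            # max_iter
    w_new = prox_l1(w - step * grad_smooth(w), step)
    if np.linalg.norm(w_new - w) / step < 1e-3:  # abstol-style stopping rule
        w = w_new
        break
    w = w_new

pred = 1.0 / (1.0 + np.exp(-X @ w)) > 0.5       # sigmoid link -> classes
print((pred == y.astype(bool)).mean(), np.count_nonzero(w), "nonzero weights")
```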

More Info...

Version 0.3.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: INOUT
  • weights
    Initial/final weights. If not set, will be initialized to a packet equivalent to a single all-zeroes training-data instance.

    • verbose name: Weights
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: INOUT
  • cost
    Smooth part of training cost function.

    • verbose name: Smooth Cost
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • cost__signature
    Argument names of the smooth terms of the training cost function. The first argument is a data structure of weights (usually an array), the second is an optional bias or intercept term (usually a scalar), and the third is the input data (basically what is passed in as the data to the ConvexModel node). Your cost function is then built as follows: you start with one Placeholder node for each of the listed names (whose slotname must be set to match the respective name); dependent on these placeholders, you build out whatever expression represents the smooth component of your cost function. In formulating the cost, it is recommended to use one of the predefined Loss nodes (except MeasureLoss, which is not allowed in this context) to evaluate a loss function in terms of the input data (D). This allows ConvexModel to reuse a portion of your graph as the prediction function (namely the exact portion that is wired into the predictions port of the Loss node). This is, however, optional, and you can always specify the prediction function separately yourself. To then pass your cost function to the ConvexModel node, you wire its final node (which outputs the scalar cost given the arguments) into the "cost" input port of ConvexModel. In graphical UIs, the edge connecting your cost function to the ConvexModel node will be drawn in dotted style to indicate that this is not a normal forward data flow, but that a graph (i.e., your cost function) is running under the control of the ConvexModel node. The latter will, among other things, take derivatives of your cost in order to optimize the weights. Your graph may also contain additional placeholders for hyper-parameters with names of your choosing (e.g., alpha or gamma); the values for such parameters must then be specified via the hyper_params dictionary input of the ConvexModel node. Such hyper-parameters can be used to govern the strength of a smooth regularization term, for example the squared l2 norm of the weights or a Tikhonov operator.

    • verbose name: Smooth Cost [Signature]
    • default value: (w,b,D)
    • port type: Port
    • value type: object (can be None)
  • pred
    Optional prediction function. If not set, will be initialized to a GLM-type prediction function.

    • verbose name: Prediction Function
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • pred__signature
    Argument names of an optional prediction function graph. The arguments are the same as for the cost graph, except that the prediction function is only evaluated at prediction time (after optimization is done), and that the expected output is an array (or packet) of predictions for each observation (instance) in the data. If you do not specify a prediction function, the node will attempt to infer one from the cost function by taking the subgraph of the cost function that is wired into a Loss node (e.g., SquaredLoss), if present, and replacing the loss with the appropriate link function (e.g., Sigmoid, Softmax, or Sign for classification, or no link for regression). Note that this will only work if the cost function actually uses such a Loss node; otherwise you will have to wire in a graph that represents the prediction function here, following a recipe analogous to the one described in the previous sentence.

    • verbose name: Prediction Function [Signature]
    • default value: (w,b,D)
    • port type: Port
    • value type: object (can be None)
  • prox1
    Optional proximal operator.

    • verbose name: Prox1
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • prox1__signature
    Arguments to first proximal operator. Similarly to the cost function, this is an optional graph starting with some Placeholder nodes (in this case one with slotname set to w and another with slotname set to step), and followed by one or more nodes that implement the operation of a proximal operator applied to w with step size step. The output of this operation is then wired into the "prox1" port of the ConvexModel node. Proximal operators represent non-smooth penalties applied to the weights, and the easiest way to specify these operators is to use one of the Penalty nodes in NeuroPype, which implement a wide range of (flexibly configurable) proximal operators, and which have a "data" input (into which the "w" Placeholder is wired) and a "step_size" input (into which the "step" Placeholder is to be wired). Your graph may also contain additional placeholders for hyper-parameters with names of your choosing (e.g., alpha or gamma); the values for such parameters must then be specified via the hyper_params dictionary input of the ConvexModel node.

    • verbose name: Prox1 [Signature]
    • default value: (w,step)
    • port type: Port
    • value type: object (can be None)
  • prox2
    Optional second proximal operator.

    • verbose name: Prox2
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • prox2__signature
    Arguments to the second proximal operator, if any. See documentation of prox1 for more details. Note that most of the available solvers do not natively support more than one proximal operator. In such cases, the operators will be applied in a round-robin fashion (i.e., the first operator is applied to the weights, then the second operator is applied to the result, and so on), which is a sensible strategy if the operators are in some sense orthogonal, for example by acting on different parts of the weights or orthogonal projections of the weights; if the operators are in "tension", the result can be suboptimal, and you may need to implement a single proximal operator that combines the effects of the individual operators, for example by alternating between them in a loop.

    • verbose name: Prox2 [Signature]
    • default value: (w,step)
    • port type: Port
    • value type: object (can be None)
  • prox3
    Optional third proximal operator.

    • verbose name: Prox3
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • prox3__signature
    Arguments to the third proximal operator, if any. See documentation of prox1 for the general structure and prox2 for more details on having more than one prox operator.

    • verbose name: Prox3 [Signature]
    • default value: (w,step)
    • port type: Port
    • value type: object (can be None)
  • constraint
    Optional constraint(s).

    • verbose name: Constraint
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • constraint__signature
    Arguments to constraint projection(s). A constraint projection is a simple graph of (typically) a single placeholder, with slotname set by convention to w, followed by some operation that constrains w to a convex set (e.g., the set of non-negative numbers). NeuroPype ships with a number of such nodes, which generally end in Constraint, and which implement a number of customary convex constraints. The output of the constraint projection is then wired into the "constraint" port of the ConvexModel node.

    • verbose name: Constraint [Signature]
    • default value: (w)
    • port type: Port
    • value type: object (can be None)
  • hyper_params
    Hyper-parameters for the custom wired-in graphs. This is a dictionary of arbitrary key-value pairs that can be used to configure the cost function and proximal operators. The respective graphs may then declare and use placeholders named the same as in the dictionary keys.

    • verbose name: Hyper Params
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • max_iter
    Maximum number of iterations.

    • verbose name: Max Iter
    • default value: 500
    • port type: IntPort
    • value type: int (can be None)
  • abstol
    Absolute convergence tolerance. If weights change less than this (after normalization by step size), the optimization terminates. Note that this depends on the data scale.

    • verbose name: Abstol
    • default value: 0.001
    • port type: FloatPort
    • value type: float (can be None)
  • hessian_rank
    Maximum rank of the Hessian approximation, if using a quasi-Newton (L-BFGS) method. This is typically only available if none of the prox and constraint arguments are specified, and is ignored otherwise. The value is a trade-off between accuracy, memory usage, and performance; typical values are 6-10.

    • verbose name: Hessian Rank
    • default value: 10
    • port type: IntPort
    • value type: int (can be None)
  • stepsize
    Optional step size. If unspecified, the step size is adapted automatically using a line search.

    • verbose name: Stepsize
    • default value: None
    • port type: FloatPort
    • value type: float (can be None)
  • max_backtrack
    Maximum number of line search steps per iteration. A typical value is 15, but some solvers, such as L-BFGS, can benefit from as many as 30. Only used if the step size is left to automatic.

    • verbose name: Max Backtrack
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • backtrack_factor
    Backtracking line search factor. Only used if stepsize is 0. The default depends on the chosen algorithm, and is 0.5 for PGD and 0.8 for Jaxopt-based LBFGS.

    • verbose name: Backtrack Factor
    • default value: None
    • port type: FloatPort
    • value type: float (can be None)
  • increase_factor
    Line search increase factor. Only used if stepsize is 0. The default depends on the chosen algorithm, and is 1.5 (jaxopt) or 2.0 (optax) depending on the LBFGS variant.

    • verbose name: Increase Factor
    • default value: None
    • port type: FloatPort
    • value type: float (can be None)
  • use_jit
    If enabled, attempt to use JIT compilation for the inner loop. This incurs a one-time compilation cost, but the actual solving will be greatly accelerated if using the GPU.

    • verbose name: Use Jit
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)
  • unroll
    Whether to unroll the optimization loop. Not supported by all solvers; this could be useful for very rough solves that are set to terminate after just a few iterations but is otherwise usually not recommended.

    • verbose name: Unroll
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)
  • solver
    Solver to use. PGD is the basic (unaccelerated) proximal gradient descent and APGD is the Nesterov accelerated version. Both are first-order methods and require the cost function to be differentiable (smooth). APGD typically converges in fewer iterations than PGD at a modest per-iteration overhead. LBFGS is an efficient quasi-Newton method that can be used with twice differentiable cost functions, but supports neither proximal operators nor constraints. The jaxopt variants of these solvers are the legacy implementations that use the (now deprecated) jaxopt package; in the long term, these will be phased out.

    • verbose name: Solver To Use
    • default value: auto
    • port type: ComboPort
    • value type: str (can be None)
  • linesearch_type
    Type of line search to use (if stepsize is not given). Note that the set of implemented line-search modes may depend on the solver and is subject to change.

    • verbose name: Linesearch Type
    • default value: auto
    • port type: ComboPort
    • value type: str (can be None)
  • verbosity
    Verbosity level. 0: no output, 1: per-iteration summary. Note that JIT will be disabled if verbosity is used.

    • verbose name: Verbosity
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • canonicalize_output_axes
    Whether to canonicalize the output axes of the model to match the expected output axes of the other machine-learning nodes. This can be turned off if your model emits a handcrafted feature or statistic axis to describe its predictions that you would like to retain. Note though that some downstream nodes, like MeasureLoss, might not work as expected. If your model is highly custom, you may be required to do this step explicitly in your prediction function.

    • verbose name: Canonicalize Output Axes
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • train_on
    Update the model on the specified data. Note that the model will generally output predictions on any data that it receives, whether it is training on it or not. In the "initial offline" mode, and only if the model is not already trained, the model is trained on the first non-streaming packet that it receives, which should thus be the training set, while subsequent data are test data. In the "successive offline" mode, the model is trained on any non-streaming packet that it receives, whether it is already pretrained or not (in that case it is further fine-tuned on the new training data). In both scenarios, any streaming data is treated as test-only data; this is a typical scenario for real-time processing, where the model is first trained or fine-tuned on some pre-recorded data, but it can also be used to simply train a model on multiple successive datasets. The last mode, "offline and streaming", will train on any data that has label information, whether it is offline or streaming. This can be used for real-time training or fine-tuning, i.e., while data is being collected. Note however that this is dangerous if you also intend to test performance on streaming data -- in that case the model will update (i.e., train) on your test data, unless the test labels are withheld from the node.

    • verbose name: Train On
    • default value: initial offline
    • port type: EnumPort
    • value type: str (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are being changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • no_compile_in_debug
    Do not compile the model when running in debug mode.

    • verbose name: No Compile In Debug
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
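
The following is a schematic numpy sketch (not this node's actual implementation) contrasting the PGD and APGD solvers described above, here applied to a least-squares loss with an l1 proximal term; the soft-thresholding prox and the fixed step size are illustrative assumptions.

    import numpy as np

    def soft_threshold(x, t):
        # proximal operator of t * ||x||_1
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def solve(A, b, lam, step, iters=500, accelerate=False):
        # PGD when accelerate=False; Nesterov-accelerated APGD when True.
        # step should be <= 1 / L, with L the largest eigenvalue of A.T @ A.
        w = np.zeros(A.shape[1]); z = w.copy(); t = 1.0
        for _ in range(iters):
            grad = A.T @ (A @ z - b)                             # gradient of the smooth part
            w_new = soft_threshold(z - step * grad, step * lam)  # prox step
            if accelerate:                                       # Nesterov extrapolation
                t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
                z = w_new + ((t - 1.0) / t_new) * (w_new - w)
                t = t_new
            else:
                z = w_new
            w = w_new
        return w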

CovarianceMDM

Classify covariance matrices based on a Riemannian minimum-distance-to-mean criterion.

This method assumes that the input features are not feature vectors but covariance matrices (i.e., as computed using one of the covariance estimation nodes). This node will find the average (centroid) of the trials in each class, and then classify new trials (each a covariance matrix) based on which class mean they are closest to. This distance metric is non-Euclidean and instead follows the Riemannian geometry of covariance matrices. One can optionally use robust estimation if the data is believed to be contaminated with outliers. The node can also be used to emit the raw distances.
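
For intuition, a conceptually similar classifier can be sketched with the pyriemann package (this node is not a wrapper around pyriemann, and the covariance estimator below is an assumption):

    from pyriemann.estimation import Covariances
    from pyriemann.classification import MDM

    # X: raw segments of shape (n_trials, n_channels, n_samples); y: class labels
    covs = Covariances(estimator='oas').fit_transform(X)  # one SPD matrix per trial
    mdm = MDM(metric='riemann').fit(covs, y)  # Riemannian distance to class means
    labels = mdm.predict(covs)                # cf. output = class-labels
    dists = mdm.transform(covs)               # cf. output = distances
    proba = mdm.predict_proba(covs)           # cf. output = probabilities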

Version 0.5.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • output
    Output format to use. If using probabilities, this will attempt to estimate the probability that some given data belongs to each of the classes; these probabilities are very conservative. If using distances, this will output the raw distances to each class mean. If using class-labels, the most likely class labels of the trials will be returned.

    • verbose name: Output
    • default value: probabilities
    • port type: EnumPort
    • value type: str (can be None)
  • robust
    Use robust estimator for class means.

    • verbose name: Robust
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • max_iter
    Max number of iterations for mean estimate. This value serves as an additional stopping criterion for the learning algorithm, and can be used to ensure that the method runs in a fixed time budget. This rarely has to be tuned.

    • verbose name: Max Number Of Iterations
    • default value: 50
    • port type: IntPort
    • value type: int (can be None)
  • tolerance
    Convergence tolerance for mean estimate. A lower tolerance will lead to longer running times and can result in more accurate solutions -- however, note that the actual difference in the outputs will be minimal at best, unless a very coarse tolerance is used.

    • verbose name: Convergence Tolerance
    • default value: 1e-05
    • port type: FloatPort
    • value type: float (can be None)
  • falloff_distribution
    Probability distribution used to model the falloff when outputting probabilities. Norm is the normal (Gaussian) distribution, cauchy is the Cauchy distribution, and gennorm is the generalized normal distribution (version 1). The choice of distribution will not affect the ranking of the emitted class probabilities, only their confidence.

    • verbose name: Falloff Distribution
    • default value: gennorm
    • port type: EnumPort
    • value type: str (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model will be incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

Crossvalidation

Perform a cross-validation and return a loss measure that quantifies how well the pipeline performed when generalizing to previously unseen data.

This node receives an evaluation data set (a packet, or a list of packets for multi-session data) and a reference to a processing pipeline. The pipeline is typically a chain of nodes that starts with a Placeholder node (whose slotname is set to "data"), and whose final node's "this" output is wired into the cross-validation's "method" input. This means that the pipeline is a graph with one input (the data), and it is passed to the cross-validation node (you can learn more about this way of passing graphs in NeuroPype's docs on graph-accepting nodes). The pipeline may also accept a second placeholder, which is expected to be a boolean representing whether the pipeline is used for training or testing (i.e., an is_training flag). When used with a cross-validation, the pipeline will "experience" being called more than once -- the first time, it receives training data (corresponding to the training portion of a fold), and from then on it will receive only test data (corresponding to the test portion of a fold). The pipeline is expected to adapt itself only on the first call; this is accomplished by ensuring the initialize_once flag is set in all adaptive nodes, which is the default for all machine-learning nodes, but not necessarily for all otherwise adaptive nodes that may occur in the pipeline, such as statistics (e.g., Standardization) or adaptive feature extraction, so you may have to set this flag there. The reason the pipeline only "experiences" training data once is that the pipeline is discarded after each fold is complete, and a fresh copy of the untrained pipeline is used with the next fold (this ensures that there can be no train-test leakage). The pipeline may generally output results in one of three permitted forms: a) the predictions on the input data it received, in some format (typically a Packet, possibly an array); b) a two-element list where the first entry is the aforementioned predictions, and the second entry is a data structure representing a model (e.g., this can be the "model" output of the ML node, if any, or a combination of model outputs of the ML node and any preceding adaptive nodes such as Standardization); or c) a new graph that represents the trained pipeline for subsequent use with test data. Optionally, you can also specify a custom scoring function to the Crossvalidation node, the simplest of which is a Placeholder named pred followed by MeasureLoss, wired into the "scoring" port. The cross-validation will partition the dataset into non-overlapping training and test subsets, and repeatedly invoke the pipeline first on a training subset, then on a corresponding test subset to obtain out-of-sample predictions. The predictions are then compared to the true labels by the scoring function, one or more specified loss metrics are computed, and the result is returned via the "loss" output. The loss is either a simple float, or a packet with a statistic axis that reports a number of stats besides the mean value, depending on the selected loss_format. The node supports the majority of standard cross-validation schemes, and defaults to a sane 5-fold blockwise (non-randomized) CV, which is appropriate for time-series data. You can also specify a grouping structure (which is implicit if you are passing a list of packets) to perform grouped CV (where whole held-out groups of trials appear in the test set).
If your pipeline includes adaptive signal processing (e.g., whitening, PCA, ICA, etc.), you will need to pass continuous data with event markers into the cross-validation node, and include those adaptive steps in your pipeline. The node can run computations for multiple folds in parallel, if you specify num_procs as either None (defaults to all cores) or some number >1. You may also experiment with the number of threads that each process may use (num_threads_per_proc), since numpy can easily cause excessive thread churn (100% utilization). If you have multiple GPUs and your pipeline uses one or more GPU backends (e.g., jax or torch), then the node can spread the folds out over the GPUs, depending on num_procs_per_gpu and compute_backends. See the "return trained model" flag for how to obtain a model trained on the whole data from the cross-validation.
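
As a rough illustration of the default behavior (5-fold blockwise CV, a fresh untrained pipeline per fold, and MCR scoring), consider the following Python sketch; make_pipeline is a hypothetical factory standing in for your wired-in pipeline graph:

    import numpy as np

    def blockwise_folds(n, k=5):
        # contiguous (non-randomized) splits, appropriate for time-series data
        edges = np.linspace(0, n, k + 1).astype(int)
        for i in range(k):
            test = np.arange(edges[i], edges[i + 1])
            yield np.setdiff1d(np.arange(n), test), test

    losses = []
    for train_idx, test_idx in blockwise_folds(len(y), k=5):
        pipe = make_pipeline()                # fresh, untrained copy per fold
        pipe.fit(X[train_idx], y[train_idx])  # first call: training portion
        pred = pipe.predict(X[test_idx])      # later calls: test portion only
        losses.append(np.mean(pred != y[test_idx]))  # MCR loss
    print(np.mean(losses), np.std(losses))    # cf. the loss and loss_std outputs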

More Info...

Version 2.2.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data for evaluation.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: object (can be None)
    • data direction: IN
  • method
    Pipeline to evaluate.

    • verbose name: Method
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • method__signature
    Argument names of the pipeline being cross-validated. Your pipeline is a subgraph that must contain at least one Placeholder node whose slotname must match the argument name listed here. The placeholder then acts as the entry point for any data that is passed into the pipeline when it is invoked by the cross-validation. Your pipeline's final node (which typically produces the predictions) is then wired to the cross-validation's "method" input port. In graphical UIs, this edge will be displayed in dotted mode to indicate that this is not normal forward data flow, but that a subgraph (your pipeline) is being passed to the cross-validation node as a whole, which will then repeatedly invoke your pipeline graph. In summary, your pipeline starts with a Placeholder that is followed by some processing nodes (in the simplest case just a single machine-learning node, such as Linear Discriminant Analysis). The final node of your pipeline is the one whose outputs are taken to be the pipeline's predictions, and this node is wired into the "method" input of the Cross-validation. Any "loose ends" downstream of your placeholder are also considered to be part of the pipeline but do not contribute to the result (they may be used for other purposes, such as printing progress information). Your pipeline may optionally have a second placeholder, which should by convention have slotname set to is_training, and then is_training must be listed as the second argument here. This second placeholder is used to indicate whether your pipeline is currently being called on training data or test data. Regardless of whether you expose this parameter or not, the way your pipeline is executed by the cross-validation is as follows. For each fold in the cross-validation, your pipeline graph is instantiated from its default (uninitialized) state, and is then called with the training set of that fold. Then, the same graph is called again, but this time with the test set of that fold; it is then up to any adaptive nodes in your pipeline (e.g., machine learning nodes) to adapt themselves on the first call and to make predictions (usually without adapting again) on the second call. The pipeline is discarded after each fold and a new pipeline graph is instantiated (to avoid any unintended train/test leakage). Your pipeline MAY also return as its final output a two-element list whose first element is the predictions and whose second element is a model data structure (by convention usually a dictionary). If the cross-validation node has the return_model option enabled, then this output will be used to populate the "model" output port of the cross-validation node, and the "trained" output port will remain empty (for this your pipeline will be run once on the entire dataset, with no second invocation since there is no more test data).

    • verbose name: Method [Signature]
    • default value: (data)
    • port type: Port
    • value type: object (can be None)
  • scoring
    Optional custom scoring rule.

    • verbose name: Scoring
    • default value: None
    • port type: GraphPort
    • value type: Graph (can be None)
  • scoring__signature
    List of arguments of an optional scoring rule graph. This is an optional graph that implements the scoring (evaluation) rule of the CrossValidation node. If no graph is provided, the default behavior is equivalent to having wired in a graph here that consists of a Placeholder node (with slotname set to pred) connected to a MeasureLoss node (configured in accordance with the loss_metrics option), which is then wired into the "scoring" input of the cross-validation node. To override this default, you instead wire in your own graph, starting with a placeholder and followed by some computation that outputs a performance measure. This graph may also have an additional placeholder named verbose, which the Crossvalidation will use to print a desired subset of results.

    • verbose name: Scoring [Signature]
    • default value: (pred)
    • port type: Port
    • value type: object (can be None)
  • loss
    Aggregate loss measure (or multiple measures).

    • verbose name: Loss
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: OUT
  • loss_std
    Loss standard deviation across CV splits. This is only set if loss_format is 'plain'; otherwise it will be reported in the axes of the loss output.

    • verbose name: Loss Std
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: OUT
  • losses
    Per-fold loss measures across CV folds.

    • verbose name: Losses
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: OUT
  • predictions
    Concatenated per-sample predictions across all CV folds.

    • verbose name: Predictions
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: OUT
  • trained
    Trained pipeline.

    • verbose name: Trained
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • loss_metrics
    Loss metrics. This is used if no custom scoring function was provided. The following measures can be calculated both offline and online: MCR is mis-classification rate (aka error rate), MSE is mean-squared error, MAE is mean absolute error, Sign is the sign error, Bias is the prediction bias. The following measures are currently only available during offline computations: SMAE is the standardized mean absolute error, SMSE is the standardized mean-squared error, max is the maximum error, RMS is the root mean squared error, MedSE is the median squared error, MedAE is the median absolute error, SMedAE is the standardized median absolute error, AUC is negative area under ROC curve, R2 is the R-squared loss, CrossEnt is the cross-entropy loss, ExpVar is the negative explained variance.

    • verbose name: Output Metrics
    • default value: ['MCR']
    • port type: SubsetPort
    • value type: list (can be None)
  • loss_format
    Format of the loss output packet. If set to stats-axis, a statistics axis and a feature axis (one entry per measure) will be used. If set to 2-feature-axes, a second feature axis will be created instead of the stats axis. If set to plain, loss will hold the mean loss and loss_std the standard deviation; in case of legacy, these will be plain floats if only a single metric was selected.

    • verbose name: Output Format
    • default value: legacy
    • port type: EnumPort
    • value type: str (can be None)
  • folds
    Number of cross-validation folds. If >1 this is k-fold CV. If set to 1, this is leave-one-out CV. If folds is between 0 and 1, this is p-holdout CV (p being the percentage to hold out). If set to 0, the training-set error is computed. If a negative integer, this is p-holdout CV (p being the number of samples to hold out, where all permutations are generated; one may optionally set repeats to 0 to hold out exactly the last |p| sessions once).

    • verbose name: Cross-Validation Folds
    • default value: 5
    • port type: FloatPort
    • value type: float (can be None)
  • randomized
    Whether to perform randomized or blockwise cross-validation. Blockwise is preferred for data stemming from a time series.

    • verbose name: Randomized
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • stratified
    Use stratified CV. This means that all the folds have the same relative percentage of trials in each class.

    • verbose name: Stratified
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • repeats
    Number of repetitions. This is only useful in case of randomized CV.

    • verbose name: Number Of Repetitions
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • group_field
    Optionally a field indicating the group from which each trial is sourced. If given, the data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Group Field
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cond_field
    The name of the instance data field that contains the conditions (classes) to be discriminated. This parameter will be ignored if the input data packet has previously been processed by a BakeDesignMatrix node.

    • verbose name: Condition Field
    • default value: TargetValue
    • port type: StringPort
    • value type: str (can be None)
  • censor_labels
    Whether to censor labels from the model at prediction time as a safeguard against data leakage. Note that this feature requires the input and output data structures of the model to be identical at prediction time, and the input data cannot have multiple sets of labels in different parts of the data structure. Setting this to False does not imply that data leakage will happen (it should not in any case); this simply provides an additional safeguard. One case where you will need to set this to False is if, during development, you want to be able to debug test-time predictions inside the model (i.e., by setting a breakpoint), so that you can compare the predictions with their respective labels. Note that this setting only supports already-segmented data (i.e., trials with labels), not continuous data with segmentation happening inside the cross-validation method.

    • verbose name: Censor Labels
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • return_model
    Whether to also compute a model trained on the entire dataset. This will run one additional fold in which the training data is the whole dataset and there is no test data (this can be done in parallel with the other folds if the number of processes is set high enough). The cross-validation node will then populate the "trained" output port with a graph representing the trained pipeline, which can subsequently be used with, e.g., the "Call" node to invoke it on test data. Alternatively, if the pipeline was configured to return a two-element list whose second element is a model data structure, then the model returned in this fashion on that (training-only) fold will be used to populate the "model" output port of the cross-validation node instead. In this case, the trained output will remain empty.

    • verbose name: Return Trained Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • trial_size
    Assumed size of a trial. Formatted as in the Segmentation node's time_bounds argument, and given in seconds relative to the respective markers. This is used to be able to perform continuous-time selection around markers. Should be large enough to encompass the segment size used in the Segmentation step of the given method, if any. You can also make it longer to retain more continuous data around the ends of your training or test data, e.g., to make available to continuous-time filters etc. If this is left unspecified, it will be inferred from the pipeline's Segmentation node.

    • verbose name: Segment Span
    • default value: []
    • port type: ListPort
    • value type: list (can be None)
  • exclude_margin
    Exclude any training trials that fall within this many seconds of test trials.

    • verbose name: Exclude Margin
    • default value: 5.0
    • port type: FloatPort
    • value type: float (can be None)
  • tight_selection
    If True, the selected stretches of continuous data (if any) will be tightly enclosing. Otherwise, the selection will run to the edges of the dataset where possible. This only applies if continuous (non-segmented) data is passed into the cross-validation node.

    • verbose name: Tight Selection
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • num_procs
    Number of processes to use for parallel computation. If None, the global setting NUM_PROC, which defaults to the number of CPUs on the system, will be used.

    • verbose name: Max Parallel Processes
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • num_threads_per_proc
    Number of threads to use for each process. This can be used to limit the number of threads used by each process to mitigate potential churn.

    • verbose name: Threads Per Process
    • default value: 4
    • port type: IntPort
    • value type: int (can be None)
  • compute_backends
    GPU compute backends that may be used by the pipeline. If you include GPU compute backends here, workloads using those backends will be farmed out across multiple GPUs (if available) when running cross-validation folds in parallel. The 'auto' mode will attempt to auto-detect any backend settings in the given pipeline's nodes, but note that this will only catch nodes where this is explicit in the node's properties, and GPU workloads missed in this fashion will run by default on GPU 0.

    • verbose name: Compute Backends
    • default value: ['auto']
    • port type: SubsetPort
    • value type: list (can be None)
  • num_procs_per_gpu
    Number of processes to use per GPU. This is only relevant if you have GPU compute backends enabled. If your GPU(s) are under-utilized during cross-validation, you can increase this to run this many CV folds on each GPU.

    • verbose name: Processes Per Gpu
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • multiprocess_backend
    Backend to use for farming out computation across multiple (CPU) processes. Multiprocessing is the simple Python default, which is not a bad start. Nestable is a version of multiprocessing that allows your pipeline to itself use parallel computation. Loky is a fast and fairly stable backend, but it does not support nested parallelism and has different limitations than multiprocessing. It can be helpful to try either if you are running into an issue trying to run something in parallel. Serial means to not run things in parallel but instead in series (even if num_procs is >1), which can help with debugging. Threading uses Python threads in the same process, but this is not recommended for most use cases due to what is known as GIL contention.

    • verbose name: Multiprocess Backend
    • default value: loky
    • port type: EnumPort
    • value type: str (can be None)
  • serial_if_debugger
    If True, then if the Python debugger is detected, the node will run in serial mode, even if multiprocess_backend is set to something else. This is useful for debugging, since the debugger does not work well with parallel processes. This can be disabled if certain steps should nevertheless run in parallel (e.g., to reach a breakpoint more quickly).

    • verbose name: Serial If Debugger
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • clear_memory_per_fold
    If enabled, then the memory will be cleared after each fold. This is useful if you are running out of memory during cross-validation, but it will slow down the computation somewhat. The auto mode will attempt to auto-determine a sensible setting depending on the model used. The aggressive mode will force each task to run in a separate subprocess, and will reclaim the subprocess after the fold, which guarantees that memory will be reclaimed; this should be considered a last resort, since it will slow down the computation.

    • verbose name: Clear Memory Per Fold
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)
  • ignore_bad_packets
    If input is given as a list of packets, ignore packets that are bad (e.g., due to missing labels). This will print a warning; otherwise an exception will be raised.

    • verbose name: Ignore Bad Packets
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • trial_padding
    Padding around the specified trial length. This is to avoid cutting the signal time axis too short near the edge of segments. Note that for very low-sampling rate data, e.g., NIRS, you may need a larger margin.

    • verbose name: Trial Padding
    • default value: 0.3
    • port type: FloatPort
    • value type: float (can be None)
  • random_seed
    Random seed (int or None). Different values will give somewhat different outcomes.

    • verbose name: Random Seed
    • default value: 12345
    • port type: IntPort
    • value type: int (can be None)
  • verbose
    Whether to print verbose output.

    • verbose name: Verbose
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • loss_metric
    Legacy loss metric. This has been superseded by loss_metrics, which can take multiple metrics.

    • verbose name: Loss Metric (Legacy)
    • default value: MCR
    • port type: EnumPort
    • value type: str (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

ElasticNetRegression

Estimate a continuous output value from features using linear regression with Elastic Net regularization.

Elastic net regression is a principled statistical technique to learn a linear mapping between input data and desired output values from training data. Elastic net combines the regularization terms of ridge regression and LASSO regression, thereby gaining some of the benefits of both techniques. Most importantly, elastic net overcomes a shortcoming of LASSO in the presence of multiple highly correlated features, which is that such features will usually not receive similar weights, but instead often rather arbitrary relative weights. As such, elastic net is a preferable technique if sparse feature selection as provided by LASSO is desirable, but features may also be correlated. Elastic net includes a tuning parameter that allows blending between the l1 (LASSO) and l2 (ridge) regularization terms, which is automatically tuned together with the overall regularization strength parameter using cross-validation. For this reason, this method can at the extremes imitate either of the two algorithms if that is beneficial given the data. Due to the need to tune the additional parameter, the running time will however be significantly longer than using either method directly. If there are very few trials, or some extensive stretches of the data exhibit only one class, the procedure used to find the regularization parameters (cross-validation) can fail with an error that there were too few or no trials of a given class present. Elastic net regression assumes that both inputs and outputs are Gaussian-distributed, that is, have no or very few major statistical outliers. If the output follows a radically different distribution, for instance between 0 and 1, or nonnegative, or discrete values, then different methods may be more appropriate (for instance, classification methods for discrete values). To ameliorate the issue of outliers in the data, the raw data can be cleaned of artifacts with various artifact removal methods. To the extent that the assumptions hold true, this method is highly competitive with other (sparse) linear methods. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration.
Once this node is calibrated, the trainable state of this node can be saved to a model file and later loaded for continued use.
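
For reference, the parameter search described above is roughly analogous to scikit-learn's ElasticNetCV, sketched below with values mirroring this node's defaults (this is an illustration, not this node's implementation):

    from sklearn.linear_model import ElasticNetCV

    model = ElasticNetCV(
        l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1],  # cf. l1_ratio
        n_alphas=100,     # cf. num_alphas (size of the strength search grid)
        eps=0.001,        # cf. min_alpha (fraction of the maximum strength)
        cv=5,             # cf. num_folds
        max_iter=1000,    # cf. max_iter
        tol=0.0001,       # cf. tolerance
    )
    model.fit(X_train, y_train)     # X_train: (n_trials, n_features)
    y_pred = model.predict(X_test)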

More Info...

Version 1.1.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • num_alphas
    Number of values in regularization strength search grid. This method determines the optimal regularization strength by testing a number of different strength values between the minimum and maximum value. The running time increases with a finer search grid, but the found solution may be slightly better regularized when using a fine grid (e.g., 100 values) instead of a coarse grid (e.g., 20 values).

    • verbose name: Number Of Values In Reg. Search Grid
    • default value: 100
    • port type: IntPort
    • value type: int (can be None)
  • num_folds
    Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training. Setting this to one will yield leave-one-out CV, and if a group field was specified, then leave-one-group-out CV.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • l1_ratio
    Tradeoff parameter between the l1 and l2 penalties. This parameter controls the balance between the l1 regularization term (as in LASSO) and the l2 regularization term (as in ridge regression). A value of 0 leads to exclusive use of l2, and a value of 1 leads to exclusive use of l1. This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). In fact, for each setting of this parameter, the entire list of possible values for the regularization strength is tested, so the running time is also proportional to the number of those values. The details of the parameter search can be controlled via the search metric and number of folds parameters. The default search range is intentionally coarse for quick running times; refine it to smaller steps to obtain potentially better solutions, but do not expect massive gains from refining.

    • verbose name: Regularization Type Tradeoff Parameter
    • default value: [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1]
    • port type: ListPort
    • value type: list (can be None)
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions).

    • verbose name: Maximum Number Of Iterations
    • default value: 1000
    • port type: IntPort
    • value type: int (can be None)
  • tolerance
    Convergence tolerance. This is the desired error tolerance, or acceptable inaccuracy, in the solution. Using larger values gives less accurate results, but will lead to faster compute times. Note that, for biosignal-driven machine learning systems, one often does not need very small tolerances.

    • verbose name: Tolerance
    • default value: 0.0001
    • port type: FloatPort
    • value type: float (can be None)
  • num_jobs
    Number of parallel compute jobs. This value only affects the running time and not the results. Values between 1 and twice the number of CPU cores make sense to expedite computation, but may temporarily reduce the responsiveness of the machine. The value of -1 stands for all available CPU cores.

    • verbose name: Number Of Parallel Jobs
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • include_bias
    Include a bias term. If false, your features need to be centered, or include a dummy feature set to 1.

    • verbose name: Include Bias Term
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • normalize_features
    Normalize features. Should only be disabled if the data comes with a predictable scale (e.g., normalized in some other way).

    • verbose name: Normalize Features
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • min_alpha
    Minimum regularization strength. This is expressed as a factor of the maximum regularization strength, which is calculated from the data. By default, the optimal strength will be searched between this value and the maximum.

    • verbose name: Minimum Regularization Strength
    • default value: 0.001
    • port type: FloatPort
    • value type: float (can be None)
  • alphas
    Override regularization strength. Optionally, the default regularization search grid can be overridden by giving an explicit list of values here, although this is rarely necessary or very helpful. This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). The details of the parameter search can be controlled via the search metric and number of folds parameters. Larger values cause stronger regularization, that is, less risk of the method over-fitting to random details of the data, and thus better generalization to new data. A value of 0 means no regularization, and there is no upper limit on the values that can be given here -- however, depending on the scale of the data and the number of trials, there is a cutoff beyond which all features are weighted by zero, and are thus unused. Often one covers a range between 0.1 and 10, and at times 0.01 to 100. Typically the values here are not linearly spaced, but follow an exponential progression (e.g., 0.25, 0.5, 1, 2, 4, 8, etc.). The default search range is intentionally coarse for quick running times; refine it to smaller steps to obtain potentially better solutions, but do not expect massive gains from refining.

    • verbose name: Override Regularization Strength
    • default value: None
    • port type: ListPort
    • value type: list (can be None)
  • selection
    Parameter update schedule. Using random here can be significantly faster when higher tolerance values are used.

    • verbose name: Parameter Update Schedule
    • default value: cyclic
    • port type: EnumPort
    • value type: str (can be None)
  • positivity_constraint
    Constrain weights to be positive. This is a special (and rarely-used) flavor of this method, in which the learned weights are constrained to be positive.

    • verbose name: Constrain Weights To Be Positive
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • precompute
    Precompute shared data. Precompute some shared data that is reused during parameter search. Aside from 'auto', this can be set to True or False. Auto attempts to determine the best choice automatically.

    • verbose name: Precompute Shared Data
    • default value: auto
    • port type: Port
    • value type: object (can be None)
  • random_seed
    Random seed. Different values may give slightly different outcomes.

    • verbose name: Random Seed
    • default value: 12345
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
    Optionally a field indicating the group from which each trial is sourced. If given, the data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials with each label. Note that this requires labels to be quantized or binned to be meaningful.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model will be incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

EnsemblePredictor

A model that combines outputs from multiple models trained on subsets of the data.

The node accepts a wired-in base machine-learning pipeline (via the method input), which is instantiated multiple times and trained with different subsets of the data and/or random seeds. This node then combines the predictions of the submodels using the specified rule to form a consensus prediction. Such models tend to be less prone to overfitting the training data. Note that this node can be configured to reproduce the strategy commonly known as bagging (when setting sampling with replacement to True, using a subset size of 1, and using the mean or voting integration rule), the strategy known as pasting (when setting sampling with replacement to False, using a subset size of 0.5-0.8, and using either the mean or voting rule), as well as a robust approach wherein the median is used in conjunction with a subset size of 0.2-0.4 (this is conceptually similar to Theil-Sen estimation); however, note that this last strategy is often outperformed by natively robust methods and is therefore mainly useful if there is no other robust option available. Since the computational cost is higher by a factor of the number of models, this node is often used only in the final stages of model development, when the underlying base model is largely finalized; however, it can also be used during development if the base model is otherwise prone to overfitting or has robustness issues.
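
The following numpy sketch shows the core idea under stated assumptions (Model is a hypothetical base estimator standing in for the wired-in pipeline); with sampling without replacement and a 0.75 subset, this corresponds to pasting:

    import numpy as np

    rng = np.random.default_rng(12345)
    n, num_models, subset = len(y), 10, 0.75
    models = []
    for _ in range(num_models):
        idx = rng.choice(n, size=int(subset * n), replace=False)  # True = bagging
        models.append(Model().fit(X[idx], y[idx]))  # hypothetical base estimator

    preds = np.stack([m.predict(X_test) for m in models])
    consensus = np.mean(preds, axis=0)  # 'mean' rule; np.median for 'median',
                                        # or a majority vote for 'voting'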

Version 0.7.1

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: AnyNumeric (can be None)
    • data direction: INOUT
  • method
    Underlying base method.

    • verbose name: Method
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • method__signature
    Argument names of an underlying base model. Your base model is a subgraph that must contain at least one Placeholder node whose slotname must match the argument name listed here. The placeholder then acts as the entry point for any data that is passed into the pipeline when it is invoked by the ensembling node. Your pipeline's final node (which typically produces the predictions) is then wired to the ensembling node's "method" input port. In graphical UIs, this edge will be displayed in dotted style to indicate that this is not normal forward data flow, but that a subgraph (your pipeline) runs under the control of the ensembling node. In summary, your pipeline starts with a Placeholder that is followed by some processing nodes (in the simplest case just a single machine-learning node, such as Linear Discriminant Analysis). The final node of your pipeline is the one whose outputs are taken to be the pipeline's predictions, and this node is wired into the "method" input of this node. As always in NeuroPype, any "loose ends" downstream of your placeholder are also considered to be part of the pipeline but do not contribute to the result (they may be used for other purposes, such as printing progress information). Your pipeline will generally be instantiated multiple times, and at training time, each instance will be given a different part of the data. Your pipeline may optionally have additional placeholders, as follows: an is_training placeholder (used to indicate whether your pipeline is currently being called on training data or test data), a random_seed placeholder (which receives a different random seed than the respective other models), and an index placeholder (indicating the 0-based index of the current model).

    • verbose name: Method [Signature]
    • default value: (data)
    • port type: Port
    • value type: object (can be None)
  • enabled
    Whether to enable ensembling. If this is unset, the node will have essentially no effect (other than passing the configured random seed directly to the model, if the model has such a placeholder).

    • verbose name: Enabled
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • num_models
    Number of models to train.

    • verbose name: Num Models
    • default value: 10
    • port type: IntPort
    • value type: int (can be None)
  • data_subset
    Fraction of the data to use for training each model. Can also be given as a negative integer, in which case the value encodes the number of data points to hold out. The typical setup when using sampling without replacement is 50-80% of the data, which helps primarily with overfitting when used in conjunction with the mean or voting rule. It is possible to configure the approach to be additionally robust to a small number of outliers in the data by using the median integration rule and choosing a dataset subset that is strictly smaller than the breakdown point of the median estimator (i.e., less than 0.5), for example 0.33. This is the same idea also exploited in the Theil-Sen estimator (when used with random subsets).

    • verbose name: Data Subset
    • default value: 0.75
    • port type: FloatPort
    • value type: float (can be None)
  • rule
    The rule used to combine predictions across models. The mean is the typical approach for regression or probabilistic outputs, which helps with overfitting. The median is a robust alternative to the mean that is less affected by outliers, but note that you may need a low subset fraction to make this work well (e.g., 0.25-0.33). Voting predicts the most likely class label according to each submodel and then takes the majority vote; this approach is also fairly robust to outliers, but note that it erases the probabilistic information in the predictions.

    • verbose name: Rule
    • default value: mean
    • port type: EnumPort
    • value type: str (can be None)
  • with_replacement
    Whether to sample with replacement. If set, this implements bootstrap aggregation ("bagging"), and if unset, it implements a technique that has been called "pasting". Both techniques have pros and cons; bagging is more commonly used, but can cause problems with certain types of models, for example those that perform internal cross-validations (e.g., ParameterOptimization, ProbabilityCalibration) or ones that aggregate trials within subject (e.g., TrialAggregatePredictor).

    • verbose name: Sample With Replacement
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • stratified
    Whether to stratify the sampling by class labels. This helps ensure that the proportion of class labels in each subset is representative of the proportion on the full dataset. This is only implemented for the case where each item (e.g., packet) has a single label. This can be useful for small datasets.

    • verbose name: Stratified
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • cond_field
    The name of the instance data field that contains the conditions (classes) to be discriminated; used for stratified sampling.

    • verbose name: Condition Field
    • default value: TargetValue
    • port type: StringPort
    • value type: str (can be None)
  • num_procs
    Number of processes to use for parallel computation. If None, the global setting NUM_PROC, which defaults to the number of CPUs on the system, will be used.

    • verbose name: Max Parallel Processes
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • num_threads_per_proc
    Number of threads to use for each process. This can be used to limit the number of threads used by each process to mitigate potential churn.

    • verbose name: Threads Per Process
    • default value: 4
    • port type: IntPort
    • value type: int (can be None)
  • compute_backends
    GPU compute backends that may be used by the pipeline. If you include GPU compute backends here, workloads using those backends will be farmed out across multiple GPUs (if available) when training ensemble models in parallel. The 'auto' mode will attempt to auto-detect any backend settings in the given pipeline's nodes, but note that this will only catch nodes where this is explicit in the node's properties, and GPU workloads missed in this fashion will run by default on GPU 0.

    • verbose name: Compute Backends
    • default value: ['auto']
    • port type: SubsetPort
    • value type: list (can be None)
  • num_procs_per_gpu
    Number of processes to use per GPU. This is only relevant if you have GPU compute backends enabled. If your GPU(s) are under-utilized during training, you can increase this to run this many models on each GPU.

    • verbose name: Processes Per Gpu
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • multiprocess_backend
    Backend to use for farming out computation across multiple (CPU) processes. Multiprocessing is the simple Python default, which is not a bad start. Nestable is a version of multiprocessing that allows your pipeline to itself use parallel computation. Loky is a fast and fairly stable backend, but it does not support nested parallelism and has different limitations than multiprocessing. It can be helpful to try either if you are running into an issue trying to run something in parallel. Serial means to not run things in parallel but instead in series (even if num_procs is >1), which can help with debugging. Threading uses Python threads in the same process, but this is not recommended for most use cases due to what is known as GIL contention.

    • verbose name: Multiprocess Backend
    • default value: serial
    • port type: EnumPort
    • value type: str (can be None)
  • serial_if_debugger
    If True, then if the Python debugger is detected, the node will run in serial mode, even if multiprocess_backend is set to something else. This is useful for debugging, since the debugger does not work well with parallel processes. This can be disabled if certain steps should nevertheless run in parallel (e.g., to reach a breakpoint more quickly).

    • verbose name: Serial If Debugger
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model will be incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • random_seed
    Seed for any pseudo-random choices during training. This can be either a splittable seed as generated by Create Random Seed or a plain integer seed.

    • verbose name: Random Seed
    • default value: 12345
    • port type: Port
    • value type: AnyNumeric
  • verbosity
    Verbosity level for diagnostics.

    • verbose name: Verbosity
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

HierarchicalDiscriminantComponentAnalysis

Use Hierarchical Discriminant Component Analysis (HDCA) to classify data instances.

This node learns a hierarchical linear model that is usually applied to band-pass filtered and segmented neural data, for example EEG. When applied to such data, the method learns a model of the time-domain (event-related potential like) signal to explain the dependent variable (e.g., to classify the EEG). HDCA can also be thought of as a hierarchical version of linear discriminant analysis (LDA), where LDA is first applied to each time slice individually, reducing the data to a single dimension per time slice, and then a second-level classifier (usually again an LDA) is applied to the resulting set of 1-d features. HDCA is known to perform well on EEG segments at full resolution (i.e., little to no reduction in channel density and little to no reduction in sampling rate) compared to many other methods, since it is agnostic to (often spurious) correlations between channels at different time points. In NeuroPype, HDCA is usually used in conjunction with shrinkage regularization (using shrinkage LDA, or sLDA) as the building block and has good defaults. It can also be generalized to use other types of linear models, such as logistic regression or ridge regression, which may have a different robustness profile or can be used as a stepping stone to simulate the behavior of more complex models (e.g., using the ConvexModel node). The regression mode is useful for use with continuous labels. Another generalization is to multi-stream data (e.g., from multiple modalities such as EEG and eye tracking), where the data is fused either at the level of temporal features or at the level of streams (the latter yields a 3-level hierarchy).
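
A conceptual two-level sketch of the hierarchy (binary case), using scikit-learn's shrinkage LDA for both levels; the node itself supports more model types, automatic shrinkage, and multi-stream fusion:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

    # X: segmented data of shape (n_trials, n_channels, n_times); y: class labels
    n_trials, n_channels, n_times = X.shape
    per_slice = [LDA(solver='lsqr', shrinkage=0.01).fit(X[:, :, t], y)
                 for t in range(n_times)]        # first level: one sLDA per time slice
    scores = np.column_stack([m.decision_function(X[:, :, t])  # one 1-d feature per slice
                              for t, m in enumerate(per_slice)])
    top = LDA(solver='lsqr', shrinkage=0.01).fit(scores, y)    # second level across slices
    proba = top.predict_proba(scores)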

Version 1.1.1

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • shrinkage_within
    Regularization strength within each time slice. If using 'auto', the parameter is estimated automatically (using the Ledoit-Wolf method for discriminant-type models, GCV for regression, and k-fold cross-validation for logistic models). Otherwise, this is a number (0-1 for discriminant or typically 1e-3 to 1e3 for the other models).

    • verbose name: Shrinkage Within
    • default value: 0.01
    • port type: Port
    • value type: object
  • shrinkage_across
    Regularization strength across time slices and/or modalities. Other details same as documented in the shrinkage within parameter.

    • verbose name: Shrinkage Across
    • default value: 0.01
    • port type: Port
    • value type: object
  • model_within
    Type of linear model used within each time slice. See also model_across for model types that operate across time slices. The discriminant option is a shrinkage linear discriminant analysis (sLDA), while the regression option is a ridge regression formulation, and logistic is l2-regularized logistic regression. Note that, even though discriminant and logistic typically output discrete values, in this setting only the learned linear mapping is applied, yielding continuous-valued features. Note that for models other than discriminant, it can be useful to scale features prior to this node, especially if the feature scale is very far from unity or very uneven. Huber, Theil-Sen, RANSAC, and RLDA are robust variants of regression with different performance characteristics, of which Huber should be the first choice. The parameter for Huber is the cutoff in standard deviations beyond which residuals are treated as outliers, and defaults to a value that yields 95% statistical efficiency. The parameters for Theil-Sen are the maximum number of subpopulations to consider and the number of subsamples to draw; this estimator can be quite slow (e.g., 10x slower than Huber). The parameters for RANSAC are the name of the underlying base estimator, which can be 'ols', 'discriminant', 'regression', or 'logistic', optionally the minimum fraction or count of samples to use (unless this is 'ols', this argument must be given), and optionally the threshold in terms of target value error at or below which a data point is flagged as an inlier (defaulting to MAD(TargetValue)). The minimum fraction is a fairly important parameter since it cannot be too small (yielding an ill-posed estimator) or too large (being unlikely to avoid outliers). RLDA is an experimental robust LDA method whose parameters are the covariance estimator ('mcd', 'sgd', 'batch'), the huber threshold (ignored by the mcd method), and optionally an algorithm-specific parameter, which is the base learning rate for the sgd method, the block size for the batch method, and the maximum contamination fraction for the mcd method. One may also specify a separate value per stream by giving this parameter as a dictionary with modality names as keys and model types as values. One may also use the '~' key to specify a catch-all model type for stream types not otherwise listed, as in {'eeg': 'discriminant', '~': 'huber(1.35)'}.

    • verbose name: Model Within
    • default value: discriminant
    • port type: ComboPort
    • value type: str (can be None)
  • model_across
    Type of (generalized) linear model to learn across time slices and/or modalities; see also model_within for model types that apply within each time slice. The discriminant option is sLDA (see model_within), regression is a ridge regression, and logistic is a logistic regression. Of these, only the ridge regression is suitable for continuous labels. Huber is a robust variant of regression. This can also be a per-stream setting; in this case, and assuming that there are multiple streams, there will be a hierarchy layer that goes across streams (whether temporal when doing temporal multi-stream fusion, or late when doing late fusion); in either case, the model form for this multi-stream "pseudo-modality" can be specified using a dedicated key named '*' (otherwise this defaults to logistic).

    • verbose name: Model Across
    • default value: discriminant
    • port type: ComboPort
    • value type: str (can be None)
  • fusion
    Multi-stream fusion strategy. The temporal strategy will fuse multiple streams (if any) at the level of temporal features, and allow the second-level classifier to learn interactions between streams in terms of their detailed time course. The late strategy will fuse streams at the end ('late fusion'), in which case each stream is reduced to a single score, and a third-level classifier is applied to those scores. If each stream stems from a different modality, then this is the multi-modal fusion strategy. The skip option causes the node to skip the fusion step and output the features generated by the temporal level.

    • verbose name: Multi-Stream Fusion
    • default value: temporal
    • port type: EnumPort
    • value type: str (can be None)
  • probabilistic
    Output probabilities instead of class labels.

    • verbose name: Probabilistic
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • class_weights
    Per-class weight (dictionary). This is formatted as a Python-style dictionary that assigns to each numeric class (e.g., 0, 1) a weighting (e.g., 1.0). The weights represent the a priori ("prior") probability of encountering a specific class that the model shall assume. The weights will be renormalized so that they add up to one. Example syntax: {'0': 0.5, '1': 0.5} (note the quotes before the colons).

    • verbose name: Class Weights
    • default value: None
    • port type: Port
    • value type: object (can be None)
  • cond_field
    The name of the instance data field that contains the conditions to be discriminated. This parameter will be ignored if the packet has previously been processed by a DescribeStatisticalDesign node.

    • verbose name: Cond Field
    • default value: TargetValue
    • port type: StringPort
    • value type: str (can be None)
  • tolerance
    Convergence tolerance. Smaller values result in more accurately estimated models at the cost of performance. The exact meaning depends on the model type used (for discriminant analysis, this is the accuracy of rank estimation in the SVD and is an absolute threshold, meaning that it depends on the scale of the data).

    • verbose name: Tolerance
    • default value: None
    • port type: FloatPort
    • value type: float (can be None)
  • solver_type
    Type of solver to use. Auto is usually a good choice for EEG data. For logistic regression, other solvers, including newton-cg, newton-cholesky, and liblinear, are available and may improve performance.

    • verbose name: Solver Type
    • default value: auto
    • port type: ComboPort
    • value type: str (can be None)
  • solver_maxiter
    Max number of iterations for use with an iterative solver. This currently applies only to the logistic formulation, in which case the default is 100.

    • verbose name: Max Iterations
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • solver_extra
    Extra solver options. This is a dictionary that can be used to pass additional options to the solver. The exact options depend on the solver used.

    • verbose name: Solver Extra
    • default value: None
    • port type: DictPort
    • value type: dict (can be None)
  • initialize_once
    If set to True, the model is trained only once, on the first calibration data received. If set to False, the node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Initialize Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • backend
    Compute backend to use. This is only used by a few solvers (specifically the rlda method); selecting torch will use a multi-core implementation that can be faster than numpy, but is not necessarily more efficient in terms of compute per core.

    • verbose name: Backend
    • default value: numpy
    • port type: EnumPort
    • value type: str (can be None)
  • verbose
    Enable verbose output.

    • verbose name: Verbose
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
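
As referenced in the model_within description above, the dictionary-valued parameters of this node are written as ordinary Python literals. The following sketch merely restates the syntax documented above and is not tied to any particular pipeline API:

    # Per-stream model types: shrinkage LDA for EEG streams, and Huber
    # regression (outlier cutoff at 1.35 standard deviations) for any stream
    # type not otherwise listed ('~' is the catch-all key).
    model_within = {'eeg': 'discriminant', '~': 'huber(1.35)'}

    # When fusing multiple streams, the model form for the cross-stream
    # "pseudo-modality" is set via the dedicated '*' key (defaults to
    # logistic if omitted).
    model_across = {'eeg': 'discriminant', '*': 'logistic'}

    # Per-class prior probabilities; note the quoted numeric class labels.
    # The weights are renormalized so that they add up to one.
    class_weights = {'0': 0.5, '1': 0.5}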

InjectCalibrationData

Insert calibration data into the stream.

This node is useful when you have a pipeline that receives some data from a data source (e.g., an LSL Input node) and performs some processing on the data (e.g., making predictions using some machine learning nodes) -- and this pipeline requires calibration on some previously recorded calibration data. This node can be inserted between the data source and the first adaptive processing step in your pipeline, and what it allows you to do is as follows: the node has a second input port, which you can wire to some node(s) that import your calibration data (e.g., Import XDF). Then, the Inject Calibration Data node will first let through the calibration recording, which will then trickle down your subsequent processing pipeline, giving every node a chance to calibrate itself. After that, the Inject Calibration Data node will let through the regular streaming data from your actual data source, so that the pipeline can do its regular processing, now that it is calibrated. Note that, since on the first tick this node outputs the calibration data, the question arises as to what happens to any streaming data that also came into the Inject Calibration Data node on that same first tick -- you can choose to either emit it on the next tick, and thus delay this and all subsequent streaming packets by one tick (1/25th of a second at NeuroPype's default update rate), or you can choose to drop it. Tip: if you want to collect the calibration data on the fly at the beginning of a real-time session instead of using a previous recording, you can instead use the Accumulate Calibration Data node.
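
Schematically, the wiring described above looks as follows. This is a topology sketch with hypothetical constructor and connection helpers (this document does not specify the pipeline-construction API); only the port names streaming_data, calib_data, and data are taken from the port list below:

    # Hypothetical wiring sketch; the constructor and connect() helpers are
    # illustrative only, while the port names match this node's ports.
    source = LSLInput()                          # live data source
    calib = ImportXDF(filename='calib.xdf')      # previously recorded calibration data
    inject = InjectCalibrationData()
    model = LinearDiscriminantAnalysis()         # first adaptive processing step

    connect(source.data, inject.streaming_data)  # regular streaming input
    connect(calib.data, inject.calib_data)       # calibration recording
    connect(inject.data, model.data)             # calibration passes through first;
                                                 # streaming data follows once calibrated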

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • streaming_data
    Streaming data.

    • verbose name: Streaming Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: IN
  • calib_data
    Calibration data.

    • verbose name: Calib Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: IN
  • data
    Output data.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: OUT
  • delay_streaming_packets
    Whether streaming packets should be delayed by one tick, or whether the first streaming packet should be dropped. If enabled, the first streaming packet will be buffered by this node, and emitted on the next tick (since on the first tick this node outputs the calibration data); all subsequent streaming packets will naturally also have to be delayed by one tick. If disabled, then the first streaming packet will be dropped, and there is no delay. For streaming processing, it is usually best to drop the packet, since incoming streaming data can generally not be acted on before the pipeline is finished calibrating. However, if the data on the streaming input port is actually a single packet holding a whole recording that shall be processed in an offline fashion, then it must be delayed by one tick, since dropping it would drop all the data.

    • verbose name: Delay Streaming Packets
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

KNNImputation

Impute missing data with an (optionally weighted) average of the k nearest neighbors (KNN).

This method is useful for filling in missing data (encoded by the presence of NaNs) in a multivariate manner. The method is stateful and will by default learn an imputation model on the first data that it is called with, and then apply this model on subsequent invocations.
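
As a behavioral reference, the core computation is analogous to scikit-learn's KNNImputer; the sketch below illustrates that analogue only and does not capture this node's axis-aware semantics or statefulness:

    # Behavioral sketch using scikit-learn; NaNs mark missing entries in a 2D
    # (samples x features) array. This node's defaults correspond to 5
    # neighbors with uniform weighting; weights='distance' selects the
    # inverse-distance scheme instead.
    import numpy as np
    from sklearn.impute import KNNImputer

    X = np.array([[1.0, 2.0, np.nan],
                  [3.0, 4.0, 3.0],
                  [np.nan, 6.0, 5.0],
                  [8.0, 8.0, 7.0]])

    imputer = KNNImputer(n_neighbors=2, weights='uniform')
    X_filled = imputer.fit_transform(X)  # each NaN replaced by a neighbor average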

More Info...

Version 0.8.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • domain_axes
    Axes across which information may be "moved" when imputing missing data. These represent the input domain across which k-NN imputation is applied. Multiple axes may be given by providing a comma-separated list here; see also the predefined drop-down options. The special value (all others) stands for all axes that are not listed in either of the other two listings.

    • verbose name: Impute Across Axes
    • default value: (all others)
    • port type: ComboPort
    • value type: str (can be None)
  • aggregate_axes
    Axes that have the statistical observations in them. The elements along these axes are treated as the trials or samples that provide redundant observations of the same underlying distribution. See also the previous setting. This is almost always the instance axis (especially if the data has already been segmented, i.e., if the Segmentation node was used), but in some cases it may also be the time axis, or occasionally other axes.

    • verbose name: Treat Elements As Trials/samples Along Axes
    • default value: instance
    • port type: ComboPort
    • value type: str (can be None)
  • separate_axes
    Axes along which to learn separate imputation models. This can be used if it is known that data at one element of this axis is independent from and shares no information with data at another element, so that imputation is best performed separately for each element along this axis. This can also be used if correlations are merely weak across this axis, so that separate models are more accurate.

    • verbose name: Compute Separate Models Along Axes
    • default value:
    • port type: ComboPort
    • value type: str (can be None)
  • num_neighbors
    Number of neighbors to use for imputation. This is the k in k-NN.

    • verbose name: Number Of Neighbors
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • weighting
    Weighting scheme for neighbors. The uniform scheme gives equal weight to all neighbors, while the distance scheme gives weight inversely proportional to the distance to the neighbor. Distance can be more accurate, but in high-dimensional models or with low numbers of neighbors, this can perform less well than the uniform scheme.

    • verbose name: Weighting
    • default value: uniform
    • port type: EnumPort
    • value type: str (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

LASSORegression

Estimate a continuous output value from features using LASSO Regression.

LASSO regression is a straightforward and principled statistical technique to learn a linear mapping between input data and desired output values from training data. LASSO differs from ridge regression in that LASSO will identify and select a small number of relevant features, whereas ridge regression will usually weight every feature to some extent, however small. As a result, LASSO regression can be used when there is a very large number of irrelevant features in the data, as long as the relevant features are relatively few. This is also called sparsity regularization. If there are very few trials, or some extensive stretches of the data exhibit only one class, the procedure used to find the optimal regularization strength (cross-validation) can fail with an error that there were too few or no trials of a given class present. LASSO regression assumes that both inputs and outputs are Gaussian distributed, that is, have no or very few major statistical outliers. If the output follows a radically different distribution, for instance between 0 and 1, or nonnegative, or discrete values, then different methods may be more appropriate (for instance, classification methods for discrete values). To ameliorate the issue of outliers in the data, the raw data can be cleaned of artifacts with various artifact removal methods. To the extent that the assumptions hold true, this method is highly competitive with other (sparse) linear methods. There are several other nodes that implement similar methods that have advantages in specific circumstances. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, the trainable state of this node can be saved to a model file and later loaded for continued use.
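
For reference, LASSO finds a weight vector w minimizing ||y - Xw||^2 + alpha*||w||_1, where the l1 penalty drives the weights of irrelevant features to exactly zero. The training step is closely analogous to scikit-learn's LassoCV, sketched below under the assumed parameter correspondence num_alphas -> n_alphas, min_alpha -> eps, num_folds -> cv, and tolerance -> tol (a reference sketch, not this node's actual implementation):

    # Reference sketch: cross-validated LASSO on synthetic data in which only
    # 3 of 50 features are relevant, so the selected model should be sparse.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(12345)
    X = rng.standard_normal((200, 50))       # 200 trials, 50 features
    w = np.zeros(50)
    w[:3] = [1.5, -2.0, 1.0]                 # only the first 3 features matter
    y = X @ w + 0.1 * rng.standard_normal(200)

    # cv=5 without shuffling splits the trials into contiguous blocks,
    # matching the blockwise cross-validation described below.
    model = LassoCV(n_alphas=100, eps=1e-3, cv=5, max_iter=1000, tol=1e-4,
                    n_jobs=1).fit(X, y)
    print(model.alpha_)                      # selected regularization strength
    print(np.flatnonzero(model.coef_))       # indices of the retained features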

More Info...

Version 1.1.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • num_alphas
    Number of values in regularization search grid. This method determines the optimal regularization strength by testing a number of different strength values between the minimum and maximum value. The running time increases with a finer search grid, but the found solution may be slightly better regularized when using a fine grid (e.g., 100 values) instead of a coarse grid (e.g., 20 values).

    • verbose name: Number Of Values In Reg. Search Grid
    • default value: 100
    • port type: IntPort
    • value type: int (can be None)
  • num_folds
    Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions).

    • verbose name: Maximum Number Of Iterations
    • default value: 1000
    • port type: IntPort
    • value type: int (can be None)
  • tolerance
    Convergence tolerance. This is the desired error tolerance, or acceptable inaccuracy, in the solution. Using larger values gives less accurate results, but will lead to faster compute times. Note that, for biosignal-driven machine learning systems, one often does not need very small tolerances.

    • verbose name: Tolerance
    • default value: 0.0001
    • port type: FloatPort
    • value type: float (can be None)
  • num_jobs
    Number of parallel compute jobs. This value only affects the running time and not the results. Values between 1 and twice the number of CPU cores make sense to expedite computation, but may temporarily reduce the responsiveness of the machine. The value of -1 stands for all available CPU cores.

    • verbose name: Number Of Parallel Jobs
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
    Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials with each label. Note that this requires labels to be quantized or binned to be meaningful.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • include_bias
    Include a bias term. If false, your features need to be centered, or include a dummy feature set to 1.

    • verbose name: Include Bias Term
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • normalize_features
    Normalize features. Should only be disabled if the data comes with a predictable scale (e.g., normalized in some other way).

    • verbose name: Normalize Features
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • min_alpha
    Minimum regularization strength. This is expressed as a factor of the maximum regularization strength, which is calculated from the data. By default, the optimal strength will be searched between this value and the maximum.

    • verbose name: Minimum Regularization Strength
    • default value: 0.001
    • port type: FloatPort
    • value type: float (can be None)
  • alphas
    Override regularization strength. Optionally, the default regularization search grid can be overridden by giving an explicit list of values here, although this is rarely necessary or very helpful. This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). The details of the parameter search can be controlled via the search metric and number of folds parameters. Larger values cause stronger regularization, that is, less risk of the method over-fitting to random details of the data, and thus better generalization to new data. A value of 0 means no regularization, and there is no upper limit to how large the values given here may be -- however, depending on the scale of the data and the number of trials, there is a cutoff beyond which all features receive zero weight and are thus unused. Often one covers a range between 0.1 and 10, and at times 0.01 to 100. Typically the values here are not linearly spaced, but follow an exponential progression (e.g., 0.25, 0.5, 1, 2, 4, 8, etc.). The default search range is intentionally coarse for quick running times; refine it to smaller steps to obtain potentially better solutions, but do not expect massive gains from refining.

    • verbose name: Override Regularization Strength
    • default value: None
    • port type: ListPort
    • value type: list (can be None)
  • selection
    Parameter update schedule. Using random here can be significantly faster when higher tolerance values are used.

    • verbose name: Parameter Update Schedule
    • default value: cyclic
    • port type: EnumPort
    • value type: str (can be None)
  • positivity_constraint
    Constrain weights to be positive. This is a special (and rarely-used) flavor of this method, in which the learned weights are constrained to be positive.

    • verbose name: Constrain Weights To Be Positive
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • precompute
    Precompute shared data that is reused during parameter search. Aside from 'auto', this can be set to True or False; auto attempts to determine the best choice automatically.

    • verbose name: Precompute Shared Data
    • default value: auto
    • port type: Port
    • value type: object (can be None)
  • random_seed
    Random seed. Different values may give slightly different outcomes.

    • verbose name: Random Seed
    • default value: 12345
    • port type: IntPort
    • value type: int (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

LarsRegression

Estimate a continuous output value from features using Least-Angle (LARS) Regression.

LARS regression is a straightforward and principled statistical technique to learn a linear mapping between input data and desired output values from training data. LARS is very similar to LASSO, but tends to be faster when the number of features is much higher than the number of data points. Like LASSO, it can be used when there is a very large number of irrelevant features in the data, as long as the relevant features are relatively few. This is also called sparsity regularization. If there are very few trials, or some extensive stretches of the data exhibit only one class, the procedure used to find the optimal regularization strength (cross-validation) can fail with an error that there were too few or no trials of a given class present. LARS regression assumes that both inputs and outputs are Gaussian distributed, that is, have no or very few major statistical outliers. If the output follows a radically different distribution, for instance between 0 and 1, or nonnegative, or discrete values, then different methods may be more appropriate (for instance, classification methods for discrete values). To ameliorate the issue of outliers in the data, the raw data can be cleaned of artifacts with various artifact removal methods. To the extent that the assumptions hold true, this method is highly competitive with other (sparse) linear methods. There are several other nodes that implement similar methods that have advantages in specific circumstances. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, the trainable state of this node can be saved to a model file and later loaded for continued use.
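
A reference sketch using scikit-learn's LarsCV, under the same assumed parameter correspondences as in the LASSO node above (num_folds -> cv, max_iter -> max_iter), in the regime where LARS shines, namely many more features than data points:

    # Reference sketch (not this node's actual implementation): LARS with
    # cross-validation where features greatly outnumber trials.
    import numpy as np
    from sklearn.linear_model import LarsCV

    rng = np.random.default_rng(12345)
    X = rng.standard_normal((50, 500))       # 50 trials, 500 features
    w = np.zeros(500)
    w[:3] = [1.5, -2.0, 1.0]                 # only the first 3 features matter
    y = X @ w + 0.1 * rng.standard_normal(50)

    model = LarsCV(cv=5, max_iter=500).fit(X, y)
    print(np.flatnonzero(model.coef_))       # small set of selected features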

More Info...

Version 1.1.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • num_folds
    Number of cross-validation folds for parameter search, or instance variable to split by (LOO). Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training. Setting this to one will yield leave-one-out CV, and if a group field was specified, leave-one-group-out CV. A now-deprecated use was to instead pass the name of the group field directly into num_folds to achieve the same.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: Port
    • value type: object (can be None)
  • cv_group_field
    Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials with each label. Note that this requires labels to be quantized or binned to be meaningful.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • num_alphas
    Number of values in regularization search grid. This method determines the optimal regularization strength by testing a number of different strength values between the minimum and maximum value. The running time increases with a finer search grid, but the found solution may be slightly better regularized when using a fine grid (e.g., 100 values) instead of a coarse grid (e.g., 20 values).

    • verbose name: Number Of Values In Reg. Search Grid
    • default value: 100
    • port type: IntPort
    • value type: int (can be None)
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions).

    • verbose name: Maximum Number Of Iterations
    • default value: 500
    • port type: IntPort
    • value type: int (can be None)
  • num_jobs
    Number of parallel compute jobs. This value only affects the running time and not the results. Values between 1 and twice the number of CPU cores make sense to expedite computation, but may temporarily reduce the responsiveness of the machine. The value of -1 stands for all available CPU cores.

    • verbose name: Number Of Parallel Jobs
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • normalize_features
    Normalize features. Should only be disabled if the data comes with a predictable scale (e.g., normalized in some other way).

    • verbose name: Normalize Features
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • include_bias
    Include a bias term. If false, your features need to be centered, or include a dummy feature set to 1.

    • verbose name: Include Bias Term
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • precompute
    Precompute shared data that is reused during parameter search. Aside from 'auto', this can be set to True or False; auto attempts to determine the best choice automatically.

    • verbose name: Precompute Shared Data
    • default value: auto
    • port type: Port
    • value type: object (can be None)
  • epsilon
    Degeneracy regularization. This parameter can be used to ensure that the underlying computation does not fail with singular results. Can be increased in cases where the number of features is very high compared to the number of observations.

    • verbose name: Epsilon
    • default value: 2.220446049250313e-16
    • port type: FloatPort
    • value type: float (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

LearningMethod

Represents a learning method implemented by a graph.

This is a somewhat ancient node that is not used much at this time; it is kept for backward compatibility and will eventually be superseded by an import system. This node can be used to instantiate a sub-graph that implements a learning method inside another graph. The sub-graph must have an input and an output port node named 'data'.

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • filename
    Name of the patch file to load.

    • verbose name: Filename
    • default value: None
    • port type: StringPort
    • value type: str (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

LinearDiscriminantAnalysis

Classify data instances using Linear Discriminant Analysis (LDA).

The LDA method is a fast statistical method that learns a linear mapping from input data to discrete category labels. LDA assumes that the data are Gaussian-distributed, that is, have no or very few major statistical outliers. To the extent that these assumptions hold true, this method is highly competitive with other linear (or generalized linear) methods. To ameliorate the outlier issue, the raw data can be cleaned of artifacts with various artifact removal methods. This implementation uses shrinkage regularization by default, which allows it to handle large numbers of features quite gracefully, compared to the unregularized variant, which will overfit or fail on more than a few dozen features. This method can be implemented using a number of different numerical approaches which have different tradeoffs -- it is worth experimenting with different choices of the solver parameter if you cannot get the result that you seek. By default, this method will return not the most likely class label for each trial it is given, but instead, for each class (i.e., category of labels), the probability that the trial is actually of that class. This method can also optionally use a regularization parameter that is tuned using an internal cross-validation on the data. If there are very few trials, or some extensive stretches of the data exhibit only one class, this cross-validation can fail with an error that there were too few or no trials of a given class present. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, the trainable state of this node can be saved to a model file and later loaded for continued use.
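
As a reference point, the default configuration (eigen solver, shrinkage regularization, probabilistic outputs) behaves much like scikit-learn's LinearDiscriminantAnalysis, sketched below (there, too, the svd solver does not support shrinkage):

    # Reference sketch: shrinkage LDA with per-class probability outputs.
    # shrinkage='auto' corresponds to the fast Ledoit-Wolf estimate; this node
    # would instead search a list of values between 0 and 1 if one is given.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(12345)
    X = np.vstack([rng.standard_normal((100, 20)) + 0.5,   # class 0 trials
                   rng.standard_normal((100, 20)) - 0.5])  # class 1 trials
    y = np.array([0] * 100 + [1] * 100)

    lda = LinearDiscriminantAnalysis(solver='eigen', shrinkage='auto').fit(X, y)
    print(lda.predict_proba(X[:3]))  # per-class probabilities (probabilistic=True)
    print(lda.predict(X[:3]))        # most likely class labels otherwise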

More Info...

Version 1.1.1

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • class_weights
    Per-class weights. Optionally this is a mapping from class label to weight. The weights represent the a priori ("prior") probability of encountering a specific class that the model shall assume. The weights will be renormalized so that they add up to one. Example syntax: {'0': 0.5, '1': 0.5} (note the quotes before the colons).

    • verbose name: Per-Class Weight
    • default value: None
    • port type: Port
    • value type: object (can be None)
  • cond_field
    The name of the instance data field that contains the conditions to be discriminated. This parameter will be ignored if the packet has previously been processed by a DescribeStatisticalDesign node.

    • verbose name: Cond Field
    • default value: TargetValue
    • port type: StringPort
    • value type: str (can be None)
  • shrinkage
    Shrinkage regularization strength. If using 'auto', then a fast automatic method is used to determine the regularization parameter (using the Ledoit-Wolf method). However, if given as a list of numbers between 0 and 1 (formatted as in, e.g., [0.1, 0.2, 0.3]) where 0 is no regularization and 1 is maximal regularization, then the best parameter is searched, which can be slow. The details of the parameter search can be controlled via the search metric and number of folds parameters.

    • verbose name: Shrinkage Level
    • default value: auto
    • port type: Port
    • value type: object (can be None)
  • feature_selection
    Feature selection criterion to use if sparse solutions are desired. If set to None, no feature selection is enabled. If set to rfe, recursive feature elimination is used. If set to anova, features with the highest ANOVA f-score are selected. In either case, the number of features is automatically tuned using a cross-validation, and can be capped using max_feature_select.

    • verbose name: Feature Selection
    • default value: none
    • port type: EnumPort
    • value type: str (can be None)
  • max_feature_select
    Maximum number of features to retain, if feature selection is used. Using a value that is less than the number of features can be much faster than searching over all possible counts.

    • verbose name: Max Number Of Features
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • feature_sel_group_size
    Number of successive features that form a group, for grouped feature selection.

    • verbose name: Group Size (Feature Sel.)
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • feature_sel_group_op
    Operation to use to calculate groupwise feature importance.

    • verbose name: Group Measure (Feature Sel.)
    • default value: max
    • port type: EnumPort
    • value type: str (can be None)
  • search_metric
    Parameter search metric to optimize certain hyper-parameters of the node. When certain parameters (e.g., shrinkage or robust_gamma) are used and given as a list of values, then the method will run a cross-validation for each possible parameter value and use this metric to score how well the method performs in each case, in order to select the best parameter. Therefore, when a search is done the running time of the method is multiplied by the number of parameter values and the number of folds in the cross-validation, which can be slow. While 'accuracy' is usually a good default, some other metrics can be useful under special circumstances, e.g., roc_auc for highly imbalanced ratios of trials from different classes.

    • verbose name: Scoring Metric
    • default value: accuracy
    • port type: EnumPort
    • value type: str (can be None)
  • num_folds
    Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block. This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
    Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are Subject_id, Session_id, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials with each label. Note that this requires labels to be quantized or binned to be meaningful.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • num_jobs
    Number of parallel compute jobs. This is only in effect when a parameter search is applied (e.g., in robust mode, or when passing shrinkage as a list). This value only affects the running time and not the results. Values between 1 and twice the number of CPU cores make sense to expedite computation, but may temporarily reduce the responsiveness of the machine. The value of -1 stands for all available CPU cores.

    • verbose name: Number Of Parallel Jobs
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • solver
    Solver to use. This node supports formulations based on a least-squares solution (lsqr), eigenvalue decomposition (eigen), and singular-value decomposition (svd). Some of these methods are known to have numerical issues under various circumstances -- consider trying different settings if you cannot achieve the desired results. Note: the svd method can handle many features, but does not support shrinkage-type regularization.

    • verbose name: Solver To Use
    • default value: eigen
    • port type: EnumPort
    • value type: str (can be None)
  • tolerance
    Threshold for rank estimation in SVD. Using a larger value will more aggressively prune features, but can make the difference between it working or not.

    • verbose name: Rank Estimation Threshold (Svd Solver)
    • default value: 0.0001
    • port type: FloatPort
    • value type: float (can be None)
  • robust_method
    Type of robust estimator to use, if any. This applies to the covariance matrix only at this point. The MCD method is a slow but accurate estimator using the minimum covariance determinant. The SGD method is a fast estimator using stochastic gradient descent, but requires some parameter tuning (automatic tuning is slow, but a single value can usually be determined for a given study and then used going forward). The SAG method uses stochastic average gradient and tends to be more resilient to bad parameter settings, but is otherwise similar to SGD. Both SGD and SAG are configured via the huber_threshold and robust_gamma parameters. Thus one may use MCD first to determine if robustness is helpful on the given data, and if fast runtime is needed, one can switch to SAG or SGD and tune the parameters until a similar performance is achieved. Note that, if shrinkage is set to 'auto', the shrinkage amount will generally be estimated in a non-robust fashion, which may cause some amount of over- or under-shrinkage.

    • verbose name: Robust Method
    • default value: none
    • port type: EnumPort
    • value type: str (can be None)
  • huber_threshold
    Huber threshold; applies only when using the SAG/SGD robust estimators. If set to 0, the geometric (or l1) median is used (which is maximally robust but has low statistical efficiency). If set to 'auto' or the empty list, a value will be searched. And if given as a list of values, the list will be used as the search range.

    • verbose name: Huber Threshold (Sag/sgd Method)
    • default value: 0
    • port type: Port
    • value type: object (can be None)
  • robust_gamma
    Search range for the gamma parameter when using the SAG/SGD robust estimators. Note that typically a much smaller range, or even a single value (which requires no built-in grid search), will do, but the value needs to be empirically determined.

    • verbose name: Learning Rate (Sag/sgd Method)
    • default value: [1, 1.25, 1.5, 2, 2.5, 5, 10, 20, 40, 80, 160]
    • port type: ListPort
    • value type: list (can be None)
  • robust_max_contamination
    Maximum contaminated data fraction for MCD robust estimator. This is a tradeoff between resistance to larger proportions of bad data (the maximum is around 0.5, and can be used by setting the parameter to its default None value), and higher statistical efficiency since more data is used for the estimate.

    • verbose name: Max Contamination (Mcd Method)
    • default value: None
    • port type: FloatPort
    • value type: float (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • dimensionality_reduction
    Reduce dimensionality of data to the given number of dimensions. If given, this overrides the regular outputs of LDA (governed by the probabilistic flag), and causes it to produce data of the desired number of dimensions.

    • verbose name: Dimensionality Reduction
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • probabilistic
    Use probabilistic outputs. If enabled, the node will output for each class the probability that a given trial is of that class; otherwise it will output the most likely class label.

    • verbose name: Output Probabilities
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

LinearSupportVectorClassification

Use linear support vector machines to classify data instances.

This is the linear version of support vector classification. As such, it will not be able to exploit non-linear structure, but on data that has little to no dominant non-linear features, it will perform just as well as non-linear ("kernel") SVMs, and do so in a computationally more efficient way. Linear SVMs are quite similar to logistic regression, and can be more robust in the presence of outliers (when using the default hinge loss), though on the other hand the probability estimates produced by SVMs are not as theoretically straightforward and well-motivated as those of logistic regression. This implementation uses regularization by default, which allows it to handle large numbers of features very well. Importantly, there are two types of regularization that one can choose from: the default l2 regularization is a fine choice for most data (it is closely related to shrinkage in LDA or a Gaussian prior in Bayesian methods). The alternative l1 regularization is unique in that it can learn to identify a sparse subset of features that is relevant while pruning out all other features as irrelevant. This sparsity regularization is statistically very efficient and can deal with an extremely large number of irrelevant features. To determine the optimal regularization strength, a list of candidate parameter settings can be given, which is then searched by the method using an internal cross-validation on the data to find the best value. If there are very few trials, or some extensive stretches of the data exhibit only one class, this cross-validation can fail with an error that there were too few or no trials of a given class present. Also, the default search grid for regularization (i.e., the list of candidate values) is deliberately rather coarse to keep the method fast. For higher-quality results, use a finer-grained list of values (which will be correspondingly slower). This method can be implemented using a number of different numerical approaches which have different running times depending on the number of data points and features. If you are re-solving the problem a lot, it can make sense to try out the various solvers to find the fastest one. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker.
Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, the trainable state of this node can be saved to a model file and later loaded for continued use.
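
The cost search described above amounts to an exhaustive grid search with blockwise cross-validation, roughly as in the following scikit-learn sketch (a reference sketch only, not this node's implementation; note that scikit-learn pairs l1 regularization with the squared hinge loss):

    # Reference sketch: linear SVM with l2 regularization and an exhaustive,
    # blockwise cross-validated search over the coarse default cost grid
    # documented below.
    import numpy as np
    from sklearn.model_selection import GridSearchCV, KFold
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(12345)
    X = np.vstack([rng.standard_normal((100, 20)) + 0.5,
                   rng.standard_normal((100, 20)) - 0.5])
    y = np.array([0] * 100 + [1] * 100)

    search = GridSearchCV(
        LinearSVC(penalty='l2', loss='hinge'),
        param_grid={'C': [0.01, 0.1, 1.0, 10.0, 100]},
        scoring='accuracy',
        cv=KFold(n_splits=5, shuffle=False))   # contiguous blocks, not randomized
    search.fit(X, y)
    print(search.best_params_)                 # selected cost value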

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • class_weights
    Per-class weights. Optionally this is a mapping from class label to weight. The weights represent the a priori ("prior") probability of encountering a specific class that the model shall assume. The weights will be renormalized so that they add up to one. Example syntax: {'0': 0.5, '1': 0.5} (note the quotes before the colons).

    • verbose name: Per-Class Weight
    • default value: None
    • port type: Port
    • value type: object (can be None)
  • loss
    Loss function to use. This selects the data term, i.e., what assumptions are being imposed on the data. l1 is the hinge loss (standard SVM), which has a certain level of robustness to outliers. l2 is the squared hinge loss which is like hinge but is quadratically penalized and therefore in some sense more regression-like.

    • verbose name: Data Term
    • default value: l1
    • port type: EnumPort
    • value type: str (can be None)
  • regularizer
    Regularization type. The default l2 regularization is a good choice for most data. The alternative l1 regularization results in a type of "feature selection", that is, only a sparse set of features will have a non-zero weight, and all other features will remain effectively unused. This sparsity regularization is very useful if only a small number of features are relevant. It is nevertheless a good idea to compare to the performance of using l2 regularization, since excessive sparsity can sometimes degrade performance.

    • verbose name: Regularization Type
    • default value: l2
    • port type: EnumPort
    • value type: str (can be None)
  • include_bias
    Include a bias term. If false, your features need to be centered, or include a dummy feature set to 1.

    • verbose name: Include Bias Term
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • cost
SVM cost parameter. This value determines the degree to which solutions are penalized that would mis-classify data points. Higher values result in models that are less likely to mis-classify the training data, but at the expense of potentially worse generalization to new data (less margin for error when slightly different trials are encountered in future test data). This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). The details of the parameter search can be controlled via the search metric and number of folds parameters. Smaller values cause stronger regularization, that is, less risk of the method over-fitting to random details of the data, and thus potentially better generalization to new data. A very small value means effectively no penalty on training errors, and there is no upper limit on the values that can be given here -- the useful range depends on the scale of the data and the number of trials. Often one covers a range between 0.1 and 10, and at times 0.01 to 100. Typically the values here are not linearly spaced, but follow an exponential progression (e.g., 0.25, 0.5, 1, 2, 4, 8, etc.). The default search range is intentionally coarse for quick running times; refine it to smaller steps to obtain potentially better solutions, but do not expect massive gains from refining.

    • verbose name: Cost
    • default value: [0.01, 0.1, 1.0, 10.0, 100]
    • port type: Port
    • value type: list (can be None)
  • search_metric
    Parameter search metric. When the regularization parameter is given as a list of values, then the method will run a cross-validation for each possible parameter value and use this metric to score how well the method performs in each case, in order to select the best parameter. While 'accuracy' is usually a good default, some other metrics can be useful under special circumstances, e.g., roc_auc for highly imbalanced ratios of trials from different classes.

    • verbose name: Scoring Metric
    • default value: accuracy
    • port type: EnumPort
    • value type: str (can be None)
  • num_folds
Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series (see the sketch after this node's property list for how such contiguous blocks can be formed). If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials in each class.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions).

    • verbose name: Maximum Number Of Iterations
    • default value: 1000
    • port type: IntPort
    • value type: int (can be None)
  • tolerance
Convergence tolerance. This is the desired error tolerance, i.e., the acceptable inaccuracy in the solution. Using larger values gives less accurate results, but will lead to faster compute times. Note that, for biosignal-driven machine learning systems, one often does not need very small tolerances.

    • verbose name: Tolerance
    • default value: 0.0001
    • port type: FloatPort
    • value type: float (can be None)
  • dual_formulation
Use dual formulation. This is an alternative way to solve the problem. If enabled, it can be faster when the number of trials is smaller than the number of features, but it is not supported with l1 regularization.

    • verbose name: Use Alternative Dual Formulation
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • bias_scaling
    Scale for bias term. Since this implementation applies the regularization to the bias term too (which is usually not ideal, although rarely a significant issue), you can use this scale to counter the effect.

    • verbose name: Bias Scaling
    • default value: 1.0
    • port type: FloatPort
    • value type: float (can be None)
  • random_seed
Random seed. Different values may give slightly different outcomes. It is not recommended to ever touch this, since manipulating the random seed generally does not result in robust improvements.

    • verbose name: Random Seed
    • default value: 12345
    • port type: IntPort
    • value type: int (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
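
As referenced under num_folds above, the cross-validation used for the parameter search is blockwise rather than randomized. A minimal sketch of how such contiguous folds can be formed (illustrative only; the node's internal splitting logic may differ):

    import numpy as np

    def blockwise_folds(num_trials, num_folds):
        """Yield (train_idx, test_idx) pairs over contiguous trial blocks."""
        blocks = np.array_split(np.arange(num_trials), num_folds)
        for k, test_block in enumerate(blocks):
            train_idx = np.concatenate([b for i, b in enumerate(blocks) if i != k])
            yield train_idx, test_block

    for train_idx, test_idx in blockwise_folds(20, 5):
        print(test_idx)  # each test set is one contiguous run of trials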

LinearToProbabilities

Convert linear predictions to (two-class) pseudo-probabilities.

This node can be chained after e.g., ApplyLinearTransform if that node is meant to implement a linear classifier. The resulting data format is then the same as generated by the other classification nodes. Note that this node assumes that your linear scores are signed, i.e., negative scores encode that the first or negative class ("class 0") is more probable and positive scores encode that the second or positive class ("class 1") is more probable, and that they are in a somewhat reasonable range (e.g., the two class means mapping to -1 and +1). The resulting scores will be in a 0-1 range and somewhat behave like probabilities, but be aware that they will not be properly calibrated and thus this procedure should be viewed as a cheap "trick" to get probabilities. For an accurate transformation, use the Probability Calibration node instead.
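
As a sketch of the kind of mapping described above (assuming the usual logistic squashing; the exact function used by this node may differ), signed linear scores can be converted to two-class pseudo-probabilities as follows:

    import numpy as np

    def scores_to_pseudo_probabilities(s):
        """Map signed scores to a (class 0, class 1) pair in [0, 1]."""
        p1 = 1.0 / (1.0 + np.exp(-np.asarray(s)))   # logistic squashing
        return np.stack([1.0 - p1, p1], axis=-1)

    print(scores_to_pseudo_probabilities([-1.0, 0.0, 1.0]))
    # a score of 0 maps to (0.5, 0.5); note these values are not calibrated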

More Info...

Version 0.8.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • replace_axis
    Replace prior feature axis (if applicable). This will check if the data contains a prior one-element (i.e., dummy) feature axis and replace it with the new two-class axis. This can be necessary for downstream nodes to recognize the output as well-formed two-class probabilistic predictions.

    • verbose name: Replace Prior Axis
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

LogisticRegression

Classify data instances using regularized Logistic Regression.

The logistic regression method is a versatile and principled statistical method that learns a generalized linear mapping from input data to the probability of that data belonging to one of several possible classes. This method is highly competitive with other linear or generalized linear methods, such as LDA and SVM. Logistic regression makes relatively weak assumptions on the distribution of the data, so that it is moderately tolerant to outliers, especially when compared to relatively brittle methods like LDA. Logistic regression is not considered to be quite as robust as SVM (therefore it can make sense to remove artifacts beforehand using the appropriate nodes if the data are very noisy) -- however, the main benefit of logistic regression over SVM is that the outputs can be straightforwardly interpreted as probabilities without having to resort to 'tricks' such as probability calibration. This implementation uses regularization by default, which allows it to handle large numbers of features very well. Importantly, there are two types of regularization that one can choose from: the default l2 regularization is a fine choice for most data (it is closely related to shrinkage in LDA or a Gaussian prior in Bayesian methods). The alternative l1 regularization is unique in that it can learn to identify a sparse subset of features that is relevant while pruning out all other features as irrelevant. This sparsity regularization is statistically very efficient and can deal with an extremely large number of irrelevant features. To determine the optimal regularization strength, a list of candidate parameter settings can be given, which is then searched by the method using an internal cross-validation on the data to find the best value. If there are very few trials, or some extensive stretches of the data exhibit only one class, this cross-validation can fail with an error that there were too few or no trials of a given class present. Also, the default search grid for regularization (i.e., the list of candidate values) is deliberately rather coarse to keep the method fast. For higher-quality results, use a finer-grained list of values (which will be correspondingly slower). There are also several other implementations of logistic regression with different regularization terms or different performance characteristics in other nodes. This method can be implemented using a number of different numerical approaches which have different running times depending on the number of data points and features. If you are re-solving the problem a lot, it can make sense to try out the various solvers to find the fastest one. By default, this method will return not the most likely class label for each trial it is given, but instead the probabilities for each class (i.e., category of labels) that the trial is actually of that class. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node.
To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node; it then passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, its trainable state can be saved to a model file and later loaded for continued use.
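
The following minimal sketch illustrates the overall computation using scikit-learn's LogisticRegressionCV (an assumption for illustration; the node's actual backend and parameter handling may differ). Note that scikit-learn's C is an inverse regularization strength, so the node's alphas grid is inverted here:

    # Illustrative sketch only -- assumes a scikit-learn-style backend.
    import numpy as np
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.model_selection import KFold

    X = np.random.randn(200, 32)
    y = np.random.randint(0, 2, size=200)

    alphas = [0.1, 0.5, 1.0, 5.0, 10.0]       # the node's default search grid
    clf = LogisticRegressionCV(Cs=[1.0 / a for a in alphas], penalty='l2',
                               scoring='accuracy', max_iter=100,
                               cv=KFold(n_splits=5, shuffle=False))
    clf.fit(X, y)
    proba = clf.predict_proba(X)   # per-class probabilities (default output)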

More Info...

Version 1.1.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • feature_scaling
Feature scaling to use. If set to auto, then scaling will default to robust if the sag or saga solver is used, and otherwise to std. If features are not already standardized beforehand, enabling this tends to ensure that the explored range of the alphas parameter is well matched to the data, and it will also ensure that features are treated equally by the regularization instead of the penalty being scale-dependent. The -scale variants only scale the data and do not also shift (center) it.

    • verbose name: Feature Scaling
    • default value: none
    • port type: EnumPort
    • value type: str (can be None)
  • class_weights
    Per-class weights. Optionally this is a mapping from class label to weight. The weights represent the a priori ("prior") probability of encountering a specific class that the model shall assume. The weights will be renormalized so that they add up to one. Example syntax: {'0': 0.5, '1': 0.5} (note the quotes before the colons).

    • verbose name: Per-Class Weight
    • default value: None
    • port type: Port
    • value type: object (can be None)
  • cond_field
    The name of the instance data field that contains the conditions to be discriminated. This parameter will be ignored if the packet has previously been processed by a DescribeStatisticalDesign node.

    • verbose name: Cond Field
    • default value: TargetValue
    • port type: StringPort
    • value type: str (can be None)
  • regularizer
    Regularization type. The default l2 regularization is a good choice for most data. The alternative l1 regularization results in a type of "feature selection", that is, only a sparse set of features will have a non-zero weight, and all other features will remain effectively unused. This sparsity regularization is very useful if only a small number of features are relevant. It is nevertheless a good idea to compare to the performance of using l2 regularization, since excessive sparsity can sometimes degrade performance.

    • verbose name: Regularization Type
    • default value: l2
    • port type: EnumPort
    • value type: str (can be None)
  • alphas
Regularization strength. This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). The details of the parameter search can be controlled via the search metric and number of folds parameters. Larger values cause stronger regularization, that is, less risk of the method over-fitting to random details of the data, and thus better generalization to new data. A value of 0 means no regularization, and there is no upper limit on the values that can be given here -- however, depending on the scale of the data and the number of trials, there is a cutoff beyond which all features are weighted zero, and are thus unused. Often one covers a range between 0.1 and 10, and at times 0.01 to 100. Typically the values here are not linearly spaced, but follow an exponential progression (e.g., 0.25, 0.5, 1, 2, 4, 8, etc.). The default search range is intentionally coarse for quick running times; refine it to smaller steps to obtain potentially better solutions, but do not expect massive gains from refining.

    • verbose name: Regularization Strength
    • default value: [0.1, 0.5, 1.0, 5, 10.0]
    • port type: ListPort
    • value type: list (can be None)
  • l1_ratios
Elastic-net mixing ratio. 0 is equivalent to l2, and 1 is equivalent to l1. In-between values yield combinations of the two. Defaults to [1/4, 2/4, 3/4].

    • verbose name: L1/l2 Ratio (Elasticnet Regularizer)
    • default value: None
    • port type: ListPort
    • value type: list (can be None)
  • search_metric
    Parameter search metric. When the regularization parameter is given as a list of values, then the method will run a cross-validation for each possible parameter value and use this metric to score how well the method performs in each case, in order to select the best parameter. While 'accuracy' is usually a good default, some other metrics can be useful under special circumstances, e.g., roc_auc for highly imbalanced ratios of trials from different classes.

    • verbose name: Scoring Metric
    • default value: accuracy
    • port type: EnumPort
    • value type: str (can be None)
  • num_folds
    Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • num_jobs
    Number of parallel compute jobs. This value only affects the running time and not the results. Values between 1 and twice the number of CPU cores make sense to expedite computation, but may temporarily reduce the responsiveness of the machine. The value of -1 stands for all available CPU cores.

    • verbose name: Number Of Parallel Jobs
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials in each class.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • solver
    Solver to use. Not all solvers support all regularizers -- specifically, newton-cg, sag, and lbfgs only support l2. Elasticnet is only supported by saga. Also, different algorithms have different running times depending on the number of features, and the number of trials. It can make sense to identify the fastest method if this node is used a lot on similarly-sized data.

    • verbose name: Solver To Use
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)
  • dual_formulation
Use dual formulation. This is an alternative way to solve the problem. If enabled, it can be faster when the number of trials is smaller than the number of features, but it is not supported with l1 regularization.

    • verbose name: Use Alternative Dual Formulation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions).

    • verbose name: Maximum Number Of Iterations
    • default value: 100
    • port type: IntPort
    • value type: int (can be None)
  • tolerance
Convergence tolerance. This is the desired error tolerance, i.e., the acceptable inaccuracy in the solution. Using larger values gives less accurate results, but will lead to faster compute times. Note that, for biosignal-driven machine learning systems, one often does not need very small tolerances.

    • verbose name: Tolerance
    • default value: 0.0001
    • port type: FloatPort
    • value type: float (can be None)
  • multiclass
    Technique to use when classifying more than two classes of data. These formulations will usually have little impact on the results, but multinomial can be solved in one step, whereas ovr (which stands for one-vs-rest) requires one run per class, which can be slower. Multinomial is only supported when using the lbfgs solver. Auto will auto-select based on the chosen solver.

    • verbose name: Type Of Multiclass Approach
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)
  • include_bias
    Include a bias term. If false, your features need to be centered, or include a dummy feature set to 1.

    • verbose name: Include Bias Term
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • bias_scaling
    Scale for bias term. Since this logistic regression implementation applies the regularization to the bias term too (which is usually not ideal, although rarely a significant issue), you can use this scale to counter the effect.

    • verbose name: Bias Scaling
    • default value: 1.0
    • port type: FloatPort
    • value type: float (can be None)
  • random_seed
    Random seed. Different values may give slightly different outcomes.

    • verbose name: Random Seed
    • default value: 12345
    • port type: IntPort
    • value type: int (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • probabilistic
    Use probabilistic outputs. If enabled, the node will output for each class the probability that a given trial is of that class; otherwise it will output the most likely class label.

    • verbose name: Output Probabilities
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

MeasureLoss

Measure discrepancy between some predicted labels and true labels.

This is a fully-featured statistics node that is typically used to measure the performance of a model; it can compute a number of error metrics, aggregate them over successive ticks, and generally expects target label information to be encoded in fields of the instance axis. For pure mathematical loss functions (which can also be directly optimized for), see the other Loss nodes in this category. This node accepts as input some data that has both estimates or predictions (e.g., from a machine learning node), as well as ground-truth labels (desired outputs). Using these inputs, the node computes a loss function that measures the disagreement between the predictions and the labels. The node supports a large variety of losses that can be used in different circumstances (based on different assumptions about the type of estimates or the severity of different kinds of errors), such as misclassification rate, mean squared error, or area under the receiver operating characteristic (ROC) curve. Be sure to inform yourself about the most appropriate loss metric for your use case before trusting the numbers blindly, so as to avoid relying on misleading results (e.g., from inapplicable metrics) when making decisions. The node is capable of accumulating results across multiple trials or data chunks, for instance during real-time processing. It is also possible to do the same across multiple datasets in offline or batch processing, by setting the appropriate parameters in the node. This node passes the computed loss out through its loss port (so wire that one into the next node). This output will contain either a statistics axis or a feature axis (depending on the output_format parameter) with the following six items: 'N Batches', 'N Trials', 'value', 'p_val', 'se_stat', 't_stat', where 'value' will be the selected loss metric. If more than one loss metric is selected, the data tensor is a matrix, and a feature axis (or second feature axis, depending on the output_format) will hold the values for each metric. See the documentation for the output_format parameter for more details.
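
To make a few of the metric definitions concrete, here is an offline sketch of MCR, MSE, and the (negative) AUC using scikit-learn's metric functions (illustrative only; the node computes these internally and adds batching and statistics):

    import numpy as np
    from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score

    y_true = np.array([0, 1, 1, 0, 1])               # ground-truth labels
    p_class1 = np.array([0.2, 0.9, 0.6, 0.4, 0.3])   # predicted P(class 1)
    y_pred = (p_class1 > 0.5).astype(int)            # hard decisions

    mcr = 1.0 - accuracy_score(y_true, y_pred)       # misclassification rate
    mse = mean_squared_error(y_true, p_class1)       # mean-squared error
    auc = -roc_auc_score(y_true, p_class1)           # negative area under ROC
    print(mcr, mse, auc)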

More Info...

Version 1.5.2

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • loss
    Loss estimates.

    • verbose name: Loss
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: OUT
  • data
Data to process. Deprecation notice: The incoming data used to be passed through this same port unchanged, but that behavior has changed and this port is now set to IN only.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: IN
  • loss_metrics
Loss metrics. The following measures can be calculated both offline and online: MCR is the misclassification rate (aka error rate), MSE is the mean-squared error, MAE is the mean absolute error, Sign is the sign error, Bias is the prediction bias. The following measures are currently only available during offline computations: SMAE is the standardized mean absolute error, SMSE is the standardized mean-squared error, max is the maximum error, RMSE is the root mean squared error, MedSE is the median squared error, MedAE is the median absolute error, SMedAE is the standardized median absolute error, AUC is the negative area under the ROC curve, R2 is the R-squared loss, CrossEnt is the cross-entropy loss, ExpVar is the negative explained variance, CM is the confusion matrix.

    • verbose name: Loss Metrics
    • default value: []
    • port type: SubsetPort
    • value type: list (can be None)
  • batching
    Batch size for computing measures. If set to None (default), the trials are batched as they come in. If set to 0, all trials are put into a single batch. Other values will use batches of the given trial count. Batching is mostly useful if you are interested in the standard deviation of some measures, or if a measure cannot be computed reliably from few instances (e.g., in case of median, AUC, etc).

    • verbose name: Batching
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • output_format
Format of the loss output packet. If set to stats-axis, the loss packet will contain a statistics axis with six items ('N Batches', 'N Trials', 'value', 'p_val', 'se_stat', 't_stat') and a feature axis with one item per selected metric in loss_metrics (with the same name, i.e., 'MSE'). If set to 2-feature-axes, a feature axis with label statistics_types will be created in place of the statistics axis described above, with the same items, and a second feature axis with label explanatory_variables will be created with one item per computed measure. In legacy mode, a single feature axis will be created with one item per computed measure plus an 'N Trials' item, and no statistics.

    • verbose name: Output Format
    • default value: legacy
    • port type: EnumPort
    • value type: str (can be None)
  • accumulate_offline
    Accumulate statistics across offline data packets, as well. Normally this node will only accumulate statistics on real-time (streaming) data. By enabling this, it will also accumulate across multiple non-streaming data sets, e.g., in batch processing.

    • verbose name: Accumulate Offline
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • ignore_resets
    Keep accumulating throughout state resets of the pipeline. For instance, after rewiring or changing files. This is most relevant in batch processing, where a pipeline iterates over multiple files. If you intend to aggregate statistics across multiple files, you need to enable this option.

    • verbose name: Ignore Resets
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • cond_field
    The name of the instance data field that contains the true target values. This parameter will be ignored if the packet has previously been processed by a DescribeStatisticalDesign node.

    • verbose name: Cond Field
    • default value: TargetValue
    • port type: StringPort
    • value type: str (can be None)
  • ttest_level
    Loss value against which to test for significant differences in a t-test. This only applies if there were multiple batches.

    • verbose name: Ttest Level
    • default value: 0.5
    • port type: FloatPort
    • value type: float (can be None)
  • significance_level
    Significance level for t-test.

    • verbose name: Significance Level
    • default value: 0.05
    • port type: FloatPort
    • value type: float (can be None)
  • log_results
    Log results.

    • verbose name: Log Results
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • suppress_nans
    Suppress NaNs in the loss calculation. Depending on the loss, a NaN prediction may raise an exception.

    • verbose name: Suppress Nans
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • output_for_stats_table
    Output the data in such a way that it can be parsed by the ParseStatsTable node, for creating stats tables. This creates two feature axes instead of one. You can now use the output_format to get more format choices, and additional metrics.

    • verbose name: Output For Stats Table
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • loss_metric
    Legacy loss metric. Will become deprecated in favor of loss_metrics (allowing multiple metrics). Note that you cannot specify both loss_metric and loss_metrics.

    • verbose name: Loss Metric (Legacy)
    • default value:
    • port type: EnumPort
    • value type: str (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

NaiveBayes

Use naive Bayes to classify data instances.

This is a very basic classifier that can work well in situations where one would also use LDA or QDA. This method can handle large numbers of features without problems since it does not account for correlations between those features. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node; it then passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, its trainable state can be saved to a model file and later loaded for continued use.
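
A minimal sketch of the underlying model, expressed with scikit-learn's GaussianNB (an assumption for illustration; the node's actual backend may differ). Because each feature gets its own independent per-class Gaussian, large feature counts pose no particular problem:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    X = np.random.randn(100, 500)          # many features are unproblematic here
    y = np.random.randint(0, 2, size=100)

    clf = GaussianNB().fit(X, y)           # per-feature Gaussians, no correlations
    proba = clf.predict_proba(X)           # probabilistic outputs (default mode)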

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • probabilistic
    Use probabilistic outputs. If enabled, the node will output for each class the probability that a given trial is of that class; otherwise it will output the most likely class label.

    • verbose name: Output Probabilities
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

ProbabilityCalibration

Calibrate continuous outputs of a method to be valid probabilities.

This node acts as a "wrapper" for a machine learning pipeline that produces some sort of continuous output that has a monotonic relationship with the probability of a class. The node learns a one-dimensional mapping from the continuous output to a probability, and then acts as a drop-in replacement for the original method. The node is used by wiring a pipeline into the "method" input port, where the pipeline is a subgraph that usually starts with a Placeholder with slotname set to "data"; see also the documentation tooltip for the method signature parameter. For technical reasons, probability calibration should not be performed on the same data that a classifier was trained on, since the classifier will generate atypically high confidence scores on its own training data that will not be representative of the classifier's outputs on new data. Therefore, the calibration node generally uses a cross-validation approach to train the underlying classifier on different data than what is used to fit the calibration mapping (see also the Crossvalidation node for full details on how cross-validation is generally done). Since this naturally results in k calibrated classifiers (for k folds), a straightforward way to combine these classifiers is to average their outputs, which is the default behavior of the node (ensemble set to True). Alternatively, to reduce computational cost at prediction time, one can disable the ensemble option, which will then use a single classifier that is trained on all the data but still uses the calibration mapping trained via the cross-validation; note, however, that using a classifier trained on more data can introduce some bias (e.g., in test-set confidence) into the predictions and also does not enjoy the robustness conferred by the ensemble approach. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node; it then passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, its trainable state can be saved to a model file and later loaded for continued use.
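
The cross-validated calibration scheme described above is closely analogous to scikit-learn's CalibratedClassifierCV, shown here as an illustrative stand-in (the node wraps an arbitrary pipeline subgraph rather than a single estimator):

    import numpy as np
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.svm import LinearSVC

    X = np.random.randn(300, 16)
    y = np.random.randint(0, 2, size=300)

    calib = CalibratedClassifierCV(LinearSVC(), method='sigmoid',
                                   cv=5, ensemble=True)
    calib.fit(X, y)                 # per-fold training plus calibration mapping
    proba = calib.predict_proba(X)  # calibrated per-class probabilities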

More Info...

Version 0.5.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • method
    Method to calibrate.

    • verbose name: Method
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • method__signature
Argument names of the pipeline being calibrated. Your pipeline is a subgraph that must contain at least one Placeholder node whose slotname must match the argument name listed here. The placeholder then acts as the entry point for any data that is passed into the pipeline when it is invoked by the calibration node. Your pipeline's final node (which typically produces the predictions) is then wired to the calibration node's "method" input port. In graphical UIs, this edge will be displayed in dotted style to indicate that this is not normal forward data flow, but that a subgraph (your pipeline) runs under the control of the calibration node. In summary, your pipeline starts with a Placeholder that is followed by some processing nodes (in the simplest case just a single machine-learning node, such as Linear Discriminant Analysis). The final node of your pipeline is the one whose outputs are taken to be the pipeline's predictions, and this node is wired into the "method" input of this calibration node. Any "loose ends" downstream of your placeholder are also considered to be part of the pipeline but do not contribute to the result (they may be used for other purposes, such as printing progress information). Your pipeline may optionally have a second placeholder, which should by convention have slotname set to is_training, and then is_training must be listed as the second argument here. This second placeholder is used to indicate whether your pipeline is currently being called on training data or test data. Regardless of whether you expose this parameter or not, the calibration node will execute your pipeline like a cross-validation, meaning that, for each fold in the cross-validation, your pipeline graph is instantiated from its default (uninitialized) state, and is then called with the training set of that fold. Then, the same graph is called again, but this time with the test set of that fold; it is then up to any adaptive nodes in your pipeline (e.g., machine learning nodes) to adapt themselves on the first call and to make predictions (usually without adapting again) on the second call. The pipeline is discarded after each fold and a new pipeline graph is instantiated (to avoid any unintended train/test leakage).

    • verbose name: Method [Signature]
    • default value: (data)
    • port type: Port
    • value type: object (can be None)
  • mapping
Type of output mapping. The default 'sigmoid' mapping is a logistic function that maps scores to probabilities (logistic regression). The alternative 'isotonic' mapping is a non-parametric method that fits a piecewise-constant, strictly increasing function to the data. The latter requires a fairly large number of data points to work well (e.g., >1000), but is more flexible and can capture more complex relationships between scores and true probabilities. A sketch comparing the two mappings appears after this node's property list.

    • verbose name: Probability Mapping
    • default value: sigmoid
    • port type: EnumPort
    • value type: str (can be None)
  • ensemble
Use an ensemble of classifiers for prediction. If this is set, the method will train one classifier per fold, calibrate it on the respective test fold, and at prediction time average the predictions of all the classifiers. This is the recommended setting as it can result in somewhat better calibration (e.g., lower bias) and robustness, but is also more costly at prediction time. If disabled, a single classifier is used that is trained on all the data, together with a single calibration mapping trained via the cross-validation.

    • verbose name: Use Ensemble
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • cond_field
    The name of the instance data field that contains the conditions to be discriminated. This parameter will be ignored if the packet has previously been processed by a DescribeStatisticalDesign node.

    • verbose name: Cond Field
    • default value: TargetValue
    • port type: StringPort
    • value type: str (can be None)
  • enabled
Whether to enable probability calibration. If disabled, the predictions of the underlying model will be passed through unmodified.

    • verbose name: Enabled
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • num_folds
    Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training. This can also be set to 0 if no cross-validation is desired, in which case the method and calibration mapping are trained on all the data, or to 1 to use a (perhaps costly) leave-one-out cross-validation.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials in each class.

    • verbose name: Stratified Cross-Validation
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
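
As referenced under the mapping property above, the two mapping types can be pictured as follows: 'sigmoid' fits a one-dimensional logistic function from score to probability, while 'isotonic' fits a piecewise-constant, monotonically increasing function. An illustrative sketch on held-out scores (assuming scikit-learn-style fitting routines):

    import numpy as np
    from sklearn.isotonic import IsotonicRegression
    from sklearn.linear_model import LogisticRegression

    scores = np.random.randn(1000)                     # classifier outputs
    labels = (scores + 0.5 * np.random.randn(1000) > 0).astype(int)

    # 'sigmoid': parametric 1-D logistic fit from score to P(class 1)
    sig = LogisticRegression().fit(scores.reshape(-1, 1), labels)
    p_sig = sig.predict_proba(scores.reshape(-1, 1))[:, 1]

    # 'isotonic': non-parametric, piecewise-constant increasing fit
    iso = IsotonicRegression(out_of_bounds='clip').fit(scores, labels)
    p_iso = iso.predict(scores)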

QuadraticDiscriminantAnalysis

Classify data instances using Quadratic Discriminant Analysis (QDA).

The QDA method is a relatively simple statistical method that learns a quadratic mapping (i.e., a basic non-linear mapping) from input data to discrete category labels. QDA assumes that the data are Gaussian-distributed, that is, have no or very few major statistical outliers. To the extent that these assumptions are violated, this method will likely underperform. To ameliorate the outlier issue, the raw data can be cleaned of artifacts with various artifact removal methods. This implementation uses shrinkage regularization by default, which allows it to handle large numbers of features quite gracefully, compared to the unregularized variant, which will overfit or fail on more than a few dozen features. This method relies on a regularization parameter that is tuned using an internal cross-validation on the data. If there are very few trials, or some extensive stretches of the data exhibit only one class, this cross-validation can fail with an error that there were too few or no trials of a given class present. By default, this method will return not the most likely class label for each trial it is given, but instead the probabilities for each class (i.e., category of labels) that the trial is actually of that class. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node; it then passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, its trainable state can be saved to a model file and later loaded for continued use.
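
A minimal sketch of the computation using scikit-learn's QuadraticDiscriminantAnalysis with a searched regularization parameter (an assumption for illustration; the node's actual backend and shrinkage handling may differ):

    import numpy as np
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
    from sklearn.model_selection import GridSearchCV, KFold

    X = np.random.randn(200, 16)
    y = np.random.randint(0, 2, size=200)

    grid = {'reg_param': [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8]}  # default grid
    clf = GridSearchCV(QuadraticDiscriminantAnalysis(), grid,
                       scoring='accuracy', cv=KFold(n_splits=5, shuffle=False))
    clf.fit(X, y)
    proba = clf.predict_proba(X)   # per-class probabilities (default output)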

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • shrinkage
    Regularization strength. Given as a list of numbers between 0 and 1 where 0 is no regularization and 1 is maximal regularization. The best parameter is searched, which can be slow. The details of the parameter search can be controlled via the search metric and number of folds parameters.

    • verbose name: Regularization Strength
    • default value: [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8]
    • port type: ListPort
    • value type: list (can be None)
  • search_metric
    Parameter search metric. This method will run a cross-validation for each possible shrinkage parameter value and use this metric to score how well the method performs in each case, in order to select the best parameter. Therefore, the running time of the method is proportional to the number of parameter values times the number of folds in the cross-validation, which can be slow. While 'accuracy' is usually a good default, some other metrics can be useful under special circumstances, e.g., roc_auc for highly imbalanced ratios of trials from different classes.

    • verbose name: Scoring Metric For Parameter Search
    • default value: accuracy
    • port type: EnumPort
    • value type: str (can be None)
  • num_folds
Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block. This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials in each class.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • probabilistic
    Use probabilistic outputs. If enabled, the node will output for each class the probability that a given trial is of that class; otherwise it will output the most likely class label.

    • verbose name: Output Probabilities
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • class_weights
    Per-class weights. Optionally this is a mapping from class label to weight. The weights represent the a priori ("prior") probability of encountering a specific class that the model shall assume. The weights will be renormalized so that they add up to one. Example syntax: {'0': 0.5, '1': 0.5} (note the quotes before the colons).

    • verbose name: Per-Class Weight
    • default value: None
    • port type: Port
    • value type: object (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • dont_reset_model
Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

RegularizedLogisticRegression

Classify data instances using regularized Logistic Regression with complex regularization terms.

The logistic regression method is a versatile and principled statistical method that learns a generalized linear mapping from input data to the probability of that data belonging to one of several possible classes. This method is highly competitive with other linear or generalized linear methods, such as LDA and SVM. In comparison to the simple logistic regression node, the emphasis of this implementation is on the support for complex regularization terms, in particular group sparsity and trace-norm regularization. When these features are not needed, it is recommended to consider using the simple node instead, as the training time may be shorter, and the parameter settings may allow for more timely or robust convergence on a wide range of data. Logistic regression makes relatively weak assumptions on the distribution of the data, so that it is moderately tolerant to outliers, especially when compared to relatively brittle methods like LDA. Logistic regression is not considered to be quite as robust as SVM (therefore it can make sense to remove artifacts beforehand using the appropriate nodes if the data are very noisy) -- however, the main benefit of logistic regression over SVM is that the outputs can be straightforwardly interpreted as probabilities without having to resort to 'tricks' such as probability calibration. This implementation uses regularization by default, which allows it to handle large numbers of features very well. This implementation supports several types of regularization, including l1, l2, l1/l2, and trace. The basic l2 regularization is a fine choice for most data (it is closely related to shrinkage in LDA or a Gaussian prior in Bayesian methods). The alternative l1 regularization is unique in that it can learn to identify a sparse subset of features that is relevant while pruning out all other features as irrelevant. This sparsity regularization is statistically very efficient and can deal with an extremely large number of irrelevant features. The l1/l2 regularization is a combination of the two, which can identify groups of features that are irrelevant (also known as group sparsity). In this implementation, sparsity acts on the features as if they formed a matrix, and will prune entire rows or columns of the matrix. In order to be able to do this, the input data tensor should have more than one axis that can be used as features, and these axes need to be identified via the group_axes parameter. The trace regularization mode also views the features as a matrix, but it will learn a weighting such that the corresponding matrix of weights per feature is low rank. As such, it will learn a weighting that can be decomposed into a sum of a small number of pairs of weight profiles for the two matrix axes, e.g., spatial and temporal profiles if features were space by time, which are multiplied together to each form a rank-1 matrix. To determine the optimal regularization strength, a list of candidate parameter settings can be given, which is then searched by the method using an internal cross-validation on the data to find the best value. If there are very few trials, or some extensive stretches of the data exhibit only one class, this cross-validation can fail with an error that there were too few or no trials of a given class present. Also, the default search grid for regularization (i.e., the list of candidate values) is deliberately rather coarse to keep the method fast.
For higher-quality results, use a finer-grained list of values (which will be correspondingly slower). There are also several other implementations of logistic regression with different regularization terms or different performance characteristics in other nodes. This method can be implemented using a number of different numerical approaches, which have different running times depending on the number of data points and features. If you are re-solving the problem a lot, it can make sense to try out the various solvers to find the fastest one. By default, this method will return not the most likely class label for each trial it is given, but instead the probabilities, for each class (i.e., category of labels), that the trial actually belongs to that class. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, its trainable state can be saved to a model file and later loaded for continued use.
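
For intuition on the trace-norm mode, the following is a minimal standalone Python sketch (illustrative only, not this node's actual solver) of singular-value soft-thresholding, the proximal operator that trace-norm-regularized solvers apply to the per-feature weight matrix to drive it toward low rank:

    import numpy as np

    def prox_trace_norm(W, tau):
        # Proximal operator of tau * ||W||_* : soft-threshold the singular values.
        # Applied repeatedly inside a solver, this drives the weight matrix W
        # (e.g., space-by-time feature weights) toward a low-rank solution.
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    W = np.random.randn(8, 10)  # hypothetical space-by-time weight matrix
    print(np.linalg.matrix_rank(W))                        # full rank (8)
    print(np.linalg.matrix_rank(prox_trace_norm(W, 2.0)))  # typically lower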

More Info...

Version 1.0.2

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • penalty
    Regularization type. The default trace regularization is a good choice for most data. The trace norm yields low-rank solutions, and l1/l2 yields group-wise sparse solutions. The others are mostly included for completeness; it is more efficient to use LogisticRegression to apply these.

    • verbose name: Regularization Type
    • default value: trace
    • port type: EnumPort
    • value type: str (can be None)
  • group_axes
    Axes over which the penalty shall group. For the trace norm, 2 axes must be given. WARNING: this parameter is not safe to use with untrusted strings from the network.

    • verbose name: Group Axes
    • default value: space, time
    • port type: StringPort
    • value type: str (can be None)
  • lambdas
    Regularization strength. Stronger regularization makes it possible to fit a more complex model (more parameters), or to use less data to fit a model of the same complexity.

    • verbose name: Regularization Strength
    • default value: [0.1, 1.0, 5.0, 10.0]
    • port type: Port
    • value type: list (can be None)
  • search_metric
    Parameter search metric. When the regularization parameter is given as a list of values, then the method will run a cross-validation for each possible parameter value and use this metric to score how well the method performs in each case, in order to select the best parameter. While 'accuracy' is usually a good default, some other metrics can be useful under special circumstances, e.g., roc_auc for highly imbalanced ratios of trials from different classes.

    • verbose name: Scoring Metric If Searching Parameters
    • default value: accuracy
    • port type: EnumPort
    • value type: str (can be None)
  • num_folds
    Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
    Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials in each class.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • num_jobs
    Number of parallel compute jobs. This value only affects the running time and not the results. Values between 1 and twice the number of CPU cores make sense to expedite computation, but may temporarily reduce the responsiveness of the machine. The value of -1 stands for all available CPU cores.

    • verbose name: Number Of Parallel Jobs
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • probabilistic
    Use probabilistic outputs. If enabled, the node will output for each class the probability that a given trial is of that class; otherwise it will output the most likely class label.

    • verbose name: Output Probabilities
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions).

    • verbose name: Maximum Number Of Iterations
    • default value: 1000
    • port type: IntPort
    • value type: int (can be None)
  • inner_gtol
    "LBFGS convergence tolerance. Larger values give less accurate results but faster solution time.

    • verbose name: Inner Gtol
    • default value: 0.0001
    • port type: FloatPort
    • value type: float (can be None)
  • inner_max_iter
    LBFGS maximum number of iterations for inner solver. Additional stopping criterion to limit compute time.

    • verbose name: Inner Max Iter
    • default value: 10
    • port type: IntPort
    • value type: int (can be None)
  • abs_tol
    Absolute convergence tolerance. Smaller values will lead the solver to a closer approximation of the optimal solution, at the cost of increased running time. See also relative tolerance.

    • verbose name: Absolute Tolerance
    • default value: 1e-05
    • port type: FloatPort
    • value type: float (can be None)
  • rel_tol
    Relative convergence tolerance. Smaller values will lead the solver to a closer approximation of the optimal solution, at the cost of increased running time. In contrast to the absolute tolerance, this value is relative to the magnitude of the regression weights. Note that the method used works best when the desired accuracy is not excessive, and merely a good approximation is sought.

    • verbose name: Relative Tolerance
    • default value: 0.01
    • port type: FloatPort
    • value type: float (can be None)
  • lbfgs_memory
    LBFGS memory length. This is the maximum number of variable-metric corrections tracked to approximate the inverse Hessian matrix.

    • verbose name: Lbfgs Memory
    • default value: 10
    • port type: IntPort
    • value type: int (can be None)
  • init_rho
    Initial value of augmented Lagrangian parameter. This parameter, which is specific to the solver used, can be auto-tuned, which makes the method relatively robust to the initial value. Still, slight adjustments in the 1 to 30 range can reduce the number of iterations required for convergence and thereby the running time. Choosing a grossly inappropriate value, however, can cause the method to fail to converge, which is easily diagnosed by a solution that is essentially non-sparse.

    • verbose name: Solver Rho
    • default value: 4
    • port type: FloatPort
    • value type: float (can be None)
  • update_rho
    Auto-tune rho parameter. Whether to update the solver's rho parameter dynamically. This can be used to achieve faster convergence times in highly time-sensitive setups, but there is a modest risk that on some data the solution can 'blow up', although this could potentially be overcome by tuning the other solver parameters related to the rho update logic. A sketch of the standard update rule is given after this ports list.

    • verbose name: Auto-Tune Solver Rho
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • rho_threshold
    Rho update trigger threshold. This determines how frequently updates to the solver rho parameter can be triggered. A larger value will lead to rho changing less frequently. This parameter essentially trades off faster adaptation to settings that are optimal for convergence (lower values) against keeping the settings from changing so erratically that they prompt stalls or convergence failures on difficult data (higher values).

    • verbose name: Solver Rho Update Threshold
    • default value: 10
    • port type: FloatPort
    • value type: float (can be None)
  • rho_incr
    Rho update increment factor. When rho is being increased, this is the factor by which it is multiplied. Larger values can lead to quicker adaptation if the initial value is off or solutions change rapidly, at an increased risk of overshooting.

    • verbose name: Solver Rho Increment Factor
    • default value: 2
    • port type: FloatPort
    • value type: float (can be None)
  • rho_decr
    Rho update decrement factor. When rho is being decreased, this is the factor by which it is divided. Larger values can lead to quicker adaptation if the initial value is off or solutions change rapidly, at an increased risk of overshooting.

    • verbose name: Solver Rho Decrement Factor
    • default value: 2
    • port type: FloatPort
    • value type: float (can be None)
  • over_relaxation
    ADMM over-relaxation parameter. 1.0 is no over-relaxation.

    • verbose name: Over Relaxation
    • default value: 1.0
    • port type: FloatPort
    • value type: float (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
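
The rho-related solver parameters above match the standard ADMM residual-balancing scheme (Boyd et al., 2011); the following Python sketch shows that textbook rule and how it plausibly maps onto rho_threshold, rho_incr, and rho_decr (the mapping is an assumption made for illustration, not a description of this node's internals):

    def update_rho(rho, r_norm, s_norm, threshold=10.0, incr=2.0, decr=2.0):
        # r_norm: primal residual norm; s_norm: dual residual norm.
        if r_norm > threshold * s_norm:
            return rho * incr  # primal residual dominates: increase rho
        elif s_norm > threshold * r_norm:
            return rho / decr  # dual residual dominates: decrease rho
        return rho             # residuals are balanced: leave rho unchanged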

RidgeRegression

Estimate a continuous output value from features using Ridge Regression.

Ridge regression is a straightforward and principled statistical technique to learn a linear mapping between input data and desired output values from training data. Ridge regression assumes that both inputs and outputs are Gaussian distributed, that is, have no or very few major statistical outliers. If the output follows a radically different distribution, for instance between 0 and 1, or nonnegative, or discrete values, then different methods may be more appropriate (for instance, classification methods for discrete values). To ameliorate the issue of outliers in the data, the raw data can be cleaned of artifacts with various artifact removal methods. To the extent that the assumptions hold true, this method is highly competitive with other linear methods, such as support vector regression (SVR). This method uses shrinkage regularization by default, which allows it to handle large numbers of features quite well. The regularization depends on a parameter that is automatically tuned using an internal cross-validation procedure (see the tooltips of the controlling parameters for more details). Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, its trainable state can be saved to a model file and later loaded for continued use.
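
As a rough standalone analogue (using scikit-learn rather than this node), RidgeCV performs the same kind of search over a list of regularization strengths, with efficient leave-one-out GCV by default; the data shapes below are made up for illustration:

    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 32))  # 100 trials x 32 features
    y = X @ rng.standard_normal(32) + 0.1 * rng.standard_normal(100)

    model = make_pipeline(
        StandardScaler(),                            # cf. normalize_features
        RidgeCV(alphas=[0.1, 0.5, 1.0, 5.0, 10.0]),  # cf. the alphas port
    )
    model.fit(X, y)                     # "calibration" on the training data
    predictions = model.predict(X[:5])  # continuous-valued outputs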

More Info...

Version 1.1.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • alphas
    Regularization strength. This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). Larger values cause stronger regularization, that is, less risk of the method over-fitting to random details of the data, and thus better generalization to new data. A value of 0 means no regularization, and there is no upper limit on the values that may be given here -- however, depending on the scale of the data and the number of trials, there is a cutoff beyond which all features are weighted by zero, and are thus unused. Often one covers a range between 0.1 and 10, and at times 0.01 to 100. Typically the values here are not linearly spaced, but follow an exponential progression (e.g., 0.25, 0.5, 1, 2, 4, 8, etc.). The default search range is intentionally coarse for quick running times; refine it to smaller steps to obtain potentially better solutions, but do not expect massive gains from refining.

    • verbose name: Regularization Strength
    • default value: [0.1, 0.5, 1.0, 5, 10.0]
    • port type: ListPort
    • value type: list (can be None)
  • search_metric
    Parameter search metric. When the regularization parameter is given as a list of values, then the method will run a cross-validation for each possible parameter value and use this metric to score how well the method performs in each case, in order to select the best parameter. While 'neg_mean_squared_error' is usually a good default, some other metrics can be useful under some circumstances, for instance 'neg_mean_absolute_error', which penalizes large deviations less strongly than mse.

    • verbose name: Scoring Metric For Parameter Search
    • default value: neg_mean_squared_error
    • port type: EnumPort
    • value type: str (can be None)
  • num_folds
    Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training. Setting this to one will yield leave-one-out CV (which, in the case of ridge regression, is extremely efficient using GCV), and if a group field was specified, then leave-one-group-out CV.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
    Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials with each label. Note that this requires labels to be quantized or binned to be meaningful.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • normalize_features
    Normalize features. Should only be disabled if the data comes with a predictable scale (e.g., normalized in some other way).

    • verbose name: Normalize Features
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • include_bias
    Include a bias term. If false, your features need to be centered, or include a dummy feature set to 1.

    • verbose name: Include Bias Term
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

SparseBayesianRegression

Estimate a continuous output value from features using Sparse Bayesian Regression.

Sparse Bayesian regression is an elegant Bayesian method to learn a linear mapping between input data and desired output values from training data, and is closely related to LASSO regression. The main difference is that LASSO has a tunable parameter that controls how strongly the method is regularized. This parameter controls how "sparse" the solution is, that is, how many features the model relies on versus prunes as irrelevant; it thereby controls the complexity of the solution to prevent over-fitting to random details of the data and to improve generalization to new data. In LASSO, this parameter is tuned using cross-validation, that is, by empirical testing on held-out data. In the Bayesian variant, such a parameter exists as well; however, its optimal value is estimated from the data in a theoretically clean and principled fashion. This method assumes that both inputs and outputs are Gaussian distributed, that is, have no or very few major statistical outliers. If the output follows a radically different distribution, for instance between 0 and 1, or nonnegative, or discrete values, then different methods may be more appropriate (for instance, classification methods for discrete values). To ameliorate the issue of outliers in the data, the raw data can be cleaned of artifacts with various artifact removal methods. To the extent that the assumptions hold true, this method is highly competitive with other (sparse) linear methods. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, its trainable state can be saved to a model file and later loaded for continued use.
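
A rough standalone analogue of the ARD flavor (assuming scikit-learn's ARDRegression, which is not this node's own implementation) illustrates how the prior and pruning parameters listed below plausibly map onto a concrete solver:

    import numpy as np
    from sklearn.linear_model import ARDRegression

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))   # 200 trials x 50 features
    w_true = np.zeros(50)
    w_true[:5] = rng.standard_normal(5)  # only 5 features carry signal
    y = X @ w_true + 0.1 * rng.standard_normal(200)

    model = ARDRegression(
        alpha_1=1e-6, alpha_2=1e-6,    # uninformative noise prior (shape/rate)
        lambda_1=1e-6, lambda_2=1e-6,  # uninformative weight prior (shape/rate)
        threshold_lambda=1e4,          # ~ 1 / pruning_threshold (assumed mapping)
    )
    model.fit(X, y)
    kept = np.flatnonzero(np.abs(model.coef_) > 1e-3)
    print(kept)  # ideally close to [0 1 2 3 4]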

Version 1.1.1

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • method
    Method to use. RVM is the Relevance Vector Machine, and ARD is the newer Automatic Relevance Determination approach.

    • verbose name: Method
    • default value: RVM
    • port type: EnumPort
    • value type: str (can be None)
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions).

    • verbose name: Maximum Number Of Iterations
    • default value: 300
    • port type: IntPort
    • value type: int (can be None)
  • tolerance
    Convergence tolerance. This is the desired error tolerance, i.e., the acceptable inaccuracy in the solution. Using larger values gives less accurate results, but will lead to faster compute times. Note that, for biosignal-driven machine learning systems, one often does not need very small tolerances.

    • verbose name: Tolerance
    • default value: 0.001
    • port type: FloatPort
    • value type: float (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • include_bias
    Include a bias term. If false, your features need to be centered, or include a dummy feature set to 1.

    • verbose name: Include Bias Term
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • normalize_features
    Normalize features. Should only be disabled if the data comes with a predictable scale (e.g., normalized in some other way).

    • verbose name: Normalize Features
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • alpha_shape
    Alpha shape parameter. This is only included for completeness and usually does not have to be adjusted. This is the shape parameter for the Gamma distribution prior over the alpha parameter (inverse noise variance). By default, this is an uninformative prior.

    • verbose name: Alpha Shape Parameter
    • default value: 1e-06
    • port type: FloatPort
    • value type: float (can be None)
  • alpha_rate
    Alpha rate parameter. This is only included for completeness and usually does not have to be adjusted. This is the rate parameter for the Gamma distribution prior over the alpha parameter (inverse noise variance). By default, this is an uninformative prior.

    • verbose name: Alpha Rate Parameter
    • default value: 1e-06
    • port type: FloatPort
    • value type: float (can be None)
  • lambda_shape
    Lambda shape parameter. This is only included for completeness and usually does not have to be adjusted. This is the shape parameter for the Gamma distribution prior over the lambda parameter (inverse variance of the weights). By default, this is an uninformative prior.

    • verbose name: Lambda Shape Parameter
    • default value: 1e-06
    • port type: FloatPort
    • value type: float (can be None)
  • lambda_rate
    Lambda rate parameter. This is only included for completeness and usually does not have to be adjusted. This is the rate parameter for the Gamma distribution prior over the lambda parameter (inverse variance of the weights). By default, this is an uninformative prior.

    • verbose name: Lambda Rate Parameter
    • default value: 1e-06
    • port type: FloatPort
    • value type: float (can be None)
  • pruning_threshold
    Pruning threshold for small weights. Only used by the ARD flavor (note this is the inverse of threshold_lambda in the implementation).

    • verbose name: Pruning Threshold
    • default value: 0.0001
    • port type: FloatPort
    • value type: float (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

StochasticGradientDescentClassification

Classify data instances using models trained via stochastic gradient descent.

Stochastic gradient descent is an optimization technique that is very fast for very large amounts of training data, since it does not have to touch all the data in each update iteration. This node is rather flexible in that it can implement a variety of data terms, including logistic regression, support vector machines, certain robust classifiers, and some rarely-used variants. These can be combined with a variety of regularization terms to control the complexity of the solution, including shrinkage/ridge regularization (l2), sparsity (l1), and elastic net (l1+l2). In this sense, this node can mimic the features of several other nodes in NeuroPype, with different tradeoffs in running time that depend on the number of data points and the number of features; this node is mainly of interest for very large numbers of data points (i.e., trials). See also the tooltips of the parameters for toggling these features for more information on the involved tradeoffs, as well as the documentation of the other nodes (Logistic Regression, and Support Vector Classification). One unique feature of this node is the ability to implement a robust classifier, which theoretically can perform better than most other methods if the data do indeed contain outliers. For this reason, removing artifacts from the raw data will make less of a difference when this method is used. Like the other methods, this node includes tuning parameters that are automatically tuned using cross-validation. If there are very few trials, or some extensive stretches of the data exhibit only one class, the procedure used to find the regularization parameters (cross-validation) can fail with an error that there were too few or no trials of a given class present. By default, this method will return not the most likely class label for each trial it is given, but instead the probabilities, for each class (i.e., category of labels), that the trial actually belongs to that class. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, its trainable state can be saved to a model file and later loaded for continued use.
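
As a hedged standalone sketch of the same idea (using scikit-learn's SGDClassifier rather than this node), the following searches the regularization strength with blockwise (unshuffled) cross-validation and returns per-class probabilities via the log_loss data term; the data are synthetic:

    import numpy as np
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import GridSearchCV, KFold

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 20))
    y = (X[:, 0] + 0.5 * rng.standard_normal(500) > 0).astype(int)

    sgd = SGDClassifier(loss='log_loss',  # named 'log' in scikit-learn < 1.1
                        penalty='l2',
                        learning_rate='invscaling', eta0=0.01, power_t=0.25,
                        max_iter=1000, tol=1e-3, random_state=12345)
    search = GridSearchCV(sgd, {'alpha': [0.1, 0.5, 1.0, 5.0, 10.0]},
                          scoring='accuracy',
                          cv=KFold(n_splits=5, shuffle=False))  # blockwise folds
    search.fit(X, y)
    proba = search.best_estimator_.predict_proba(X[:3])  # per-class probabilities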

More Info...

Version 1.2.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • loss
    Loss function to use. This selects the data term, i.e., what assumptions are being imposed on the data. Hinge yields an SVM, log_loss yields logistic regression, modified_huber is another smooth loss that enables probability outputs and tolerance to outliers, and squared_hinge is like hinge but quadratically penalized. Perceptron is the loss used by the perceptron algorithm. Epsilon_insensitive is equivalent to support vector regression, and squared_epsilon_insensitive is a rarely used hybrid between linear and support vector regression; note that these last two are regression losses, used here to classify data instances.

    • verbose name: Data Term
    • default value: log_loss
    • port type: EnumPort
    • value type: str (can be None)
  • regularizer
    Regularization term. Selecting l2 (default) is the most commonly used regularization, and is very effective in preventing over-fitting to random details in the training data, thus helping generalization to new data. The l1 method is applicable when only a small number of features in the data are relevant and most features are irrelevant, and this regularization is capable of identifying and selecting the relevant features (although if the optimal model is not sparse, then this will work less well than l2). The elasticnet version is a weighted combination of the l1 and l2, which addresses an issue with l1 where highly correlated features would receive rather arbitrary relative weights.

    • verbose name: Regularization Term
    • default value: l2
    • port type: EnumPort
    • value type: str (can be None)
  • alphas
    Regularization strength. This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). The details of the parameter search can be controlled via the search metric and number of folds parameters. Larger values cause stronger regularization, that is, less risk of over-fitting. A value of 0 means no regularization, and there is no upper limit on the values that may be given here -- however, depending on the scale of the data and the number of trials, there is a cutoff beyond which all features are weighted by zero, and are thus unused. Often one covers a range between 0.1 and 10, and at times 0.01 to 100. Typically the values here are not linearly spaced, but follow an exponential progression (e.g., 0.25, 0.5, 1, 2, 4, 8, etc.). The default search range is intentionally coarse for quick running times; refine it to smaller steps to obtain potentially better solutions, but do not expect massive gains from refining.

    • verbose name: Regularization Strength
    • default value: [0.1, 0.5, 1.0, 5, 10.0]
    • port type: ListPort
    • value type: list (can be None)
  • l1_ratio
    Tradeoff parameter between l1 and l2 penalties when using the elasticnet regularization term. This parameter controls the balance between the l1 regularization term (as in LASSO) and the l2 regularization term (as in ridge regression). A value of 0 leads to exclusive use of l2, and a value of 1 leads to exclusive use of l1. This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). In fact, for each setting of this parameter, the entire list of possible values for the regularization strength is tested, so the running time is also proportional to the number of those values. The details of the parameter search can be controlled via the search metric and number of folds parameters. By default this is not searched, but a reasonable choice would be [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1].

    • verbose name: Regularization Type Tradeoff Parameter
    • default value: [0.5]
    • port type: ListPort
    • value type: list (can be None)
  • feature_scaling
    Feature scaling to use. If set to auto, then scaling will default to robust if a robust loss is used, and otherwise std. If features are not already standardized beforehand, enabling this tends to ensure that features are treated equally by the regularization instead of it being scale-dependent.

    • verbose name: Feature Scaling
    • default value: none
    • port type: EnumPort
    • value type: str (can be None)
  • tolerance
    Convergence tolerance. This is the desired error tolerance, i.e., the acceptable inaccuracy in the solution. Using larger values gives less accurate results, but will lead to faster compute times. Note that, for biosignal-driven machine learning systems, one often does not need very small tolerances.

    • verbose name: Tolerance
    • default value: 0.001
    • port type: FloatPort
    • value type: float (can be None)
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions).

    • verbose name: Maximum Number Of Iterations
    • default value: 1000
    • port type: IntPort
    • value type: int (can be None)
  • num_jobs
    Number of parallel compute jobs. This value only affects the running time and not the results. Values between 1 and twice the number of CPU cores make sense to expedite computation, but may temporarily reduce the responsiveness of the machine. The value of -1 stands for all available CPU cores.

    • verbose name: Number Of Parallel Jobs
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • search_metric
    Parameter search metric. When the regularization parameter is given as a list of values, then the method will run a cross-validation for each possible parameter value and use this metric to score how well the method performs in each case, in order to select the best parameter. While 'accuracy' is usually a good default, some other metrics can be useful under special circumstances, e.g., roc_auc for highly imbalanced ratios of trials from different classes.

    • verbose name: Scoring Metric If Searching Parameters
    • default value: accuracy
    • port type: EnumPort
    • value type: str (can be None)
  • num_folds
    Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
    Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials in each class.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • shuffle
    Shuffle data after each epoch. This is useful if the data comes from a time series with highly correlated successive trials, and is worth experimenting with to reach faster convergence. It is disabled by default mainly to match the behavior of traditional SGD, which does not shuffle.

    • verbose name: Shuffle Data
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • probabilistic
    Use probabilistic outputs. If enabled, the node will output for each class the probability that a given trial is of that class; otherwise it will output the most likely class label.

    • verbose name: Output Probabilities
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • warm_start
    Start from previous solution. This allows for online updating of the model.

    • verbose name: Warm Start
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • averaging
    Use averaging. This is an alternative way of performing stochastic gradient descent, which can lead to faster convergence. Can also be an integer greater than 1; in this case, this is the number of samples seen after which averaging kicks in.

    • verbose name: Averaging
    • default value: False
    • port type: Port
    • value type: object (can be None)
  • include_bias
    Include a bias term. If false, your features need to be centered, or include a dummy feature set to 1.

    • verbose name: Include Bias Term
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • epsilon
    Epsilon for epsilon-insensitive and Huber data terms. Note that this depends strongly on the scale of the data. Can be interpreted as the cutoff in data units beyond which data values are treated more robustly (i.e., as potential outliers).

    • verbose name: Epsilon
    • default value: 0.1
    • port type: FloatPort
    • value type: float (can be None)
  • learning_rate_schedule
    Schedule for adapting the learning rate. This is a highly technical setting that allows implementing different forms of stochastic gradient descent. The options correspond to the following rules for computing the learning rate (step size) eta of the method as a function of the current iteration number t: 'constant' means eta=eta0, 'optimal' yields the formula eta=1.0/(t+t0), and 'invscaling' corresponds to the formula eta=eta0/pow(t, power_t). The main reason to change this would be to replicate some other published implementation.

    • verbose name: Learning Rate Schedule
    • default value: invscaling
    • port type: EnumPort
    • value type: str (can be None)
  • eta0
    Initial learning rate. This is the parameter eta0 in the learning rate schedule formula.

    • verbose name: Initial Learning Rate (Step Size)
    • default value: 0.01
    • port type: FloatPort
    • value type: float (can be None)
  • power_t
    Exponent in invscaling schedule. This is the exponent power_t in the invscaling learning rate schedule.

    • verbose name: Exponential Falloff (Invscaling Only)
    • default value: 0.25
    • port type: FloatPort
    • value type: float (can be None)
  • random_seed
    Random seed. Different values may give slightly different outcomes.

    • verbose name: Random Seed
    • default value: 12345
    • port type: IntPort
    • value type: int (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • class_weights
    Per-class weights. Optionally this is a mapping from class label to weight. The weights represent the a priori ("prior") probability of encountering a specific class that the model shall assume. The weights will be renormalized so that they add up to one. Example syntax: {'0': 0.5, '1': 0.5} (note the quotes before the colons).

    • verbose name: Per-Class Weight
    • default value: None
    • port type: Port
    • value type: object (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible if the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

StochasticGradientDescentRegression

Estimate a continuous output value from features using linear regression trained via stochastic gradient descent.

Stochastic gradient descent is an optimization technique that is very fast for very large amounts of training data, since it does not have to touch all the data in each update iteration. This node is rather flexible in that it can implement a variety of data terms, including traditional linear regression, robust linear regression, support vector regression, and hybrid variants, which can be combined with a variety of regularization terms to control the complexity of the solution, including shrinkage/ridge regularization (l2), sparsity (l1), and elastic net (l1+l2). In this sense, this node can mimic the features of several other nodes in NeuroPype, with different tradeoffs in running time that depend on the number of data points and the number of features; this node is mainly of interest for very large numbers of data points (i.e., trials). See also the tooltips of the parameters for toggling these features for more information on the involved tradeoffs, as well as the documentation of the other nodes (Ridge Regression, LASSO Regression, Elastic Net Regression, Support Vector Regression), specifically regarding the regularization terms. One unique feature of this node is the ability to implement robust linear regression, which is robust to outliers in the data, and therefore will perform better than most other methods if the data do indeed contain outliers. For this reason, removing artifacts from the raw data will make less of a difference for this method. Like the other methods, this node includes tuning parameters that are automatically tuned using cross-validation. If there are very few trials, or some extensive stretches of the data exhibit only one class, the procedure used to find the regularization parameters (cross-validation) can fail with an error that there were too few or no trials of a given class present. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, its trainable state can be saved to a model file and later loaded for continued use.
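
A minimal standalone sketch of the robust-regression use case (assuming scikit-learn's SGDRegressor, not this node itself): the huber data term limits the influence of the injected outliers, which the squared_error term would not:

    import numpy as np
    from sklearn.linear_model import SGDRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.standard_normal((400, 10))
    y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(400)
    y[:20] += 50.0  # gross outliers in the target values

    model = make_pipeline(
        StandardScaler(),  # cf. the feature_scaling port
        SGDRegressor(loss='huber', epsilon=0.1,  # robust data term
                     penalty='l2', alpha=0.1,
                     max_iter=1000, tol=1e-3, random_state=12345),
    )
    model.fit(X, y)
    predictions = model.predict(X[:5])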

More Info...

Version 1.2.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • loss
    Loss function to use. This selects the data term, i.e., what assumptions are being imposed on the data. Squared_error corresponds to traditional linear regression, huber yields robust regression (the data may contain outliers), epsilon_insensitive is equivalent to support vector regression, and squared_epsilon_insensitive is a rarely used hybrid between linear and support vector regression.

    • verbose name: Data Term
    • default value: squared_error
    • port type: EnumPort
    • value type: str (can be None)
  • regularizer
    Regularization term. Selecting l2 (default) is the most commonly used regularization, and is very effective in preventing over-fitting to random details in the training data, thus helping generalization to new data. The l1 method is applicable when only a small number of features in the data are relevant and most features are irrelevant, and this regularization is capable of identifying and selecting the relevant features (although if the optimal model is not sparse, then this will work less well than l2). The elasticnet version is a weighted combination of the l1 and l2, which addresses an issue with l1 where highly correlated features would receive rather arbitrary relative weights.

    • verbose name: Regularization Term
    • default value: l2
    • port type: EnumPort
    • value type: str (can be None)
  • alphas
    Regularization strength. This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). The details of the parameter search can be controlled via the search metric and number of folds parameters. Larger values cause stronger regularization, that is, less risk of over-fitting. A value of 0 means no regularization, and there is no upper limit on the values that may be given here -- however, depending on the scale of the data and the number of trials, there is a cutoff beyond which all features are weighted by zero, and are thus unused. Often one covers a range between 0.1 and 10, and at times 0.01 to 100. Typically the values here are not linearly spaced, but follow an exponential progression (e.g., 0.25, 0.5, 1, 2, 4, 8, etc.). The default search range is intentionally coarse for quick running times; refine it to smaller steps to obtain potentially better solutions, but do not expect massive gains from refining.

    • verbose name: Regularization Strength
    • default value: [0.1, 0.5, 1.0, 5, 10.0]
    • port type: ListPort
    • value type: list (can be None)
  • l1_ratio
    Tradeoff parameter between l1 and l2 penalties when using the elasticnet regularization term. This parameter controls the balance between the l1 regularization term (as in LASSO) and the l2 regularization term (as in ridge regression). A value of 0 leads to exclusive use of l2, and a value of 1 leads to exclusive use of l1. This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). In fact, for each setting of this parameter, the entire list of possible values for the regularization strength is tested, so the running time is also proportional to the number of those values. The details of the parameter search can be controlled via the search metric and number of folds parameters. By default this is not searched, but a reasonable choice would be [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1].

    • verbose name: Regularization Type Tradeoff Parameter
    • default value: [0.5]
    • port type: ListPort
    • value type: list (can be None)
  • feature_scaling
    Feature scaling to use. If set to auto, then scaling will default to robust if a robust loss is used, and otherwise std. If features are not already standardized beforehand, enabling this tends to ensure that features are treated equally by the regularization instead of it being scale-dependent.

    • verbose name: Feature Scaling
    • default value: none
    • port type: EnumPort
    • value type: str (can be None)
  • tolerance
    Convergence tolerance. This is the desired error tolerance, i.e., the acceptable inaccuracy in the solution. Using larger values gives less accurate results, but will lead to faster compute times. Note that, for biosignal-driven machine learning systems, one often does not need very small tolerances.

    • verbose name: Tolerance
    • default value: 0.001
    • port type: FloatPort
    • value type: float (can be None)
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions).

    • verbose name: Maximum Number Of Iterations
    • default value: 1000
    • port type: IntPort
    • value type: int (can be None)
  • shuffle
    Shuffle data after each epoch. This is useful if the data comes from a time series with highly correlated successive trials, and is worth experimenting with to reach faster convergence. It is disabled by default mainly to match the behavior of traditional SGD, which does not shuffle.

    • verbose name: Shuffle Data
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • warm_start
    Start from previous solution. This allows for online updating of the model.

    • verbose name: Warm Start
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • search_metric
    Parameter search metric. When the regularization parameter is given as a list of values, then the method will run a cross-validation for each possible parameter value and use this metric to score how well the method performs in each case, in order to select the best parameter. While 'neg_mean_squared_error' is usually a good default, some other metrics can be useful under some circumstances, for instance 'neg_mean_absolute_error', which penalizes large deviations less strongly than mse.

    • verbose name: Scoring Metric For Parameter Search
    • default value: neg_mean_squared_error
    • port type: EnumPort
    • value type: str (can be None)
  • num_folds
    Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
    Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc. See the sketch after this entry.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
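
    To illustrate what group-aware splitting means, here is a minimal standalone sketch using scikit-learn's GroupKFold (the subject identifiers are made up for illustration):

      import numpy as np
      from sklearn.model_selection import GroupKFold

      X = np.random.randn(12, 4)
      y = np.random.randint(0, 2, 12)
      groups = np.repeat(['S01', 'S02', 'S03'], 4)  # e.g., a SubjectID field

      for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups):
          # each test set contains only groups unseen during training
          print('test subjects:', sorted(set(groups[test_idx])))
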
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials with each label. Note that this requires labels to be quantized or binned to be meaningful.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • averaging
    Use averaging. This is an alternative way of performing stochastic gradient descent, which can lead to faster convergence. Can also be an integer greater than 1; in this case, this is the number of samples seen after which averaging kicks in. See the sketch after this entry.

    • verbose name: Averaging
    • default value: False
    • port type: Port
    • value type: object (can be None)
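
    This mirrors the averaged-SGD option in scikit-learn, sketched below for illustration (the correspondence to the node's internals is an assumption):

      from sklearn.linear_model import SGDRegressor

      plain = SGDRegressor(average=False)  # conventional SGD (the default here)
      asgd = SGDRegressor(average=True)    # average the weights over all updates
      late = SGDRegressor(average=1000)    # averaging kicks in after 1000 samples seen
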
  • include_bias
    Include a bias term. If false, your features need to be centered, or include a dummy feature set to 1.

    • verbose name: Include Bias Term
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • epsilon
    Epsilon for epsilon-insensitive and Huber data terms. Note that this depends strongly on the scale of the data. Can be interpreted as the cutoff in data units beyond which data values are treated more robustly (i.e., as potential outliers).

    • verbose name: Epsilon
    • default value: 0.1
    • port type: FloatPort
    • value type: float (can be None)
  • learning_rate_schedule
    Schedule for adapting the learning rate. This is a highly technical setting that makes it possible to implement different forms of stochastic gradient descent. The options correspond to the following rules for computing the learning rate (step size) eta of the method as a function of the current iteration number t: 'constant' means eta=eta0, 'optimal' yields the formula eta=1.0/(t+t0), and 'invscaling' corresponds to the formula eta=eta0/pow(t, power_t). The main reason to change this would be to replicate some other published implementation. See the sketch after this entry.

    • verbose name: Learning Rate Schedule
    • default value: invscaling
    • port type: EnumPort
    • value type: str (can be None)
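
    The three rules amount to a few lines of arithmetic; this sketch simply evaluates the formulas above (t0 is an internal constant chosen by the optimizer, shown as a plain argument here):

      def learning_rate(schedule, t, eta0=0.01, power_t=0.25, t0=1.0):
          """Step size at iteration t (t >= 1) under the given schedule."""
          if schedule == 'constant':
              return eta0
          elif schedule == 'optimal':
              return 1.0 / (t + t0)            # decays like 1/t
          elif schedule == 'invscaling':
              return eta0 / pow(t, power_t)    # polynomial decay, default exponent 0.25
          raise ValueError(schedule)
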
  • eta0
    Initial learning rate. This is the parameter eta0 in the learning rate schedule formula.

    • verbose name: Initial Learning Rate (Step Size)
    • default value: 0.01
    • port type: FloatPort
    • value type: float (can be None)
  • power_t
    Exponent in invscaling schedule. This is the exponent power_t in the invscaling learning rate schedule.

    • verbose name: Exponential Falloff (Invscaling Only)
    • default value: 0.25
    • port type: FloatPort
    • value type: float (can be None)
  • random_seed
    Random seed. Different values may give slightly different outcomes.

    • verbose name: Random Seed
    • default value: 12345
    • port type: IntPort
    • value type: int (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

SupportVectorClassification

Classification using support vector machines.

This is the traditional kernel support vector machine, which is very efficient at finding non-linear structure in the data. This method has two important regularization parameters (cost and gamma), which are by default searched over a relatively coarse interval. For best results you may have to use a finer range (at some performance cost). Note that it is typically not very reasonable to use a non-linear classifier like this on the channel signal without any sort of spatial filtering (such as CSP or ICA). Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, the trainable state of this node can be saved to a model file and later loaded for continued use.

More Info...

Version 1.1.1

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • kernel
    Kernel type to use. This is a non-linear transform of the feature space, allowing for non-linear decision boundaries between classes. The available kernels are: Linear (a trivial do-nothing kernel); Polynomial, which yields components that are all possible polynomial combinations of input features up to the desired degree; Radial-Basis Functions (rbf), which is one of the most commonly-used non-linear kernels; and Sigmoid.

    • verbose name: Kernel Type
    • default value: rbf
    • port type: EnumPort
    • value type: str (can be None)
  • feature_selection
    Feature selection criterion to use. If set to None, no feature selection is performed. If set to anova, features with the highest ANOVA f-scores are selected; this is a good option for simple kernels like linear SVM. If set to mi, features with the highest mutual information are selected, which is better for highly non-linear kernels like rbf. In either case, the number of features is automatically selected using a cross-validation, and can be constrained using num_features. See the sketch after this entry.

    • verbose name: Feature Selection
    • default value: none
    • port type: EnumPort
    • value type: str (can be None)
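
    As a standalone illustration of the two criteria, the sketch below scores features with scikit-learn; the node performs the analogous selection internally and picks the feature count by cross-validation:

      import numpy as np
      from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

      X = np.random.randn(100, 32)
      y = np.random.randint(0, 2, 100)

      anova_sel = SelectKBest(f_classif, k=8).fit(X, y)         # 'anova': good for linear kernels
      mi_sel = SelectKBest(mutual_info_classif, k=8).fit(X, y)  # 'mi': better for rbf kernels
      X_reduced = anova_sel.transform(X)                        # keep the 8 highest-scoring features
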
  • probabilistic
    Use probabilistic outputs. If enabled, the node will output for each class the probability that a given trial is of that class; otherwise it will output the most likely class label.

    • verbose name: Output Probabilities
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • tolerance
    Convergence tolerance. This is the desired error tolerance, or acceptable inaccuracy in the solution. Using larger values gives less accurate results, but will lead to faster compute times. Note that, for biosignal-driven machine learning systems, one often does not need very small tolerances.

    • verbose name: Tolerance
    • default value: 0.001
    • port type: FloatPort
    • value type: float (can be None)
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions). If set to -1, no limit is in effect.

    • verbose name: Maximum Number Of Iterations
    • default value: -1
    • port type: IntPort
    • value type: int (can be None)
  • cost
    SVM cost parameter. This value determines the degree to which solutions that mis-classify data points are penalized. Higher values result in models that are less likely to mis-classify the training data, but at the expense of potentially worse generalization to new data (less margin for error when slightly different trials are encountered in future test data). This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). The details of the parameter search can be controlled via the search metric and number of folds parameters. Smaller values cause stronger regularization, that is, less risk of the method over-fitting to random details of the data, and thus better generalization to new data. A very small value means effectively no penalty for mis-classification, and there is no upper limit to how large the values given here may be; however, useful values depend on the scale of the data and the number of trials. Often one covers a range between 0.1 and 10, and at times 0.01 to 100. Typically the values here are not linearly spaced, but follow an exponential progression (e.g., 0.25, 0.5, 1, 2, 4, 8, etc.). The default search range is intentionally coarse for quick running times; refine it to smaller steps to obtain potentially better solutions, but do not expect massive gains from refining. A sketch of such a refined search is shown after this entry.

    • verbose name: Cost
    • default value: [0.01, 0.1, 1.0, 10.0, 100]
    • port type: Port
    • value type: list (can be None)
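
    The exponential spacing mentioned above is conveniently generated with numpy; below is a minimal sketch of refining the default search using scikit-learn (illustrative grids, not the node's internals):

      import numpy as np
      from sklearn.model_selection import GridSearchCV
      from sklearn.svm import SVC

      X = np.random.randn(120, 10)         # stand-in data
      y = np.random.randint(0, 2, 120)

      costs = np.logspace(-2, 2, num=9)    # 0.01, ~0.03, 0.1, ..., 100 (finer than the default)
      gammas = np.logspace(-4, 1, num=6)   # matches the default gamma range
      search = GridSearchCV(SVC(kernel='rbf'), {'C': costs, 'gamma': gammas},
                            scoring='accuracy', cv=5).fit(X, y)
      # running time grows with len(costs) * len(gammas) * n_folds
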
  • poly_degree
    Degree of the polynomial kernel. Ignored by other kernel types. This is the maximum degree of polynomial combinations of features that are generated. This is also a list of possible values, and is searched in a grid search just like the cost parameter (see the cost parameter for details on this procedure).

    • verbose name: Degree (Polynomial Kernel Only)
    • default value: [1, 2, 3]
    • port type: Port
    • value type: list (can be None)
  • gamma
    Gamma parameter of the RBF kernel. This parameter controls the scale of the kernel mapping, where lower scales can capture smaller-scale structure in the data. When left at the default, it resolves to 1 divided by the number of features. This is a list of possible values, and is searched in a grid search just like the cost parameter (see cost parameter for details on this procedure).

    • verbose name: Scale (Rbf Kernel Only)
    • default value: [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0]
    • port type: Port
    • value type: list (can be None)
  • coef0
    Constant term in kernel function. Only used in polynomial and sigmoid kernels.

    • verbose name: Constant (Poly Or Sigmoid Kernels Only)
    • default value: [0.0, 1.0]
    • port type: Port
    • value type: list (can be None)
  • num_features
    How many features to select. Given as a list of possible values to search over alongside all other parameters. Can also be a single value, which is equivalent to the full range up to this number. If not given, this defaults to max_feature_select if set, and otherwise to the number of features in the data.

    • verbose name: Num Features
    • default value: None
    • port type: ListPort
    • value type: list (can be None)
  • max_feature_select
    Maximum number of features to consider. This is applied using ANOVA f-scores, and can be used to speed up the mutual information based feature-selection mode.

    • verbose name: Max Feature Select (Anova)
    • default value: None
    • port type: IntPort
    • value type: int (can be None)
  • search_metric
    Parameter search metric. This method will run a cross-validation for each possible parameter value and use this metric to score how well the method performs in each case, in order to select the best value. Therefore, the running time of the method is proportional to the number of parameter values times the number of folds in the cross-validation, which can be slow. While 'accuracy' is usually a good default, some other metrics can be useful under special circumstances, e.g., roc_auc for highly imbalanced ratios of trials from different classes.

    • verbose name: Scoring Metric For Parameter Search
    • default value: accuracy
    • port type: EnumPort
    • value type: str (can be None)
  • num_folds
    Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block. This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
    Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials in each class.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • class_weights
    Per-class weights. Optionally this is a mapping from class label to weight. The weights represent the a priori ("prior") probability of encountering a specific class that the model shall assume. The weights will be renormalized so that they add up to one. Example syntax: {'0': 0.5, '1': 0.5} (note the quotes before the colons).

    • verbose name: Per-Class Weight
    • default value: None
    • port type: Port
    • value type: object (can be None)
  • shrinking
    Use shrinking heuristic.

    • verbose name: Shrinking
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • cache_size
    "Cache size in MB.

    • verbose name: Cache Size
    • default value: 200
    • port type: IntPort
    • value type: int (can be None)
  • random_seed
    Random seed. Different values may give slightly different outcomes.

    • verbose name: Random Seed
    • default value: 12345
    • port type: IntPort
    • value type: int (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • cond_field
    The name of the instance data field that contains the conditions to be discriminated. This parameter will be ignored if the packet has previously been processed by a BakeDesignMatrix node.

    • verbose name: Cond Field
    • default value: TargetValue
    • port type: StringPort
    • value type: str (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

SupportVectorRegression

Regression using support vector machines.

This method is very similar to Support Vector Classification, the difference being that the desired target output can be a continuous value (as opposed to a class label or probability), as with all regression methods. Like all machine learning methods, this method needs to be calibrated ("trained") before it can make any predictions on data. For this, the method requires training instances and associated training labels. The typical way to get such labels associated with time-series data is to make sure that a marker stream is included in the data, which is usually imported together with the data using one of the Import nodes, or received over the network alongside the data, e.g., using the LSL Input node (with a non-empty marker query). These markers are then annotated with target labels using the Assign Targets node. To generate instances of training data for each of the training markers, one usually uses the Segmentation node to extract segments from the continuous time series around each marker. Since this machine learning method is not capable of being trained incrementally on streaming data, the method requires a data packet that contains the entire training data; this training data packet can either be accumulated online and then released in one shot using the Accumulate Calibration Data node, or it can be imported from a separate calibration recording and then spliced into the processing pipeline using the Inject Calibration Data node, where it passes through the same nodes as the regular data until it reaches the machine learning node, where it is used for calibration. Once this node is calibrated, the trainable state of this node can be saved to a model file and later loaded for continued use.

More Info...

Version 1.0.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • kernel
    Kernel type to use. This is a non-linear transform of the feature space, allowing for non-linear relationships between the features and the predicted output. The available kernels are: Linear (a trivial do-nothing kernel); Polynomial, which yields components that are all possible polynomial combinations of input features up to the desired degree; Radial-Basis Functions (rbf), which is one of the most commonly-used non-linear kernels; and Sigmoid.

    • verbose name: Kernel Type
    • default value: rbf
    • port type: EnumPort
    • value type: str (can be None)
  • cost
    SVM cost parameter. This value determines the degree to which deviations of training points from the fitted function (beyond the epsilon margin) are penalized. Higher values result in models that fit the training data more closely, but at the expense of potentially worse generalization to new data (less margin for error when slightly different trials are encountered in future test data). This is a list of candidate values, the best of which is found via an exhaustive search (i.e., each value is tested one by one, and therefore the total running time is proportional to the number of values listed here). The details of the parameter search can be controlled via the search metric and number of folds parameters. Smaller values cause stronger regularization, that is, less risk of the method over-fitting to random details of the data, and thus better generalization to new data. A very small value means effectively no penalty, and there is no upper limit to how large the values given here may be; however, useful values depend on the scale of the data and the number of trials. Often one covers a range between 0.1 and 10, and at times 0.01 to 100. Typically the values here are not linearly spaced, but follow an exponential progression (e.g., 0.25, 0.5, 1, 2, 4, 8, etc.); such a grid can be generated as in the sketch under the corresponding Support Vector Classification entry. The default search range is intentionally coarse for quick running times; refine it to smaller steps to obtain potentially better solutions, but do not expect massive gains from refining.

    • verbose name: Cost
    • default value: [0.01, 0.1, 1.0, 10.0, 100]
    • port type: ListPort
    • value type: list (can be None)
  • poly_degree
    Degree of the polynomial kernel. Ignored by other kernel types. This is the maximum degree of polynomial combinations of features that are generated. This is also a list of possible values, and is searched in a grid search just like the cost parameter (see the cost parameter for details on this procedure).

    • verbose name: Degree (Polynomial Kernel Only)
    • default value: [1, 2, 3]
    • port type: ListPort
    • value type: list (can be None)
  • gamma
    Gamma parameter of the RBF kernel. This parameter controls the scale of the kernel mapping, where lower scales can capture smaller-scale structure in the data. When left at the default, it resolves to 1 divided by the number of features. This is a list of possible values, and is searched in a grid search just like the cost parameter (see cost parameter for details on this procedure).

    • verbose name: Scale (Rbf Kernel Only)
    • default value: [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0]
    • port type: ListPort
    • value type: list (can be None)
  • coef0
    Constant term in kernel function. Only used in polynomial and sigmoid kernels.

    • verbose name: Constant (Poly Or Sigmoid Kernels Only)
    • default value: [0.0, 1.0]
    • port type: ListPort
    • value type: list (can be None)
  • epsilon
    Epsilon for epsilon-insensitive and Huber data terms. Note that this depends strongly on the scale of the data. Can be interpreted as the cutoff in data units beyond which data values are treated more robustly (i.e., as potential outliers). See the sketch after this entry.

    • verbose name: Epsilon
    • default value: 0.1
    • port type: FloatPort
    • value type: float (can be None)
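
    To make the "cutoff in data units" interpretation concrete, this sketch evaluates the epsilon-insensitive loss by hand (an illustrative formula, not the node's code):

      import numpy as np

      def eps_insensitive_loss(y_true, y_pred, epsilon=0.1):
          """Residuals within +/- epsilon incur no loss; beyond that, linear loss."""
          return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

      # a residual of 0.05 costs nothing at epsilon=0.1; a residual of 0.5 costs 0.4
      print(eps_insensitive_loss(np.array([1.05, 1.5]), np.array([1.0, 1.0])))
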
  • tolerance
    Convergence tolerance. This is the desired error tolerance, or acceptable inaccuracy in the solution. Using larger values gives less accurate results, but will lead to faster compute times. Note that, for biosignal-driven machine learning systems, one often does not need very small tolerances.

    • verbose name: Tolerance
    • default value: 0.001
    • port type: FloatPort
    • value type: float (can be None)
  • max_iter
    Maximum number of iterations. This is one of the stopping criteria to limit the compute time. The default is usually fine, and gains from increasing the number of iterations will be minimal (it can be worth experimenting with lower iteration numbers if the algorithm must finish in a fixed time budget, at a cost of potentially less accurate solutions). If set to -1, no limit is in effect.

    • verbose name: Maximum Number Of Iterations
    • default value: -1
    • port type: IntPort
    • value type: int (can be None)
  • search_metric
    Parameter search metric. When the regularization parameter is given as a list of values, then the method will run a cross-validation for each possible parameter value and use this metric to score how well the method performs in each case, in order to select the best parameter. While 'neg_mean_squared_error' is usually a good default, some other metrics can be useful under some circumstances, for instance 'neg_mean_absolute_error', which penalizes large deviations less strongly than mse.

    • verbose name: Scoring Metric For Parameter Search
    • default value: neg_mean_squared_error
    • port type: EnumPort
    • value type: str (can be None)
  • num_folds
    Number of cross-validation folds for parameter search. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 5
    • port type: IntPort
    • value type: int (can be None)
  • cv_group_field
    Optionally a field indicating the group from which each trial is sourced. If given, then data will be split such that test sets contain unseen groups. Example groups are SubjectID, SessionID, etc.

    • verbose name: Grouping Field (Cross-Validation)
    • default value:
    • port type: StringPort
    • value type: str (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials with each label. Note that this requires labels to be quantized or binned to be meaningful.

    • verbose name: Stratified Cross-Validation
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level. Higher numbers will produce more extensive diagnostic output.

    • verbose name: Verbosity Level
    • default value: 0
    • port type: IntPort
    • value type: int (can be None)
  • shrinking
    Use shrinking heuristic.

    • verbose name: Shrinking
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • cache_size
    Cache size in MB.

    • verbose name: Cache Size
    • default value: 200
    • port type: IntPort
    • value type: int (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)

TrialAggregatePredictor

A model that makes aggregate predictions based on multiple trials.

The node accepts a wired-in single-trial machine-learning pipeline (via the method input), which is trained and tested on single trials, and which may output probabilities or other continuous-valued outputs. This node then aggregates the single-trial outputs into a single prediction for each group of trials, based on the provided group field and integration rule. The node furthermore uses probability calibration to ensure that the single-trial outputs are valid probabilities that accurately quantify the confidence at the trial level, which is not available from the raw single-trial predictions. Currently this node is only suitable for binary classification tasks. For technical reasons, probability calibration should usually not be performed on the same data that a classifier was trained on, since the classifier will generate confidence scores on its own training data that are not representative of the classifier's outputs on new data. Therefore, the calibration generally uses a cross-validation approach to train the underlying classifier on different data than what is used to fit the calibration mapping (see also the Crossvalidation node for full details on how cross-validation is generally done).
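
A rough standalone sketch of this out-of-sample calibration idea, using scikit-learn for illustration (the node's actual procedure additionally handles trial grouping and aggregation, which are omitted here):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.svm import SVC

    X = np.random.randn(200, 8)         # stand-in single-trial features
    y = np.random.randint(0, 2, 200)

    # 1) out-of-fold confidence scores: each trial is scored by a model that
    #    never saw it during training, avoiding optimistic training-set scores
    scores = cross_val_predict(SVC(), X, y, cv=5, method='decision_function')

    # 2) fit a calibration mapping (here a logistic regression) from the
    #    held-out scores to probabilities
    calibrator = LogisticRegression().fit(scores.reshape(-1, 1), y)
    probs = calibrator.predict_proba(scores.reshape(-1, 1))[:, 1]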

Version 0.5.0

Ports/Properties

  • metadata
    User-definable meta-data associated with the node. Usually reserved for technical purposes.

    • verbose name: Metadata
    • default value: {}
    • port type: DictPort
    • value type: dict (can be None)
  • data
    Data to process.

    • verbose name: Data
    • default value: None
    • port type: DataPort
    • value type: Packet (can be None)
    • data direction: INOUT
  • method
    Single-trial scoring model.

    • verbose name: Method
    • default value: None
    • port type: GraphPort
    • value type: Graph
  • method__signature
    Argument names of an underlying single-trial pipeline. Your pipeline is a subgraph that must contain at least one Placeholder node whose slotname must match the argument name listed here. The placeholder then acts as the entry point for any data that is passed into the pipeline when it is invoked by the aggregation node. Your pipeline's final node (which typically produces the predictions) is then wired to the aggregation node's "method" input port. In graphical UIs, this edge will be displayed in dotted style to indicate that this is not normal forward data flow, but that a subgraph (your pipeline) runs under the control of the aggregation node. In summary, your pipeline starts with a Placeholder that is followed by some processing nodes (in the simplest case just a single machine-learning node, such as Linear Discriminant Analysis). The final node of your pipeline is the one whose outputs are taken to be the pipeline's predictions, and this node is wired into the "method" input of this node. As always in NeuroPype, any "loose ends" downstream of your placeholder are also considered to be part of the pipeline but do not contribute to the result (they may be used for other purposes, such as printing progress information). Your pipeline may optionally have a second placeholder, which should by convention have slotname set to is_training, and then is_training must be listed as the second argument here. This second placeholder is used to indicate whether your pipeline is currently being called on training data or test data. During training, the aggregation node will execute your pipeline like a cross-validation, meaning that, for each fold in the cross-validation, your pipeline graph is instantiated from its default (uninitialized) state, and is then called with the training set of that fold. Then, the same graph is called again, but this time with the test set of that fold; it is then up to any adaptive nodes in your pipeline (e.g., machine learning nodes) to adapt themselves on the first call and to make predictions (usually without adapting again) on the second call. The pipeline is discarded after each fold and a new pipeline graph is instantiated (to avoid any unintended train/test leakage).

    • verbose name: Method [Signature]
    • default value: (data)
    • port type: Port
    • value type: object (can be None)
  • group_field
    The name of the instance data field that contains the group identifiers. Trials with the same group identifier will be aggregated together at prediction time. It is assumed that within a group the condition is the same.

    • verbose name: Group Field
    • default value: SubjectID
    • port type: StringPort
    • value type: str (can be None)
  • cond_field
    The name of the instance data field that contains the conditions to be discriminated. This parameter will be ignored if the packet has previously been processed by a DescribeStatisticalDesign node.

    • verbose name: Cond Field
    • default value: TargetValue
    • port type: StringPort
    • value type: str (can be None)
  • class_weights
    Per-class weights. Optionally this is a mapping from class label to weight. The weights represent the a priori ("prior") probability of encountering a specific class that the model shall assume. The weights will be renormalized so that they add up to one. Example syntax: {'0': 0.5, '1': 0.5} (note the quotes before the colons).

    • verbose name: Per-Class Weight
    • default value: None
    • port type: Port
    • value type: object (can be None)
  • aggregate
    Whether to enable aggregation. If disabled, the single-trial responses of the underlying model will be passed through.

    • verbose name: Perform Aggregation
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • rule
    The rule used to integrate the single-trial responses; these rules mostly control a tradeoff between statistical efficiency and robustness (to outlier trials). The default is to average the probabilities, but other methods are available that may be more robust to outliers or more sensitive to the tails of the distribution. The 'mean' and 'median' rules are the arithmetic mean and median, respectively. The 'huber', 'trim_mean' and 'winsorize' options are the Huber mean, trimmed mean, and winsorized mean, respectively. These use the outlier cutoff parameter, which may need to be set appropriately (see the sketch after the outlier cutoff entry).

    • verbose name: Rule
    • default value: trim_mean
    • port type: EnumPort
    • value type: str (can be None)
  • outlier_cutoff
    Outlier cutoff when aggregating robustly across trials. When using the trim_mean or winsorize rules, this is the proportion of trials to trim (default 0.1), and when using the huber rule, this is the cutoff point in (robust) standard deviations from the median (also known as the Huber constant); default 1.345.

    • verbose name: Outlier Cutoff
    • default value: None
    • port type: FloatPort
    • value type: float (can be None)
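
    For reference, most of the named rules can be reproduced with scipy over a vector of single-trial probabilities; a minimal sketch (the 'huber' rule, an iteratively reweighted mean with default cutoff 1.345, is omitted for brevity):

      import numpy as np
      from scipy.stats import trim_mean
      from scipy.stats.mstats import winsorize

      p = np.array([0.52, 0.55, 0.57, 0.58, 0.60,
                    0.61, 0.62, 0.63, 0.65, 0.98])           # one outlier trial

      agg_mean = np.mean(p)
      agg_median = np.median(p)
      agg_trim = trim_mean(p, proportiontocut=0.1)           # drop extreme 10% per tail
      agg_winsor = np.mean(winsorize(p, limits=(0.1, 0.1)))  # clamp extreme 10% per tail
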
  • model_is_probabilistic
    Whether the underlying model outputs probabilities. If left at auto, the node will attempt to auto-detect this, and will report an error if detection fails.

    • verbose name: Model Is Probabilistic
    • default value: auto
    • port type: EnumPort
    • value type: str (can be None)
  • max_probability
    Maximum probability value from the underlying model. This is used to clip the probabilities to avoid numerical instability when the model outputs very high probabilities. Only used if the model is probabilistic.

    • verbose name: Max Probability
    • default value: 0.99
    • port type: FloatPort
    • value type: float (can be None)
  • calibrate_probabilities
    Whether to perform probability calibration of the groupwise outputs. If this is not enabled, the generated probabilities will typically be wildly conservative, as if a single-trial prediction was made.

    • verbose name: Calibrate Probabilities
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • num_folds
    Number of cross-validation folds for out-of-sample calibration. Note that this cross-validation generally acts on the groups of trials, not on individual trials; the default is to leave one group out, but if there are many groups, this can be set to an explicit number of folds such as 5 or 10. Cross-validation proceeds by splitting up the data into this many blocks of trials, and then tests the method on each block. For each fold, the method is re-trained on all the other blocks, excluding the test block (therefore, the total running time is proportional to the number of folds). This is not a randomized cross-validation, but a blockwise cross-validation, which is usually the correct choice if the data stem from a time series. If there are few trials in the data, one can use a higher number here (e.g., 10) to ensure that more data is available for training. This can also be set to 0 if no cross-validation is desired, in which case the method and calibration mapping are trained on all the data, or to 1 to use a (perhaps costly) leave-one-out cross-validation.

    • verbose name: Number Of Cross-Validation Folds
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • cv_stratified
    Optionally perform stratified cross-validation. This means that all the folds have the same relative percentage of trials in each class.

    • verbose name: Stratified Cross-Validation
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • refit_bias
    Whether to refit the bias term in the calibration mapping. When this is disabled, only the confidence of the classifier is recalibrated, which is usually what is desired when switching from single trials to groupwise predictions. However, one may additionally rebias the classifier by enabling this option, but note that the number of data points available for this is typically much lower than the number of trials, so the bias is likely to be somewhat less accurately estimated than by the underlying single-trial model. Nevertheless, if the underlying model is biased in out-of-sample predictions, or very close to chance level, then this may have to be enabled, since otherwise the calibration may end up predicting the opposite of the underlying model (this can be spotted by enabling the sanity checks).

    • verbose name: Refit Bias Term
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • calibrator
    Type of calibration classifier to use. 'logreg' is a simple logistic regression, and 'bayes-logreg' is a Bayesian logistic regression. The latter has more sensible priors on the weights and can therefore be more robust, especially when the underlying single-trial predictor is very weak (e.g., close to chance level).

    • verbose name: Calibrator
    • default value: logreg
    • port type: EnumPort
    • value type: str (can be None)
  • sanity_checks
    Check the accuracy of the underlying model on the training data. If the model is worse than chance, a warning will be issued. This is useful to catch bugs in the underlying model.

    • verbose name: Sanity Checks
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • num_procs
    Number of processes to use for parallel computation. If None, the global setting NUM_PROC, which defaults to the number of CPUs on the system, will be used.

    • verbose name: Max Parallel Processes
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • num_threads_per_proc
    Number of threads to use for each process. This can be used to limit the number of threads by each process to mitigate potential churn.

    • verbose name: Threads Per Process
    • default value: 4
    • port type: IntPort
    • value type: int (can be None)
  • compute_backends
    GPU compute backends that may be used by the pipeline. If you include GPU compute backends here, workloads using those backends will be farmed out across multiple GPUs (if available) when running cross-validation folds in parallel. The 'auto' mode will attempt to auto-detect any backend settings in the given pipeline's nodes, but note that this will only catch nodes where this is explicit in the node's properties, and GPU workloads missed in this fashion will run by default on GPU 0.

    • verbose name: Compute Backends
    • default value: ['auto']
    • port type: SubsetPort
    • value type: list (can be None)
  • num_procs_per_gpu
    Number of processes to use per GPU. This is only relevant if you have GPU compute backends enabled. If your GPU(s) are under-utilized during cross-validation, you can increase this to run this many CV folds on each GPU.

    • verbose name: Processes Per Gpu
    • default value: 1
    • port type: IntPort
    • value type: int (can be None)
  • multiprocess_backend
    Backend to use for farming out computation across multiple (CPU) processes. Multiprocessing is the simple Python default, which is not a bad start. Nestable is a version of multiprocessing that allows your pipeline to itself use parallel computation. Loky is a fast and fairly stable backend, but it does not support nested parallelism and has different limitations than multiprocessing. It can be helpful to try either if you are running into an issue trying to run something in parallel. Serial means to not run things in parallel but instead in series (even if num_procs is >1), which can help with debugging. Threading uses Python threads in the same process, but this is not recommended for most use cases due to what is known as GIL contention.

    • verbose name: Multiprocess Backend
    • default value: serial
    • port type: EnumPort
    • value type: str (can be None)
  • serial_if_debugger
    If True, then if the Python debugger is detected, the node will run in serial mode, even if multiprocess_backend is set to something else. This is useful for debugging, since the debugger does not work well with parallel processes. This can be disabled if certain steps should nevertheless run in parallel (e.g., to reach a breakpoint more quickly).

    • verbose name: Serial If Debugger
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • initialize_once
    Calibrate the model only once. If set to False, then this node will recalibrate itself whenever a non-streaming data chunk is received that has both training labels and associated training instances.

    • verbose name: Calibrate Only Once
    • default value: True
    • port type: BoolPort
    • value type: bool (can be None)
  • dont_reset_model
    Do not reset the model when the preceding graph is changed. Normally, when certain parameters of preceding nodes are changed, the model will be reset. If this is enabled, the model will persist, but there is a chance that the model is incompatible when the input data format to this node has changed.

    • verbose name: Do Not Reset Model
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)
  • verbosity
    Verbosity level for diagnostics.

    • verbose name: Verbosity
    • default value: 2
    • port type: IntPort
    • value type: int (can be None)
  • set_breakpoint
    Set a breakpoint on this node. If this is enabled, your debugger (if one is attached) will trigger a breakpoint.

    • verbose name: Set Breakpoint (Debug Only)
    • default value: False
    • port type: BoolPort
    • value type: bool (can be None)