Select NeuroPype Example Pipelines

Neural oscillatory state classification

Neural oscillations indicate various neural activities or states, such as relaxation, emotions, cognitive processing, and motor control. The following pipelines provide some essential tools to harness such neural oscillatory patterns for predicting various kinds of cognitive state.

A basic pipeline used for neural oscillatory state classification is named OscillatoryClassificationExample.pyp and can be found in the examples\ folder:

[Figure: the OscillatoryClassificationExample pipeline]

The main node of this pipeline is the Common Spatial Pattern (CSP) filter, which extracts the components or patterns in the signal that best represent the desired categories or classes. CSP and its various extensions (available through NeuroPype) provide a powerful tool for building applications based on neural oscillations. Some of the available CSP algorithms in NeuroPype are: CSP and Filterbank CSP, which are mostly used for classification tasks (discrete classes of cognitive state, such as alert versus sleepy), and Source Power Comodulation (SPoC) and Filterbank SPoC, which are used for regression tasks (continuous-valued cognitive state).

This pipeline can be divided into 4 main parts, which we discuss in the following:

Data Acquisition

Includes: Import Data, LSL Input and Inject Calibration Data nodes

In general, data is communicated in and out of the pipeline through the LSL Input and LSL Output nodes. This is mostly the case when you are running a pipeline online on streamed data. However, for testing and development purposes you may want to run the process offline using a prerecorded file. The Import Data nodes (here titled Import Test Data and Import Calibration Data) are used to connect the pipeline to a file.

As discussed in the getting started notes, the recorded data can also be streamed and sent into the pipeline via a combination of LSL Output and LSL Input nodes. This can be achieved by the following combination of nodes:

[Figure: Import Data feeding an LSL Output node, paired with an LSL Input node]

If you are sending markers, make sure to check the send marker option in the LSL Output node.

[Figure: LSL Output node settings showing the send marker option]

The Inject Calibration Data node is used to pass the initial calibration data into the pipeline before the actual data is processed. The calibration data, Calib Data, is used by adaptive and machine learning algorithms to train and set their parameters initially. The main data is connected to the Streaming Data port.

In case you would like to train and test your pipeline using a file (without using a streaming node), you need to set the Delay streaming packets option in the Inject Calibration Data node. This makes the node buffer the test data that is pushed into it for one cycle and transfer it to its output port in the next cycle. Note that the first cycle is used to push the calibration data through the pipeline.

[Figure: Inject Calibration Data node settings showing the Delay streaming packets option]

Data Preprocessing

Includes: Assign Targets, Select Range, FIR filter and Segmentation nodes

The Assign Targets node is mostly useful for supervised learning algorithms, where target values are assigned to specific markers present in the EEG signal. In order for this node to operate correctly you need to know the labels of the markers in the data.

The Select Range node is used to select certain parts of the data stream. For example, if your headset contains certain bad channels, you can manually remove them here. That is the case in this example: the data was recorded with a Cognionics headset whose last 5 channels are not used, so they are removed.

The FIR Filter node is used to remove unwanted signal components outside of the EEG frequencies of interest, e.g. to keep the 8-28 Hz frequency window.

The Segmentation node performs the epoching process, where the streamed data is divided into segments of a pre-defined window length around the markers in the EEG data.

In the Segmentation node, epoching can be done either relative to markers or relative to a time window. When processing a large data buffer, you should set the epoching relative to markers; when processing streaming data, you should set it to sliding, which selects a single window at the end of the data.
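
For intuition, here is a minimal NumPy sketch of marker-relative epoching outside of NeuroPype; the array layout, sampling rate, and segment limits are illustrative assumptions, not the node's internals.

```python
# Minimal sketch of marker-relative epoching (illustrative only).
import numpy as np

def epoch_around_markers(eeg, marker_samples, srate, tmin=-0.5, tmax=1.5):
    # eeg: (channels x samples), marker_samples: marker positions in samples
    lo, hi = int(tmin * srate), int(tmax * srate)
    epochs = []
    for m in marker_samples:
        if m + lo >= 0 and m + hi <= eeg.shape[1]:
            epochs.append(eeg[:, m + lo:m + hi])
    return np.stack(epochs)  # (trials x channels x samples)
```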

Feature Extraction

Includes: Filter Bank Common Spatial Patterns (FBCSP) node

As discussed above, the spectral and spatial patterns in the data can be extracted by the CSP filter and its extensions. In the FBCSP method, multiple frequency bands can be defined, and a desired number of filters is designed for each frequency band. These filters are then applied to the data to extract the features corresponding to the modeled patterns. You can define the frequency bands of interest for this node. You can also choose different windows for the frequency calculation in order to avoid boundary effects.
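
As a rough illustration of the core idea (not the FBCSP node's implementation), a basic two-class CSP filter set can be computed from per-class covariance matrices via a generalized eigendecomposition; in FBCSP this would be repeated for each frequency band. Shapes and the normalization are illustrative assumptions.

```python
# Hedged sketch of core two-class CSP (simplified stand-in for the node).
import numpy as np
from scipy.linalg import eigh

def csp_filters(epochs_a, epochs_b, n_filters=3):
    # epochs_*: (trials x channels x samples) for each class
    cov = lambda e: np.mean([x @ x.T / np.trace(x @ x.T) for x in e], axis=0)
    Ca, Cb = cov(epochs_a), cov(epochs_b)
    evals, evecs = eigh(Ca, Ca + Cb)             # generalized eigendecomposition
    order = np.argsort(evals)
    picks = np.r_[order[:n_filters], order[-n_filters:]]  # extremes discriminate best
    return evecs[:, picks]                       # (channels x 2*n_filters) spatial filters
```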

Classification

Includes: Variance, Logarithm, Logistic Regression and Measure Loss

The Logistic Regression node is used to perform the classification, where a supervised learning method is used to train the classifier. In this node you can also choose the type of regularization, the value of the regularization coefficient, and the number of folds for cross-validation.

The Measure Loss node is used to measure various performance criteria. For example, you can use the misclassification rate (MCR) or the area under the curve (AUC).
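
The following scikit-learn snippet sketches the same classify-and-score idea on synthetic log-variance features; it is only an analogy for the Logistic Regression and Measure Loss nodes, and the data and parameter values are made up.

```python
# Illustrative scikit-learn analogue of regularized logistic regression + MCR/AUC scoring.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))                      # e.g. log-variance CSP features (trials x features)
y = (X[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)

clf = LogisticRegressionCV(Cs=10, cv=5, penalty='l2').fit(X[:80], y[:80])  # 5-fold CV over regularization
pred = clf.predict(X[80:])
print('MCR:', 1 - accuracy_score(y[80:], pred))
print('AUC:', roc_auc_score(y[80:], clf.predict_proba(X[80:])[:, 1]))
```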

Neural oscillatory state regression

A simple pipeline used for neural oscillatory state regression is named OscillatoryRegressionExample.pyp and can be found in the examples\ folder:

[Figure: the OscillatoryRegressionExample pipeline]

The main node of this pipeline is the Source Power Comodulation (SPoC) filter, which is used to retrieve the components or patterns in the EEG data. The methods of this kind available in NeuroPype are Source Power Comodulation (SPoC) and Filterbank SPoC, which are targeted at regression tasks.

This pipeline can be divided into 4 main parts, which we discuss in the following:

Data Acquisition

Includes: Import Data, LSL Input and Inject Calibration Data nodes

In general, data is communicated in and out of the pipeline through the LSL Input and LSL Output nodes. This is mostly the case when you are running a pipeline online on streamed data. However, for testing and development purposes you may want to run the process offline using a prerecorded file. The Import Data nodes (here titled Import Test Data and Import Calibration Data) are used to connect the pipeline to a file.

As discussed in the getting started notes, the recorded data can also be streamed and sent into the pipeline via a combination of LSL Output and LSL Input nodes. This can be achieved by the following combination of nodes:

[Figure: Import Data feeding an LSL Output node, paired with an LSL Input node]

If you are sending markers, make sure to check the send marker option in the LSL Output node.

[Figure: LSL Output node settings showing the send marker option]

The Inject Calibration Data node is used to pass the initial calibration data into the pipeline before the actual data is processed. The calibration data, Calib Data, is used by adaptive and machine learning algorithms to train and set their parameters initially. The main data is connected to the Streaming Data port.

In case you would like to train and test your pipeline using files (without using a streaming node), you need to set the Delay streaming packets option in the Inject Calibration Data node. This makes the node buffer the test data that is pushed into it for one cycle and transfer it to its output port in the next cycle. Note that the first cycle is used to push the calibration data through the pipeline.

[Figure: Inject Calibration Data node settings showing the Delay streaming packets option]

Data Preprocessing

Includes: Assign Targets, Select Range, FIR filter and Segmentation nodes

The Assign Targets node is mostly useful for supervised learning algorithms, where target values are assigned to specific markers present in the EEG signal. For this pipeline, where valence values are estimated using ridge regression, the target values are the valence values associated with each emotion. In order for this node to operate correctly you need to know the labels of the markers in the data.

The Select Range node is used to select certain parts of the data stream. For example, if your headset contains certain bad channels, you can manually remove them here.

The FIR Filter node is used to remove unwanted signal components outside of the EEG frequencies of interest, e.g. to keep the 8-28 Hz frequency window.

The Segmentation node performs the epoching process, where the streamed data is divided into segments of a pre-defined window length around the markers in the EEG data.

In the Segmentation node, epoching can be done either relative to markers or relative to a time window. When processing a large data buffer, you should set the epoching relative to markers; when processing streaming data, you should set it to sliding, which selects a single window at the end of the data.

Feature Extraction

Includes: Filter Bank Source Power Comodulation (FBSPoC) node

The Filter Bank Source Power Comodulation (FBSPoC) node is used to decompose the EEG data into a set of source components; this decomposition is guided by the information in the target variable.
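
For intuition, the core SPoC computation (as described by Dähne et al., 2014) can be sketched as a generalized eigendecomposition of a target-weighted covariance matrix. This is a simplified stand-in for the node, with illustrative shapes and without the filter bank.

```python
# Hedged sketch of core SPoC: spatial filters whose band-power comodulates with the target.
import numpy as np
from scipy.linalg import eigh

def spoc_filters(epochs, z, n_filters=2):
    # epochs: (trials x channels x samples), z: continuous target value per trial
    z = (z - z.mean()) / z.std()                       # z-score the target
    covs = np.stack([x @ x.T / x.shape[1] for x in epochs])
    C, Cz = covs.mean(axis=0), np.mean(z[:, None, None] * covs, axis=0)
    evals, evecs = eigh(Cz, C)                         # generalized eigendecomposition
    order = np.argsort(evals)[::-1]                    # strongest positive comodulation first
    return evecs[:, order[:n_filters]]                 # (channels x n_filters) spatial filters
```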

Regression

Includes: Ridge Regression and Measure Loss

The Ridge Regression node is used to solve a regression model whose loss function is the linear least-squares function and whose regularization is given by the l2-norm. The regularization strength is controlled by the alpha values, which are the regularization weights.
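
An analogous fit in scikit-learn looks as follows; the data, the alpha grid, and the feature naming are illustrative and are not the node's defaults.

```python
# Illustrative scikit-learn analogue of l2-regularized least-squares regression.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                          # e.g. log band-power features (trials x features)
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=200)

model = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)     # cross-validated choice of alpha
print('chosen alpha:', model.alpha_)
```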

The Measure Loss node is used to measure various performance criteria. For example, you can use the mean squared error (MSE) or the mean absolute error (MAE).

Event related potential (ERP) classification

Event related potential (ERP) experiments measure the brain's response to a stimulus or an event. Since the brain's response to a single stimulus is usually not clearly visible in a single EEG trial, multiple trials have to be performed and the signals averaged in order to increase the signal-to-noise ratio.

A basic pipeline used for ERP classification is named EventRelatedPotentialExample.pyp and can be found in the examples\ folder; it is shown below:

[Figure: the EventRelatedPotentialExample pipeline]

This pipeline is designed to predict the brain's response in the presence of a certain stimulus. It can be divided into 4 main parts, which we discuss in the following:

Data Acquisition

Includes: Import Data, LSL Input and Inject Calibration Data nodes

In general, data is communicated in and out of the pipeline through the LSL Input and LSL Output nodes. This is mostly the case when you are running a pipeline online on streamed data. However, for testing and development purposes you may want to run the process offline using a prerecorded file. The Import Data nodes (here titled Import Test Data and Import Calibration Data) are used to connect the pipeline to a file.

As discussed in getting started notes, the recorded data can also be streamed and sent via a combination of LSL Output and LSL Input nodes into the pipeline. This can be achieved by the following combination of nodes:

[Figure: Import Data feeding an LSL Output node, paired with an LSL Input node]

If you are sending markers, make sure to check the send marker option in the LSL Output node.

[Figure: LSL Output node settings showing the send marker option]

The Inject Calibration Data node is used to pass the initial calibration data into the pipeline before the actual data is processed. The calibration data, Calib Data, is used by adaptive and machine learning algorithms to train and set their parameters initially. The main data is connected to the Streaming Data port.

In case you would like to train and test your pipeline using files (without using a streaming node), you need to set the Delay streaming packets option in the Inject Calibration Data node. This makes the node buffer the test data that is pushed into it for one cycle and transfer it to its output port in the next cycle. Note that the first cycle is used to push the calibration data through the pipeline.

[Figure: Inject Calibration Data node settings showing the Delay streaming packets option]

Data Preprocessing

Includes: Assign Targets, Select Range and Segmentation nodes

The Assign Targets node is mostly useful for supervised learning algorithms, where target values are assigned to specific markers present in the EEG signal. In order for this node to operate correctly you need to know the labels of the markers in the data.

The Select Range node is used to select certain parts of the data stream. For example, if your headset contains certain bad channels, you can manually remove them here. In this case we decrease the complexity of the system by choosing the first 19 channels of data from a high-density EEG recording.

The Segmentation node performs the epoching process, where the streamed data is divided into segments of a pre-defined window length around the markers in the EEG data.

In the Segmentation node, epoching can be done either relative to markers or relative to a time window. When processing a large data buffer, you should set the epoching relative to markers; when processing streaming data, you should set it to sliding, which selects a single window at the end of the data.

Feature Extraction

The FIR Filter and Decimate nodes are used together to downsample the EEG data and thereby reduce the dimensionality of the signal. Note that in order to avoid introducing aliasing into your signal, the bandwidth of the lowpass FIR filter should be chosen based on the initial sampling rate of the signal and the decimation rate. For example, if the initial sampling rate is 10 kHz and you decimate by 5, the signal must first be filtered to less than 1 kHz.

The Decimate node only accepts integer decimation factors.
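
A minimal SciPy sketch of the combined anti-alias-and-downsample step, using a toy signal and the rates from the example above; scipy.signal.decimate applies its own anti-aliasing filter internally, so it stands in for both nodes here.

```python
# Illustrative anti-aliased downsampling with SciPy (not the NeuroPype nodes themselves).
import numpy as np
from scipy import signal

srate = 10_000                                   # original sampling rate (Hz)
t = np.arange(0, 1, 1 / srate)
x = np.sin(2 * np.pi * 7 * t)                    # toy 7 Hz signal

x_ds = signal.decimate(x, q=5, ftype='fir')      # integer factor only; new rate = 2 kHz
print(x_ds.shape)                                # 5x fewer samples
```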

The Spectral Selection node is used to implement spectral filtering of the data. In this case, it is used to isolate slow cortical potentials for ERP from higher-frequency signal content.

Component Analysis

Includes: Hierarchical Discriminant Component Analysis and Measure Loss

The Hierarchical Discriminant Component Analysis (HDCA) node is a two-stage hierarchical classifier based on a group of logistic regression classifiers. The basic idea is that the features are partitioned into blocks of a given size and a classifier is trained for each block, followed by a classifier that acts on the linearly mapped outputs of the per-block classifiers. You can set a regularization coefficient for each of the two classifier stages.
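
To make the two-stage idea concrete, here is a hedged scikit-learn sketch of a block-wise classifier followed by a second-stage classifier on the per-block scores. It illustrates the principle only and is not the node's exact algorithm; the block definition and regularization are simplified assumptions.

```python
# Hedged sketch of a two-stage, block-wise classifier (HDCA-like idea, simplified).
import numpy as np
from sklearn.linear_model import LogisticRegression

def hdca_like_fit_predict(X_train, y_train, X_test, block_size):
    # X: (trials x features); features are split into consecutive blocks of block_size
    blocks = [slice(i, i + block_size) for i in range(0, X_train.shape[1], block_size)]
    stage1 = [LogisticRegression().fit(X_train[:, b], y_train) for b in blocks]
    score = lambda X: np.column_stack([c.decision_function(X[:, b])
                                       for c, b in zip(stage1, blocks)])
    stage2 = LogisticRegression().fit(score(X_train), y_train)   # acts on per-block scores
    return stage2.predict(score(X_test))
```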

The Measure Loss node is used to measure various performance criteria. For example, you can use the misclassification rate (MCR).

Source Localization (sLORETA)

An example pipeline for source localization using the sLORETA method is named SourceEstimation sLORETA.pyp and can be found in the examples\ folder. The purpose of this pipeline is to estimate neural activity on the brain surface from EEG sensor readings. This estimate is also called the Current Source Density or CSD. Different methods can be used for this, and sLORETA is one option (another similar one would be eLORETA).

The brain activity can then be used to drive, for instance, visualizations, as shown in this pipeline, or alternatively one can estimate the activity in one or more regions of interest (ROIs), and then perform further processing of those ROI time series, for instance to drive neurofeedback or brain-computer interfaces.

The following figure shows the main pathway of the pipeline:

[Figure: main pathway of the sLORETA source localization pipeline]

The operation of this pipeline can be divided into the following stages, which will be discussed in the following: data acquisition, pre-processing, source estimation, source processing, and output.

Data Acquisition

This pipeline receives EEG data from the lab streaming layer using the LSL Input node, which is here fed by the three playback nodes at the bottom of the pipeline. To use an actual live EEG system, you can remove these three nodes and edit the LSL Input node's Query parameter to the string type='EEG' instead. See also the documentation of that node for further information.

Preprocessing

What follows is the essential pre-processing of the EEG data to make it fit for use in source estimation. First, we need to remove any channels that are not EEG, since these would otherwise throw off the rest of the pipeline. This can be accomplished using the Select Range node. Double-click it to check the current settings, which will open a dialog like the following.

[Figure: Select Range node settings dialog]

Here we remove the last 5 channels of the data, which in this case happen to be garbage channels, such as accelerometer, trigger, and so on.

Since we need to localize signals based on the activity in the EEG channels, we need to know the 3d locations of the channels. The Assign Channel Locations node will look up the default locations for the EEG channel labels. If your labels are in the international 10-20 system, this will work out of the box. If you have custom labels, you may need a custom lookup table for your headset and label system; these tables can be found in the NeuroPype Enterprise\resources\montages folder. If your EEG data stream is already annotated with 3d location meta-data because the LSL driver took care of it, you are lucky and this node will do nothing.

Next, we remove slow drifts and electromagnetic noise using a FIR filter. Following this filter you could also insert various artifact removal nodes. We also decimate the signal to a lower sampling rate here, to speed up processing of the source-space time series, which can be quite computationally demanding due to the large number of sources involved. Have a look at the documentation of the Decimate node -- it will tell you that it needs to be preceded by a FIR lowpass or bandpass filter that removes high frequencies that are no longer representable after decimation; this is called anti-aliasing and is the other reason for the FIR filter.

The last, and one of the most important, pre-processing steps is re-referencing. This is required here because the source localization method assumes that the EEG has been referenced in a certain way (specifically, common-average referenced, which means that the average of all channels has been subtracted from each channel). If the data had been referenced in any other way, the source localization would be off and give incorrect results. Note that it does not matter how your EEG was originally referenced coming out of the amplifier, as long as you then re-reference it the way you need, as done here.
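
Common-average referencing itself is a one-line operation; the following NumPy sketch (with a placeholder channels-by-samples array) shows what it amounts to.

```python
# Common-average re-referencing in plain NumPy (illustrative placeholder data).
import numpy as np

eeg = np.random.randn(64, 1000)                  # channels x samples (placeholder)
eeg_car = eeg - eeg.mean(axis=0, keepdims=True)  # subtract the average of all channels
```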

Source Estimation

Source estimation here is done by a single node, the sLORETA node. This node has one main option, namely the head model to use. The NeuroPype Suite ships with a standard head model based on the Colin27 brain, which is set as the model to use in this pipeline. There is also a tunable parameter that can be used to adjust the smoothness of the source reconstruction. Further, there is an experimental feature that allows you to adapt the head model to channel labels and locations that are not in the 10-20 system, using the interpolate checkbox. If this applies to you, be sure to read the parameter's documentation before you use it.

Source Processing

The next few steps show some basic exemplary source-level processing -- these nodes operate on a time series that has one channel for each source location in the brain, which, in the case of this head model, happens to be the vertices of a mesh that follows the brain surface (totaling over 4000 vertices). The source processing done here simply calculates the moving-window variance in a particular frequency band of interest, and is shown below:

[Figure: source processing nodes (IIR bandpass, squaring, moving average, square root)]

In essence, we apply an IIR filter to select a frequency band of interest (here the 8-12 Hz 'alpha' band, which is known to capture a type of brain idle activity). We then square the signal (i.e., each value in the time series) and take a moving average (also known as a running mean), which gives us the variance of the signal in a sliding window (normally the variance would require subtracting the mean, but here we know that it is approximately zero, since any slow drifts have been suppressed by the bandpass filter). Finally, we take the square root, which gives us the standard deviation, or root-mean-square (RMS) signal, for each source location.
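
The same chain can be sketched in a few lines of SciPy/NumPy on a placeholder sources-by-samples array; the filter order, window length, sampling rate, and array shapes are illustrative assumptions rather than the pipeline's settings.

```python
# Illustrative band-power (RMS) chain: IIR bandpass -> square -> moving average -> sqrt.
import numpy as np
from scipy import signal

srate = 125
x = np.random.randn(100, 10 * srate)                   # sources x samples (placeholder)
b, a = signal.butter(4, [8, 12], btype='bandpass', fs=srate)
alpha = signal.lfilter(b, a, x, axis=1)                # 8-12 Hz 'alpha' band
power = signal.lfilter(np.ones(srate) / srate, [1.0], alpha ** 2, axis=1)  # 1 s moving mean
rms = np.sqrt(power)                                   # per-source RMS amplitude over time
```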

This is merely an example of the many things that could be calculated on the source-space representation. Some other computations would be to calculate the average activity in selected regions of interest, which results in a far lower-dimensional (and thus easier to work with) signal, and then to compute various metrics of these ROI time courses. Some applications of this are found in the ROIBandPowerEstimation, ROISpectralPowerEstimation, and ConnectivityEstimation example pipelines.

Output

Finally, we output the resulting signal. Here, we both visualize it on a 3d plot of the brain surface using the CortexActivity node, and we also send it out over LSL so that it can be picked up in another application for further use. The CortexActivity node has a crucial parameter that sets the color scale -- if you change the type of EEG system, you will very likely have to adapt this number to bring the colors into a reasonable range.

Source Localization (Beamformer)

This pipeline is very similar to the pipeline documented in the previous section, Source Localization (sLORETA). However, this pipeline uses linearly constrained minimum variance (LCMV) beamforming for source estimation, rather than the sLORETA algorithm. The beamformer makes different assumptions than sLORETA and can produce a more accurate signal reconstruction for source vertices of interest, especially in the presence of various noise sources, but it may yield a less focal reconstruction of single-source activity under well-behaved conditions than what sLORETA or eLORETA can provide. We focus our discussion here only on the LCMV-related nodes, and refer the reader to the sLORETA pipeline walkthrough to learn about the other stages of the data flow, including data acquisition, pre-processing, source processing, and output.

The overall pipeline data flow is depicted in the following:

[Figure: overall data flow of the beamforming pipeline]

The pathway in the uppermost row of nodes constitutes data acquisition and pre-processing, followed by source estimation, processing of source-space signals, and finally output in the lower two rows. The nodes relevant to source estimation are shown in the following cut-out:

[Figure: nodes involved in LCMV beamforming source estimation]

First, we perform common-average re-referencing, which is crucial when using the head models included with NeuroPype, since all of these assume that the data has been common-average referenced. The resulting EEG time series is then fed into the Data input port of the LCMV Beamforming node. The Beamforming node is somewhat atypical in that it has two input ports, called "Data" and "Covariance". The second input port, Covariance, accepts an estimate of the noise covariance that is used in the beamforming algorithm. This covariance matrix typically changes over time and is estimated from a few-second window of data, which is here accomplished using the Per-Element Covariance and Exponential Moving Average nodes. There are many alternative ways to estimate this covariance matrix, for instance using the Moving Average node instead of Exponential Moving Average, or by using special robust or regularized covariance estimators. When processing segmented (or epoched) EEG data, one can also estimate the covariance from pre-stimulus EEG; here, however, a sliding window is being used -- be sure to read the documentation of the moving-average node to understand what time range is effectively being used to estimate the covariance.
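
To illustrate the covariance pathway, here is a hedged NumPy sketch of a per-sample covariance estimate smoothed with an exponential moving average; the smoothing factor and array layout are assumptions, and the actual Per-Element Covariance and Exponential Moving Average nodes may differ in detail.

```python
# Sketch of an exponentially smoothed covariance estimate (illustrative only).
import numpy as np

def ema_covariance(eeg, alpha=0.01):
    # eeg: (channels x samples), assumed roughly zero-mean (e.g. after bandpass filtering)
    C = np.zeros((eeg.shape[0], eeg.shape[0]))
    for x in eeg.T:                                  # one sample vector at a time
        C = (1 - alpha) * C + alpha * np.outer(x, x)
    return C                                         # smoothed channels x channels covariance
```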

The LCMV beamforming node has a number of parameters, which we briefly discuss in the following (for more information, be sure to read the node's documentation and parameter tooltips):

[Figure: LCMV Beamforming node parameters]

The first parameter, the head model file, specifies the head model to use. NeuroPype typically ships with a default head model based on the Colin27 brain, which is what is being used here. This file will work for any EEG data that has channels in the 10-20 labeling system. If you have other channel labels, you will need the locations for your channels and then either remap the channel labels to the closest 10-20 labels, or you can try to use the experimental interpolate option (see parameter documentation). The regularization strength is primarily used to prevent the covariance matrix from becoming degenerate, which can happen if the time window is too short, there are too many channels, or some EEG sensors were electrically shorted. For these purposes, the default should be fine. We recommend leaving the Rescale activations setting at its default unless you need to squeeze out the last percent of computation speed from the node, and leaving the Normalize leadfield checkbox unchecked unless you are dealing with a head model that has scaling issues (this can happen with some beta-quality head models, but does not apply to the default model).

The output of this node is a time series that has one channel for each brain source location in the head model, which, in the case of the NeuroPype default head model, corresponds to the vertices of a mesh that conforms to the brain surface, resulting in ca. 4500 channels in total.

Connectivity Estimation

The connectivity estimation pipeline example demonstrates how NeuroPype can be used to extract brain source activity from multiple brain regions, and how these activity estimates can be used to estimate directed information flow between brain regions using Granger-causal approaches in real time.

The main pathway of the pipeline is shown in the following figure:

[Figure: main pathway of the connectivity estimation pipeline]

The processing done by this pipeline can be divided into multiple stages, including EEG data acquisition, pre-processing, source activity estimation, connectivity modeling, post-processing, and output.

Data Acquisition

For data acquisition we use the lab streaming layer (LSL) like in the other example pipelines. The input node is set up to read from a particular named data stream, which happens to be the one that is provided by the three nodes at the bottom of the pipeline, which play back a recording of EEG data over LSL, shown below:

[Figure: playback nodes feeding the LSL Input node]

To switch the input over to a live EEG device, we recommend removing the three playback nodes and changing the LSL Input node's Query parameter from a name-based query to one that reads from any EEG stream, using the string type='EEG'. Alternatively, you can keep using a name-based query, but with the name of the actual live EEG LSL stream (these names can be looked up in the list of current LSL streams, which is opened by the menu item Help/LSL Streams). You can always go back to playback by running the StreamPlaybackData example pipeline separately (ideally using a recording from your own EEG hardware).

Preprocessing

The pre-processing pathway, shown below, is similar to that of other source localization pipelines, and for completeness we repeat it here.

[Figure: pre-processing pathway]

First, we retain the subset of data channels that contain EEG, since non-EEG channels, such as trigger, accelerometer, ExG, etc., would throw off the source localization stage. The Select Channels node is currently set up to remove the last 5 channels of the data, which here happen to be such miscellaneous channels. Note the tooltip for this parameter, which explains the various syntax options in more detail.

Next, we need to associate each channel with a location on the scalp, since otherwise we would have no frame of reference to estimate brain source activity from. If your channels are in the international 10-20 system (or the finer-grained 10-5 system), this step will work out of the box, but if you have other kinds of montages, you may need a custom montage file. See documentation of this node for more information on this.

In the next step, we remove unwanted frequency components from the EEG using a 1-45 Hz bandpass filter. This filter will remove both slow drifts and high-frequency activity such as power line noise from the data. The specific frequency response of this filter could be tuned somewhat to better match a specific application.

The next node in this setup performs a resampling, which here is set to reduce the sampling rate by half. As explained further in the documentation of this node, the node requires that the input signal contain no frequency content above 1/2 of the new sampling rate (the Nyquist frequency), since otherwise spectral aliases would appear -- this is the second function of the FIR filter, which happens to remove these high frequencies together with other unwanted frequencies. The main purpose of this resampling is to increase the processing speed of the pipeline so that it runs well on slower machines. If you have plenty of CPU power or speed is a lesser concern for you, you can drop the decimation without a second thought (though if your EEG system provides very high sampling rates, such as 2 kHz or more, you should probably keep the decimation in).

Lastly, we change the reference of the data using the re-referencing node. Specifically, we change to a common average reference, that is, we remove the average of all channels from each channel. This step is crucial when using source localization features of NeuroPype, since NeuroPype's head models assume that the data has been common-average referenced.

Source Estimation

The next steps compute the source activity across the brain and then compute average activity in specific select brain regions, shown below:

[Figure: source estimation and ROI averaging nodes]

In the first step, we calculate the activity for each source location in the brain, which is, when using the NeuroPype default head model, the set of vertices of a mesh that follows the contours of the brain (based on the Colin27 brain). It is possible to use one of several alternative nodes here, such as sLORETA, eLORETA, or LCMV Beamforming. For this example, we use sLORETA. This node has one main option, namely the head model to use. The NeuroPype Suite ships with a standard head model. There is also a tunable parameter that can be used to adjust the smoothness of the source reconstruction. Further, there is an experimental feature that allows you to adapt the head model to channel labels and locations that are not in the 10-20 system, using the interpolate checkbox. If this applies to you, be sure to read the parameter's documentation before you use it. The output of this node is a time series with one channel for each vertex (of which there are over 4000).

Next, we want to compute the average activity of a few entire brain regions rather than thousands of small patches of cortex, among other reasons to reduce dimensionality and make it more tractable to compute information flow between these regions. To this end, we first compute the average activity in pre-defined regions of interest (ROIs). These regions form a parcellation of the brain surface into specific named anatomical regions, and they are pre-defined by what is called a brain atlas. A head model file may contain one or more atlases; in this case it contains one (the Desikan-Killiany atlas with 467 regions). Unless we want to use a non-standard atlas or only a subset of the regions, we can use the default settings of the ROIActivations node, which will output a time series with one channel for each region. To learn where each of the regions is located, have a look at the companion article for the D-K atlas.

Last, we want to compute the average activity time course for a small subset of regions, which also happen to be larger than the parcels defined in the atlas. The goal, then, is to compute a time series with fewer channels, each of which holds the average activity of a specific group of atlas regions, that is, the average of a specific subset of input channels. If we had only one or a handful of target regions, we could use the Select ROIs node to make a selection and then the Mean node to average the result into a channel, and repeat this process for each of our target ROIs -- however, since we have quite a few target regions, the amount of repetitive wiring in the pipeline would be rather cumbersome, so we use a special-purpose node called Merge ROIs, which can merge multiple sets of ROIs down into a new set of target ROIs. The description of what shall be merged, and what the result should be called, is written as a textual description using a bracket syntax that should be familiar to Python users -- you can find an explanation and some examples in the node's documentation. When editing this, it helps to copy-paste the content of the parameter edit field into a text editor and possibly line-break the text to get a good grasp of what is merged into what, and to carefully review that all desired regions are included. The list of regions that are available for use can also be found in a file named the same as the head model but ending in -atlas-labels.txt, under NeuroPype Enterprise/resources/headmodels.

Connectivity Modeling

The connectivity modeling is here done in two stages, using the below two NeuroPype nodes:

[Figure: connectivity modeling nodes]

The approach used here first estimates a (time-varying) multivariate autoregressive (MVAR) model that describes the dynamics of the system (the selected set of regions), and then estimates directed information flow between the involved regions from this MVAR model. The Group Lasso MVAR node is a powerful approach to MVAR modeling, which is also fast enough for real-time use as long as there are not too many regions or excessively high sampling rates or model orders involved. The main assumption used by this node to make the inverse modeling problem statistically tractable is that, at a given time, likely not every region strongly drives the activity in every other region, but that the connectivity is "sparse", that is, only a subset of connections between regions are active. The node is quite robust to a variety of data, but when applying the setup to a different set of regions or EEG system it is recommended to make sure that the parameters are reasonably well chosen. One simple way to do this is to review the output and make sure that the estimated connectivity is not pathological, that is, either no connectivity at all (too sparse) or everything connected with everything else (non-sparse solution). The main parameter to adjust to ensure that the node operates in a healthy regime is the degree of sparsity (also known as regularization strength). To a lesser extent there are several parameters that determine the complexity of the model (model order), the amount of data available for estimation (window length), parameters that guide the fitting procedure (rho, alpha), and parameters that serve as the stopping criterion (max iterations, tolerance). These parameters can be adjusted to ensure that the algorithm runs in a healthy regime -- see the documentation for each parameter. These settings also trade off the running time against the accuracy of the resulting solution.

Once we have the MVAR model, the next step is straightforward. Essentially, we can use any of a number of alternative dynamical measures (see the nodes in the connectivity category) to describe the system dynamics, including, for instance, the partial directed coherence or the direct directed transfer function. One of the highest-quality measures is the standardized directed directed transfer function (sdDTF), which is the one used in this pipeline following the MVAR modeling. The output of this node is, for every time step and every frequency, a square matrix that describes the degree of information flow from a given source (here: region) to every other source. When NeuroPype processes a new chunk of EEG, the output of this node is a 4-way tensor, indexed by (time, frequency, region, region), which describes this interaction as it changes over time.

Postprocessing and Output

The connectivity solution is quite a rich representation and can be used for a variety of purposes. Here, we prepare the results for visualization by smoothing over time (yielding a more slowly changing estimate) and by averaging over all frequencies of interest. We are then ready to plot the connectivity using the Connectivity Plot node, and also to send the data out over LSL. Since LSL only supports a flat list of channels, we also need to vectorize the regions-by-regions matrix before we send it off.

Offline Gaze event detection and saccade-locked EEG ERPs

[Figure: the offline gaze event detection and saccade-locked ERP pipeline]

Before getting started, make sure that you are using Pipeline Designer with "Show expert parameters" checked (under File->Settings).

Data Import

Here we are importing data from an XDF file comprising several streams, including an 'eeg'-type stream and a 'gaze'-type stream.

Gaze Data Processing

We first extract the gaze data with the ExtractStreams node with the stream_names parameter set to ['gaze'].

Most eye trackers provide enough data so one could reconstruct the eye position and orientation as well as its gaze direction, and they usually also provide one or more channels describing the eye-tracker's confidence in the reported data. Here we are only interested in the position of the gaze on the screen and the confidence associated with those positions, so we use the SelectRange node to select only the channels of interest. We are selecting the X and Y positions for both the Left and Right eyes, as well as their Validity (Confidence), for a total of 6 channels. Parameters: axis='space', selection=['LeftEye_GazePoint_PositionOnDisplayArea_X', 'LeftEye_GazePoint_PositionOnDisplayArea_Y', 'RightEye_GazePoint_PositionOnDisplayArea_X', 'RightEye_GazePoint_PositionOnDisplayArea_Y', 'LeftEye_GazePoint_Validity', 'RightEye_GazePoint_Validity'], unit='names'.

We next use the SanitizeGazeData node which uses the Validity measures and the binocular data to reconstruct invalid samples using cross-interpolation. Parameters: confidence_channel='Validity', handle_bad='cross-interpolate', interp_identifiers=['Left', 'Right'].

After the binocular data has been used to reconstruct invalid samples, we next use the SelectRange node once again to keep only one eye (typically the dominant eye). One could also average the two eyes to get a uni-eye position. It is also OK to keep X and Y channels for both eyes in the data, but then the gaze event detection node will treat the gaze position as 4-dimensional which is unintuitive and is more computationally intensive without much benefit. Parameters: axis='space', selection=['RightEye_GazePoint_PositionOnDisplayArea_X', 'RightEye_GazePoint_PositionOnDisplayArea_Y'], unit='names'.

Some gaze-event detection algorithms may expect gaze data to be in specific units. Gaze data can be converted between units using the GazeScreenCoordinateConversion node, but only if the required information is in the packet, and that information can be first set with the SetGazeMetaData node. One can supply the SetGazeMetaData node with the distance between the user and the screen, and with other physical characteristics of the screen. If the latter information isn't provided then the dimensions may be queried from the currently connected monitor. It turns out that the gaze-event detection method we're using in this example is not dependent on the units because it normalizes the position data, so we do not need to supply meta data or convert coordinates here.

The gaze-event detection method we're using does, however, require velocity and acceleration. We next calculate gaze velocity and acceleration from its position using two parallel instances of the FiniteDifferences node, the first with order=1 (default) and the second with order=2. Make sure to check the keep shape box for both nodes. The velocity and acceleration channels must be renamed using the RenameChannels node and parameters search_expr='(Position)' for both and either replace_expr='Velocity' or replace_expr='Acceleration'.
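
In plain NumPy, the velocity and acceleration channels correspond to something like the following; the gaze sampling rate and array shapes are illustrative placeholders, not the node's actual settings.

```python
# Illustrative first- and second-order finite differences of gaze position.
import numpy as np

srate = 120.0                                    # gaze sampling rate (placeholder)
pos = np.random.rand(2, 600)                     # X/Y gaze position (channels x samples)
vel = np.gradient(pos, 1 / srate, axis=1)        # first derivative -> velocity
acc = np.gradient(vel, 1 / srate, axis=1)        # second derivative -> acceleration
```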

The position, velocity, and acceleration are then concatenated together into a single data packet with the Concatenate Tensors node, along the space axis, and with Append new axis unchecked.

We are now ready to detect gaze events with the ExtractGazeEvents node. As we are not interested in fixations, we set fixation_algorithm='none' and merge_fixation_distance_threshold=0.0 (it will not be necessary to change the latter parameter when fixation_algorithm='none' after the next NeuroPype release). We use the velocity and acceleration thresholding algorithm to detect saccades by setting saccade_algorithm='V+A threshold'. The velocity_threshold argument is very important and depends a lot on the quality of your gaze data and on whether or not you want to detect tiny (micro?) saccades. In this example we set velocity_threshold=5.0, but the default value of 8.0 is often satisfactory. The ExtractGazeEvents node returns a packet containing a table of gaze events, in this case all with the 'Marker' field set to 'Saccade'.
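
The principle behind the 'V+A threshold' option can be sketched as follows; the thresholds, units, and decision rule are illustrative assumptions, and the node's actual algorithm (which also normalizes the position data) differs in detail.

```python
# Hedged sketch of velocity+acceleration threshold saccade onset detection.
import numpy as np

def detect_saccade_onsets(vel, acc, v_thresh=5.0, a_thresh=1.0):
    # vel, acc: (2 x samples) gaze velocity and acceleration
    speed = np.linalg.norm(vel, axis=0)          # magnitude of 2D gaze velocity
    accel = np.linalg.norm(acc, axis=0)
    above = (speed > v_thresh) & (accel > a_thresh)
    onsets = np.flatnonzero(np.diff(above.astype(int)) == 1) + 1
    return onsets                                # sample indices where a saccade begins
```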

EEG Processing

We begin by using ExtractStreams to isolate the EEG packet. We then use AssignChannelLocations to set the physical locations of known EEG channels using a montage file or, if no montage file is provided, by assuming default locations for channels named in the 10-20 system. As these data were collected with a BioSemi A-B cap, we can use the built-in montage file that comes with NeuroPype named 'biosemi-mne-64-ab.locs'.

The remaining EEG processing steps are standard procedure, and we will only briefly mention a few parameters we changed for this pipeline. We downsample to 256 Hz and use a high-pass at 2 Hz (typically the highpass is at 0.5 Hz, but we aren't interested in slow components here).

ERP Extraction

We use the MergeStreams node to merge the EEG and gaze events packets together. This allows us to then use the Segmentation node to extract individual segments with limits [-0.3, 0.4] around each gaze event. We then subtract the per-ERP baseline with the Re-referencing node, setting the baseline to -0.3...-0.2 seconds on the time axis. Finally, we take the Mean along the instance axis to get the multi-channel ERPs averaged across all detected saccades.
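
In NumPy terms, the segment-baseline-average chain amounts to something like the following; the shapes are placeholders, and the actual nodes operate on annotated packets and named axes rather than raw arrays.

```python
# Illustrative segmentation -> baseline removal -> averaging for saccade-locked ERPs.
import numpy as np

srate = 256
epochs = np.random.randn(80, 64, int(0.7 * srate))       # trials x channels x samples, -0.3..0.4 s
t = np.arange(epochs.shape[-1]) / srate - 0.3             # time axis in seconds
baseline = epochs[..., (t >= -0.3) & (t <= -0.2)].mean(axis=-1, keepdims=True)
erp = (epochs - baseline).mean(axis=0)                     # average across the instance (trial) axis
```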

We additionally use two parallel branches of SelectRange and Mean nodes to calculate the average ERP amplitude in specific time windows: 0.01...0.055 s captures an eye-movement artifact, and 0.21...0.28 s captures an occipital ~P200.