Biomedical Signal Processing and Control

High-frequency SSVEP-BCI system for detecting intermodulation frequency components using task-discriminant component analysis

Article Details
Authors
Hongyan Cui, Meng Li, Xiaodong Ma, Xiaogang Chen
Journal
Biomedical Signal Processing and Control
DOI
10.1016/j.bspc.2024.106868
Table of Contents
Abstract
1. Introduction
2. Methods And Materials
2.1. Experimental Environment
2.1.1. Subjects
2.1.2. Experiment Equipment And Data Acquisition
2.2. Experimental Design
2.2.1. Paradigm Design
2.2.2. Experiment Procedure
2.3. Signal Processing
2.3.1. FFT
2.3.2. E-TDCA
2.4. System Performance Evaluation
3. Results
3.1. Offline Experiment Results
3.2. Online Experiment Results
4. Discussion
5. Conclusion
Acknowledgements
Abstract
Recently, steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs) have progressed significantly and are moving from the laboratory to practical application. However, the system performance and comfort of SSVEP-BCIs still need to be improved. In this study, five flicker frequencies (i.e., 30–34 Hz with an interval of 1 Hz) and eight scaling frequencies (i.e., 0.4–1.8 Hz with an interval of 0.2 Hz) were adopted to jointly encode forty visual stimulus targets using evoked intermodulation (IM) frequency components. Both luminance and shape changes were implemented with sinusoidal sampling stimulus coding. High flicker frequencies and green visual stimuli were chosen to improve the comfort of the proposed system. An extended version of a training algorithm named task-discriminant component analysis (TDCA) was proposed to detect the IM components of SSVEP signals. The average recognition accuracy of eleven subjects was 96.82 ± 0.01 % in the offline experiments for a data length of 5 s. Online validation experiments were conducted with the parameters optimized in the offline analysis, and the average accuracy and ITR were 94.37 ± 1.17 % and 113.47 ± 2.60 bits/min, respectively. Furthermore, ten subjects who participated in the validation part also completed the online free-spell task successfully. These results show that it is feasible to expand the number of stimulus targets by using IM frequency components of SSVEP signals for target coding, and that the resulting system performance is superior.
Hongyan Cui a,b, Meng Li a, Xiaodong Ma c,*, Xiaogang Chen a,b,*
a Institute of Biomedical Engineering, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300192, China
b Tianjin Key Laboratory of Neuromodulation and Neurorepair, Institute of Biomedical Engineering, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300192, China
c People’s Hospital of Ningxia Hui Autonomous Region, Yinchuan 750001, China

Keywords: Brain-computer interface; High-frequency steady-state visual evoked potential; Intermodulation frequencies; Task-discriminant component analysis
1. Introduction
Brain-computer interface (BCI) research focuses on establishing a direct communication and control channel between the brain and external devices [1–3]. Recently, steady-state visual evoked potential (SSVEP)-based BCI has attracted considerable attention among non-invasive BCI systems due to its outstanding information transfer rate (ITR) and ease of use [4,5]. The performance of SSVEP-BCIs has improved greatly in recent years, but there is still much room for improvement compared to traditional human–computer interaction methods. Currently, researchers are trying to promote the practical application of SSVEP-BCIs by improving their performance and comfort. On the one hand, the ITR is the gold-standard metric for assessing the performance of SSVEP-BCIs. Currently, the highest average ITR achieved by an online BCI speller is 325.33 ± 38.17 bits/min with a data length of 0.3 s [6]. The value of ITR is determined by the time to output a single command, the recognition accuracy and the number of visual stimulus targets. In particular, the number of visual stimulus targets has a significant effect on ITR when the visual stimulation time is difficult to compress further and the decoding algorithms have matured. However, most previous studies have adopted a single flicker frequency to encode each visual stimulus target, which means that the limited range of usable response frequencies may limit the number of encodable targets. To address this issue, researchers have focused on encoding visual stimulus targets using multi-modal or multi-frequency methods. For example, Bai et al. [7] used two flicker frequency sequences (from 6 Hz to 8.5 Hz and from 9 Hz to 11.5 Hz with an interval of 0.2 Hz) to encode the rows and columns of the target matrix separately, evoking P300 and SSVEP signals simultaneously.
As a result, the recognition accuracy and ITR reached 94.29 % and 28.64 bits/min, respectively, for a BCI speller task. To further exploit the high-ITR advantage of SSVEP-BCIs, a multi-frequency sequential coding (MFSC) method was proposed to realize SSVEP-BCIs. Originally, in the study of [8], four LED stimulation frequencies (i.e., 16.4, 19.1, 17.5 and 20.2 Hz) were selected to jointly construct six targets using a dual-frequency encoding method. To further extend the MFSC idea, Chen et al. [9] first built a calibration-free SSVEP-BCI system with more than 100 commands using only eight frequencies. Furthermore, Ge et al. [10] proposed a coding scheme for a 48-character keyboard based on dual-frequency biased coding (DFBC), which was shown to outperform the MFSC-based system. However, whether MFSC or DFBC, these methods expand the number of visual stimulus targets by incorporating a temporal factor into the traditional flicker-frequency method. In this case, drawbacks such as the limited number of encodable targets, long encoding cycles and degraded detection accuracy remain, which greatly affect the performance of SSVEP-BCI systems.
* Corresponding authors at: Institute of Biomedical Engineering, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300192, China (X. Chen). E-mail addresses: mxd119@163.com (X. Ma), chenxg@bme.cams.cn (X. Chen). Biomedical Signal Processing and Control 99 (2025) 106868. Received 8 December 2023; received in revised form 4 August 2024; accepted 6 September 2024; available online 11 September 2024.
To overcome these problems, researchers have investigated whether it is possible to encode two different types of frequency information simultaneously. For example, Chen et al. used luminance changes and color transitions to jointly encode eight targets, inducing intermodulation frequency components in addition to the fundamental frequencies and harmonics of the luminance change [11,12]. In addition, combining spiral motion with size change in the steady-state motion visual evoked potential (SSMVEP) method induced stable and complete intermodulation frequency components [13,14]. These SSVEP-BCI studies suggest the potential for expanding the number of visual stimulus targets using intermodulation frequency components (IMs). On the other hand, from a practical point of view, the user experience is also an issue that cannot be ignored. Previous studies have shown that high-frequency flickering targets are less likely to induce visual fatigue than low- and medium-frequency flickering targets [15,16]. In addition, low-frequency visual stimuli can be consciously perceived, distracting subjects during task processing [17]. For example, the product of a high carrier frequency (i.e., 40 Hz) and low modulating frequencies (i.e., 9–12 Hz) was used to generate an amplitude-modulated visual stimulus for four- and six-target SSVEP-BCIs. The results show that the proposed AM-SSVEP system reduces eye fatigue and achieves more than 90 % recognition accuracy by using the stronger harmonic frequency components [18]. Furthermore, several studies have sought to improve the comfort of SSVEP-BCI systems by reducing the eye irritation caused by flickering stimuli [19–21], and the use of radially zoomed motion information has been shown to improve the subjective comfort of BCI systems [22–26]. In this study, we propose a novel stimulation presentation method for SSVEP-BCIs.
Specifically, eight scaling frequencies (i.e., 0.4–1.8 Hz with an interval of 0.2 Hz) were superimposed on five flicker frequencies (i.e., 30–34 Hz with an interval of 1 Hz) to jointly encode 40 visual stimulus targets. High flicker frequencies and green visual stimuli were chosen to improve the comfort of the proposed system. An extended version of the TDCA training algorithm was proposed to decode the evoked EEG responses. Offline experiments were employed to optimize the stimulus parameters for the subsequent online experiments, and the online experiments were used to further validate the feasibility of the proposed system. Furthermore, since the BCI speller is a typical method for evaluating the performance of SSVEP-BCIs [27,28], we added a free-spell task to further test the usefulness of the proposed system.
2. Methods and materials
2.1. Experimental environment
2.1.1. Subjects
Fifteen subjects (mean age 24.3 years; three males and twelve females) participated in our study. All had normal or corrected-to-normal visual acuity, were able to concentrate, and had no history of psychiatric or neurological disorders. Our study included both offline and online experiments: eleven subjects participated in the offline experiments and ten in the online experiments, with six subjects participating in both consecutively. All subjects were informed about the task and signed an informed consent form before the start of the experiment; the form states that they had the right to withdraw from the experiment at any time if they felt uncomfortable during the tasks. Our study was approved by the Institutional Review Board of Tsinghua University.
2.1.2. Experiment equipment and data acquisition
The Synamps2 system developed by Neuroscan was used to acquire EEG signals with a sampling rate of 1000 Hz and a band-pass filtering range of 0.1 to 200 Hz. To eliminate power-line interference, the EEG data were processed with a 50 Hz notch filter. The scalp EEG data were acquired with a 64-channel EEG cap arranged according to the modified international 10–20 system. In the offline experiments, 60 electrode channels (excluding M1, M2, CB1 and CB2) were used for data acquisition, with the reference electrode at the left posterior mastoid. In the online experiments, only 9 electrodes were selected to record EEG data: Pz, Oz, O1, O2, POz, PO3, PO4, PO5, and PO6. The recorded data are sent in real time from the signal acquisition side to the visual stimulus side via the TCP/IP protocol. In both the offline and online experiments, the ground electrode was placed at the midpoint between the FPz and Fz electrodes.
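The acquisition settings above (1000 Hz sampling, 0.1–200 Hz hardware band-pass, 50 Hz notch) can be reproduced offline with standard digital filters. A minimal Python sketch using SciPy, assuming the EEG arrives as a channels × samples array; the notch Q, the filter order, and the 3–100 Hz software band-pass are illustrative choices, not taken from the paper:

```python
import numpy as np
from scipy.signal import iirnotch, butter, filtfilt

FS = 1000  # sampling rate used in the study (Hz)

def preprocess(eeg, fs=FS):
    """Remove 50 Hz power-line interference, then band-pass the EEG.

    eeg: (n_channels, n_samples) array. The notch Q and the 3-100 Hz
    band (hypothetical, chosen to cover the SSVEP range) are examples.
    """
    b, a = iirnotch(w0=50.0, Q=35.0, fs=fs)   # 50 Hz notch
    eeg = filtfilt(b, a, eeg, axis=-1)        # zero-phase filtering
    b, a = butter(4, [3.0, 100.0], btype="bandpass", fs=fs)
    return filtfilt(b, a, eeg, axis=-1)
```

`filtfilt` applies each filter forward and backward, so the filtered features stay time-aligned with the raw signal, which matters for the trial-locked analyses below.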
2.2. Experimental design
2.2.1. Paradigm design
According to previous studies [26,29,30], SSVEP-BCI systems constructed with green circular stimulus targets provide higher comfort. Therefore, this stimulus design was also adopted in this study. The Psychtoolbox of MATLAB was used to present the visual stimulus interface on an LCD monitor with a screen resolution of 1920 × 1080 pixels and a refresh rate of 120 Hz. The user interaction interface is presented in Fig. 1(a); it is divided into two parts by a line, with 40 green solid circles (RGB: 0, 255, 0) of 120 × 120 pixels distributed in a 5 × 8 layout below the line. The distance between two adjacent targets in the same row is 120 pixels, and in the same column is 50 pixels. The horizontal and vertical margins are both 60 pixels. Different text and symbolic messages are set in the center of each target, including the numbers 0 to 9, the 26 letters of the alphabet and 4 characters. The area above the line is the display area for text messages in the spelling task of the online experiment; a blank area of the same height is set at the top of the interface in the other tasks to keep the layout consistent with the spelling task interface. In our study, the sinusoidal sampling encoding method was used to realize the luminance change and scaling change of the visual stimulus targets, and the frequency combinations and their layout in the user stimulus interface are shown in Fig. 1(b). The luminance of a stimulus target varies with the number of refreshed frames as:

l(Fk, i) = (1/2) × {1 + sin[2πFk(i/R)]}, k = 1, 2, ..., 40 (1)

where Fk denotes the luminance changing (flicker) frequency. According to the study by Chen et al. [31], we set Fk to range from 30 to 34 Hz with an interval of 1 Hz across the different rows.
k indexes the 40 visual stimulus targets, i indicates the index of the refreshed frame in a stimulus presentation sequence and R represents the screen refresh rate. The stimulus sequence l(Fk, i) takes values from 0 to 1, where 0 and 1 indicate the darkest and brightest states, respectively. Fig. 1(d) shows an example of the process of changing the shape of a visual stimulus target. The radius of the visual stimulus target can then be expressed as:

r(fk, i) = Ra + A·sin[2πfk(i/R)] − A, k = 1, 2, ..., 40 (2)

where fk is the scaling frequency, ranging from 0.4 Hz to 1.8 Hz with an interval of 0.2 Hz. Since the system design requires subjects to observe both the flickering and scaling characteristics of the target, scaling frequencies within a feasible range were selected to ensure that subjects could accurately receive the target’s flickering stimuli. Ra denotes the original radius, and A is the range of variation of the radius, with a value of 20 pixels.
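The two sampled-sinusoidal coding rules of Eqs. (1) and (2) can be sketched directly in Python. The 120 Hz refresh rate and A = 20 pixels come from the paper; the 60-pixel original radius Ra is our assumption based on the 120 × 120-pixel targets:

```python
import numpy as np

R = 120           # monitor refresh rate (Hz), as in the paper
R_A, AMP = 60, 20 # Ra (assumed: half the 120-px target size) and A (px, from the paper)

def luminance(Fk, n_frames, refresh=R):
    """Eq. (1): per-frame luminance in [0, 1] at flicker frequency Fk."""
    i = np.arange(n_frames)
    return 0.5 * (1.0 + np.sin(2 * np.pi * Fk * i / refresh))

def radius(fk, n_frames, Ra=R_A, A=AMP, refresh=R):
    """Eq. (2): per-frame radius at scaling frequency fk (never exceeds Ra)."""
    i = np.arange(n_frames)
    return Ra + A * np.sin(2 * np.pi * fk * i / refresh) - A
```

Note that the −A offset in Eq. (2) makes the circle only shrink and recover, so the scaled target never overlaps its neighbours.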
2.2.2. Experiment procedure
Both offline and online experiments were conducted in this study. Prior to the start of the experiment, subjects were informed of the experimental task, sat in a chair at a distance of 90 cm from the monitor, and remained relaxed. The offline experiments contain 6 blocks, each consisting of 40 trials. Before the start of a block, the user interaction interface of Fig. 1(a) is presented on the screen at half its original brightness to indicate to the subjects that it is time to prepare for the experiment. Then, a red circle appears at the location of one of the 40 targets, selected at random; it is presented for 0.5 s to cue the stimulus target of that trial. Subsequently, all visual stimulus targets begin to flicker and scale at their respective frequencies for 5 s. During this time, subjects are asked to fixate on the cued target. Next, the interface returns to the state of Fig. 1(a) and is held for 0.5 s. The procedure is the same for each subsequent trial, each lasting 6 s, until all 40 targets have been cued once. The online experiments consist of two sessions: online verification and online free spelling. The first session is used to demonstrate the feasibility and stability of the system and consists of 13 blocks of 40 trials each. The overall procedure is similar to the offline experiments. The time to output a command is 2.5 s, which includes a 0.5-s cue time and a 2-s task time. In particular, the first 12 blocks are used to train the system model and the last block is used for online real-time testing. In addition, a beep is played as acoustic feedback if the target recognized in real time matches the predefined target. In the other session, subjects are asked to complete a specific spelling task to further validate the utility of the proposed system.
Specifically, the character content of the free-spell task is “HIGH SPEED BCI”, comprising twelve letters and two spaces. The visual stimulus interface is shown in Fig. 1(c). Unlike the two experimental tasks above, this part of the experiment has no cueing step before each trial. Once the interface is presented, all targets begin to flicker, and subjects have 2 s to gaze at the target character. Subsequently, all targets are restored to the original state for 2.5 s. At the same time, the character recognized by the system’s real-time analysis is displayed in the second line of the interface in Fig. 1(c). Subjects were asked to choose the target character to gaze at in the next trial based on the recognition results displayed in real time. If the presented character did not match the corresponding character of the free-spell task, the subject could make a correction in the next trial by selecting the “←” stimulus target; otherwise the subject proceeded directly to spelling the next task character. Finally, when all characters were spelled correctly, subjects completed the free-spell task by selecting the “ ” character twice in succession. The subjects were asked to complete a comfort questionnaire [19] after the offline and online experiments, scoring the comfort level on a 6-point scale ranging from 1 (totally unacceptable) to 6 (a good experience).
2.3. Signal processing
2.3.1. FFT
The fast Fourier transform (FFT) is one of the most common methods for extracting features from SSVEP signals; it quickly yields the intensity of the EEG signal at different frequency components. In this study, SSVEP data with a 5-s data length were analyzed for all subjects in the offline experiments.
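As an illustration of this step, the sketch below computes a single-sided FFT amplitude spectrum and locates the dominant peak. The synthetic signal (a 32 Hz component with 32 ± 1.8 Hz sidebands) merely mimics the IM structure described in the Results and is not real EEG:

```python
import numpy as np

def amplitude_spectrum(x, fs=1000):
    """Single-sided FFT amplitude spectrum of a 1-D signal."""
    n = len(x)
    spec = 2.0 * np.abs(np.fft.rfft(x)) / n   # amplitude scaling
    spec[0] /= 2.0                            # DC bin is not doubled
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spec

# Synthetic 5-s "signal": 32 Hz fundamental plus IM sidebands at 32 +/- 1.8 Hz
fs = 1000
t = np.arange(0, 5, 1 / fs)
x = (np.sin(2 * np.pi * 32.0 * t)
     + 0.3 * np.sin(2 * np.pi * 30.2 * t)
     + 0.3 * np.sin(2 * np.pi * 33.8 * t))
freqs, spec = amplitude_spectrum(x, fs)
peak_hz = freqs[np.argmax(spec)]   # -> 32.0 (5-s data gives 0.2 Hz resolution)
```

With a 5-s window the 0.2 Hz frequency resolution places all three components exactly on FFT bins, which is why the 0.2 Hz scaling-frequency spacing pairs naturally with 5-s offline epochs.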
2.3.2. e-TDCA
As a trained SSVEP identification method, task-discriminant component analysis (TDCA) [32] has been demonstrated to significantly outperform ensemble task-related component analysis (TRCA) [6] and other trained competing methods. In our study, an extended version of the TDCA algorithm (e-TDCA) is proposed to decode the EEG signals. First, the raw SSVEP signals X are filtered into multiple sub-band signals (XSBn, n = 1, 2, ..., N) using zero-phase Chebyshev type I infinite impulse response (IIR) filters, where N is the number of sub-bands. The filtfilt function is used to remove the phase delay of the filter, preserving the features in the filtered waveform exactly where they occur in the unfiltered signal. Second, augmentation operations are performed on the preprocessed EEG data. Data augmentation is a common method for increasing the volume, quality and diversity of training data [32,33]. The filtered EEG signal of each training trial is expanded in the first data augmentation:

X̃ = [X^T, X1^T, ..., Xl^T]^T (3)

where X̃ ∈ R^((l+1)Nch×Np) is the augmented SSVEP signal, X ∈ R^(Nch×Np) is the raw signal of each trial and Xl ∈ R^(Nch×Np) denotes the EEG signal delayed by l points. Nch and Np are the numbers of channels and sampling points, respectively. Since the data points after Np are not within the task time of the current trial, the delayed points beyond Np are filled with zeros. The EEG signal after the second data augmentation is represented as follows:
Xa = [X̃, X̃P] (4)
where X̃P, obtained by projecting the first-augmented EEG signal X̃ onto a subspace defined by a reference signal, is expressed as X̃P = X̃Pk. Specifically, Q in the orthogonal projection matrix Pk = QQ^T for the class-k target (k = 1, 2, ..., 40) is obtained from the QR decomposition of the reference signal Yk. According to Chen et al. [5], Yk can be set on a case-by-case basis, and Pk is calculated from Yk:

Yk = [ sin(2πFkt)
       cos(2πFkt)
       ⋮
       sin(2πNhFkt)
       cos(2πNhFkt)
       sin(2π(NhFk − NIMfk)t)
       cos(2π(NhFk − NIMfk)t)
       sin(2π(NhFk + NIMfk)t)
       cos(2π(NhFk + NIMfk)t) ],  t = [1/fs, ..., Np/fs] (5)

where Nh denotes the number of harmonics and fs denotes the sampling frequency. Notably, the intermodulation (IM) frequency components are added to the fundamental and harmonics of the flicker frequency for target identification, and NIM denotes the number of IM components. Third, two-dimensional linear discriminant analysis based on the Fisher criterion is applied to find distinguishable projection directions for the different classes, yielding the spatio-temporal filters:

maximize_W tr(W^T Hb Hb^T W) / tr(W^T Hw Hw^T W) (6)

where Hb ∈ R^((l+1)Nch×2NkNp) denotes the between-class difference matrix and Hw ∈ R^((l+1)Nch×2NtNp) denotes the within-class difference matrix; Nk and Nt represent the number of classes and the number of training trials, respectively. W ∈ R^((l+1)Nch×(l+1)Nch) is the combination of the desired spatial filters. During the test step, the first augmentation of the test data Xt ∈ R^(Nch×Np) is performed with the zero-padding operation to elevate the EEG data dimensions according to Eq. (3). Then, the orthogonal projection matrices Pk are applied to the augmented test data X̃t ∈ R^((l+1)Nch×Np), and the second data augmentation is completed as in Eq. (4).
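The two augmentation steps (Eqs. (3)–(5)) can be sketched as follows. This is a simplified illustration, not the authors' implementation: `reference_signal` builds only the rows shown explicitly in Eq. (5) (harmonics up to Nh and the IM sidebands around Nh·Fk), and the Nh = 2, NIM = 1 defaults are our assumptions:

```python
import numpy as np

def delay_augment(X, l):
    """First augmentation (Eq. 3): stack l delayed copies below X.

    X: (n_ch, n_p) single-trial EEG. Samples shifted past the trial end
    are zero-filled, since they fall outside the task window.
    """
    n_ch, n_p = X.shape
    rows = [X]
    for d in range(1, l + 1):
        Xd = np.zeros_like(X)
        Xd[:, : n_p - d] = X[:, d:]    # delayed copy, zero-padded tail
        rows.append(Xd)
    return np.vstack(rows)             # ((l+1)*n_ch, n_p)

def reference_signal(Fk, fk, n_p, fs=1000, Nh=2, Nim=1):
    """Sine-cosine reference Yk of Eq. (5), with IM sidebands Nh*Fk +/- Nim*fk."""
    t = np.arange(1, n_p + 1) / fs
    freqs = [h * Fk for h in range(1, Nh + 1)]
    freqs += [Nh * Fk - Nim * fk, Nh * Fk + Nim * fk]
    rows = []
    for f in freqs:
        rows += [np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)]
    return np.array(rows)              # (2*(Nh+2), n_p)

def projection_matrix(Yk):
    """Pk = Q Q^T, with Q from the QR decomposition of Yk^T."""
    Q, _ = np.linalg.qr(Yk.T)          # Yk.T: (n_p, n_refs)
    return Q @ Q.T                     # (n_p, n_p)

def second_augment(X_tilde, Pk):
    """Second augmentation (Eq. 4): Xa = [X_tilde, X_tilde @ Pk]."""
    return np.concatenate([X_tilde, X_tilde @ Pk], axis=1)
```

Projecting with Pk keeps only the part of each trial lying in the span of Yk, so the concatenated Xa carries both the raw dynamics and their SSVEP-template component.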
Importantly, the obtained spatial filter W ∈ R^((l+1)Nch×(l+1)Nch) is used to project the training data templates Xa^k and the augmented test data (Xa^t)^k to obtain the projected matrices. The canonical correlation coefficient between the projected matrices is then calculated for each class and each sub-band, and the class k with the maximum coefficient is the stimulus target classification label of the test trial. The coefficient ρk used for target detection is the weighted sum of the squared correlation values of the sub-band signals:

ρk = Σ_{i=1}^{N} w(i)·(ρk,i)^2 (7)

where ρk,i denotes the correlation coefficient of the i-th sub-band and the setting of w(i) follows [5]:

w(i) = i^(−a) + b (8)

where i denotes the sub-band index, and a and b are constants set to 1.25 and 0.25, respectively. The frequency of the reference signal corresponding to the maximum correlation coefficient is then identified as the frequency of the SSVEP:

f_target = max_{fk} ρk (9)
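The filter-bank combination of Eqs. (7)–(9) reduces to a few lines. A Python sketch; the small correlation matrix is made-up data for illustration:

```python
import numpy as np

def combine_subbands(rho, a=1.25, b=0.25):
    """Eqs. (7)-(8): weighted sum of squared per-sub-band correlations.

    rho: (n_subbands, n_classes) canonical correlations. Returns one score
    per class; the predicted target is the argmax (Eq. 9).
    """
    n_sb = rho.shape[0]
    w = np.arange(1, n_sb + 1) ** (-a) + b   # w(i) = i^(-a) + b
    return (w[:, None] * rho ** 2).sum(axis=0)

# Made-up correlations for 2 sub-bands x 3 classes
rho = np.array([[0.9, 0.4, 0.2],
                [0.6, 0.3, 0.1]])
scores = combine_subbands(rho)
predicted_class = int(np.argmax(scores))   # -> 0
```

The decaying weights w(i) downweight the higher sub-bands, where SSVEP harmonics carry progressively less energy.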
2.4. System performance evaluation
ITR represents the total amount of information transmitted per unit time by a BCI system, which is mathematically expressed as follows:

ITR = [log2(N) + P·log2(P) + (1 − P)·log2((1 − P)/(N − 1))] × (60/T) (10)

The value of ITR is determined by three parameters of the system: N is the total number of visual stimulus targets, P denotes the recognition accuracy, and T stands for the time needed to output a single instruction.
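Eq. (10) can be checked numerically. A short Python sketch; plugging in the online-session values reproduces the reported ITR up to rounding of the accuracy:

```python
import numpy as np

def itr_bits_per_min(n_targets, p, t_sec):
    """Eq. (10): ITR in bits/min for N targets, accuracy p, t_sec seconds/command."""
    if p >= 1.0:
        bits = np.log2(n_targets)        # the p*log2(p) terms vanish at p = 1
    else:
        bits = (np.log2(n_targets) + p * np.log2(p)
                + (1.0 - p) * np.log2((1.0 - p) / (n_targets - 1)))
    return bits * 60.0 / t_sec

# Online validation session: N = 40, P ~ 0.9437, T = 2.5 s (0.5-s cue + 2-s task)
rate = itr_bits_per_min(40, 0.9437, 2.5)   # about 113 bits/min
```

Note that T must include the gaze-shifting/cue time, not just the stimulation time, for the ITR to reflect real command throughput.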
3. Results
3.1. Offline experiment results
In this study, five flicker frequencies and eight scaling frequencies were combined to jointly encode 40 visual stimulus targets. According to the theory of intermodulation frequency components, IM components appear in addition to the fundamental frequency and harmonic peaks. As an example, Fig. 2 shows the average SSVEP spectrum induced by the flicker frequency of 32 Hz and the scaling frequency of 1.8 Hz. The EEG data used for the FFT calculation are averaged over all subjects at the Oz electrode with a data length of 5 s in the offline experiments. There is a strong fundamental frequency component together with the IM components Fk ± NIM × fk. The second harmonic shows a marked decrease in peak value compared to the fundamental frequency, and the IM components 2Fk ± NIM × fk are attenuated or missing. We therefore mapped the SSVEP amplitude topography of the Fk ± NIM × fk IM components. The spatial distribution of signal amplitudes is similar across this set of topographic maps, with strong SSVEP responses appearing in the occipital area. We calculated the amplitude of the fundamental frequency component for all targets, and the comparative results are shown in Fig. 3 as 3D bar charts. The amplitude of the SSVEP signal varies from target to target in Fig. 3(a). A two-way repeated measures ANOVA was then used to test whether the signal amplitudes of all subjects differed significantly under the two factors flicker frequency and scaling frequency. A significant main effect was found only for scaling frequency (p < 0.05). Specifically, the average SSVEP amplitudes corresponding to the eight scaling frequencies for all subjects are shown in the histogram of Fig. 3(b) with significant-difference markers.
Specifically, there are significant differences between the 1.8 Hz scaling frequency condition and six of the other conditions (p < 0.05), the exception being the 1.6 Hz condition. Similarly, the signal amplitude of the 1.6 Hz condition is significantly lower than that of four other scaling frequency conditions (i.e., 0.6 Hz, 0.8 Hz, 1 Hz, 1.2 Hz) (p < 0.01), and the signal amplitude of the 0.4 Hz condition is significantly lower than that of the 0.6 Hz and 0.8 Hz conditions (p < 0.05). Fig. 3(c) shows the distribution of the average SSVEP amplitude for the five flicker frequency conditions. The higher-frequency conditions (33 Hz and 34 Hz) correspond to lower fundamental-frequency amplitudes, which is consistent with a previous study [34], but none of the five conditions shows significant differences. The SSVEP signal amplitude tends to decrease with increasing flicker frequency. However, apart from the lower amplitudes at 0.4 Hz and 1.8 Hz, there is no clear trend in amplitude across scaling frequencies. In the e-TDCA algorithm used in this study, two parameters, the number of delay points l and the number of subspace filters nsp, need to be optimized based on the offline SSVEP data. We calculated the ITR values of the proposed system for all parameter combinations using a grid search. Specifically, the average ITR corresponding to each parameter combination was obtained by cross-validation. In each round of cross-validation, three of the six blocks in the offline experiment were selected as the training group and the other three as the test group, for a total of 20 rounds of cross-validated ITR calculation. To ensure the best performance of the online experiment, we calculated the system performance for different data lengths. Fig. 4 shows the ITR results of the optimized method with different data lengths.
The highest ITR value of 95.34 bits/min is achieved at a 1.5-s data length when nsp is six and l is two. Therefore, these optimized values of nsp and l are used to construct the e-TDCA algorithm. The average recognition accuracy of all subjects at a 5-s data length for the forty targets is shown in Fig. 5. Each target in this BCI system is correctly recognized more than 84 % of the time, and some targets reach 100 % recognition accuracy. A one-way repeated measures ANOVA of the recognition accuracy per target shows no significant difference between targets (p > 0.05). The average accuracy and ITR for all subjects with different data lengths are presented in Fig. 6, with the accuracy scale on the left vertical axis. The average recognition accuracy increases with the data length, reaching its highest value of 96.82 ± 0.01 % at a 5-s data length. The ITR, on the right axis, peaks at a 1.5-s data length with 95.34 ± 9.5 bits/min. In general, the data length corresponding to the highest ITR would be used for the online experiments. However, the average recognition accuracy of the system at 1.5 s is less than 80 %, so validating the system at that window is not meaningful. To validate the system more effectively, we extended the data length to achieve a higher recognition accuracy. The average recognition accuracy reaches 83.82 ± 0.04 % at a data length of 2 s, and a paired t-test shows that this accuracy is significantly higher than that at a data length of 1.5 s (p < 0.05). Furthermore, a paired t-test of the average ITR values for the two data lengths shows no significant difference between 1.5 s and 2 s (p > 0.05). As a result, a 2-s data length was finally chosen to build the online system.
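The 20-round cross-validation used in the grid search follows from choosing 3 training blocks out of 6: C(6, 3) = 20 splits. A Python sketch of that loop; `train_and_score` is a hypothetical callable standing in for fitting and evaluating e-TDCA at one (nsp, l) setting:

```python
import numpy as np
from itertools import combinations

def halfsplit_cv(blocks, train_and_score):
    """Average accuracy over all 3-of-6 block splits (the paper's 20 folds).

    blocks: list of six per-block datasets. train_and_score is a
    hypothetical callable (train_blocks, test_blocks) -> accuracy.
    """
    n = len(blocks)
    folds = list(combinations(range(n), n // 2))   # C(6, 3) = 20 splits
    accs = [train_and_score([blocks[i] for i in f],
                            [blocks[j] for j in range(n) if j not in f])
            for f in folds]
    return float(np.mean(accs))
```

Averaging over every half-split keeps the fold count high despite only six blocks, which stabilizes the per-parameter ITR estimates in the grid search.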
In addition, the mean comfort score of all subjects participating in the offline experiments was 5.12 ± 0.87. This high comfort score shows that the proposed system has a high comfort level.
3.2. Online experiment results
According to the data length and decoding algorithm parameters optimized in the offline analysis, we chose a data length of 2 s for the online experiments. The time to output one command is therefore 2.5 s, including the 0.5-s gaze-shifting time, in the online validation session. The recognition accuracy and ITR of the online validation system are listed in Table 1. The average recognition accuracy of all subjects reaches 94.37 ± 1.17 %, which meets the criterion of practical usefulness, and the average ITR reaches 113.47 ± 2.60 bits/min. In addition, six subjects completed both the offline and online experimental tasks, and a paired t-test shows no significant difference in recognition accuracy between the offline and online experiments for these six subjects (p > 0.05). The performance of the online system is comparable to the offline results, demonstrating the robustness of the proposed system. This statistical result provides strong evidence for the stability of the system. In the free-spell session, a 2-s reaction time was added to give subjects enough time to determine the next target character, so the output time of the free-spell system is set to 4.5 s. All ten subjects who participated in the online validation experiment successfully completed the free-spell task. The performances of the ten subjects in the online free-spell task are listed in Table 2. The recognition results during the spelling task varied across subjects because the model is calibrated per subject, so the total time to complete the task also varied. The sixth subject spent the shortest time of 81 s. Possibly due to individual differences in attention, S3 and S10 took longer to complete the task than the other subjects.
In addition, the mean comfort score of all subjects participating in the online experiments was 5 ± 0.47, indicating that the system's user-interaction interface is subjectively highly acceptable.
4. Discussion
In this study, in contrast to traditional visual stimulation paradigms that use one flicker frequency to encode one target, we used five flicker frequencies to build a forty-target SSVEP-BCI system based on intermodulation (IM) frequency components, which are generated by the joint modulation of luminance and motion information. Furthermore, an extended version of the TDCA training algorithm was proposed to decode the evoked EEG responses. According to the experimental results, the average recognition accuracy across all subjects in the online validation system reached 94.37 ± 1.17 %, demonstrating the feasibility and stability of the proposed system. In addition, all subjects successfully completed the given task in the online free-spell system.

As in previous studies, simultaneous modulation of luminance changes and scaling motion can induce clear IM frequency components. Table 3 lists the paradigm and system performance parameters of related studies known to us. Apart from exploratory experiments that used a single-target paradigm, the other studies using IM frequency components to encode stimulus targets achieved good system performance. However, compared with traditional single-frequency-encoded paradigms, the number of stimulus targets in those studies is low, usually no more than ten. For example, in the benchmark SSVEP dataset published in 2017 [39], the number of stimulation targets is forty, and the dataset has been widely used in subsequent studies [40,41]. The number of stimulus targets is a vital factor in evaluating system performance, especially the ITR. We therefore increased the number of stimulus targets to forty to improve the usability and comparability of the system. The high performance of the proposed system verifies the effectiveness of the encoding method used in this study.
Because several previous studies [39,42] have shown that trained methods outperform untrained ones, this study adopted a trained EEG decoding algorithm designed around the characteristics of the evoked responses: an extended version of the TDCA training algorithm. As a result, the average ITR in the online experiments significantly outperformed that of similar studies, reaching 113.47 ± 2.60 bits/min.

For future studies, we would like to further explore experimental paradigms that use IM components to encode targets, because the frequency-tagging method based on IM components has also been shown to be useful for clinical diagnosis. For example, IM frequency components can be measured as a neural signature to test whether a visual orthographic deficit is caused by defective processing [43]. The IM components used in this study derive from the visual integration of the dorsal pathway, involved in motion detection, and the ventral pathway, involved in target luminance detection. In daily life, people rely on multiple senses to perceive the real world around them: sight, hearing, and touch work together to provide rich information and a comprehensive experience. To explore the multisensory integration of information from different stimuli, Giani [35] and Lapenta [44] successively investigated whether visual and auditory inputs can be integrated nonlinearly; however, the expected IM components were not observed. More research and experimentation are therefore needed on the nature of cross-sensory information integration and on the specific parameters of the stimulus information. Beyond using IM frequency components to expand the number of stimulus targets, Ye et al. [45] proposed a multi-symbol time division coding (MSTDC) method that uses a single 30 Hz frequency and four phases to build a forty-target system. Provided that high-precision clock synchronization can be guaranteed, this approach is highly promising and leaves considerable room for development.
5. Conclusion
The present study proposed a novel SSVEP-BCI spelling system that uses five high-frequency flicker frequencies to encode forty visual stimulus targets. Using the optimal parameters from the offline analysis, the eTDCA algorithm was used to decode the intermodulation frequency components. As a result, the average recognition accuracy and ITR reached 94.37 % and 113.47 bits/min, respectively, and ten subjects successfully completed the online free-spell task. These results clearly demonstrate the feasibility and stability of the proposed system. Importantly, the potential of this method to expand the number of targets is validated. (Note: subject numbers marked with '*' in the tables denote subjects who also took part in the offline experiment.)

CRediT authorship contribution statement

Hongyan Cui: Writing – review & editing, Supervision, Conceptualization. Meng Li: Writing – original draft, Visualization, Validation, Funding acquisition, Formal analysis, Data curation. Xiaodong Ma: Supervision, Writing – review & editing. Xiaogang Chen: Writing – review & editing, Supervision, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.
Acknowledgements
We sincerely thank the National Key R&D Program of China (No. 2022YFC3602803), the National Natural Science Foundation of China (No. 62171473), the Tianjin Municipal Science and Technology Plan Project (No. 21JCYBJC01500), the Fundamental Research Funds for the Central Universities (No. 3332023170), and the Key Research and Development Program of Ningxia (No. 2023BEG02063) for their financial support of this research. We would also like to express our gratitude to every subject who actively participated in our experiments, and to thank Xinyi Chi for support with data analysis.
 