DATA AUGMENTATION TECHNIQUES FOR BIOSIGNAL ANOMALY DETECTION
BASIC INFORMATION
Ph.D. Student: Mohammed Ghanim Ayoub
Advisor: Sebastián Ventura
Started on: January 2022
Keywords: Anomaly detection, Biosignal, Data augmentation, Time series data
THESIS PROPOSAL
An electroencephalogram (EEG) is a non-invasive and painless test that records brain activity as electrical signals through the scalp using multiple electrodes. These signals, like other biosignals, can be treated as time series, that is, sequences of data points obtained through repeated measurements over time. EEG recordings contain abnormal waveforms in the presence of neurological disorders such as epileptic seizures. This measurement therefore plays a pivotal role in the diagnosis and monitoring of a number of chronic neurological disorders, such as epilepsy and stroke, by identifying abnormalities in the signal. Recognizing these abnormalities is time-consuming and laborious, and it requires highly trained specialists because of the complexity of physiological signals. For this reason, it is crucial to develop automated models that distinguish between normal and abnormal EEG measurements in order to detect the abnormalities associated with those neurological disorders. Anomaly detection is a technique used to detect unusual points or patterns in EEG time series data.
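To make the idea of point anomaly detection in a time series concrete, the following is a minimal sketch, not the method proposed in this thesis. It flags samples that deviate strongly from the preceding window using a rolling z-score. The function name, the window length and the threshold are illustrative assumptions, and the input is a toy signal rather than real EEG data.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(signal, window=50, threshold=4.0):
    """Return indices of samples that lie far from the preceding window.

    Hypothetical baseline: `window` and `threshold` are assumed values,
    not parameters taken from the proposal.
    """
    anomalies = []
    for i in range(window, len(signal)):
        history = signal[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A flat window gives no scale to compare against; skip it.
            continue
        if abs(signal[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Toy signal: a regular alternating pattern with one injected spike.
toy_signal = [0.0, 1.0] * 30
toy_signal[55] = 50.0
print(rolling_zscore_anomalies(toy_signal, window=20))
```

A statistical baseline of this kind only captures isolated outliers. Abnormal EEG patterns that span many samples are the kind of case for which learned models are expected to be needed.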
EEG signals can be classified into distinct categories using traditional machine learning (ML) algorithms, such as the distance-based k-nearest neighbors (KNN) algorithm, or artificial neural networks. Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) achieve better results for time series recognition, especially for biomedical signals. In the last few years, owing to its success in other research fields, there has been growing interest in using deep learning (DL) for automatic EEG analysis instead of traditional EEG processing methods. DL is used in a wide range of EEG-related tasks, such as seizure detection or prediction, the evaluation of sleep and its disorders, and brain-computer interfaces (BCI). The availability of training data can be considered the main factor behind the success of DL, and increasing the amount of training data has a significant impact on model performance. Conversely, insufficient training data is considered one of the main reasons why ML and DL models perform poorly and overfit. Data synthesis and data augmentation (DA) are techniques used to increase the amount of training data by generating new samples. Applying these techniques could therefore be a potential solution to these issues.
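As an illustration of what data augmentation can look like for time series, the sketch below implements three commonly used transformations: jittering (additive Gaussian noise), amplitude scaling and window slicing. It is a hedged example under stated assumptions, not the set of techniques selected for this thesis. All function names and parameter values are hypothetical, and each signal is assumed to be a plain list of floats representing one single-channel window.

```python
import random


def jitter(signal, sigma=0.05, rng=random):
    """Add zero-mean Gaussian noise to every sample."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def scale(signal, sigma=0.1, rng=random):
    """Multiply the whole window by one random factor close to 1."""
    factor = rng.gauss(1.0, sigma)
    return [x * factor for x in signal]


def window_slice(signal, ratio=0.9, rng=random):
    """Crop a random contiguous segment covering `ratio` of the window."""
    length = max(1, int(len(signal) * ratio))
    start = rng.randint(0, len(signal) - length)
    return signal[start:start + length]


def augment_dataset(windows, labels, copies=2, seed=0):
    """Return the original windows plus `copies` augmented versions of each.

    Assumption: jittering and scaling do not change the label of a window.
    """
    rng = random.Random(seed)
    out_windows, out_labels = list(windows), list(labels)
    for window, label in zip(windows, labels):
        for _ in range(copies):
            out_windows.append(scale(jitter(window, rng=rng), rng=rng))
            out_labels.append(label)
    return out_windows, out_labels


# Toy usage: two short windows, one labeled normal (0) and one abnormal (1).
x, y = augment_dataset([[0.0, 1.0, 0.0], [5.0, -5.0, 5.0]], [0, 1], copies=2)
print(len(x), y)
```

The label-preserving assumption matters for anomaly detection in particular: a transformation that is too strong may erase an abnormal waveform or create one in a normal window, so parameter values would need to be validated on the target data.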
The objective of this study is to investigate various data augmentation techniques for time series anomaly detection. These techniques will be applied to biosignals such as EEG data in order to diagnose, for example, the risk of developing Central Neuropathic Pain or other related neurological disorders. The working hypothesis is that DA has a significant impact on the performance of deep learning and machine learning models. Insufficient training data is considered one of the main reasons for poor performance and overfitting, so generating more data from the existing data could be a potential solution to these issues. A further objective of this study is to answer the following question: does anomaly detection become more effective with more generated data, given that augmentation also increases the number of rare events and observations in the data?
More specifically, the following objectives are detailed:
- To utilize data augmentation for time series anomaly detection with deep learning (DL), specifically on EEG or other biosignal data.
- To overcome the problem of overfitting and to improve the accuracy of the classifier.
- To reduce the time spent preparing EEG measurements.
- To test various data augmentation techniques on EEG data for Spinal Cord Injury and compare their accuracy.
- To understand the relationship between EEG data augmentation and the improvement in the performance of the anomaly classifier.