Convolutional Recurrent Neural Networks and Acoustic Data Augmentation for Snore Detection
Romeo L.
2019-01-01
Abstract
In this paper, we propose an algorithm for snoring sound detection based on convolutional recurrent neural networks (CRNN). The log Mel energy spectrum of the audio signal is extracted from overnight recordings and used as input to the CRNN, with the aim of detecting the precise onset and offset times of the sound events. The dataset used in the experiments is highly imbalanced toward the non-snore class. To address this imbalance, a data augmentation technique is introduced that generates new snore examples by simulating the target acoustic scenario. The application of CRNN with acoustic data augmentation constitutes the main contribution of the work in the snore detection scenario. The performance of the algorithm has been assessed on the A3-Snore corpus, a dataset consisting of more than seven hours of recordings of two snorers and consistent environmental noise. Experimental results, expressed in terms of Average Precision (AP), show that the combination of CRNN and data augmentation in the raw data domain is effective, achieving an AP of up to 94.92%, which is superior to the results reported in the related literature.
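As an illustration of the reported metric, the sketch below computes a non-interpolated Average Precision from per-frame snore scores. It is a minimal example under stated assumptions, not the paper's evaluation code: the function name `average_precision`, the frame-level labels, and the toy scores are all hypothetical, and the exact AP variant used in the experiments is not specified in the abstract.

```python
def average_precision(labels, scores):
    """Non-interpolated AP: mean of the precision measured at each positive.

    labels: list of 0/1 ground-truth values (1 = snore, an assumed convention).
    scores: list of detector scores, higher meaning "more likely snore".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AP is undefined without positive examples")

    # Rank items from highest to lowest score (ties keep their input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            # Precision among the top `rank` items, taken at this positive.
            precision_sum += hits / rank
    return precision_sum / n_pos


# Toy, made-up example with an imbalanced label set (3 snore frames out of 8).
toy_labels = [0, 1, 1, 0, 0, 1, 0, 0]
toy_scores = [0.10, 0.90, 0.40, 0.60, 0.20, 0.80, 0.30, 0.05]
print(average_precision(toy_labels, toy_scores))
```

Because AP is computed over the ranking of the positive (snore) class only, it remains informative when most frames are non-snore, which may be why it suits an imbalanced dataset such as the one described here.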