Combining a mobile deep neural network and a recurrent layer for violence detection in videos

Sernani P.
2023-01-01

Abstract

In recent years, several techniques for the automatic detection of violent scenes in videos and security footage have appeared, for example with the goal of relieving authorities of the need to analyze hours of Closed-Circuit TeleVision (CCTV) clips. In this regard, Deep Learning-based techniques such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) have emerged as effective for violence detection. Nevertheless, most such techniques require significant computational and memory resources to run. Thus, we propose combining an established CNN, MobileNetV2, designed for use in mobile and embedded devices, with a recurrent layer to extract the spatio-temporal features of security videos. A lightweight model can run on embedded devices in an edge computing fashion, for example to process the videos near the camera recording them and thus preserve privacy. Specifically, we exploit transfer learning by using a pre-trained version of MobileNetV2, and we propose two different models that combine it with a Bidirectional Long Short-Term Memory (Bi-LSTM) and with a Convolutional LSTM (ConvLSTM), respectively. The paper presents accuracy tests of the two models on the AIRTLab dataset and a comparison with more complex models developed in our previous work, in order to evaluate the drop in accuracy incurred by using a model compatible with limited resources. The network composed of MobileNetV2 and the ConvLSTM achieves 94.1% accuracy, against the 96.1% of a model based on a more complex 3D CNN.
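
To make the reported trade-off concrete, the two accuracy figures in the abstract can be compared with a few lines of Python. This is only an illustrative sketch: the helper name `accuracy_tradeoff` and the derived quantities (relative drop, error-rate ratio) are assumptions introduced here and are not metrics reported by the paper.

```python
# Accuracies reported in the abstract (percent, AIRTLab dataset).
ACC_MOBILENET_CONVLSTM = 94.1  # MobileNetV2 + ConvLSTM
ACC_3D_CNN = 96.1              # model based on a more complex 3D CNN


def accuracy_tradeoff(light_acc, heavy_acc):
    """Compare two accuracies given in percent (hypothetical helper)."""
    # Absolute drop, in percentage points.
    absolute_drop = heavy_acc - light_acc
    # Drop expressed as a percentage of the heavier model's accuracy.
    relative_drop = 100.0 * absolute_drop / heavy_acc
    # How many times larger the lightweight model's error rate is.
    error_ratio = (100.0 - light_acc) / (100.0 - heavy_acc)
    return absolute_drop, relative_drop, error_ratio


absolute_drop, relative_drop, error_ratio = accuracy_tradeoff(
    ACC_MOBILENET_CONVLSTM, ACC_3D_CNN
)
print(f"absolute drop: {absolute_drop:.1f} percentage points")
print(f"relative drop: {relative_drop:.1f}% of the 3D CNN accuracy")
print(f"error-rate ratio: {error_ratio:.2f}x")
```

Under these assumptions, the lightweight model gives up 2.0 percentage points of accuracy, which corresponds to an error rate roughly 1.5 times that of the 3D CNN-based model.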

Use this identifier to cite or link to this document: https://hdl.handle.net/11393/315270