Real-Time Violence Detection in Video Footage Using a Mobile-Friendly CNN-Based Model

Sernani, Paolo;
2025-01-01

Abstract

Detecting violence in video content, particularly within domestic environments, remains both a social and a technological challenge. This paper proposes a lightweight deep learning framework for real-time violence detection, optimized for mobile and edge deployment. The approach is based on MoViNet-A0, evaluated in both Base and Stream configurations, and is complemented by a custom Conv2D-based baseline designed for ultra-low-latency inference. All models were trained and validated on the AIRTLab dataset, which includes 350 annotated videos of violent and non-violent scenes. The MoViNet-A0 Base model achieved a validation accuracy of 92.8%, while the Conv2D-based model reached 89.6% validation accuracy, with precision and F1-score close to 90%. Performance benchmarks on Android devices and desktop platforms show that real-time inference is feasible, with latencies as low as 0.9 seconds per 10-frame sequence on mid-range smartphones. The entire pipeline is designed for mobile deployment, and its integration into a functional prototype application is in progress, with the aim of running real-time violence detection directly on mobile devices.
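The per-sequence latency figure quoted in the abstract can be illustrated with a minimal timing sketch. It is not the authors' benchmark code. The overlapping sliding window, the warm-up policy, and the names `sliding_windows`, `dummy_classifier`, and `median_latency` are all assumptions introduced here for illustration, and the classifier is a placeholder rather than MoViNet-A0 or the Conv2D baseline.

```python
import time
from collections import deque
from statistics import median

SEQ_LEN = 10  # frames per sequence, matching the 10-frame sequences in the abstract


def sliding_windows(frames, seq_len=SEQ_LEN):
    """Yield each run of seq_len consecutive frames as a list.

    Overlapping windows are an assumption; the abstract does not say
    how sequences are cut from the video stream.
    """
    window = deque(maxlen=seq_len)
    for frame in frames:
        window.append(frame)
        if len(window) == seq_len:
            yield list(window)


def dummy_classifier(sequence):
    """Hypothetical stand-in for a trained model: returns a fake score."""
    return sum(sequence) / len(sequence)


def median_latency(classify, sequences, warmup=2):
    """Median wall-clock seconds per sequence, skipping the first warm-up calls.

    Requires more sequences than warm-up calls; otherwise no timings remain.
    """
    timings = []
    for i, seq in enumerate(sequences):
        start = time.perf_counter()
        classify(seq)
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed)
    return median(timings)


if __name__ == "__main__":
    # Integers stand in for decoded video frames.
    windows = list(sliding_windows(range(30)))
    print(f"{len(windows)} sequences, median latency "
          f"{median_latency(dummy_classifier, windows):.6f} s")
```

With a real model, `dummy_classifier` would be replaced by the model's inference call. Reporting the median rather than the mean keeps a few slow outlier calls from dominating the result.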
Files in this record:

Halilaj_Indice_2025.pdf
Access: authorized users only
Type: Other attached material (e.g., cover, table of contents, supplementary material, abstract, patents, spin-offs, start-ups, etc.)
License: Publisher's copyright
Size: 254.88 kB
Format: Adobe PDF

Halilaj_RealTimeViolenceDetection_2025.pdf
Access: authorized users only
Type: Post-print document (version following peer review and accepted for publication)
License: Publisher's copyright
Size: 586.22 kB
Format: Adobe PDF

Halilaj_Frontespizio_2025.pdf
Access: authorized users only
Type: Other attached material (e.g., cover, table of contents, supplementary material, abstract, patents, spin-offs, start-ups, etc.)
License: Publisher's copyright
Size: 1.81 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11393/374110
Warning: the data shown have not been validated by the university.
