UniMC - Pubblicazioni Aperte Digitali

Convolutional Networks for Semantic Heads Segmentation using Top-View Depth Data in Crowded Environment

Liciotti, D; Paolanti, M; Pietrini, R; Frontoni, E; Zingaretti, P

2018-01-01

Abstract

Detecting and tracking people in persistently crowded environments (e.g. retail stores, airports, stations) is a challenging task for human behaviour analysis and security purposes. This paper introduces an approach to detect and track people under heavy occlusion, based on convolutional neural networks (CNNs) for semantic segmentation of top-view depth data. Its core is a novel U-Net architecture, U-Net3, which differs from previous variants at the end of each layer: batch normalization is added after the first ReLU activation and after each max-pooling and up-sampling operation. The approach was applied and tested on a new, publicly available dataset, the TVHeads Dataset, consisting of depth images of people recorded by an RGB-D camera installed in a top-view configuration. Our variant outperforms baseline architectures while remaining computationally efficient at inference time. Results show high accuracy, demonstrating the effectiveness and suitability of our approach.
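As a reading aid, the layer ordering described in the abstract can be sketched as plain lists of layer names. This is only an illustration under stated assumptions, not the authors' implementation: the two 3x3 convolutions per block follow the standard U-Net layout, the placement of batch normalization after the first ReLU in the decoder is one possible reading of the abstract, skip connections are omitted, and the function names `encoder_block` and `decoder_block` are hypothetical.

```python
def encoder_block(extra_bn=True):
    """Layer order of one down-sampling block, as a list of layer names.

    Assumption: two 3x3 convolutions per block, as in the standard U-Net.
    With extra_bn=True, batch normalization is placed as the abstract
    describes: after the first ReLU and after max-pooling.
    """
    layers = ["conv3x3", "relu"]
    if extra_bn:
        layers.append("batch_norm")  # after the first ReLU
    layers += ["conv3x3", "relu", "max_pool"]
    if extra_bn:
        layers.append("batch_norm")  # after max-pooling
    return layers


def decoder_block(extra_bn=True):
    """Layer order of one up-sampling block (skip connection omitted).

    Assumption: the "after the first ReLU" rule also applies here.
    """
    layers = ["up_sample"]
    if extra_bn:
        layers.append("batch_norm")  # after up-sampling
    layers += ["conv3x3", "relu"]
    if extra_bn:
        layers.append("batch_norm")  # after the first ReLU
    layers += ["conv3x3", "relu"]
    return layers


if __name__ == "__main__":
    print("plain U-Net encoder block:", encoder_block(extra_bn=False))
    print("modified encoder block:   ", encoder_block())
    print("modified decoder block:   ", decoder_block())
```

Setting `extra_bn=False` gives the plain block for comparison, so the only difference between the two variants in this sketch is the added normalization steps.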

Year of publication

2018

Appears in types:

04.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
ICPR2018.pdf (authorized users only; type: licence (publishing contract); licence: all rights reserved)	215.85 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11393/291321

Warning: the data displayed have not been validated by the university.
