Violence Analysis Through Deep Learning: An Approach using Virtual Environments
Thesis
Authors | Nadeem, M. |
---|---|
Abstract | In the domain of multimedia content analysis, the precise extraction of valuable information from digital media holds paramount importance. Violence analysis is one such task, with numerous real-world applications. However, the inherently subjective nature of this task poses formidable challenges, as it entails detecting and localising acts of violence across diverse contexts, ranging from content filtering to video surveillance. Existing methods often struggle due to the lack of comprehensive datasets. Public video platforms restrict violent content uploads, impeding the development of real-time violence analysis. Additionally, ethical concerns have hampered progress in violence analysis compared to other computer vision tasks. To address this, and inspired by the use of synthetic data in autonomous driving research, we generate synthetic videos in GTA-V that contain weapons, blood, and combat elements. We automate labelling using object detection, verify the labels against a human-labelled test set, and propose a novel deep learning approach that replaces the multi-head attention layer with a 1D convolution layer for violence classification. Our contributions can be summarised as follows: Firstly, we introduce the Weapon Violence Dataset (WVD), a synthetic dataset designed explicitly for violence analysis. To the best of our knowledge, this dataset is the only one of its kind and is publicly available. Secondly, we devise a novel deep learning-based technique for generating bounding box coordinates across the entire WVD. Specifically, the IoU values achieved for stages 1 and 2 on the synthetic data are 0.8036 and 0.9500, respectively, while on a real-world dataset the IoU reaches 0.7450 without any retraining. Thirdly, we conduct extensive experiments to evaluate the effectiveness of the synthetic data corpus in training convolutional LSTM models. 
The results indicate a marked improvement on established real-world benchmark datasets, with accuracy reaching 100% on Peliculas (a 12% improvement), 80% on Violent Flow, 97% on Hockey (a 10.84% improvement), and 75% on the Surveillance Camera Fight Dataset (SCFD) (a 3% improvement). Finally, we propose a new model that replaces the multi-head attention layer of an attention-based convolutional LSTM with a 1D convolution layer. Our empirical results demonstrate that this model performs on par with, and in some cases outperforms, existing models while requiring fewer trainable parameters (a reduction by a factor of 2.74) and exhibiting reduced training and testing times. |
Year | 2023 |
Publisher | College of Science and Engineering, University of Derby |
Digital Object Identifier (DOI) | https://doi.org/10.48773/q3290 |
File access level | Controlled |
Publication process dates | |
Deposited | 21 Dec 2023 |
https://repository.derby.ac.uk/item/q3290/violence-analysis-through-deep-learning-an-approach-using-virtual-environments
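
The abstract reports intersection-over-union (IoU) scores for the automatically generated bounding boxes. For readers unfamiliar with the metric, the following is a minimal sketch of how IoU is conventionally computed for two axis-aligned boxes. The `(x_min, y_min, x_max, y_max)` box format and the function name `iou` are assumptions made for illustration and are not taken from the thesis.

```python
from typing import Tuple

# Assumed box convention (not from the thesis): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Return the intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region; clamp to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) inputs.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(f"IoU = {iou(predicted, ground_truth):.4f}")
```

An IoU of 1.0 means the predicted box matches the reference box exactly, and 0.0 means the two do not overlap, so the reported values of 0.8036, 0.9500, and 0.7450 can be read on that scale.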