Violence Analysis Through Deep Learning: An Approach using Virtual Environments


Nadeem, M. 2023. Violence Analysis Through Deep Learning: An Approach using Virtual Environments. Thesis
Authors: Nadeem, M.

In multimedia content analysis, accurately extracting useful information from digital media is of central importance. Violence analysis is one such task, with numerous real-world applications ranging from content filtering to video surveillance. The task is inherently subjective, however, which makes detecting and localising acts of violence across diverse contexts difficult. Existing methods often struggle because comprehensive datasets are lacking: public video platforms restrict uploads of violent content, which impedes the development of real-time violence analysis, and ethical concerns have slowed progress relative to other computer vision tasks. To address this, we use synthetic data, an approach inspired by work in autonomous driving, generated in GTA-V to create videos containing weapons, blood, and combat elements. We automate labelling using object detection, verify it against a human-labelled test set, and propose a novel deep learning approach that replaces the multi-head attention layer with a 1D convolution layer for violence classification. Our contributions can be summarised as follows. Firstly, we introduce the Weapon Violence Dataset (WVD), a synthetic dataset designed explicitly for violence analysis. To the best of our knowledge, this dataset is the only one of its kind and is publicly available. Secondly, we devise a novel deep learning-based technique for generating bounding box coordinates across the entire WVD. The IoU values achieved for stages 1 and 2 on the synthetic data are 0.8036 and 0.9500 respectively, while on a real-world dataset the IoU reaches 0.7450 without any retraining. Thirdly, we conduct extensive experiments to evaluate the effectiveness of the synthetic data corpus in training convolutional LSTM models.
The results show marked improvements on established real-world benchmark datasets, with accuracy reaching 100% on Peliculas (a 12% improvement), 80% on Violent Flow, 97% on Hockey (a 10.84% improvement), and 75% on the Surveillance Camera Fight Dataset (SCFD) (a 3% improvement). Finally, we propose a new model that replaces the multi-head attention layer in the attention-based convolutional LSTM with a 1D convolution layer. Our empirical results demonstrate that this model performs on par with, and in some cases outperforms, existing models while requiring fewer trainable parameters (a reduction by a factor of 2.74) and shorter training and testing times.
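For readers unfamiliar with the IoU (intersection over union) figures quoted in the abstract, the metric can be sketched as follows for two axis-aligned bounding boxes. This is a minimal illustration of the standard definition only; the box format `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions made for this example and are not taken from the thesis.

```python
from typing import Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (hypothetical helper)."""
    # Corners of the overlapping rectangle, if any
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate boxes with zero total area are treated as no overlap
    return inter / union if union > 0 else 0.0
```

An IoU of 1.0 means the predicted and reference boxes coincide exactly, and 0.0 means they do not overlap at all, so values such as 0.95 indicate a near-complete match.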

Publisher: College of Science and Engineering, University of Derby
Publication process dates
Deposited: 21 Dec 2023

Related outputs

Nadeem, M., Kurugollu, F., Saravi, S., Atlam, H. and Franqueira, V. 2023. Deep labeller: automatic bounding box generation for synthetic violence detection datasets. Multimedia Tools and Applications. pp. 1-18.