Two-Stage Video Violence Detection Framework Using GMFlow and CBAM-Enhanced ResNet3D

Research Authors
Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Hyun-Soo Kang
Research Journal
Mathematics
Research Publisher
MDPI
Research Vol
13
Research Website
https://doi.org/10.3390/math13081226
Research Year
2025
Research Pages
20
Research Abstract

Video violence detection has gained significant attention in recent years due to its applications in surveillance and security. This paper proposes a two-stage framework for detecting violent actions in video sequences. The first stage uses GMFlow, a pre-trained optical flow network, to estimate the motion between consecutive frames, encoding the motion dynamics of the scene. In the second stage, the resulting optical flow images are combined with the RGB frames and fed into a CBAM-enhanced ResNet3D network to capture complementary spatiotemporal features. The attention mechanism provided by CBAM enables the network to focus on the most relevant regions in the frames, improving the detection of violent actions. We evaluate the proposed framework on three widely used datasets: Hockey Fight, Crowd Violence, and UBI-Fight. Our experimental results demonstrate superior performance compared to several state-of-the-art methods, achieving an AUC of 0.963 on UBI-Fight and accuracies of 97.5% and 94.0% on Hockey Fight and Crowd Violence, respectively. By combining GMFlow-generated optical flow with a deep 3D convolutional network, the proposed approach provides robust and efficient detection of violence in videos.
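
The abstract reports results as accuracy (Hockey Fight, Crowd Violence) and AUC (UBI-Fight). As a rough illustration of how these two metrics are typically computed from per-clip violence scores, the following minimal sketch may help. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of clips whose thresholded score matches the label.

    The 0.5 threshold is an assumption for illustration only.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen violent clip (label 1)
    receives a higher score than a randomly chosen non-violent clip
    (label 0), counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative clip")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = violent clip, 0 = non-violent clip.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC      = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which may be one reason a threshold-free metric is reported for a dataset such as UBI-Fight.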