TY - GEN
T1 - Extended Abstract
T2 - 35th IEEE Intelligent Vehicles Symposium, IV 2024
AU - Karim, Muhammad Monjurul
AU - Yin, Zhaozheng
AU - Qin, Ruwen
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Detecting dangerous traffic agents in videos captured by a dashboard camera (dashcam) mounted on a vehicle is essential for safe navigation in complex driving environments. Crash-related videos are corner cases in driving-related big data, and pre-crash processes are transient and complex. Moreover, risky and non-risky traffic agents can look similar. These factors make localizing risky traffic agents in driving videos particularly challenging. To address these challenges, this paper proposes an attention-guided multistream feature fusion network (AM-Net) that localizes dangerous traffic agents in dashcam videos ahead of potential accidents. Two Gated Recurrent Unit (GRU) networks use object bounding box and optical flow features extracted from consecutive video frames to capture spatio-temporal cues that distinguish risky traffic agents. An attention module, coupled with the GRUs, learns to identify traffic agents relevant to a crash. By fusing the two streams of global and object-level features, AM-Net predicts a riskiness score for each traffic agent in the video. This paper also introduces a new benchmark dataset, Risky Object Localization (ROL), which contains spatial, temporal, and categorical annotations of crash-, object-, and scene-level attributes. AM-Net achieves 85.59% AUC on the ROL dataset and outperforms the current state of the art in video anomaly detection by 3.5% AUC on the public DoTA dataset. An ablation study further assesses the contribution of each of AM-Net's functional components.
KW - advanced driving assistance systems
KW - attention
KW - crash early prediction
KW - deep learning
KW - multi-modal
KW - risky object localization
UR - https://www.scopus.com/pages/publications/85199805005
U2 - 10.1109/IV55156.2024.10588532
DO - 10.1109/IV55156.2024.10588532
M3 - Conference contribution
AN - SCOPUS:85199805005
T3 - IEEE Intelligent Vehicles Symposium, Proceedings
SP - 3150
BT - 35th IEEE Intelligent Vehicles Symposium, IV 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 2 June 2024 through 5 June 2024
ER -