TY - GEN
T1 - FALCON
T2 - Workshops that were held in conjunction with the 18th European Conference on Computer Vision, ECCV 2024
AU - Wang, Zuhui
AU - Sajeev, Sandra
AU - Mittal, Gaurav
AU - Hall, Matthew
AU - Yu, Ye
AU - Yin, Zhaozheng
AU - Chen, Mei
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - Content moderation is the task of filtering inappropriate content (e.g., rude, hateful, or toxic posts) on online platforms. Deep learning models have been developed to address this task; however, they tend to make unfair decisions for underrepresented groups such as racial minorities. Most popular methods for improving fairness focus only on single-group and single-class bias, while multi-group and multi-class biases are prevalent and challenging in content moderation. In this paper, we present a novel framework, Fair Active Learning for CONtent moderation (FALCON), that helps mitigate multi-group and multi-class biases simultaneously while maintaining performance. We present a novel group-aware sample selection algorithm to actively select a subset of the entire dataset for training, and novel augmented uncertainty information that improves the query sample selection strategy by considering group fairness levels. We validate FALCON using multiple fairness evaluation metrics on three public datasets, including the Jigsaw Unintended Bias dataset. Our results show that FALCON maintains performance comparable to several bias mitigation methods while obtaining higher group fairness across multiple axes and datasets, as measured by an average 22.5% improvement in demographic parity difference and an average 8.4% improvement in equalized odds. Experiments on the Amazon Review dataset demonstrate the general applicability of FALCON beyond content moderation datasets. Warning: some content in this paper may be harmful, racist, and inappropriate.
KW - Active Learning
KW - Content Moderation
KW - Fairness
UR - https://www.scopus.com/pages/publications/105007790029
U2 - 10.1007/978-3-031-92648-8_1
DO - 10.1007/978-3-031-92648-8_1
M3 - Conference contribution
AN - SCOPUS:105007790029
SN - 9783031926471
T3 - Lecture Notes in Computer Science
SP - 1
EP - 17
BT - Computer Vision – ECCV 2024 Workshops, Proceedings
A2 - Del Bue, Alessio
A2 - Canton, Cristian
A2 - Pont-Tuset, Jordi
A2 - Tommasi, Tatiana
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 29 September 2024 through 4 October 2024
ER -