TY - GEN
T1 - Residualized Similarity for Faithfully Explainable Authorship Verification
AU - Zeng, Peter
AU - Alipoormolabashi, Pegah
AU - Mun, Jihu
AU - Dey, Gourab
AU - Soni, Nikita
AU - Balasubramanian, Niranjan
AU - Rambow, Owen
AU - Schwartz, H. Andrew
N1 - Publisher Copyright: ©2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - Responsible use of authorship verification (AV) systems requires not only high accuracy but also interpretable solutions. Specifically, for systems to be deployed in contexts where decisions have real-world consequences, their predictions must be explainable through interpretable features that can be traced to the original text. Neural methods achieve high accuracies, but their representations lack direct interpretability. Furthermore, LLM predictions cannot be explained faithfully – if there is an explanation given for a prediction, it doesn’t represent the reasoning process behind the model’s prediction. To address this gap, we introduce residualized similarity (RS), a novel method that supplements systems using interpretable features with a neural network to improve their performance while maintaining interpretability. Authorship verification is fundamentally a similarity task, where the goal is to measure how likely two documents are to be written by the same author. The key idea is to use a neural network to predict a residual similarity, i.e., the error in the similarity predicted by the interpretable system. Our evaluation across four datasets shows that not only can we match the performance of state-of-the-art authorship verification models, but we can also show how and to what degree the final prediction is faithful and interpretable.
AB - Responsible use of authorship verification (AV) systems requires not only high accuracy but also interpretable solutions. Specifically, for systems to be deployed in contexts where decisions have real-world consequences, their predictions must be explainable through interpretable features that can be traced to the original text. Neural methods achieve high accuracies, but their representations lack direct interpretability. Furthermore, LLM predictions cannot be explained faithfully – if there is an explanation given for a prediction, it doesn’t represent the reasoning process behind the model’s prediction. To address this gap, we introduce residualized similarity (RS), a novel method that supplements systems using interpretable features with a neural network to improve their performance while maintaining interpretability. Authorship verification is fundamentally a similarity task, where the goal is to measure how likely two documents are to be written by the same author. The key idea is to use a neural network to predict a residual similarity, i.e., the error in the similarity predicted by the interpretable system. Our evaluation across four datasets shows that not only can we match the performance of state-of-the-art authorship verification models, but we can also show how and to what degree the final prediction is faithful and interpretable.
UR - https://www.scopus.com/pages/publications/105028963773
U2 - 10.18653/v1/2025.findings-emnlp.856
DO - 10.18653/v1/2025.findings-emnlp.856
M3 - Conference contribution
AN - SCOPUS:105028963773
T3 - EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
SP - 15824
EP - 15837
BT - EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
A2 - Christodoulopoulos, Christos
A2 - Chakraborty, Tanmoy
A2 - Rose, Carolyn
A2 - Peng, Violet
PB - Association for Computational Linguistics (ACL)
T2 - 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Y2 - 4 November 2025 through 9 November 2025
ER -