
Leveraging Registers in Vision Transformers for Robust Adaptation

  • Srikar Yellapragada
  • Kowshik Thopalli
  • Vivek Narayanaswamy
  • Wesam Sakla
  • Yang Liu
  • Yamen Mubarka
  • Dimitris Samaras
  • Jayaraman J. Thiagarajan
  • Stony Brook University
  • Lawrence Livermore National Laboratory

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Vision Transformers (ViTs) have shown success across a variety of tasks due to their ability to capture global image representations. Recent studies have identified the existence of high-norm tokens in ViTs, which can interfere with unsupervised object discovery. To address this, the use of "registers", additional tokens that isolate high-norm patch tokens while capturing global image-level information, has been proposed. While registers have been studied extensively for object discovery, their generalization properties, particularly in out-of-distribution (OOD) scenarios, remain underexplored. In this paper, we examine the utility of register token embeddings in providing additional features for improving generalization and anomaly rejection. To that end, we propose a simple method that combines the special CLS token embedding commonly employed in ViTs with the average-pooled register embeddings to create feature representations, which are subsequently used to train a downstream classifier. We find that this enhances OOD generalization and anomaly rejection while maintaining in-distribution (ID) performance. Extensive experiments across multiple ViT backbones trained with and without registers reveal consistent improvements of 2-4% in top-1 OOD accuracy and a 2-3% reduction in false positive rates for anomaly detection. Importantly, these gains are achieved without additional computational overhead.
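The abstract describes the feature construction only at a high level. The NumPy sketch below illustrates one possible reading of it and is not the authors' implementation. It assumes (i) the backbone returns tokens ordered as CLS, then registers, then patches; (ii) "combines" means concatenating the CLS embedding with the mean of the register embeddings; and (iii) the downstream classifier is a linear softmax probe. The maximum-softmax-probability anomaly score and the FPR-at-95%-TPR metric are common choices in the OOD literature, used here only for illustration. All function names are hypothetical.

```python
import numpy as np


def register_features(tokens: np.ndarray, num_registers: int) -> np.ndarray:
    """Concatenate the CLS embedding with the mean of the register embeddings.

    Assumption: tokens has shape (batch, 1 + num_registers + num_patches, dim),
    ordered as [CLS, registers..., patches...]; verify this for your backbone.
    """
    cls = tokens[:, 0, :]                                  # (B, D) CLS token
    regs = tokens[:, 1:1 + num_registers, :].mean(axis=1)  # (B, D) average-pooled registers
    return np.concatenate([cls, regs], axis=1)             # (B, 2D); concatenation is an assumption


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_linear_probe(x, y, num_classes, lr=0.1, epochs=300):
    """Softmax (multinomial logistic) regression fit by full-batch gradient descent."""
    n, d = x.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(epochs):
        grad = (softmax(x @ w + b) - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def fpr_at_95_tpr(id_scores, ood_scores):
    """Share of OOD samples accepted at the threshold that accepts 95% of ID samples.

    Scores follow "higher = more in-distribution", e.g. maximum softmax probability.
    """
    threshold = np.percentile(id_scores, 5)
    return float(np.mean(np.asarray(ood_scores) >= threshold))


# Toy demo with random "ViT outputs": 4 registers, 16 patches, 8-dim embeddings.
rng = np.random.default_rng(0)
B, R, P, D = 64, 4, 16, 8
tokens = rng.normal(size=(B, 1 + R + P, D))
labels = rng.integers(0, 2, size=B)
tokens[:, 0, 0] += np.where(labels == 1, 3.0, -3.0)  # make the class visible in the CLS token

feats = register_features(tokens, num_registers=R)
w, b = train_linear_probe(feats, labels, num_classes=2)
probs = softmax(feats @ w + b)
train_acc = float(np.mean(probs.argmax(axis=1) == labels))
msp = probs.max(axis=1)  # anomaly score: a low maximum probability suggests an OOD input
```

Because the register tokens are already produced by the backbone's forward pass, pooling and concatenating them adds only a mean and a concatenation per image, which is consistent with the abstract's claim of no additional computational overhead. Concatenation doubles the probe's input dimension; summing or averaging the two embeddings would be an alternative reading of "combines".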

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368741
State: Published - 2025
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: Apr 6 2025 to Apr 11 2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 04/6/25 to 04/11/25

Keywords

  • Anomaly Rejection
  • OOD Generalization
  • Registers
  • Robustness
  • Vision Transformer
