Skip to main navigation Skip to search Skip to main content

SWAT: Spatial Structure Within and Among Tokens

  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

3 Scopus citations

Abstract

Modeling visual data as tokens (i.e., image patches) using attention mechanisms, feed-forward networks or convolutions has been highly effective in recent years. Such methods usually have a common pipeline: a tokenization method, followed by a set of layers/blocks for information mixing, both within and among tokens. When image patches are converted into tokens, they are often flattened, discarding the spatial structure within each patch. As a result, any processing that follows (eg: multi-head self-attention) may fail to recover and/or benefit from such information. In this paper, we argue that models can have significant gains when spatial structure is preserved during tokenization, and is explicitly used during the mixing stage. We propose two key contributions: (1) Structure-aware Tokenization and, (2) Structure-aware Mixing, both of which can be combined with existing models with minimal effort. We introduce a family of models (SWAT), showing improvements over the likes of DeiT, MLP-Mixer and Swin Transformer, across multiple benchmarks including ImageNet classification and ADE20K segmentation. Our code is available at github.com/kkahatapitiya/SWAT.

Original languageEnglish
Title of host publicationProceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
EditorsEdith Elkind
PublisherInternational Joint Conferences on Artificial Intelligence
Pages956-964
Number of pages9
ISBN (Electronic)9781956792034
DOIs
StatePublished - 2023
Event32nd International Joint Conference on Artificial Intelligence, IJCAI 2023 - Macao, China
Duration: Aug 19 2023Aug 25 2023

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
Volume2023-August
ISSN (Print)1045-0823

Conference

Conference32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Country/TerritoryChina
CityMacao
Period08/19/2308/25/23

Fingerprint

Dive into the research topics of 'SWAT: Spatial Structure Within and Among Tokens'. Together they form a unique fingerprint.

Cite this