REMASKER: IMPUTING TABULAR DATA WITH MASKED AUTOENCODING

  • Zhejiang University
  • Meta

Research output: Contribution to conference, Paper, peer-review

17 Scopus citations

Abstract

We present REMASKER, a new method of imputing missing values in tabular data by extending the masked autoencoding framework. Compared with prior work, REMASKER is both simple and effective. It is simple in that, besides the missing values (i.e., the naturally masked ones), we randomly “re-mask” another set of values, optimize the autoencoder by reconstructing this re-masked set, and apply the trained model to predict the missing values. It is effective in that, with extensive evaluation on benchmark datasets, we show that REMASKER performs on par with or outperforms state-of-the-art methods in terms of both imputation fidelity and utility under various missingness settings, while its performance advantage often increases with the ratio of missing data. We further explore theoretical justification for its effectiveness, showing that REMASKER tends to learn missingness-invariant representations of tabular data. Our findings indicate that masked modeling represents a promising direction for further research on tabular data imputation.
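
The three steps named in the abstract (re-mask observed values, train by reconstructing the re-masked set, then predict the naturally missing values) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not describe the network, so a tiny one-hidden-layer autoencoder trained with plain gradient descent stands in for it, and the names `remask`, `encode_input`, `fit_and_impute`, and `remask_ratio`, as well as all hyperparameter values, are hypothetical.

```python
import numpy as np


def remask(observed, ratio, rng):
    """Hide a random fraction of the observed cells (the "re-masked" set)."""
    return observed & (rng.random(observed.shape) < ratio)


def encode_input(Z, visible):
    """Zero out hidden cells and append the visibility mask as extra features."""
    return np.concatenate([Z * visible, visible.astype(float)], axis=1)


def fit_and_impute(X, remask_ratio=0.5, hidden=16, epochs=300, lr=0.05, seed=0):
    # Assumption: a small tanh autoencoder stands in for the paper's model.
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)                      # False = naturally masked
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0) + 1e-8
    Z = np.where(observed, (X - mu) / sd, 0.0)   # standardized, NaNs zero-filled
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (2 * d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)

    for _ in range(epochs):
        target = remask(observed, remask_ratio, rng)   # cells to reconstruct
        visible = observed & ~target                   # cells the model may see
        inp = encode_input(Z, visible)
        H = np.tanh(inp @ W1 + b1)
        out = H @ W2 + b2
        # Gradient of the mean squared error over the re-masked cells only.
        g_out = 2.0 * (out - Z) * target / max(target.sum(), 1)
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (inp.T @ g_H)
        b1 -= lr * g_H.sum(axis=0)

    # Inference: only the naturally missing cells are hidden.
    out = np.tanh(encode_input(Z, observed) @ W1 + b1) @ W2 + b2
    return np.where(observed, X, out * sd + mu)


# Usage on synthetic data: four correlated columns with about 20% missing cells.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(4)])
X[rng.random(X.shape) < 0.2] = np.nan
X_imputed = fit_and_impute(X)
```

A fresh re-mask is drawn at every training step, so the model sees many different visibility patterns for the same rows. Observed cells are returned unchanged and only the originally missing cells are filled with predictions.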

Original language: English
State: Published - 2024
Event: 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria
Duration: May 7 2024 to May 11 2024

Conference

Conference: 12th International Conference on Learning Representations, ICLR 2024
Country/Territory: Austria
City: Hybrid, Vienna
Period: 05/7/24 to 05/11/24
