A Lightweight Measure of Classification Difficulty from Application Dataset Characteristics

  • Bryan Bo Cao
  • Abhinav Sharma
  • Lawrence O’Gorman
  • Michael Coss
  • Shubham Jain
  • Nokia
  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Although accuracy and computation benchmarks are widely available to help choose among neural network models, these are usually trained on datasets with many classes, and do not give a good idea of performance for few (<10) classes. The conventional procedure to predict performance involves repeated training and testing on the different models and dataset variations. We propose an efficient cosine similarity-based classification difficulty measure S that is calculated from the number of classes and intra- and inter-class similarity metrics of the dataset. After a single stage of training and testing per model family, relative performance for different datasets and models of the same family can be predicted by comparing difficulty measures, without further training and testing. Our proposed method is verified by extensive experiments on 8 CNN and ViT models and 7 datasets. Results show that S is highly correlated with model accuracy, with correlation coefficient |r|=0.796, outperforming the baseline Euclidean distance at |r|=0.66. We show how a practitioner can use this measure to help select an efficient model 6 to 29× faster than through repeated training and testing. We also describe using the measure for an industrial application in which options are identified to select a model 42% smaller than the baseline YOLOv5-nano model, and if class merging from 3 to 2 classes meets requirements, 85% smaller.
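
The abstract names three inputs to the difficulty measure S: the number of classes, intra-class similarity, and inter-class similarity. It does not state how they are combined, so the sketch below stops at computing those inputs and does not attempt S itself. The function names, the toy feature vectors, and the choice to average cosine similarity over all pairs are assumptions made for illustration, not the authors' procedure.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_similarities(features_by_class):
    """Return (mean intra-class, mean inter-class) cosine similarity.

    features_by_class maps a class label to a list of feature vectors.
    Averaging over all pairs is an assumption of this sketch.
    """
    intra, inter = [], []
    # Intra-class: every pair of samples that share a label.
    for vecs in features_by_class.values():
        intra.extend(cosine(u, v) for u, v in combinations(vecs, 2))
    # Inter-class: every pair of samples drawn from two different labels.
    for a, b in combinations(features_by_class, 2):
        inter.extend(
            cosine(u, v)
            for u in features_by_class[a]
            for v in features_by_class[b]
        )
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical two-class toy data: identical vectors within a class,
# orthogonal vectors across classes.
toy = {
    "cat": [[1.0, 0.0], [1.0, 0.0]],
    "dog": [[0.0, 1.0], [0.0, 1.0]],
}
intra_sim, inter_sim = class_similarities(toy)
num_classes = len(toy)
```

In this toy case the classes are perfectly separated: samples within a class point in the same direction and samples across classes are orthogonal. Intuitively, a dataset like this should be easy to classify, while one whose inter-class similarity approaches its intra-class similarity should be harder.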

Original language: English
Title of host publication: Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
Editors: Apostolos Antonacopoulos, Subhasis Chaudhuri, Rama Chellappa, Cheng-Lin Liu, Saumik Bhattacharya, Umapada Pal
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 439-455
Number of pages: 17
ISBN (Print): 9783031781681
State: Published - 2025
Event: 27th International Conference on Pattern Recognition, ICPR 2024 - Kolkata, India
Duration: Dec 1 2024 → Dec 5 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15305 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Pattern Recognition, ICPR 2024
Country/Territory: India
City: Kolkata
Period: 12/1/24 → 12/5/24

Keywords

  • class similarity
  • classification difficulty
  • efficient models
  • image classification
  • neural network selection
