TY - GEN
T1 - PropTest
T2 - 2024 Findings of the Association for Computational Linguistics, EMNLP 2024
AU - Koo, Jaywon
AU - Yang, Ziyan
AU - Cascante-Bonilla, Paola
AU - Ray, Baishakhi
AU - Ordonez, Vicente
N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - Visual Programming has recently emerged as an alternative to end-to-end black-box visual reasoning models. This type of method leverages Large Language Models (LLMs) to generate the source code for an executable computer program that solves a given problem. This strategy has the advantage of offering an interpretable reasoning path and does not require finetuning a model with task-specific data. We propose PropTest, a general strategy that improves visual programming by further using an LLM to generate code that tests for visual properties in an initial round of proposed solutions. Our method generates tests for data-type consistency, output syntax, and semantic properties. PropTest achieves comparable results to state-of-the-art methods while using publicly available LLMs. This is demonstrated across different benchmarks on visual question answering and referring expression comprehension. Particularly, PropTest improves ViperGPT by obtaining 46.1% accuracy (+6.0%) on GQA using Llama3-8B and 59.5% (+8.1%) on RefCOCO+ using CodeLlama-34B.
AB - Visual Programming has recently emerged as an alternative to end-to-end black-box visual reasoning models. This type of method leverages Large Language Models (LLMs) to generate the source code for an executable computer program that solves a given problem. This strategy has the advantage of offering an interpretable reasoning path and does not require finetuning a model with task-specific data. We propose PropTest, a general strategy that improves visual programming by further using an LLM to generate code that tests for visual properties in an initial round of proposed solutions. Our method generates tests for data-type consistency, output syntax, and semantic properties. PropTest achieves comparable results to state-of-the-art methods while using publicly available LLMs. This is demonstrated across different benchmarks on visual question answering and referring expression comprehension. Particularly, PropTest improves ViperGPT by obtaining 46.1% accuracy (+6.0%) on GQA using Llama3-8B and 59.5% (+8.1%) on RefCOCO+ using CodeLlama-34B.
UR - https://www.scopus.com/pages/publications/85210488538
U2 - 10.18653/v1/2024.findings-emnlp.483
DO - 10.18653/v1/2024.findings-emnlp.483
M3 - Conference contribution
AN - SCOPUS:85210488538
T3 - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
SP - 8241
EP - 8256
BT - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
A2 - Al-Onaizan, Yaser
A2 - Bansal, Mohit
A2 - Chen, Yun-Nung
PB - Association for Computational Linguistics (ACL)
Y2 - 12 November 2024 through 16 November 2024
ER -