Abstract
The increasing prominence of scientific discourse on social media platforms presents both unprecedented opportunities for public engagement and significant risks of misinformation. While scientific claims, references to publications, and mentions of research entities proliferate rapidly, current platforms lack robust mechanisms to validate their veracity or trace implicit sources. Manual identification and sourcing of such content is impractical at scale, and although computational methods exist for generic fact-checking or citation retrieval, they often fail to address the unique challenges of noisy, abbreviated social media language – particularly the detection of nuanced scientific discourse and the retrieval of publications from implicit, non-URL references. In this paper, we propose a unified framework tackling two critical tasks: (1) detection of scientific web discourse, where we identify tweets containing scientific claims, references or research entities, using a combination of natural language augmentation and supervised learning; and (2) source retrieval for scientific claims, employing a two-stage dense retrieval and re-ranking pipeline to link implicit mentions of sources to their actual publications from candidate pools. Our multi-stage architecture first filters and classifies scientific content, then prioritizes and resolves latent citations. Evaluations on a curated dataset provided by the CLEF-2025 CheckThat! Lab demonstrate the effectiveness of our approach, achieving significant improvements across both tasks. This work provides essential tools for automating scientific credibility assessment and aiding the verification of scientific information in online ecosystems.
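As a rough illustration of the two-stage retrieve-then-re-rank idea described in the abstract, the sketch below first narrows a candidate pool with a cheap vector similarity and then re-orders the shortlist with a pairwise scorer. It is not the authors' implementation. The term-frequency `embed` function stands in for a transformer bi-encoder, the Jaccard-based `pair_score` stands in for a cross-encoder, and all function names, the toy `PAPERS` pool, and the example tweet are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Assumption: a sparse term-frequency vector stands in for a bi-encoder embedding."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def paper_text(paper):
    """Concatenate the fields used to represent a candidate publication."""
    return paper["title"] + " " + paper["abstract"]


def retrieve(query, papers, k):
    """Stage 1: score every candidate independently and keep the top k."""
    q = embed(query)
    scored = [(cosine(q, embed(paper_text(p))), p) for p in papers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def pair_score(query, paper):
    """Assumption: Jaccard token overlap stands in for a cross-encoder relevance score."""
    q = set(tokenize(query))
    d = set(tokenize(paper_text(paper)))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rerank(query, candidates):
    """Stage 2: re-order the shortlist by scoring each (query, paper) pair jointly."""
    return sorted(candidates, key=lambda p: pair_score(query, p), reverse=True)


def link_tweet(query, papers, k=2):
    """Run both stages and return the re-ranked shortlist."""
    return rerank(query, retrieve(query, papers, k))


# Hypothetical candidate pool and tweet, for illustration only.
PAPERS = [
    {"id": "p1", "title": "Coffee consumption and cardiovascular risk",
     "abstract": "A cohort study of coffee intake and heart disease outcomes."},
    {"id": "p2", "title": "Sleep duration and memory consolidation",
     "abstract": "Experiments on sleep and memory in adults."},
    {"id": "p3", "title": "Transformer models for text retrieval",
     "abstract": "Dense retrieval with bi-encoders."},
]
tweet = "New study says drinking coffee lowers heart disease risk"
ranked = link_tweet(tweet, PAPERS, k=2)
print([p["id"] for p in ranked])
```

In a real system the first stage would typically use precomputed dense embeddings so the whole pool can be searched quickly, while the more expensive pairwise model is applied only to the shortlist.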
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 1256-1264 |
| Number of pages | 9 |
| Journal | CEUR Workshop Proceedings |
| Volume | 4038 |
| State | Published - 2025 |
| Event | 26th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2025, Madrid, Spain |
| Event duration | Sep 9 2025 → Sep 12 2025 |
Keywords
- Bi-encoder
- Cross-encoder
- Data Augmentation
- Dense Retrieval
- Large Language Model
- Re-ranking
- Transformer
Publication title: 'SCIRE at CheckThat! 2025: Bridging Social Media, Scientific Discourse, and Scientific Literature'