Abstract
In natural language generation, language models, particularly those based on decoder-only architectures as used in popular Large Language Models (LLMs), have demonstrated impressive performance across a wide range of tasks. However, encoder-decoder architectures remain highly effective for tasks involving non-text data, such as images and time-series data. The decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information only from the last encoder layer, this can leave the encoder layer stack insufficiently trained due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding for improved encoder-decoder language models: for each decoder layer, the representations from the last encoder layer, which serve as a global view, are supplemented with those from other encoder layers to provide a stereoscopic view of the source inputs. Systematic experiments and analyses show that our approach successfully addresses the hierarchy bypassing problem, requires an almost negligible increase in parameters, and improves the performance of sequence learning with deep representations on diverse tasks, namely machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. In particular, our approach achieves new state-of-the-art results on benchmark datasets, including a low-resource machine translation dataset and low-resource medical report generation datasets.
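To make the decoding scheme described in the abstract more concrete, the following is a minimal pure-Python sketch of one decoder layer's cross-attention when it draws on two views of the source: the last encoder layer (global view) and one other encoder layer (supplementary view). It is not the authors' implementation. The pairing of decoder layer i with encoder layer i, the plain averaging used to fuse the two views, and the omission of learned query/key/value projections are all assumptions made for illustration. The function names `attention` and `multi_view_cross_attention` are likewise hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = hidden dimensions


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, memory: Matrix) -> Matrix:
    """Scaled dot-product attention where keys and values are both `memory`.

    Learned projections are omitted for brevity (an assumption of this sketch).
    """
    d = len(memory[0])
    out: Matrix = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in memory]
        weights = softmax(scores)
        # Each output row is a convex combination of the memory rows.
        out.append([sum(w * v[j] for w, v in zip(weights, memory)) for j in range(d)])
    return out


def multi_view_cross_attention(decoder_states: Matrix,
                               encoder_layers: List[Matrix],
                               decoder_layer_idx: int) -> Matrix:
    """Cross-attention for one decoder layer using two views of the source.

    Global view: the last encoder layer, as in standard encoder-decoder models.
    Supplementary view: another encoder layer. Pairing decoder layer i with
    encoder layer i (clamped to the stack depth) and fusing by averaging are
    hypothetical choices made for illustration.
    """
    global_view = attention(decoder_states, encoder_layers[-1])
    local_idx = min(decoder_layer_idx, len(encoder_layers) - 1)
    local_view = attention(decoder_states, encoder_layers[local_idx])
    return [[(g + s) / 2 for g, s in zip(g_row, s_row)]
            for g_row, s_row in zip(global_view, local_view)]


if __name__ == "__main__":
    # Toy encoder stack: 3 layers, each with 2 source tokens of dimension 2.
    encoder_layers = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [1.0, 0.0]],
        [[0.0, 2.0], [2.0, 0.0]],
    ]
    decoder_states = [[1.0, 0.0]]  # one target position
    for layer in range(3):
        print(layer, multi_view_cross_attention(decoder_states, encoder_layers, layer))
```

In this sketch the supplementary view reuses the same parameter-free attention function, so it adds no weights at all; in a real model, the parameter cost depends on how each view is projected and fused. Because gradients from every decoder layer would then reach intermediate encoder layers directly, a scheme of this shape is one plausible way to counter the hierarchy bypassing problem, though the paper's exact mechanism may differ.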
| Original language | English |
|---|---|
| Article number | 106 |
| Journal | ACM Transactions on Knowledge Discovery from Data |
| Volume | 19 |
| Issue number | 5 |
| State | Published - Jun 6 2025 |
Keywords
- Attention Mechanism
- Deep Representations
- Large Language Models
- Natural Language Generation
- Sequence Learning
Fingerprint
Dive into the research topics of 'Rethinking Natural Language Generation with Layer-Wise Multi-View Decoding'. Together they form a unique fingerprint.