LOCOST: State-Space Models for Long Document Abstractive Summarization
Conference paper, 2024

LOCOST: State-Space Models for Long Document Abstractive Summarization

Abstract

State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of O(L log L), this architecture can handle significantly longer sequences than state-of-the-art models based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches 93-96% of the performance of the top-performing sparse transformers of the same size, while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles inputs exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
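The O(L log L) complexity mentioned above comes from the standard trick of applying a state-space layer as a long convolution computed with the FFT. A minimal NumPy sketch, for illustration only (the scalar-state kernel and function names here are assumptions, not the paper's actual implementation):

```python
import numpy as np

def ssm_conv(u, kernel):
    """Apply a pre-computed state-space convolution kernel to an input
    sequence via FFT, which is what gives the O(L log L) cost.

    u:      input sequence, shape (L,)
    kernel: SSM impulse-response kernel, shape (L,)
    """
    L = len(u)
    n = 2 * L  # zero-pad to avoid circular-convolution wrap-around
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(kernel, n), n)[:L]
    return y

# Toy kernel K[l] = C * A^l * B for a scalar-state SSM (illustrative values)
A, B, C = 0.9, 1.0, 1.0
L = 8
kernel = C * (A ** np.arange(L)) * B

u = np.ones(L)
y = ssm_conv(u, kernel)
# With u all ones, y[l] equals the running sum of the kernel up to index l.
```

A naive causal convolution would cost O(L^2); the FFT-based version is what makes sequence lengths in the hundreds of thousands of tokens tractable.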
Origin: Files produced by the author(s)
License: CC BY-NC-ND - Attribution - NonCommercial - NoDerivatives

Dates and versions

hal-04438465 , version 1 (07-02-2024)

Identifiers

  • HAL Id : hal-04438465 , version 1

Cite

Florian Le Bronnec, Song Duong, Mathieu Ravaut, Alexandre Allauzen, Nancy F. Chen, et al. LOCOST: State-Space Models for Long Document Abstractive Summarization. European Chapter of the Association for Computational Linguistics (EACL), Mar 2024, St. Julian's, Malta. ⟨hal-04438465⟩
