LOCOST: State-Space Models for Long Document Abstractive Summarization

State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST, an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of O(L log L) in the input length L, this architecture can handle significantly longer sequences than state-of-the-art models based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches 93-96% of the performance of the top-performing sparse transformers of the same size while saving up to 50% of memory during training and up to 87% during inference. Additionally, LOCOST effectively handles inputs exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
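The abstract does not spell out where the O(L log L) cost comes from. A common route for state-space layers, sketched here under assumptions and not taken from the LOCOST code, is to unroll the linear recurrence into a single long convolution and evaluate it with the FFT. The diagonal parameters `a`, `b`, `c` and all function names below are hypothetical and chosen only for illustration.

```python
import numpy as np

def ssm_kernel(a, b, c, length):
    """Convolution kernel K[k] = sum_n c[n] * a[n]**k * b[n] of a diagonal SSM.

    a, b, c are illustrative per-state parameters (assumed, not from the paper).
    """
    powers = a[None, :] ** np.arange(length)[:, None]  # shape (length, state)
    return (powers * (b * c)[None, :]).sum(axis=1)

def fft_causal_conv(u, kernel):
    """Causal convolution y[t] = sum_{k<=t} kernel[k] * u[t-k] in O(L log L)."""
    seq_len = len(u)
    n = 2 * seq_len  # zero-pad so the circular convolution equals the linear one
    spectrum = np.fft.rfft(u, n=n) * np.fft.rfft(kernel, n=n)
    return np.fft.irfft(spectrum, n=n)[:seq_len]

def recurrent_scan(u, a, b, c):
    """Reference: the same layer run step by step, x_t = a*x_{t-1} + b*u_t, y_t = c.x_t."""
    x = np.zeros_like(a)
    out = []
    for u_t in u:
        x = a * x + b * u_t
        out.append(np.dot(c, x))
    return np.array(out)

rng = np.random.default_rng(0)
a = rng.uniform(0.1, 0.9, size=4)  # stable decay rates, |a| < 1
b = rng.normal(size=4)
c = rng.normal(size=4)
u = rng.normal(size=64)            # toy input sequence of length L = 64

y_fft = fft_causal_conv(u, ssm_kernel(a, b, c, len(u)))
y_scan = recurrent_scan(u, a, b, c)
print(np.allclose(y_fft, y_scan))
```

The two code paths compute the same outputs; the FFT version replaces a direct convolution, which is quadratic in L, with transforms that cost O(L log L), in contrast to the pairwise token interactions of full attention.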

Keywords

state-space models, summarization

Domains

Computation and Language [cs.CL], Machine Learning [cs.LG]

Main file

LOCOST__EACL_ (2).pdf (3.6 MB)

Origin: Files produced by the author(s)
License: Attribution - NonCommercial - NoDerivatives

Contributor: Song Duong

https://hal.science/hal-04438465

Submitted on: Wednesday, February 7, 2024, 10:34:39

Last modified on: Wednesday, June 26, 2024, 03:26:27

Long-term archiving on: Wednesday, May 8, 2024, 18:08:53

Dates and versions

hal-04438465, version 1 (07-02-2024)

Identifiers

HAL Id: hal-04438465, version 1

Cite

Florian Le Bronnec, Song Duong, Mathieu Ravaut, Alexandre Allauzen, Nancy F. Chen, et al. LOCOST: State-Space Models for Long Document Abstractive Summarization. European Chapter of the Association for Computational Linguistics (EACL), Mar 2024, St. Julian’s, Malta. ⟨hal-04438465⟩

Collections

AGROPARISTECH CNRS UNIV-DAUPHINE ISIR LAMSADE-DAUPHINE MIA-PARIS GENCI PSL UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE SU-SCIENCES INRAE ANR GS-MATHEMATIQUES GS-COMPUTER-SCIENCE ISIR_MLIA MATHNUM PEPR_IA
