Joint Beamforming and Speaker-Attributed ASR for Real Distant-Microphone Meeting Transcription

Can Cui; Imran Ahamad Sheikh; Mostafa Sadeghi; Emmanuel Vincent

Preprint, Working Paper. Year: 2023

Can Cui

Role: Author
PersonId: 753642
IdHAL: can-cui
ORCID: 0000-0003-4332-1851

Speech Modeling for Facilitating Oral-Based Communication

Imran Ahamad Sheikh

Role: Author
PersonId: 1000772

Vivoka

Mostafa Sadeghi

Role: Author
PersonId: 752828
IdHAL: msadeghi
ORCID: 0000-0002-0272-8017

Speech Modeling for Facilitating Oral-Based Communication

Emmanuel Vincent

Role: Author
PersonId: 1256
IdHAL: emmanuelv
ORCID: 0000-0002-0183-7289
IdRef: 089360176

Speech Modeling for Facilitating Oral-Based Communication

Abstract

Distant-microphone meeting transcription is a challenging task. State-of-the-art end-to-end speaker-attributed automatic speech recognition (SA-ASR) architectures lack a multichannel noise and reverberation reduction front-end, which limits their performance. In this paper, we introduce a joint beamforming and SA-ASR approach for real meeting transcription. We first describe a data alignment and augmentation method to pretrain a neural beamformer on real meeting data. We then compare fixed, hybrid, and fully neural beamformers as front-ends to the SA-ASR model. Finally, we jointly optimize the fully neural beamformer and the SA-ASR model. Experiments on the real AMI corpus show that, while state-of-the-art multi-frame cross-channel attention-based channel fusion fails to improve ASR performance, fine-tuning SA-ASR on the fixed beamformer's output and jointly fine-tuning SA-ASR with the neural beamformer reduce the word error rate by 8% and 9% relative, respectively.
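To give a concrete picture of what a fixed beamformer front-end does, the sketch below implements delay-and-sum beamforming in NumPy. The abstract does not say which fixed beamformer was evaluated, so delay-and-sum is an assumption chosen here only because it is the simplest representative of the family. The function name `delay_and_sum`, the `(channels, samples)` layout, and the use of known integer-sample delays are all hypothetical choices for illustration.

```python
import numpy as np


def delay_and_sum(signals, delays):
    """Illustrative delay-and-sum beamformer (not the authors' front-end).

    signals: array of shape (channels, samples), one row per microphone.
    delays:  per-channel arrival delay of the target, in whole samples,
             relative to a reference microphone (assumed known here).
    Returns a single-channel signal of shape (samples,).
    """
    signals = np.asarray(signals, dtype=float)
    n_channels, n_samples = signals.shape
    output = np.zeros(n_samples)
    for ch in range(n_channels):
        # Advance each channel by its delay so the target lines up in time.
        # np.roll wraps around at the edges, which is acceptable for a
        # sketch but would be replaced by padding in a real system.
        output += np.roll(signals[ch], -int(delays[ch]))
    # Averaging keeps the aligned target at its original level, while
    # uncorrelated noise across microphones partially cancels.
    return output / n_channels
```

With M microphones carrying independent noise of equal power, averaging the time-aligned channels leaves the target unchanged and reduces the noise power by roughly a factor of M. Neural and hybrid beamformers, as compared in the paper, estimate their filters from data instead of relying on fixed delays.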

Domains

Computation and Language [cs.CL]

Main file

Template_Blind.pdf (1.07 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-04755558

Submitted on: Tuesday, October 29, 2024, 05:56:25

Last modified on: Wednesday, October 30, 2024, 03:13:06

Dates and versions

hal-04755558, version 1 (29-10-2024)

Identifiers

HAL Id: hal-04755558, version 1

Cite

Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent. Joint Beamforming and Speaker-Attributed ASR for Real Distant-Microphone Meeting Transcription. 2023. ⟨hal-04755558⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 GENCI LORIA LORIA-NLPKD
