Diffusion-based Unsupervised Audio-visual Speech Enhancement

This paper proposes a new unsupervised audio-visual speech enhancement (AVSE) approach that combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF) noise model. First, the diffusion model is pre-trained on clean speech, conditioned on the corresponding video data, to simulate the speech generative distribution. This pre-trained model is then paired with the NMF-based noise model to iteratively estimate clean speech. Specifically, a diffusion-based posterior sampling approach is implemented within the reverse diffusion process: after each iteration, a speech estimate is obtained and used to update the noise parameters. Experimental results confirm that the proposed AVSE approach not only outperforms its audio-only counterpart but also generalizes better than a recent supervised-generative AVSE method. Additionally, the new inference algorithm offers a better balance between inference speed and performance than the previous diffusion-based method.
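To make the "update the noise parameters" step more concrete, the following is a minimal sketch of how an NMF noise model could be refitted from a current speech estimate. It is not the authors' algorithm: the abstract does not state the cost function or the update rules, so this sketch assumes the classic Lee-Seung multiplicative updates for a Euclidean cost, and the names `residual_power`, `nmf_step`, and the toy dimensions are hypothetical. In the actual method the speech estimate would change at every reverse diffusion step; here it is held fixed for simplicity.

```python
import random

EPS = 1e-12  # small floor to avoid division by zero and negative powers


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_b] for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Squared Euclidean distance between V and the product W @ H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))


def residual_power(noisy_power, speech_power):
    """Hypothetical noise proxy: noisy power minus speech power, floored at EPS."""
    return [[max(x - s, EPS) for x, s in zip(rx, rs)]
            for rx, rs in zip(noisy_power, speech_power)]


def nmf_step(V, W, H):
    """One Lee-Seung multiplicative update of H, then of W (Euclidean cost)."""
    Wt = transpose(W)
    num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
    H = [[h * n / (d + EPS) for h, n, d in zip(rh, rn, rd)]
         for rh, rn, rd in zip(H, num, den)]
    Ht = transpose(H)
    num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
    W = [[w * n / (d + EPS) for w, n, d in zip(rw, rn, rd)]
         for rw, rn, rd in zip(W, num, den)]
    return W, H


# Toy data (assumed sizes): F frequency bins, T frames, K NMF components.
random.seed(0)
F, T, K = 4, 6, 2
noisy = [[random.uniform(1.0, 2.0) for _ in range(T)] for _ in range(F)]
speech = [[random.uniform(0.0, 0.5) for _ in range(T)] for _ in range(F)]
V = residual_power(noisy, speech)

W = [[random.uniform(0.1, 1.0) for _ in range(K)] for _ in range(F)]
H = [[random.uniform(0.1, 1.0) for _ in range(T)] for _ in range(K)]

err_before = frobenius_error(V, W, H)
for _ in range(50):
    W, H = nmf_step(V, W, H)
err_after = frobenius_error(V, W, H)
print(err_before, err_after)
```

The multiplicative form keeps `W` and `H` non-negative as long as they are initialized with positive values, which is why it is a common choice for NMF-based noise variance models.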

Keywords

unsupervised learning, audio-visual speech enhancement, diffusion models, posterior sampling

Domains

Artificial Intelligence [cs.AI]; Computer Vision and Pattern Recognition [cs.CV]; Machine Learning [cs.LG]; Signal and Image Processing [eess.SP]

Main file

cmxyyzzrpvkmyrykwgbnjftrchwgjsgk.pdf (394.02 KB)

Origin: Files produced by the author(s)

Contributor: Mostafa SADEGHI

https://hal.science/hal-04718254

Submitted on: Thursday, October 3, 2024, 11:47:05

Last modified on: Tuesday, November 5, 2024, 11:16:02

Dates and versions

hal-04718254, version 1 (2024-10-03)

License

Attribution

Identifiers

HAL Id: hal-04718254, version 1
arXiv: 2410.05301

Cite

Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda. Diffusion-based Unsupervised Audio-visual Speech Enhancement. 2024. ⟨hal-04718254⟩

Collections

UGA CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD ANR
