Multimodal diarization: towards robustness and fairness in the wild

4 Dec 2023

Speaker diarization, the task of automatically determining "who spoke when?" in an audio or video recording, is one of the pillars of modern conversation analysis systems. Television broadcast content is highly diverse and covers nearly every type of conversation, from calm discussions between two people to impassioned debates and wartime interviews. Archiving and indexing this content, as carried out by the company Newsbridge, requires robust and fair processing methods. In this work, we present two new methods that improve system robustness through fusion approaches. The first focuses on voice activity detection, a necessary pre-processing step for every diarization system. The second is a multimodal approach that takes advantage of the latest advances in natural language processing. We also show that recent advances in diarization systems make the use of speaker diarization realistic, even in critical sectors such as the analysis of large audiovisual archives or home care for the elderly. Finally, this work introduces a new method for evaluating the algorithmic fairness of speaker diarization, with the aim of making its use more responsible.

Authors

Yannis Tevissen

Bibliographic Reference
Yannis Tevissen. Diarisation multimodale : vers des modèles robustes et justes en contexte réel. Intelligence artificielle [cs.AI]. Institut Polytechnique de Paris, 2023. Français. ⟨NNT : 2023IPPAS014⟩. ⟨tel-04345081⟩
HAL Collection
STAR - Dépôt national des thèses électroniques
HAL Identifier
4345081
Institution
Télécom SudParis
Laboratory
Services répartis, Architectures, MOdélisation, Validation, Administration des Réseaux
Published in
France

Table of Contents