
Realism in virtually supervised learning for acoustic room characterization and sound source localization

13 Nov 2023

Audio Augmented Reality aims to integrate virtual audio content into the user's acoustic environment, creating an immersive audio experience. The commercial availability of augmented reality headsets such as the Apple Vision Pro has further motivated interest in this research field. To synthesize binaural spatial audio that can recreate the perception of distance, direction, and acoustic cues, knowledge of specific acoustic parameters of the user's environment is a prerequisite. These acoustic parameters fall into two categories: global parameters, associated with the room's geometry, reverberation time, and wall materials, and local parameters, concerning the location of each sound source. With the help of room acoustic simulators, these parameters are used to simulate room impulse responses, which can then be convolved with dry speech signals to synthesize binaural spatial audio that is perceived as realistic. However, estimating these acoustic parameters is challenging. Previous research has addressed the problem through cumbersome, time-consuming in-situ measurements, which are often impractical. In this thesis, we tackle the challenge by leveraging supervised machine-learning techniques that take speech recordings as input. Our primary focus is on cuboid rooms with static acoustic scenarios. In the first part of our work, we develop a multi-task neural network for room parameter estimation and assess its robustness on real-world data. In the second part, we shift our focus to virtually supervised learning, an approach in which machine-learning models are trained exclusively on simulated data. The rationale behind this strategy is the limited availability of task-specific real datasets in this domain. To ensure generalization, the training dataset should closely resemble the scenarios encountered in the test datasets.
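The synthesis step described above, convolving a simulated room impulse response with a dry (anechoic) speech signal, can be sketched in plain Python. The signals and RIR below are toy placeholders, not measured data; in practice one would use a library convolution routine on real recordings:

```python
def convolve(dry, rir):
    """Discrete convolution: each RIR tap delays and scales the dry signal."""
    out = [0.0] * (len(dry) + len(rir) - 1)
    for n, x in enumerate(dry):
        for k, h in enumerate(rir):
            out[n + k] += x * h
    return out

# Toy example: an impulse-like "speech" signal and a two-tap RIR
# (direct path at lag 0, one reflection at lag 3 attenuated by 0.5).
dry = [1.0, 0.0, 0.0]
rir = [1.0, 0.0, 0.0, 0.5]
wet = convolve(dry, rir)  # direct sound followed by the delayed reflection
```

The "wet" signal contains one delayed, attenuated copy of the dry signal per reflection path encoded in the RIR, which is what gives the rendered audio its sense of room and distance.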
To bridge this gap, we improve realism in the open-source room acoustics simulator Pyroomacoustics by implementing an extended image source method. This improved simulator is then used to train neural networks for the tasks of room parameter estimation and sound source localization. We employ several real test datasets to assess the benefit of training the systems with the improved simulator. Our experiments show that generalization improves across both tasks compared to systems trained for the same task on less realistic data. To the best of our knowledge, this is one of the first studies to explore virtually supervised learning for the estimation of both global and local room acoustic parameters.
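The image source method underlying such simulators can be illustrated with a minimal sketch. For brevity this assumes a hypothetical 1-D "room" between walls at 0 and `room_len`; each reflection is modeled by mirroring the source across a wall, and every image source contributes one delayed, attenuated tap to the impulse response (the extended method in the thesis additionally models effects the basic method omits):

```python
def image_sources_1d(src, room_len, order):
    """Mirror a source across the walls of a 1-D room [0, room_len].

    Enumerates image positions 2*m*room_len +/- src up to the given
    reflection order, the classic image source pattern in one dimension.
    """
    images = set()
    for m in range(-order, order + 1):
        for sign in (+1, -1):
            images.add(2 * m * room_len + sign * src)
    return sorted(images)

# Toy room of length 5 m, source at 1 m, first-order images only.
imgs = image_sources_1d(1.0, 5.0, 1)
# Contains the true source (1.0) and its mirrors across each wall,
# e.g. -1.0 (left wall) and 9.0 (right wall).
```

Each image's distance to the listener gives the tap delay (distance divided by the speed of sound) and, together with the wall absorption, its amplitude; summing all taps yields the simulated room impulse response.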

Authors

Prerak Srivastava

Bibliographic Reference
Prerak Srivastava. Realism in virtually supervised learning for acoustic room characterization and sound source localization. Machine Learning [cs.LG]. Université de Lorraine, 2023. English. ⟨NNT : 2023LORR0184⟩. ⟨tel-04313405⟩
Department
Department of Natural Language Processing & Knowledge Discovery
Funding
INRIA
HAL Collection
CNRS - Centre national de la recherche scientifique; INRIA - Institut National de Recherche en Informatique et en Automatique; STAR - Dépôt national des thèses électroniques; INRIA Nancy - Grand Est; Publications du LORIA; TESTALAIN1; Université de Lorraine; INRIA 2; Laboratoire Lorrain de Recherche en Informatique et ses Applications; Department of Natural Language Processing & Knowledge Discovery; Thèses de doctorat soutenues à l'Université de Lorraine
HAL Identifier
4313405
Institution
Institut National de Recherche en Informatique et en Automatique; Université de Lorraine
Laboratory
Inria Nancy - Grand Est; Laboratoire Lorrain de Recherche en Informatique et ses Applications
Published in
France
