Geintra

Departamento de Electrónica, Universidad de Alcalá


    Towards End-to-End Acoustic Localization Using Deep Learning: From Audio Signals to Source Position Coordinates

    Title: Towards End-to-End Acoustic Localization Using Deep Learning: From Audio Signals to Source Position Coordinates
    Publication type: Journal Article
    Year of publication: 2018
    Authors: Vera-Diaz, J, Pizarro, D, Macias-Guarasa, J
    Language of publication: English
    Journal: Sensors
    Volume: 18
    Issue: 10
    Pages: 1-22
    Publication date: 10/2018
    Publisher: MDPI
    ISSN: 1424-8220
    DOI: 10.3390/s18103418
    Abstract

    This paper presents a novel approach for indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN). In the proposed solution, the CNN is designed to directly estimate the three-dimensional position of a single acoustic source, using the raw audio signal as input and avoiding hand-crafted audio features. Given the limited amount of available localization data, we propose a two-step training strategy. We first train our network using semi-synthetic data generated from close-talk speech recordings, in which we simulate the time delays and distortion suffered by the signal as it propagates from the source to the microphone array. We then fine-tune this network using a small amount of real data. Our experimental results, evaluated on a publicly available dataset recorded in a real room, show that this approach produces networks that significantly outperform existing localization methods based on SRP-PHAT strategies, as well as very recent proposals based on Convolutional Recurrent Neural Networks (CRNN). In addition, our experiments show that the performance of our CNN method does not depend significantly on the speaker’s gender or on the size of the signal window used.
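
    The semi-synthetic data step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only applies a whole-sample propagation delay and a 1/distance gain per microphone, and omits the distortion the abstract also mentions. The function name simulate_array, the speed of sound of 343 m/s, and the example geometry are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def simulate_array(signal, source, mics, fs):
    """Return one delayed, attenuated copy of `signal` per microphone.

    Hypothetical helper: each delay is rounded to a whole number of samples
    and the amplitude is scaled by 1/distance (free-field assumption).
    """
    channels = []
    for mic in mics:
        dist = math.dist(source, mic)              # source-to-mic distance in metres
        delay = round(dist / SPEED_OF_SOUND * fs)  # propagation delay in samples
        gain = 1.0 / max(dist, 1e-6)               # avoid division by zero
        delayed = [0.0] * delay + [gain * s for s in signal]
        channels.append(delayed[:len(signal)])     # keep the original length
    return channels


# Usage: a unit impulse as a stand-in for a close-talk recording.
impulse = [1.0] + [0.0] * 199
mics = [(3.43, 0.0, 0.0), (0.0, 1.715, 0.0)]       # assumed positions in metres
channels = simulate_array(impulse, (0.0, 0.0, 0.0), mics, fs=16000)
print([ch.index(max(ch)) for ch in channels])
```

    Each output channel is paired with the known source coordinates, which is what lets close-talk speech recordings serve as labelled training examples for a network that regresses position directly from audio.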
