Geintra

Department of Electronics, Universidad de Alcalá


    Analysis of Statistical Parametric and Unit Selection Speech Synthesis Systems Applied to Emotional Speech

    Title: Analysis of Statistical Parametric and Unit Selection Speech Synthesis Systems Applied to Emotional Speech
    Publication type: Journal Article
    Publication year: 2010
    Authors: Barra-Chicote, R; Yamagishi, J; King, S; Montero, JM; Macias-Guarasa, J
    Publication language: English
    Journal: Speech Communication
    Volume: 52
    Issue: 5
    Pages: 394-404
    Publication date: 05/2010
    Publisher: Elsevier
    Rank in category: 38/94
    JCR Category: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
    Keywords: Emotional speech synthesis, HMM-based synthesis, unit selection synthesis
    JCR Impact Factor: 1.229
    ISSN: 0167-6393
    DOI: 10.1016/j.specom.2009.12.007
    Abstract

    We have applied two state-of-the-art speech synthesis techniques (unit selection and HMM-based synthesis) to the synthesis of emotional speech. A series of carefully designed perceptual tests to evaluate speech quality, emotion identification rates and emotional strength were used for the six emotions which we recorded – happiness, sadness, anger, surprise, fear, disgust. For the HMM-based method, we evaluated spectral and source components separately and identified which components contribute to which emotion.

    Our analysis shows that, although the HMM method produces significantly better neutral speech, the two methods produce emotional speech of similar quality, except for emotions having context-dependent prosodic patterns. Whilst synthetic speech produced using the unit selection method has better emotional strength scores than the HMM-based method, the HMM-based method has the ability to manipulate the emotional strength. For emotions that are characterized by both spectral and prosodic components, synthetic speech using unit selection methods was more accurately identified by listeners. For emotions mainly characterized by prosodic components, HMM-based synthetic speech was more accurately identified. This finding differs from previous results regarding listener judgements of speaker similarity for neutral speech. We conclude that unit selection methods require improvements to prosodic modeling and that HMM-based methods require improvements to spectral modeling for emotional speech. Certain emotions cannot be reproduced well by either method.
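The abstract reports emotion identification rates obtained from perceptual tests. As a rough illustration of how such a rate can be tallied, the sketch below assumes a forced-choice test in which each listener labels each synthetic utterance with one of the six recorded emotions. The function names, the (intended, perceived) input format and the toy responses are hypothetical; this is not the authors' evaluation code and the numbers are not results from the paper.

```python
from collections import Counter

# The six emotions recorded for the study.
EMOTIONS = ["happiness", "sadness", "anger", "surprise", "fear", "disgust"]


def identification_rates(responses):
    """Per intended emotion, the fraction of judgements naming that emotion.

    responses: iterable of (intended, perceived) label pairs, one pair per
    listener judgement of one utterance (hypothetical input format).
    """
    totals = Counter()
    correct = Counter()
    for intended, perceived in responses:
        totals[intended] += 1
        if perceived == intended:
            correct[intended] += 1
    # Only emotions that actually appear in the responses get a rate.
    return {emotion: correct[emotion] / totals[emotion] for emotion in totals}


def confusion_matrix(responses, labels=EMOTIONS):
    """Count how often each intended emotion was perceived as each label."""
    matrix = {intended: {perceived: 0 for perceived in labels} for intended in labels}
    for intended, perceived in responses:
        matrix[intended][perceived] += 1
    return matrix


if __name__ == "__main__":
    # Toy responses invented for illustration only.
    toy = [("anger", "anger"), ("anger", "disgust"),
           ("sadness", "sadness"), ("sadness", "sadness")]
    print(identification_rates(toy))  # {'anger': 0.5, 'sadness': 1.0}
    print(confusion_matrix(toy)["anger"]["disgust"])  # 1
```

The confusion matrix complements the per-emotion rate by showing which emotions listeners confuse (here, toy anger judged as disgust), which is the kind of breakdown needed to say which emotions either synthesis method fails to reproduce.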

    Attachment: 2010-FinalPaperBarraSpecom.pdf (452.68 KB)