Perception of social speaker characteristics in synthetic speech (3 minutes introduction)

Sai Sirisha Rallabandi (Technische Universität Berlin, Germany), Abhinav Bharadwaj (Technische Universität Berlin, Germany), Babak Naderi (Technische Universität Berlin, Germany), Sebastian Möller (Technische Universität Berlin, Germany)

As computational capabilities have improved, chatbots and conversational agents have become more prevalent. It is therefore essential that the speech these agents generate exhibits certain social speaker characteristics. In this paper, we study the perception of such speaker characteristics in two commercial Text-to-Speech (TTS) systems, Amazon Polly and Google TTS. We carried out a 15-item semantic differential scaling test. Factor analysis yielded three underlying dimensions that can be perceived from synthetic speech: warmth, competence, and extraversion. Our results show that listeners can perceive both interpersonal relationships and personality traits from synthetic voices. Additionally, we observed that female participants perceived male voices to be more responsible, energetic, relaxed, and enthusiastic, whereas male participants found female voices to be more reliable, accessible, and confident. We also compare our results with those of studies on natural speech.

InterSpeech 2021

Related Recordings

Comparing Speech Enhancement Techniques for Voice Adaptation-Based Speech Synthesis
(3 minutes introduction)

Hi-Fi Multi-Speaker English TTS Dataset
(3 minutes introduction)
