Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers
(Oral presentation)
Marvin Borsdorf (Universität Bremen, Germany), Chenglin Xu (NUS, Singapore), Haizhou Li (NUS, Singapore), Tanja Schultz (Universität Bremen, Germany)
Speaker extraction has mostly been studied for scenarios in which a target speaker is present in a mixture of two or more talkers. Such scenarios do not adequately reflect everyday conversations. For example, a target speaker can be the only active talker, be quiet for a while, or leave the conversation, in which case the target speaker is absent from the mixture. Traditional speaker extraction models fail in these scenarios. We propose a novel speaker extraction approach that handles speech mixtures with one or two talkers in which the target speaker can be either present or absent. First, we formulate four speaker extraction conditions to cover the typical scenarios of everyday conversations with one and two talkers. Second, we introduce a joint training scheme with one unified loss function that works for all four conditions. We show that only a small amount of data is required to adapt the model to work well in all four conditions.
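
As a reading aid, the four conditions can be enumerated as the combinations of talker count (one or two) and target status (present or absent). The sketch below is only an illustration of that grid: the function name `condition_label` and the shorthand labels such as `2T-PT` are assumptions made here for readability and are not given in the abstract, and the sketch says nothing about the loss function itself.

```python
from itertools import product


def condition_label(num_talkers: int, target_present: bool) -> str:
    """Return a shorthand label for one extraction condition.

    The label format (e.g. "2T-PT") is a hypothetical naming scheme
    used for illustration only.
    """
    if num_talkers not in (1, 2):
        raise ValueError("this sketch only covers one or two talkers")
    status = "PT" if target_present else "AT"  # present / absent target
    return f"{num_talkers}T-{status}"


# All combinations of talker count and target presence.
CONDITIONS = [condition_label(n, p) for n, p in product((2, 1), (True, False))]
print(CONDITIONS)
```

Each label pairs a talker count with whether the target speaker is present or absent, which gives exactly four cases.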