Speech localisation in multitalker mixtures is affected by the listener’s expectations about the spatial arrangement of the sound sources. This effect was investigated via experiments with human listeners and a machine system, in which the task was to localise a female-voice target among four spatially distributed male-voice maskers. Two configurations were used: the masker locations were either fixed or varied from trial to trial. The machine system employs deep neural networks (DNNs) to learn the relationship between binaural cues and source azimuth, and exploits top-down knowledge about the spectral characteristics of the target source. Performance was evaluated in both anechoic and reverberant conditions. Our experiments show that the machine system outperformed listeners in some conditions. Both the machine system and human listeners were able to make use of prior knowledge about the spatial configuration of the sources.
Speech localisation in a multitalker mixture by humans and machines
Added on 24/11/2017