Attentive Tracking of Sound Sources


Stimulus Examples

You will hear mixtures of two synthesized voices that change over time in pitch and timbre (F0, F1, and F2).


Synthetic time-varying voices, lacking distinguishing features and linguistic content.


Before the mixture, you will hear a CUE: the beginning portion of one of the voices.

After the mixture, you will hear a PROBE: the ending portion of one of the voices.




Did the probe come from the cued voice?

(In these examples, the probe always comes from the cued voice.)




Experiment 1: Basic experiment.
Two representative stimuli can be heard here.



In Experiment 2, listeners also had to detect a perturbation ('vibrato')
that could appear in either voice (here, it appears in the cued voice):

The perturbation was subtle. If you didn't hear it in the mixture,
listen to the cued voice alone:
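As a rough illustration of what a 'vibrato' perturbation does to a voice, the sketch below applies a brief sinusoidal modulation to an F0 trajectory. It assumes a trajectory is stored as a list of F0 values in Hz at a fixed control rate. The function name `add_vibrato` and its default rate (5 Hz) and depth (0.5 semitones) are hypothetical choices for illustration, not the parameters used in Experiment 2.

```python
import math

def add_vibrato(f0_hz, fs, start_s, dur_s, rate_hz=5.0, depth_semitones=0.5):
    """Return a copy of an F0 trajectory (Hz, one value per sample at fs Hz)
    with a sinusoidal vibrato applied from start_s to start_s + dur_s.
    Hypothetical helper; default parameters are illustrative assumptions."""
    out = list(f0_hz)  # leave the input trajectory unchanged
    start = int(round(start_s * fs))
    stop = min(len(out), int(round((start_s + dur_s) * fs)))
    for n in range(start, stop):
        t = (n - start) / fs
        # Modulate in log frequency (semitones) so the excursion is
        # symmetric in perceived pitch above and below the original F0.
        shift = depth_semitones * math.sin(2 * math.pi * rate_hz * t)
        out[n] = out[n] * 2 ** (shift / 12)
    return out
```

For example, `add_vibrato([200.0] * 200, fs=100, start_s=0.5, dur_s=0.5)` leaves the first and last half-second at 200 Hz and wobbles the middle half-second by up to half a semitone.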
In Experiment 3, the voices had speech-like discontinuities:

Experiment 4 parametrically varied the minimum distance between the two voices' trajectories.

A stimulus from condition 1, with a small minimum distance between trajectories:
It may be difficult to hear whether the probe came from the cued voice.

A stimulus from condition 8, with a large minimum distance between trajectories:
It should be easy to hear that the probe came from the cued voice.
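The 'minimum distance between trajectories' can be made concrete with a short sketch. It assumes each voice is a list of (F0, F1, F2) values in Hz sampled at the same time points, and that distance is Euclidean in semitones (12 * log2 of the frequency ratio) across the three dimensions. Both the representation and the metric are assumptions, and `min_trajectory_distance` is a hypothetical helper, not the code used to construct the conditions.

```python
import math

def min_trajectory_distance(traj_a, traj_b):
    """Minimum Euclidean distance, in semitones, between two trajectories
    sampled at the same time points. Each trajectory is a list of
    (F0, F1, F2) tuples in Hz. Hypothetical helper for illustration."""
    if len(traj_a) != len(traj_b):
        raise ValueError("trajectories must have the same number of samples")
    best = math.inf
    for a, b in zip(traj_a, traj_b):
        # Per-dimension difference in semitones: 12 * log2(f_a / f_b).
        d = math.sqrt(sum((12 * math.log2(x / y)) ** 2 for x, y in zip(a, b)))
        best = min(best, d)
    return best
```

Under this assumed metric, a condition-1-like stimulus would yield a small value and a condition-8-like stimulus a large one.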
Experiment 5 tested stimuli that varied in only one dimension (F0):








Supplementary


Vocoded stimuli

Voices were vocoded independently prior to summing in the mixture.

You will hear Cue -> Mixture -> Probe (the probe is from the cued voice):

For comparison, a non-vocoded stimulus (the same trajectories, without vocoding):

A whispered stimulus, which similarly eliminates the source harmonics.
It was processed with STRAIGHT (Hideki Kawahara) to remove the voiced source while retaining the filter information.



Robustness of 'bouncing' to unvoiced gaps

In these example stimuli, two voices changing in F0 cross each other once, in an X.
The rising voice has a discontinuity centered on the point where the two trajectories would otherwise collide.
Try to follow the continuous, falling voice through the intersection.
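The gap placement described above can be sketched in two steps: find the sample where the rising and falling F0 trajectories cross, then mark the rising voice as unvoiced within a window around that point. The helpers `crossing_index` and `gap_mask`, the list-of-Hz representation, and the gap half-width are hypothetical; the actual gap durations used in these stimuli are not stated here.

```python
def crossing_index(f0_a, f0_b):
    """Index of the sample nearest to where two F0 trajectories cross,
    or None if they never cross. Hypothetical helper for illustration."""
    diffs = [a - b for a, b in zip(f0_a, f0_b)]
    for n, d in enumerate(diffs):
        if d == 0:
            return n  # trajectories meet exactly at this sample
        if n > 0 and diffs[n - 1] * d < 0:
            # Sign change between n - 1 and n: pick the sample nearer zero.
            return n if abs(d) < abs(diffs[n - 1]) else n - 1
    return None

def gap_mask(n_samples, center, half_width):
    """Voicing mask: False (unvoiced) within half_width samples of center."""
    return [abs(n - center) > half_width for n in range(n_samples)]
```

For an X made of `rising = [100 + 10 * n for n in range(11)]` and `falling = [200 - 10 * n for n in range(11)]`, the crossing is at sample 5, and `gap_mask(11, 5, 1)` silences the rising voice at samples 4 through 6 while the falling voice stays continuous.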





Robustness of 'bouncing' to trajectory rate

In these examples, voices changing only in F0 are heard as 'bouncing'; each example uses a different trajectory rate.
You will hear Cue -> Mixture -> Probe (the probe is not from the cued voice):
('bouncing' occurs every at rates)