Many natural sounds, such as those arising
from rain, fire, or a swamp full of insects, are produced by a
concurrence of many similar acoustic events that overlap in time. We
refer to these sounds as "auditory textures", analogous to the visual
textures that have been studied for decades. The defining
characteristic of textures is stationarity: their properties remain
constant over moderate timescales. This suggests that texture
properties could be captured by summary statistics that are
time-averages of acoustic measurements. We
have explored the hypothesis that the auditory system represents
textures in this way, with statistics of the measurements made in the
peripheral auditory system (joint work with Eero Simoncelli).
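As an illustration of what a time-averaged statistic of this kind might look like, the following Python sketch splits a signal into frequency bands, extracts each band's envelope, and averages over time to obtain a few marginal moments per band. It is a minimal sketch under stated assumptions, not the model used in this work: the brick-wall FFT filters, the band edges, and the function names `subband_envelopes` and `marginal_stats` are all hypothetical stand-ins for a realistic model of the peripheral auditory system.

```python
import numpy as np

def subband_envelopes(x, sr, edges_hz):
    """Return one envelope per band, shape (n_bands, n_samples).

    Assumption: ideal (brick-wall) bands defined by edges_hz, with
    0 < edges_hz[0] and edges_hz[-1] < sr / 2. Real auditory filters
    overlap and have smooth skirts.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / sr)
    envelopes = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        # Keep only the positive frequencies in [lo, hi) and double them;
        # the inverse FFT is then the analytic signal of that band.
        mask = (freqs >= lo) & (freqs < hi)
        analytic = np.fft.ifft(2.0 * spectrum * mask)
        envelopes.append(np.abs(analytic))
    return np.array(envelopes)

def marginal_stats(envelopes):
    """Time-averaged moments of each band's envelope, one value per band."""
    mean = envelopes.mean(axis=1)
    std = envelopes.std(axis=1)
    z = (envelopes - mean[:, None]) / std[:, None]
    return {
        "mean": mean,                  # overall level in the band
        "cv": std / mean,              # variability relative to the mean
        "skew": (z ** 3).mean(axis=1), # asymmetry, one rough index of sparsity
    }

# Demo on two seconds of Gaussian noise (a stationary signal).
rng = np.random.default_rng(0)
sr = 16000
noise = rng.standard_normal(2 * sr)
stats = marginal_stats(subband_envelopes(noise, sr, [200, 400, 800, 1600, 3200]))
print(stats["cv"])
```

For a stationary signal, values like these should change little from one excerpt to the next, which is what makes time-averages plausible descriptors of a texture.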
The key idea: if a set of
statistics underlies texture perception, one should be able to
synthesize textures that sound like real-world textures by
generating signals that match their statistics.
We designed a texture synthesis algorithm to test this idea. Although matching
statistics of individual auditory filters (replicating the power
spectrum of the original signal, among other things) is inadequate to
produce compelling synthesis, statistics of
intermediate complexity, capturing simple dependencies between filters,
can produce compelling synthetic
examples of many natural sound textures. These pages contain some
examples.
There are four examples of each sound: the original from
which the statistics were measured, a synthetic version matching only
the spectrum of the original sound, a synthetic version matching the
marginal statistics of "cochlear" filter envelopes (producing the
same spectrum, and sparsity, as the original recording), and a
synthetic version matching a larger set of statistics
(including correlations between filters).
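The difference between the last two conditions can be sketched in a few lines of Python. Marginal statistics are computed one filter at a time, so by construction they cannot register whether the envelopes of different filters rise and fall together; a correlation matrix across filters can. The example below is only an illustration under assumptions: the envelopes are made-up random signals rather than measurements from real recordings, and `envelope_correlations` is a hypothetical helper, not a function from the authors' code.

```python
import numpy as np

def envelope_correlations(envelopes):
    """Pearson correlation between every pair of band envelopes.

    envelopes has shape (n_bands, n_samples); the result is an
    (n_bands, n_bands) matrix with ones on the diagonal.
    """
    return np.corrcoef(envelopes)

rng = np.random.default_rng(0)
n_samples = 20000

# Case 1: three bands whose envelopes fluctuate independently.
independent = rng.random((3, n_samples))

# Case 2: three bands driven mostly by one shared modulator (hypothetical),
# as might happen when a single broadband event excites many filters at once.
shared = rng.random(n_samples)
comodulated = 0.8 * shared + 0.2 * rng.random((3, n_samples))

print(np.round(envelope_correlations(independent), 2))
print(np.round(envelope_correlations(comodulated), 2))
```

The off-diagonal entries are near zero in the first case and large in the second, even though each band is summarized separately in the same way by its own marginal moments.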
Matching the spectrum alone produces signals that sound like noise, and that rarely resemble the original sound.
Matching the marginal statistics (yielding sounds with sparsity comparable to the originals) generally produces
realistic
synthesis only for water, and the results often
sound watery irrespective of whether the original sound was water or
not. We believe this is because the salient properties of
water are produced by independent bandpass events, and are thus easily
captured by marginal statistics of bandpass filters. Most other sounds
have statistical dependencies that are more complex. With the larger
set of statistics, however, we can synthesize many different natural
sound textures.
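The spectrum-only condition discussed above can be approximated with a standard trick: keep the magnitude of each Fourier component and replace its phase with a random value. The sketch below assumes this phase-randomization approach, which may differ from how the spectrum-matched examples on this page were actually generated; the function name `spectrum_matched_noise` is hypothetical.

```python
import numpy as np

def spectrum_matched_noise(x, seed=0):
    """Return a signal with the same magnitude spectrum as x but random phases."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    # The DC bin (and the Nyquist bin, for even lengths) must stay real
    # for the result to be a real-valued signal with the same magnitudes.
    randomized[0] = spectrum[0]
    if len(x) % 2 == 0:
        randomized[-1] = spectrum[-1]
    return np.fft.irfft(randomized, n=len(x))

# Demo: a sparse train of decaying "clicks" becomes a noise-like signal.
sr = 8000
t = np.arange(sr) / sr
clicks = np.zeros(sr)
for onset in (0.1, 0.35, 0.6, 0.85):
    idx = t >= onset
    clicks[idx] += np.exp(-200.0 * (t[idx] - onset)) * np.sin(2 * np.pi * 1000 * t[idx])

noise_like = spectrum_matched_noise(clicks)
same_spectrum = np.allclose(np.abs(np.fft.rfft(noise_like)), np.abs(np.fft.rfft(clicks)))
print(same_spectrum)
```

Randomizing the phases spreads the energy of the isolated events across the whole duration, which is one way to see why a matched spectrum alone tends to sound like noise.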
The synthesis algorithm, along with experiments exploring the perception of textures, is described in this paper (2011, Neuron). An earlier
version of this work is described in this paper (2009, WASPAA). The perception of texture was further explored in this paper (2013, Nature Neuroscience).
Headphones are recommended.
QuickTime must be installed for the sounds to play.
**Please note: sound files may take a few seconds to load. If not all sounds load, please try refreshing the page.**
Sound Type | Original Recording | Synthetic - Spectrum | Synthetic - Marginals | Synthetic - Full Set of Statistics |
Stream | ||||
Bubbling Water | ||||
Wind | ||||
Fire | ||||
Waves |