Many natural sounds, such as those arising
from rain, fire, or a swamp full of insects, are produced by a
concurrence of many similar acoustic events that overlap in time. We
refer to these sounds as "auditory textures", analogous to the visual
textures that have been studied for decades. The defining
characteristic of textures is stationarity: their properties remain
constant over moderate timescales. This suggests that texture
properties could be captured by summary statistics that are
time-averages of acoustic measurements. We
have explored the hypothesis that the auditory system represents
textures in this way, with statistics of the measurements made in the
peripheral auditory system (joint work with Eero Simoncelli).
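As a rough illustration of what "time-averages of acoustic measurements" can mean, the sketch below computes four moments of a toy envelope and compares them across the two halves of the signal. The toy envelope (random event onsets smoothed by a decaying kernel) and the names `time_averaged_stats`, `onsets`, and `envelope` are assumptions made for illustration; they are not the measurements used in the work described here.

```python
import numpy as np

def time_averaged_stats(x):
    """Return [mean, variance, skewness, kurtosis] of x, each a time-average."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    centered = x - mean
    var = np.mean(centered ** 2)
    skew = np.mean(centered ** 3) / var ** 1.5
    kurt = np.mean(centered ** 4) / var ** 2
    return np.array([mean, var, skew, kurt])

# Toy stand-in for a texture envelope (an assumption, not real audio):
# many brief, similar events at random times, overlapping after smoothing.
rng = np.random.default_rng(0)
n = 40_000
onsets = (rng.random(n) < 0.02).astype(float)   # sparse random event onsets
kernel = np.exp(-np.arange(200) / 30.0)         # decaying shape of each event
envelope = np.convolve(onsets, kernel)[:n]

# For a stationary signal, statistics measured on different long excerpts
# should come out similar.
first_half = time_averaged_stats(envelope[: n // 2])
second_half = time_averaged_stats(envelope[n // 2:])
print("first half :", np.round(first_half, 2))
print("second half:", np.round(second_half, 2))
```

If the two rows of printed statistics are close, the time-averages are a stable description of the signal, which is the sense in which stationarity makes summary statistics a plausible representation.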
The key idea: if a set of statistics underlies texture perception, one should be able to synthesize textures that sound like real-world textures by generating signals that match their statistics. We designed a texture synthesis algorithm to test this idea. Although matching statistics of individual auditory filters (replicating the power spectrum of the original signal, among other things) is inadequate to produce compelling synthesis, statistics of intermediate complexity, capturing simple dependencies between filters, can produce compelling synthetic examples of many natural sound textures. These pages contain some examples.
There are four examples of each sound: the original from which the statistics were measured, a synthetic version matching only the spectrum of the original sound, a synthetic version matching the marginal statistics of "cochlear" filter envelopes (producing the same spectrum, and sparsity, as the original recording), and a synthetic version matching a larger set of statistics (including correlations between filters).
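The difference between marginal statistics and correlations between filters can be seen in a small numerical sketch. Here two hypothetical band envelopes share exactly the same marginal statistics, yet differ in how they relate to a third band. The envelopes are synthetic random data, and the function names are illustrative assumptions rather than part of the synthesis algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical envelope of one frequency band: non-negative and sparse.
band_a = rng.exponential(1.0, n) * (rng.random(n) < 0.1)

# Two candidates for a second band. Both contain exactly the same values as
# band_a, so every marginal statistic is identical; only the timing differs.
band_b_comodulated = band_a.copy()             # events coincide across bands
band_b_independent = rng.permutation(band_a)   # same values, unrelated timing

def marginal_moments(env):
    """Mean and variance of a single envelope (marginal statistics)."""
    return np.array([env.mean(), env.var()])

def envelope_correlation(env_1, env_2):
    """Pearson correlation between two band envelopes."""
    return np.corrcoef(env_1, env_2)[0, 1]

corr_comodulated = envelope_correlation(band_a, band_b_comodulated)
corr_independent = envelope_correlation(band_a, band_b_independent)

print("marginals (comodulated):", marginal_moments(band_b_comodulated))
print("marginals (independent):", marginal_moments(band_b_independent))
print("correlation with band_a:", corr_comodulated, corr_independent)
```

Marginal statistics alone cannot tell the two candidates apart, whereas the cross-band correlation can. This is only a toy case of the kind of dependency the larger statistic set is meant to capture.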
Matching the spectrum alone produces signals that sound like noise, and that rarely resemble the original sound. Matching the marginal statistics (yielding sounds with sparsity comparable to the originals) generally produces realistic synthesis only for water, and the results often sound watery irrespective of whether the original sound was water or not. We believe this is because the salient properties of water are produced by independent bandpass events, and are thus easily captured by marginal statistics of bandpass filters. Most other sounds have statistical dependencies that are more complex. With the larger set of statistics, however, we can synthesize many different natural sound textures.
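One simple way to see why a matched spectrum is not enough is to keep a signal's magnitude spectrum and randomize its phases. The sketch below does this for a toy click train. Phase randomization is a stand-in chosen for illustration and is not necessarily how the spectrum-matched examples on this page were generated; `spectrum_matched_noise` and `kurtosis` are hypothetical helper names.

```python
import numpy as np

def spectrum_matched_noise(x, rng):
    """Return a signal with the magnitude spectrum of x but random phases."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    # The DC bin (and the Nyquist bin for even lengths) must stay real
    # for the inverse transform to reproduce the same magnitudes.
    randomized[0] = spectrum[0]
    if x.size % 2 == 0:
        randomized[-1] = spectrum[-1]
    return np.fft.irfft(randomized, n=x.size)

def kurtosis(x):
    """Fourth standardized moment; large values indicate a sparse signal."""
    centered = x - x.mean()
    return np.mean(centered ** 4) / np.mean(centered ** 2) ** 2

# Toy sparse signal (an assumption): 40 clicks scattered over 8192 samples.
rng = np.random.default_rng(2)
n = 8192
clicks = np.zeros(n)
positions = rng.choice(n, size=40, replace=False)
clicks[positions] = rng.normal(size=40)

noise = spectrum_matched_noise(clicks, rng)
print("kurtosis of clicks:", kurtosis(clicks))
print("kurtosis of spectrum-matched signal:", kurtosis(noise))
```

The two signals have the same magnitude spectrum, but the sparsity of the click train is lost after phase randomization. This is consistent with spectrum-matched versions sounding like noise, and with sparsity being something the marginal statistics add.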
The synthesis algorithm, along with experiments exploring the perception of textures, is described in this paper (2011, Neuron). An earlier version of this work is described in this paper (2009, WASPAA). The perception of texture was further explored in this paper (2013, Nature Neuroscience).
Headphones are recommended. QuickTime must be installed for the sounds to play.
**Please note - sound files may take a few seconds to load. If not all sounds load, please try refreshing the page.**
|Sound Type|Original Recording|Synthetic - Spectrum|Synthetic - Marginals|Synthetic - Full Set of Statistics|
|---|---|---|---|---|