Ambient sound provides supervision for visual learning

Owens, A; Wu, J; McDermott, J H; Freeman, W T; Torralba, A

Published in European Conference on Computer Vision (ECCV), pp. 801-816, Aug 2016.

The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds.
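
To make the idea of a "statistical summary of the sound" concrete, the sketch below computes a fixed-length vector from an audio clip that could serve as a prediction target for the video frame it accompanies. It is only an illustration under stated assumptions: the abstract does not specify the summary used, so the time-averaged log band energies here are a simplified stand-in, and the function name `sound_summary` and its parameters (`frame_len`, `hop`, `n_bands`) are hypothetical.

```python
import numpy as np

def sound_summary(waveform, frame_len=512, hop=256, n_bands=8):
    """Toy statistical summary of an audio clip (an assumed stand-in,
    not the paper's feature): mean and std over time of log band energies."""
    # Slice the waveform into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(waveform) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = waveform[idx] * np.hanning(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Pool frequency bins into n_bands contiguous bands.
    bands = np.array_split(power, n_bands, axis=1)
    band_energy = np.stack([b.sum(axis=1) for b in bands], axis=1)
    log_energy = np.log(band_energy + 1e-8)
    # Summarize each band over time; the result has length 2 * n_bands.
    return np.concatenate([log_energy.mean(axis=0), log_energy.std(axis=0)])

# Example: one second of synthetic audio at an assumed 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 200.0 * t)   # low-frequency hum
target = sound_summary(tone)           # vector a visual model could be trained to predict
print(target.shape)
```

In a setup of this kind, each video frame would be paired with the summary of its surrounding audio, and a convolutional network would be trained to predict that vector (or a discretized version of it) from pixels alone, so no human labels are needed.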
