Example audio from the subjective evaluation experiments, in which participants rated the "naturalness" and "cleanness" of each speech clip.
| Audio label | Description of denoising model used to process input signal |
|---|---|
| Input Signal | Noisy input speech signal consisting of clean speech superimposed on background noise (no denoising model) |
| A123 | Wave-U-Net trained to minimize deep feature losses from three AudioSet-trained DNNs (model producing the highest subjective ratings) |
| A1+W1 | Wave-U-Net trained to minimize deep feature losses from one AudioSet-trained DNN and one Word-trained DNN |
| Random123 | Wave-U-Net trained to minimize deep feature losses from three untrained DNNs (random weights) |
| Baseline UNet | Wave-U-Net trained to reconstruct the clean speech waveform |
| Baseline WaveNet | WaveNet trained to reconstruct the clean speech waveform |
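The models in the table differ mainly in their training loss. The baselines compare the denoised output with the clean waveform directly. The deep-feature-loss models instead pass both signals through one or more fixed feature networks and compare the internal activations. The sketch below illustrates that distinction in NumPy and rests on several assumptions. The feature networks are hypothetical toy 1-D conv/ReLU stacks with random weights, so they resemble the "Random123" setting only loosely. The per-layer L1 distance and the names `make_random_feature_net`, `deep_feature_loss`, and `waveform_loss` are illustrative choices, not the exact networks or loss used in these experiments.

```python
import numpy as np

def conv1d_relu(x, kernels, stride=2):
    """Valid 1-D convolution of a (C_in, T) signal with (C_out, C_in, K) kernels, then ReLU."""
    c_out, c_in, k = kernels.shape
    t_out = (x.shape[1] - k) // stride + 1
    out = np.empty((c_out, t_out))
    for t in range(t_out):
        window = x[:, t * stride : t * stride + k]  # (C_in, K) slice of the input
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)

def make_random_feature_net(num_layers=3, channels=8, kernel=5, seed=0):
    """Hypothetical untrained feature network: a list of random conv kernels."""
    rng = np.random.default_rng(seed)
    layers, c_in = [], 1
    for _ in range(num_layers):
        layers.append(rng.standard_normal((channels, c_in, kernel)) / np.sqrt(c_in * kernel))
        c_in = channels
    return layers

def activations(net, waveform):
    """Return the activation map of every layer for a mono waveform."""
    x = waveform[np.newaxis, :]
    acts = []
    for kernels in net:
        x = conv1d_relu(x, kernels)
        acts.append(x)
    return acts

def deep_feature_loss(nets, denoised, clean, layer_weights=None):
    """Sum over networks and layers of (weighted) mean absolute activation differences."""
    total = 0.0
    for net in nets:
        a_hat, a_ref = activations(net, denoised), activations(net, clean)
        weights = layer_weights or [1.0] * len(a_ref)
        for w, p, q in zip(weights, a_hat, a_ref):
            total += w * np.mean(np.abs(p - q))
    return float(total)

def waveform_loss(denoised, clean):
    """Baseline objective: mean absolute error on the raw waveform."""
    return float(np.mean(np.abs(denoised - clean)))
```

For example, with three random feature networks the loss is zero when the output matches the clean speech and positive when noise remains:

```python
rng = np.random.default_rng(1)
t = np.arange(1024) / 16000.0
clean = np.sin(2 * np.pi * 220 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
nets = [make_random_feature_net(seed=s) for s in range(3)]
print(deep_feature_loss(nets, clean, clean), deep_feature_loss(nets, noisy, clean) > 0)
```

In a trained setup the random kernels would be replaced by pretrained classifiers (for example, AudioSet-trained networks). The denoiser, not the feature networks, would be updated to reduce this loss.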