Speech Enhancement Examples

This is the companion webpage for the paper:

Zhiyao Duan, Gautham J. Mysore and Paris Smaragdis, Speech enhancement by online non-negative spectrogram decomposition in non-stationary noise environments, in Proc. Interspeech, 2012.


Overall comparisons

As described in the paper, we carried out experiments on the NOIZEUS speech dataset [1]. We collected the noise files by recording them ourselves or downloading them from the Internet.
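The page does not say how the noisy mixtures at each input SNR were generated. A common approach, assumed here, is to scale the noise so that the speech-to-noise power ratio equals the target SNR. This sketch uses plain mean power over the whole signal. Standards such as ITU-T P.56 measure an active speech level instead, and this simplification ignores that. The helper names `mix_at_snr` and `snr_db` are hypothetical.

```python
import math
import random


def power(x):
    """Mean power of a signal given as a list of floats."""
    return sum(v * v for v in x) / len(x)


def snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10.0 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_db):
    """Scale `noise` so that snr_db(speech, scaled_noise) == target_db.

    Assumes both signals have the same length and sampling rate.
    Returns (mixture, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # Noise power required for the requested SNR.
    target_noise_power = power(speech) / (10.0 ** (target_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for a clean utterance and a noise excerpt.
    speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
    for target in (-10, -5, 0, 5, 10):
        _, scaled = mix_at_snr(speech, noise, target)
        print(target, round(snr_db(speech, scaled), 6))
```

The same five input SNRs (-10 to 10 dB) appear in the tables on this page.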

For the proposed method, we varied the Dirichlet prior ramp length tau from 0 (no prior at all) to 20 (the prior is applied throughout all iterations). Here we show only the results for tau=10.
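The page does not define how the prior strength changes across iterations. One reading consistent with the description is assumed here. EM runs for 20 iterations, and the prior weight decays linearly to zero over the first tau of them. Under this reading, tau=0 never applies the prior and tau=20 applies it in every iteration. The function name `prior_weights`, the default `n_iter=20`, the linear shape and the base weight `alpha` are all hypothetical and are not the paper's specification.

```python
def prior_weights(tau, n_iter=20, alpha=1.0):
    """Hypothetical per-iteration weight of the Dirichlet prior.

    Assumed schedule: the weight starts at `alpha` and decays linearly to
    zero over the first `tau` EM iterations. Later iterations use no prior.
    tau=0 never applies the prior; tau=n_iter applies it in every iteration.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    if tau == 0:
        return [0.0] * n_iter
    return [alpha * max(0.0, 1.0 - k / tau) for k in range(n_iter)]


if __name__ == "__main__":
    for tau in (0, 1, 5, 10, 15, 20):
        w = prior_weights(tau)
        active = sum(1 for v in w if v > 0)
        print(f"tau={tau:2d}: prior active in {active} of {len(w)} iterations")
```

With this assumed schedule, the prior is active in exactly min(tau, n_iter) iterations, which matches the two endpoints described above.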

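We compare the proposed method with four conventional speech enhancement methods, one from each of four categories. All four are online algorithms, and we use P.C. Loizou's implementations [1] of them: multi-band spectral subtraction (MB) [2], Wiener filtering with a priori SNR estimation (Wiener-as) [3], the log-MMSE estimator (log-MMSE) [4], and the generalized subspace approach (KLT) [5].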
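To illustrate one of these baselines, this sketch combines the decision-directed a priori SNR estimate of Ephraim and Malah with a Wiener gain, which is the idea behind the Wiener-as method [3]. It is a minimal per-frame sketch and not Loizou's implementation. It works on power spectra only (no STFT, no overlap-add) and assumes the noise power spectrum is known. The values alpha=0.98 and xi_min=1e-3 are assumed. The function name `wiener_dd_gains` is hypothetical.

```python
def wiener_dd_gains(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Wiener gains per frame with a decision-directed a priori SNR estimate.

    noisy_power: frames of |Y(k)|^2 (list of lists, one list per frame).
    noise_power: noise PSD lambda_d(k) per bin, assumed known (for example,
                 averaged over noise-only frames).
    alpha, xi_min: smoothing factor and a priori SNR floor (assumed values).
    Returns one list of gains G(k) = xi / (1 + xi) per frame.
    """
    n_bins = len(noise_power)
    prev_clean = [0.0] * n_bins  # |X_hat(k)|^2 of the previous frame
    gains = []
    for frame in noisy_power:
        g_frame = []
        for k in range(n_bins):
            gamma = frame[k] / noise_power[k]    # a posteriori SNR
            ml = max(gamma - 1.0, 0.0)           # instantaneous estimate
            # Decision-directed: mix the previous clean estimate with ml.
            xi = alpha * prev_clean[k] / noise_power[k] + (1.0 - alpha) * ml
            xi = max(xi, xi_min)                 # floor on the a priori SNR
            g = xi / (1.0 + xi)                  # Wiener gain
            g_frame.append(g)
            prev_clean[k] = g * g * frame[k]     # |X_hat|^2 = G^2 |Y|^2
        gains.append(g_frame)
    return gains


if __name__ == "__main__":
    noise_psd = [1.0, 1.0, 1.0]
    # Five identical frames with a posteriori SNRs of 0, 10 and 20 dB.
    frames = [[1.0, 10.0, 100.0] for _ in range(5)]
    for g in wiener_dd_gains(frames, noise_psd):
        print([round(v, 3) for v in g])
```

Each frame reuses the previous clean-speech estimate. As a result, the gain in a steady high-SNR bin rises over a few frames, while a bin at 0 dB a posteriori SNR stays at the floor. This smoothing is how the decision-directed approach limits musical noise.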

We also compare with an offline spectrogram decomposition method, probabilistic latent component analysis (PLCA) [6].

All of the above methods train their noise models on the same noise-only excerpts, which do not appear in the test mixtures.
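The shared training protocol can be illustrated with a supervised spectrogram decomposition. A noise dictionary is learned from a noise-only excerpt and then held fixed. Speech components and all activations are then fitted to the noisy spectrogram, and speech is recovered with a soft mask. This sketch uses KL-divergence NMF with multiplicative updates, which is closely related to PLCA [6]. It is neither the PLCA code used for the tables nor the proposed online algorithm. The function names `kl_nmf` and `separate`, the number of speech components, the iteration count and the toy 4-bin data are all assumptions for illustration.

```python
import random

EPS = 1e-12  # guards against division by zero


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def kl_nmf(V, n_free, n_iter=200, W_fixed=None, seed=0):
    """KL-divergence NMF, V ~= W H, with optional fixed dictionary columns.

    V: F x T non-negative magnitude spectrogram (list of lists).
    n_free: number of dictionary columns learned from V.
    W_fixed: F x K0 columns kept fixed (e.g. a pre-trained noise dictionary);
             they occupy the first K0 columns of W.
    Returns (W, H) with W of size F x (K0 + n_free).
    """
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    k0 = len(W_fixed[0]) if W_fixed else 0
    K = k0 + n_free
    W = [list(W_fixed[f]) if W_fixed else [] for f in range(F)]
    for row in W:
        row.extend(rng.random() + 0.1 for _ in range(n_free))
    H = [[rng.random() + 0.1 for _ in range(T)] for _ in range(K)]

    def ratio():
        WH = matmul(W, H)
        return [[V[f][t] / (WH[f][t] + EPS) for t in range(T)] for f in range(F)]

    for _ in range(n_iter):
        # Multiplicative update of all activations.
        R = ratio()
        for k in range(K):
            denom = sum(W[f][k] for f in range(F)) + EPS
            for t in range(T):
                H[k][t] *= sum(W[f][k] * R[f][t] for f in range(F)) / denom
        # Multiplicative update of the free (speech) columns only.
        R = ratio()
        for k in range(k0, K):
            denom = sum(H[k]) + EPS
            for f in range(F):
                W[f][k] *= sum(H[k][t] * R[f][t] for t in range(T)) / denom
    return W, H


def separate(V, W_noise, n_speech=2, n_iter=200):
    """Speech magnitude estimate via a soft mask from a fixed noise dictionary."""
    k0 = len(W_noise[0])
    W, H = kl_nmf(V, n_speech, n_iter, W_fixed=W_noise)
    WH = matmul(W, H)
    S = matmul([row[k0:] for row in W], H[k0:])  # speech part of the model
    return [[V[f][t] * S[f][t] / (WH[f][t] + EPS) for t in range(len(V[0]))]
            for f in range(len(V))]


if __name__ == "__main__":
    # Toy 4-bin spectrograms: the noise occupies bins 0-1, the speech bins 2-3.
    noise_train = [[1, 2, 0.5], [1, 2, 0.5], [0, 0, 0], [0, 0, 0]]
    W_noise, _ = kl_nmf(noise_train, n_free=1)
    mixture = [[1, 2, 1, 3], [1, 2, 1, 3], [0, 4, 2, 1], [0, 2, 1, 0.5]]
    for row in separate(mixture, W_noise):
        print([round(v, 2) for v in row])
```

In the toy data, the fixed noise column is exactly zero in bins 2-3, so the mask passes those bins unchanged. In bins 0-1, it keeps only the part that the model attributes to speech.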

We evaluate the speech enhancement results using two metrics: Perceptual Evaluation of Speech Quality (PESQ) [7] and Signal-to-Distortion Ratio (SDR) [8].

Noise: computer keyboard (training excerpt). Each result cell gives PESQ/SDR (SDR in dB). Online methods: Proposed (tau=10), MB, Wiener-as, log-MMSE and KLT; offline method: PLCA.
Input SNR (dB) | Noisy speech | Clean speech | Proposed (tau=10) | MB | Wiener-as | log-MMSE | KLT | PLCA
-10 | WAV | WAV | WAV (1.65/1.13) | WAV (1.17/-5.52) | WAV (0.80/-7.42) | WAV (0.88/-7.14) | WAV (0.75/-7.31) | WAV (1.40/0.67)
-5 | WAV | WAV | WAV (1.92/5.35) | WAV (1.42/-1.34) | WAV (1.06/-3.01) | WAV (1.14/-2.71) | WAV (0.96/-2.93) | WAV (2.05/5.03)
0 | WAV | WAV | WAV (2.14/9.62) | WAV (1.41/1.82) | WAV (1.03/0.27) | WAV (1.13/0.70) | WAV (0.93/0.18) | WAV (2.20/9.52)
5 | WAV | WAV | WAV (2.39/11.38) | WAV (1.77/6.40) | WAV (1.41/5.24) | WAV (1.57/5.67) | WAV (1.25/5.11) | WAV (2.50/9.85)
10 | WAV | WAV | WAV (2.70/12.37) | WAV (2.14/10.93) | WAV (1.83/10.26) | WAV (1.96/10.77) | WAV (1.72/10.09) | WAV (3.01/10.94)

Noise: casino (training excerpt). Each result cell gives PESQ/SDR (SDR in dB). Online methods: Proposed (tau=10), MB, Wiener-as, log-MMSE and KLT; offline method: PLCA.
Input SNR (dB) | Noisy speech | Clean speech | Proposed (tau=10) | MB | Wiener-as | log-MMSE | KLT | PLCA
-10 | WAV | WAV | WAV (1.23/-6.79) | WAV (0.80/-9.95) | WAV (0.74/-10.10) | WAV (0.78/-10.08) | WAV (0.76/-10.14) | WAV (2.09/-8.37)
-5 | WAV | WAV | WAV (1.62/0.33) | WAV (1.39/-5.84) | WAV (1.40/-4.89) | WAV (1.47/-4.28) | WAV (1.40/-4.07) | WAV (1.61/-1.12)
0 | WAV | WAV | WAV (1.51/3.81) | WAV (1.48/0.04) | WAV (1.50/0.77) | WAV (1.54/1.73) | WAV (1.39/1.56) | WAV (1.49/3.80)
5 | WAV | WAV | WAV (1.88/6.33) | WAV (1.80/5.60) | WAV (1.84/6.53) | WAV (1.89/6.98) | WAV (1.64/6.84) | WAV (1.88/6.27)
10 | WAV | WAV | WAV (1.88/5.60) | WAV (2.38/11.95) | WAV (2.23/12.74) | WAV (2.43/13.10) | WAV (2.44/13.85) | WAV (2.06/8.49)

Noise reduction vs. speech distortion

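Here we show the tradeoff between noise reduction and speech distortion that is controlled by the Dirichlet prior ramp length tau. In the table below, a larger tau generally gives a higher SIR (more noise reduction). At input SNRs of 5 and 10 dB, however, every tau > 0 gives a lower SAR (more artifacts) than tau=0.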

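We use three measures from [8]: Signal-to-Distortion Ratio (SDR), Signal-to-Interference Ratio (SIR), and Signal-to-Artifacts Ratio (SAR).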
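The three measures can be illustrated with a simplified version of the decomposition in [8], which splits the estimate into three parts. The target part is the estimate's projection onto the clean speech. The interference part is what the projection onto the span of speech and noise adds beyond that. The artifact part is the remainder. Higher SIR indicates stronger noise reduction, and higher SAR indicates fewer artifacts. [8] defines this decomposition for several kinds of allowed distortion. This sketch uses the simplest, a time-invariant gain, whereas commonly used toolbox versions allow a time-invariant FIR filter, so its numbers are not comparable with the tables. The function name `bss_eval_gain_only` is hypothetical, and the code assumes speech and noise are not collinear.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bss_eval_gain_only(estimate, speech, noise):
    """Return (SDR, SIR, SAR) in dB, allowing only a time-invariant gain.

    Assumes speech and noise are not collinear and that the interference
    and artifact parts are non-zero (otherwise a ratio is infinite).
    """
    # Target part: projection of the estimate onto the clean speech.
    a = dot(estimate, speech) / dot(speech, speech)
    s_target = [a * s for s in speech]
    # Projection onto span{speech, noise} via the 2x2 normal equations.
    g11, g12, g22 = dot(speech, speech), dot(speech, noise), dot(noise, noise)
    b1, b2 = dot(estimate, speech), dot(estimate, noise)
    det = g11 * g22 - g12 * g12
    c1 = (b1 * g22 - b2 * g12) / det
    c2 = (b2 * g11 - b1 * g12) / det
    p_all = [c1 * s + c2 * n for s, n in zip(speech, noise)]
    e_interf = [p - t for p, t in zip(p_all, s_target)]   # residual noise
    e_artif = [e - p for e, p in zip(estimate, p_all)]     # everything else

    def ratio_db(num, den):
        return 10.0 * math.log10(dot(num, num) / dot(den, den))

    sdr = ratio_db(s_target, [i + r for i, r in zip(e_interf, e_artif)])
    sir = ratio_db(s_target, e_interf)
    sar = ratio_db([t + i for t, i in zip(s_target, e_interf)], e_artif)
    return sdr, sir, sar


if __name__ == "__main__":
    speech = [1.0, 1.0, 1.0, 1.0]
    noise = [1.0, -1.0, 1.0, -1.0]
    artifact = [1.0, 1.0, -1.0, -1.0]  # orthogonal to both speech and noise
    estimate = [s + 0.1 * n + 0.01 * r for s, n, r in zip(speech, noise, artifact)]
    print(["%.2f" % v for v in bss_eval_gain_only(estimate, speech, noise)])
```

With these mutually orthogonal toy vectors, the interference is 0.1 times the noise, so SIR is 20 dB. The small artifact term mainly affects SAR.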

Noise: birds (training excerpt). Proposed method with different prior ramp lengths tau; each result cell gives SDR/SIR/SAR in dB.
Input SNR (dB) | Noisy speech | Clean speech | tau=0 | tau=1 | tau=5 | tau=10 | tau=15 | tau=20
-10 | WAV | WAV | WAV (-2.55/1.79/1.65) | WAV (-1.68/5.21/0.46) | WAV (0.31/9.13/1.43) | WAV (0.48/10.12/1.39) | WAV (1.14/12.06/1.77) | WAV (1.65/13.06/2.18)
-5 | WAV | WAV | WAV (1.31/4.86/5.08) | WAV (5.27/13.21/6.23) | WAV (6.52/15.55/7.22) | WAV (5.20/15.97/5.69) | WAV (5.35/17.57/5.70) | WAV (5.08/18.13/5.37)
0 | WAV | WAV | WAV (7.72/12.64/9.64) | WAV (9.83/21.83/10.14) | WAV (9.26/22.73/9.48) | WAV (8.51/22.34/8.72) | WAV (9.87/23.29/10.09) | WAV (9.69/23.46/9.89)
5 | WAV | WAV | WAV (10.69/15.23/12.71) | WAV (8.89/18.78/9.42) | WAV (9.00/24.54/9.14) | WAV (8.53/24.13/8.67) | WAV (7.93/24.88/8.03) | WAV (8.82/25.27/8.93)
10 | WAV | WAV | WAV (15.14/20.57/16.65) | WAV (14.15/30.17/14.26) | WAV (13.52/31.26/13.59) | WAV (13.45/31.01/13.53) | WAV (12.58/32.61/12.62) | WAV (12.84/31.66/12.90)

References

[1] Loizou, P.C., Speech Enhancement: Theory and Practice, Taylor and Francis, 2007.
[2] Kamath, S. and Loizou, P.C., "A multi-band spectral subtraction method for enhancing speech corrupted by colored noise," in Student Research Abstracts of Proc. ICASSP, 2002.
[3] Scalart, P. and Filho, J., "Speech enhancement based on a priori signal to noise estimation," in Proc. ICASSP, pp. 629-632, 1996.
[4] Ephraim, Y. and Malah, D., "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoust. Speech Signal Process., 33:443-445, 1985.
[5] Hu, Y. and Loizou, P.C., "A generalized subspace approach for enhancing speech corrupted by colored noise," IEEE Trans. Speech Audio Process., pp. 334-341, 2003.
[6] Smaragdis, P., Raj, B. and Shashanka, M., "A probabilistic latent variable model for acoustic modeling," in Workshop on Advances in Models for Acoustic Processing, NIPS, 2006.
[7] Rix, A., Beerends, J., Hollier, M. and Hekstra, A., "Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs," in Proc. ICASSP, pp. 749-752, 2001.
[8] Vincent, E., Févotte, C. and Gribonval, R., "Performance measurement in blind audio source separation," IEEE Trans. Audio Speech Lang. Process., 14(4):1462-1469, 2006.


For any questions or comments, please contact us at zhiyaoduan00 AT gmail DOT com.