DEVELOPMENT OF A NOISE CLEANING METHOD FOR SPEECH SIGNALS TO IMPROVE THE QUALITY OF BIOMETRIC VOICE VERIFICATION
Abstract
Speaker verification systems have recently come into wide use in a broad range of information systems. This method of identity verification is extremely convenient, since only a microphone, available by default in most electronic devices, is needed to register speech samples. However, the performance of such systems is significantly degraded when the speech sample is recorded in a noisy environment. In this paper, a new speech enhancement model based on recurrent neural networks is proposed and tested on the speaker verification task. On the DNS Challenge 2020 dataset, the developed approach demonstrated the best noise removal quality in comparison with alternative approaches, which made it possible to significantly reduce the error rate of the speaker verification system on the VoxCeleb1 test set.
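To make the general idea concrete, the following is a minimal, hypothetical sketch (in PyTorch) of a recurrent mask-based denoiser of the kind discussed in the abstract: an LSTM estimates a time-frequency mask over the STFT magnitude of the noisy signal, and the masked spectrogram is inverted using the noisy phase. The class name, layer sizes, and STFT parameters are illustrative assumptions and do not reproduce the model proposed in the paper.

# Illustrative sketch (not the authors' implementation): a minimal recurrent
# mask-estimation denoiser. A noisy waveform is converted to an STFT magnitude
# spectrogram, an LSTM predicts a [0, 1] mask per time-frequency bin, and the
# masked spectrogram is inverted back to a waveform with the noisy phase.
import torch
import torch.nn as nn

class LstmMaskDenoiser(nn.Module):
    def __init__(self, n_fft: int = 512, hop: int = 128, hidden: int = 256):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        n_bins = n_fft // 2 + 1
        self.rnn = nn.LSTM(n_bins, hidden, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) -> complex STFT: (batch, bins, frames)
        window = torch.hann_window(self.n_fft, device=wav.device)
        spec = torch.stft(wav, self.n_fft, self.hop, window=window,
                          return_complex=True)
        mag = spec.abs().transpose(1, 2)          # (batch, frames, bins)
        hidden, _ = self.rnn(mag)
        mask = self.mask(hidden).transpose(1, 2)  # (batch, bins, frames)
        enhanced = spec * mask                    # keep the noisy phase
        return torch.istft(enhanced, self.n_fft, self.hop, window=window,
                           length=wav.shape[-1])

# Usage: denoise a one-second batch of 16 kHz audio.
model = LstmMaskDenoiser()
noisy = torch.randn(2, 16000)
clean_estimate = model(noisy)
print(clean_estimate.shape)  # torch.Size([2, 16000])

In practice such a model would be trained on pairs of clean and artificially noised utterances, and the enhanced output would then be passed to the speaker verification back end.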
References
Loizou P.C. Speech Enhancement: Theory and Practice. – Boca Raton, FL, USA: CRC Press, 2007. – 716 p.
Nikolenko S.I., Kadurin A.A., Arkhangelskaya E.O. Deep Learning [Glubokoe obuchenie]. – Moscow, St. Petersburg: Piter, 2018. – 480 p. (in Russian).
Williamson D.S., Wang Y., Wang D. Complex ratio masking for monaural speech separation // IEEE/ACM Transactions on Audio, Speech, and Language Processing. – 2016. – Vol. 24, No. 3. – P. 483-492.
Nasretdinov R.S., Ilyashenko I.D., Lependin A.A. Two-stage method of speech denoising by long short-term memory neural network // 11th International Conference on High-Performance Computing Systems and Technologies in Scientific Research, Automation of Control and Production, HPCST 2021, Barnaul, 21-22 May 2021. CCIS, Vol. 1526. – Springer, 2022. – P. 86-97.
Reddy C.K.A., Gopal V., Cutler R., Beyrami E., Cheng R., Dubey H., Matusevych S., Aichner R., Aazami A., Braun S., Rana P., Srinivasan S., Gehrke J. The INTERSPEECH 2020 Deep Noise Suppression Challenge: datasets, subjective testing framework, and challenge results // Proc. Interspeech 2020. – 2020. – P. 2492-2496.
Nagrani A., Chung J.S., Zisserman A. VoxCeleb: a large-scale speaker identification dataset // Proc. Interspeech 2017. – 2017. – P. 2616-2620.
Rix A.W., Beerends J.G., Hollier M.P., Hekstra A.P. Perceptual evaluation of speech quality (PESQ) – a new method for speech quality assessment of telephone networks and codecs // 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. – 2001. – P. 749-752.
Taal C.H., Hendriks R.C., Heusdens R., Jensen J. A short-time objective intelligibility measure for time-frequency weighted noisy speech // 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. – 2010. – P. 4214-4217.
Roux J.L., Wisdom S., Erdogan H., Hershey J.R. SDR – half-baked or well done? // ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). – 2019. – P. 626-630.
Braun S., Tashev I. Data augmentation and loss normalization for deep noise suppression // 22nd International Conference on Speech and Computer (SPECOM). LNAI 12335. – Springer, 2020. – P. 79–86.
Hao X., Su X., Horaud R., Li X. FullSubNet: a full-band and sub-band fusion model for real-time single-channel speech enhancement // IEEE International Conference on Acoustics, Speech, and Signal Processing. – 2021. – P. 1-5.
Hu J., Shen L., Sun G. Squeeze-and-excitation networks // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. – 2018. – P. 7132-7141.