EFFICIENT DATA AUGMENTATION FOR TRAINING A VOICE VERIFICATION SYSTEM RESISTANT TO SPEECH DISTORTION
Abstract
This paper presents a new speech data augmentation method for effective training of voice verification systems. The method expands the set of audio signal transformations with a speech quality improvement (speech enhancement) step applied to distorted audio signals. This covers the main ways modern verification systems are used, both with and without preprocessing of the recorded speech signal. The proposed technique was tested on VoxCeleb1 voice recordings, with noise and impulse responses taken from the DNS Challenge 2023 set. The FastResNet34 architecture served as the neural network for testing. Training on the expanded augmented data set, which contains both artificially distorted and distortion-free speech samples, gave a significant increase in verification quality in all major usage scenarios of the verification system.
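The augmentation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`add_noise`, `add_reverb`, `augment`), the default signal-to-noise ratio, and the `enhance` callable are all assumptions made for illustration. `enhance` stands in for whatever speech enhancement method is applied to the distorted signal; the abstract does not specify one.

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix noise into speech at the requested signal-to-noise ratio (dB)."""
    noise = np.resize(noise, speech.shape)  # repeat or trim noise to the speech length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def add_reverb(speech, impulse_response):
    """Convolve speech with an impulse response, keeping the original length and peak level."""
    wet = np.convolve(speech, impulse_response)[: len(speech)]
    peak = np.max(np.abs(wet)) + 1e-12
    return wet * (np.max(np.abs(speech)) / peak)

def augment(speech, noise, impulse_response, enhance, snr_db=10.0):
    """Return three training views of one utterance.

    `enhance` is a hypothetical placeholder: any callable that maps a
    distorted waveform to an enhanced waveform of the same length.
    """
    distorted = add_noise(add_reverb(speech, impulse_response), noise, snr_db)
    return {
        "clean": speech,                  # original recording
        "distorted": distorted,           # reverberation plus additive noise
        "enhanced": enhance(distorted),   # distorted signal after enhancement
    }
```

Drawing training batches from all three views exposes the verification model both to raw distorted speech and to speech that has passed through a preprocessing stage, which matches the two usage modes named in the abstract.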