USE OF SCATTERING TRANSFORM ON DISCRETE WAVELET DECOMPOSITION COEFFICIENTS FOR BIOMETRIC SPEAKER VERIFICATION
Abstract
In this paper, the authors propose a new approach to computing speech signal features for speaker verification. A multilevel transform was applied to the signal, computing scattering coefficients based on a discrete wavelet decomposition. The resulting feature vectors were used as input to a time-delay neural network. From these vectors, the network computed speaker identity vectors, which were used directly for biometric verification. The proposed approach was tested on data from the VoxCeleb1 and VoxCeleb2 voice datasets. Its effectiveness was demonstrated in comparison with existing verification methods based on deep neural networks.
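The feature computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet, the number of levels, the mean-absolute-value averaging, and all function names (`haar_dwt`, `detail_bands`, `scattering_features`) are assumptions chosen only to show the general idea of first- and second-order scattering-style coefficients built on a discrete wavelet decomposition.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    n = len(signal) - len(signal) % 2  # ignore a trailing odd sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, n, 2)]
    return approx, detail


def detail_bands(signal, levels):
    """Detail coefficients of each decomposition level, finest first."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:  # nothing left to decompose
            break
        approx, detail = haar_dwt(approx)
        bands.append(detail)
    return bands


def mean_abs(values):
    """Average of absolute values (the low-pass step, simplified)."""
    return sum(abs(v) for v in values) / len(values) if values else 0.0


def scattering_features(frame, levels=3):
    """First- and second-order scattering-style features of one frame.

    Assumed scheme: take the modulus of each detail band, average it
    (first order), then decompose the modulus again and average the
    resulting detail bands (second order).
    """
    first, second = [], []
    for band in detail_bands(frame, levels):
        envelope = [abs(c) for c in band]
        first.append(mean_abs(envelope))
        for sub_band in detail_bands(envelope, levels):
            second.append(mean_abs(sub_band))
    return first + second
```

Calling `scattering_features(frame)` on each short frame of a speech signal would give one feature vector per frame; a sequence of such vectors is the kind of input a time-delay neural network consumes.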
Article Details
How to Cite
1. Lependin A., Gaponov D., Filin Y., Ladygin P. USE OF SCATTERING TRANSFORM ON DISCRETE WAVELET DECOMPOSITION COEFFICIENTS FOR BIOMETRIC SPEAKER VERIFICATION // ПРОБЛЕМЫ ПРАВОВОЙ И ТЕХНИЧЕСКОЙ ЗАЩИТЫ ИНФОРМАЦИИ, 2020. № 8. P. 35-41. URL: http://journal.asu.ru/ptzi/article/view/13934.
Section
Problems of Technical Support of Information Security