USE OF SCATTERING TRANSFORM ON DISCRETE WAVELET DECOMPOSITION COEFFICIENTS FOR BIOMETRIC SPEAKER VERIFICATION

A.A. Lependin Email: andrey.lependin@gmail.com
D.A. Gaponov
Y.A. Filin
P.S. Ladygin

Abstract

In this paper, the authors propose a new approach to computing speech signal features for speaker verification. A multilevel transform was applied to the signal, yielding scattering coefficients computed from a discrete wavelet decomposition. The resulting feature vectors were used as input to a time-delay neural network, which computed speaker identity vectors that were used directly for biometric verification. The proposed approach was tested on data from the VoxCeleb1 and VoxCeleb2 voice sample sets. Its effectiveness was shown in comparison with existing verification methods based on deep neural networks.
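
The feature extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet, the number of levels, the frame length and all function names (haar_step, haar_details, scattering_features) are assumptions made for illustration only. Each frame is decomposed with a discrete wavelet transform, the modulus of the detail coefficients is taken, the result is decomposed again, and every band is averaged over time.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    n = len(x) - len(x) % 2  # drop a trailing odd sample, if any
    approx = [(x[i] + x[i + 1]) / s for i in range(0, n, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, n, 2)]
    return approx, detail


def haar_details(x, levels):
    """Detail bands of a multilevel Haar decomposition, finest scale first."""
    bands = []
    approx = list(x)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        bands.append(detail)
    return bands


def mean_abs(band):
    """Time average of the modulus of one coefficient band."""
    return sum(abs(v) for v in band) / len(band)


def scattering_features(frame, levels=3):
    """Scattering-like features of one frame (illustrative assumption).

    For every first-layer detail band the function emits its averaged
    modulus (first order), followed by the averaged moduli of the detail
    bands obtained by decomposing that modulus again (second order).
    """
    feats = []
    for d1 in haar_details(frame, levels):
        feats.append(mean_abs(d1))          # first-order coefficient
        u1 = [abs(v) for v in d1]           # modulus nonlinearity
        for d2 in haar_details(u1, levels):
            feats.append(mean_abs(d2))      # second-order coefficient
    return feats


if __name__ == "__main__":
    # Hypothetical 64-sample frame: a slow sine plus a fast alternating component.
    frame = [math.sin(2 * math.pi * 3 * t / 64) + 0.2 * (-1) ** t for t in range(64)]
    print(scattering_features(frame))
```

In a pipeline of the kind the abstract outlines, vectors like these would be computed frame by frame and stacked into a sequence that serves as the input to the time-delay neural network; the network and the verification scoring are outside the scope of this sketch.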

Article Details

How to Cite
1. Lependin A., Gaponov D., Filin Y., Ladygin P. USE OF SCATTERING TRANSFORM ON DISCRETE WAVELET DECOMPOSITION COEFFICIENTS FOR BIOMETRIC SPEAKER VERIFICATION // ПРОБЛЕМЫ ПРАВОВОЙ И ТЕХНИЧЕСКОЙ ЗАЩИТЫ ИНФОРМАЦИИ, 2020. № 8. P. 35-41. URL: http://journal.asu.ru/ptzi/article/view/13934.
Section
Problems of Technical Support of Information Security
