DETECTION OF PHYSICAL SPEECH SPOOFING ATTACKS USING LIGHT CONVOLUTION NEURAL NETWORK WITH GRAPH ATTENTION LAYER
UDC 004.934

Aleksandr S. Beloslyudov
Andrey A. Lependin Email: andrey.lependin@gmail.com
Jacob A. Filin

Abstract

This paper proposes a model that modifies the LCNN convolutional neural network by adding graph attention layers. The model effectively detects physical attacks on speech data. Detecting speech spoofing is a relevant and significant problem given the growing interest in voice technologies and the security threat posed by forged or altered audio data. The proposed approach was implemented in Python using the PyTorch library. The model was trained and tested on the ASVspoof 2019 dataset. The number of attention heads in the graph attention layer was selected. The selected version of the model was compared with the baseline LCNN network in terms of accuracy and equal error rate (EER). The modified approach proved superior to the baseline both in the quality of spoofed speech detection and in the number of model parameters.
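The equal error rate used for the comparison above is the operating point at which the false acceptance rate (spoofed speech accepted) equals the false rejection rate (bona fide speech rejected). The authors' evaluation code is not reproduced here; the following is only a minimal pure-Python sketch. The function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'bona fide'.

    Illustrative sketch only: scans every observed score as a candidate
    threshold and keeps the one where FAR and FRR are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, best_eer, best_thr = None, None, None
    for thr in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Hypothetical toy scores: one bona fide trial (0.4) and one spoofed
# trial (0.6) fall on the wrong side of any reasonable threshold.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2f} at threshold {threshold}")  # EER = 0.25 at threshold 0.6
```

A lower EER indicates a better countermeasure. On large score sets, a sorted sweep or an ROC-based routine would be more efficient than this quadratic scan.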

Article Details

How to Cite
1. Beloslyudov A. S., Lependin A. A., Filin J. A. DETECTION OF PHYSICAL SPEECH SPOOFING ATTACKS USING LIGHT CONVOLUTION NEURAL NETWORK WITH GRAPH ATTENTION LAYER // ПРОБЛЕМЫ ПРАВОВОЙ И ТЕХНИЧЕСКОЙ ЗАЩИТЫ ИНФОРМАЦИИ, 2023. № 11. P. 8-15. URL: http://journal.asu.ru/ptzi/article/view/14174.
Section
Problems of Technical Support for Information Security
