USING OF DIGITAL VOICE SIGNAL PROCESSING TO IMPROVE SPEECH RECOGNITION

A.A. Dmitriev (А.А. Дмитриев) Email: dmitriev@asu.ru
D.A. Dmitriev (Д.А. Дмитриев) Email: dmitriev.d.a@vc.asu.ru

Abstract

The paper proposes a method for pre-processing voice recordings received over a telephone line during a conversation between a user and a voice assistant. Speech recognition was performed with a hardware-software system built on the Kaldi toolkit. It is shown that the received voice signals can be distorted by noise introduced by the operation of telephone network equipment. Therefore, to recognize the words in the recorded speech reliably, the signal was first filtered with a band-pass filter. Digital filtering improved the quality of the recordings and reduced the recognition error for individual words in the recorded signals.
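The abstract does not state the filter design or its cut-off frequencies. As a minimal sketch, assuming narrowband telephone audio sampled at 8 kHz and the conventional 300-3400 Hz telephone voice band as the passband (both are assumptions, not values taken from the paper), a band-pass filter can be built as a second-order high-pass section followed by a second-order low-pass section, using the biquad formulas from R. Bristow-Johnson's Audio EQ Cookbook. The helper names (`biquad_coeffs`, `apply_biquad`, `bandpass_telephone`, `tone`, `steady_state_gain`) are hypothetical, and the authors' actual filter may differ in type, order and implementation.

```python
import math
from typing import List, Sequence, Tuple

# Biquad coefficients (b0, b1, b2, a1, a2), already divided by a0.
Coeffs = Tuple[float, float, float, float, float]

BUTTERWORTH_Q = 1 / math.sqrt(2)  # maximally flat (Butterworth) second-order section


def biquad_coeffs(kind: str, f0: float, fs: float, q: float = BUTTERWORTH_Q) -> Coeffs:
    """Low-pass or high-pass biquad from the RBJ Audio EQ Cookbook formulas."""
    w0 = 2 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b0 = b2 = (1 - cos_w0) / 2
        b1 = 1 - cos_w0
    elif kind == "highpass":
        b0 = b2 = (1 + cos_w0) / 2
        b1 = -(1 + cos_w0)
    else:
        raise ValueError(f"unsupported filter kind: {kind!r}")
    a0 = 1 + alpha
    return (b0 / a0, b1 / a0, b2 / a0, -2 * cos_w0 / a0, (1 - alpha) / a0)


def apply_biquad(x: Sequence[float], c: Coeffs) -> List[float]:
    """Run the Direct Form I difference equation over the samples x."""
    b0, b1, b2, a1, a2 = c
    x1 = x2 = y1 = y2 = 0.0  # previous inputs/outputs, zero initial state
    y: List[float] = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def bandpass_telephone(
    x: Sequence[float],
    fs: float = 8000.0,  # assumed narrowband telephony sample rate
    f_low: float = 300.0,  # assumed lower edge of the voice band
    f_high: float = 3400.0,  # assumed upper edge of the voice band
) -> List[float]:
    """Hypothetical helper: high-pass at f_low, then low-pass at f_high."""
    if not 0 < f_low < f_high < fs / 2:
        raise ValueError("need 0 < f_low < f_high < fs/2")
    hp = biquad_coeffs("highpass", f_low, fs)
    lp = biquad_coeffs("lowpass", f_high, fs)
    return apply_biquad(apply_biquad(x, hp), lp)


def tone(freq: float, fs: float = 8000.0, n: int = 8000) -> List[float]:
    """Unit-amplitude sine test signal (one second at 8 kHz by default)."""
    return [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]


def steady_state_gain(x: Sequence[float], y: Sequence[float]) -> float:
    """RMS(y) / RMS(x) over the second half, i.e. after the start-up transient."""
    half = len(x) // 2

    def rms(s: Sequence[float]) -> float:
        return math.sqrt(sum(v * v for v in s) / len(s))

    return rms(y[half:]) / rms(x[half:])


if __name__ == "__main__":
    # Mains hum, mid-band speech frequency, and a tone close to Nyquist.
    for f in (50.0, 1000.0, 3900.0):
        x = tone(f)
        print(f"{f:6.0f} Hz: gain {steady_state_gain(x, bandpass_telephone(x)):.3f}")
```

Running the script should show the 1 kHz tone passing almost unchanged, while the 50 Hz hum and the 3.9 kHz tone are reduced to a few percent of their amplitude. Hand-written biquads keep the sketch dependency-free; where SciPy is available, `scipy.signal.butter(..., btype="bandpass", fs=..., output="sos")` combined with `scipy.signal.sosfilt` is the more usual choice and makes it easy to try higher filter orders.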

How to Cite
Dmitriev A., Dmitriev D. USING OF DIGITAL VOICE SIGNAL PROCESSING TO IMPROVE SPEECH RECOGNITION // ПРОБЛЕМЫ ПРАВОВОЙ И ТЕХНИЧЕСКОЙ ЗАЩИТЫ ИНФОРМАЦИИ (Problems of Legal and Technical Information Protection), 2023. No. 10. P. 4-8. URL: http://journal.asu.ru/ptzi/article/view/13171.
Section
Problems of Technical Support for Information Security