With the rapid development of artificial intelligence, voiceprint recognition systems have come into wide use, especially in the financial sector. However, voice spoofing attacks can easily deceive some apps such as WeChat and Alipay, which has raised public concern.
Is there a countermeasure technology that can protect the security and privacy of our voices? In the “Huawei Cup”, the first China Postgraduate Cyber Security Innovation Contest, a student team from Wuhan University presented several new ideas.
The team was composed of Liu Wuyang, Deng Junlong, Peng Li, and Zhu Hongcheng, four postgraduate students from the School of Cyber Science and Engineering. Under the guidance of Prof. Ren Yanzhen, they launched a project called “A voice spoofing detection system based on signal correlation at the micro level” and won first prize in the creative work competition.
Team members Liu Wuyang, Deng Junlong, Peng Li and Zhu Hongcheng
How did they come up with this system? Liu Wuyang, the team leader, explained, “Our team has long been researching steganalysis of speech signals. We found that when spoofed voices are synthesized, little attention is paid to how closely they resemble authentic voices in their signals and features. Therefore, the spoofed voice might have little in common with the authentic voice at the micro level. Our early experiments confirmed this presumption, so we continued with it.”
To overcome the weaknesses of current voiceprint recognition systems, the team built this voice spoofing detection system based on signal correlation at the micro level. Acting as a pre-module of the voiceprint recognition system, the detection system extracts micro-level correlation properties from different spectrograms of the voice, then uses a Vision Transformer (ViT) to model them. In this way, the authenticity of the voice can be checked before voiceprint recognition takes place.
“If we imagine the spectrogram as a painting, this detection system is like a magnifying glass. Strokes of authentic voice are consistent, while strokes of spoofed voice look choppy under this ‘magnifying glass’,” Liu Wuyang explained.
The core technology of the voice spoofing detection system is the extraction of signal correlation at the micro level. The team first transformed the audio into a spectrogram, then used a multiple linear regression model to fit the intraframe and interframe correlation of the spectrogram. They found that the correlation within the spectrogram is higher in the horizontal and vertical orientations, so they designed four different convolution kernels. “We can regard these convolution kernels as orientations of the ‘magnifying glass’, guiding it to search for features horizontally and vertically,” Liu Wuyang explained.
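The idea of fitting intraframe and interframe correlation with a multiple linear regression can be illustrated with a minimal sketch. It is not the team's implementation: the function names, the window settings, and the choice of four neighbours (previous and next frame, lower and higher frequency bin) are assumptions made for illustration, and the team's four convolution kernels are not described in enough detail here to reproduce. The sketch predicts each spectrogram bin from its horizontal and vertical neighbours and keeps the residual, which is the kind of micro-level signal a later classifier could inspect.

```python
import numpy as np

def log_spectrogram(signal, win_len=256, hop=128):
    """Log-magnitude spectrogram with shape (freq_bins, frames)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=0)))

def neighbour_regression(spec):
    """Least-squares fit of each bin from its four neighbours (assumed layout).

    Returns the five regression weights (four neighbours plus intercept)
    and the residual map over the interior of the spectrogram.
    """
    centre = spec[1:-1, 1:-1]
    neighbours = [
        spec[1:-1, :-2],  # previous frame: interframe, horizontal
        spec[1:-1, 2:],   # next frame: interframe, horizontal
        spec[:-2, 1:-1],  # lower frequency bin: intraframe, vertical
        spec[2:, 1:-1],   # higher frequency bin: intraframe, vertical
    ]
    X = np.stack([n.ravel() for n in neighbours] + [np.ones(centre.size)], axis=1)
    y = centre.ravel()
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = (y - X @ weights).reshape(centre.shape)
    return weights, residual

# Synthetic stand-in for a voice recording: a tone plus noise.
rng = np.random.default_rng(0)
t = np.arange(4000) / 16000
audio = np.sin(2 * np.pi * 300 * t) + 0.1 * rng.standard_normal(t.size)

spec = log_spectrogram(audio)
weights, residual = neighbour_regression(spec)
```

Under these assumptions, a large or oddly structured residual would indicate bins that their neighbours predict poorly. Whether that separates spoofed from authentic speech is exactly what the team's experiments address, and this sketch makes no claim about it.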
The principle of the voice spoofing detection system
Compared with other algorithms, what advantages does this “Made In WHU” detection system have? As Liu Wuyang explained, the system borrows the idea of steganalysis to create a voice spoofing detection method based on signal correlation. By analyzing the intraframe and interframe correlation of the signal, it can judge whether a voice is authentic. Moreover, its error rate is below five percent, which is better than almost all other current algorithms. In addition, because spectrograms computed with different window lengths capture different features, the system uses multiscale spectrograms to analyze frames of different lengths, which further improves detection accuracy.
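The multiscale idea can be sketched in a few lines. The window lengths (128, 256, and 512 samples), the 50 percent hop, and the function names below are hypothetical choices for illustration, not values reported by the team. A short window gives more frames and finer time detail, while a long window gives more frequency bins and finer spectral detail.

```python
import numpy as np

def magnitude_spectrogram(signal, win_len, hop):
    """Magnitude spectrogram with shape (win_len // 2 + 1, frames)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def multiscale_spectrograms(signal, win_lens=(128, 256, 512)):
    """One spectrogram per window length (assumed scales), hop = half a window."""
    return {w: magnitude_spectrogram(signal, w, w // 2) for w in win_lens}

# One second of a 440 Hz tone at 16 kHz as a stand-in for speech.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

scales = multiscale_spectrograms(tone)
for win_len, spec in scales.items():
    print(win_len, spec.shape)
```

Each scale could then be passed through the same feature extraction, with the results combined before classification. How the team actually fuses the scales is not stated in the article.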
In the closing speech of the contest, the judges spoke highly of this voice spoofing detection system invented by WHUers. “This system can effectively defend against current voice spoofing attacks, which is significant for the protection of voiceprint recognition systems,” said one judge.
How the voice spoofing detection system works
As voice synthesis technology improves rapidly, fraud based on voice spoofing is likely to increase, which may harm individuals’ property and reputation. With this detection system, many applications in forensics, e-commerce, and financial systems can be protected from voice forgery, and individual identity and privacy can also be safeguarded. To benefit more people, WHUers keep targeting the frontiers of technology and shooting for the stars.
Rewritten by Li Tong
Edited by Li Jing, Jin Chenwu, Sylvia, Xi Bingqing