What is "DolphinAttack," which hacks voice assistants such as Siri by skillfully manipulating sounds that humans cannot hear?



"DolphinAttack," a technique developed by a research team at Zhejiang University in China, makes it possible to freely control voice assistants that are operated by speaking to them, such as Amazon Echo and Google Home, using ultrasonic waves that humans cannot hear. The notable point is that it makes clever use of the principles of ultrasound and "overtones" against voice assistants that are supposed to respond only to human voices.

DolphinAttack: Inaudible Voice Commands - Dolphinattack.pdf
(PDF)https://assets.documentcloud.org/documents/3987864/Dolphinattack.pdf

Hackers send silent commands to speech recognition systems with ultrasound | TechCrunch
https://techcrunch.com/2017/09/06/hackers-send-silent-commands-to-speech-recognition-systems-with-ultrasound/

Voice assistants that take instructions when you say "OK Google" or "Alexa" are basically built to respond only to human voices. To make one perform an operation, you have to produce sound in a frequency band that humans can hear, in other words a "voice," so even if a malicious hacker tried to take over a voice assistant, people nearby would hear the voice and the hijack would fail. DolphinAttack, however, makes clever use of "harmonics," a phenomenon that occurs when objects vibrate, and is a surprising technique that makes it possible to "produce" a human voice from sound that humans cannot hear.

◆ What are overtones?
It is well known that sound is produced when matter vibrates, and when that happens "overtones" of the base (fundamental) tone also occur. Overtones, or harmonics, are defined as "components of a sound whose frequency is an integer multiple, two or more, of the sound's fundamental frequency," and all sounds in nature are considered to be made up of a fundamental tone and its harmonics.


For example, when you play the A ("la") near the middle of a piano keyboard, the pitch, that is, the frequency, of the note is 440 hertz (Hz). In reality, however, tones at integer multiples of 440 Hz, such as 880 Hz, 1320 Hz, and 1760 Hz, as well as tones below the fundamental such as 220 Hz and 110 Hz, sound at the same time, and these overtones determine the timbre of each instrument. Strictly speaking, the overtones of real instruments are often not exact integer multiples, and these distinctive deviations also shape an instrument's sound.
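As a quick check of the integer-multiple arithmetic, here is a minimal Python sketch. The function name `harmonic_series` is made up for illustration, and it models only ideal harmonics, not the slightly off-integer overtones of real instruments.

```python
def harmonic_series(fundamental_hz, count):
    """Return the first `count` ideal harmonics: 2x, 3x, 4x, ... the fundamental."""
    return [fundamental_hz * n for n in range(2, count + 2)]

# The A at 440 Hz used as the example in the text.
print(harmonic_series(440, 3))  # [880, 1320, 1760]
```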

DolphinAttack exploits this mechanism of harmonics (resonance). Simply by emitting a high-pitched sound that the human ear cannot hear, it can make the voice assistant's microphone mistakenly conclude that it has been "spoken to" and activate its functions.

◆ A simplified look at how DolphinAttack works
Voice assistant devices that respond to the human voice have a built-in microphone. The microphone picks up vibrations traveling through the air with a thin membrane called a "diaphragm," and the diaphragm's vibration is converted into an electrical signal. Ideally the diaphragm would vibrate exactly according to the waveform of the incoming sound, but through resonance it can also produce vibrations that were not present in the original sound.

For example, suppose you play a pure 880 Hz tone with no overtones. In principle, the diaphragm of the microphone receiving the sound vibrates at 880 Hz, and that vibration is converted into an electrical signal. In practice, however, because of the microphone's characteristics, small amounts of components that a pure tone should not contain, such as 1760 Hz (2x), 440 Hz (0.5x), and 110 Hz (0.125x), are mixed in.

In many cases these overtones are so faint that they can be ignored, but if the original sound is very loud, they become a signal that can no longer be ignored.
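The dependence on volume can be illustrated with a rough numerical sketch. The microphone model here is an assumption, not something taken from the article: it simply adds a small squared term to its input (`hypothetical_mic`, with an arbitrary coefficient `a2`). This toy model only generates the 2x component (1760 Hz), not the lower ones mentioned above, but it shows why a louder input makes the unwanted component harder to ignore.

```python
import math

SAMPLE_RATE = 48_000  # samples per second
N_SAMPLES = 4_800     # 0.1 s, a whole number of cycles for 880 Hz and 1760 Hz

def pure_tone(freq_hz, amplitude):
    """Sine wave with no overtones."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(N_SAMPLES)]

def hypothetical_mic(signal, a2=0.1):
    """Assumed microphone response: the input plus a small squared term."""
    return [x + a2 * x * x for x in signal]

def magnitude_at(signal, freq_hz):
    """Amplitude of a single frequency component (one-bin DFT)."""
    re = im = 0.0
    for i, x in enumerate(signal):
        phase = 2 * math.pi * freq_hz * i / SAMPLE_RATE
        re += x * math.cos(phase)
        im += x * math.sin(phase)
    return 2 * math.hypot(re, im) / len(signal)

def distortion_ratio(amplitude):
    """Strength of the 1760 Hz component relative to the 880 Hz tone."""
    out = hypothetical_mic(pure_tone(880, amplitude))
    return magnitude_at(out, 1760) / magnitude_at(out, 880)

print(f"quiet input: {distortion_ratio(1.0):.3f}")
print(f"loud input:  {distortion_ratio(10.0):.3f}")
```

Under this assumed model, making the input ten times louder makes the unwanted component ten times stronger relative to the original tone (about 0.05 versus about 0.5).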

DolphinAttack takes a recording of a person's voice and electronically shifts it to a very high pitch of 20 kHz or more, which the human ear cannot hear. That sound is then played at high volume toward the voice assistant's microphone. To a human it seems as if no sound is playing at all, but the diaphragm of the microphone receiving the sound vibrates at the same frequency as the incoming sound and generates an electrical signal. If the generated signal were exactly the same as the incoming sound, the ultra-high-frequency components would be cut by the "low-pass filter" built into the circuit and the result would be "silence." This is where the overtones described above cause trouble.


If a human voice shifted up to around 32 kHz is played, the microphone's diaphragm also vibrates at the frequencies of the harmonics associated with that sound. For 32 kHz (= 32000 Hz), the 1/2 component is 16 kHz and the 1/4 component is 8 kHz, both of which fall within the human audible range. Further down, the 1/8 component is 4 kHz and the 1/16 component is 2 kHz. This region is said to be the most important frequency range for shaping the human voice, and it carries enough information for the voice assistant to understand what is being said.
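The arithmetic in this paragraph can be checked with a few lines of Python. The helper name `divided_frequencies` is hypothetical, and the 20 kHz limit is the article's own threshold for what humans cannot hear.

```python
AUDIBLE_LIMIT_HZ = 20_000  # the article treats 20 kHz and above as inaudible

def divided_frequencies(source_hz, divisors=(2, 4, 8, 16)):
    """Map each divisor to source_hz / divisor."""
    return {d: source_hz / d for d in divisors}

for divisor, freq_hz in divided_frequencies(32_000).items():
    audible = freq_hz < AUDIBLE_LIMIT_HZ
    print(f"1/{divisor}: {freq_hz / 1000:g} kHz, audible: {audible}")
```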

At this point the sound actually being played is beyond the range of human hearing, so people nearby hear nothing at all. Nevertheless, the diaphragm of the voice assistant's microphone, having received DolphinAttack's ultra-high-pitched sound, produces the same vibrations as a human voice on its own, converts them into an electrical signal, and passes that signal to the computer to be recognized, which causes the voice assistant to "malfunction."

This explanation is a simplified version of the principle, and more sophisticated modulation and other processing appear to be used in practice. Even so, DolphinAttack, which deftly exploits a blind spot between the characteristics of the human ear and the performance of microphones, is a truly clever hacking technique with a mechanism impressive enough to make you groan in admiration. The research team at Zhejiang University has actually used the technique to successfully operate voice assistants such as Siri, Google Now, Samsung S Voice, Huawei HiVoice, Microsoft Cortana, and Amazon Alexa.
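The article does not spell out the modulation involved. One plausible scheme, assumed here for illustration, is amplitude modulation: the voice rides on an ultrasonic carrier, and a slightly nonlinear microphone recreates the voice-band component. The sketch below uses a 2 kHz test tone as a stand-in for a voice, the 32 kHz carrier from the example above, and a made-up microphone model (`hypothetical_mic`) that adds a small squared term. None of these values come from the research itself.

```python
import math

SAMPLE_RATE = 192_000  # Hz, high enough to represent a 32 kHz carrier
CARRIER_HZ = 32_000    # ultrasonic carrier, as in the article's example
VOICE_HZ = 2_000       # single test tone standing in for a voice (assumption)
N_SAMPLES = 1_920      # 10 ms, a whole number of cycles for every tone involved

def am_modulate(depth=0.5):
    """Amplitude-modulate the test tone onto the ultrasonic carrier."""
    out = []
    for i in range(N_SAMPLES):
        t = i / SAMPLE_RATE
        voice = math.sin(2 * math.pi * VOICE_HZ * t)
        carrier = math.cos(2 * math.pi * CARRIER_HZ * t)
        out.append((1 + depth * voice) * carrier)
    return out

def hypothetical_mic(signal, a2=0.2):
    """Assumed microphone response: the input plus a small squared term."""
    return [x + a2 * x * x for x in signal]

def magnitude_at(signal, freq_hz):
    """Amplitude of a single frequency component (one-bin DFT)."""
    re = im = 0.0
    for i, x in enumerate(signal):
        phase = 2 * math.pi * freq_hz * i / SAMPLE_RATE
        re += x * math.cos(phase)
        im += x * math.sin(phase)
    return 2 * math.hypot(re, im) / len(signal)

transmitted = am_modulate()
received = hypothetical_mic(transmitted)

# In the air, the signal has essentially no energy at the voice frequency.
print(f"2 kHz in transmitted signal: {magnitude_at(transmitted, VOICE_HZ):.4f}")
# After the nonlinear microphone, a 2 kHz component appears.
print(f"2 kHz after microphone:      {magnitude_at(received, VOICE_HZ):.4f}")
```

In this toy model the transmitted signal contains only components around 30 to 34 kHz, while the microphone output gains a 2 kHz component that a low-pass filter would let through.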

An example of the equipment used in the experiments is as follows: the signal from a Samsung Galaxy S6 is amplified by a small amplifier and played through a speaker (transducer) that reproduces ultra-high-pitched sound. The amplifier, transducer, battery, and other parts reportedly cost less than 3 dollars (about 320 yen).


DolphinAttack makes devices respond to sound that should be inaudible, but at this stage its effective range is limited to within 1 meter of the voice assistant device, and it has also become clear that it is not yet capable enough to operate a device stealthily. In addition, turning off the setting that activates the voice assistant when you say "Hey Siri" makes it easy to prevent unexpected takeovers. Nonetheless, it is still possible to hack a Bluetooth speaker placed beside the voice assistant device, use it as a stepping stone, and operate the voice assistant by playing ultrasound from it.

in Hardware, Security, Posted by darkhorse_log