Abstract

Issuing commands through silent speech has received growing attention, because users can reuse the command sets of existing voice-based interfaces without attracting other people’s attention. Such interaction preserves privacy and remains socially acceptable. However, current solutions for recognizing silent speech mainly rely on camera-based data or on sensors attached to the throat. Camera-based solutions consume 5.82 times more power or raise potential privacy issues; attaching sensors to the throat is impractical for commercial off-the-shelf (COTS) devices because additional sensors are required. In this paper, we propose a sensing technique that needs only a microphone and a speaker on COTS devices, which both consumes little power and raises fewer privacy concerns. By deconstructing the received acoustic signals, the system generates a 2D motion profile. We propose a classifier based on convolutional neural networks (CNN) to identify the corresponding silent command from the 2D motion profiles. The proposed classifier can adapt to users and is robust to environmental factors. Our evaluation shows that the system achieves 92.5% accuracy in classifying 20 commands.

Fig. 1 Use case of Endophasia.
Fig. 2 Example of reconstructed acoustic images of various silent commands.
Fig. 3 System flow and the proposed training model.

Demo Video


Publication