Voice assistants are real lifesavers and, for many, probably the best technology around. Most of us use them from time to time to cut down on manual work: finding a favorite song, sending a text message when our hands are full, searching for information, or simply asking for an address while driving. But the technology is neither perfect nor fully efficient, because it doesn’t always do what you need. The cause may be a network outage, a lost signal, or an out-of-coverage area. Either way, the voice assistant cannot connect to the main server and therefore cannot help you.
Google Cloud's Speech-to-Text (STT) API processes over a billion minutes of speech every month, a clear sign of how much millions of people rely on voice assistants and automatic speech recognition to make decisions and navigate their lives.
Speech On-Device was made available at Google Cloud Next '22. It brings the speech recognition technology available in the cloud to embedded use cases, that is, environments with inconsistent networks or signals, or with little or no internet connection.
Let’s not forget that these speech-to-text and text-to-speech technologies are already used in Google Assistant. Speech On-Device, however, lets the new apps and products being developed every day harness the same technology and services to create better solutions.
AI Speech with or without a network connection
Speech On-Device delivers server-quality voice capabilities while also helping to maintain privacy and accessibility by keeping the data on the local device. That means that if you are driving through a tunnel or using apps on embedded devices, Speech On-Device runs locally and can respond without a network connection. This is made possible by new techniques and technologies for both speech-to-text and text-to-speech.
The size and compute requirements needed to fully run speech models with all available features have been reduced. For speech-to-text, this is the result of years of work on end-to-end speech models. For text-to-speech, Google has leveraged newer technologies to bring high-quality voices into vehicles. Besides providing acoustic quality comparable to WaveNet, DeepMind’s breakthrough model for producing natural-sounding speech, Speech On-Device TTS is also much less computationally demanding: it works on embedded CPUs and can run without hardware acceleration.
Organizations have ample scope to use this new technology to build exciting speech-driven experiences, especially given what is already known about how Speech On-Device’s early adopters are using it.