Why Industrial Voice Crushes Siri, Google, and Alexa
By DAN PAUL, VP of Customer Success
Love it or hate it, speech recognition technology is now part of daily life. With U.S. revenue growth projected at 16% year over year, expect to find yourself talking to more and more machines over the course of your day. Most people have used automated phone systems or one of the many personal electronic devices that rely on speech as their main input (Alexa, Siri, Google, etc.). Unfortunately, the user experience with these consumer-grade options leaves much to be desired. “I didn’t quite get that” messages and baffling transcriptions (“10 percent” becomes “temper sent”) can leave users ready to throw their devices out the window.
Anyone using the latest industrial-grade voice recognition technology will tell you that there is a marked difference between consumer-grade speech recognition software and today’s industrial-grade offerings. We have identified the top four reasons for this improved performance.
- Speaker-Dependent Recognition
- As you might imagine, speech recognition is hard. While great strides are being made in natural language processing (NLP), consumer systems still rely on what is known as “speaker-independent” recognition. Users never train the system on how they talk, and most of these systems do not noticeably improve as a given user speaks to them more. With the wide array of accents, grammar, and speech patterns, it takes massive computing power and a massive data set of examples to determine what someone is saying, and all that variability still produces many mistakes.
- High-end industrial voice systems use “speaker-dependent” recognition, which means each individual user trains the system with their own voice. This may sound like a large time investment, but most voice systems need only 50 to 70 phrases to handle typical operational workflows, and training takes less than half an hour. Because training is user-specific, the person’s unique vocal patterns, accent, and vocabulary are used to determine which command or response they are giving.
- Audio Hardware
- Consumer-grade speech recognition typically relies on picking up sound from a distance or over the phone. In both cases, capturing good-quality audio is challenging because of varying volume levels, background noise, and the varying quality of the devices themselves.
- The best industrial speech systems rely on rugged headsets that place the microphone close to the source of the desired sound: the mouth. In addition, noise-canceling microphone arrays suppress background noise before it even reaches the speech processor. By normalizing these components of sound (volume/gain and quality), you begin the race with a huge head start.
- Connectivity
- Most consumer-grade speech systems require an internet connection to function. With a strong and continuous connection, response times can be quite good, because powerful remote servers do the heavy processing in the background. But any break in connectivity brings your efforts to a halt.
- In contrast, many industrial systems perform speech recognition directly on the device worn by the user, so the system works even on a deserted island. The system is also designed to accept only a limited set of phrases at any given step, so the recognizer only has to choose among a handful of candidates. That keeps recognition near-instantaneous and consistent.
- Adaptive Recognition
- As previously stated, the data being collected and compiled for consumer-grade recognition is enormous and growing. This helps raise overall recognition scores but does little for the individual user’s experience.
- In contrast, industrial systems can adapt to your changing speech patterns. Your voice changes somewhat as the day wears on, and hay fever or a common cold can change it drastically. By continuously sampling recognition scores and adjusting each user’s underlying speech templates, the system improves over time, even as the user’s speech changes.
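To make the three ideas above (per-user training, a small active vocabulary, and adaptation) a little more concrete, here is a toy sketch in Python. It is not Mountain Leverage’s implementation or any vendor’s algorithm. It assumes each utterance has already been reduced to a single fixed-length feature vector (real recognizers work on sequences of acoustic frames, using techniques such as dynamic time warping or statistical models), and the class name `SpeakerProfile`, the thresholds, and the adaptation rate are all hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class SpeakerProfile:
    """Hypothetical per-user template store for a speaker-dependent recognizer."""
    templates: dict[str, list[float]] = field(default_factory=dict)
    accept_threshold: float = 0.85  # assumed: minimum score to accept a match
    adapt_threshold: float = 0.90   # assumed: only confident matches update a template
    adapt_rate: float = 0.1         # assumed: weight given to each new sample

    def enroll(self, phrase: str, samples: list[list[float]]) -> None:
        """Training: average this user's own recordings of a phrase into one template."""
        n = len(samples)
        self.templates[phrase] = [sum(col) / n for col in zip(*samples)]

    def recognize(self, features: list[float], active: list[str]) -> str | None:
        """Score the utterance only against the (enrolled) phrases valid at this step."""
        best_phrase, best_score = None, -1.0
        for phrase in active:
            score = cosine(features, self.templates[phrase])
            if score > best_score:
                best_phrase, best_score = phrase, score
        if best_phrase is None or best_score < self.accept_threshold:
            return None  # reject and re-prompt rather than guess
        if best_score >= self.adapt_threshold:
            self._adapt(best_phrase, features)
        return best_phrase

    def _adapt(self, phrase: str, features: list[float]) -> None:
        """Nudge the stored template toward the latest confident sample."""
        old = self.templates[phrase]
        r = self.adapt_rate
        self.templates[phrase] = [(1 - r) * o + r * f for o, f in zip(old, features)]


# Usage with made-up three-dimensional "feature vectors" (illustrative only).
profile = SpeakerProfile()
profile.enroll("ready", [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]])
profile.enroll("repeat", [[0.0, 1.0, 0.2], [0.1, 0.9, 0.1]])
profile.enroll("skip", [[0.1, 0.0, 1.0], [0.2, 0.1, 0.9]])

print(profile.recognize([0.95, 0.15, 0.05], active=["ready", "repeat"]))  # ready
print(profile.recognize([0.1, 0.05, 0.95], active=["ready", "repeat"]))   # None
```

The second call is rejected even though it closely matches the enrolled “skip” phrase, because “skip” is not valid at that step. Restricting the candidate list is what keeps matching cheap and reduces wrong guesses. Only high-confidence matches update a template, so that a single misrecognition does not drag the stored voice pattern off course.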
Speaker-dependent voice-directed workflows, in use in industrial settings for over two decades, have become the de facto standard in distribution center technology. Mountain Leverage has been delivering voice solutions for over a decade to a fiercely loyal customer base that enjoys improvements in accuracy, productivity, training time, safety, employee satisfaction, and more. We understand how speech recognition works and use it as a tool to deliver amazing results across many industries.
Want to know more? Reach out at firstname.lastname@example.org, and we can discuss whether voice can help solve some of your challenges.
Dan is the VP of Customer Success at Mountain Leverage. With over a dozen years of experience delivering voice solutions, Dan has a passion for helping companies discover and unlock their operational excellence.