Resolved by DJ Sures!

Speech Recognition

Hello, I have literally tried everything in every tutorial on here to get speech recognition to move servos, and nothing at all has worked. What could I be doing wrong?



Check out this tutorial, it will get you going.


Press the ? on any control for help with that control. Here's a direct link to the speech recognition control:

The most important piece to read in the speech recognition manual is that speech recognition should work in Windows first. Use the training feature as detailed in the tutorial to set up speech recognition in Windows.

You can find numerous tutorials in the speech recognition manual. Again, here's the direct link:


Yes, I have viewed all those tutorials and it is still not working. I entered the enable and disable commands, but I left the scripts for the enable command and the disable command blank. Do I have to put functions in these scripts for it to work? I entered my own phrases and command scripts, and it picks up what I say. For example, when I say "hand open" it recognizes it and displays the text, but the servos do not move.


Lower your confidence setting in the speech recognition module to see if that helps. I would start at about 90% and then start increasing it from there until it stops working again.


If it "picks up" what you're saying, what does the diagnostic window of the speech recognition say? It will tell you whether the confidence is too low, what it heard, etc. All the information you need for diagnosing this is displayed in the status window of the speech recognition control.

Lastly, are you connected to the EZ-B when testing?


From your statement above, it sounds like you don't have an action to perform in the script section next to the phrase you typed into the SR window. The SR sounds like it's working, but the SR control doesn't know what to do with what it hears. In the script area to the right of the phrase, type in a command. Tell the servos you want to move when you speak the phrase where to go and how fast to move there. Use EZ-Script commands pointing to the digital ports the servos are attached to.


Ok, this is what I have. On the little screen where it displays back what you say as text, the only thing it will display back to me is "hand on" when I say it. It will not display any other commands at all, nor will it perform any of the servo movements listed in the scripts under the commands. I know it's probably something simple and stupid I'm doing wrong, but I don't know what it is.

Under the script for hand open: servo(D1,120) servo(D2,120) servo(D3,120) servo(D4,120) servo(D0,110)

And for close it's listed the same, just with the number 24.

User-inserted image


EDIT Never mind I was seeing things... this doesn't seem to be the case...

Commands look funny... Looks like you have them all on one line...


You can only move one servo per command? Can you not move multiple servos with a single command?


Does voice command work at all?

What if you tried "What is your name" (with no quotes of course) as the command in the left column and in the right column type..

say("My name is Bob, nice to meet you")

Do you hear your PC speak the above? I assume your PC has speakers attached....


Again, maybe it's the confidence value - please verify the confidence value as previously suggested. If it hears with a low confidence value, it will be displayed in the status window.

As for multiple commands, they will all run. That's how a program works. It reads from the first line to the last line, similar to how you read a book.
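To picture the "first line to last line" point, here is a small Python sketch that simulates running the hand open script from the post above. It is only a model, not EZ-Script and not how ARC is implemented: the `run_script` helper and the `SERVO_CMD` pattern are made up for illustration.

```python
import re

# Hypothetical simulator, NOT ARC or EZ-Script: it only models sequential execution.
SERVO_CMD = re.compile(r"servo\(\s*(D\d+)\s*,\s*(\d+)\s*\)", re.IGNORECASE)

def run_script(script):
    """Apply every servo command in the order it appears.

    Returns (order, positions): the list of (port, position) pairs in
    execution order, and the final position of each port.
    """
    order = []
    positions = {}
    for match in SERVO_CMD.finditer(script):
        port = match.group(1).upper()
        position = int(match.group(2))
        order.append((port, position))
        positions[port] = position
    return order, positions

# The "hand open" script quoted earlier in the thread.
hand_open = """
servo(D1,120)
servo(D2,120)
servo(D3,120)
servo(D4,120)
servo(D0,110)
"""

order, positions = run_script(hand_open)
```

All five commands are applied, one after another, so a single voice command can move as many servos as the script lists.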


It does not matter what I set the confidence value at. It will only recognize the "hand on" command. If I say "hand on" it will display this text back: "Low Confidence: hand on (0.77)". It will not register any other commands, nor will it perform any movements. Yes, it says low confidence, but it says that at every value. Maybe I need to try an external mic?


Have you gone through the training for Windows voice recognition yet? It's under speech recognition in your control panel (in Windows, not ARC).... I found that with Windows 7 I could not get speech rec to work in ARC until I did the speech training in Windows first....


You will need to do more training in Windows. I would run through this training multiple times.

Windows requires quite a bit of training to get good speech recognition when using the default speech-to-text controls. These are free to use within Windows and free to use for developers. The only downside is that it takes extensive training to get them to work great. The more you train, the better results you will get from ARC, because it uses this speech recognition engine.

There are other speech recognition engines available, but they are quite expensive and require additional fees to use. As ARC is no-cost software, they probably wouldn't be added without additional costs. I suspect it would also require rewriting quite a bit of ARC.

The simple solution is to train your speech recognition engine multiple times, which would start to increase its reliability.


Do you have headphones or earphones for your phone that have an inline mic? These work really well with EZ-Builder voice recognition.

You could/should also try testing your script by simply "clicking" the run button on the script page.


Low confidence means it will not run, and that's what we've been helping you with :)

Lowering the minimum confidence below 77 is not a fix. A recognition confidence that low cannot be solved by lowering the minimum confidence requirement. Follow the tutorial on training the speech recognition for Windows, and if that doesn't solve it, a new microphone would be needed.
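Here is a rough Python sketch of the gate being described: a phrase only runs its script when the recognizer's score meets the minimum. The status text format is copied from the post above; the `parse_status` and `should_run` helpers, the `>=` comparison, and the 0.90 minimum are assumptions for illustration, not ARC's actual code.

```python
import re

# Matches a status line such as: Low Confidence: hand on (0.77)
STATUS_LINE = re.compile(r"Low Confidence:\s*(.+?)\s*\((\d*\.?\d+)\)")

def parse_status(line):
    """Extract (phrase, confidence) from a low-confidence status line, or None."""
    match = STATUS_LINE.fullmatch(line.strip())
    if match is None:
        return None
    return match.group(1), float(match.group(2))

def should_run(confidence, minimum_confidence):
    """Assumed rule: run the script only if the score meets the minimum."""
    return confidence >= minimum_confidence

phrase, confidence = parse_status("Low Confidence: hand on (0.77)")
runs_at_90 = should_run(confidence, 0.90)  # hypothetical minimum of 90%
```

With a score of 0.77 and a minimum of 0.90 the script is skipped. Dropping the minimum far enough would let it through, but the recognizer is still unsure of what it heard, which is why training or a better mic is the real fix.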


The problem was the mic the whole time.

Thanks for the input everyone


Run through the training with the new mic if you want it to work well.

Believe me, I have spent a lot of time with various mics and with multiple speech recognition services.

In any event, it was caused by the Windows SR service not being able to get a high enough quality result.


If I may, I would like to take the occasion of this thread to ask a question about how the speech recognition works and see if a certain something could be done.

From the posts, it appears that the speech recognition part actually works with the built-in MS speech recognition functions. Does it then take what it thinks it recognizes being said and compare that with what is entered into the various boxes in the SR Control until it finds a match (or not)?

If that is so, does it actually recognize anything else that is said, but simply ignore any other words spoken because they don't show up in the list, or don't show up in the right order?

Would it be possible to provide a variable related to the SR Control with a string containing all the words it recognizes (or thinks it recognizes) in a given input sentence, regardless of whether they matched one of the entries in the SR Control?

For example, suppose the user said "Turn around robot" but there is no entry in the SR Control for the phrase "Turn around robot". Would it still recognize the words "turn", "around", and "robot" simply because they are valid English words? And, if so, could they be returned to the system in a string?


This uses dictionary based SR and not dictation based SR. Dictionary based is more accurate because it only looks for the specific words. It would be worth looking this up to see why and how it works.

To use dictation based SR, you would then have to use speech classification and categorization to identify the meaning of what is said. This is a much more in-depth process and is what makes EZ-AI work. Watson, OK Google, Amazon and Siri all work this way. It is a completely different thing. Recognizing what is said and converting it to text is one thing. Understanding what to do or how to handle what is said is a completely different thing.

Dictionary based SR really doesn't require this because only certain things can be said in a specific order. The meaning is known because the phrases are set up by the user. In dictation SR, anything can be said in any order. This is what requires the classification and meaning extraction to take place.

Dictionary based SR is what was used on automated phone attendant systems. Dictation SR has just become decent in the past 5 or so years.
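As a loose illustration of the closed-list behavior, here is a Python sketch of a phrase table like the two columns of an SR control. Keep in mind the real engine matches audio patterns, not typed text, so this only models what happens once a phrase has been heard. The `COMMANDS` table, the "hand close" entry, and the helper names are hypothetical.

```python
# Hypothetical phrase table: spoken phrase -> script text to run.
COMMANDS = {
    "hand open": "servo(D1,120)",
    "hand close": "servo(D1,24)",
}

def normalize(text):
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(text.lower().split())

def match_phrase(heard):
    """Return the script for a configured phrase, or None if it is not listed."""
    return COMMANDS.get(normalize(heard))

known = match_phrase("Hand  Open")            # a configured phrase
unknown = match_phrase("Turn around robot")   # valid English, but not in the list
```

The second lookup returns nothing even though every word is ordinary English, because only the whole configured phrases exist as far as the control is concerned. Getting free-form text back would need a dictation-style recognizer instead.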



Thank you for your response. Perhaps I wasn't clear in my previous post. I don't care if it understands the meaning of the words or the phrases, just that it recognize the words spoken and they be returned in a variable. I'm not looking for the routine to respond to what I say, just that it returns whatever words it recognized me as saying.

After all, you can put any valid word you want in the SR Control and have it recognized. It has to recognize the word when it is spoken from what has been typed into a given box in the SR Control. It doesn't compare a sample of you speaking the word. It has to know when you say the word simply from it being typed in. Therefore, the routine which looks for certain words and then compares those words to the entries in the SR Control has to know what the word is supposed to sound like. Since it can be any word, it must have a great many words in its "vocabulary" that it can recognize and use to compare against the SR entries.

For example, if you say the word "around" it probably recognizes that word as a valid English word and will compare it against the entries in the SR Control as input by the user. If it's not there, it forgets it. Deletes it, whatever. But, at some point the underlying routine must have recognized it as a word, (and not just some random noise) and did something to decide whether to use it or not. And so it would be with other spoken words, regardless if they are in the user entries.

What I would like to see is for the underlying routine to remember all recognized words, regardless of whether they are in the entries in the SR Control, and return them in a variable as part of its routine. I could then use those words in my own scripts and derive meaning from them myself.


Again, you need to look up dictation based SR vs Dictionary based SR.

Dictionary based SR takes the words that are defined, loads them into memory, and uses only these words in the exact order that they are in to decide what is said. As this list grows, it becomes less accurate. It is based on the phonetic sounds of this text and has no understanding of what these words are.

In order to get the text that is spoken and then decide what to do when using dictation based logic (which is required for what you are describing) you have to record the audio and pass it through a dictionary of known words that contains all known words for a language. This is much larger and as you can imagine, not as accurate unless you compare the words to the surrounding words.

Here is an example: "Lookup New Direction." vs "Look up 10 degrees."
Just the first word or first two words require meaning extraction to understand whether the word is "Lookup" or "Look up". To take this speech and place the text into a variable, without doing anything else with it, you still have to classify and categorize the text. This is why I said earlier that it should be researched. It is a very interesting process, and it is why some companies charge so much for these services.

With dictionary based services like this, you don't care, because "lookup" and "look up" sound the same and the rest of the phrase is then matched based on the other words. Actually, this isn't done based on a list of known words but on speech patterns based on known character combinations and what you have stored in your dictionary. This is why it gets less accurate as the list gets longer and why it is so important to train the SR engine so well. It doesn't care what "lookup" is or what "look up" is. It has no clue if this is even a word or phrase. It just knows what pattern it needs to match for the sound of "Luk up". This is also why you should use phonetic spellings of words like "to" instead of "two" or "too".


@CochranRobotics Thank you for the more detailed explanation. I know you are very busy with your projects and it was good of you to take the time to respond.

Based on what you posted, it seems I will have to make my own external routine to do what I want and import the results. That was the primary decision I needed to make regarding how to handle what I needed. Thanks again.


@wbs, if you want to try what speech recognition is like in dictation mode, use the PandoraBot control:

See if you get the recognition accuracy that you desire before creating a plugin.


Everyone who commented helped with this issue. Thank you all.