Resolved by DJ Sures!
Hello, I have literally tried everything in every tutorial on here to get speech recognition to move servos, and nothing at all has worked. What could I be doing wrong?
What if you tried "What is your name" (with no quotes of course) as the command in the left column and in the right column type..
Do you hear your PC speak the above? I assume your PC has speakers attached....
As for multiple commands, they will all run. That's how a program works. It reads from the first line to the last line, similar to how you read a book.
Windows requires quite a bit of training to get good speech recognition when using the default speech to text controls. These are free to use within Windows and free to use for developers. The only downside is that it takes extensive training to get them to work great. The more you train, the better results you will get from ARC because it uses this Speech recognition engine.
There are other speech recognition engines available, but they are quite expensive and require additional fees to use. Since ARC is no-cost software, they probably wouldn't be added without associated costs. I suspect it would also require rewriting quite a bit of ARC.
The simple solution is to train your speech recognition engine multiple times, which would start to increase its reliability.
You could/should also try testing your script by simply "clicking" the run button on the script page.
Lowering the minimum confidence below 77 is not a fix. Recognition that scores that low cannot be fixed by lowering the minimum confidence requirement. Follow the tutorial on training the speech recognition for Windows, and if that doesn't solve it, a new microphone would be needed.
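To illustrate how a minimum confidence setting gates a command, here is a minimal Python sketch. It is not ARC's code: the `Recognition` type, the `dispatch` function, and the 0 to 100 confidence scale are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Recognition:
    """Hypothetical result from a speech engine (not an ARC type)."""
    phrase: str        # text the engine believes it heard
    confidence: float  # engine's score; a 0-100 scale is assumed here


def dispatch(result: Recognition,
             commands: Dict[str, Callable[[], str]],
             min_confidence: float = 77.0) -> Optional[str]:
    """Run the command mapped to the phrase only if the score is high enough."""
    if result.confidence < min_confidence:
        return None  # rejected: the engine was not sure enough
    action = commands.get(result.phrase.lower())
    if action is None:
        return None  # phrase is not in the command list
    return action()


# Hypothetical command list: phrase -> action
commands = {"robot forward": lambda: "moved"}

confident = dispatch(Recognition("Robot Forward", 85.0), commands)
noisy = dispatch(Recognition("robot forward", 40.0), commands)
noisy_lowered = dispatch(Recognition("robot forward", 40.0), commands,
                         min_confidence=30.0)
```

In this sketch, lowering `min_confidence` only lets the 40-point guess through. It does nothing to make the guess itself more reliable, which is why training or a better microphone is the real fix.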
Thanks for the input everyone
Believe me, I have spent a lot of time with various mics and with multiple speech recognition services.
In any event, it was caused by the Windows SR service not being able to get a high enough quality result.
From the posts, it appears that the speech recognition part actually works with the built-in MS speech recognition functions. Does it then take what it thinks was said and compare that with what is entered into the various boxes in the SR Control until it finds a match (or not)?
If that is so, does it actually recognize anything else that is said, but simply ignores any other words spoken since they don't show up in the list? Or don't show up in the right order, whatever?
Would it be possible to provide a variable related to the SR Control containing a string with all the words it recognizes (or thinks it recognizes) in a given input sentence, regardless of whether they matched one of the entries in the SR Control?
For example, suppose the user said "Turn around robot" but there is no entry in the SR Control for the phrase "Turn around robot". Would it still recognize the words "turn", "around", and "robot" simply because they are valid English words? And, if so, could they be returned to the system in a string?
To use dictation based SR, you would then have to use speech classification and categorization to identify the meaning of what is said. This is a much more in-depth process and is what makes EZ-AI work. Watson, OK Google, Amazon and Siri all work this way. It is a completely different thing. Recognizing what is said and converting it to text is one thing. Understanding what to do or how to handle what is said is a completely different thing.
Dictionary based SR really doesn't require this because only certain things can be said in specific order. The meaning is known because the phrases are setup by the user. In dictation SR, anything can be said in any order. This is what requires the classification and meaning extraction to take place.
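As a rough illustration of the difference, here is a toy Python sketch. It is not how ARC, EZ-AI, or any commercial service works: the phrase list, the keyword table, and both function names are hypothetical, and real classifiers are far more sophisticated than counting keywords.

```python
from typing import Optional

# Dictionary based: a closed list of phrases set up by the user (hypothetical).
PHRASES = {"robot forward", "robot stop"}

# Dictation based: free text needs an extra classification step.
# This keyword table is a made-up stand-in for a real classifier.
INTENT_KEYWORDS = {
    "turn": {"turn", "rotate", "spin"},
    "stop": {"stop", "halt"},
}


def match_fixed_phrase(heard: str) -> Optional[str]:
    """Return the phrase only if it is exactly one of the configured ones."""
    normalized = " ".join(heard.lower().split())
    return normalized if normalized in PHRASES else None


def classify_dictation(text: str) -> Optional[str]:
    """Pick the intent whose keywords overlap most with the spoken words."""
    words = set(text.lower().split())
    best_intent, best_hits = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent


fixed = match_fixed_phrase("Turn around robot")       # not in the list
dictated = classify_dictation("Turn around robot")    # needs classification
```

With the fixed list, the meaning of a match is already known because the user defined the phrase. With dictation, the text arrives first and a separate step has to work out what it means.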
Dictionary based SR is what was used on automated phone attendant systems. Dictation SR has just become decent in the past 5 or so years.
Thank you for your response. Perhaps I wasn't clear in my previous post. I don't care if it understands the meaning of the words or the phrases, just that it recognize the words spoken and they be returned in a variable. I'm not looking for the routine to respond to what I say, just that it returns whatever words it recognized me as saying.
After all, you can put any valid word you want in the SR Control and have it recognized. It has to recognize the word when it is spoken from what has been typed into a given box in the SR Control. It doesn't compare against a sample of you speaking the word. It has to know when you say the word simply from it being typed in. Therefore, the routine which looks for certain words and then compares those words to the entries in the SR Control has to know what the word is supposed to sound like. Since it can be any word, it must have a great many words in its "vocabulary" that it can recognize and use to compare against the SR entries.
For example, if you say the word "around" it probably recognizes that word as a valid English word and will compare it against the entries in the SR Control as input by the user. If it's not there, it forgets it. Deletes it, whatever. But, at some point the underlying routine must have recognized it as a word, (and not just some random noise) and did something to decide whether to use it or not. And so it would be with other spoken words, regardless if they are in the user entries.
What I would like to see is for the underlying routine to remember all recognized words, regardless of whether they are in the entries in the SR Control, and return them in a variable as part of its routine. I could then use those words in my own scripts and derive meaning from them myself.
Dictionary based SR takes the words that are defined, loads them into memory, and uses only these words in the exact order that they are in to decide what is said. As this list grows, it becomes less accurate. It is based on the phonetic sounds of this text and has no understanding of what these words are.
To get the text that is spoken and then decide what to do using dictation based logic (which is required for what you are describing), you have to record the audio and pass it through a dictionary that contains all known words for a language. This dictionary is much larger and, as you can imagine, not as accurate unless you compare each word to the surrounding words.
Here is an example. "Lookup New Direction." vs "Look up 10 degrees."
Just the first word or first two words require meaning extraction to determine whether the word is "Lookup" or "Look up". To take this speech and place the text into a variable, even without doing anything else with it, you have to classify and categorize the text. This is why I said earlier that it should be researched. It is a very interesting process, and it is why some companies charge so much for these services.
With dictionary based services like this, you don't care, because lookup and look up sound the same and the rest of the phrase is then matched based on the other words. Actually, this isn't done based on a list of known words but on speech patterns based on known character combinations and what you have stored in your dictionary. This is why it gets less accurate as the list gets longer and why it is so important to train the SR engine so well. It doesn't care what lookup is or what look up is. It has no clue if this is even a word or phrase. It just knows what pattern it needs to match for the sound of Luk up. This is also why you should use phonetic spellings of words like "to" instead of "two" or "too".
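A very crude Python sketch of the idea that matching happens on sound rather than on words. The `sound_key` function and the homophone table are invented for illustration. The Windows engine works on acoustic patterns, not on a lookup table like this one.

```python
# Hypothetical homophone table: spellings that sound alike map to one form.
HOMOPHONES = {"two": "to", "too": "to"}


def sound_key(phrase: str) -> str:
    """Collapse a phrase to a rough 'how it sounds' key.

    Word boundaries are dropped and homophones are merged, so phrases
    that sound the same produce the same key.
    """
    words = [HOMOPHONES.get(word, word) for word in phrase.lower().split()]
    return "".join(words)


same_sound = sound_key("Look up") == sound_key("lookup")
same_homophone = sound_key("turn two") == sound_key("turn to")
different = sound_key("look up") == sound_key("look down")
```

In this toy version "Look up" and "lookup" collapse to the same key, so the matcher cannot tell them apart and does not need to. Only the rest of the phrase distinguishes one command from another.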
Thank you for the more detailed explanation. I know you are very busy with your projects and it was good of you to take the time to respond.
Based on what you posted, it seems I will have to make my own external routine to do what I want and import the results. That was the primary decision I needed to make regarding how to handle what I needed. Thanks again.
See if you get the recognition accuracy that you desire before creating a plugin.