Help With Speech Recognition

Hi everyone.

I am looking for some much needed help and advice in regards to speech recognition, and I'm hoping you can point me in the right direction.

I'm currently waiting for my EZ robot kit to arrive, so in the meantime I have been familiarising myself with the EZ builder software and have been playing around with the speech recognition and Pandorabots options. EZ builder is currently installed on a Windows 7 64 bit laptop and I'm currently using a headset microphone, not a great quality one I admit, but it does work.

So after training the Windows speech recognition program a few times, and adjusting the microphone settings, I have been trying to use it with MS Word, Notepad, EZ builder with my pandorabot and the pandorabots website on its own, and I have come to the conclusion that Windows speech recognition is, well, "pants" to put it politely (unless I'm missing something). In a quiet room with no background noise, and speaking with a clear English accent, it only seems to pick up 40 to 50% of what I'm saying correctly, compared to the 95% I'm getting on my iPhone, which is getting really frustrating now.

So I would like to ask...

1.) Is there a better way of training Windows speech recognition?

2.) Can anyone suggest a decent well priced microphone to use?

3.) Is there any better speech recognition software I could use which will work with EZ builder?

4.) And finally, what sort of set up do you guys use that works well with your projects that use speech rec?

I really need to nail this, as speech recognition will play a big part of interacting with my robot when he is finished. So any help, thoughts, or suggestions you guys can offer really would be appreciated, and I thank you in advance.


Steve. ;)

Are you using a headset? Otherwise, speech recognition with an open dictionary is unusable in any form or fashion :)

What I mean by "open dictionary" is using the entire English dictionary. Where, if you were to use ARC's Speech Recognition Control, it only detects phrases that you have defined.

An open dictionary has a kabillion words, and therefore increases the chance of false positives by a kabillion times.
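To illustrate the idea of a small phrase list versus an open dictionary, here is a minimal Python sketch. It is not how ARC or Windows speech recognition work internally; the phrase list, the `match_phrase` helper and the cutoff value are all made up for the example. It only shows that when there are a handful of allowed phrases, a slightly misheard sentence can still be snapped to the right one, and anything else is ignored.

```python
from difflib import get_close_matches

# Hypothetical phrase list, standing in for the phrases you would define
# in a fixed-phrase speech control.
PHRASES = ["tell me a joke", "robot stop", "move forward", "turn left"]


def match_phrase(heard, phrases=PHRASES, cutoff=0.8):
    """Return the defined phrase closest to `heard`, or None if nothing is close."""
    matches = get_close_matches(heard.lower(), phrases, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A misheard word still lands on the only plausible phrase.
print(match_phrase("tell me a john"))
# Something outside the phrase list is rejected instead of guessed at.
print(match_phrase("what is the weather"))
```

With an open dictionary there is no such short list to fall back on, so "john", "job" and "joke" are all equally valid candidates.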

Use a headset
Use the speech recognition control within ARC to control your project or robot via voice. Mine is now close to getting 100% accuracy. Pandora bot just doesn't work that well. Besides, Pandora bot has limited use in the context of most robotics projects. For one thing, you can't use it to control your robot. As I said, use the speech rec control to utilize speech control in your project.
Thanks for the response guys.

Yes I am using a headset at the moment as I am aware that using one is the best option for voice rec. I have played with numerous microphone settings but it just does not seem to make much difference.

As I mentioned before, it is not the best quality headset in the world but I should be getting better results than what I'm getting now. I just wondered if there is a better way to train the v/rec apart from reading the standard set up phrases in the control panel over and over? For example, one word it won't understand is "joke". It seems to think I'm saying "john" or "job" and I keep correcting it, but after a lot of corrections it still won't understand the joke (Maybe once out of about 30 attempts it will get it).

I knew controlling the EZ-B was possible through Pandora after reading what DJ wrote about embedding EZ script into Pandora response code (as DJ linked above). I just wondered if someone could suggest a good quality mic, better v/rec software or a better v/rec training process? Apparently the more Windows v/rec is used the better it learns, but I've not seen evidence of this just yet, and it's getting a little frustrating.

Cheers guys.

Didn't know that, thanks. In the speech control settings you can also adjust sensitivity. Try lowering it from .94 (or whatever it is at) to maybe .92 or something like that. This will help improve the % that ARC's speech control understands you. If I am using Windows 7, I usually turn off Windows voice rec or set it to sleep. That's so when I am in ARC, only ARC is listening to me and not Windows as well.
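As a rough picture of what lowering that number does, here is a small Python sketch. It assumes the setting acts as a minimum confidence threshold, which is an assumption about ARC rather than something confirmed here, and the phrases and confidence scores are invented for the example.

```python
# Made-up recognition results: (phrase, confidence between 0 and 1).
HYPOTHESES = [
    ("robot stop", 0.97),
    ("move forward", 0.93),
    ("tell me a joke", 0.92),
    ("turn left", 0.85),
]


def accepted(hypotheses, threshold):
    """Keep only the phrases whose confidence meets the threshold."""
    return [phrase for phrase, confidence in hypotheses if confidence >= threshold]


print(accepted(HYPOTHESES, 0.94))  # ['robot stop']
print(accepted(HYPOTHESES, 0.92))  # ['robot stop', 'move forward', 'tell me a joke']
```

The trade-off is that a lower threshold also lets through more wrong guesses, so small steps are probably safer than a big drop.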
Thanks Richard. I will have a look at that a bit later and give it a try. But (putting ARC aside for a moment) even when I use the s/rec with something else such as MS Word or notepad, the results I get back are still quite poor.

As I mentioned in my first post, using my iPhone s/rec the results are about 95% accurate, so when DJ and his crew release the iOS side of things, that is what I will probably end up using, as I have had great success using iOS speech recognition and a chatbot app on another project I did a while back. I can hold the iPhone at arm's length and it is still over 90% accurate, and I have REALLY long arms :P

But for now it's Windows s/rec I need to use for my project. What do you guys use on your projects that use speech recognition which you have had success with?

I used a headset from Medion Erazer. It also has some funny sound effects, like Darth Vader.

If your results are poor you need to do one of two things:

Train it more
Use a better mic

I've been training Jarvis for years now: once a day, three or four times a week, I run the training. I believe the more you use the Windows SAPI the better it gets too. I read somewhere that it is always "training" even when just listening, though how true that is I don't know.

I get 80% positive results when using the built in mic on my webcam 3 meters from where I am. I get 95% positive results on a bluetooth headset.

Here are some more tips

It is true though, Windows SAPI is poor when compared to iOS, Google or DNS.
Yeah I'm with you about a headset mic replacement rich. Maybe that is the issue I'm having. Any suggestions?

When you say about training, do you mean going through the "Windows voice training" repeatedly over time, or actually correcting the words using the "Correct (word that is wrong)" command until it gets it?

I have looked through the Nuance website reading about the Dragon speech recognition software. Has anybody used this and would it work with ARC? It does seem a little bit pricey to me, but it also does seem to get good user reviews and comes complete with a headset.

I've used DNS11 but not with ARC. I believe you could link the two somehow though (although I haven't had a chance to look into that). DNS11 was awesome, 13 is now out (or soon to be out).

@Toymaker uses DNS11 too, I don't know if he has it "plugged in" to ARC or not though.

I don't know if I edited soon enough but I linked to a post I wrote some time ago with some tips in it. Correcting words and training etc. All of that helps. Also, use the correct profile language - in our case English(GB). I know there was an issue with ARC a long time ago with any profile that wasn't English(US) but I think this has been fixed.

As for mics, I have a Turtle Beach BT headset which is made for the PS3 but with some messing around it works on the PC. It also pairs to 2 devices at once, so you could pair to your phone too and if the phone kicks in it auto switches over, which was a cool feature I thought. It works pretty well, however there are better.

Tony (@Toymaker) has suggested some mics in the past but they were very expensive, I can't remember what they were now since when I saw the price tag (which to be fair is probably around the same amount I spent on testing out cheaper alternatives) I dismissed it. He will likely chime in if he sees this.

Basically, you get what you pay for. A cheap mic won't be as good as an expensive one. It's most likely going to be a case of checking reviews. Another place to check is the Windows SAPI focussed forums on other sites. Vox Commando's forums probably have some good advice too (it uses Windows SAPI - it's what I use for a lot of Jarvis' functions because it works better than ARC in different areas, i.e. it can use payload files and can look up from databases from media players such as XBMC).
I get outstanding performance and speech recognition accuracy with the Revolabs xTag wireless microphone www.revolabs.com/products/product-line/xtag-usb It's a professional mic and expensive. I wear it inside my shirt pocket and no one notices that I am even wearing a wireless mic.

With the xtag and DNS11 I get around 98% in dictation mode (saying anything) and almost 100% in fixed grammar mode.

I have today loaded the new DNS13 on my i5 tablet, which does not have my DNS voice profile on, so it's a clean install, and without any training I am dictating (saying anything) at almost 99% accuracy. That's pretty impressive!

As Rich said you get what you pay for in Speech recognition.

Unfortunately (as far as I know) you cannot use Dragon with EZ-Robot, but this would be a fabulous addition if DJ could do it. I know it's expensive, but the ability to say almost anything (without pre-loaded grammar files) opens all sorts of possibilities, especially with things like chatbots etc.

Too bad someone can't make a plug in for DNS like Justin did with Face Recognition. That would be an unbelievable addition to ARC.

I know ARC and Windows have much better accuracy with a nice headset like Tony's pocket mic he linked to. However I passionately wanted to be able to walk into a room and just speak to my robot. I have scripted sentences I've written into the voice recognition control that trigger already recorded sound files and commands that trigger other scripts and functions on my B9. I use the Blue Snowball mic with great success. I don't know if others have tried it, as each time I bring it up I get no feedback on it. I can set it up anywhere in the room where the robot is, plug it into the laptop through USB and it seems to hear me wherever I am, and others can talk to him also. It has a unique three-pattern switch (cardioid, cardioid with -10dB pad and omni) for different listening patterns. Of course if there are a bunch of people talking and making noise I get some false triggers and accuracy drops off the cliff. I can always just place the speech recognition on pause if it gets irritating, or if I want to communicate in those conditions I can use Tony's pocket mic.

Blue Snowball Mic

@ Rich. Thanks for your link. To be fair I think I missed it first time round reading your post *eyeroll*. That did make for some good reading. Very nice job with that tutorial mate. Very easy to understand and well laid out. Well done. I have not had a chance to have a look at Vox Commando's forum yet, but I will have a look a little later.

@ Toymaker. I do like the look of the xTag but it's a bit out of my price range for the moment, but like yourself and Rich said, you get what you pay for, and this does sound like a serious bit of kit. I agree that Dragon is a bit on the pricey side, but from the reviews I have read elsewhere it would be something I would seriously consider getting, IF that is, it would work with EZ Robot hard/software. @ DJ. If you're reading this buddy, is having Dragon connect to ARC on one of your possible "to do" lists? It sounds like it would be a popular, welcome and useful addition to the EZ platform.

@ Dave. I'm completely with you with your statement, "However I passionately wanted to be able to walk into a room and just speak to my robot." I would love to be able to do the same as that really would be a neat feature, especially in regards (like you mentioned) to having other people chat to K-9 without having to hold a mic. But for now I would just be happy enough for my speech recognition program to understand me (walk before I can run kind of thing). I like the look of the Blue Snowball mic you mentioned. It's a shame that nobody left you any feedback, but from what I have seen elsewhere on the interweb, it does look reasonably priced and seems to have good reviews from other users too.

Steve. ;)
The problem with DNS integration is that their SDK costs a small fortune. They build in support for common Windows apps and the operating system, and I think anywhere you can type, it can fill in (so some integration may be doable, we just need to think it through a bit), but if you want to embed the functionality into your own app, you need their SDK which is very pricey.
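To sketch what that "anywhere you can type" kind of integration might look like: if the dictation software types its text into a plain text input, a small script could read those lines and map them to robot commands. This is only a Python illustration with made-up names (`COMMANDS`, `normalise`, `to_commands`); it does not use any Dragon or ARC API, and how the resulting commands would actually reach ARC is left open.

```python
import string

# Hypothetical mapping from spoken phrases to robot command names.
COMMANDS = {
    "robot stop": "STOP",
    "move forward": "FORWARD",
    "tell me a joke": "JOKE",
}


def normalise(text):
    """Lower-case the text and strip punctuation and extra spaces."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def to_commands(dictated_lines):
    """Map each dictated line to a command name, skipping unknown lines."""
    commands = []
    for line in dictated_lines:
        command = COMMANDS.get(normalise(line))
        if command is not None:
            commands.append(command)
    return commands


# Dictation software tends to add capitals and punctuation, hence normalise().
print(to_commands(["Robot, stop.", "Tell me a joke!", "What time is it?"]))
# ['STOP', 'JOKE']
```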

Back in Dragon Dictate v8 or so, they used SAPI, so anyone could integrate, but in either 9 or 10 they went proprietary, reportedly because SAPI couldn't meet their needs and Microsoft was starting to compete with them in the enterprise voice recognition space, so there was some animosity.

@ Rich

I had a good look through your speech recognition tutorial and adjusted a few settings on my system and it has made a bit of a difference. I will definitely be investing in a new mic at some point, but as you say, it's all about the training so I'm doing as you suggested and doing a bit of training for an hour or 2 a few days a week. I just hope it makes a vast improvement to what it is now.

Thanks for your help, and to everyone else for your input as well.

Steve. ;)