Resolved by Rich!

Convert Speech To Text (Not Text To Speech), Possible?

I'm just throwing around some new ideas... One of them is figuring out whether there is any way to use WriteFile and ReadFile in EZ-Builder to create a database so my robot can learn new phrases and commands on his own. Instead of only using a list of canned responses to questions and phrases already programmed, I would like my robot to store commands and phrases he has never heard before, then store an answer or response to go with the phrase just spoken. That way, the next time he hears the phrase or command he will know how to respond, i.e. learning as he goes. If this is not possible yet in ARC, maybe it could be a new feature?
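To make the idea concrete, here is a rough sketch of that kind of phrase database. It is plain Python rather than EZ-Script, and it is not an ARC feature: the `PhraseStore` class, its method names, and the JSON file layout are all made up for illustration. It also assumes the spoken phrase has already been turned into text somehow.

```python
import json
from pathlib import Path


def normalize(phrase):
    """Lower-case and collapse whitespace so small variations still match."""
    return " ".join(phrase.lower().split())


class PhraseStore:
    """Hypothetical phrase -> response store kept in a JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload anything learned in earlier sessions, if the file exists.
        if self.path.exists():
            self.responses = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.responses = {}

    def lookup(self, phrase):
        """Return the stored response, or None if the phrase is new."""
        return self.responses.get(normalize(phrase))

    def learn(self, phrase, response):
        """Remember a response and write the whole store back to disk."""
        self.responses[normalize(phrase)] = response
        self.path.write_text(json.dumps(self.responses, indent=2), encoding="utf-8")
```

The robot would call `lookup()` on each phrase it hears, and when that returns `None` it would ask for a response and pass it to `learn()`, so the answer is there the next time.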

Any comments or suggestions welcomed...

Cheers Richard



United Kingdom

Yes but not with ARC.

I've been toying with posting a tutorial for some other software which will do this, so I'll get on to that later.

There are problems though: accuracy is low at best.

I'll post more when I'm not on my phone.


Sweet, that would be awesome Rich... I am wondering if it would ever be possible to add this to ARC as a feature at some point? If DJ could somehow accomplish this, it would be seriously brilliant...

Cheers and thanks again Rich

United Kingdom

I don't think it's difficult to get into ARC, but the results are very poor, which is likely to be why it isn't part of ARC.

I tend to use payload lists or set phrases despite having the ability to speak freely with Jarvis, because the accuracy is so much higher. For instance, I have one command programmed for grocery shopping lists. I can speak an item that he knows, e.g. bacon, and accuracy is 98%+, or I can add a new item, but accuracy is down at the 75-80% mark at best. The payload list takes priority.
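The "payload list takes priority" idea can be sketched roughly like this in Python. This is only an illustration of the ordering, not how Jarvis or any particular speech software does it: `resolve_item` and the `"payload"` / `"dictation"` labels are made-up names, and the input is assumed to be text the recogniser has already produced.

```python
def resolve_item(heard, known_items):
    """Prefer an item from the known (payload) list; otherwise keep the free-form text.

    Returns (item, source), where source is "payload" or "dictation".
    """
    key = heard.strip().lower()
    for item in known_items:
        if item.lower() == key:
            # Known item: the high-accuracy path.
            return item, "payload"
    # Unknown item: fall back to whatever dictation produced.
    return heard.strip(), "dictation"
```

Keeping track of where the item came from lets the robot read a dictated item back for confirmation while accepting known items straight away.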

I should have mentioned, the software I use isn't free. It's affordable and cheaper than DNS (Dragon NaturallySpeaking, which also works well but is costly), but you need to shell out a little bit. There's a free trial available to test it out anyway, so no big loss :)

Details coming when I get home and have a chance to explain it all.


It should be possible for DJ to interface with Dragon NaturallySpeaking, which is the market leader in speech recognition. They do have APIs for third-party interfaces. It used to be easier because they would tie right into the Microsoft SAPI, but that was when speech reco didn't come standard in Windows.


United Kingdom

Rich is right.

There are two modes in modern-day speech recognition: "grammar mode" and "dictation mode". ARC uses grammar mode, which basically compares the incoming phrase with the pre-programmed phrases from your scripts etc. To do what you are suggesting you need to use dictation mode, where you can say any phrase and the SR engine tries to work out what it is.

Grammar mode is very accurate as it only has to compare phrases. Dictation is not, and its accuracy depends on how good the SR engine is and how well the system knows the user's phonetic profile.
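As a loose analogy for why grammar mode is easier, here is a small Python sketch that picks the closest phrase from a fixed list. A real SR engine matches audio against the grammar, not text; this uses text similarity from the standard library's `difflib` purely as a stand-in, and `grammar_match` is a made-up name.

```python
from difflib import SequenceMatcher


def grammar_match(heard, grammar):
    """Return (best_phrase, score) for the closest phrase in a fixed grammar.

    score is a text-similarity ratio between 0.0 and 1.0.
    """
    if not grammar:
        raise ValueError("grammar must contain at least one phrase")
    scored = [
        (SequenceMatcher(None, heard.lower(), phrase.lower()).ratio(), phrase)
        for phrase in grammar
    ]
    score, phrase = max(scored)
    return phrase, score
```

With only a handful of candidates, even a slightly garbled input usually lands on the right phrase. Dictation has no such list to fall back on, which is where the accuracy drops.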

ARC uses the internal Windows SR engine, which works great for grammar mode, but I (personally) have never had much success with it in dictation mode, even after a lot of training. I have done what you are suggesting with our AI core (ARIEL), but I had to use the Dragon (DNS11) engine, as this works great for me in dictation mode.



Pandorabot kind of works as a reference, but mostly I get "low confidence" phrases being spit out... thanks guys for the ideas...

United Kingdom

Yes, Pandorabot (I assume) uses dictation mode rather than grammar mode. If you aren't satisfied with the results from that then really it's going to be a case of training and upgrading hardware (i.e. mics). No amount of programming will alter that and you would also find the method I mentioned earlier would give very poor results too.

I've been training the same voice profile for 4 years now. It is constantly learning every time I speak a command. It has only just, over the last 6 months or so, started to give 90%+ positive results. Prior to this it was 80-90%. I have a required confidence level of 94% (otherwise the TV or even Jarvis himself will be picked up and it ends up in a never-ending loop).
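A required confidence level like that amounts to a simple gate on each recognition result. Here is a minimal Python sketch, assuming the engine reports a confidence between 0.0 and 1.0 alongside the text; the function and constant names are made up for illustration.

```python
REQUIRED_CONFIDENCE = 0.94  # the 94% level mentioned above


def accept(text, confidence, required=REQUIRED_CONFIDENCE):
    """Return the recognised text if confidence is high enough, else None."""
    if confidence >= required:
        return text
    return None


def filter_results(results, required=REQUIRED_CONFIDENCE):
    """Keep only the texts from (text, confidence) pairs that pass the gate."""
    return [text for text, confidence in results if accept(text, confidence, required) is not None]
```

Anything below the threshold is dropped rather than acted on, which is what stops the TV or the robot's own voice from triggering commands.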

DNS works much better; however, I didn't have time to change over to it when I was trying it out last year. It's something I may move over to, but if I do I will need to rework 4 years of work done with my current setup.


It's something to consider now... I'll have to rethink what I want to do then.