Google has just opened the speech API that powers Google Now to developers, and at least for now it is free:
http://www.speechtechblog.com/2016/03/23/google-opens-access-speech-api
I haven't looked at the API yet to see what would be involved, but I imagine a plugin would not be out of the question.
Alan
Dictation-based speech recognition, and actions performed based on it, are definitely on the way though, one way or another...
Because of my job I get a lot of speech-industry news and press releases, so I wasn't sure how widely this information had been distributed yet.
Alan
The issue with developing something like EZ-AI is that this technology is changing very quickly right now. It seems that once we get something working great, something else comes out that causes us to rethink what is possible. We spend time working with various technologies only to find that their API is horrible, which takes even more time. We have tried Watson, Google, Amazon, Microsoft and some others, only to find that each was lacking. For example, Watson doesn't capitalize proper nouns. This seems like a small issue, but when you are trying to figure out whether someone is talking about a person or a place, it becomes a real problem. Something that simple caused us to bail on them. Also, you have to use the training for Watson that IBM provides, and that training comes from newspaper articles. This doesn't fit well with robotics, so you end up with about 85% accuracy when trying to do robotics-related things. Each service had its own limitations once we dug into it. It then takes time to find a workaround for each limitation, which slows down progress even more.
It is because of this that we have chosen the services that we have. Nuance is simply the best out there, but there is a cost. Google may offer an alternative for us, but I don't want to delay the release of EZ-AI any longer only to find that there is an issue with their API. I think the better use of resources is to go with the proven services that we have, and then look at offering the user a replacement for some of the paid services if they would like to go that route.
An example of this is the knowledge service that we use (Wolfram|Alpha). This is a paid-for service which is simply the best and most reliable source of information available on the internet. Each piece of knowledge it provides has been verified by a professional in that subject matter. This is important because it ensures that the information returned is accurate. We could have gone with Wikipedia, but Wikipedia is in the middle of replacing their API and there is no way of knowing if the information from Wikipedia is accurate. There is a lot of information, but there are issues with it. This might not seem like a big deal to the hobbyist, but when EZ-AI is placed in front of a pharmacist, I want to know the information is accurate. We are looking at offering an alternative to paying for this information, which would also mean giving up Wolfram as the source. That choice would be made by the user, which keeps the decision outside of us and removes liability from us.
The core piece of the AI has its own speech recognition engine. This engine is already a part of the service we use. This is also the piece that allows the end user to take what we have developed and customize it for their own uses. It uses API.AI, which has been around longer than Siri, and it will allow you to tie into Cortana apps or into Alexa apps. We will simply provide the zip file that contains the AI we have built, and you then take that zip file and load it into your own API.AI developer instance. This requires you to sign up as a developer for API.AI, but it also makes your use of API.AI free. You would work with API.AI for any issues regarding your API.AI instance at that point.
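For anyone curious what talking to your own API.AI instance looks like from a client, here is a rough sketch against API.AI's v1 REST query endpoint. It is an illustration only, not our actual client code; the access token, session id, and protocol version date are placeholders you would take from your own API.AI developer console.

```python
import requests

APIAI_URL = "https://api.api.ai/v1/query"
CLIENT_ACCESS_TOKEN = "YOUR_CLIENT_ACCESS_TOKEN"   # from your own API.AI agent settings

def query_apiai(text, session_id="ezai-demo", lang="en"):
    """Send a text query to your API.AI agent and return the parsed JSON response."""
    response = requests.post(
        APIAI_URL,
        params={"v": "20150910"},                   # API.AI protocol version date
        headers={"Authorization": "Bearer " + CLIENT_ACCESS_TOKEN},
        json={"query": text, "lang": lang, "sessionId": session_id},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    reply = query_apiai("turn on the living room lights")
    print(reply["result"]["fulfillment"].get("speech"))
```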
If the statement made isn't understood by API.AI, we send the recorded audio off to Nuance Cloud to convert it to text, and then we send that text off to Wolfram. If we went with Google for STT conversion, this is where it would fit in. As Alan stated, it would be free for a while instead of using Nuance, but it would be an option available only after the release, and after an update to EZ-AI, if we decided that we could use it.
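To make that flow a little more concrete, here is a hedged sketch of the fallback logic. The nuance_speech_to_text() helper is just a stand-in (Google's Speech API could slot into the same place), WOLFRAM_APPID is a placeholder, and the "input.unknown" check is API.AI's default fallback action name; none of this is our production code.

```python
import requests

WOLFRAM_APPID = "YOUR_WOLFRAM_APPID"   # placeholder app id

def nuance_speech_to_text(audio_bytes):
    """Stand-in for the Nuance Cloud (or Google Speech API) transcription call."""
    raise NotImplementedError("wire up your speech-to-text provider of choice here")

def wolfram_lookup(question):
    """Ask Wolfram|Alpha for plain-text results via its public v2 query API."""
    resp = requests.get(
        "http://api.wolframalpha.com/v2/query",
        params={"appid": WOLFRAM_APPID, "input": question,
                "format": "plaintext", "output": "JSON"},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()

def handle_utterance(apiai_result, audio_bytes):
    """Use the API.AI answer when an intent matched; otherwise fall back to STT + Wolfram."""
    result = apiai_result.get("result", {})
    # "input.unknown" is what API.AI's default fallback intent reports; adjust to your agent.
    if result.get("action") not in (None, "", "input.unknown"):
        return result["fulfillment"].get("speech", "")
    text = nuance_speech_to_text(audio_bytes)
    return wolfram_lookup(text)
```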
From there, if we linked in Wikipedia for information, that part could also become free, but it would be less reliable than the paid-for option. This too would only be done after EZ-AI was released and after Wikipedia completed the rewrite of their API. I don't want to have to keep revisiting things instead of adding features, so doing this now is a bad decision.
I am giving away a lot about how we do things in EZ-AI, but I also wanted to give some sense of what EZ-AI is and what is being done with it. It is really hard for me to click the Reply button, but I also want to give people a certain level of knowledge about what is coming soon. There is a path to a free EZ-AI, but we don't have it available yet, and I don't know when we will have it available at this point. I do know that the paid-for version will be completed on May 1. We will be able to publish more information on what is happening after that date.
Thanks
David
If you don't want to modify the AI for your own purposes, you can just use ours. There is a cost for this, but it also provides a lot of simplicity in deployment. If you want to modify the AI, that is when you would take our zip file and sign up for your own API.AI developer environment. If you choose to publish your environment for others to use, you would then have to work with API.AI on pricing for your environment.
The more usage we put through any of the services we use, the lower the cost per use becomes. This will help us to continue to improve our AI. By customizing your own, you lose the ability to use any improvements that we develop for ours after the point that you load our zip file into your environment. It would be possible to start over with our AI again after changes have happened, but you would lose your changes.
Anyway, it would be better in my opinion to request changes to the common AI than to create your own, but that is up to you. The option is available. For a business it might be very logical to take our base and modify it for your business.
I estimate the cost per month to be about $30.00. This depends on how quickly things are adopted and how quickly we can get to the price breaks. As we reach those, we will pass along the savings. This ~$30.00 per month will cover the charges passed on to us from API.AI, Wolfram and Nuance for about 33 requests a day, or 1,000 requests per month (roughly three cents per request). This is really the lowest level of service that we can offer. From there, we would go up in increments of 1,000 requests for an additional ~$30.00 each.
You can have as many clients running against the server as you want, but it is safe to say that the server can handle 5 concurrent requests at a time. This is what we have tested to this point anyway. This means that 5 agents (robot or pc or smart watch or whatever we develop clients for) can be running and submitting requests at the same time. You could be running 100 agents or more but this level of hardware is tested for 5 concurrent connections. As long as you don't exceed this soft threshold you should still be able to get responses back to requests in a couple of seconds.
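If you want to stay under that soft threshold from the client side, something as simple as a semaphore around the request call works. The server URL and payload below are made-up placeholders, not the actual EZ-AI client protocol.

```python
import threading
import requests

EZAI_SERVER = "http://ezai-server.local:8080/query"   # hypothetical endpoint
MAX_CONCURRENT = 5                                     # the tested soft threshold
_slots = threading.Semaphore(MAX_CONCURRENT)

def send_request(agent_id, text):
    """Block until one of the five slots frees up, then submit the request."""
    with _slots:
        resp = requests.post(EZAI_SERVER,
                             json={"agent": agent_id, "query": text},
                             timeout=10)
        resp.raise_for_status()
        return resp.json()

# Any number of agent threads can call send_request(); at most five requests
# will be in flight against the server at any one time.
```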
Larger installations are definitely possible by stepping up to more expensive hardware. For about $400 you would have hardware that would support about 20 concurrent requests. This would be more of an office-type environment.
Will there also be an option for just getting the hardware and using it without needing a subscription... something like an EZ-AI Light for hobbyists?
Here is the deal and why it costs money...
In order for dictated text to be used, you have to incorporate a natural language processor, a speech classifier and a relationship extractor. There are some free ones available, but it takes quite a bit of programming and a more powerful server to use them; the processing has to happen either on the client or on a server. We started down this path but decided that it would be better to make a great product instead of a weaker one. You could eliminate the $10 charge by setting up your own API.AI service as a developer. This would allow you to take our AI and use it in your developer environment at API.AI, and we wouldn't get charged for your activity.
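To give an idea of what those free components look like in practice (this is purely an illustration with an open-source library, not what EZ-AI itself uses), here is a minimal entity-extraction sketch. You still need your own client or server horsepower to run it, which is exactly the trade-off described above.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")   # small open-source English model

def extract_entities(text):
    """Return (entity, label) pairs so downstream code can tell people from places."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]

print(extract_entities("Ask Jordan at the Amazon office in Seattle"))
# expected output along the lines of:
# [('Jordan', 'PERSON'), ('Amazon', 'ORG'), ('Seattle', 'GPE')]
```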
With the API.AI developer environment there is a limited number of requests per month, and the environment is public with no way to make it private.
Eventually there will be a free version by taking this path, but there is programming that we will have to do to utilize Google and Wikipedia, which isn't our focus at this time. You would still need the server at ~$87 USD and the plugin for ARC, but this is a way to get similar features without a monthly charge. Now, once Google starts charging for their API, we would then have to start charging for the use of that.
There is no telling what will be available next week much less by the time we modify EZ-AI to use these services. The technology is changing very quickly right now.
The machine learning parts of EZ-AI are built into the server and there is no charge for those. They are custom programmed by us and we are not paying for a service for these items. If at some point in the future we go to a service, there could then be a cost associated with a more advanced machine learning mechanism, but ours will remain free to use.
I hope this helps. We are simply passing along the costs for these advanced services. When we get corporate customers we will charge a different amount for them to use the product. Right now we are focusing on delivering what the robot community here and elsewhere wants their robots to be able to do.
This AI is also used in Rafiki, so there are concerns there as well, due to it having things like paparazzi mode and facial detection. The pods also use facial recognition...
Keeping everything in a nice neat package is a consideration. We personally found the services provided by Nuance to be the best: they return proper nouns correctly and support a huge range of languages.