
New Option For Enhanced Speech Recognition

Google has just opened the speech API that powers Google Now to developers, and at least for now it is free:

http://www.speechtechblog.com/2016/03/23/google-opens-access-speech-api

I haven't looked at the API yet to see what would be involved, but I imagine a plugin would not be out of the question.
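From a quick skim of the announcement, the REST call looks straightforward: you POST a JSON body with an audio config and base64-encoded audio. A rough sketch of building that request body (field names follow the v1beta1 "syncrecognize" call as described at launch and may well change, so treat this as an assumption to verify against the docs):

```python
import base64
import json

def build_speech_request(audio_bytes, sample_rate=16000, language="en-US"):
    """Build the JSON body for a one-shot recognition request.

    Field names are taken from Google's v1beta1 beta docs as announced;
    check the current documentation before relying on them.
    """
    return json.dumps({
        "config": {
            "encoding": "LINEAR16",        # raw 16-bit PCM audio
            "sampleRate": sample_rate,
            "languageCode": language,
        },
        "audio": {
            # audio is sent inline, base64-encoded
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    })

# Two silent PCM samples, just to show the shape of the payload.
body = build_speech_request(b"\x00\x00\x00\x00")
```

A plugin would presumably record from the mic, build a body like this, and POST it with an API key.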

Alan



#1  

It's way too far on the tech side of things for me, but if anyone would do the digging I would throw in the body! :D

#2  

I suspect when David Cochran sees this he'll play with it. He is already working with Nuance, which is great, but there is a cost involved. If Google follows their normal pattern, this will be free for a year or two and then priced about 40% below the competition (which will also drive the competition's prices down). Amazon is doing something similar by opening up its Alexa speech engine to developers, although it seems to be targeting hardware manufacturers.

Alan

#3  

Funny... Alan knows me well. I passed this off to one of my programmers to look into earlier this afternoon. If it reduces the cost of EZ-AI for people and won't delay its release, I am all for it.

Dictation-based speech recognition, and actions performed based on it, are definitely on the way though, one way or another...

#4  

Yeah, if you hadn't commented I was going to either email you or post to your Google+ group to make sure you knew about it.

Because of my job I get a lot of speech-industry news and press releases, so I wasn't sure how widely this information had been distributed so far.

Alan

#5  

Just stumbled on the same post. Very interesting. Google certainly has a soft spot for robotics, seeing how many companies they have purchased in the past couple of years. I like that they are sharing rather than keeping it proprietary. Let us know what you think, David.

#6  

I think it is usable. What we will probably do is release EZ-AI with the services we have it set up for, and then have a future update that will allow the user to choose between Google and Nuance for speech recognition.

The issue with developing something like EZ-AI is that this technology is changing very quickly right now. It seems that once we get something working great, something else comes out that makes us rethink what is possible. We spend time working with various technologies only to find that their API is horrible, which takes even more time.

We have tried Watson, Google, Amazon, Microsoft and some others, only to find that each was lacking. For example, Watson doesn't capitalize proper nouns. This seems like a small issue, but when you are trying to figure out whether someone is talking about a person or a place, it becomes a real problem. Something that simple caused us to bail on them. Also, you have to use the training for Watson that IBM provides, which comes from newspaper articles. That doesn't fit well with robotics, so you end up with about 85% accuracy on robotics tasks. Each service had its own limitations once we dug into it, and finding a workaround for each limitation slows down progress even more.

It is because of this that we have chosen the services that we have. Nuance is simply the best out there, but there is a cost. Google may offer an alternative for us, but I don't want to delay the release of EZ-AI any longer only to find that there is an issue with the API. I think the better use of resources is to go with the proven services that we have, and then look at offering users a replacement for some of the paid services if they would like to go that route.

An example of this is the knowledge service that we use, Wolfram|Alpha. This is a paid service, but it is simply the best and most reliable source of information available on the internet. Each piece of knowledge it returns has been verified by a professional in the subject matter, which is important because it means the information is accurate. We could have gone with Wikipedia, but Wikipedia is in the middle of replacing their API, and there is no way of knowing whether information from Wikipedia is accurate. There is a lot of information, but there are issues with it. This might not seem like a big deal to the hobbyist, but when EZ-AI is placed in front of a pharmacist, I want to know the information is accurate. We are looking at offering an alternative to paying for this information, which would also mean giving up Wolfram as the source. This would be chosen by the user, making it a decision made outside of us and removing liability from us.

The core piece of the AI has its own speech recognition engine, which is already part of the service we use. This is also the piece that allows the end user to take what we have developed and customize it for their own uses. It uses API.AI, which has been around longer than Siri, and it will allow you to tie into Cortana apps or Alexa apps. We simply provide the zip file that contains the AI we have built; you then take that zip file and load it into your own API.AI developer instance. This requires you to sign up as a developer for API.AI, but it also makes your use of API.AI free. You would work with API.AI for any issues regarding your instance at that point.

If a statement isn't understood by API.AI, we send the recorded audio off to Nuance Cloud to convert it to text, and then we send that text off to Wolfram. If we went with Google for STT conversion, this is where it would fit in. As Alan stated, it would be free for a while instead of using Nuance, but it would be an option available only after the release, via an update to EZ-AI, and only if we decide that we can use it.
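To make the routing concrete: try API.AI first, and only fall back to speech-to-text plus the knowledge service when it doesn't understand. A rough sketch with all three services stubbed out (the function names here are made up for illustration, not EZ-AI's actual code):

```python
def handle_utterance(audio, apiai_query, stt_transcribe, knowledge_lookup):
    """Route one utterance through an EZ-AI-style pipeline.

    apiai_query:      audio -> (understood: bool, response: str)
    stt_transcribe:   audio -> text (e.g. Nuance Cloud, or Google STT later)
    knowledge_lookup: text -> answer (e.g. Wolfram|Alpha)
    All three callables are stand-ins for the real services.
    """
    understood, response = apiai_query(audio)
    if understood:
        # API.AI matched one of the trained intents; use its response.
        return response
    # Fallback: convert the speech to text, then ask the knowledge service.
    text = stt_transcribe(audio)
    return knowledge_lookup(text)

# Toy stubs demonstrating both paths:
def fake_apiai(audio):
    return (audio == b"hello", "Hi there!")

result = handle_utterance(b"hello", fake_apiai,
                          lambda a: "population of france",
                          lambda t: "Answer for: " + t)
```

Swapping Nuance for Google later would only mean replacing the `stt_transcribe` callable; the routing stays the same.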

From there, if we linked in Wikipedia for information, that part could also become free, though less reliable than the paid option. This too would only be done after EZ-AI was released and after Wikipedia completed the rewrite of their API. I don't want to keep revisiting things instead of adding features, so doing this now would be a bad decision.

I am giving away a lot of information about how we do things in EZ-AI, but I also wanted to give some idea of what is being done with it. It is really hard for me to click the Reply button, but I want to give people a certain level of knowledge about what is coming soon. There is a path to a free EZ-AI, but we don't have it available yet, and I don't know when we will. I do know that the paid version will be completed on May 1. We will be able to publish more information on what is happening after that date.

Thanks, David

#7  

Great information, David. I don't think you gave away anything beyond how you are implementing things, and information is power to the user.

#8  

David, I remember you once talking about pricing. Now that you approach the beta release in May, do you have a better idea of a cost breakdown, including hardware and service costs on a monthly basis?

#9  

I should clarify something about API.AI...

If you don't want to modify the AI for your own purposes, you can just use ours. There is a cost for this, but it also makes deployment much simpler. If you want to modify the AI, that is when you would take our zip file and sign up for your own API.AI developer environment. If you chose to publish your environment for others to use, you would then have to work with API.AI on pricing for it.

The more use we get out of any of the services, the lower the cost per use becomes, which will help us continue to improve our AI. By customizing your own, you lose the ability to use any improvements we develop for ours after the point that you load our zip file into your environment. It would be possible to start over with our AI again after changes have happened, but you would lose your changes.

Anyway, in my opinion it would be better to request changes to the common AI than to create your own, but that is up to you. The option is available. For a business, it might be very logical to take our base and modify it for your business.

#10  

The going rate right now is about $87.69 for the server with the power supply, metal case and 2-year warranty. This doesn't include shipping. We flash the hardware, so there is no additional cost for storage and such. You will need a Cat5/6 cable to plug the hardware into your network. There is a webpage that you will use to configure your server.

I estimate the cost per month to be about $30.00. This depends on how quickly things are adopted and how quickly we can get to the price breaks; as we reach those, we will pass along the savings. This ~$30.00 per month covers the charges passed on to us from API.AI, Wolfram and Nuance for about 33 requests a day, or 1,000 requests per month. This is really the lowest level of service that we can offer. From there, we would go up by 1,000 requests for an additional ~$30.00.

You can have as many clients running against the server as you want, but it is safe to say that the server can handle 5 concurrent requests at a time; that is what we have tested to this point, anyway. This means that 5 agents (robot, PC, smartwatch, or whatever else we develop clients for) can be running and submitting requests at the same time. You could be running 100 agents or more, but this level of hardware is tested for 5 concurrent connections. As long as you don't exceed this soft threshold, you should still get responses back within a couple of seconds.
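That kind of soft threshold (any number of agents, but only a handful of requests actually in flight at once) is the sort of thing a simple semaphore gives you. A hypothetical sketch, not EZ-AI's actual code; only the limit of 5 comes from the post:

```python
import threading

MAX_CONCURRENT = 5               # the tested capacity of the small server
slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def serve_request(handler):
    """Run one agent's request; blocks if 5 requests are already in flight."""
    with slots:                  # waits here until one of the 5 slots frees up
        return handler()

# 20 agents firing at once: at most 5 handlers execute concurrently,
# the rest simply queue and finish a moment later.
results = []
lock = threading.Lock()

def agent(i):
    value = serve_request(lambda: i * 2)   # stand-in for real work
    with lock:
        results.append(value)

threads = [threading.Thread(target=agent, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every request still gets an answer; exceeding the threshold just adds queueing delay rather than failing outright, which matches the "should still get responses back in a couple of seconds" behavior described.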

Larger installations are definitely possible by increasing the cost of the hardware. For about $400 you would have hardware that would support about 20 concurrent requests. This would be more of an office type environment.

#11  

Excellent information! Really looking forward to this.

#12  

I was a bit low on the cost of the hardware. The server is $87.69 with a metal case, power supply and 2-year warranty. This is the configuration we would sell these in.

#13  

Still a good price. And would the customer then own the server? If the customer terminated the service, would you then buy back the server? Perhaps a rental on the server could also be an option, like a modem or cable box rental?

#14  

It's all yours to do whatever you want with at that point. It would contain a database with your information, so I don't want to risk anything with it. It stays in your possession.

#16  

That is some interesting information. It seems the project has a much bigger scale than I imagined... I actually thought of it as a plugin for ARC! Will there also be an option to just get the hardware and use it without needing a subscription... something like an EZ-AI Light for hobbyists?

#17  

What about using an Azure-hosted VM? There are dozens of "plugins" for Azure servers, including speech recognition.

#18  

There will be a reduced-cost version. It will be about $10 a month and will use Google for voice and Wikipedia for information. It will be a while before this version is available.

Here is the deal and why it costs money... In order for dictated text to be used, you have to incorporate a natural language processor, a speech classifier and a relationship extractor. There are some free ones available, but using them takes quite a bit of programming and a more extensive server; the processing has to happen either on the client or on a server. We started down this path but decided it would be better to make a great product instead of a weaker one. You could eliminate the $10 charge by setting up your own API.AI service as a developer. This would allow you to take our AI and use it in your developer environment at API.AI, and we wouldn't get charged for your activity.

The developer environment has a limited number of requests per month, and it is public, with no way to make it private.
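To illustrate why dictated text needs more than a raw transcript, the system has to map free-form wording onto an intent before anything can act on it. Here is a toy keyword-based intent classifier; a real NLU service like API.AI does far more (entities, context, relationships, trained models), so this is only a sketch of the concept, with made-up intents:

```python
# Toy intent classifier: maps a dictated sentence to an intent by counting
# keyword hits. Real services use trained language models instead.
INTENTS = {
    "weather":  {"weather", "rain", "temperature", "forecast"},
    "move":     {"go", "move", "forward", "backward", "turn"},
    "question": {"who", "what", "where", "when", "why"},
}

def classify(utterance):
    """Return the best-matching intent name, or None if nothing matches."""
    words = set(utterance.lower().split())
    scores = {intent: len(words & keywords)
              for intent, keywords in INTENTS.items()}
    best = max(scores, key=scores.get)
    # No keyword matched at all -> we don't understand the sentence.
    return best if scores[best] > 0 else None

intent = classify("please move forward and turn left")
```

Even this toy version shows the problem: every phrasing you didn't anticipate falls through, which is exactly the gap the paid NLU services close.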

Eventually there will be a free version by taking this path, but there is programming we would have to do to utilize Google and Wikipedia, which isn't our focus at this time. You would still need the server at about $87 USD and the plugin for ARC, but this is a way to get similar features without a monthly charge. Of course, once Google starts charging for their API, we would then have to start charging for the use of it.

There is no telling what will be available next week much less by the time we modify EZ-AI to use these services. The technology is changing very quickly right now.

The machine learning parts of EZ-AI are built into the server, and there is no charge for those; they are custom programmed by us, and we are not paying for a service for them. If at some point in the future we go to a service, there could then be a cost for a more advanced machine learning mechanism, but ours will remain free to use.

I hope this helps. We are simply passing along the costs for these advanced services. When we get corporate customers, we will charge them a different amount to use the product. Right now we are focusing on delivering what the robot community here and elsewhere wants their robots to be able to do.

#19  

Yes, we use Azure servers for the authentication and account setup pieces; we have a couple of VMs set up. We want the installation at the site to be self-contained, so to speak. There are services that are used, but those calls go from the local server to the services. There is user-specific information on these servers, and housing that information in the cloud isn't something I want to be worried about. There could be photos taken for facial recognition and such, which should stay inside that person's network and not in a cloud.

This AI is also used in Rafiki, so there are concerns there as well, due to it having things like paparazzi mode and facial detection. The pods also use facial recognition...

Keeping everything in a nice neat package is a consideration. We personally found the services provided by Nuance to be the best; they return proper nouns correctly and support a huge range of languages.