Google has just opened the speech API that powers Google Now to developers, and at least for now it is free:
http://www.speechtechblog.com/2016/03/23/google-opens-access-speech-api
I haven't looked at the API yet to see what would be involved, but I imagine a plugin would not be out of the question.
Alan
Dictation-based speech recognition, and actions performed based on it, are definitely on the way though, one way or another...
Because of my job I get a lot of speech-industry news and press releases, so I wasn't sure how widely this information had been distributed yet.
Alan
The issue with developing something like EZ-AI is that this technology is changing very quickly right now. It seems that once we get something working great, something else comes out that causes us to rethink what is possible. We spend time working with various technologies only to find that their API is horrible, which takes even more time. We have tried Watson, Google, Amazon, Microsoft and some others, only to find that each was lacking. For example, Watson doesn't capitalize proper nouns. This seems like a small issue, but when you are trying to figure out whether someone is talking about a person or a place, it becomes a real problem. Something that simple caused us to bail on them. Also, you have to use the training for Watson that IBM provides, and that training comes from newspaper articles. This doesn't fit well with robotics, so you end up with about 85% accuracy when trying to do robotics-related things. Each service had its own limitations once we dug into it. It then takes time to find a workaround for each limitation, which slows down progress even more.
It is because of this that we have chosen the services that we have. Nuance is simply the best out there, but there is a cost. Google may offer an alternative for us, but I don't want to delay the release of EZ-AI any longer only to find that there is an issue with their API. I think the better use of resources is to go with the proven services that we have, and then look at offering the user a replacement for some of the paid services if they would like to go that route.
An example of this is the knowledge service that we use (Wolfram|Alpha). This is a paid-for service which is simply the best and most reliable source of information available on the internet. Each piece of knowledge it provides has been verified by a professional in that subject matter. This is important because it ensures that the information returned is accurate. We could have gone with Wikipedia, but Wikipedia is in the middle of replacing their API and there is no way of knowing if the information from Wikipedia is accurate. There is a lot of information, but there are issues with it. This might not seem like a big deal to the hobbyist, but when EZ-AI is placed in front of a pharmacist, I want to know the information is accurate. We are looking at offering an alternative to paying for this information, which would also mean giving up Wolfram as the source. That choice would be made by the user, which keeps the decision outside of us and removes liability from us.
The core piece of the AI has its own speech recognition engine. This engine is already a part of the service we use. This is also the piece that allows the end user to take what we have developed and customize it for their own uses. It uses API.AI, which has been around longer than Siri, and it will allow you to tie into Cortana apps or into Alexa apps. We will simply provide the zip file that contains the AI we have built, and you then take that zip file and load it into your own API.AI developer instance. This requires you to sign up as a developer for API.AI, but it also makes your use of API.AI free. You would work with API.AI for any issues regarding your API.AI instance at that point.
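For anyone curious what talking to your own API.AI instance looks like from a client, here is a rough sketch against API.AI's v1 REST query endpoint. It is an illustration only, not our actual client code; the access token, session id, and protocol version date are placeholders you would take from your own API.AI developer console.

```python
import requests

APIAI_URL = "https://api.api.ai/v1/query"
CLIENT_ACCESS_TOKEN = "YOUR_CLIENT_ACCESS_TOKEN"   # from your own API.AI agent settings

def query_apiai(text, session_id="ezai-demo", lang="en"):
    """Send a text query to your API.AI agent and return the parsed JSON response."""
    response = requests.post(
        APIAI_URL,
        params={"v": "20150910"},                   # API.AI protocol version date
        headers={"Authorization": "Bearer " + CLIENT_ACCESS_TOKEN},
        json={"query": text, "lang": lang, "sessionId": session_id},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    reply = query_apiai("turn on the living room lights")
    print(reply["result"]["fulfillment"].get("speech"))
```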
If the statement made isn't understood by API.AI, we send the recorded audio off to Nuance Cloud to convert it to text, and then we send that text off to Wolfram. If we went with Google for STT conversion, this is where it would fit in. As Alan stated, it would be free for a while instead of using Nuance, but it would be an option available only after the release, and after an update to EZ-AI, if we decided that we could use it.
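To make that flow a little more concrete, here is a hedged sketch of the fallback logic. The nuance_speech_to_text() helper is just a stand-in (Google's Speech API could slot into the same place), WOLFRAM_APPID is a placeholder, and the "input.unknown" check is API.AI's default fallback action name; none of this is our production code.

```python
import requests

WOLFRAM_APPID = "YOUR_WOLFRAM_APPID"   # placeholder app id

def nuance_speech_to_text(audio_bytes):
    """Stand-in for the Nuance Cloud (or Google Speech API) transcription call."""
    raise NotImplementedError("wire up your speech-to-text provider of choice here")

def wolfram_lookup(question):
    """Ask Wolfram|Alpha for plain-text results via its public v2 query API."""
    resp = requests.get(
        "http://api.wolframalpha.com/v2/query",
        params={"appid": WOLFRAM_APPID, "input": question,
                "format": "plaintext", "output": "JSON"},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()

def handle_utterance(apiai_result, audio_bytes):
    """Use the API.AI answer when an intent matched; otherwise fall back to STT + Wolfram."""
    result = apiai_result.get("result", {})
    # "input.unknown" is what API.AI's default fallback intent reports; adjust to your agent.
    if result.get("action") not in (None, "", "input.unknown"):
        return result["fulfillment"].get("speech", "")
    text = nuance_speech_to_text(audio_bytes)
    return wolfram_lookup(text)
```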
From there, if we linked in Wikipedia for information, that part could also become free, but it would be less reliable than the paid-for option. This too would only be done after EZ-AI was released and after Wikipedia completed the rewrite of their API. I don't want to have to keep revisiting things instead of adding features, so doing this now is a bad decision.
I am giving away a lot about how we do things in EZ-AI, but I also wanted to give some sense of what EZ-AI is and what is being done with it. It is really hard for me to click the Reply button, but I also want to give people a certain level of knowledge about what is coming soon. There is a path to a free EZ-AI, but we don't have it available yet, and I don't know when we will have it available at this point. I do know that the paid-for version will be completed on May 1. We will be able to publish more information on what is happening after that date.
Thanks
David
If you don't want to modify the AI for your own purposes, you can just use ours. There is a cost for this, but it also provides a lot of simplicity in deployment. If you want to modify the AI, that is when you would take our zip file and sign up for your own API.AI developer environment. If you choose to publish your environment for others to use, you would then have to work with API.AI on pricing for your environment.
The more usage we put through any of the services we use, the lower the cost per use becomes. This will help us to continue to improve our AI. By customizing your own, you lose the ability to use any improvements that we develop for ours after the point that you load our zip file into your environment. It would be possible to start over with our AI again after changes have happened, but you would lose your changes.
Anyway, it would be better in my opinion to request changes to the common AI than to create your own, but that is up to you. The option is available. For a business it might be very logical to take our base and modify it for your business.
I estimate the cost per month to be about $30.00. This depends on how quickly things are adopted and how quickly we can get to the price breaks. As we reach those, we will pass along the savings. This ~$30.00 per month will cover the charges passed on to us from API.AI, Wolfram and Nuance for about 33 requests a day, or 1,000 requests per month (roughly three cents per request). This is really the lowest level of service that we can offer. From there, we would go up in increments of 1,000 requests for an additional ~$30.00 each.
You can have as many clients running against the server as you want, but it is safe to say that the server can handle 5 concurrent requests at a time. This is what we have tested to this point anyway. This means that 5 agents (robot or pc or smart watch or whatever we develop clients for) can be running and submitting requests at the same time. You could be running 100 agents or more but this level of hardware is tested for 5 concurrent connections. As long as you don't exceed this soft threshold you should still be able to get responses back to requests in a couple of seconds.
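If you want to stay under that soft threshold from the client side, something as simple as a semaphore around the request call works. The server URL and payload below are made-up placeholders, not the actual EZ-AI client protocol.

```python
import threading
import requests

EZAI_SERVER = "http://ezai-server.local:8080/query"   # hypothetical endpoint
MAX_CONCURRENT = 5                                     # the tested soft threshold
_slots = threading.Semaphore(MAX_CONCURRENT)

def send_request(agent_id, text):
    """Block until one of the five slots frees up, then submit the request."""
    with _slots:
        resp = requests.post(EZAI_SERVER,
                             json={"agent": agent_id, "query": text},
                             timeout=10)
        resp.raise_for_status()
        return resp.json()

# Any number of agent threads can call send_request(); at most five requests
# will be in flight against the server at any one time.
```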
Larger installations are definitely possible by stepping up to more expensive hardware. For about $400 you would have hardware that would support about 20 concurrent requests. This would be more of an office-type environment.
Will there also be an option for just getting the hardware and using it without needing a subscription... something like an EZ-AI Light for hobbyists?
Here is the deal and why it costs money...
In order for dictated text to be used, you have to incorporate a natural language processor, a speech classifier and a relationship extractor. There are some free ones available, but it takes quite a bit of programming and a more powerful server to use them; the processing has to happen either on the client or on a server. We started down this path but decided that it would be better to make a great product instead of a weaker one. You could eliminate the $10 charge by setting up your own API.AI service as a developer. This would allow you to take our AI and use it in your developer environment at API.AI, and we wouldn't get charged for your activity.
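To give an idea of what those free components look like in practice (this is purely an illustration with an open-source library, not what EZ-AI itself uses), here is a minimal entity-extraction sketch. You still need your own client or server horsepower to run it, which is exactly the trade-off described above.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")   # small open-source English model

def extract_entities(text):
    """Return (entity, label) pairs so downstream code can tell people from places."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]

print(extract_entities("Ask Jordan at the Amazon office in Seattle"))
# expected output along the lines of:
# [('Jordan', 'PERSON'), ('Amazon', 'ORG'), ('Seattle', 'GPE')]
```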
With the API.AI developer environment there is a limited number of requests per month, and the environment is public with no way to make it private.
Eventually there will be a free version by taking this path, but there is programming that we will have to do to utilize Google and Wikipedia, which isn't our focus at this time. You would still need the server at ~$87 USD and the plugin for ARC, but this is a way to get similar features without a monthly charge. Now, once Google starts charging for their API, we would then have to start charging for the use of that.
There is no telling what will be available next week much less by the time we modify EZ-AI to use these services. The technology is changing very quickly right now.
The machine learning parts of EZ-AI are built into the server and there is no charge for those. They are custom programmed by us and we are not paying for a service for these items. If at some point in the future we go to a service, there could then be a cost associated with a more advanced machine learning mechanism, but ours will remain free to use.
I hope this helps. We are simply passing along the costs for these advanced services. When we get corporate customers we will charge a different amount for them to use the product. Right now we are focusing on delivering what the robot community here and elsewhere wants their robots to be able to do.
This AI is also used in Rafiki, so there are concerns there as well, due to it having things like paparazzi mode and facial detection. The pods also use facial recognition...
Keeping everything in a nice neat package is a consideration. We personally found the services provided by Nuance to be the best: they return proper nouns correctly and support a huge range of languages.