
Why Not Use the Google API?

It was suggested on another forum that we tie into the Google API for natural speech detection. The voice recognition we currently have works well for me, but I understand the Google app works without any training or a personal dictionary. Just download it, tie it in, and BANG! Use it without any problems.

Just a thought.

:)



#1  

Do you have any more info on this? Sounds interesting. Where can it be downloaded, and how did you tie it into ARC?

#2  

Hey Dave,

I haven't tied it into the EZ-B. It is just a suggestion for DJ. It would give us a way of speaking naturally to our robot without all of the training and hassle. It would use Google's technology instead of ours. I will try to find you the link.

Mel

#3  

Here is the post from the forum that the person wrote. (This is intended for Linux, but I am sure that Google has a Windows version of this, or it may be cross-platform.)

I want to test using the Google speech API for speech recognition instead of Julius. I think it might give better results and remove the need to train the robot to your voice. Of course it requires online internet access for operation, but that should not be much of an issue. I found some information about using the API:

https://gist.github.com/alotaiba/1730160

http://achuwilson.wordpress.com/2012/06 ... ros-linux/

There even already is a ROS package using the API:

http://wiki.ros.org/gspeech

This package still uses API v1, but it should be possible to use v2 already:

https://github.com/gillesdemey/google-speech-v2

So I will just update the gspeech package and integrate it into the QBO stack.

Do you have any thoughts on the topic?

#4  

I found this article showing how to use Google Speech Recognition in a Windows application. Apparently, you use standard Windows functions and available libraries to record a file in a format that Google can understand (FLAC, 16 kHz, 16 bits per sample, mono) and then send it to Google through a web service. Google will send back a text response of the recognized speech as JSON.

http://www.codeproject.com/Articles/338010/Fun-with-Google-Speech-Recognition-service
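To make the flow above concrete, here is a minimal sketch of that request/response cycle in Python. Note the caveats: the unofficial Google speech endpoint shown (and the `YOUR_KEY` placeholder) are assumptions based on how the v2 API was described at the time, and that API has long since been shut down, so only the request-building and JSON-parsing logic are meant to be illustrative.

```python
import json
import urllib.request

# Hypothetical endpoint and key: the unofficial v2 API required a developer
# key and is no longer available. Treat both values as placeholders.
SPEECH_URL = "https://www.google.com/speech-api/v2/recognize?lang=en-US&key=YOUR_KEY"

def build_recognize_request(flac_bytes, sample_rate=16000):
    """Build the POST request: FLAC audio, 16 kHz, mono, 16-bit samples."""
    return urllib.request.Request(
        SPEECH_URL,
        data=flac_bytes,
        headers={"Content-Type": "audio/x-flac; rate=%d" % sample_rate},
    )

def parse_transcript(response_text):
    """Pull the top transcript out of the service's JSON reply.

    The service streamed one JSON object per line; the first non-empty
    "result" carried the alternatives, best guess first.
    """
    for line in response_text.splitlines():
        line = line.strip()
        if not line:
            continue
        result = json.loads(line).get("result", [])
        if result:
            return result[0]["alternative"][0]["transcript"]
    return None

# Offline demonstration with a canned response of the documented shape:
sample = ('{"result":[]}\n'
          '{"result":[{"alternative":[{"transcript":"hello robot"}]}]}')
print(parse_transcript(sample))  # hello robot
```

Sending `urllib.request.urlopen(build_recognize_request(...))` would then return the JSON stream to feed into `parse_transcript`.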

Alan

#5  

It would be great to compare Windows VR with Google Speech Recognition for accuracy and implementation!

United Kingdom
#6  

Google's API is slow compared to Microsoft's SAPI. It also relies on an internet connection, from what I've (quickly) read.

Sure, MS SAPI isn't great; it takes a lot of training to be accurate. However, 90% of the time the problems are due to the microphone, not the training. I have no issues with MS SAPI when I use a good microphone. After years of training, I have few issues even with the built-in microphone in my webcam, which is 2-3 meters from where I usually sit.

Like everything, there are pros and cons.

Personally, I would use Dragon NaturallySpeaking over Google if changing the speech recognition API were an option. It may be more expensive, but I couldn't fault it.

#7  

OK, so SLOW is the reason that we are not using Google.

#8  

"OK, so SLOW is the reason that we are not using Google."

Well, I think the fact that Microsoft SAPI is built into Windows 7 and above, and was a free download for XP, is probably why we are using SAPI, more than the slow response from Google. It is too bad that Dragon changed from being SAPI-compliant to their own APIs. Earlier versions would just drop in and replace Microsoft Speech Recognition with no work at all for application developers. That way, those who wanted to spend the extra money for more accurate recognition could do so, those who were happy with M$ could use it, and the developer didn't need to worry about it.

Alan

United Kingdom
#9  

I don't think speed is the reason we don't use the Google API. It's more likely because Google's API is still pretty new, for a start.

While I said it is slow, it may not be too slow depending on the application. But for conversations it would be like talking to someone over a satellite feed with a 1-2 second delay. Personally, I wouldn't be happy with that.

You could always do a workaround and use whatever API you like, which then sends the info to ARC via one of the various methods. A middle man, if you will.
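The "middle man" idea could be sketched like this: an external script does the recognition with whatever engine you prefer, then forwards the resulting text to ARC. Everything ARC-specific here is an assumption for illustration: the port number and the `$SpeechText` variable-assignment syntax are placeholders, so check ARC's current scripting-interface documentation for the real protocol before relying on this.

```python
import socket

# Placeholder connection details: ARC does expose scripting interfaces,
# but this host/port pair is an assumption, not the documented protocol.
ARC_HOST = "127.0.0.1"
ARC_PORT = 6666  # hypothetical port

def make_command(variable, text):
    """Format a hypothetical script command that stores text in a variable."""
    # Escape embedded quotes so the generated command stays well-formed.
    safe = text.replace('"', '\\"')
    return '$%s = "%s"\n' % (variable, safe)

def forward_to_arc(text, host=ARC_HOST, port=ARC_PORT):
    """Send the recognized phrase to ARC over TCP (the 'middle man')."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(make_command("SpeechText", text).encode("utf-8"))
```

The point of the design is that ARC never needs to know which recognition engine produced the text; swapping Google for Dragon (or anything else) only changes the middle-man script.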

#10  

What I meant was: having heard what you say, the robot could do a voice search if it does not know what you are talking about, and use Google Voice Search to get the answer. Even an EZ-Pedia.