United Kingdom

Sound Recognition

While I have been setting up some new speech recognition commands, it got me thinking. Has anyone successfully integrated sound recognition with ARC? I found a sound recognition API while doing a Google search on the subject, and as I don't have much experience with APIs (none, in fact), I was curious whether anyone is using something like this in their projects. I figure this could be useful to have: recognising a door bell, a phone ringing, a dog barking, music recognition, etc. The possibilities are many.

Another thing that could be interesting is voice (not speech) recognition, where a robot could recognise different users by their individual voices.

Any thoughts guys?




The Google API allows for very few statements. I had looked at using it with EZ-AI, but the limited number of statements allowed made that impossible.

The best one out there is Dragon. If you purchase the professional version, you can include the DLL for speech recognition in an application for your computer. It wasn't feasible to ask people to spend that kind of money to use my app. The other option is the developer SDK, which is $5000.00.

As far as recognizing sounds goes, you would need to do some analysis on the recorded sound and then make a decision as to what that sound is. You are talking about learning and making a lot of small decisions based on past knowledge that add up to a final decision. This is machine learning, which can be done and is something that I have been looking at with EZ-AI. I am far from getting there though.

There are APIs out there that will recognize a song. The approach described above is how they do this. Basically, this is a pretty advanced computer programming topic. As I said, I am researching deep learning but am a long way away from understanding how to do anything with it.

United Kingdom


Thanks for the reply. Very interesting. Funnily enough, when I was reading about this I thought of your EZ-AI. I don't know if you misunderstood about speech, but I wasn't talking about speech recognition, but rather voice recognition, where pattern matching is used to tell different people's voices apart. If that was indeed what you meant, then I didn't know Dragon did that.

It sounds like sound recognition would be difficult to implement, but certainly possible.


Any engine like this basically works the same way and requires extensive training in order to make decisions. To understand a dog bark as a dog bark, you would need many, many recordings of dog barks to train the computer. The same goes for door bells and every other sound you want recognized. Each sound would take thousands of recordings to train the computer to understand what it is.
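To make the training idea a bit more concrete, here is a toy sketch in Python. It is not how any real engine works: it assumes each clip is just a list of float samples, uses only two crude features (loudness and zero-crossing rate), and labels a new clip by whichever trained average it sits closest to. The function names are made up for illustration. A real system would use far richer features and the thousands of recordings mentioned above.

```python
import math

def features(samples):
    """Two crude features of a clip: RMS loudness and zero-crossing rate."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(samples) - 1)
    return (rms, zcr)

def train(labelled_clips):
    """labelled_clips maps a label to a list of example clips.
    Returns a dict mapping each label to its average feature vector."""
    centroids = {}
    for label, clips in labelled_clips.items():
        feats = [features(clip) for clip in clips]
        centroids[label] = tuple(sum(col) / len(col) for col in zip(*feats))
    return centroids

def classify(centroids, samples):
    """Return the label whose average features are nearest to this clip."""
    f = features(samples)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))
```

With only a handful of examples per label this will confuse anything that is not trivially different, which is exactly why so many recordings are needed in practice.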

What is the API that you found? I can look at this, but it would require your bot to be constantly recording its audio, either to memory or to disk. When a sound was detected, it would do something like grab the last few seconds of the recording and send it to the API to have that sound recognized.
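The "keep recording, then grab the last few seconds" idea could look roughly like this in Python. This is only a sketch under assumptions: the samples are floats between -1 and 1, "a sound was detected" just means one frame is louder than a threshold, and the class name and default numbers are invented for the example. Getting audio from a microphone and sending the clip to whatever recognition API you pick are left out.

```python
import math
from collections import deque

class RollingRecorder:
    """Keeps only the most recent few seconds of audio in memory."""

    def __init__(self, sample_rate=16000, keep_seconds=3, threshold=0.1):
        # A deque with maxlen silently drops the oldest samples as new ones arrive.
        self.buffer = deque(maxlen=sample_rate * keep_seconds)
        self.threshold = threshold  # RMS level that counts as "a sound was detected"

    def feed(self, frame):
        """Add one frame of samples. Return a copy of the buffered audio
        if this frame is loud enough, otherwise None."""
        self.buffer.extend(frame)
        if not frame:
            return None
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= self.threshold:
            return list(self.buffer)
        return None
```

The caller would call feed() with every frame from the microphone and, whenever it gets a clip back instead of None, pass that clip on to the recognition service.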


Steve linked to a Java API in his first post. It looks pretty simple for recognizing a couple of well-defined sounds, but of course it would be easier to integrate if it were a .NET API. Interestingly, when trying to search for other sound recognition APIs, Google search thinks it is smarter than me and keeps returning speech recognition APIs instead.

Only doing simple searches from my phone though. I am traveling right now, so can't really sit down and dig for other options.



Speaker recognition (also known as voice biometrics) is a whole different ball of wax. That said, Google just implemented it in Android, where it learns your voice from you saying 'OK Google' and will unlock your phone, though they warn it is not highly secure.

I doubt there are any free voice biometrics APIs yet since the companies that are selling services built on it are making boatloads of money, but maybe Google will release what they did as an open API.



Some of the APIs I use are Java based, and .NET can use these. Actually, it's easier for .NET to tie to Java than to C++.