Upgrade to ARC Pro

Unleash your creativity with the power of easy robot programming using Synthiam ARC Pro


Voice Activity Detection

Detect the presence or absence of human speech

How To Add This Control To Your Project
  1. Make sure you have the latest version of ARC installed.
  2. Select the Get button on this page to download the archive file.
  3. Double-click the downloaded archive file to run the installer.
  4. The installer will add this control to ARC.
  5. Load ARC and select Project -> Add Control from the menu.
  6. Choose the Audio category tab.
  7. Press the Voice Activity Detection icon to add the control to your project.


Voice Activity Detection (VAD) detects the presence or absence of human speech. The skill analyzes the microphone input of the PC's default audio device for a human voice. When a voice is detected, the voice start script runs; when the voice is lost, the voice end script runs.

Voice Detected
When voice is detected, the graph will display in green and the voice start script will execute.

Voice Not Detected
When no human voice is detected, the graph will display in red and the voice end script will execute.
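
The start/end behavior described above can be sketched as a small state machine. This is only an illustration, not the skill's actual implementation: the page does not document the detection algorithm, so a plain energy (RMS) threshold stands in for it here, and every name in the block (VoiceActivityMonitor, threshold, hangover_frames, run_demo) is hypothetical. The two callbacks play the role of the voice start and voice end scripts.

```python
import math
from typing import Callable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in the range [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class VoiceActivityMonitor:
    """Tracks a detected / not-detected state and fires a callback on each change.

    Assumption: a simple energy threshold stands in for the skill's real,
    undocumented voice detection algorithm.
    """

    def __init__(
        self,
        on_start: Callable[[], None],
        on_end: Callable[[], None],
        threshold: float = 0.05,   # hypothetical level that counts as "voice"
        hangover_frames: int = 3,  # quiet frames tolerated before "voice end"
    ) -> None:
        self.on_start = on_start
        self.on_end = on_end
        self.threshold = threshold
        self.hangover_frames = hangover_frames
        self.active = False
        self._quiet = 0

    def process(self, frame: Sequence[float]) -> bool:
        """Feed one audio frame and return True while voice counts as present."""
        if rms(frame) >= self.threshold:
            self._quiet = 0
            if not self.active:
                self.active = True
                self.on_start()  # corresponds to the voice start script
        elif self.active:
            self._quiet += 1
            if self._quiet >= self.hangover_frames:
                self.active = False
                self.on_end()    # corresponds to the voice end script
        return self.active


def run_demo() -> List[str]:
    """Run synthetic frames through the monitor and record the events fired."""
    events: List[str] = []
    monitor = VoiceActivityMonitor(
        on_start=lambda: events.append("voice start"),
        on_end=lambda: events.append("voice end"),
    )
    silence = [0.0] * 160
    speech = [0.3 if i % 2 == 0 else -0.3 for i in range(160)]
    for frame in [silence, speech, speech, silence, silence, silence, silence]:
        monitor.process(frame)
    return events


if __name__ == "__main__":
    print(run_demo())
```

The hangover count keeps short pauses between words from firing the end callback straight away. A real detector would also need to tell speech apart from other loud sounds, which a bare threshold cannot do.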



This one seems interesting. I wonder if my TV or radio would set it off all the time.
I can see great uses for this, particularly in interactive robots. When it hears a voice, it could turn on face detection and start looking around until it sees a face. (It would be super cool if it could support multiple microphones and compare the levels so it knows what direction to start looking. It might not even need face detection for that, but I don't think Windows even deals well with having multiple microphones on at the same time, so that's probably beyond the scope of this project...)
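
The idea of comparing microphone levels to pick a direction can be sketched in a few lines. This is a rough illustration under stated assumptions, not a feature of the skill: it assumes two frames captured at the same moment from a left and a right microphone, and the names louder_side and margin are hypothetical. A level comparison only gives a coarse left/right hint, not an angle.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in the range [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def louder_side(
    left: Sequence[float],
    right: Sequence[float],
    margin: float = 1.5,  # hypothetical ratio one side must exceed to "win"
) -> str:
    """Return "left", "right", or "center" from the relative levels of two frames."""
    left_level = rms(left)
    right_level = rms(right)
    if left_level > right_level * margin:
        return "left"
    if right_level > left_level * margin:
        return "right"
    return "center"


if __name__ == "__main__":
    loud = [0.4, -0.4] * 80
    quiet = [0.1, -0.1] * 80
    print(louder_side(loud, quiet))
```

The margin keeps the result from flipping back and forth when both microphones hear roughly the same level.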

You could combine this skill with the Kinect 360 depth skill. It returns a variable with the angle of the audio.
I should add that the reason our client asked for this skill is to have fluent conversational dialog with their robots. I'm not sure what speech recognition they’re using, but I do know they’re using Google Dialogflow for NLP.
Hmm. Too bad your results with the Kinect for navigation haven't been as promising as Realsense. I don't think I need the ability badly enough for the expense of both (or a Kinect and a Lidar). I'll keep it in mind though. I think I recall seeing something on one of the robot parts sites about a sound direction finder. If I come across it again, I'll see whether it is inexpensive and something a skill could be written for.
Fixed a bug with an error when closing the skill
I would love to see this have an added feature sometime that allows it to check output audio instead of input audio.
For talking robots, the ServoTalk skills estimate how long a jaw should move based on a text string's length and content. They tend not to be very accurate and often underestimate or overestimate the time it takes for the text-to-speech app to run. I thought this skill could help circumvent this limitation by keeping an ear open for when the audio starts and ends. I also thought of adding some natural neck animation while my robot speaks.
The problem is that I have no way to trigger my 'stay alive' script when the audio begins and stop it when text to speech ends.
This skill almost got me there, then I realized it worked only with the mic input. A check box that switches what it listens to from MIC to LINE OUT would be super cool!