Voice Activity Detection

by Synthiam

Detect the presence or absence of human speech

Requires ARC v6 (Updated 2/1/2021)

How to add the Voice Activity Detection robot skill

  1. Load the most recent release of ARC (Get ARC).
  2. Press the Project tab from the top menu bar in ARC.
  3. Press Add Robot Skill from the button ribbon bar in ARC.
  4. Choose the Audio category tab.
  5. Press the Voice Activity Detection icon to add the robot skill to your project.

Don't have a robot yet?

Follow the Getting Started Guide to build a robot and use the Voice Activity Detection robot skill.

How to use the Voice Activity Detection robot skill

Voice Activity Detection (VAD) detects the presence or absence of human speech. The skill uses an advanced algorithm to detect a human voice in the microphone input of the PC's default audio device. When a voice is detected or lost, the corresponding script runs.

Voice Detected: When a voice is detected, the graph displays in green and the voice start script executes.


Voice Not Detected: When there is no human voice, the display turns red and the voice end script executes.
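The start/end behavior described above can be pictured with a small sketch. This is not the skill's algorithm, which the page describes only as "advanced". It is a toy energy-threshold detector written in Python, and the names `SimpleVad`, `threshold`, and `hangover_frames` are assumptions made for illustration. It fires one callback when the level rises above a threshold and another after the level has stayed low for a few frames.

```python
import math
from typing import Callable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class SimpleVad:
    """Toy energy-threshold detector (hypothetical; NOT the skill's algorithm)."""

    def __init__(
        self,
        on_start: Callable[[], None],
        on_end: Callable[[], None],
        threshold: float = 0.1,     # assumed level that counts as "voice"
        hangover_frames: int = 3,   # quiet frames required before "voice end"
    ) -> None:
        self.on_start = on_start
        self.on_end = on_end
        self.threshold = threshold
        self.hangover_frames = hangover_frames
        self.active = False
        self._quiet = 0

    def process(self, frame: Sequence[float]) -> None:
        """Feed one frame; fire a callback only when the state changes."""
        if rms(frame) >= self.threshold:
            self._quiet = 0
            if not self.active:
                self.active = True
                self.on_start()
        elif self.active:
            self._quiet += 1
            if self._quiet >= self.hangover_frames:
                self.active = False
                self.on_end()


def demo() -> List[str]:
    """Run the detector over synthetic frames and record which callbacks fire."""
    events: List[str] = []
    vad = SimpleVad(lambda: events.append("start"), lambda: events.append("end"))
    silence = [0.0] * 160
    speech = [0.5, -0.5] * 80
    # A one-frame pause inside the speech is shorter than the hangover,
    # so it does not trigger the end callback.
    frames = [silence] * 2 + [speech] * 3 + [silence] + [speech] * 2 + [silence] * 4
    for frame in frames:
        vad.process(frame)
    return events


if __name__ == "__main__":
    print(demo())
```

A pure level threshold like this reacts to any loud sound, not only speech, which is why a real voice detector needs more than loudness to tell a voice from other noise.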





This one seems interesting. I wonder if my TV or radio would set it off all the time.


I can see great uses for this, particularly in interactive robots. When it hears a voice, it could turn on face detection and start looking around until it sees a face. (It would be super cool if it could support multiple microphones and compare the levels so it knows what direction to start looking. It might not even need face detection for that, but I don't think Windows even deals well with having multiple microphones on at the same time, so that's probably beyond the scope of this project...)
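As a rough illustration of the level-comparison idea in this comment, here is a minimal Python sketch. It assumes each microphone is mounted at a known angle and that one frame of samples per microphone is already available; the function name `loudest_direction`, the angle keys, and the `min_level` cutoff are all hypothetical, and nothing here is a feature of the skill.

```python
import math
from typing import Dict, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def loudest_direction(
    frames_by_angle: Dict[float, Sequence[float]],
    min_level: float = 0.05,  # assumed floor below which everything counts as quiet
) -> Optional[float]:
    """Return the mounting angle (degrees) of the loudest microphone.

    Returns None when there are no microphones or all of them are quiet.
    """
    if not frames_by_angle:
        return None
    levels = {angle: rms(frame) for angle, frame in frames_by_angle.items()}
    angle, level = max(levels.items(), key=lambda item: item[1])
    return angle if level >= min_level else None


if __name__ == "__main__":
    frames = {
        -45.0: [0.1, -0.1, 0.1, -0.1],
        0.0: [0.2, -0.2, 0.2, -0.2],
        45.0: [0.6, -0.6, 0.6, -0.6],
    }
    print(loudest_direction(frames))
```

Picking the loudest microphone only gives a coarse direction, limited to the angles where microphones are mounted.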



You could combine this skill with the Kinect 360 depth skill. It returns a variable with the angle of the audio.


I should add that the reason our client asked for this skill is to have fluent conversational dialog with their robots. I'm not sure what speech recognition they’re using, but I do know they’re using Google Dialogflow for NLP.


Hmm. Too bad your results with the Kinect for navigation haven't been as promising as RealSense. I don't think I need the ability badly enough for the expense of both (or a Kinect and a Lidar). I'll keep it in mind though. I think I recall seeing something on one of the robot parts sites about a sound direction finder. If I come across it again, I'll see whether it is something that is inexpensive and that a skill could be written for.


Fixed a bug with an error when closing the skill


I would love to see this have an added feature sometime that allows it to check output audio instead of input audio. For talking robots, the ServoTalk skills will estimate how long a jaw should move based on a text string's length and content. They tend not to be very accurate and often underestimate or overestimate the time it takes for the text-to-speech app to run. I thought this skill could help circumvent this limitation by keeping an ear open for when the audio starts and ends. I also thought of adding some natural neck animation while my robot speaks. The problem is that I have no way to trigger my 'stay alive' script when the audio begins and stop it when text-to-speech ends. This skill almost got me there, then I realized it worked only with the mic input. A check box that switches what it listens to from MIC to LINE OUT would be super cool!