Voice Activity Detection

by Synthiam

Detect the presence or absence of human speech

Requires ARC v7 (Updated 1/25/2025)

How to add the Voice Activity Detection robot skill

  1. Load the most recent release of ARC (Get ARC).
  2. Press the Project tab from the top menu bar in ARC.
  3. Press Add Robot Skill from the button ribbon bar in ARC.
  4. Choose the Audio category tab.
  5. Press the Voice Activity Detection icon to add the robot skill to your project.

Don't have a robot yet?

Follow the Getting Started Guide to build a robot and use the Voice Activity Detection robot skill.

How to use the Voice Activity Detection robot skill

The Voice Activity Detection (VAD) robot skill detects when speech begins and ends in real time. It processes audio input and identifies periods of active speech and silence. This skill is well suited to robots that need to respond dynamically to human speech.

Key Features:

  • Speech Detection:
    • Detects when speech starts (Speech Begin) and stops (Speech End).
  • Customizable Actions:
    • Allows users to attach custom scripts that execute automatically when speech starts or stops. For example, you can trigger robot movements, lights, or other interactions based on speech activity.
  • Real-Time Audio Visualization:
    • Displays a live graph of the detected speech level, giving a visual representation of the audio activity.
  • Adjustable Sensitivity:
    • Includes settings to fine-tune detection parameters, such as silence thresholds, for optimal performance in various environments.
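The Speech Begin and Speech End events described above can be illustrated with a simple energy-based detector. This is a minimal sketch, not the algorithm this robot skill uses, which is not documented here. The names `level_threshold` and `silence_frames` are hypothetical parameters chosen for illustration and are not settings of the skill.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (samples in -1.0..1.0)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_events(
    frames: Iterable[Sequence[float]],
    level_threshold: float = 0.1,  # hypothetical: RMS level treated as voice
    silence_frames: int = 3,       # hypothetical: quiet frames that end speech
) -> List[Tuple[str, int]]:
    """Return ("begin", index) and ("end", index) events for a frame stream."""
    events: List[Tuple[str, int]] = []
    speaking = False
    quiet_run = 0
    for index, frame in enumerate(frames):
        if rms(frame) >= level_threshold:
            quiet_run = 0
            if not speaking:
                speaking = True
                events.append(("begin", index))
        elif speaking:
            # Require several quiet frames in a row so that a short pause
            # between words does not end the utterance.
            quiet_run += 1
            if quiet_run >= silence_frames:
                speaking = False
                events.append(("end", index))
    return events


quiet, loud = [0.0] * 4, [0.5] * 4
stream = [quiet, loud, loud, quiet, loud, quiet, quiet, quiet, quiet]
print(detect_speech_events(stream))
```

A larger `silence_frames` value tolerates longer pauses inside a sentence at the cost of reporting the end of speech later, which is the usual trade-off when tuning a silence threshold.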

Practical Applications:

  • Interactive Robots:
    • Enable your robot to react to speech dynamically, such as greeting people when they start talking or pausing when they stop.
  • Hands-Free Control:
    • Use speech detection to trigger actions without needing additional input devices.
  • Speech Analysis:
    • Visualize audio levels for debugging or fine-tuning robot behavior in different environments.

This robot skill combines robust speech detection with seamless integration into Synthiam ARC, empowering your robot to better understand and respond to its surroundings!

Main Window

Voice Detected: The graph displays green when a voice is detected, and the voice start script is executed.

Voice Not Detected: The graph displays red when no human voice is present, and the voice stop script is executed.

Pause: You can pause speech detection by checking the Pause checkbox in the main window. Pause is also enabled automatically whenever one of the Audio.say() scripting commands is used, and it is disabled again when the speaking has completed. This ensures the VAD does not detect the robot's own speech and trigger a false positive.
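The pause-while-speaking behavior can be sketched as follows. This is an illustration of the idea only, under the assumption that detection is simply skipped while a pause flag is set. `PausableDetector` and its methods are hypothetical names, not part of ARC or this robot skill.

```python
from contextlib import contextmanager


class PausableDetector:
    """Hypothetical stand-in for a voice activity detector with a pause flag."""

    def __init__(self):
        self.paused = False
        self.detections = []

    def on_audio(self, label, is_voice):
        # Ignore audio while paused so the robot's own voice is not counted.
        if is_voice and not self.paused:
            self.detections.append(label)

    @contextmanager
    def paused_while_speaking(self):
        previous = self.paused
        self.paused = True
        try:
            yield
        finally:
            # Restore the earlier state so a manual pause stays in effect.
            self.paused = previous


detector = PausableDetector()
detector.on_audio("human: hello", is_voice=True)
with detector.paused_while_speaking():
    # Stands in for the robot speaking through its own speaker.
    detector.on_audio("robot: hi there", is_voice=True)
detector.on_audio("human: how are you", is_voice=True)
print(detector.detections)  # ['human: hello', 'human: how are you']
```

Restoring the previous state rather than always unpausing means a pause set manually by the user is not cleared when the robot finishes speaking.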

Configuration Window

The configuration window has several options to customize the behavior of this robot skill.

Voice Start Script: This script is executed when the VAD detects speech.

Voice Stop Script: This script is executed when the VAD detects silence after speech.

Why Use This Skill?

  • Improved Speech Detection:
    • The VAD robot skill provides robust, configurable detection of speech, reducing false positives caused by background noise.
  • Resource Efficiency:
    • Running speech recognition or other resource-intensive skills only during active speech saves computational power and prevents unnecessary processing.
  • Enhanced Interaction:
    • Enables your robot to respond naturally to speech, creating a more interactive and intuitive user experience.
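The resource-efficiency point can be sketched as a gate that passes audio to an expensive recognizer only between a voice-start and a voice-stop event. This is a sketch under stated assumptions: `SpeechGate`, its methods, and `fake_recognize` are hypothetical names for illustration and do not correspond to ARC functions. In a real project, the Voice Start Script and Voice Stop Script would play the role of `on_voice_start` and `on_voice_stop`.

```python
from typing import Callable, List, Optional


class SpeechGate:
    """Collects audio only between voice-start and voice-stop events."""

    def __init__(self, recognize: Callable[[List[bytes]], str]) -> None:
        self._recognize = recognize  # the resource-intensive step
        self._buffer: Optional[List[bytes]] = None
        self.transcripts: List[str] = []

    def on_voice_start(self) -> None:
        self._buffer = []

    def on_frame(self, frame: bytes) -> None:
        # Frames that arrive outside of speech are dropped immediately.
        if self._buffer is not None:
            self._buffer.append(frame)

    def on_voice_stop(self) -> None:
        if self._buffer is None:
            return
        frames, self._buffer = self._buffer, None
        if frames:
            self.transcripts.append(self._recognize(frames))


calls: List[int] = []


def fake_recognize(frames: List[bytes]) -> str:
    """Placeholder for a real speech recognizer; records each invocation."""
    calls.append(len(frames))
    return f"{len(frames)} frames recognized"


gate = SpeechGate(fake_recognize)
gate.on_frame(b"noise")   # dropped: no speech in progress
gate.on_voice_start()
gate.on_frame(b"hel")
gate.on_frame(b"lo")
gate.on_voice_stop()      # the recognizer runs once, on two frames
gate.on_frame(b"noise")   # dropped again
print(gate.transcripts)   # ['2 frames recognized']
```

The recognizer is called once per utterance rather than continuously, which is where the processing savings come from.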

This robot skill is a powerful tool for integrating reliable voice activity detection into your projects, providing the flexibility and control needed to create intelligent, speech-responsive robots.


#1  

This one seems interesting. I wonder if my TV or radio would set it off all the time.

#2  

I can see great uses for this, particularly in interactive robots.  When it hears a voice it could turn on face detection and start looking around until it sees a face.  (would be super cool if it could support multiple microphones and compare the levels so it knows what direction to start looking.  might not even need face detection for that, but I don't think Windows even deals well with having multiple microphones on at the same time, so probably beyond the scope of this project...).

Alan

PRO
Synthiam
#3  

You could combine this skill with the Kinect 360 depth skill. It returns a variable with the angle of the audio.

PRO
Synthiam
#4  

I should add that the reason our client asked for this skill is to have fluent conversational dialog with their robots. I'm not sure what speech recognition they're using, but I do know they're using Google Dialogflow for NLP.

#5  

Hmm. Too bad your results with the Kinect for navigation haven't been as promising as Realsense. I don't think I need the ability badly enough for the expense of both (or a Kinect and a Lidar). I'll keep it in mind though. I think I recall seeing something on one of the robot parts sites about a sound direction finder. If I come across it again, I'll see whether it is something that is inexpensive and that a skill could be written for.

PRO
Synthiam
#6  

Fixed a bug with an error when closing the skill

#7  

I would love to see this have an added feature sometime that allows it to check output audio instead of input audio. For talking robots, the ServoTalk skills estimate how long a jaw should move based on a text string's length and content. They tend not to be very accurate and often underestimate or overestimate the time it takes for the text-to-speech app to run. I thought this skill could help circumvent this limitation by keeping an ear open for when the audio starts and ends. I also thought of adding some natural neck animation while my robot speaks. The problem is that I have no way to trigger my 'stay alive' script when the audio begins and stop it when text-to-speech ends. This skill almost got me there, then I realized it worked only with the mic input. A check box that switches what it listens to from MIC to LINE OUT would be super cool!

#8  

v7 has been updated with some additional fine-tuning for more accurate detection and completion of speech.

PRO
Australia
#9  

There are no settings for the adjustable sensitivity mentioned in the above key features.