
Voice Activity Detection

by Synthiam

Real-time VAD using 300-3400 Hz FFT to trigger scripts on speech start/stop, with sensitivity tuning, live audio level graph, and TTS pause.

Requires ARC v8 (Updated 3/31/2026)
Compatible with: Microsoft Windows 10 or 11

How to add the Voice Activity Detection robot skill

  1. Load the most recent release of ARC (Get ARC).
  2. Press the Project tab from the top menu bar in ARC.
  3. Press Add Robot Skill from the button ribbon bar in ARC.
  4. Choose the Audio category tab.
  5. Press the Voice Activity Detection icon to add the robot skill to your project.

Don't have a robot yet?

Follow the Getting Started Guide to build a robot and use the Voice Activity Detection robot skill.


How to use the Voice Activity Detection robot skill

The Voice Activity Detection (VAD) skill listens to your microphone in real time and fires scripts when speech begins or ends. It uses FFT (Fast Fourier Transform) analysis on the 300-3400 Hz speech band to distinguish voice from background noise, filtering out short impulsive sounds like knocks and clicks.

Typical use: trigger a speech recognition skill when someone starts talking, and stop it when they go silent.

Key Features:

  • Speech Detection:
    • Detects when speech starts (Speech Begin) and stops (Speech End).
  • Customizable Actions:
    • Allows users to attach custom scripts that execute automatically when speech starts or stops. For example, you can trigger robot movements, lights, or other interactions based on speech activity.
  • Real-Time Audio Visualization:
    • Displays a live graph of the detected speech level, giving a visual representation of the audio activity.
  • Adjustable Sensitivity:
    • Includes settings to fine-tune detection parameters, such as silence thresholds, for optimal performance in various environments.

Practical Applications:

  • Interactive Robots:
    • Enable your robot to react to speech dynamically, such as greeting people when they start talking or pausing when they stop.
  • Hands-Free Control:
    • Use speech detection to trigger actions without needing additional input devices.
  • Speech Analysis:
    • Visualize audio levels for debugging or fine-tuning robot behavior in different environments.

This robot skill combines robust speech detection with seamless integration into Synthiam ARC, empowering your robot to better understand and respond to its surroundings!

Main Window

Voice Detected: The graph displays green while a voice is detected, and the voice start script is executed.


Voice Not Detected: The graph displays red when a human voice is absent, and the voice end script is executed.


Pause: You can pause speech detection by checking the Pause checkbox on the main form. Pause is also enabled automatically whenever an Audio.say() scripting command runs, and is cleared when speaking completes. This ensures the VAD does not detect the robot's own speech and trigger a false positive.


Graph


A real-time level meter shows the detected audio energy relative to the sensitivity threshold:

Color Meaning
Red Listening - no speech currently detected
Green Speech is active
Yellow Detection is paused

The graph bar fills from 0-100% as energy approaches and exceeds the threshold.

Pause Checkbox

Checking Pause suspends all voice detection. The graph stops updating and no scripts will fire. Uncheck to resume.

The plugin automatically pauses while the robot's text-to-speech (TTS) is speaking, then restores your previous pause state when TTS finishes. This prevents the robot's own voice from triggering detection.
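The save-and-restore behavior can be modeled as a small state latch: a forced pause while TTS is active, layered on top of whatever the user chose. This is an illustrative sketch in Python, not the skill's actual source; the class and member names are invented:

```python
class PauseLatch:
    """Illustrative model of VAD pause handling during TTS (names invented).

    Remembers whether the user manually paused detection, forces a pause
    while TTS is speaking, then restores the user's choice afterwards.
    """

    def __init__(self):
        self.user_paused = False   # state of the Pause checkbox
        self.tts_active = False    # robot currently speaking

    @property
    def detection_enabled(self):
        # Detection runs only when neither the user nor TTS holds a pause
        return not (self.user_paused or self.tts_active)

    def on_tts_start(self):
        self.tts_active = True     # ignore the robot's own voice

    def on_tts_end(self):
        self.tts_active = False    # fall back to the user's setting


latch = PauseLatch()
latch.on_tts_start()
assert not latch.detection_enabled   # paused while the robot speaks
latch.on_tts_end()
assert latch.detection_enabled       # user had not paused, so detection resumes
```

Because the user's pause flag is stored separately from the TTS flag, a manual pause survives a TTS announcement rather than being silently cleared.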

Log Tab

Displays diagnostic output. Every ~50 audio frames (~2.5 seconds) a line is written:

[VAD] energy=0.000042  threshold=0.000050  ratio=0.840

Use these values to calibrate Sensitivity: the energy measured while you speak should sit just above the threshold, and the energy during silence just below it.
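As a rough aid when reading these log lines, the ratio column is simply energy divided by threshold. A tiny helper (hypothetical, not part of the skill) illustrates:

```python
def vad_ratio(energy: float, threshold: float) -> float:
    """Ratio of smoothed band energy to the Sensitivity threshold.

    Values >= 1.0 mean the current frame counts as speech; values that
    hover just under 1.0 while you are talking suggest lowering Sensitivity.
    """
    return energy / threshold


# Matches the sample log line above:
print(round(vad_ratio(0.000042, 0.000050), 3))  # 0.84
```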


Configuration


Open configuration by clicking the gear/config button on the skill panel.

Sensitivity

Default: 0.00005

The FFT energy threshold for speech detection. The VAD compares the smoothed energy of the 300-3400 Hz band against this value.

  • Lower value = more sensitive (triggers on quieter sounds)
  • Higher value = less sensitive (requires louder speech to trigger)

Use the diagnostic log to tune this. If energy is consistently below threshold when speaking, lower the value. If background noise is triggering false detections, raise it.

Silence Duration

Default: 500 ms

How long (in milliseconds) silence must persist after speech before the Speech End event fires. Increase this if the script fires too early during natural pauses in speech. Decrease it for faster response after someone stops talking.

Minimum Speech Duration

Default: 150 ms

How long (in milliseconds) energy must continuously exceed the threshold before Speech Begin fires. This filters out short impulsive sounds (clicks, knocks, coughs) that are not sustained speech. Increase this to be more aggressive about ignoring non-speech sounds.
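Minimum Speech Duration and Silence Duration together form a simple hysteresis around the Sensitivity threshold. A minimal Python sketch of that logic (an illustration only, assuming the 50 ms frame interval described under How Detection Works; the function and parameter names are invented):

```python
def vad_events(frames, threshold, frame_ms=50,
               min_speech_ms=150, silence_ms=500):
    """Yield ('begin', i) / ('end', i) events from smoothed energy frames.

    frames: iterable of smoothed band-energy values, one per 50 ms chunk.
    Energy must stay above threshold for min_speech_ms before 'begin';
    it must stay below for silence_ms after speech before 'end'.
    """
    speaking = False
    above_ms = 0   # how long energy has been continuously above threshold
    below_ms = 0   # how long energy has been continuously below threshold
    for i, energy in enumerate(frames):
        if energy > threshold:
            above_ms += frame_ms
            below_ms = 0
            if not speaking and above_ms >= min_speech_ms:
                speaking = True
                yield ("begin", i)
        else:
            below_ms += frame_ms
            above_ms = 0
            if speaking and below_ms >= silence_ms:
                speaking = False
                yield ("end", i)


# 5 loud frames (250 ms), then 12 quiet frames (600 ms):
events = list(vad_events([1.0] * 5 + [0.0] * 12, threshold=0.5))
print(events)  # [('begin', 2), ('end', 14)]
```

Note how the begin event fires only after three loud frames (150 ms) and the end event only after ten quiet frames (500 ms), which is exactly why brief clicks and mid-sentence pauses do not toggle the state.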


Scripts

Both scripts are configured inside the Config dialog.

Speech Start Script

Runs once when speech begins (after energy exceeds Sensitivity for at least Minimum Speech Duration milliseconds).

Typical use:

// Start a speech recognition skill
ControlCommand("Speech Recognition", "Start Listening");

Speech End Script

Runs once when silence has persisted for the Silence Duration after speech was active.

Typical use:

// Stop the speech recognition skill
ControlCommand("Speech Recognition", "Stop Listening");

How Detection Works

  1. Microphone audio is captured at 16 kHz, 16-bit mono, in 50 ms chunks.
  2. Each chunk is optionally filtered with a high-pass BiQuad filter (cutoff 300 Hz) to remove low-frequency rumble.
  3. Samples accumulate into a 1024-sample FFT buffer. When full, a Hamming window is applied and the FFT is computed.
  4. The average magnitude of all FFT bins in the 300-3400 Hz speech band is calculated.
  5. That energy value is exponentially smoothed (factor 0.3) to reduce frame-to-frame jitter.
  6. The smoothed energy is compared to Sensitivity:
    • If it stays above the threshold for Minimum Speech Duration, Speech Begin fires.
    • If it stays below the threshold for Silence Duration after speech was active, Speech End fires.
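Steps 3-5 above can be sketched numerically. The following is an illustrative pure-Python model, not the skill's actual source: for clarity it evaluates the DFT directly for just the speech-band bins, where a real implementation would compute the full 1024-point FFT.

```python
import math

SAMPLE_RATE = 16000
FFT_SIZE = 1024
BAND_LOW_HZ, BAND_HIGH_HZ = 300, 3400
SMOOTHING = 0.3  # exponential smoothing factor


def band_energy(samples):
    """Average spectral magnitude in the 300-3400 Hz band.

    samples: FFT_SIZE floats in [-1, 1]. A Hamming window is applied,
    then the magnitude of each bin in the speech band is averaged.
    """
    # Hamming window reduces spectral leakage between bins
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * n / (FFT_SIZE - 1)))
                for n, s in enumerate(samples)]
    bin_hz = SAMPLE_RATE / FFT_SIZE          # 15.625 Hz per bin
    lo = int(BAND_LOW_HZ / bin_hz)           # first speech-band bin
    hi = int(BAND_HIGH_HZ / bin_hz)          # last speech-band bin
    total = 0.0
    for k in range(lo, hi + 1):
        re = sum(x * math.cos(2 * math.pi * k * n / FFT_SIZE)
                 for n, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * n / FFT_SIZE)
                  for n, x in enumerate(windowed))
        total += math.sqrt(re * re + im * im)
    return total / (hi - lo + 1)


def smooth(previous, current, alpha=SMOOTHING):
    """Exponential smoothing: new = alpha * current + (1 - alpha) * previous."""
    return alpha * current + (1 - alpha) * previous


# A 1 kHz tone (inside the band) produces substantial band energy,
# while a silent frame produces none:
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(FFT_SIZE)]
assert band_energy(tone) > 0.0
assert band_energy([0.0] * FFT_SIZE) == 0.0
```

The smoothed value from `smooth()` is what gets compared against Sensitivity, so a single noisy frame moves the detector only 30% of the way toward its energy.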

Tuning Guide

Robot is not triggering on speech

  1. Check the diagnostic log. Note the energy value while speaking.
  2. Lower Sensitivity until it is slightly below the energy value when speaking.
  3. If still not triggering, ensure the correct microphone is selected in Windows Sound settings.

Too many false triggers (background noise)

  1. Note the energy value during silence in the diagnostic log.
  2. Raise Sensitivity to just above that noise floor value.
  3. Increase Minimum Speech Duration (e.g., 200-300 ms) to require more sustained sound.

Speech End fires too quickly (mid-sentence pauses)

Increase Silence Duration (e.g., 800-1500 ms).

Speech End fires too slowly after speaking stops

Decrease Silence Duration (e.g., 300 ms).

Robot triggers on its own voice

This is handled automatically - detection pauses when TTS is active. If issues persist, verify that the TTS events are wired correctly (the plugin uses EZBManager.PrimaryEZB.SpeechSynth).


Technical Specifications

Property Value
Sample rate 16,000 Hz
Bit depth 16-bit mono
Buffer interval 50 ms
FFT size 1024 samples
Speech band 300-3400 Hz
Window function Hamming
Pre-filter High-pass at 300 Hz, Q=0.707
Energy smoothing Exponential, α = 0.3

Control Commands for the Voice Activity Detection robot skill

Control Commands are available for this robot skill, allowing it to be controlled programmatically from scripts or other robot skills. These commands let you automate actions, respond to sensor inputs, and integrate the robot skill with other systems or custom interfaces. If you're new to Control Commands, a comprehensive manual is available that explains how to use them and provides examples to get you started.

Control Command Manual

Control Commands

Other skills or scripts can control VAD behavior using ControlCommand().

Command Effect
Pause Suspends detection (same as checking the Pause checkbox)
Resume Re-enables detection (same as unchecking Pause)

Example from another skill's script:

// Pause VAD while playing audio
ControlCommand("Voice Activity Detection", "Pause");

// ... do something ...

// Re-enable VAD
ControlCommand("Voice Activity Detection", "Resume");


#1  

This one seems interesting. I wonder if my TV or radio would set it off all the time

#2  

I can see great uses for this, particularly in interactive robots.  When it hears a voice it could turn on face detection and start looking around until it sees a face.  (would be super cool if it could support multiple microphones and compare the levels so it knows what direction to start looking.  might not even need face detection for that, but I don't think Windows even deals well with having multiple microphones on at the same time, so probably beyond the scope of this project...).

Alan

Synthiam
#3  

You could combine this skill with the Kinect 369 depth skill. It returns a variable with the angle of audio.

Synthiam
#4  

I should add the reason our client asked for this skill is to have fluent conversational dialog with their robots. Not sure what speech recognition they’re using. But I do know they’re using google dialog flow for nlp

#5  

Hmm.   Too bad your results with the Kinect for navigation haven't been as promising as Realsense.  I don't think I need the ability badly enough for the expense of both (or a Kinect and a Lidar).  I'll keep it in mind though.  I think I recall seeing something on one of the robot parts sites about a sound direction finder.  If I come across it again I'll see whether it is something that is inexpensive and a skill could be written for.

Synthiam
#6  

Fixed a bug with an error when closing the skill

#7  

I would love to see this have an added feature sometime that allows it to check output audio instead of input audio. For talking robots, the ServoTalk skills will estimate how long a jaw should move based on a text string's length and content. They tend not to be very accurate and often underestimate or overestimate the time it takes for the text-to-speech app to run. I thought this skill could help circumvent this limitation by keeping an ear open for when the audio starts and ends. I also thought of adding some natural neck animation while my robot speaks. The problem is that I have no way to trigger my 'stay alive' script when the audio begins and stop it when text-to-speech ends. This skill almost got me there, then I realized it worked only with the mic input. A check box that switches what it listens to from MIC to LINE OUT would be super cool!

#8  

v7 has been updated with some additional fine-tuning for more accurate detection and completion of speech.