9. Audio

Let's plan how your robot will speak and hear!

Robots with audio capabilities provide a very interactive experience. The ARC software includes multiple skills that connect to WiFi, Bluetooth, or USB audio input and output devices. ARC has speech recognition, text-to-speech synthesis, and robot skills for playing music and sound effects!


Choose an Audio Input Device


Wired Microphone

Connects directly to a computer with a USB cable or through an existing soundcard's input port. It is only used in an embedded robot configuration.

Wireless Microphone

Connects wirelessly to a computer over an RF (radio frequency) or Bluetooth connection.




Choose an Audio Output Device Type


Wired Speaker

Connects directly to a computer with a USB or audio cable. It is only used in an embedded robot configuration. You can select whether the speaker connects through an audio cable to an existing sound card or over USB.

Wireless Speaker

Connects wirelessly to a computer over Bluetooth or WiFi.

EZB Speaker

Use an EZB that supports audio output (e.g., EZ-Robot IoTiny or EZ-B v4). With this option, the EZB on index #0 is preferred. This option is also required if you wish to use the Audio Effects robot skill.


Add Generic Speech Recognition

Once you have selected the type of microphone to use, the next step is to experiment with speech recognition. Speech recognition robot skills are available for several engines, including Google, Microsoft, and IBM Watson. However, the most popular robot skill for getting started is the generic speech recognition robot skill. It uses the speech recognition system built into Microsoft Windows, so it is easy to configure and takes little effort to get running.

Using the Microsoft Windows Speech Recognition Engine, this skill uses your computer's default audio input device and listens for known phrases. Phrases are manually configured in the Settings menu, and custom actions (via script) are assigned to your phrases.
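The phrase-to-action idea described above can be sketched in a few lines of Python. This is only an illustration of the concept, not ARC's API: inside ARC, phrases and their scripts are configured in the skill's Settings menu. The names `PHRASES`, `normalize`, `handle_phrase`, and the two example actions are hypothetical, and the recognized text is assumed to arrive as a plain string.

```python
# Hypothetical sketch of mapping known phrases to actions.
# Not ARC's API; every name here is an assumption made for illustration.
from typing import Callable, Dict, List

# Records which actions ran, standing in for real robot behavior.
log: List[str] = []


def wave() -> None:
    """Placeholder action: a real robot would move a servo here."""
    log.append("wave")


def say_hello() -> None:
    """Placeholder action: a real robot would speak here."""
    log.append("hello")


# Known phrases and the action assigned to each one.
PHRASES: Dict[str, Callable[[], None]] = {
    "robot wave": wave,
    "robot say hello": say_hello,
}


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def handle_phrase(recognized: str) -> bool:
    """Run the action for a known phrase; return False if it is unknown."""
    action = PHRASES.get(normalize(recognized))
    if action is None:
        return False
    action()
    return True
```

Calling `handle_phrase("Robot wave")` runs the `wave` action, while an unknown phrase such as "robot dance" is ignored. Restricting recognition to a fixed list of phrases is what keeps this style of speech recognition simple to set up.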



Audio Robot Skills

Now that you have selected the audio device your robot will use, there are many robot skills to consider. They range from speech recognition to audio players. You can add as many audio robot skills as your robot needs to achieve its goals.

This is an alternative to the Bing Speech Recognition module for ARC. This skill uses a paid service from Microsoft.


Allows ARC to use UWP speech voices, change default audio devices, capture sound, and route it to an EZB.


This speech recognition skill for ARC uses the Bing Speech Recognition cloud service. It is one of the most accurate speech recognition services available. Two versions of this skill: There are two...


Use the Microsoft Bing Text to Speech cloud service. Lets you add breaks and change the speaking rate, volume, and pitch.


Have a verbal conversation with your robot using your voice by navigating through menu options so your robot can perform tasks.


Have servos respond to audio frequencies detected by the PC microphone.


Use the online Google Speech Recognition Service (test beta).


This skill will record audio from your default audio input device and allow you to play it back through the EZ-B v4 speaker. The Settings menu for this skill will enable you to specify some effects...


Connect a MIDI musical instrument or your soundcard to your robot.


The MP3 Trigger is a shield that connects to the EZ-B via a serial port. The MP3 Trigger takes a mini SD card loaded with MP3s. The MP3s can be triggered from this control. This control and...


An example, with source code, of how to play audio out of the EZ-B when making a plugin in C#.


The Sound movement skill is for embedded devices on your robot that have two integrated microphones for the left and right channels. This skill will allow your robot to respond to which side the sound...


Execute a script when sound is played out of the EZB speaker.


Use your EZ-B's audio output to control servos! You can control many servos with this skill by using the Multi Servo option in the settings or adding multiple instances of the robot skill. Specify the...


Use your PC's audio input device (microphone) to control servos! You can control many servos with this skill by using the Multi Servo option in the settings or adding multiple instances of the skill....


Use your PC's audio output (speakers) to control servos! You can control many servos with this skill by using the Multi Servo option in the settings or adding multiple instances of the skill. Specify...


This soundboard will play audio files through the EZ-B v4 speaker. You may load WAV or MP3 files into the library to have multiple audio files. Use the ControlCommand() scripts to trigger specific...


This Soundboard will play MP3 or WAV files out of the selected default sound output device on your computer. Load files into the tracklist and use the Play button to trigger them. This is a great...


This Soundboard will play MP3 or WAV files out of the default sound output device on your PC. Scripts can be added to the timeline of each audio file for automation.


Using the Microsoft Windows Speech Recognition Engine, this skill uses your computer's default audio input device and listens for known phrases. Phrases are manually configured in the Settings menu,...


Execute a script when speech is created.


This text-to-speech skill will verbally speak the user-defined phrase from your PC's default audio output device or EZ-B. However, an alternate way for your robot to speak programmatically from...


Use this robot skill to adjust the speech settings and audio effects for spoken speech synthesis on EZB index #0. Main Window 1. Voice Drop-down This drop-down contains a selection of installed...


Bind a servo to spoken audio to move a mouth, simulating speaking.


AKA the worst speech recognizer :) By request of users who wish for unusable open dictionary offline speech recognition xD


Detect the presence or absence of human speech.


Watson Speech to Text is a cloud-native solution that uses deep-learning AI algorithms to apply knowledge about grammar, language structure, and audio/voice signal composition to create customizable speech recognition for optimal text transcription.


Convert written text into natural-sounding audio in a variety of languages and voices.