9. Audio

Let's plan how your robot will speak and hear!

Robots with audio capabilities provide a very interactive experience. The ARC software includes multiple skills that connect to WiFi, Bluetooth, or USB audio input and output devices. ARC has speech recognition, text-to-speech synthesis, and robot skills for playing music and sound effects!


Choose an Audio Input Device


Wired Microphone

Connects directly to a computer with a USB cable or through an existing soundcard's input port. It is only used in an embedded robot configuration.

Wireless Microphone

Connects wirelessly to a computer with an RF (radio frequency) or Bluetooth connection.




Choose an Audio Output Device Type


Wired Speaker

Connects directly to a computer with a USB or audio cable. It is only used in an embedded robot configuration. The speaker can connect either through an audio cable to an existing sound card or over USB.

Wireless Speaker

Connects wirelessly to a computer over Bluetooth or WiFi.

EZB Speaker

Use an EZB that supports audio output (e.g., EZ-Robot IoTiny or EZ-B v4). With this option, the EZB on index #0 is preferred. This option is also necessary if you wish to use the Audio Effects robot skill.


Add Generic Speech Recognition

Once you have selected the type of microphone to use, the next step is to experiment with speech recognition. Many speech recognition robot skills are available, built on services from Google, Microsoft, and IBM Watson. However, the most popular robot skill for getting started is the generic speech recognition robot skill. It uses the speech recognition system built into Microsoft Windows, so it is easy to configure and takes little effort to get running.

Using the Microsoft Windows Speech Recognition Engine, this skill uses your computer's default audio input device and listens for known phrases. Phrases are manually configured in the Settings menu, and custom actions (via script) are assigned to your phrases.
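The idea of assigning an action to each known phrase can be sketched in plain Python. This is only an illustration and not ARC's scripting API: the `PHRASE_ACTIONS` table, the `handle_phrase` function, the example phrases, and the two action functions are all hypothetical, and the recognized text is assumed to arrive as a string.

```python
# Hypothetical sketch: map known phrases to actions, in the spirit of
# assigning a script to each configured phrase. This is not ARC's API.

def wave():
    # Placeholder action; a real script would move servos.
    return "waving"

def stop():
    # Placeholder action; a real script would halt movement.
    return "stopping"

# Phrase table (the phrases are assumed examples).
PHRASE_ACTIONS = {
    "robot wave": wave,
    "robot stop": stop,
}

def handle_phrase(recognized_text):
    """Run the action for a known phrase and ignore anything else."""
    # Normalize case and whitespace so "Robot  Wave" matches "robot wave".
    key = " ".join(recognized_text.lower().split())
    action = PHRASE_ACTIONS.get(key)
    if action is None:
        return None  # unknown phrase: do nothing
    return action()
```

Because only phrases in the table trigger anything, unrecognized speech is simply ignored, which mirrors how a fixed phrase list behaves.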

  Get Speech Recognition


Audio Robot Skills

Now that you have selected the audio device your robot will use, there are many robot skills to consider. They range from speech recognition to audio players. You can add as many audio robot skills as your robot needs to achieve its goals.

This is an alternative to the Bing Speech Recognition module for ARC. This skill allows you to specify your own Azure Cognitive Service credentials using the Bing Speech Recognition cloud service. This is by far the most accurate speech recognition service that we've ever used. Main Window: 1. Connect API Button: Once you have entered the API Key in the advanced configuration, you can connect to it with this button. 2. Pause Checkbox: This checkbox pauses the detection from the audio input...


Advanced speech synthesis that uses Azure voices, which sound natural in several languages.


Allows ARC to use UWP speech voices, change default audio devices, capture sound and route to an EZB.


Speech recognition engine from Microsoft Azure using your own custom key for billing.


The Synthiam ARC Robot Skill for Azure Text to Speech is a powerful integration that enables your robot to generate human-like speech using Microsoft's Azure Text to Speech service. This skill allows you to take your robotics project to the next level by providing your robot with a natural and dynamic voice. Whether you are building a companion robot, educational tool, or any other robotic application, this skill enhances user interaction and engagement through spoken language. Applications...


English-only speech synthesis that uses a remote server to generate the audio.


This speech recognition skill for ARC uses the Bing Speech Recognition cloud service. It is one of the most accurate speech recognition services available. Two Versions of This Skill: There are two versions of this robot skill, this one and the Advanced Speech Recognition. This version of the robot skill uses a shared license key with Microsoft that enables ARC users to experiment and demo this robot skill. Because this version of the skill shares a license key, users may encounter errors if...


Have a conversation with your robot using your voice by navigating through menu options so your robot can perform tasks. This is similar to how menu trees on phone systems work. You can define the menu tree in the configuration to run scripts based on options selected by the user with their voice. The menu tree can have as many branches as necessary. The robot will speak each menu prompt and then optionally speak each option to the user. The microphone on the robot or PC is used for the users...
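A phone-style menu tree like the one described above can be modeled as nested dictionaries. The sketch below is an assumption-laden illustration in plain Python, not the skill's configuration format: the `MENU` layout, the `navigate` function, the prompts, and the script names are all hypothetical.

```python
# Hypothetical sketch of a voice menu tree. Each node either has a prompt
# with further options, or a script name to run. Not ARC's actual format.

MENU = {
    "prompt": "What would you like me to do?",
    "options": {
        "move": {
            "prompt": "Which direction?",
            "options": {
                "forward": {"script": "move_forward"},
                "reverse": {"script": "move_reverse"},
            },
        },
        "dance": {"script": "start_dance"},
    },
}

def navigate(menu, spoken_choices):
    """Walk the tree using the user's spoken choices.

    Returns (prompts_spoken, script_name). script_name is None when a
    choice is not recognized or the user stops before reaching a script.
    """
    node = menu
    prompts = []
    for choice in spoken_choices:
        if "script" in node:
            break  # already at a leaf; ignore extra words
        prompts.append(node["prompt"])  # the robot speaks this prompt
        child = node["options"].get(choice.strip().lower())
        if child is None:
            return prompts, None  # unrecognized option
        node = child
    return prompts, node.get("script")
```

For example, `navigate(MENU, ["move", "forward"])` speaks both prompts in turn and ends on the `move_forward` script, while an unknown word ends the walk without selecting a script.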


This ARC skill uses the default recording device (i.e., microphone) configured on the Windows PC to detect frequencies. The identified frequency can be assigned to a variable and used to move servos. A range can be specified to limit the servo movements between Min and Max frequencies. The algorithm used to detect the frequency is the fast Fourier transform, or FFT for short. When multiple frequencies are present, the input is heard as noise, which will produce bumpy responses in the data log. Ideally, a single...
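To make the FFT idea concrete, here is a minimal sketch in plain Python that finds the strongest frequency in a block of samples and maps it onto a servo range bounded by minimum and maximum frequencies. It is not the skill's implementation: the function names, the textbook radix-2 FFT, the linear mapping, and the default servo range of 1 to 180 are all assumptions made for illustration.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 FFT. len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC spectral bin."""
    spectrum = fft(samples)
    half = len(samples) // 2
    magnitudes = [abs(c) for c in spectrum[1:half]]  # skip the DC bin
    peak_bin = magnitudes.index(max(magnitudes)) + 1
    return peak_bin * sample_rate / len(samples)

def frequency_to_servo(freq, min_freq, max_freq, servo_min=1, servo_max=180):
    """Clamp freq to [min_freq, max_freq] and map it linearly to a position."""
    freq = max(min_freq, min(max_freq, freq))
    ratio = (freq - min_freq) / (max_freq - min_freq)
    return round(servo_min + ratio * (servo_max - servo_min))
```

A clean single tone gives one sharp peak and a steady servo position. Noisy input spreads energy over many bins, so the peak jumps around, which is one way to picture the bumpy responses mentioned above.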


This skill is a test for the Google Speech Service with ARC. Some configuration is necessary for testing this skill, as the Google Speech Service is a paid service. There is a 60-day trial, which you can sign up for. This skill requires internet access, so if an I/O controller is also being used, a second WiFi connection, wired network, or direct UART connection to the PC is needed. Main Window: 1. Load Credentials Button: This opens the location of your Google speech recognition key...


This skill will record audio from your default audio input device and allow you to play it back through the EZ-B v4 speaker. The Settings menu for this skill will enable you to specify some effects and the sample rate. You may also adjust the recording level to begin recording audio automatically. The audio will be played once the level drops below the threshold. The most recent recording in the buffer can be exported to a SoundBoard EZ-B v4. This allows you to save the current recording from...


Use this skill to send or receive MIDI messages from musical instruments or controllers. Scripts can be added to each received note in the configuration window. Use the ControlCommand() to send MIDI messages to instruments. Any computer with a soundcard will also have a MIDI device, usually called Microsoft GS Wave Table Synth, which will play instrument sounds through the soundcard output. Otherwise, if you have an external MIDI device, such as a drum machine or keyboard, that can be selected as...


The MP3 Trigger is a shield that connects to the EZ-B via a serial port. The MP3 Trigger takes a mini SD card loaded with MP3s. The MP3s can be triggered from this control. This control and hardware have been deprecated by the EZ-B v4 streaming audio feature. Use the Config button to select the digital port and baud rate of the MP3 Trigger. Note: Synthiam is not a manufacturer of this third-party hardware device, nor is EZ-Robot responsible for the operation of this third-party device....


This skill is an example, with source code, of how to play audio out of the EZ-B while making a custom robot skill. The EZ-B will output audio as a stream or byte array. View the source code of this example to see how it was done. If you are making a skill, be sure to follow the skill tutorial (https://synthiam.com/Docs/Create-Robot-Skill/Overview). If you want your robot to play audio out of an EZB that supports audio stream, have a look at the SoundBoard skill, or many others in the Audio...


The Sound movement skill is for embedded devices on your robot that have two integrated microphones for the left and right channels. This skill allows your robot to respond to the side the sound is coming from. A script can be applied to each direction of the sound to control movement. Note: Not all computers have stereo microphones or stereo microphone inputs. Verify that your computer's mic input is stereo; otherwise, this robot skill will receive mono audio. Main Window: 1. Left...
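One simple way to decide which side a sound comes from is to compare the loudness of the two channels. The sketch below does this with RMS levels in plain Python. It is an assumed approach for illustration, not the skill's actual method; the function names, the `margin` parameter, and the extra "center" result are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of one channel."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def sound_direction(left, right, margin=0.1):
    """Return "left", "right", or "center" from two channels of samples.

    margin is the fraction by which one channel must exceed the other
    before a side is chosen (an assumed tuning parameter).
    """
    left_level = rms(left)
    right_level = rms(right)
    if left_level > right_level * (1 + margin):
        return "left"
    if right_level > left_level * (1 + margin):
        return "right"
    return "center"
```

With a mono input both channels carry the same signal, so this comparison always returns "center", which illustrates why a true stereo input matters.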


Execute a script when sound is played out of the EZB speaker. This robot skill also sets three variables for the sound level that your script loop can access: Min Level, Max Level, and Average Level. Because the script is executed once when audio starts playing, your ongoing script should be contained within a loop. Once the audio has stopped playing, the script is canceled, which stops your loop. Note: To avoid infinite recursive run-away...


Use your EZ-B's audio output to control servos! You can control many servos with this skill by using the Multi Servo option in the settings or adding multiple instances of the robot skill. Specify the scalar to increase the dynamic range of the audio in relation to the servo's position. Don't worry if that sounds confusing; play with it, and see what you get. Use this skill to move the mouth of your robot. If a track is playing, you can move your robot's mouth to the audio level. Main Window: 1. Level...


Use your PC's audio input device (microphone) to control servos! You can control many servos with this skill by using the Multi Servo option in the settings or adding multiple instances of the skill. Specify the scalar to increase the dynamic range of the audio in relation to the servo's position. Don't worry if that sounds confusing; play with it, and see what you get. Use this skill to move the mouth of your robot. If you speak into the microphone, you can have your robot's mouth mimic your...


Use your PC's audio output (speakers) to control servos! You can control many servos with this skill by using the Multi Servo option in the settings or adding multiple instances of the skill. Specify the scalar to increase the dynamic range of the audio in relation to the servo's position. Don't worry if that sounds confusing; play with it, and see what you get. Use this skill to move the mouth of your robot. If a track is playing, you can move your robot's mouth to the audio level. Main Window: 1. Level...
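If the scalar sounds confusing, the following plain-Python sketch shows one plausible reading of it: the audio level is multiplied by the scalar before being mapped onto the servo's range, so quiet audio produces larger movements. This is an assumption for illustration only; the `level_to_servo` function, the 0.0 to 1.0 level range, and the clamping are hypothetical and not taken from the skills themselves.

```python
def level_to_servo(level, scalar, servo_min, servo_max):
    """Map an audio level (assumed 0.0 to 1.0) to a servo position.

    scalar amplifies the level before mapping, so a larger scalar makes
    the servo travel further for the same audio level. The result is
    clamped so it never leaves the servo_min..servo_max range.
    """
    position = servo_min + level * scalar * (servo_max - servo_min)
    return int(max(servo_min, min(servo_max, position)))
```

For a mouth servo limited to positions 90 through 130, a level of 0.25 with a scalar of 2.0 lands halfway at 110, while very loud audio is simply held at the 130 limit.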


This soundboard will play audio files through the EZ-B v4 speaker. You may load WAV or MP3 files into the library to have multiple audio files. Use the ControlCommand() scripts to trigger specific audio files. Main Window: 1. Stop Button: This button stops the audio coming from the Soundboard. 2. Clean Button: If audio files are deleted from the tracklist, they will leave a blank row. This button removes the blank rows from the tracklist. 3. Clipping Indicator: If the volume level bar is too...


This Soundboard will play MP3 or WAV files out of the selected default sound output device on your computer. Load files into the tracklist and use the Play button to trigger them. This is a great solution for adding digital sound effects to your project. Main Window: 1. Stop Button: This button stops the audio coming from the Soundboard. 2. Clean Button: If audio files are deleted from the tracklist, they will leave a blank row. This button removes the blank rows from the tracklist. 3. Track...


This Soundboard will play MP3 or WAV files out of your PC's default sound output device. Scripts can be added to the timeline of each audio file for automation. You can use the scripts to create dances and movement events that trigger at specific times along the audio file. This is similar to the soundboard (PC) but adds the ability to apply scripts to the audio file timeline. Configuration: Scripts can be added to each audio file and are triggered during the playback timeline. Right-click in the...


Execute scripts based on input from any speech to text recognition.


Using the Microsoft Windows Speech Recognition Engine, this skill uses your computer's default audio input device and listens for known phrases. Phrases are manually configured in the Settings menu, and custom actions (via script) are assigned to your phrases. Most robots make a lot of noise, so locating the audio input device on a robot is impractical. It is best to place the microphone on the controlling PC/Laptop, on yourself, or somewhere in the room (away from the robot). Turning the gain...


Execute a script when speech is created. With this skill, you can create a function that will move servos or LEDs based on spoken speech. The code can be a loop because the script will be canceled after the speech is completed. Variable: The text currently being spoken is stored in the variable $SpeechTxt. Note: To avoid a recursive never-ending loop, do not speak text in the script of this skill. If you do, the text will call this script, which will call this script, which will...


This text-to-speech skill will speak the user-defined phrase from your PC's default audio output device or EZ-B. Alternatively, your robot can speak from code using the SayEZB() or Say() commands in Blockly, JavaScript, or EZ-Script. Main Window: 1. Text Field: This field contains the text you would like spoken; it can be as long as you want. 2. Say (PC Speaker) Button: This will output the text-to-speech through the PC's audio output...


Use this robot skill to adjust the speech settings and audio effects for spoken speech synthesis on EZB index #0. Main Window: 1. Voice Drop-down: This drop-down contains a selection of installed voices. 2. Emphasis Drop-down: This drop-down contains a selection of speech emphasis levels. 3. Rate Drop-down: This drop-down contains a selection of speeds for the emphasis of speech. Do note that the Rate will have no effect unless Emphasis is configured to "Not set". 4. Volume Slider: This slider...


This skill will bind to the Text-to-speech engine and move servos to simulate jaw movement when speaking. This skill will move the specified servos to simulate a mouth movement whenever a text-to-speech script command is executed (i.e., Say, SayEZB, SayWait, SayEZBWait). If your robot has servos connected to simulate a mouth, this skill will move those servos while speaking. If the robot skill that is speaking is not using the built-in Say() commands (such as the Azure Text To Speech), you can...


AKA the worst speech recognizer - by request of users who wish for unusable open dictionary offline speech recognition. Unlike the regular speech recognition control which allows pre-defined phrases or the Bing Speech Recognition which works, this is an open dictionary, allowing any combination of words. However, the implementation of this type of speech recognition is not great... not great at all! You can get acceptable results sometimes by using a handheld microphone and very well trained...


Voice Activity Detection (VAD) detects the presence or absence of human speech. This skill uses an advanced algorithm to detect a human voice in the microphone input of the PC's default audio device. When a human voice is detected or lost, the respective script will run. Voice Detected: When a voice is detected, the graph will display in green and the voice start script will execute. Voice Not Detected: The display will be red when there is an absence of a human voice, and the voice end script will execute.
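The start and end script behavior can be illustrated with a much simpler detector than the one this skill uses. The plain-Python sketch below swaps in a basic energy threshold, which reacts to any loud sound and not specifically to human voice, purely to show how "voice start" and "voice end" events could be raised. The function names, the `threshold` value, and the callback parameters are hypothetical.

```python
import math

def frame_energy(frame):
    """RMS energy of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_voice_events(frames, threshold=0.05, on_start=None, on_end=None):
    """Return a list of ("start", index) and ("end", index) events.

    A frame counts as active when its energy reaches the threshold.
    on_start and on_end stand in for the voice start and voice end
    scripts and are called whenever the state changes.
    """
    events = []
    speaking = False
    for index, frame in enumerate(frames):
        active = frame_energy(frame) >= threshold
        if active and not speaking:
            speaking = True
            events.append(("start", index))
            if on_start is not None:
                on_start()
        elif not active and speaking:
            speaking = False
            events.append(("end", index))
            if on_end is not None:
                on_end()
    return events
```

Each callback fires once per transition and not once per frame, which matches the idea of one script running when a voice is detected and another when it is lost.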


Watson Speech to Text is a cloud-native solution that uses deep-learning AI algorithms to apply knowledge about grammar, language structure, and audio/voice signal composition to create customizable speech recognition for optimal text transcription.


With Watson Text to Speech, you can generate human-like audio from written text. Improve the customer experience and engagement by interacting with users in multiple languages and tones. Increase content accessibility for users with different abilities, provide audio options to avoid distracted driving, or automate customer service interactions to increase efficiencies. You will need an IBM Cloud account (Free Tier). Watson Text To Speech: Sign up for IBM Cloud. Log in to IBM Cloud. IBM Cloud...


Use Microsoft Windows built-in speech synthesis and recognition engine.