9. Audio
Let's plan how your robot will speak and hear!
Robots with audio capabilities provide a very interactive experience. The ARC software includes multiple skills that connect to WiFi, Bluetooth, or USB audio input and output devices. ARC has speech recognition, text-to-speech synthesis, and robot skills for playing music and sound effects!
Choose an Audio Input Device

Wired Microphone
Connects directly to a computer with a USB cable or through an existing sound card's input port. It is only used in an embedded robot configuration.
Wireless Microphone
Connects wirelessly to a computer with an RF (radio frequency) or Bluetooth connection.
Choose an Audio Output Device Type

Wired Speaker
Connects directly to a computer with a USB or audio cable. It is only used in an embedded robot configuration. You can select whether the speaker connects to an existing sound card with an audio cable or connects over USB.
Wireless Speaker
Connects wirelessly to a computer over Bluetooth or WiFi.
EZB Speaker
Use an EZB that supports audio output (e.g., EZ-Robot IoTiny or EZ-B v4). With this option, the EZB on index #0 is preferred. This option is also required if you wish to use the Audio Effects robot skill.
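If you choose the EZB Speaker option, ARC's scripting can send text-to-speech to either output using the Audio.say()/Audio.sayEZB() commands mentioned later in this section. Here is a minimal sketch using ARC's JavaScript scripting; the spoken phrases are just examples, and it assumes an audio-capable EZB is connected on index #0:

// Speak through the PC's default audio output device.
Audio.say("Hello from the computer's speaker.");

// Speak through the audio-capable EZB on index #0 (e.g., EZ-B v4 or IoTiny).
Audio.sayEZB("Hello from the robot's speaker.");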
Add Generic Speech Recognition
Once you have selected the type of microphone to use, the next step is to experiment with speech recognition. There are many speech recognition robot skills, including ones from Google, Microsoft, and IBM Watson. However, the most popular robot skill to get started with is the generic speech recognition robot skill. It uses the speech recognition system built into Microsoft Windows, so it is easy to configure and can be up and running with little effort.
This skill uses the Microsoft Windows Speech Recognition Engine and your computer's default audio input device to listen for known phrases. Phrases are configured manually in the Settings menu, and a custom action (via script) is assigned to each phrase.
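For example, a response script assigned to a phrase could speak a reply. The sketch below uses ARC's JavaScript scripting; the phrase it responds to ("what time is it") is an assumption for illustration, not a default:

// Example response script for a recognized phrase such as "what time is it".
// The phrase itself is configured in the skill's Settings menu; this script
// is what runs when that phrase is heard.
var now = new Date();                       // standard JavaScript date object
Audio.say("The time is " + now.getHours() + " " + now.getMinutes());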
Audio Robot Skills
Now that you have selected the audio devices your robot will use, there are many robot skills to consider, ranging from speech recognition to audio players. You can add as many audio robot skills as your robot needs to achieve its goals.
Advanced Azure-backed speech-to-text for ARC allowing custom Azure Cognitive Service keys, scripting hooks, and configurable output variables.
Advanced multilingual speech synthesis using Azure's natural voices for lifelike robot speech.
Enable UWP voices, change default audio devices, capture audio and route it to EZB controllers with session tracking and device control.
Azure speech recognition for ARC using your custom subscription key for speech-to-text, billed to your Azure account.
Azure TTS for ARC robots: generate natural, customizable neural voices for companion, educational, assistive, and entertainment applications.
English-only speech synthesis using a remote server to generate audio.
Accurate Bing cloud speech-to-text for ARC: wake-word, programmable control, $BingSpeech output, Windows language support, headset compatible.
Voice menu tree to navigate options and run scripts on ARC robots. Multi-level, customizable prompts, speech I/O, timeout and repeat/back.
Detect audio frequencies via PC microphone (FFT), output a variable and drive servos within configurable min/max ranges with waveform feedback.
Google Speech for ARC: cloud speech recognition with waveform, configurable response scripts and PandoraBot support.
Record audio from your PC mic, auto-trigger and edit sample rate/effects, then play or export recordings to an EZ-B v4 SoundBoard for robot playback.
MIDI I/O for ARC: send/receive notes, run per-note scripts, control PC or external instruments, with panic to stop stuck notes.
Serial MP3 Trigger for EZ-B: plays MP3s from a mini-SD card; configurable port/baud. Deprecated; replaced by EZ-B v4 streaming audio. Third-party skill.
Convert text to dynamic, real-time speech with nine expressive OpenAI voices for natural, varied, accessible robot communication.
Example ARC skill demonstrating converting, compressing and streaming MP3/WAV to an EZ-B speaker, with play/stop commands and TTS examples.
Stereo mic sound localization triggers left/right scripts to control robot movement, enabling directional responses (turn/move) to audio.
Trigger a script when EZB audio plays, exposing live Min, Max and Avg audio-level variables for looped monitoring; stops when audio ends.
Map EZ-B audio volume to servos; multi-servo mirroring, scalar range control, invert/min-max, waveform feedback to sync mouth motion.
Maps PC microphone volume to servo positions - control multiple servos (e.g., robotic mouth) with scalar, min/max and invert options.
Maps PC audio volume to servos in real time with scalar, min/max, invert, and multi-servo options; ideal for syncing a robot mouth to sound.
Play WAV/MP3 via EZ-B v4, manage tracks, add timed scripts for synced robot actions, and control playback via ControlCommand() (see the sketch after this list); includes volume and clipping indicators.
Play and manage MP3/WAV sound effects from a PC soundboard, load tracks, trigger or script playback (one file at a time), export and automate.
Play MP3/WAV via PC sound output with timeline scripts to trigger movements, auto-position actions, and optional looping for synced routines.
Run ARC scripts from any speech-to-text source for voice-controlled automation, command parsing and script triggering.
Windows Speech Recognition skill: detect custom phrases via PC mic, trigger configurable scripts/actions with adjustable confidence.
Run custom scripts when speech starts/ends to sync servos and LEDs to spoken $SpeechTxt, with loop support, stop button and logs.
Speak user-defined text via PC audio or EZ-B v4 speaker; configurable voices, effects and speed; uses Windows TTS; programmatically callable.
Configure Windows Audio.say()/Audio.sayEZB() TTS on EZB#0: voice, emphasis, rate, volume, speed/stretch and audio effects; copy control script.
Animate servos to simulate jaw/mouth with ARC text-to-speech; configurable vowel/consonant timing, start sync, multi-servo control, pause/stop.
Offline open-vocabulary speech recognition for Windows 10/11: lower-accuracy open-dictionary voice input with confidence values and scripting; designed for headset use.
Real-time voice activity detection for ARC robots: detects speech start/stop, customizable scripts, live audio visualization and tunable sensitivity.
Watson Speech-to-Text ARC plugin: cloud AI transcription with configurable models, selectable VAD (Windows/WebRTC), audio capture and visualization.
Human-like audio via IBM Watson Text-to-Speech: multi-language, selectable voices for accessibility and automated interactions. IBM Cloud required.
Microsoft Windows built-in speech synthesis and recognition for TTS and speech-to-text input in ARC robot projects.
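Many of these skills can also be driven from your own code through ControlCommand(). The JavaScript sketch below is a rough illustration only: the skill window name ("Soundboard v4") and command name ("Track_0") are placeholders, not guaranteed defaults, so check the ControlCommand() options ARC lists for the skills you have actually added to your project.

// Announce through the EZB speaker (requires an audio-capable EZB on index #0).
Audio.sayEZB("Starting the sound effect.");

// Ask a soundboard-style skill to play a track. Replace the window name and
// command with the ones ARC shows for your installed skill.
ControlCommand("Soundboard v4", "Track_0");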