Accurate Bing cloud speech-to-text for ARC: wake-word, programmable control, $BingSpeech output, Windows language support, headset compatible
How to add the Bing Speech Recognition robot skill
- Load the most recent release of ARC (Get ARC).
- Press the Project tab from the top menu bar in ARC.
- Press Add Robot Skill from the button ribbon bar in ARC.
- Choose the Audio category tab.
- Press the Bing Speech Recognition icon to add the robot skill to your project.
Don't have a robot yet?
Follow the Getting Started Guide to build a robot and use the Bing Speech Recognition robot skill.
How to use the Bing Speech Recognition robot skill
The Bing Speech Recognition robot skill for ARC lets your robot “hear” what you say by using Microsoft’s cloud-based speech-to-text service (often called Bing / Azure Speech). The skill listens to your microphone, sends the audio to Microsoft over the internet, and receives back the recognized words as text.
Once your speech becomes text, ARC can use it to:
- Run robot actions (move, turn on lights, play sounds, etc.)
- Trigger scripts (JavaScript or Python)
- Feed conversational AI (PandoraBots, ChatGPT-style systems, etc.)
Two Versions of This Skill
There are two speech-recognition skills you may see in ARC:
- Bing Speech Recognition (this skill): uses ARC Pro subscription cloud services. Limits are generally generous for normal robot use.
- Advanced Speech Recognition: a separate skill you can switch to if you need different quota/usage behavior or exceed ARC Pro query counts.
Before You Start (Beginner Checklist)
- Install the skill in your ARC project and make sure it opens without errors.
- Confirm your PC has internet access. Cloud speech recognition requires the audio to be uploaded.
- Pick a microphone (USB headset, Bluetooth mic, webcam mic, etc.).
- Test the microphone in Windows first (make sure the input meter moves when you talk).
Microphone Recommendation (Very Important)
Robots are noisy. Motors, servos, fans, and the robot’s own speaker can make speech recognition inaccurate. For best results, do not mount the microphone on the robot unless you have a very quiet robot and a high-quality mic setup.
Recommended mic locations
- On the controlling PC/laptop (desk mic or webcam mic)
- On you (headset mic or lapel mic)
- In the room, away from the robot’s speaker and motors
Gain vs. false positives
Higher microphone gain helps you be heard from farther away, but it also increases the chance that the skill “hears” noise and triggers accidentally (false positives). If you get random triggers, reduce gain, move the mic closer to your mouth, or use push-to-talk.
For best accuracy, use a headset or Bluetooth microphone instead of a built-in laptop microphone.
Main Window (What You’re Seeing)
1. Start Recording / Start Listening Button
Starts speech recognition. The skill begins waiting for speech, captures your voice, converts it to text,
and shows the result in the Response Display. If you are brand new, click this first to confirm your mic works.
2. Audio Waveform
This is your quick microphone test. When you talk, you should see movement. If the waveform is flat,
Windows is probably using the wrong microphone or the mic volume is too low.
3. Response Display (Log)
Shows what ARC recognized, plus silence detection and helpful status/diagnostic messages.
Use this log to troubleshoot recognition quality, wake word confidence, and microphone settings.
Configuration (Settings Explained)
Beginner tip: If you’re not sure what to change, start with just these: Phrase List, All Recognized Script, and Language. Leave the rest at defaults until you have basic recognition working.
Phrase List
A list of phrases you want the skill to recognize as “commands”. When your spoken words match a phrase,
ARC can treat it as handled (a known command). Keep phrases short and distinct for best accuracy.
Examples: “robot start”, “turn left”, “stop”, “look at me”.
Not Handled Script
Runs when speech is recognized but it does not match anything in the Phrase List.
This is useful when you want a fallback behavior (for example: “I didn’t understand” or send text to an AI).
This script is skipped if a phrase match occurs.
All Recognized Script
Runs every time something is recognized, whether or not it matches your Phrase List.
The recognized text is stored in a variable (default: $BingSpeech), so you can read it in your script.
Common use: send all speech to a chatbot, log everything, or do your own custom parsing.
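As one illustration of the "custom parsing" use, the sketch below pulls a direction and an optional number out of a free-form sentence. The function name parseMoveCommand and the direction vocabulary are hypothetical. Reading the text with getVar("$BingSpeech") inside the All Recognized Script is an assumption about how you would wire it up in ARC. Outside ARC, a sample sentence is used so the sketch still runs.

```javascript
// Hypothetical helper: extract a direction and an optional amount
// from recognized speech such as "Turn left 90 degrees."
// English-only: anything that is not a-z, 0-9, or a space is discarded.
function parseMoveCommand(text) {
  var cleaned = String(text).toLowerCase().replace(/[^a-z0-9 ]/g, " ");
  var words = cleaned.split(" ").filter(function (w) { return w.length > 0; });

  var directions = ["left", "right", "forward", "reverse"]; // assumed vocabulary
  var result = { direction: null, amount: null };

  for (var i = 0; i < words.length; i++) {
    if (result.direction === null && directions.indexOf(words[i]) !== -1) {
      // First direction word wins
      result.direction = words[i];
    } else if (result.amount === null && /^[0-9]+$/.test(words[i])) {
      // First whole number wins
      result.amount = parseInt(words[i], 10);
    }
  }
  return result;
}

// Assumption: getVar() is available when this runs inside ARC.
var spoken = (typeof getVar === "function")
  ? getVar("$BingSpeech")
  : "Turn left 90 degrees.";
var command = parseMoveCommand(spoken);
```

A script could then check command.direction and command.amount before moving the robot, and fall back to a chatbot or an "I didn't understand" response when direction is null.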
Start Listening Script
Runs whenever the skill begins listening. This is commonly used for feedback such as:
turning on an LED, showing a message on-screen, or playing a “listening” sound so you know it’s active.
Variable Field
The name of the ARC variable that will store the recognized text. Default is $BingSpeech.
Because it’s a normal ARC variable, you can access it from:
- JavaScript or Python
- Other skills that read variables
- Scripts using GetVar()
Auto Record Using Wake Word
Enables wake-word detection (similar to “Alexa” / “Hey Google”). When the wake word is detected,
the skill automatically starts listening for your command.
Beginner note: Wake words are convenient, but can be unreliable in noisy rooms.
Push-to-talk is usually more dependable for robots.
Wake Word Sound
Plays a sound through the PC’s default speakers when the wake word is detected, so you know the robot “heard” the wake word.
Min Wake Word Confidence (0.0 – 1.0)
How “sure” the system must be before it accepts the wake word. Default is 0.75.
- Increase this if the wake word triggers by accident.
- Decrease this if you have to repeat the wake word too often.
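The trade-off can be pictured as a simple threshold gate. The sketch below is not the skill's internal logic, which is not documented here. The function names, the use of >= for the comparison, and the confidence scores are all made up for illustration.

```javascript
// Hypothetical confidence gate: accept the wake word only when the
// detector's confidence reaches the configured minimum.
function acceptsWakeWord(confidence, minConfidence) {
  return confidence >= minConfidence;
}

// Made-up confidence scores, for illustration only
var samples = [
  { label: "clear wake word", confidence: 0.92 },
  { label: "mumbled wake word", confidence: 0.70 },
  { label: "TV in the background", confidence: 0.55 }
];

// Count how many of the sample detections would be accepted
function countAccepted(minConfidence) {
  return samples.filter(function (s) {
    return acceptsWakeWord(s.confidence, minConfidence);
  }).length;
}

var atDefault = countAccepted(0.75); // only the clear wake word passes
var atLower = countAccepted(0.50);   // all three pass, including the TV
```

With these sample numbers, lowering the threshold rescues the mumbled wake word but also lets the background noise through, which is the accidental-trigger behavior described above.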
Play Wake Word Sound with ControlCommand()
If enabled, the wake sound will also play when you start listening from a script using ControlCommand().
Useful if you want consistent audio feedback no matter how listening was started.
Stop Punctuation
Removes punctuation from recognized text. This can make command matching easier.
Example: “turn left.” becomes “turn left”.
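If you prefer to leave this option off and clean the text in your own script, a minimal sketch could look like the following. The function name stripPunctuation and the exact set of characters removed are assumptions. The skill's own rules may differ.

```javascript
// Hypothetical helper: remove common sentence punctuation and tidy spaces.
// Apostrophes are kept so words like "don't" stay intact.
function stripPunctuation(text) {
  return String(text)
    .replace(/[.,!?;:"]/g, "") // assumed punctuation set
    .replace(/\s+/g, " ")      // collapse repeated whitespace
    .trim();
}

var cleaned = stripPunctuation("turn left.");
```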
Setup Microphone
Opens Windows audio settings so you can choose the correct input device and adjust levels.
Use this if the waveform does not move when you talk.
Max Recording Length (Seconds)
Limits how long the skill will listen before it stops automatically. This helps prevent the robot from listening forever,
which can reduce accidental triggers and excessive background audio uploads.
Language Drop-down
Selects the language/locale used for recognition (example: en-US). Available options depend on what is installed
and supported on your system.
If you don’t see the language you want, check Microsoft’s language/locale list and use the correct locale value:
Microsoft Speech-to-Text language support.
Quick Start (Your First Successful Test)
- Open the skill and click Setup Microphone. In Windows, select the mic you want to use.
- Speak normally and confirm the input meter moves in Windows and the audio waveform moves in ARC.
- Select the correct Language (for example: English (United States)).
- Click Start Listening (or Start Recording) and say a short phrase like: “hello robot”.
- Look at the Response Display and confirm your words appear as text.
How to Use Recognized Speech in ARC
When the skill recognizes speech, it stores the text in the variable (default: $BingSpeech).
You can use that variable in scripts to decide what your robot should do next.
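A minimal sketch of that idea follows. The phrases and action names are hypothetical, and getVar() and print() are assumed to be the ARC JavaScript functions for reading a variable and writing to the script console. The fallbacks let the sketch run outside ARC.

```javascript
// Hypothetical mapping from recognized text to an action name.
// Lowercasing and trimming make the comparison forgiving of
// capitalization and stray spaces.
function chooseAction(text) {
  var phrase = String(text).toLowerCase().trim();
  if (phrase === "robot start") return "start";
  if (phrase === "turn left") return "left";
  if (phrase === "stop") return "stop";
  return "unknown";
}

// Assumption: getVar() exists inside ARC; use a sample value elsewhere.
var heard = (typeof getVar === "function") ? getVar("$BingSpeech") : "Turn Left";
var action = chooseAction(heard);

// Assumption: print() exists inside ARC; use console.log elsewhere.
var log = (typeof print === "function") ? print : console.log;
log("Heard: " + heard + " -> action: " + action);
```

In a real project, each branch would call a movement command or a ControlCommand() for another skill instead of returning a string.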
Wake word vs. Push-to-talk
Automatic voice activity detection can be unreliable in noisy environments (common with robots). For dependable control, use push-to-talk: you explicitly start listening, speak, then stop listening.
Push-to-Talk Example (Recommended for Beginners)
The example below uses the Script skill and a physical button connected to EZ-B port D0 (configured with a pull-up resistor). The robot only listens while the button is held down. For a more in-depth tutorial on implementing push-to-talk, see the getting-started tutorial.
while (true) {
  // Wait for the button press (pulls D0 low)
  Digital.wait(d0, false);
  // Start Bing Speech Recognition
  ControlCommand("Bing Speech Recognition", "StartListening");
  // Wait for the button release (D0 returns high)
  Digital.wait(d0, true);
  // Stop listening and begin transcription
  ControlCommand("Bing Speech Recognition", "StopListening");
}
What this script is doing
- Waits until the button is pressed
- Starts listening
- Waits until the button is released
- Stops listening, then the service converts the recorded speech into text
Tutorial: Using This Skill with Conversational AI
A step-by-step tutorial shows how to use speech recognition with a chatbot for conversational robots: AI Robot Chat tutorial.
Videos
Example usage combined with Cognitive Vision and Cognitive Emotion.
Requirements
- Internet connection is required (speech recognition runs in the cloud).
- If your robot uses a separate WiFi network (for example, connecting directly to an EZ-B), you may need dual network connections (e.g., Ethernet + WiFi, or two WiFi adapters). Learn more: Dual network connections.
Recommended Hardware
Headset or External Microphone
A headset or external microphone greatly reduces background noise and prevents the recognizer from hearing the robot’s own speaker. This improves accuracy and reduces false positives.
Resources: Configure Your Microphone in Windows
- Right-click the speaker icon in the Windows system tray
- Select Open Sound settings
- Under Input, choose the correct microphone device
- Speak and confirm the input meter responds
- Adjust microphone volume so normal speech peaks around the middle of the meter (avoid constant max/red levels)
Control Commands for the Bing Speech Recognition robot skill
Control Commands are available for this robot skill, which allow the skill to be controlled programmatically from scripts or other robot skills. These commands enable you to automate actions, respond to sensor inputs, and integrate the robot skill with other systems or custom interfaces. If you're new to the concept of Control Commands, we have a comprehensive manual available here that explains how to use them and provides examples to help you get started and make the most of this feature.
Control Command Manual
// Starts listening and returns the recognized text. No scripts are executed. (Returns String)
- controlCommand("Bing Speech Recognition", "GetText")
// Start listening and convert the speech into text. Sets the global variable and executes the script.
- controlCommand("Bing Speech Recognition", "StartListening")
// Stop the current listening process.
- controlCommand("Bing Speech Recognition", "StopListening")
// Pause listening so the wake word (if enabled) does not trigger listening.
- controlCommand("Bing Speech Recognition", "PauseListening")
// Unpause listening so the wake word (if enabled) will trigger listening.
- controlCommand("Bing Speech Recognition", "UnpauseListening")
// Return the status of the pause checkbox. (Returns Boolean [true or false])
- controlCommand("Bing Speech Recognition", "GetPause")
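One possible use of PauseListening and UnpauseListening is to keep the wake word from firing while the robot itself is talking. The sketch below is a hypothetical wrapper, not part of the skill. The command sender and the speech function are passed in as parameters so the example runs outside ARC with stand-ins. Inside ARC you would pass functions that call controlCommand() and a blocking text-to-speech call, both of which are assumptions here.

```javascript
// Hypothetical wrapper: pause wake-word listening while the robot speaks,
// so the microphone does not react to the robot's own speaker.
function sayWithoutSelfTrigger(message, sendCommand, speak) {
  sendCommand("Bing Speech Recognition", "PauseListening");
  try {
    speak(message);
  } finally {
    // Always unpause, even if speaking throws an error
    sendCommand("Bing Speech Recognition", "UnpauseListening");
  }
}

// Stand-ins for use outside ARC: record the calls instead of acting on them
var calls = [];
function fakeCommand(skillName, command) { calls.push(command); }
function fakeSpeak(message) { calls.push("say:" + message); }

sayWithoutSelfTrigger("Hello", fakeCommand, fakeSpeak);
// calls is now ["PauseListening", "say:Hello", "UnpauseListening"]
```

The try/finally is a design choice: if the speech call fails, the skill is still unpaused, so the wake word does not stay disabled by accident.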
Related Tutorials
Speech Recognition Tutorial
Vision Training: Object Recognition
Related Hack Events
Treat-O-Matic 2020 Live Hack Part #6 The Finale
Treat-O-Matic 2020 Live Hack Part #5
Robot Learn A New Object
D-0 Droid Live Hack
Related Questions
Use Voice Recognition For Unsupported Languages
Anyone Having Issues With Bing Speech Recognition?
What Is The Difference Between Pauselistening And...

I'm having an issue with VAD. When it's on I get errors; mostly it hangs on sending for a long time, then returns an error: "a task may only be disposed if it is in a completion state: ran to completion, fault or canceled."
If I turn off VAD, I do not get the issue. But then I have to press the Stop Recording button, which defeats the purpose.
Can anyone reproduce this issue? Just about to give up on this and move on to another skill.
No problems here. Just used the skill with "Auto Record Checkbox".
Will, are you using this robot skill on two or more ARC instances at the same time? That might cause the issue. This free version of the recognition uses a shared key and limits how many users can use it at once. There's also a limit that each user can only use it once at a time. If you're using a production or dev environment, I'd recommend using the Advanced Speech Recognition skill here: https://synthiam.com/Support/Skills/Audio/Advanced-Speech-Recognition?id=15894
That way, you're in charge of the key and do not need to depend on sharing keys with other users.
Hello,
When trying to add Bing Speech Recognition I'm getting this error message. Please help, thanks in advance.
Version: 2021.11.28.00
NAudio.MmException: InvalidHandle calling waveInStop
   at NAudio.Wave.WaveInEvent.StopRecording()
   at ARC.UCForms.FormBingSpeechRecognition.ConfigPressed()
   at ARC.UCForms.FormMasterBase.O5M2kVWMr4Pp0G1QpBRH(Object )
   at ARC.UCForms.FormMasterBase.xQYxVChkhQn(Object , EventArgs )
   at System.Windows.Forms.Control.OnClick(EventArgs e)
   at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
   at System.Windows.Forms.Control.WndProc(Message& m)
   at System.Windows.Forms.Label.WndProc(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
It doesn't look like you have a sound card with an input device on that computer. This robot skill can only work if there is a working sound card with a mic or line in.
Thanks DJ, I’ll get that fixed and try tomorrow. :)
Hey All, I'm pretty sure that anyone with experience with this skill should be able to answer this. Can you tell me how to activate the phrases within the Bing Speech Recognition skill? I must be missing something easy.
I have no trouble having the phrase recognized by Bing. Example: "Look up"
But the action associated with the phrase "Look up" (in the configuration) isn't executed. The characters exactly match but it seems like the phrase isn't detected.
I've even tried removing my "All recognized script" and "Script for not handled" but that didn't help.
Maybe you have a blank space at the end of "Look Up", so it's written as "Look Up "? Edit the phrase and remove the space if there is one.
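A quick way to spot this kind of problem is to compare each phrase with its trimmed form. The sketch below is a hypothetical check you could run on a copy of your phrase list. The function name and the sample phrases are made up for illustration.

```javascript
// Hypothetical check: return phrases that have leading or trailing spaces
function findUntrimmedPhrases(phrases) {
  return phrases.filter(function (p) {
    return p !== p.trim();
  });
}

var phrases = ["Look up", "Look Up ", " stop"]; // sample data
var suspicious = findUntrimmedPhrases(phrases);
// suspicious contains "Look Up " and " stop"
```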