Accurate Bing cloud speech-to-text for ARC: wake-word, programmable control, $BingSpeech output, Windows language support, headset compatible
How to add the Bing Speech Recognition robot skill
- Load the most recent release of ARC (Get ARC).
- Press the Project tab from the top menu bar in ARC.
- Press Add Robot Skill from the button ribbon bar in ARC.
- Choose the Audio category tab.
- Press the Bing Speech Recognition icon to add the robot skill to your project.
Don't have a robot yet?
Follow the Getting Started Guide to build a robot and use the Bing Speech Recognition robot skill.
How to use the Bing Speech Recognition robot skill
The Bing Speech Recognition robot skill for ARC uses Microsoft’s cloud-based Bing Speech Recognition service, which is one of the most accurate speech-to-text engines available. This skill converts spoken audio into text that can be used to control your robot, trigger scripts, or enable conversational AI.
Two Versions of This Skill
There are two versions of this robot skill: this one and Advanced Speech Recognition. This version uses the ARC Pro subscription cloud services. The daily and monthly usage limits are generous, but if you exceed the ARC Pro query count, you can switch to the Advanced Speech Recognition skill instead.
Microphone Recommendation
Robots typically generate significant background noise from motors, servos, fans, and speakers. For this reason, mounting a microphone directly on the robot is often not ideal. The recommended approach is to place the microphone on the controlling PC or laptop, on yourself, or somewhere in the room away from the robot.
Increasing microphone gain allows speech to be detected across larger rooms, but higher gain also increases the chance of false positives. Experiment with microphone placement and gain levels to find the best balance for your environment. For best results, use a headset or Bluetooth microphone rather than a built-in laptop microphone.
Main Window
1. Start Recording Button
Starts Bing Speech Recognition. The skill waits for detected speech, captures your voice, converts it to text, and displays the result in the Response Display.
2. Audio Waveform
Provides visual feedback that your microphone is correctly configured and actively receiving audio input.
3. Response Display
Displays recognized speech as text, along with silence-detection and diagnostic messages. This log helps you fine-tune wake word detection, microphone levels, and recognition confidence.
Configuration
Phrase List
A list of predefined phrases that the recognizer will attempt to match exactly.
These phrases can be customized and expanded.
Not Handled Script
Executed when detected speech does not match any phrase in the Phrase List.
This script is skipped if a phrase matches.
All Recognized Script
Executed for every detected phrase, regardless of whether it matches a phrase list entry.
The recognized text is available in the $BingSpeech variable.
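A minimal sketch of how the three script types relate, written as plain JavaScript so it runs outside ARC. It is not the skill's implementation: `routeRecognizedText` and its callback parameters are hypothetical, and two details are assumptions rather than documented behavior: phrases are compared after trimming surrounding whitespace, and the All Recognized script runs before the phrase or Not Handled script.

```javascript
// Hypothetical routing of one recognition result (not ARC's actual code).
// Assumptions: exact match after trimming whitespace; All Recognized runs first.
function routeRecognizedText(text, phraseScripts, notHandled, allRecognized) {
  const spoken = text.trim();
  const ran = [];

  // All Recognized: runs for every detected phrase
  allRecognized(spoken);
  ran.push("allRecognized");

  // Phrase List: look for an exact match
  const handler = phraseScripts.get(spoken);
  if (handler) {
    handler(spoken);
    ran.push("phrase:" + spoken);
  } else {
    // Not Handled: only runs when no phrase matched
    notHandled(spoken);
    ran.push("notHandled");
  }
  return ran; // names of the scripts that ran, in order
}
```

Keeping the phrase scripts in a `Map` makes the exact-match lookup a single `get()` call; any normalization such as case folding would have to be applied to both the phrase list and the spoken text.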
Start Listening Script
Executed whenever the skill begins listening. This is commonly used to turn on an LED, play a sound, or visually indicate that the robot is listening.
Variable Field
Specifies the variable that stores the recognized text. The default variable is $BingSpeech, which is global and accessible from JavaScript or Python using GetVar().
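Once the text is in $BingSpeech, a script can look for keywords instead of requiring an exact phrase. The sketch below is hypothetical: the keyword table, the command names, and `findCommand` are illustrative only. In an ARC script you would pass it the value of $BingSpeech read with GetVar().

```javascript
// Hypothetical keyword table: [spoken word, command name].
// "stop" is listed first so "stop moving forward" stops rather than moves.
const KEYWORDS = [
  ["stop", "stop"],
  ["forward", "moveForward"],
  ["back", "moveBackward"],
  ["left", "turnLeft"],
  ["right", "turnRight"],
];

function findCommand(recognizedText) {
  // Lowercase and split on anything that is not a letter, digit, or apostrophe
  const words = recognizedText
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter(Boolean);
  for (const [keyword, command] of KEYWORDS) {
    if (words.includes(keyword)) {
      return command;
    }
  }
  return null; // no keyword found; treat like the Not Handled case
}
```

Matching whole words avoids false hits such as "there" triggering a keyword, and ordering the table by priority resolves sentences that contain more than one keyword.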
Auto Record Using Wake Word
Enables wake-word detection similar to home assistants like Alexa or Google Home.
When the wake word is detected, the skill automatically begins listening.
Wake Word Sound
Plays a selected sound through the PC’s default audio device when the wake word is detected.
Min Wake Word Confidence
Sets the minimum confidence threshold (0.0 – 1.0) required to trigger the wake word.
The default value is 0.75.
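The threshold acts as a simple gate on the confidence score reported for the wake word. A minimal sketch, assuming a score in the 0.0 to 1.0 range and a `>=` comparison (the skill may use a strict `>` instead); `shouldTriggerWakeWord` is a hypothetical name.

```javascript
// Hypothetical wake-word gate. Assumption: scores are 0.0-1.0 and compared with >=.
const DEFAULT_MIN_CONFIDENCE = 0.75;

function shouldTriggerWakeWord(confidence, minConfidence = DEFAULT_MIN_CONFIDENCE) {
  if (minConfidence < 0 || minConfidence > 1) {
    throw new RangeError("minConfidence must be between 0.0 and 1.0");
  }
  return confidence >= minConfidence;
}
```

Raising the threshold reduces accidental triggers from background noise at the cost of occasionally missing a genuine wake word, the same trade-off as raising microphone gain.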
Play Wake Word Sound with ControlCommand()
Plays the wake sound when listening is started programmatically using ControlCommand().
Stop Punctuation
Removes punctuation from recognized speech to simplify text parsing.
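If you process $BingSpeech yourself, a similar cleanup can be approximated in a script. This is an assumption-based sketch, not the skill's own code: `stripPunctuation` is a hypothetical helper that keeps letters, digits, whitespace, and apostrophes, and removes everything else.

```javascript
// Approximation of the "Stop Punctuation" idea (not the skill's implementation).
function stripPunctuation(text) {
  return text
    .replace(/[^\p{L}\p{N}\s']/gu, "") // drop everything except letters, digits, whitespace, apostrophes
    .replace(/\s+/g, " ")              // collapse runs of whitespace into one space
    .trim();                           // remove leading/trailing spaces
}
```

Hyphens are removed too, so "Treat-O-Matic" becomes "TreatOMatic"; adjust the character class if that matters for your phrases.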
Setup Microphone
Opens the Windows audio configuration dialog to select and adjust the microphone input device.
Max Recording Length (Seconds)
Limits how long the skill listens before stopping automatically, helping prevent false positives.
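When listening is also stopped manually (for example with the StopListening control command), the recording ends at whichever comes first: the stop request or the limit. A minimal sketch of that arithmetic, assuming the limit is measured from the moment listening starts; `recordingStopTime` is a hypothetical helper and all times are in milliseconds.

```javascript
// Hypothetical helper: when does a recording actually end?
// Assumption: Max Recording Length is counted from the moment listening starts.
function recordingStopTime(startMs, stopRequestMs, maxSeconds) {
  const limitMs = startMs + maxSeconds * 1000;
  if (stopRequestMs === null) {
    return limitMs; // nothing stopped it, so the limit ends the recording
  }
  return Math.min(stopRequestMs, limitMs); // whichever happens first
}
```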
Language Drop-down
Selects the speech recognition language. Supported languages depend on the Windows Speech Recognition configuration installed on your system.
How to Use Bing Speech Recognition
A detailed tutorial demonstrates using this skill with PandoraBot AI for conversational robots. View the tutorial here.
Using Bing Speech for Conversational Input
To begin speech recognition, press the Start Listening button or trigger listening via a wake word or ControlCommand(). Recognized text is stored in the $BingSpeech variable.
Automatic speech detection using voice activity is unreliable in noisy environments. The recommended approach is a push-to-talk system, where listening is explicitly started and stopped via software or a physical button.
The example below uses the Script robot skill and a push button connected to EZ-B port D0 (configured with a pull-up resistor, so D0 reads low while the button is pressed).
while (true) {
  // Wait for the button press (pulls D0 low)
  Digital.wait(d0, false);
  // Start Bing Speech Recognition
  ControlCommand("Bing Speech Recognition", "StartListening");
  // Wait for the button release (D0 returns high)
  Digital.wait(d0, true);
  // Stop listening and begin transcription
  ControlCommand("Bing Speech Recognition", "StopListening");
}
Videos
Example usage combined with Cognitive Vision and Cognitive Emotion.
Requirements
Recommended Hardware
Headset or External Microphone
A headset or external microphone significantly reduces background noise and prevents the recognizer from hearing the robot’s own voice. This improves accuracy and reduces false positives.
Resources
Configure Audio Input Device
- Right-click the speaker icon in the Windows system tray
- Select Open Sound Settings
- Verify the correct microphone is selected and responding to audio
- Adjust microphone volume so normal speech peaks near the middle of the VU meter
Control Commands for the Bing Speech Recognition robot skill
Control Commands are available for this robot skill, which allow it to be controlled programmatically from scripts or other robot skills. These commands let you automate actions, respond to sensor inputs, and integrate the robot skill with other systems or custom interfaces. If you're new to the concept of Control Commands, a comprehensive manual is available here that explains how to use them and provides examples to help you get started and make the most of this feature.
Control Command Manual
// Starts listening and returns the recognized text. No scripts are executed. (Returns String)
- controlCommand("Bing Speech Recognition", "GetText")
// Start listening and convert the speech into text. Sets the global variable and executes the script.
- controlCommand("Bing Speech Recognition", "StartListening")
// Stop the current listening process.
- controlCommand("Bing Speech Recognition", "StopListening")
// Pause listening so the wake word (if enabled) does not trigger listening.
- controlCommand("Bing Speech Recognition", "PauseListening")
// Unpause listening so the wake word (if enabled) will trigger listening.
- controlCommand("Bing Speech Recognition", "UnpauseListening")
// Return the status of the pause checkbox. (Returns Boolean [true or false])
- controlCommand("Bing Speech Recognition", "GetPause")
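One use of PauseListening and UnpauseListening is keeping the wake word from firing while the robot itself is speaking. The sketch below is hypothetical: `whileWakeWordPaused` receives the control-command function as a parameter so it can run outside ARC (inside ARC you would pass the ControlCommand function used in the push-to-talk example), and it assumes `speak` blocks until the robot has finished talking.

```javascript
// Hypothetical wrapper: pause the wake word, run the speech call, then unpause.
// controlCommand is injected so the sketch can be tested without ARC.
function whileWakeWordPaused(controlCommand, speak) {
  controlCommand("Bing Speech Recognition", "PauseListening");
  try {
    speak(); // assumed to block until speech output is finished
  } finally {
    // Always re-enable the wake word, even if speak() throws
    controlCommand("Bing Speech Recognition", "UnpauseListening");
  }
}
```

The `finally` block guarantees the wake word is re-enabled even if the speech call fails, so an error cannot leave the skill permanently paused.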
Related Tutorials
Speech Recognition Tutorial
Vision Training: Object Recognition
Related Hack Events
Treat-O-Matic 2020 Live Hack Part #6 The Finale
Treat-O-Matic 2020 Live Hack Part #5
Robot Learn A New Object
D-0 Droid Live Hack
Related Questions
Use Voice Recognition For Unsupported Languages
Anyone Having Issues With Bing Speech Recognition?
What Is The Difference Between Pauselistening And...

One trick I learned was to speak the phrase I wanted into Bing. Then I would copy and paste exactly what Bing returned into the command box. I would get 100% recognition that way. A lot of times what Bing returned and what I thought was the correct way to write the phrases were completely different. Bing would return words starting with capital letters, weird punctuation and such. I would copy what Bing thought was correct regardless of proper sentence and word structure. And DJ is correct: make sure there are no spaces at the beginning or end.
Have you tried to execute your code within the editor? Does that work?
Thanks @DJ, that was the issue, I knew it had to be something simple!
I had cut and pasted everything into the phrase list, line by line. I just figured out how it happened. I was cutting and pasting text that was at the beginning of a line of text. When you use cut & paste like that you can inadvertently grab the spaces between words because you are removing all the text.
That's my bad, something to keep in mind for the future.
I updated ARC for the next release to strip whitespace from the phrases so it won't happen again.
Hi,
I wanted to use a different wake word with Bing, so I set it to "Simone." The problem is that when I save the changes, the wake word is not saved. What am I doing wrong?
Thomas Messerschmidt
I've forwarded this to the team, and someone will look into it for you.
Has anyone gotten VAD to work reliably? "Robot" or another wake word works fine, but there is a delay and you have to wait until after the beep. If it is a short sentence, you need to wait until it times out before it uploads; if it is a long sentence, you will be cut off. VAD works sometimes if you talk close to the mic, but it often doesn't recognize a human voice and often drops out mid-sentence.
Was really hoping VAD would start recording when it hears a human voice and stop recording and upload when there is silence for 1 second.
Hey Nink, I had the same problem with four types of microphones: VAD was very unreliable, though I have not tried any really expensive microphones yet. So I was unsure why it only works when manually pressing the record button; VAD has been very poor so far.