Bing Speech Recognition

Accurate Bing cloud speech-to-text for ARC: wake-word, programmable control, $BingSpeech output, Windows language support, headset compatible

How to add the Bing Speech Recognition robot skill

  1. Load the most recent release of ARC (Get ARC).
  2. Press the Project tab from the top menu bar in ARC.
  3. Press Add Robot Skill from the button ribbon bar in ARC.
  4. Choose the Audio category tab.
  5. Press the Bing Speech Recognition icon to add the robot skill to your project.

Don't have a robot yet?

Follow the Getting Started Guide to build a robot and use the Bing Speech Recognition robot skill.


How to use the Bing Speech Recognition robot skill

The Bing Speech Recognition robot skill for ARC uses Microsoft’s cloud-based Bing Speech Recognition service, which is one of the most accurate speech-to-text engines available. This skill converts spoken audio into text that can be used to control your robot, trigger scripts, or enable conversational AI.

Two Versions of This Skill

There are two versions of this robot skill: this one and Advanced Speech Recognition. This version uses the cloud services included with the ARC Pro subscription. The daily and monthly usage limits are generous, but if you exceed the ARC Pro query count, you can switch to the Advanced Speech Recognition skill instead.

Microphone Recommendation

Robots typically generate significant background noise from motors, servos, fans, and speakers. For this reason, mounting a microphone directly on the robot is often not ideal. The recommended approach is to place the microphone on the controlling PC or laptop, on yourself, or somewhere in the room away from the robot.

Increasing microphone gain allows speech to be detected across larger rooms, but higher gain also increases the chance of false positives. Experiment with microphone placement and gain levels to find the best balance for your environment. For best results, use a headset or Bluetooth microphone rather than a built-in laptop microphone.

Main Window

1. Start Recording Button
Starts Bing Speech Recognition. The skill waits for detected speech, captures your voice, converts it to text, and displays the result in the Response Display.

2. Audio Waveform
Provides visual feedback that your microphone is correctly configured and actively receiving audio input.

3. Response Display
Displays the recognized text, silence-detection events, and diagnostic messages. This log helps you fine-tune wake word detection, microphone levels, and recognition confidence.

Configuration

Phrase List
A list of predefined phrases that the recognizer will attempt to match exactly. These phrases can be customized and expanded.

Not Handled Script
Executed when speech is detected that does not match any phrase in the Phrase List. This script is skipped if a phrase match occurs.

All Recognized Script
Executed for every detected phrase, regardless of whether it matches a phrase list entry. The recognized text is available in the $BingSpeech variable.
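
The relationship between the Phrase List and these scripts can be pictured with a small simulation. This is only an illustrative sketch in plain JavaScript, not the skill's actual implementation: dispatchRecognized, the handler names, and the trimmed exact-string comparison are assumptions made for the example, and the order in which ARC runs the scripts is not stated above.

```javascript
// Illustrative simulation only; ARC performs this routing internally.
// dispatchRecognized and the handler names are hypothetical.
function dispatchRecognized(text, phraseScripts, handlers) {
  var ran = [];
  var matched = false;

  // Phrase List: look for an exact match (surrounding whitespace ignored).
  Object.keys(phraseScripts).forEach(function (phrase) {
    if (phrase.trim() === text.trim()) {
      matched = true;
      phraseScripts[phrase](text);
      ran.push("phrase:" + phrase.trim());
    }
  });

  // Not Handled Script: only when no phrase matched.
  if (!matched) {
    handlers.notHandled(text);
    ran.push("notHandled");
  }

  // All Recognized Script: runs for every detected phrase.
  handlers.allRecognized(text);
  ran.push("allRecognized");

  return ran;
}

// Example usage with stand-in scripts that only log.
var phrases = {
  "Robot move forward": function () { console.log("moving forward"); },
  "Robot stop": function () { console.log("stopping"); }
};
var handlers = {
  notHandled: function (t) { console.log("no phrase matched: " + t); },
  allRecognized: function (t) { console.log("heard: " + t); }
};

dispatchRecognized("Robot stop", phrases, handlers);
dispatchRecognized("What time is it", phrases, handlers);
```

In this sketch, the first call runs the matching phrase script and the all-recognized handler; the second call runs the not-handled handler and the all-recognized handler.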

Start Listening Script
Executed whenever the skill begins listening. This is commonly used to turn on an LED, play a sound, or visually indicate that the robot is listening.

Variable Field
Specifies the variable that stores the recognized text. The default variable is $BingSpeech, which is global and accessible from JavaScript or Python using GetVar().
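
As a rough sketch of how a script might consume this variable, the example below reads the recognized text and maps it to an action by keyword. The helper pickAction and its keyword table are hypothetical. The lowercase getVar() spelling is assumed to be the ARC JavaScript form of the GetVar() call mentioned above, and the fallback string exists only so the snippet can also run outside ARC.

```javascript
// Sketch of an All Recognized Script. pickAction and the keyword table
// are hypothetical names used only for this example.
function pickAction(text) {
  var heard = String(text || "").toLowerCase();
  var keywords = [
    { word: "forward", action: "Forward" },
    { word: "reverse", action: "Reverse" },
    { word: "stop", action: "Stop" }
  ];
  for (var i = 0; i < keywords.length; i++) {
    // Simple substring match; the first keyword found wins.
    if (heard.indexOf(keywords[i].word) !== -1) {
      return keywords[i].action;
    }
  }
  return null; // nothing recognized as a command
}

// Inside ARC, getVar is assumed to exist; elsewhere use a sample string.
var spoken = (typeof getVar === "function")
  ? getVar("$BingSpeech")
  : "Please go forward.";

// Use ARC's print() when available, otherwise console.log.
var log = (typeof print === "function") ? print : console.log;

var action = pickAction(spoken);
if (action !== null) {
  log("Action: " + action);
} else {
  log("No command found in: " + spoken);
}
```

Keyword matching like this is more forgiving than an exact Phrase List entry, at the cost of occasionally matching a word inside an unrelated sentence.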

Auto Record Using Wake Word
Enables wake-word detection similar to home assistants like Alexa or Google Home. When the wake word is detected, the skill automatically begins listening.

Wake Word Sound
Plays a selected sound through the PC’s default audio device when the wake word is detected.

Min Wake Word Confidence
Sets the minimum confidence threshold (0.0 – 1.0) required to trigger the wake word. The default value is 0.75.

Play Wake Word Sound with ControlCommand()
Plays the wake sound when listening is started programmatically using ControlCommand().

Stop Punctuation
Removes punctuation from recognized speech to simplify text parsing.
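
To illustrate why this simplifies parsing, here is a minimal sketch of the kind of cleanup involved. The helper normalizeTranscript is hypothetical, and the exact set of characters the skill removes is not documented here; the lowercasing step is an extra convenience for comparisons and is not something the option is stated to do.

```javascript
// Hypothetical helper: approximates what punctuation removal does to a
// transcript, plus lowercasing to make comparisons easier.
function normalizeTranscript(text) {
  return String(text)
    .replace(/[.,!?;:"()]/g, "") // drop common punctuation marks
    .replace(/\s+/g, " ")        // collapse runs of whitespace
    .trim()                      // remove leading and trailing spaces
    .toLowerCase();
}

console.log(normalizeTranscript("  Robot, move forward!  "));
// prints "robot move forward"
```

Apostrophes are deliberately kept in this sketch so that contractions such as "don't" survive intact.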

Setup Microphone
Opens the Windows audio configuration dialog to select and adjust the microphone input device.

Max Recording Length (Seconds)
Limits how long the skill listens before stopping automatically, helping prevent false positives.

Language Drop-down
Selects the speech recognition language. Supported languages depend on the Windows Speech Recognition configuration installed on your system.

How to Use Bing Speech Recognition

A detailed tutorial demonstrates using this skill with PandoraBot AI for conversational robots. View the tutorial here.

Using Bing Speech for Conversational Input

To begin speech recognition, press the Start Recording button, or trigger listening with a wake word or ControlCommand(). Recognized text is stored in the $BingSpeech variable.

Automatic speech detection based on voice activity detection (VAD) is unreliable in noisy environments. The recommended approach is a push-to-talk system, where listening is explicitly started and stopped by software or a physical button.

The example below uses the Script skill and a button connected to EZ-B port D0 (configured with a pull-up resistor, so the port reads low while the button is pressed).


while (true) {

  // Wait for the button press (pulls D0 low)
  Digital.wait(d0, false);

  // Start Bing Speech Recognition
  ControlCommand("Bing Speech Recognition", "StartListening");

  // Wait for the button release (D0 returns high)
  Digital.wait(d0, true);

  // Stop listening and begin transcription
  ControlCommand("Bing Speech Recognition", "StopListening");
}
  

Videos

Example usage combined with Cognitive Vision and Cognitive Emotion.

Requirements

Internet connection required. A second WiFi adapter or Ethernet connection may be needed. Learn more about dual network connections here.

Recommended Hardware

Headset or External Microphone

A headset or external microphone significantly reduces background noise and prevents the recognizer from hearing the robot’s own voice. This improves accuracy and reduces false positives.

Resources

Configure Audio Input Device

Microphone Settings
  1. Right-click the speaker icon in the Windows system tray
  2. Select Open Sound Settings
  3. Verify the correct microphone is selected and responding to audio
  4. Adjust microphone volume so normal speech peaks near the middle of the VU meter

Control Commands for the Bing Speech Recognition robot skill

This robot skill provides Control Commands that allow it to be controlled programmatically from scripts or other robot skills. These commands let you automate actions, respond to sensor inputs, and integrate the robot skill with other systems or custom interfaces. If you're new to Control Commands, a comprehensive manual is available here that explains how to use them and provides examples to get you started.

Control Command Manual

// Starts listening and returns the transcribed text. No scripts are executed. (Returns String)

  • controlCommand("Bing Speech Recognition", "GetText")

// Start listening and convert the speech into text. Sets the global variable and executes the script.

  • controlCommand("Bing Speech Recognition", "StartListening")

// Stop the current listening process.

  • controlCommand("Bing Speech Recognition", "StopListening")

// Pause listening so the wake word (if enabled) does not trigger listening.

  • controlCommand("Bing Speech Recognition", "PauseListening")

// Unpause listening so the wake word (if enabled) will trigger listening.

  • controlCommand("Bing Speech Recognition", "UnpauseListening")

// Return the status of the pause checkbox. (Returns Boolean [true or false])

  • controlCommand("Bing Speech Recognition", "GetPause")
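
One possible use of the pause commands is to keep the wake word from triggering on the robot's own voice while it talks. The sketch below is a hedged example, not part of the skill: speakWithoutSelfTrigger and the arc object are hypothetical, introduced so the logic can be tried with stand-in functions outside ARC.

```javascript
// Hypothetical helper. `arc` bundles the functions it needs so the same
// code can run inside ARC (real functions) or in a test (stand-ins).
function speakWithoutSelfTrigger(text, arc) {
  var skill = "Bing Speech Recognition";
  arc.controlCommand(skill, "PauseListening");
  try {
    arc.speak(text); // expected to block until speech has finished
  } finally {
    // Always re-enable the wake word, even if speaking failed.
    arc.controlCommand(skill, "UnpauseListening");
  }
}

// Stand-in environment that records calls, for trying the logic outside ARC.
var calls = [];
var fakeArc = {
  controlCommand: function (skill, command) { calls.push(command); },
  speak: function (text) { calls.push("speak:" + text); }
};

speakWithoutSelfTrigger("Hello", fakeArc);
console.log(calls.join(" -> "));
// prints "PauseListening -> speak:Hello -> UnpauseListening"
```

Inside ARC, the arc object would wrap the real functions instead, for example controlCommand itself and a blocking speech call such as Audio.sayWait(); the availability of that particular speech function is an assumption here and should be checked against the ARC JavaScript reference.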

#25   — Edited

One trick I learned was to speak the phrase I wanted into Bing, then copy and paste exactly what Bing returned into the command box. I would get 100% recognition that way. A lot of the time, what Bing returned and what I thought was the correct way to write the phrase were completely different. Bing would return words starting with capital letters, odd punctuation, and so on. I would copy what Bing thought was correct regardless of proper sentence and word structure. And DJ is correct: make sure there are no spaces at the beginning or end.

#26  

Have you tried to execute your code within the editor? Does that work?

#27  

Thanks @DJ, that was the issue, I knew it had to be something simple!

I had cut and pasted everything into the phrase list, line by line. I just figured out how it happened. I was cutting and pasting text that was at the beginning of a line of text. When you use cut & paste like that you can inadvertently grab the spaces between words because you are removing all the text.

That's my bad, something to keep in mind for the future.

#28  

I updated ARC for the next release to strip whitespace from the phrases, so it won't happen again.

#29  

Hi,

I wanted to use a different wake word with Bing, so I set it to "Simone." The problem is that when I save the changes, the wake word is not saved. What am I doing wrong?

Thomas Messerschmidt

#30  

I've forwarded this to the team, and someone will look into it for you.

#31  

Has anyone gotten VAD to work reliably? Using "Robot" or another wake word works fine, but there is a delay and you have to wait until after the beep. With a short sentence you have to wait for the timeout before it uploads; with a long sentence you get cut off. VAD works sometimes if you talk close to the mic, but it often fails to recognize a human voice and often drops out mid-sentence.

Was really hoping VAD would start recording when it hears a human voice and stop recording and upload when there is silence for 1 second.

#32  

Hey Nink, I had the same problem with 4 types of microphones. VAD was very unreliable, though I have not tried any really expensive microphones yet. I'm not sure why it only works when I manually press the record button; VAD has been terrible so far.