Bing Speech Recognition

How to add the Bing Speech Recognition robot skill

  1. Load the most recent release of ARC (Get ARC).
  2. Press the Project tab from the top menu bar in ARC.
  3. Press Add Robot Skill from the button ribbon bar in ARC.
  4. Choose the Audio category tab.
  5. Press the Bing Speech Recognition icon to add the robot skill to your project.

Don't have a robot yet?

Follow the Getting Started Guide to build a robot and use the Bing Speech Recognition robot skill.


How to use the Bing Speech Recognition robot skill

This speech recognition skill for ARC uses the Bing Speech Recognition cloud service. It is one of the most accurate speech recognition services available.

Two Versions of This Skill

There are two versions of this robot skill: this one and Advanced Speech Recognition. This version uses the ARC Pro subscription cloud services. The daily and monthly usage limits are high, but if you exceed the ARC Pro query count, you can use Advanced Speech Recognition instead.

Microphone Recommendation

Most robots make a lot of noise, so mounting the audio input device on the robot is not practical. It is best to place the microphone on the controlling PC/laptop, on yourself, or somewhere in the room (away from the robot). Turning up the gain on the input device allows voices to be recognized across large rooms, but it also increases false positives. Experiment with different gains, microphone locations, and volumes to find the best setup for your environment. Ideally, use a headset or Bluetooth mic rather than your laptop microphone.



Main Window




1. Start Recording Button
This button starts the Bing Speech Recognition. The skill waits in silence until you speak, then detects the words you are saying and displays them in the Response Display.

2. Audio Waveform
This gives visual feedback that your audio input device (microphone) is configured correctly and is picking up voice/sounds.

3. Response Display
This area shows speech recognition feedback: the text version of your detected words, or silence. It also displays information to help dial in the wake word, including suggestions about wake word detection and whether the speech is too quiet or the confidence is too low. View the log for help getting the wake word working.

Configuration



Phrase List
 A list of phrases that the skill responds to. The default phrases can be customized, and new phrases can be added.

Not Handled Script
 This script will execute for every detected phrase not in the phrase list. This script will not be called if there is a match from the phrase list.

All Recognized Script
 This script will execute for all detected phrases, including phrases that have no match in the phrase list. Reference the variable (default $BingSpeech) to get the detected phrase as text.

Start Listening Script
 This script executes every time the robot skill begins listening to convert speech to text. You can use it to turn on an LED to indicate that the robot is listening, perform an action, etc.

Variable Field
 This variable holds the text from the speech recognizer. You can use it in your script to determine what was spoken. If the variable is empty, no speech was recognized (i.e., silence). By default, this variable is $BingSpeech, a global variable that can be retrieved in Python or JavaScript with the GetVar() command.
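As a rough sketch of how a script might act on this variable, the JavaScript below maps the recognized text to a simple command name. The helper name pickCommand, the keywords, and the command names are hypothetical and only for illustration. The final lines assume the lowercase getVar() and print() functions of the ARC JavaScript engine and the default $BingSpeech variable name.

```javascript
// Hypothetical helper: map recognized text to a command name.
// The keywords and command names are illustrative, not part of the robot skill.
function pickCommand(spokenText) {
  var text = (spokenText || "").toLowerCase().trim();

  // An empty variable means no speech was recognized (silence).
  if (text === "") return "silence";

  if (text.indexOf("forward") !== -1) return "forward";
  if (text.indexOf("stop") !== -1) return "stop";

  return "unknown";
}

// Inside ARC only: read the variable set by the robot skill.
// Assumes getVar() and print() exist and the default variable name is used.
if (typeof getVar === "function") {
  print("Command: " + pickCommand(getVar("$BingSpeech")));
}
```

A script like this could be placed in the All Recognized Script so that it runs for every detected phrase.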

Auto Record Using Wake Word
 Like a home assistant, such as Alexa or Google Home, the robot will begin listening when the wake word is detected. Check this box to enable the wake word, and enter the wake word you wish to use in the corresponding text field.

Wake Word Sound
 When the wake word is detected, this selected sound will play out of the PC's default sound device.

Min Wake Word Confidence
 Listening for the wake word uses the built-in speech recognition, which detects phrases based on a confidence rating. The confidence rating is a value between 0 and 1. The default minimum confidence is 0.75, so the wake word triggers whenever the recognizer hears it with more than 0.75 confidence.
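The threshold rule can be pictured with the small JavaScript sketch below. It is only an illustration of the comparison described above; the robot skill performs this check internally, and the function name wakeWordTriggers is hypothetical.

```javascript
// Illustration only: the robot skill does this comparison internally.
// confidence is the recognizer's rating (0 to 1) for the heard wake word.
function wakeWordTriggers(confidence, minConfidence) {
  if (minConfidence === undefined) {
    minConfidence = 0.75; // default value of Min Wake Word Confidence
  }
  // The wake word triggers only when confidence is above the minimum.
  return confidence > minConfidence;
}
```

Lowering the minimum makes the wake word easier to trigger but also lets more misheard phrases through, so it is worth adjusting in small steps while watching the Response Display.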

Play Wake Word Sound with ControlCommand()
 You can trigger the speech recognizer to listen to audio with a control command. If this checkbox is checked, the selected audio file will play when the control command instructs the robot skill to begin listening for speech to recognize.

Stop Punctuation
 Strips all punctuation from detected speech. This makes it easier to parse the text from the speech recognizer to look for specific words or phrases.
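To see why this helps, consider the JavaScript sketch below. It is not the skill's actual implementation, only an illustration of the idea under the assumption that punctuation such as commas and periods is removed. The function names stripPunctuation and containsPhrase are hypothetical.

```javascript
// Illustration of the idea, not the robot skill's own code.
// Removes common punctuation and collapses repeated spaces.
function stripPunctuation(text) {
  return text.replace(/[.,!?;:"()]/g, "").replace(/\s+/g, " ").trim();
}

// Case-insensitive check for a phrase inside the recognized text.
function containsPhrase(recognized, phrase) {
  var cleaned = stripPunctuation(recognized).toLowerCase();
  return cleaned.indexOf(phrase.toLowerCase()) !== -1;
}
```

With punctuation left in place, a search for "hello robot" would miss the recognized text "Hello, robot!" because of the comma. After stripping, the same search succeeds.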

Setup Microphone
 Opens the Windows Dialog to configure the audio input and microphone input.

Max Recording Length (Seconds)
 Configure how many seconds the robot skill will listen. This is useful to ensure the recognizer does not sit and listen continuously to false positives.

Language Drop-down
 ARC uses Microsoft Speech Recognition, which is included with Windows. All languages supported by Windows Speech Recognition are also supported in ARC. You can configure Windows to listen to any language. ARC will default to EN-US (English) language if installed. Otherwise, ARC will default to the first installed language. If more than one language is installed, a language may be selected with this drop-down.

*Note: Languages supported by speech recognition depend on the Microsoft Windows operating system configuration. See the Microsoft speech recognition guide at https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt for the supported languages.


How to Use Bing Speech Recognition

A detailed tutorial provides an example of using this robot skill with PandoraBot AI to converse with your robot. You can find the tutorial by clicking here.


Using Bing Speech for Conversational Input

To use Bing Speech Recognition, the "Start Listening" button must be pressed to begin capturing audio. Once active, it transcribes the spoken input into text and assigns it to the $BingSpeech variable. You can activate the "Start Listening" button manually, through the built-in "wake word" feature, or programmatically using the ControlCommand from another robot skill.
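As a minimal sketch of the programmatic route, the JavaScript below uses the GetText control command listed in the Control Commands section of this page, which starts listening and returns the text without running any scripts. The helper name listenOnce is hypothetical, and treating an empty result as "nothing heard" is an assumption. The send parameter is whatever function issues control commands; inside ARC that would be controlCommand.

```javascript
// Hypothetical helper: listen once and return the recognized text,
// or null when nothing was recognized (assumed to be an empty result).
function listenOnce(send) {
  var text = send("Bing Speech Recognition", "GetText");
  text = (text || "").trim();
  return text === "" ? null : text;
}

// Inside ARC only: pass the real controlCommand function.
// Assumes print() is available in the ARC JavaScript engine.
if (typeof controlCommand === "function") {
  var heard = listenOnce(controlCommand);
  print(heard === null ? "Nothing heard" : "Heard: " + heard);
}
```

Passing the command function in as a parameter keeps the helper easy to try outside ARC with a stand-in function.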

While some robot skills attempt to detect speech automatically using Voice Activity Detection (VAD), this method tends to be unreliable because it cannot differentiate between conversations directed at the robot and background speech. The recommended and most reliable method is to trigger listening through software, like a push-to-talk system. This involves pressing and holding a button to start listening and releasing it to stop and begin transcription. Placing a button on or near the microphone makes it intuitive to control the listening session. When using this method, configure Bing Speech Recognition's maximum recording time to the longest duration the settings allow.

An easy way to implement this setup is with the Script robot skill. The example below is written in JavaScript for the Script skill. It monitors a button connected to port D0 of the EZB, which should use a pull-up resistor. In this setup, the released button keeps D0 high, and pressing it pulls D0 low.

while (true) {

  // Wait for the button press (pulls D0 low)
  Digital.wait(d0, false);

  // Start Bing Speech Recognition
  controlCommand("Bing Speech Recognition", "StartListening");

  // Wait for the button release (D0 returns high)
  Digital.wait(d0, true);

  // Stop Bing Speech Recognition and start transcription
  controlCommand("Bing Speech Recognition", "StopListening");
}

Videos




Here's an example of the skill in action combined with the Cognitive Vision and Cognitive Emotion services.

Requirements


This service requires an internet connection, meaning a second USB WiFi adapter or an Ethernet connection may be needed. Read about having two network connections here.

Headset or External Mic


A headset or external mic will produce better results than the internal PC/laptop mic. It enables the recognition engine to "hear" your voice more clearly with less background noise. Background noise from the laptop, motors, radio, and room echo will cause the recognition software to return false positives, meaning it recognizes an incorrect phrase. An external mic will also prevent the recognition software from hearing the robot speak. In short, it is important to use a headset or external mic for a positive speech recognition experience.


Resources


Configure Audio Input Device

You might have to adjust the microphone input volume/gain. To change the mic volume, use the Microsoft Windows volume mixer, and first make sure you have selected the correct input device. Your laptop or computer may have several microphones; one may even be on a remote camera. Find the mic you'd like to use and adjust its volume. To find the ideal volume settings on your computer, follow these steps:

1) Right-click on the little speaker on your system tray

2) Select "Open Sound Settings."

3) In the "Input" section of the Sound Settings, you'll notice a little VU meter beside the active device. Make sure your active device is indeed the microphone you want to use. The VU meter should move when you make sounds.

4) Click "Device Properties" and locate the volume slider for the microphone. We usually have our volume set to 78. Try different volumes until you see your voice being picked up by the VU meter. Adjust the input level/gain so that your voice's regular volume displays near the middle of the VU meter. If the level/gain is too high, the recognition software will not work because the input audio will be distorted.

Control Commands for the Bing Speech Recognition robot skill

There are Control Commands available for this robot skill that allow it to be controlled programmatically from scripts or other robot skills. These commands enable you to automate actions, respond to sensor inputs, and integrate the robot skill with other systems or custom interfaces. If you're new to the concept of Control Commands, we have a comprehensive manual available here that explains how to use them and provides examples to help you get started and make the most of this powerful feature.

Control Command Manual

// Starts listening and returns the transcribed text. No scripts are executed. (Returns String)
  • controlCommand("Bing Speech Recognition", "GetText")

// Starts listening and converts the speech into text. Sets the global variable and executes the script.
  • controlCommand("Bing Speech Recognition", "StartListening")

// Stops the current listening process.
  • controlCommand("Bing Speech Recognition", "StopListening")

// Pauses listening so the wake word (if enabled) does not trigger listening.
  • controlCommand("Bing Speech Recognition", "PauseListening")

// Unpauses listening so the wake word (if enabled) will trigger listening.
  • controlCommand("Bing Speech Recognition", "UnpauseListening")

// Returns the status of the pause checkbox. (Returns Boolean [true or false])
  • controlCommand("Bing Speech Recognition", "GetPause")
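One possible use of the PauseListening and UnpauseListening commands is to pause wake word detection while the robot itself is talking, so the robot's own voice does not trigger listening. The JavaScript below is a sketch of that idea, not an official recipe. The helper name withWakeWordPaused is hypothetical, and Audio.sayWait() is assumed to be the ARC JavaScript function that speaks and waits until the speech finishes.

```javascript
// Hypothetical helper: pause wake word detection, run an action,
// then unpause again even if the action throws an error.
// send is the function that issues control commands (controlCommand in ARC).
function withWakeWordPaused(send, action) {
  send("Bing Speech Recognition", "PauseListening");
  try {
    return action();
  } finally {
    send("Bing Speech Recognition", "UnpauseListening");
  }
}

// Inside ARC only: speak without the wake word reacting to the robot's voice.
// Assumes Audio.sayWait() exists in the ARC JavaScript engine.
if (typeof controlCommand === "function") {
  withWakeWordPaused(controlCommand, function () {
    Audio.sayWait("Hello, I am your robot.");
  });
}
```

Note that this sketch always unpauses at the end. If the pause checkbox may already be set for another reason, checking GetPause first and restoring that state would be a safer design.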

PRO
USA
#17  

I'm having an issue with VAD. When it's on, I get errors; mostly it hangs on sending for a long time and then returns an error: a task may only be disposed if it is in a completion state: ran to completion, fault or canceled.

If I turn off VAD, I do not get the issue. But then I have to press the stop recording button, which defeats the purpose.

Can anyone repeat this issue? Just about to give up on this and move on to another skill.

PRO
Portugal
#18  

No problems here. Just used the skill with "Auto Record Checkbox".

PRO
Synthiam
#19  

Will, are you using this robot skill on 2 or more ARC instances at the same time? That might cause the issue. This free version of the recognition uses a shared key and has a limit on how many users can use it at once. There's also a limit that each user can only use it once at a time. If you're using a production or dev environment, I'd recommend using the advanced speech recognition here: https://synthiam.com/Support/Skills/Audio/Advanced-Speech-Recognition?id=15894

That way, you're in charge of the key and do not need to depend on sharing keys with other users.

#20  

Hello,

When trying to add Bing Speech Recognition I'm getting this error message.  Please help, thanks in advance.

Version: 2021.11.28.00

NAudio.MmException: InvalidHandle calling waveInStop
 at NAudio.Wave.WaveInEvent.StopRecording()
 at ARC.UCForms.FormBingSpeechRecognition.ConfigPressed()
 at ARC.UCForms.FormMasterBase.O5M2kVWMr4Pp0G1QpBRH(Object )
 at ARC.UCForms.FormMasterBase.xQYxVChkhQn(Object , EventArgs )
 at System.Windows.Forms.Control.OnClick(EventArgs e)
 at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
 at System.Windows.Forms.Control.WndProc(Message& m)
 at System.Windows.Forms.Label.WndProc(Message& m)
 at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
 at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
 at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)

PRO
Synthiam
#21   — Edited

Doesn't look like you have a sound card with an input device on that computer. This robot skill can only work if there is a working soundcard with a mic or line in.

#22  

Thanks DJ, I’ll get that fixed and try tomorrow. :)

PRO
Canada
#23  

Hey All, I'm pretty sure that anyone with experience with this skill should be able to answer this. Can you tell me how to activate the phrases within the Bing Speech Recognition skill? I must be missing something easy.

I have no trouble having the phrase recognized by Bing. Example: "Look up"

But the action associated with the phrase "Look up" (in the configuration) isn't executed. The characters exactly match but it seems like the phrase isn't detected.

I've even tried removing my "All recognized script" and "Script for not handled" but that didn't help.

PRO
Synthiam
#24   — Edited

Maybe you have a blank space at the end of "Look Up", so it's written as "Look Up ". Edit the phrase and remove the space if there is one.
