Bing Speech Recognition
How to add the Bing Speech Recognition robot skill
- Load the most recent release of ARC (Get ARC).
- Press the Project tab from the top menu bar in ARC.
- Press Add Robot Skill from the button ribbon bar in ARC.
- Choose the Audio category tab.
- Press the Bing Speech Recognition icon to add the robot skill to your project.
Don't have a robot yet?
Follow the Getting Started Guide to build a robot and use the Bing Speech Recognition robot skill.
How to use the Bing Speech Recognition robot skill
This speech recognition skill for ARC uses the Bing Speech Recognition cloud service. It is one of the most accurate speech recognition services available.
Two Versions of This Skill
There are two versions of this robot skill: this one and Advanced Speech Recognition. This version uses a license key shared with Microsoft that lets ARC users experiment with and demo the skill. Because the key is shared, users may encounter errors if more than one ARC instance uses this skill. For serious testing and development, we recommend setting up your own key with Microsoft and using Advanced Speech Recognition instead.
Most robots make a lot of noise, so mounting the audio input device on the robot is not practical. It is best to place the microphone on the controlling PC/laptop, on yourself, or somewhere in the room (away from the robot). Turning up the gain on the input device allows voices to be recognized across large rooms, but it also increases false positives. Experiment with different gain settings and microphone locations to find the best setup for your environment. Ideally, use a headset or Bluetooth mic rather than your laptop microphone.
1. Start Recording Button
This button starts Bing Speech Recognition. The skill waits through silence until you speak, then detects the words you say and displays them in the Response Display.
2. Audio Waveform
This gives visual feedback that your audio input device (microphone) is configured correctly and is picking up voice/sounds.
3. Response Display
This area shows speech recognition feedback: the text of your detected words, or an indication of silence. It also displays information to help dial in the wake word. The log shows suggestions about wake word detection and reports whether your speech is too quiet or the confidence is too low. View the log for help getting the wake word working.
1. Phrase List
This is a list of default phrases. You can customize them and add your own.
2. Command List
This is a list of default commands corresponding to the phrases in the same row, with the ability to customize and add more commands.
3. All Recognized Script
This script executes after every recognition attempt. If nothing was recognized (for example, no speech was spoken), the script still runs, but the variable is empty. Check for an empty variable in this script to detect silence.
4. Variable Field
This variable holds the text from the speech recognizer. Use it in your script to determine what was spoken. If the variable is empty, no speech was recognized (i.e., silence).
5. Auto Record Checkbox
If this checkbox is enabled, the skill begins recording automatically when it detects audio above the threshold. The threshold is adjusted in the Level Threshold setting.
6. Level Threshold Field
This adjustable threshold level is used to start auto-recording (if enabled) and to stop recording. If the recording stops before your sentence is complete, even in manual mode, lower this value.
7. Silence Count Field
This count sets how many times the audio level must be below the threshold before recording is stopped.
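The interaction between Auto Record, Level Threshold, Silence Count, and Max Recording can be pictured with a small simulation. This is only an illustrative model, not ARC's implementation: the function name `record_until_silence`, the idea of per-frame audio levels, and the choice to count consecutive quiet frames (resetting when speech resumes) are all assumptions made for the sketch.

```python
from typing import Iterable, List


def record_until_silence(levels: Iterable[float],
                         level_threshold: float,
                         silence_count: int,
                         max_frames: int) -> List[float]:
    """Return the audio levels that this simplified model would record.

    Assumed model (not taken from ARC's source):
    - recording starts at the first frame at or above the threshold,
    - each frame below the threshold increments a quiet counter,
    - a frame at or above the threshold resets the counter,
    - recording stops when the counter reaches silence_count
      or when max_frames frames have been captured.
    """
    recorded: List[float] = []
    quiet_frames = 0
    recording = False
    for level in levels:
        if not recording:
            if level < level_threshold:
                continue  # still waiting for sound loud enough to start
            recording = True
        recorded.append(level)
        if level < level_threshold:
            quiet_frames += 1
        else:
            quiet_frames = 0
        if quiet_frames >= silence_count or len(recorded) >= max_frames:
            break
    return recorded


# A quiet passage in the middle of a sentence (levels of 4) ends the
# recording early with a threshold of 5, but not with a threshold of 3.
sentence = [1, 10, 4, 4, 4, 10, 1, 1, 1]
cut_short = record_until_silence(sentence, level_threshold=5,
                                 silence_count=3, max_frames=100)
complete = record_until_silence(sentence, level_threshold=3,
                                silence_count=3, max_frames=100)
```

In this model, lowering the threshold keeps the quieter middle of the sentence from counting as silence, which mirrors the advice to lower the Level Threshold when recording stops too soon.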
8. Max Recording Field
This represents the maximum duration of the recording. *Note: Currently locked to 10 seconds.
9. Strip Punctuation Checkbox
If this checkbox is enabled, punctuation will be stripped from the detected recording when it is displayed on the Response Display. This is helpful when you're looking to detect certain words or phrases.
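To see why stripping punctuation helps with phrase detection, here is a minimal sketch. The helper names `strip_punctuation` and `matches_phrase` are hypothetical, and the case-insensitive comparison is an assumption of the sketch; ARC's own matching rules may differ.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation and surrounding whitespace."""
    return text.translate(_PUNCTUATION_TABLE).strip()


def matches_phrase(recognized: str, phrase: str) -> bool:
    """Compare recognized text to a phrase, ignoring punctuation and case."""
    return strip_punctuation(recognized).lower() == strip_punctuation(phrase).lower()


# A cloud recognizer may return sentence-style text such as this:
recognized = "Robot, move forward!"
cleaned = strip_punctuation(recognized)
```

Note that this simple approach also removes apostrophes, so "don't" becomes "dont"; a phrase list compared this way would need to be cleaned with the same function, as `matches_phrase` does.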
10. Setup Microphone Button
This button is a shortcut to the properties of your installed audio input devices. Verify that your device works by watching the soundbar for movement when you speak into that device.
11. List Management Buttons
These buttons manage the rows of phrases. Use them to move rows up and down, insert new rows, append rows to the bottom, and delete rows.
12. Language Drop-down
ARC uses the Microsoft Speech Recognition included with Windows. All languages supported by Windows Speech Recognition are also supported in ARC. You can configure Windows to listen to any language. ARC defaults to the EN-US (English) language if it is installed. Otherwise, ARC defaults to the first installed language. If more than one language is installed, you can select a language with this drop-down.
*Note 2: Languages supported by speech recognition depend on the Microsoft Windows operating system configuration. View the Microsoft speech recognition guide here to view supported languages.
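The default-language rule described above (EN-US if installed, otherwise the first installed language) can be written out as a short sketch. The function name `pick_default_language` and the use of language-tag strings are assumptions for illustration; how ARC actually enumerates installed languages is not specified here.

```python
from typing import Optional, Sequence


def pick_default_language(installed: Sequence[str],
                          preferred: str = "en-US") -> Optional[str]:
    """Return the preferred language if installed, else the first installed one.

    Returns None when no languages are installed.
    """
    for language in installed:
        if language.lower() == preferred.lower():
            return language
    return installed[0] if installed else None


with_english = pick_default_language(["fr-FR", "en-US", "de-DE"])
without_english = pick_default_language(["fr-FR", "de-DE"])
```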
13. Not Handled Script
This script executes for every detected phrase that is not in the phrase list. It is not called when the detected phrase matches an entry in the phrase list.
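The relationship between the Phrase List, the All Recognized Script, and the Not Handled Script can be summarized as a small dispatch routine. This is a hedged sketch of the behavior described above, not ARC's code: the function `dispatch`, the callback parameters, the case-insensitive lookup, the order in which the scripts run, and the choice to skip the Not Handled Script on silence are all assumptions.

```python
from typing import Callable, Dict, List


def dispatch(text: str,
             phrases: Dict[str, Callable[[], None]],
             all_recognized: Callable[[str], None],
             not_handled: Callable[[str], None]) -> str:
    """Route one recognition result to the matching scripts."""
    # The All Recognized Script runs every time, even when text is empty.
    all_recognized(text)
    if not text:
        return "silence"
    command = phrases.get(text.strip().lower())
    if command is not None:
        command()  # the command in the same row as the matched phrase
        return "handled"
    # Assumption: only non-empty, unmatched text reaches Not Handled.
    not_handled(text)
    return "not handled"


log: List[str] = []
phrases = {"robot move forward": lambda: log.append("command: forward")}
results = [
    dispatch(spoken, phrases,
             all_recognized=lambda t: log.append(f"all: {t!r}"),
             not_handled=lambda t: log.append(f"unhandled: {t!r}"))
    for spoken in ["Robot move forward", "tell me a joke", ""]
]
```

A common use of the Not Handled path is to forward free-form speech to a chatbot skill while keeping fixed phrases for direct robot commands.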
How to Use Bing Speech Recognition
A detailed tutorial provides an example of using this robot skill with PandoraBot AI to converse with your robot. You can find the tutorial by clicking here.
Control commands will send instructions from one robot skill to another. Read more about the ARC control command feature by clicking here.
ControlCommand("Bing Speech Recognition", SetPhrase, $BingSpeech)
When this skill detects a phrase, it will be assigned to the variable, and the specified script will execute. With this model, you can have this skill send the detected speech to the Chat GPT robot skill, for example, using ControlCommand().
ControlCommand("Bing Speech Recognition", PauseListening)
ControlCommand("Bing Speech Recognition", UnpauseListening)
Controls the state of the PAUSE checkbox, which pauses the listening for VAD auto recording. If VAD is disabled, the PAUSE checkbox does not exist. The pause checkbox prevents the VAD from automatically starting recording.
ControlCommand("Bing Speech Recognition", StartListening)
Triggers the Start button, which begins listening to audio through the microphone to convert from speech to text. Sending this control command is the same behavior as pressing the Start button.
ControlCommand("Bing Speech Recognition", StopListening)
Triggers the Stop button, which stops recording audio that will be converted from speech to text. The Stop button is only visible when the recording is active. Recording can be activated by pressing the Start button or auto-recording when VAD is enabled.
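One way to combine the pause commands is to pause listening while the robot talks, so the recognizer does not pick up the robot's own voice. The sketch below is only an outline of that pattern: `speak_while_paused` is a hypothetical helper, and the `control_command` and `say` callables are stand-ins that you would replace with the equivalents available in your scripting environment. Only the skill name and the PauseListening and UnpauseListening command names come from this page.

```python
from typing import Callable, List, Tuple

SKILL_NAME = "Bing Speech Recognition"


def speak_while_paused(control_command: Callable[[str, str], None],
                       say: Callable[[str], None],
                       text: str) -> None:
    """Pause VAD auto recording, speak, then resume listening."""
    control_command(SKILL_NAME, "PauseListening")
    try:
        say(text)
    finally:
        # Always resume listening, even if speaking raised an error.
        control_command(SKILL_NAME, "UnpauseListening")


# Stand-in callables that record what would have been sent.
sent: List[Tuple[str, str]] = []
speak_while_paused(
    control_command=lambda skill, command: sent.append((skill, command)),
    say=lambda text: sent.append(("say", text)),
    text="Hello, I am listening.",
)
```

Passing the two callables in keeps the helper independent of any particular scripting API, which also makes it easy to try outside of ARC.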
Here's an example of the skill in action combined with the Cognitive Vision and Cognitive Emotion services.
This service requires an internet connection, meaning a second USB WiFi adapter or an ethernet connection may be needed. Read about having two network connections here.
Headset or External Mic
A headset or external mic will produce better results than the internal PC/laptop mic. It lets the recognition engine "hear" your voice much more clearly, with less background noise. Background noise from the laptop, motors, radio, and room echo causes the recognition software to return false positives, meaning the software recognizes an incorrect phrase. An external mic also helps prevent the recognition software from hearing the robot speak. In short, use a headset or external mic for a positive speech recognition experience.
Configure Audio Input Device
You might have to adjust the microphone input volume/gain. To change the mic volume, use the Microsoft Windows volume mixer, and first make sure you have selected the correct input device. Your laptop or computer may have several mic devices; for example, one may be on a remote camera. Find the mic you'd like to use and adjust its volume. To find the ideal volume settings on your computer, follow these steps:
1) Right-click the speaker icon in your system tray.
2) Select "Open Sound Settings."
3) In the "Input" section of the Sound Settings, you'll notice a small VU meter beside the active device. Make sure the active device is the microphone you want to use. When you make sounds, the VU meter should move.
4) Click "Device Properties" and locate the volume slider for the microphone. We usually have our volume set to 78. Try different volumes until you see your voice being picked up by the VU meter. Adjust the input level/gain so that your voice's regular volume displays near the middle of the VU meter. If the level/gain is too high, the recognition software will not work because the input audio will be distorted.
You may train your computer for speech recognition by using the training wizard. Find the training wizard under Speech Recognition within the Windows Control Panel.