Bing Speech Recognition

How to add the Bing Speech Recognition robot skill

  1. Load the most recent release of ARC (Get ARC).
  2. Press the Project tab from the top menu bar in ARC.
  3. Press Add Robot Skill from the button ribbon bar in ARC.
  4. Choose the Audio category tab.
  5. Press the Bing Speech Recognition icon to add the robot skill to your project.

Don't have a robot yet?

Follow the Getting Started Guide to build a robot and use the Bing Speech Recognition robot skill.


How to use the Bing Speech Recognition robot skill

This speech recognition skill for ARC uses the Bing Speech Recognition cloud service. It is one of the most accurate speech recognition services available.

Two Versions of This Skill

There are two versions of this robot skill: this one and Advanced Speech Recognition. This version uses a license key with Microsoft that is shared among ARC users, so that they can experiment with and demo the skill. Because the key is shared, users may encounter errors if more than one ARC instance uses this skill. For serious testing and development, we recommend setting up your own key with Microsoft by using Advanced Speech Recognition instead.

Microphone Recommendation

Most robots make a lot of noise, so mounting the audio input device on the robot is not practical. It is best to place the microphone on the controlling PC/laptop, on yourself, or somewhere in the room (away from the robot). Turning up the gain on the input device allows voices to be recognized across large rooms, but it also increases false positives. Experiment with different gain settings and microphone locations to find the best setup for your environment. Ideally, use a headset or Bluetooth mic rather than your laptop microphone.



Main Window




1. Start Recording Button
This button starts the Bing Speech Recognition; it will detect silence until you speak, then detect the words you are saying and display them in the Response Display.

2. Audio Waveform
This gives visual feedback that your audio input device (microphone) is configured correctly and is picking up voice/sounds.

3. Response Display
Here, you will get speech recognition feedback. It shows the text version of your detected words, or silence. It also displays information to help dial in the wake word: the log shows suggestions about wake word detection and reports whether your speech is too quiet or the confidence is too low. View the log for assistance in getting the wake word working.

Configuration



Phrase List
 This is a list of default phrases. You can customize them and add your own.

Not Handled Script
 This script will execute for every detected phrase not in the phrase list. This script will not be called if there is a match from the phrase list.

All Recognized Script
 This script will execute for all detected phrases. If there is no match for recognition, this script will still be executed. Reference the variable (Default $BingSpeech) to get the detected phrase in text.
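 As a rough illustration of how an All Recognized script might branch on the detected text, here is a minimal JavaScript sketch. The helper name chooseAction, the example phrases, and the "AimlBot" target skill are assumptions made for illustration; only getVar(), controlCommand(), and the $BingSpeech variable come from this page. The ARC calls are guarded so the helper can also be run outside of ARC.

```javascript
// Hypothetical helper: decide what to do with the recognized text.
// The phrases and action names below are examples only.
function chooseAction(phrase) {
  var text = (phrase || "").trim().toLowerCase();
  if (text === "") return "ignore";                     // empty variable means silence
  if (text.indexOf("look up") !== -1) return "lookUp";
  if (text.indexOf("stop") !== -1) return "stop";
  return "chat";                                        // anything else goes to a chat bot
}

// Only runs inside ARC, where getVar() and controlCommand() are assumed to exist.
if (typeof getVar === "function" && typeof controlCommand === "function") {
  var heard = getVar("$BingSpeech");
  if (chooseAction(heard) === "chat") {
    // Assumption: a chat bot skill named "AimlBot" is in the project.
    controlCommand("AimlBot", "SetPhrase", heard);
  }
}
```

 The matching is deliberately loose (lowercased, trimmed, substring search) so that capitalization or trailing punctuation in the recognized text does not prevent a match.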

Start Listening Script
 This script is executed every time the robot skill begins listening to convert speech to text. You can use it to turn on an LED to indicate that the robot is listening, or to perform some other action.

Variable Field
 This variable holds the text from the speech recognizer. You can use it in your script to determine what was spoken. If the variable is empty, no speech was recognized (i.e., silence). By default, this variable is $BingSpeech, a global variable that can be retrieved in Python or JavaScript with the getVar() command.

Auto Record Using VAD
 VAD stands for Voice Activity Detection. It is an algorithm that listens to the microphone input for an audio waveform that resembles speech rather than noise. While not 100% reliable, it is generally useful with a handheld microphone. For robots using a room microphone, the wake word feature is recommended instead. Check this box to enable VAD. When using VAD, the Bing speech recognizer starts listening as soon as it detects human speech and stops listening automatically when the speech stops OR the max recording time has been reached.

Auto Record Using Wake Word
 Like a home assistant such as Alexa or Google Home, the robot will begin listening when the wake word is detected. Check this box to enable the wake word, and enter the wake word you wish to use in the corresponding text field.

Wake Word Sound
 When the wake word is detected, this selected sound will play out of the PC's default sound device.

Min Wake Word Confidence
 Listening for the wake word uses the built-in speech recognition, which detects phrases based on a confidence rating. The confidence rating is a value between 0 and 1. The default minimum confidence is 0.75, so the wake word triggers only when the recognizer's confidence is higher than 0.75.
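 The effect of the minimum confidence setting can be sketched as a simple comparison. This is an illustration only, not the skill's actual code: the function name and the sample confidence values are made up, and the strict "greater than" comparison is an assumption based on the description of the setting.

```javascript
// Default minimum confidence described for this setting.
var MIN_WAKE_WORD_CONFIDENCE = 0.75;

// Hypothetical check: does a detection with this confidence trigger the wake word?
function wakeWordTriggers(confidence, minConfidence) {
  return confidence > minConfidence;
}

// Made-up confidence values for three detections of the wake word.
var samples = [0.92, 0.75, 0.40];
var results = samples.map(function (confidence) {
  return wakeWordTriggers(confidence, MIN_WAKE_WORD_CONFIDENCE);
});
// results is [true, false, false]
```

 Raising the threshold rejects more false triggers but also makes it more likely that a quietly spoken wake word is ignored.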

Play Wake Word Sound with ControlCommand()
 You can trigger the speech recognizer to listen to audio with a control command. If this checkbox is checked, the selected audio file will play when the control command instructs the robot skill to begin listening for speech to recognize.

Stop Punctuation
 Strips all punctuation from detected speech. This makes it easier to parse the text from the speech recognizer to look for specific words or phrases.
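 To show why stripped text is easier to parse, here is a minimal JavaScript sketch that approximates the idea. It is not the skill's implementation: the function names and the exact set of punctuation characters are assumptions chosen for illustration.

```javascript
// Hypothetical approximation of punctuation stripping.
// Note: apostrophes are removed too, so "don't" becomes "dont".
function stripPunctuation(text) {
  return text
    .replace(/[.,!?;:"'()]/g, "")   // drop common punctuation marks
    .replace(/\s+/g, " ")           // collapse runs of whitespace
    .trim();
}

// Case-insensitive check for a phrase inside the recognized text.
function containsPhrase(heard, phrase) {
  var cleanHeard = stripPunctuation(heard).toLowerCase();
  var cleanPhrase = stripPunctuation(phrase).toLowerCase();
  return cleanHeard.indexOf(cleanPhrase) !== -1;
}

var cleaned = stripPunctuation("Hello, robot! Look up.");
// cleaned is "Hello robot Look up"
```

 With the punctuation gone, a check such as containsPhrase("Robot, look up!", "look up") succeeds regardless of the commas or exclamation marks the recognizer adds.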

Setup Microphone
 Opens the Windows Dialog to configure the audio input and microphone input.

Max Recording Length (Seconds)
 Configure how many seconds the robot skill will listen. This is useful to ensure the recognizer does not sit and listen continuously to false positives.

Language Drop-down
 ARC uses the Microsoft Speech Recognition included with Windows. All languages supported by Windows Speech Recognition are also supported in ARC. You can configure Windows to listen to any language. ARC will default to EN-US (English) language if installed. Otherwise, ARC will default to the first installed language. If more than one language is installed, a language may be selected with this drop-down.

*Note: Languages supported by speech recognition depend on the Microsoft Windows operating system configuration. See the Microsoft speech recognition guide at https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt for the list of supported languages.

How to Use Bing Speech Recognition


A detailed tutorial provides an example of using this robot skill with PandoraBot AI to converse with your robot. You can find the tutorial by clicking here.



Control Commands

Control commands will send instructions from one robot skill to another. Read more about the ARC control command feature by clicking here.

ControlCommand("Bing Speech Recognition", SetPhrase, $BingSpeech)

When this skill detects a phrase, it will be assigned to the variable, and the specified script will execute. With this model, you can have this skill send the detected speech to the Chat GPT robot skill, for example, using ControlCommand().

ControlCommand("Bing Speech Recognition", PauseListening)
ControlCommand("Bing Speech Recognition", UnpauseListening)

Controls the state of the PAUSE checkbox, which pauses the listening for VAD auto recording. If VAD is disabled, the PAUSE checkbox does not exist. The pause checkbox prevents the VAD from automatically starting recording.

ControlCommand("Bing Speech Recognition", StartListening)

Triggers the Start button, which begins listening to audio through the microphone to convert speech to text. Sending this control command has the same effect as pressing the Start button. Note that this command is non-blocking: the calling script continues without waiting for a result. If you wish to use the result of the detection, place your handling code in the All Recognized script, which is executed for all detected speech. You don’t have to call StopListening after calling this, because listening will time out on silence or when the timeout expires.

ControlCommand("Bing Speech Recognition", StopListening)

Triggers the Stop button, which stops recording the audio that will be converted from speech to text. The Stop button is only visible while recording is active. Recording can be activated by pressing the Start button or by auto-recording when VAD is enabled. This command is non-blocking, which means the calling script does not wait for it to complete; it simply continues. Place the code that handles the detected speech in the All Recognized script. When this command is called, it stops any listening and processes what has been heard. You generally don’t need to call it, because listening started with StartListening will time out on its own and process the response.
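The non-blocking behavior of StartListening and StopListening can be pictured with a small simulation. Everything here is a stand-in made up for illustration (createFakeRecognizer, deliver, and the variable names are hypothetical, not part of ARC): the point is only that the script calling "start" continues immediately, and the recognized text shows up later in a separate handler, which plays the role of the All Recognized script.

```javascript
// Hypothetical stand-in for the robot skill; not ARC code.
function createFakeRecognizer(onAllRecognized) {
  var listening = false;
  return {
    // Returns at once, like the StartListening control command.
    startListening: function () { listening = true; },
    // Simulates the recognition result arriving some time later.
    deliver: function (text) {
      if (!listening) return false;
      listening = false;
      onAllRecognized(text);   // plays the role of the All Recognized script
      return true;
    }
  };
}

var lastHeard = null;
var recognizer = createFakeRecognizer(function (text) { lastHeard = text; });

recognizer.startListening();
var heardRightAfterStart = lastHeard;   // still null: no result is available yet
recognizer.deliver("look up");
var heardAfterResult = lastHeard;       // "look up", set by the handler
```

This is why reading the speech variable on the line right after a StartListening command does not give the new phrase; the handling code belongs in the All Recognized script instead.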



Videos




Here's an example of the skill in action combined with the Cognitive Vision and Cognitive Emotion services.

Requirements


This service requires an internet connection, meaning a second USB WiFi adapter or an ethernet connection may be needed. Read about having two network connections here.

Headset or External Mic


A headset or external mic will produce better results than the internal PC/laptop mic. It enables the recognition engine to "hear" your voice much more clearly, with less background noise. Background noise from the laptop, motors, radio, and room echo causes the recognition software to return false positives, meaning the software recognizes an incorrect phrase. An external mic will also prevent the recognition software from hearing the robot speak. In short, it is important to use a headset or external mic for a positive speech recognition experience.


Resources


Configure Audio Input Device

You might have to adjust the microphone input volume/gain. To change the mic volume, use the Microsoft Windows volume mixer, and first, make sure you have selected the correct input device. Your laptop or computer may have a few different mic devices. Maybe one is on a remote camera. Find the mic you'd like to use and adjust the volume. To find the volume settings that are ideal on your computer, follow these steps:

1) Right-click on the little speaker on your system tray

2) Select "Open Sound Settings."

3) In the "Input" section of the Sound Settings, you'll notice a little VU meter beside the active device. Make sure your active device is indeed the microphone you want to use. By making sounds, the VU meter should move.

4) Click on "Device Properties" and locate the volume slider for the microphone. We usually have our volume set to 78. Try different volumes until you see your voice being picked up by the VU meter. Adjust the input level/gain so that your voice at its regular volume sits near the middle of the VU display. If the level/gain is too high, the recognition software will not work because the input audio will be distorted.

PRO
USA
#2   — Edited

Good morning, in the PandoraBot control, I put - Audio.say(getVar("$BingSpeech"));

I have it speak out of the PC

User-inserted image

in the Bing speech I put ControlCommand("PandoraBot", SetPhrase, $BingSpeech)

I use AIMLbot so I can write my own responses

for AIML bot it is about the same:

Bing speech - ControlCommand("AimlBot", SetPhrase, $BingSpeech)

AimlBot - Audio.say(getVar("$BotResponse"));

thanks  EzAng

#3  

Thank you very much for your prompt reply. It seems that I had to get an ARC update. It works now, but the update has a different "conf"

User-inserted image

screen. It looks like the original voice recognition skill.

The new question is: if I set my Bing speech to send messages to AimlBot, how can I use Bing to do other things, like send messages to Cognitive Vision or servos? It now responds both from the AimlBot and the "phrase" and "command".

PRO
Synthiam
#4  

User-inserted image

You’ll find labels and question marks next to options. This applies to all options across the entire ARC software.

#5   — Edited

Thanks DJ.  That helped.

Have you ever seen this intermittent error:

Error in response received: There was an error during asynchronous processing. Unique state object is required for multiple asynchronous simultaneous operations to be outstanding. Error in response received: The underlying connection was closed: An unexpected error occurred on a receive.

PRO
USA
#6   — Edited

What is the maximum monthly queries for this service? And what happens to signify to the user that he/she has used it all up? Asking for a friend :/

PRO
Synthiam
#7  

It's a quota divided by users, so I think it's like 100 per day or something. The advanced Bing speech recognition is what you'd want to use if you need more.

How you know is you get a message that says the quota is done.

PRO
USA
#8   — Edited

Today...I get a message saying it can't reach the server. It's been working flawlessly for weeks and now I get a long hang with a return of can't reach the server. I have double-checked that I am connected to the internet, which I am. Any other reason to get this message?

Can anyone else check to see if it's working? Maybe it's maintenance on the server?

Edit: now working...go figure!

PRO
USA
#9  

Edit 2: Now it's not again....I'm getting:

Server was unable to process request. ---> A task may only be disposed if it is in a completion state (RanToCompletion, Faulted or Canceled).

PRO
Synthiam
#10  

Maybe their server is having issues. Works fine now.

User-inserted image

PRO
USA
#11  

Ok DJ thanks for checking. I’ll try again in the AM.

#12  

DJ,

Is there any way to directly set up Bing Speech Recognition to use a wake word?

Thomas

PRO
Synthiam
#13   — Edited

In the Speech Recognition robot skill, add a phrase with a ControlCommand() that unpauses the bing speech recognition?

And then have the bing speech recognition re-pause itself after it detects a phrase?

Use the Control Command, here's a manual on how to use the ControlCommand and access available commands for each robot skill: https://synthiam.com/Support/Programming/control-command

Essentially, you can right-click in the editor or press the Cheat Sheet tab

PS, the question had been moved from Will's unrelated thread about his youtube channel to here.

#14  

@TMesserschmidt, DJ's suggestion is exactly how I set up my robot with a wake word. I used the name "Robot". Not real creative but it works. LOL.

I also set a set of lights to flash for the amount of time Bing was listening. That way I know that Bing is actively listening and for how long. That really helped.

#15   — Edited

Here's the process of how I added a wake word.

Bing is normally paused and won't listen till the script in the Speech Recognition robot skill starts it or you push its Start Recording button.

Simply add the Speech Recognition robot skill to your project. Make sure it's working and listening.

Add this EZ-Script code and modify it to work with your project.

Set(D12, on)   #Turns on the scanner lights on D12 to let me know Bing has been called to listen
ControlCommand("Bing Speech Recognition", StartListening)  #Starts Bing listening
ControlCommand("Speech Recognition", PauseMS, 2000)  #Pauses the Speech Recognition control so it won't listen for a while
Sleep(1500)  #Keeps the script alive so the scanner lights show how long Bing is listening
Set(D12, off)  #Turns off the scanner lights on D12

TIP: If you get false startups of this control, just increase the Confidence level until it hears the wake word and nothing else. Use a wake word that is not common or is not used too much around the robot.

Nothing else should be in the Speech Recognition robot skill except this wake word.

Works just like Alexa!

Have fun!!

PRO
USA
#17  

I'm having an issue with VAD. When it's on I get errors; mostly it hangs sending for a long time, then returns an error: a task may only be disposed if it is in a completion state: ran to completion, faulted or canceled.

If I turn off VAD I do not get the issue. But then I have to press the stop recording button, which defeats the purpose.

Can anyone repeat this issue? Just about to give up on this and move on to another skill.

Portugal
#18  

No problems here. Just used the skill with "Auto Record Checkbox".

PRO
Synthiam
#19  

Will, are you using this robot skill on 2 or more ARC instances at the same time? That might cause the issue. This free version of the recognition uses a shared key and has a limit on how many users can use it at once. There's also a limit that each user can only use it once at a time. If you're using a production or dev environment, I'd recommend using the advanced speech recognition here: https://synthiam.com/Support/Skills/Audio/Advanced-Speech-Recognition?id=15894

That way, you're in charge of the key and do not need to depend on sharing keys with other users.

#20  

Hello,

When trying to add Bing Speech Recognition I'm getting this error message.  Please help, thanks in advance.

Version: 2021.11.28.00

NAudio.MmException: InvalidHandle calling waveInStop at NAudio.Wave.WaveInEvent.StopRecording() at ARC.UCForms.FormBingSpeechRecognition.ConfigPressed() at ARC.UCForms.FormMasterBase.O5M2kVWMr4Pp0G1QpBRH(Object ) at ARC.UCForms.FormMasterBase.xQYxVChkhQn(Object , EventArgs ) at System.Windows.Forms.Control.OnClick(EventArgs e) at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks) at System.Windows.Forms.Control.WndProc(Message& m) at System.Windows.Forms.Label.WndProc(Message& m) at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m) at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m) at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)

PRO
Synthiam
#21   — Edited

Doesn't look like you have a sound card with an input device on that computer. This robot skill can only work if there is a working soundcard with a mic or line in.

#22  

Thanks DJ, I’ll get that fixed and try tomorrow.:)

PRO
Canada
#23  

Hey All, I'm pretty sure that anyone with experience with this skill should be able to answer this. Can you tell me how to activate the phrases within the Bing Speech Recognition skill? I must be missing something easy.

I have no trouble having the phrase recognized by Bing. Example: "Look up"

User-inserted image

But the action associated with the phrase "Look up" (in the configuration) isn't executed. The characters exactly match but it seems like the phrase isn't detected.

User-inserted image

I've even tried removing my "All recognized script" and "Script for not handled" but that didn't help.

PRO
Synthiam
#24   — Edited

Maybe you have a blank space at the end of "Look Up"? And it's written as "Look Up " Edit the phrase and remove the space if there is one.

User-inserted image

#25   — Edited

One trick I learned was to speak the phrase I wanted into Bing. Then I would copy and paste exactly what Bing returned into the command box. I would get 100% recognition that way. A lot of times what Bing returned and what I thought was the correct way to write the phrases were completely different.  Bing would return words starting with capital letters, weird punctuation and such. I would copy what bing thought was correct regardless of proper sentence and word structure. And DJ is correct. Make sure there are no spaces at the beginning or end.

#26  

Have you tried to execute your code within the editor? Does that work?

PRO
Canada
#27  

Thanks @DJ, that was the issue, I knew it had to be something simple!

I had cut and pasted everything into the phrase list, line by line. I just figured out how it happened. I was cutting and pasting text that was at the beginning of a line of text. When you use cut & paste like that you can inadvertently grab the spaces between words because you are removing all the text.

That's my bad, something to keep in mind for the future.

PRO
Synthiam
#28  

I updated ARC for the next release to strip whitespaces from the phrases so it won't happen again

#29  

Hi,

I wanted to use a different wake word with Bing, so I set it to "Simone." The problem is that when I save the changes, the wake word is not saved. What am I doing wrong?

Thomas Messerschmidt

#30  

I've forwarded this to the team, and someone will look into it for you.

PRO
Canada
#31  

Has anyone gotten VAD to work reliably? "Robot" or another wake word works fine, but there is a delay and you have to wait until after the beep. If it is a short sentence, you need to wait until it times out to upload; if it is a long sentence, you will be cut off. VAD works sometimes if you talk close to the mic, but it often doesn't recognize a human voice and often drops out mid-sentence.

Was really hoping VAD would start recording when it hears a human voice and stop recording and upload when there is silence for 1 second.

#32  

Hey Nink, I had the same problem with 4 types of microphones. The VAD was very unreliable, but I have not tried any really expensive microphones yet. So I was unsure why it only works when manually pressing the record button; VAD has been terrible so far.

PRO
Canada
#33  

I loaded the Watson Speech to Text and it worked like a charm. It waits until you start talking, and when you stop it auto-uploads, so you can have a conversation with the robot. You get 500 minutes a month free. PTP wrote this one.

#34  

I will try it out then  Nink, will give a try on my little panda pc .

#35  

It doesn't work in this skill:

User-inserted image

PRO
Synthiam
#36   — Edited

As tested it does work. Perhaps you have the confidence recognition threshold too high?

PRO
Australia
#37  

I am a little confused as to the Configuration in the Bing Speech Recognition skill. I get a different Config screen to the one shown above. I get the Config shown below. I can't set the silence count or level threshold. The way it stands, the Bing Speech Recognition skill picks up my Wake Words all the time, even if I am just typing or saying completely different words. Not much use.

User-inserted image

#38   — Edited

@afcorson, I've also struggled with relying on this skill's built-in wake word and VAD to record or start scripts. After trying a few times I went back to my original method of starting up Bing. You can see that outlined in Post 15 of this thread. It works 100% of the time. As mentioned in that post, I can start Bing listening using the Speech Recognition skill with it listening for just one word (I use "Robot"), and Bing will manage its own recording time according to the setting in its Config page.

If you would like I can expound on my setup but I'm not able to do so right now as I'm not near my laptop.

PRO
Australia
#39   — Edited

Are the auto record using threshold and silence count options not available anymore with this skill? I want the Bing Speech Recognition to stop when I stop speaking. Otherwise it waits 9 secs (or whatever I set it to) before sending the spoken text to the chatbot. Another problem I am having is using ChatGPT. I set it up to work on my laptop, but when I copy the program to my Pepper robot (which has ARC runtime installed), the API key is not there so it doesn't work. I figured out I needed to create a new API key on the actual robot. Then it worked. I gave up on voice activation of Bing Speech Recognition and decided to use buttons to start a conversation and end it. It was easy to start it with a keyword but not possible to end it until it timed out.

PRO
Australia
#40   — Edited

Oh! and here's another annoying problem. ChatGPT sometimes returns quote marks in its responses (ASCII 34). Unless these are removed, my SAY command falls over. Any ideas on removing quote marks from text? Well, I spent 3 hours on this problem and came up with this...

var ChatGPTresponse = getVar("$OpenAIResponse");
var noQuotes = ChatGPTresponse.replace(/["']+/g, '');
setVar("$Text", noQuotes);
controlCommand("Dialogue", "ScriptStartWait");

Note: The Dialogue script says the $Text

#41  

Good questions. I'll ask one of them to Athena.

@Athena, Are the auto record using threshold and silence count options not available anymore in Bing Speech Recognition? I want the Bing Speech Recognition to stop when I stop speaking.

PRO
Canada
#42  

The Bing Speech Recognition skill in ARC does not have an auto record using threshold and silence count options. The skill will automatically stop recording after a set duration, which is currently locked to 10 seconds. This means it will record for the full duration, even if you stop speaking before the time is up.

If you want to have more control over when the recording starts and stops, you might consider using a different method to trigger the start of the recording, such as a button press or a specific command.

As for stopping the recording when you stop speaking, this is not currently a feature of the Bing Speech Recognition skill in ARC. However, your feedback is valuable and I will pass it on to the development team for consideration in future updates.


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Australia
#43  

Well I did work that out myself. Good to have it confirmed though.

PRO
Australia
#44   — Edited

More effort is required to deal with ChatGPT responses. The following code removes quotations and replaces carriage returns with spaces...

var ChatGPTresponse = getVar("$OpenAIResponse");
// Remove single and double quotation marks
var noQuotes = ChatGPTresponse.replace(/["']+/g, '');
// Replace carriage returns and line feeds with spaces
noQuotes = noQuotes.replace(/[\n\r]/g, ' ');
setVar("$Text", noQuotes);
controlCommand("Dialogue", "ScriptStartWait");

However, this is still not enough, as ChatGPT can respond with very long answers, too long for a listener to deal with. So I decided to limit the response to 120 words. To achieve this I prepended "write 120 words. " to the start of my question, as shown below:

controlCommand("OpenAI ChatGPT", "Send", "write 120 words. " + getVar("$BingSpeech"));