Advanced Speech Recognition Query

PRO

rickymahk2013

Canada

Asked Nov 6

Resolved by Athena!

Skip to comments Jump to end

I have a question about the Advanced Speech Recognition Robot Skill.

Hi Support Team,

I wanted to reach out for clarification regarding a recent change I noticed in the Advanced Speech Recognition skill within Synthiam ARC. It appears that the auto-recording feature is no longer available in the latest version. This feature previously enabled the microphone to automatically start recording upon detecting sound.

Could you confirm if this feature was intentionally removed to prevent unintended continuous usage that might lead to additional costs? In other words, is the rationale for requiring a manual "Start Recording" button press intended to ensure there are no accidental recordings, thereby avoiding unexpected charges from Microsoft's services?

Thank you for your assistance in clarifying this matter!

Related Hardware EZ-B v4

Related Controls Advanced Speech Recognition Voice Activity Detection

Jump to end

Upgrade to ARC Pro

ARC Pro is your passport to a world of endless possibilities in robot programming, waiting for you to explore.

Compare Pro Features View Subscription Plans

Athena

PRO

Canada

#1 Nov 6

Hi Ricky,

Thank you for bringing up this question regarding the Advanced Speech Recognition skill. You are correct in observing that the auto-recording feature-where the microphone would automatically start recording upon detecting sound-has been removed in the latest version.

This change was indeed intentional. The decision was made to help prevent inadvertent continuous recording that could potentially lead to unexpected charges from Microsoft's services. By requiring a manual press of the Start Recording button, the skill ensures that recording only occurs when you explicitly intend it, thereby giving you more control over your usage and helping you avoid accidental service costs.

If you have any further questions or need additional clarification on setting up or configuring the Advanced Speech Recognition skill, please feel free to ask!

For more details on this and other robot skills, you may also explore the support documentation here:
Advanced Speech Recognition Skill Overview

Does this answer your question, or is there something else you'd like to dive deeper into?

I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.

rickymahk2013

PRO

Canada

#2 Nov 6

Thank You for your clarified!

Athena

PRO

Canada

#3 Nov 6

You're very welcome, Ricky! I'm glad the explanation helped clarify things for you. If you have any more questions about the Advanced Speech Recognition skill or any other feature in ARC, feel free to ask. Happy robot building!

I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.

DJ Sures

PRO

Synthiam

#4 Nov 7

There’s two options for replacing the auto record. Add the vad robot skill for voice activation. But as you and everyone has experienced with continuous recognition is how difficult it is to make sounds that are interpreted as speech. Specially if people are talking or you’re talking to someone else or music is playing.

it’s recommended to use a push button on a microphone like a Walkie talkie

Athena

PRO

Canada

#5 Nov 7

Hi DJ_Sures,

You're absolutely right. There are two main approaches to replace the earlier auto-record feature:

Voice Activity Detection (VAD):
Adding the VAD robot skill enables the system to detect when speech starts and stops. However, as you mentioned, continuous recognition in an environment with background conversation, music, or other ambient sounds can be challenging. The sensitivity of the VAD may sometimes interpret non-speech sounds as speech, which could lead to unintentional activations.
Manual Activation (Push Button):
Using a push button, much like a walkie talkie, allows you to control when the recording starts. This manual approach reduces the risk of false triggers and ensures that only deliberate speech is processed, providing more predictable and cost-effective usage of the Microsoft Speech services.

This intentional change was made to help manage the potential for inadvertent continuous recording, which could lead to unnecessary charges. Your insights further reinforce that, in practical applications, a manual trigger or considering the environment’s noise level is critical, especially when multiple sound sources are present.

Do you have a preference between these methods for your application, or would you like further details on configuring either approach within ARC? I'm here to help with any additional questions or to walk you through the setup!

I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.

Synthiam Support

Canada

#6 Nov 7

Using Bing Speech for Conversational Input To use Bing Speech Recognition, the "Start Listening" button must be pressed to begin capturing audio. Once active, it transcribes the spoken input into text and assigns it to the $BingSpeech variable. You can activate the "Start Listening" button manually, through the built-in "wake word" feature, or programmatically using the ControlCommand from another robot skill.

While some robot skills attempt to detect speech automatically using Voice Activity Detection (VAD), this method tends to be unreliable because it cannot differentiate between conversations directed at the robot and background speech. The recommended and most reliable method is to trigger listening through software, like a push-to-talk system. This involves pressing and holding a button to start listening and releasing it to stop and begin transcription. Placing a button on or near the microphone makes it intuitive to control the listening session. When using this method, configure Bing Speech Recognition's maximum recording time to the longest duration the settings allow.

An easy way to implement this setup is by using the SCRIPT robot skill. The example script below is written in JavaScript for the Script skill. It monitors a button connected to port D0 of the EZB, which should use a pull-up resistor. In this setup, the released button keeps D0 high, and pressing it lowers the value.

while (true) {

  // Wait for the button press (pulls D0 low)
  Digital.wait(d0, false);

  // Start Bing Speech Recognition
  ControlCommand("Bing Speech Recognition", "StartListening");

  // Wait for the button release (D0 returns high)
  Digital.wait(d0, true);

  // Stop Bing Speech Recognition and start transcription
  ControlCommand("Bing Speech Recognition", "StopListening");
}

Athena

PRO

Canada

#7 Nov 7

Hi Synthiam_Support Team,

Your explanation on using Bing Speech for conversational input is spot on. To recap:

Manual Triggering:
- The Start Listening button must be pressed to activate Bing Speech Recognition.
- This can be done manually (via a push-to-talk button or the built-in wake word) or programmatically using ControlCommand from another robot skill.
Voice Activity Detection (VAD):
- While some robot skills do try to leverage VAD, it often isn’t reliable enough in distinguishing relevant commands from background chatter.
Recommended Approach - Push-to-Talk:
- A push-to-talk or manual activation system is strongly recommended to ensure that only intended speech is processed.
- You can do this by placing a button near the microphone, making it intuitive to control the recording session.
- With this method, set the maximum recording time in the Bing Speech Recognition settings to the longest allowable duration.
Script Example:
- The sample JavaScript below demonstrates a push-to-talk system using the Script robot skill. It monitors a button connected to port D0 (with a pull-up resistor so that D0 remains high when not pressed):

while (true) {
  // Wait for the button press (D0 goes low)
  Digital.wait(d0, false);

  // Start Bing Speech Recognition
  ControlCommand(Bing Speech Recognition, StartListening);

  // Wait for the button release (D0 returns high)
  Digital.wait(d0, true);

  // Stop Bing Speech Recognition and begin transcription
  ControlCommand(Bing Speech Recognition, StopListening);
}

This script continuously monitors the status of the button at port D0. When pressed (button pull low), it starts the Bing Speech Recognition; when released (button returns high), it stops recording and transcribes the captured speech.

This method helps avoid the pitfalls of continuous auto-detection-minimizing accidental triggers due to ambient noise or background conversations.

Let me know if you have any questions about configuring this setup or need further assistance. Happy robot building!

I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.

DJ Sures

PRO

Synthiam

#8 Nov 7

Oh yah, here's a tutorial for it: https://synthiam.com/Support/Get-Started/Tutorials/speech-recognition-with-push-button-microphone

rickymahk2013

Advanced Speech Recognition Query

Upgrade to ARC Pro

Products

Community

Support

About