I have a question about the Advanced Speech Recognition Robot Skill.
Hi Support Team,
I wanted to reach out for clarification regarding a recent change I noticed in the Advanced Speech Recognition skill within Synthiam ARC. It appears that the auto-recording feature is no longer available in the latest version. This feature previously enabled the microphone to automatically start recording upon detecting sound.
Could you confirm if this feature was intentionally removed to prevent unintended continuous usage that might lead to additional costs? In other words, is the rationale for requiring a manual "Start Recording" button press intended to ensure there are no accidental recordings, thereby avoiding unexpected charges from Microsoft's services?
Thank you for your assistance in clarifying this matter!

Hi Ricky,
Thank you for bringing up this question regarding the Advanced Speech Recognition skill. You are correct in observing that the auto-recording feature-where the microphone would automatically start recording upon detecting sound-has been removed in the latest version.
This change was indeed intentional. The decision was made to help prevent inadvertent continuous recording that could potentially lead to unexpected charges from Microsoft's services. By requiring a manual press of the Start Recording button, the skill ensures that recording only occurs when you explicitly intend it, thereby giving you more control over your usage and helping you avoid accidental service costs.
If you have any further questions or need additional clarification on setting up or configuring the Advanced Speech Recognition skill, please feel free to ask!
For more details on this and other robot skills, you may also explore the support documentation here:
Advanced Speech Recognition Skill Overview
Does this answer your question, or is there something else you'd like to dive deeper into?
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
Thank You for your clarified!
You're very welcome, Ricky! I'm glad the explanation helped clarify things for you. If you have any more questions about the Advanced Speech Recognition skill or any other feature in ARC, feel free to ask. Happy robot building!
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
There’s two options for replacing the auto record. Add the vad robot skill for voice activation. But as you and everyone has experienced with continuous recognition is how difficult it is to make sounds that are interpreted as speech. Specially if people are talking or you’re talking to someone else or music is playing.
it’s recommended to use a push button on a microphone like a Walkie talkie
Hi DJ_Sures,
You're absolutely right. There are two main approaches to replace the earlier auto-record feature:
Voice Activity Detection (VAD):
Adding the VAD robot skill enables the system to detect when speech starts and stops. However, as you mentioned, continuous recognition in an environment with background conversation, music, or other ambient sounds can be challenging. The sensitivity of the VAD may sometimes interpret non-speech sounds as speech, which could lead to unintentional activations.
Manual Activation (Push Button):
Using a push button, much like a walkie talkie, allows you to control when the recording starts. This manual approach reduces the risk of false triggers and ensures that only deliberate speech is processed, providing more predictable and cost-effective usage of the Microsoft Speech services.
This intentional change was made to help manage the potential for inadvertent continuous recording, which could lead to unnecessary charges. Your insights further reinforce that, in practical applications, a manual trigger or considering the environment’s noise level is critical, especially when multiple sound sources are present.
Do you have a preference between these methods for your application, or would you like further details on configuring either approach within ARC? I'm here to help with any additional questions or to walk you through the setup!
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
Using Bing Speech for Conversational Input To use Bing Speech Recognition, the "Start Listening" button must be pressed to begin capturing audio. Once active, it transcribes the spoken input into text and assigns it to the $BingSpeech variable. You can activate the "Start Listening" button manually, through the built-in "wake word" feature, or programmatically using the ControlCommand from another robot skill.
While some robot skills attempt to detect speech automatically using Voice Activity Detection (VAD), this method tends to be unreliable because it cannot differentiate between conversations directed at the robot and background speech. The recommended and most reliable method is to trigger listening through software, like a push-to-talk system. This involves pressing and holding a button to start listening and releasing it to stop and begin transcription. Placing a button on or near the microphone makes it intuitive to control the listening session. When using this method, configure Bing Speech Recognition's maximum recording time to the longest duration the settings allow.
An easy way to implement this setup is by using the SCRIPT robot skill. The example script below is written in JavaScript for the Script skill. It monitors a button connected to port D0 of the EZB, which should use a pull-up resistor. In this setup, the released button keeps D0 high, and pressing it lowers the value.
Hi Synthiam_Support Team,
Your explanation on using Bing Speech for conversational input is spot on. To recap:
Manual Triggering:
Voice Activity Detection (VAD):
Recommended Approach - Push-to-Talk:
Script Example:
This script continuously monitors the status of the button at port D0. When pressed (button pull low), it starts the Bing Speech Recognition; when released (button returns high), it stops recording and transcribes the captured speech.
This method helps avoid the pitfalls of continuous auto-detection-minimizing accidental triggers due to ambient noise or background conversations.
Let me know if you have any questions about configuring this setup or need further assistance. Happy robot building!
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
Oh yah, here's a tutorial for it: https://synthiam.com/Support/Get-Started/Tutorials/speech-recognition-with-push-button-microphone