
Asked by joesox (PRO, USA)
Hello Athena,
I'm seeking guidance on how to develop a script or understand the process within ARC for capturing voice input using an external microphone, as recommended. My goal is for the JD robot's built-in speaker to repeat what it hears or to listen continuously. I'm currently experimenting with the Speech Recognition module, but I'm struggling with capturing spontaneous sentences rather than just relying on pre-programmed prompts. Could you provide some insights or direction on this matter? Thank you!
Related Hardware: JD Humanoid
Related Controls: Advanced Speech Recognition, Script Collection
Hi joesox,
I'd be happy to help you set up two scripts in your Script Collection. Before I post these sample scripts, could you clarify what your desired outcome is? For example, are you calling these scripts from an HTTP server skill to return the contents of the $BingSpeech variable and then clear it? Also, how are you integrating the HTTP commands with these scripts?
In the meantime, here are two sample scripts written in JavaScript (as used in ARC) that might match your needs. You can adapt these to your project.
Script: getBingSpeech
This script retrieves the value from the global variable $BingSpeech and prints (or returns) it. You could call this as part of your HTTP response process.
Script: clearBingSpeech
This script clears the global variable after processing.
A Few Questions to Better Tailor the Scripts
Let me know if these sample scripts fit your needs or if you need additional modifications or integration details with the HTTP commands.
Looking forward to your response, Athena
Synthiam ARC Support Technician
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
It isn’t clear from your description, but you should be processing the response from Bing Speech Recognition in the skill's response script. That script is executed every time there is a response, which ensures you’re only processing the latest response text.
If you are processing the global variable in a separate script, you won’t know when there was a new response.
Hi joesox,
Synthiam_Support makes a very good point. The recommended approach is to process the Bing Speech response right within the response script of the Bing Speech Recognition skill. This way, you always work with the latest response. Splitting the logic into separate get/clear scripts risks acting on stale data, because nothing ties those scripts to the arrival of a new response.
That said, if you still wish to have separate scripts (perhaps for an HTTP endpoint that your project triggers), you can do so with the understanding that you need to ensure your HTTP commands or integration logic can tell when a new response is received.
Below are two sample JavaScript scripts for ARC’s Script Collection that demonstrate how you might retrieve and clear the Bing Speech variable. They assume you’re using ARC’s JavaScript scripting (the same idea applies if you switch to Python).
Script: getBingSpeech
This script retrieves the current value of the Bing Speech variable and returns it.
Script: clearBingSpeech
This script clears the Bing Speech variable after you have processed it.
Recommended Approach (Processing Inside the Response Script)
Since Bing Speech Recognition executes its response script every time it gets a new response, you might want to incorporate your processing directly in that script. For example:
Using this method ensures that you always process the latest response immediately as it arrives rather than relying on a separate polling mechanism or HTTP command that might inadvertently pick up an old result.
Key Considerations
Real-Time Processing: The response script built into the Bing Speech Recognition skill is triggered only when new audio is processed. This avoids the risk of handling outdated responses.
Synchronization: If you intend to access the Bing Speech data via an HTTP server skill or another external trigger, ensure that the timing of calls is synchronized with speech recognition events. Otherwise, you might end up with empty or old data.
Modularity vs. Efficiency: While it's possible to segment functionality into separate scripts (getBingSpeech and clearBingSpeech), in many cases it is more efficient and less error-prone to integrate the processing logic within the response script of the Bing Speech Recognition module.
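The synchronization risk described above can be illustrated outside ARC. The following is a minimal sketch in plain Python (not ARC script) of a single-value hand-off where reading and clearing happen as one step, so a new response cannot slip in between a separate "get" and "clear". The class name SpeechSlot and its methods are hypothetical and exist only for this illustration.

```python
import threading
from typing import Optional


class SpeechSlot:
    """Holds the most recent recognized phrase (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._text: Optional[str] = None

    def put(self, text: str) -> None:
        # Called whenever a new phrase is recognized; overwrites older text.
        with self._lock:
            self._text = text

    def take(self) -> Optional[str]:
        # Read and clear under one lock, so no phrase is returned twice
        # and no phrase is wiped before it has been read.
        with self._lock:
            text, self._text = self._text, None
            return text
```

With separate get and clear calls, a phrase that arrives between the two would be erased unread; combining them avoids that particular failure.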
Let me know if this addresses your needs or if you have any further questions regarding integrating these scripts with an HTTP server or other robot skills!
Best regards,
Athena
Synthiam ARC Support Technician
The scripts are working; however, I am still having difficulty grabbing the heard text from the Bing Speech skill. Perhaps answering your previous questions will help. Currently, I only receive "OK" as the result of the getBingSpeech script you suggested earlier.
Q: Do you intend to call these scripts from an HTTP server skill as part of a web API endpoint?
A: I have a Python agent running that communicates with JD via HTTP ControlCommand, so that the agent's senses can explore the outside world. When my Python agent executes 'ControlCommand("Script Collection", "ScriptStartWait", "getBingSpeech")', it always returns 'OK', and I can't seem to access the capturedText. I could use some more suggestions, or pointers to what I am missing.
Q: Do you have any additional processing or logging requirements for the BingSpeech variable?
A: After the agent sends 'ControlCommand("Script Collection", "ScriptStartWait", "getBingSpeech")' and the Bing Speech skill in ARC hears something, the capturedText needs to be sent to the agent's running process as soon as the text is known.
Q: Are you using JavaScript or Python in your ARC project? (These examples are in JavaScript.)
A: I am just using Blockly and JavaScript in ARC. In my third-party agent application, I am using Python.
Below is some guidance and suggestions to help you get the captured Bing Speech text back in your python agent:
Make sure your getBingSpeech script explicitly returns the value from the global variable. For example:
If you don’t explicitly return a value, ARC may simply return "OK" even though the script runs.
Check that the Bing Speech Recognition skill’s response script is actually setting the global variable $BingSpeech. For example, in the response script you might have:
This will help ensure that when your Python agent calls getBingSpeech via ControlCommand(Script Collection,ScriptStartWait,getBingSpeech), the variable already contains the spoken text.
If the global $BingSpeech variable is cleared immediately after processing by the Bing Speech Recognition skill’s own response script, your external call from python may see an empty value.
Option 1: Process the speech immediately in the response script without relying on an external getBingSpeech script.
Option 2: Delay clearing $BingSpeech until after your agent retrieves it. You might store the result in another global variable (for example, $CapturedText) that your getBingSpeech script reads.
Example: In your response script, do something like the following:
Now, update your getBingSpeech script to return $CapturedText:
When your python agent calls:
ControlCommand(Script Collection,ScriptStartWait,getBingSpeech)
ARC executes the getBingSpeech script and returns whatever is provided by the return statement. If your script only prints output (or doesn’t return a value), ARC’s default response will be "OK". In your setup, make sure that:
For troubleshooting, consider the following checklist:
Manually trigger speech into the Bing Speech Recognition skill and confirm that $BingSpeech (or your alternate global variable) has the expected text using ARC’s Variable Watcher.
Test the getBingSpeech script directly (by calling ScriptStartWait) after you know speech has been captured.
Confirm that your Python agent is reading the HTTP response body and not just the "OK" status from the HTTP header.
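For the last checklist item, a minimal sketch of reading the full response body on the agent side might look like the following. It uses only Python's standard library. The URL you pass in is an assumption: this thread does not give the address or response format of ARC's HTTP server, so treat both helper names and any URL as placeholders.

```python
import urllib.request
from typing import Tuple


def fetch_body(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Return (status_code, body_text) for a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, body


def is_placeholder(body: str) -> bool:
    """True when the body is empty or only a generic "OK" acknowledgement."""
    return body.strip().upper() in ("", "OK")
```

Logging both the status code and the raw body makes it easier to see whether the agent is receiving recognized text or only an acknowledgement.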
If possible, integrate a callback for real-time processing directly within the Bing Speech Recognition response script. This approach minimizes any delay between speech capture and processing.
Or, have your response script update a global variable that the python agent polls at appropriate intervals.
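If you go the polling route, the agent needs some way to tell a new phrase from one it has already seen. The sketch below is one hedged way to do that in Python. The `fetch` argument is a hypothetical callable that returns the current text however your agent obtains it, and the interval is an arbitrary default.

```python
import time
from typing import Callable, Iterator, Optional


def poll_new_text(fetch: Callable[[], str],
                  interval: float = 0.5,
                  max_polls: Optional[int] = None) -> Iterator[str]:
    """Yield each non-empty value from fetch() that differs from the previous poll."""
    last = ""
    polls = 0
    while max_polls is None or polls < max_polls:
        text = fetch().strip()
        polls += 1
        if text and text != last:
            yield text
        # Remember empty reads too, so the same phrase counts as new
        # once the variable has been cleared in between.
        last = text
        time.sleep(interval)
```

A limitation of this approach: if the same phrase is spoken twice and the variable is not cleared in between, the second occurrence goes unnoticed. That is one reason the response script is the safer place to react.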
Summary
The key changes are to (a) add an explicit return statement in your getBingSpeech script, (b) ensure the Bing Speech Recognition skill is leaving the captured text available when your python agent makes its HTTP call, and (c) verify that your python agent reads the HTTP response body correctly.
I hope these suggestions help. If you have any more questions or need further clarification on any step, feel free to ask!
Best regards,
Athena
Synthiam ARC Support Technician
It appears that the client/server model may be backward. Your current configuration suggests that the Python "agent" is polling ARC's HTTP server to retrieve the variable containing the speech recognition text. Instead, the Python "agent" should run its own HTTP server and listen for POST messages pushed from ARC.
The HTTP POST function in ARC's JavaScript or Python scripting can post the value to the Python "agent". Here is the method I'm referring to, which also contains an example: https://synthiam.com/Support/javascript-api/Net/hTTPPost
Athena can provide an example for the ARC and Python "agent" using this approach.
Below is an example of how you can change your approach from polling to a push-based model. In this example, ARC will send the captured speech text via an HTTP POST message to a Python agent that is running an HTTP server. This way, whenever your speech is processed in ARC, it is pushed directly to your Python agent.
ARC (JavaScript) - Pushing Speech Data Using HTTP POST
Assume that your ARC project captures speech into a variable (for example, $CapturedText). You can add a script that uses ARC’s built-in HTTP post function (Net.hTTPPost) to send this text to your Python agent’s endpoint.
Below is an example JavaScript script to be used in ARC:
Usage:
You can trigger this script (for example, via a Script Collection or directly from an event in your speech recognition response script) as soon as new speech data is captured.
Python Agent - Receiving the Speech Data
Below is an example using Python with Flask to set up a simple HTTP server that listens for POST requests on the /speech endpoint.
Usage:
Point ARC's HTTP POST at the agent's /speech endpoint.
Key Considerations
Network Connectivity:
Ensure that your ARC machine can reach the Python agent’s HTTP server. This might involve adjusting firewall settings or using a specific network interface if the devices are on different networks.
Variable Timing:
Make sure that the $CapturedText variable is set by your Bing Speech Recognition response script before the HTTP POST is sent. You might need to adjust your script logic so that the captured text remains available until it is pushed.
Feedback and Confirmation:
The HTTP POST returns a response (which is logged in ARC). This can be used as confirmation that the Python agent received the message, or to perform debugging if it does not.
Extending the Example:
If you need additional parameters or more complex data, consider using JSON for your POST payload. In ARC JavaScript, you would need to format a JSON string, and in your Python agent, use request.get_json().
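As a rough illustration of the JSON option, the sketch below shows one possible payload shape and a tolerant parser for the agent side. It is framework-independent Python and does not rely on Flask. The field names "text" and "source" are assumptions made for this example, not something ARC defines.

```python
import json


def build_payload(text: str, source: str = "ARC") -> str:
    """Serialize a speech message; the field names are hypothetical."""
    return json.dumps({"text": text, "source": source})


def parse_payload(body: bytes) -> str:
    """Return the "text" field of a JSON object body, else the raw body text."""
    raw = body.decode("utf-8", errors="replace")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return raw.strip()  # not JSON: treat the body as plain text
    if isinstance(data, dict):
        return str(data.get("text", "")).strip()
    return raw.strip()  # valid JSON but not an object
```

Falling back to the raw body means the agent keeps working if the sender is later switched between plain text and JSON.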
Summary
Use the Net.hTTPPost JavaScript function to push the captured speech text immediately to your Python agent.
Run an HTTP server (using Flask as demonstrated) that listens for POST requests, processes the speech data, and sends back a confirmation.
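The receiving side summarized above can also be sketched with only Python's standard library. Note that this swaps Flask for http.server so it runs without extra packages. It assumes ARC posts the recognized text as the raw request body, which this thread does not confirm. The port, the queue name `heard`, and the reply text are hypothetical.

```python
import queue
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hand-off point between the HTTP thread and the agent's main loop.
heard: "queue.Queue[str]" = queue.Queue()


class SpeechHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/speech":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8", errors="replace").strip()
        if text:
            heard.put(text)  # queue the phrase for the agent to consume
        reply = b"received"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args) -> None:
        pass  # keep the console quiet; remove to see per-request logs


def start_server(host: str = "0.0.0.0", port: int = 5000) -> ThreadingHTTPServer:
    """Start the listener on a background thread and return the server object."""
    server = ThreadingHTTPServer((host, port), SpeechHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In the agent's main loop, calling `start_server()` once and then `heard.get()` blocks until the next phrase arrives, so no polling interval is needed.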
This push-based approach helps your Python agent receive real-time updates from ARC, eliminating the need to poll for changes. If you have any further questions or need additional modifications, feel free to ask!
Best regards,
Athena
Synthiam ARC Support Technician