Capturing Spontaneous Voice Input

joesox

USA

Asked Jul 3

Hello Athena,

I'm seeking guidance on how to develop a script or understand the process within ARC for capturing voice input using an external microphone, as recommended. My goal is for the JD robot's built-in speaker to repeat what it hears or to listen continuously. I'm currently experimenting with the Speech Recognition module, but I'm struggling with capturing spontaneous sentences rather than just relying on pre-programmed prompts. Could you provide some insights or direction on this matter? Thank you!

Related Hardware JD Humanoid

Related Controls Advanced Speech Recognition Script Collection

Athena AI

PRO

Canada

#9 Jul 8

Hi joesox,

I'd be happy to help you set up two scripts in your Script Collection. Before I post these sample scripts, could you clarify what your desired outcome is? For example, are you calling these scripts from an HTTP server skill to return the contents of the $BingSpeech variable and then clear it? Also, how are you integrating the HTTP commands with these scripts?

In the meantime, here are two sample scripts written in JavaScript (as used in ARC) that might match your needs. You can adapt these to your project.

Script: getBingSpeech

This script retrieves the value from the global variable $BingSpeech and prints (or returns) it. You could call this as part of your HTTP response process.

// getBingSpeech script
// Retrieve the BingSpeech text from the global variable.
var bingText = getVar("$BingSpeech", "");

// Log the retrieved text for debugging purposes.
print("Retrieved BingSpeech text: " + bingText);

// Optionally, you could send this text using HTTP response code.
// For example, if you're using an HTTP server skill with scripting,
// you might use the response text as a return value from this script.
return bingText;

Script: clearBingSpeech

This script clears the global variable after processing.

// clearBingSpeech script
// Reset the BingSpeech global variable.
setVar("$BingSpeech", "");

// Log the clearing action for debugging purposes.
print("BingSpeech variable has been cleared.");

// Optionally, you can return a confirmation message.
return "BingSpeech cleared.";

A Few Questions to Better Tailor the Scripts

Do you intend to call these scripts from an HTTP server skill as part of a web API endpoint?
Do you have any additional processing or logging requirements for the BingSpeech variable?
Are you using JavaScript or Python in your ARC project? (These examples are in JavaScript.)

Let me know if these sample scripts fit your needs or if you need additional modifications or integration details with the HTTP commands.

Looking forward to your response, Athena
Synthiam ARC Support Technician

I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.

Synthiam Support

Canada

#10 Jul 9

It isn’t clear, but you should be processing the response from the Bing Speech Recognition skill in its response script. That script is executed every time there is a response, which ensures you’re only processing the latest response text.

If you are processing the global variable in a separate script, you won’t know when there was a new response.

Athena AI

PRO

Canada

#11 Jul 9

Hi joesox,
Synthiam_Support makes a very good point. The recommended approach is to process the Bing Speech response right within the response script of the Bing Speech Recognition skill. That way, you always work with the latest response. Splitting the logic into separate get/clear scripts risks acting on stale data, because those scripts have no way of knowing when a new response has arrived.

That said, if you still wish to have separate scripts (perhaps for an HTTP endpoint that your project triggers), you can do so with the understanding that you need to ensure your HTTP commands or integration logic can tell when a new response is received.

Below are two sample JavaScript scripts for ARC’s Script Collection that demonstrate how you might retrieve and clear the Bing Speech variable. They assume you’re using ARC’s JavaScript scripting (the same idea applies if you switch to Python).

Script: getBingSpeech

This script retrieves the current value of the Bing Speech variable and returns it.

// getBingSpeech script
// Retrieve the current Bing Speech recognized text from the global variable.
var bingText = getVar("$BingSpeech", "");

// Log the retrieved text for debugging.
print("Retrieved BingSpeech text: " + bingText);

// Return the text if calling from an HTTP command or similar.
return bingText;

Script: clearBingSpeech

This script clears the Bing Speech variable after you have processed it.

// clearBingSpeech script
// Clear the global BingSpeech variable.
setVar("$BingSpeech", "");

// Log that the variable has been cleared.
print("BingSpeech variable has been cleared.");

// Return a confirmation message.
return "BingSpeech cleared.";

Recommended Approach (Processing Inside the Response Script)

Since Bing Speech Recognition executes its response script every time it gets a new response, you might want to incorporate your processing directly in that script. For example:

// Bing Speech Recognition Response Script Example
// This script executes every time there is a new speech recognition result.
var text = getVar("$BingSpeech", "");
if(text !== "") {
  print("New BingSpeech response received: " + text);
  
  // Process the recognized speech text (for example, echo it through the built-in speaker)
  // You can call another script or include processing logic here.
  // For example, to make the robot speak the text:
  Audio.sayEZB(text);
  
  // Clear the variable after processing so we don't re-process the same response.
  setVar("$BingSpeech", "");
} else {
  print("No new BingSpeech response.");
}

Using this method ensures that you always process the latest response immediately as it arrives rather than relying on a separate polling mechanism or HTTP command that might inadvertently pick up an old result.

Key Considerations

Real-Time Processing: The response script built into the Bing Speech Recognition skill is triggered only when new audio is processed. This avoids the risk of handling outdated responses.
Synchronization: If you intend to access the Bing Speech data via an HTTP server skill or another external trigger, ensure that the timing of calls is synchronized with speech recognition events. Otherwise, you might end up with empty or old data.
Modularity vs. Efficiency: While it's possible to segment functionality into separate scripts (getBingSpeech and clearBingSpeech), in many cases it is more efficient and less error-prone to integrate the processing logic within the response script of the Bing Speech Recognition module.

Let me know if this addresses your needs or if you have any further questions regarding integrating these scripts with an HTTP server or other robot skills!

Best regards,
Athena
Synthiam ARC Support Technician


joesox

PRO

USA

#12 Jul 10

The scripts are working; however, I am still having difficulty grabbing the heard text from the Bing Speech skill. Perhaps answering your previous questions will help. Currently, I only receive "OK" as the result of the getBingSpeech script you suggested earlier.

Q: Do you intend to call these scripts from an HTTP server skill as part of a web API endpoint? A: I have a Python agent running that communicates with JD via HTTP ControlCommand, so that the agent's senses can explore the outside world. When my Python agent executes 'ControlCommand("Script Collection", "ScriptStartWait", "getBingSpeech")', it always returns 'OK', and I can't seem to access the captured text. I could use some more suggestions, or a pointer to what I am missing.

Q: Do you have any additional processing or logging requirements for the BingSpeech variable? A: When the ARC Bing Speech skill hears something after the agent sends 'ControlCommand("Script Collection", "ScriptStartWait", "getBingSpeech")', the captured text needs to be sent to the agent's running process as soon as it is known.

Q: Are you using JavaScript or Python in your ARC project? (These examples are in JavaScript.) A: I am just using Blockly and JavaScript in ARC. In my third-party agent application, I am using Python.

Athena AI

PRO

Canada

#13 Jul 10

Below is some guidance and suggestions to help you get the captured Bing Speech text back in your python agent:

Use an Explicit Return Statement

Make sure your getBingSpeech script explicitly returns the value from the global variable. For example:

// getBingSpeech script
// Retrieve the current Bing Speech text from the global variable.
var bingText = getVar("$BingSpeech", "");
print("Retrieved BingSpeech text: " + bingText);
// IMPORTANT: Return the captured text so that ScriptStartWait can capture it.
return bingText;

If you don’t explicitly return a value, ARC may simply return "OK" even though the script runs.

Verify That $BingSpeech Is Being Set

Check that the Bing Speech Recognition skill’s response script is actually setting the global variable $BingSpeech. For example, in the response script you might have:

// Bing Speech Recognition Response Script Example
var text = getVar("$BingSpeech", "");
if (text !== "") {
    print("New BingSpeech response received: " + text);
    // (Optional) Have the robot speak it or process it here.
    Audio.sayEZB(text);
    // You may choose to either clear or preserve the variable.
    // For testing, comment out the next line:
    // setVar("$BingSpeech", "");
} else {
    print("No new BingSpeech response.");
}

This will help ensure that when your Python agent calls getBingSpeech via
ControlCommand("Script Collection", "ScriptStartWait", "getBingSpeech"), the variable already contains the spoken text.

Consider When to Clear the Variable

If the global $BingSpeech variable is cleared immediately after processing by the Bing Speech Recognition skill’s own response script, your external call from python may see an empty value.
Option 1: Process the speech immediately in the response script without relying on an external getBingSpeech script.
Option 2: Delay clearing $BingSpeech until after your agent retrieves it. You might store the result in another global variable (for example, $CapturedText) that your getBingSpeech script reads.

Example: In your response script, do something like the following:

var text = getVar("$BingSpeech", "");
if (text !== "") {
    // Save the response in another global variable.
    setVar("$CapturedText", text);
    // (Optional) You can process it here or simply leave it for your agent.
    Audio.sayEZB(text);
    // Then clear $BingSpeech if you wish.
    setVar("$BingSpeech", "");
}

Now, update your getBingSpeech script to return $CapturedText:

// getBingSpeech based on captured text stored in a separate variable.
var capturedText = getVar("$CapturedText", "");
print("Retrieved CapturedText: " + capturedText);
return capturedText;

How the HTTP ControlCommand Call Works

When your python agent calls:

ControlCommand("Script Collection", "ScriptStartWait", "getBingSpeech")

ARC executes the getBingSpeech script and returns whatever is provided by the return statement. If your script only prints output (or doesn’t return a value), ARC’s default response will be "OK". In your setup, make sure that:

The returned text is not empty.
The getBingSpeech script uses an explicit "return".
Your python agent is checking the HTTP response body where ARC sends the script result.
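On the agent side, the last point can be sketched as a small helper that treats an empty body or ARC's default "OK" as "no text yet". This is only a sketch under assumptions: the function name interpret_script_response is hypothetical, and it assumes your agent already has the HTTP response body as a string (how the request itself is made depends on how your HTTP server skill is set up).

```python
from typing import Optional


def interpret_script_response(body: str) -> Optional[str]:
    """Return the recognized text, or None when only the default reply came back."""
    text = body.strip()
    # An empty body or a bare "OK" means the script did not return a value.
    if text == "" or text.upper() == "OK":
        return None
    return text
```

One trade-off of this check: if someone actually says "OK" to the robot, the helper cannot tell that apart from the default reply.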

Verify End-to-End Flow

For troubleshooting, consider the following checklist:
Manually trigger speech into the Bing Speech Recognition skill and confirm that $BingSpeech (or your alternate global variable) has the expected text using ARC’s Variable Watcher.
Test the getBingSpeech script directly (by calling ScriptStartWait) after you know speech has been captured.
Confirm that your Python agent is reading the HTTP response body and not just the "OK" status from the HTTP header.

Next Steps

If possible, integrate a callback for real-time processing directly within the Bing Speech Recognition response script. This approach minimizes any delay between speech capture and processing.
Or, have your response script update a global variable that the python agent polls at appropriate intervals.
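If you go with the polling option, the agent needs some way to avoid acting on the same text twice, since the global variable keeps its value until something clears it. A minimal sketch, assuming a hypothetical fetch callable that wraps whatever HTTP call your agent already makes and returns the current text (or an empty string or None when there is nothing):

```python
import time
from typing import Callable, Iterator, Optional


def poll_for_new_text(fetch: Callable[[], Optional[str]],
                      interval_s: float = 0.5,
                      max_polls: int = 10) -> Iterator[str]:
    """Yield each newly seen text from fetch(), skipping empties and repeats."""
    last_seen: Optional[str] = None
    for _ in range(max_polls):
        text = fetch()
        # Only report a value that is non-empty and differs from the last one.
        if text and text != last_seen:
            last_seen = text
            yield text
        time.sleep(interval_s)
```

Note the limitation: if the same phrase is spoken twice in a row, it looks identical to a stale value and is skipped. That is inherent to polling a variable without a counter or timestamp alongside it.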

Summary

The key changes are to (a) add an explicit return statement in your getBingSpeech script, (b) ensure the Bing Speech Recognition skill is leaving the captured text available when your python agent makes its HTTP call, and (c) verify that your python agent reads the HTTP response body correctly.

I hope these suggestions help. If you have any more questions or need further clarification on any step, feel free to ask!

Best regards,
Athena
Synthiam ARC Support Technician


DJ Sures

PRO

Synthiam

#14 Jul 10 — Edited Jul 10

It appears that the client/server model may be backward. The Python "agent" should be receiving push messages from ARC instead of polling. Your current configuration suggests that the Python "agent" is polling the HTTP server to retrieve the variable containing the speech recognition text. Instead, the Python "agent" should have an HTTP server and listen to post messages from ARC.

The HTTP Post JavaScript or Python in ARC can post the value to the Python "Agent". Here is the method I'm mentioning, which also contains an example: https://synthiam.com/Support/javascript-api/Net/hTTPPost

Athena can provide an example for the ARC and Python "agent" using this approach.

Athena AI

PRO

Canada

#15 Jul 10

Below is an example of how you can change your approach from polling to a push-based model. In this example, ARC will send the captured speech text via an HTTP POST message to a Python agent that is running an HTTP server. This way, whenever your speech is processed in ARC, it is pushed directly to your Python agent.

ARC (JavaScript) - Pushing Speech Data Using HTTP POST

Assume that your ARC project captures speech into a variable (for example, $CapturedText). You can add a script that uses ARC’s built-in HTTP post function (Net.hTTPPost) to send this text to your Python agent’s endpoint.

Below is an example JavaScript script to be used in ARC:

// pushBingSpeech.js
// Retrieve the captured speech text (ensure this variable is set by your response script)
var capturedText = getVar("$CapturedText", "");

// If there is no captured text, exit (you might want to add error handling)
if (capturedText === "") {
  print("No captured speech text to push.");
  return;
}

// Define the URL of your Python agent's HTTP server endpoint.
// Replace <python-agent-host> and <port> with the correct address and port of your agent's HTTP server.
var url = "http://<python-agent-host>:<port>/speech";

// Prepare post data.
// Here we encode the variable into a POST parameter "speech".
// If needed, you can add more parameters.
var postData = "speech=" + encodeURIComponent(capturedText);

// Optionally, you can set a timeout (in milliseconds)
var timeout = 5000; // 5 seconds

// Send an HTTP POST from ARC to the Python agent.
var response = Net.hTTPPost(url, postData, timeout);

// Log the response received from the Python agent.
print("Response from agent: " + response);

// Optionally clear the captured text once it has been sent
setVar("$CapturedText", "");

Usage:
You can trigger this script (for example, via a Script Collection or directly from an event in your speech recognition response script) as soon as new speech data is captured.

Python Agent - Receiving the Speech Data

Below is an example using Python with Flask to set up a simple HTTP server that listens for POST requests on the /speech endpoint.

# python_agent.py
from flask import Flask, request

app = Flask(__name__)

@app.route('/speech', methods=['POST'])
def speech():
    # Retrieve the POST parameter "speech"
    speech_data = request.form.get('speech')
    if speech_data:
        print("Received speech: " + speech_data)
        # Here, process the speech data as needed
        # For example, send it to another part of your python application
        return "Received: " + speech_data, 200
    else:
        print("No speech data received.")
        return "No speech data", 400

if __name__ == "__main__":
    # Run the server on all interfaces at port 5000 (or change to your preferred port)
    app.run(host='0.0.0.0', port=5000)
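The comment about sending the text "to another part of your python application" can be handled with a thread-safe queue, so the HTTP handler returns quickly and the agent does its work elsewhere. A minimal sketch using only the standard library; the names speech_queue, handle_speech, and agent_worker are hypothetical, and the lowercase step stands in for whatever your agent really does with the text:

```python
import queue
import threading

# Shared between the HTTP handler and the agent's worker thread.
speech_queue = queue.Queue()


def handle_speech(text: str) -> None:
    """Call this from the /speech handler: enqueue the text and return at once."""
    speech_queue.put(text)


def agent_worker(results: list, stop: threading.Event) -> None:
    """Consume queued speech until asked to stop and the queue is drained."""
    while not stop.is_set() or not speech_queue.empty():
        try:
            text = speech_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        # Placeholder processing; replace with the agent's real logic.
        results.append(text.lower())
        speech_queue.task_done()
```

In the Flask handler, you would call handle_speech(speech_data) just before returning the response, and start agent_worker in a threading.Thread when the agent launches.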

Usage:

Run your Python agent. Ensure you have Flask installed (you can install it via pip if needed).
Your Python agent will listen on the defined port (e.g., 5000) for incoming POST requests at /speech.
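Before wiring up ARC, it can help to confirm the agent works by sending it a test POST from Python. The sketch below builds a form-encoded request with a "speech" field, which is what request.form in the Flask example expects. The helper names are hypothetical, and it is an assumption that ARC's POST arrives in the same form-encoded shape.

```python
import urllib.parse
import urllib.request


def build_speech_request(url: str, text: str) -> urllib.request.Request:
    """Build a form-encoded POST carrying a single "speech" field."""
    body = urllib.parse.urlencode({"speech": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def post_speech(url: str, text: str, timeout: float = 5.0) -> str:
    """Send the request and return the agent's reply body as text."""
    request = build_speech_request(url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the agent running locally on port 5000, calling post_speech("http://127.0.0.1:5000/speech", "hello robot") should return the agent's confirmation string. A 400 reply raises urllib.error.HTTPError instead of returning.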

Key Considerations

Network Connectivity:
Ensure that your ARC machine can reach the Python agent’s HTTP server. This might involve adjusting firewall settings or using a specific network interface if the devices are on different networks.
Variable Timing:
Make sure that the $CapturedText variable is set by your Bing Speech Recognition response script before the HTTP POST is sent. You might need to adjust your script logic so that the captured text remains available until it is pushed.
Feedback and Confirmation:
The HTTP POST returns a response (which is logged in ARC). This can be used as confirmation that the Python agent received the message, or to perform debugging if it does not.
Extending the Example:
If you need additional parameters or more complex data, consider using JSON for your POST payload. In ARC JavaScript, you would need to format a JSON string, and in your Python agent, use request.get_json().
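If you do move to JSON later, the agent can accept both shapes during the transition. The sketch below is a standard-library helper that pulls the "speech" field out of either a JSON or a form-encoded body. The name extract_speech is hypothetical; in Flask it could be called as extract_speech(request.get_data(), request.content_type or "").

```python
import json
import urllib.parse
from typing import Optional


def extract_speech(body: bytes, content_type: str) -> Optional[str]:
    """Return the "speech" field from a JSON or form-encoded POST body."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        try:
            payload = json.loads(body.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            return None
        value = payload.get("speech") if isinstance(payload, dict) else None
    else:
        # Treat anything else as form-encoded, e.g. speech=turn%20left
        fields = urllib.parse.parse_qs(body.decode("utf-8"))
        value = fields.get("speech", [None])[0]
    # Only a non-empty string counts as recognized speech.
    return value if isinstance(value, str) and value else None
```

Returning None for anything malformed lets the handler answer with a 400, as the Flask example already does for missing data.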

Summary

On ARC:
Use the Net.hTTPPost JavaScript function to push the captured speech text immediately to your Python agent.
On Python:
Run an HTTP server (using Flask as demonstrated) that listens for POST requests, processes the speech data, and sends back a confirmation.

This push-based approach helps your Python agent receive real-time updates from ARC, eliminating the need to poll for changes. If you have any further questions or need additional modifications, feel free to ask!

Best regards,
Athena
Synthiam ARC Support Technician


joesox