Asked

Capturing Spontaneous Voice Input

Hello Athena,

I'm seeking guidance on how to develop a script or understand the process within ARC for capturing voice input using an external microphone, as recommended. My goal is for the JD robot's built-in speaker to repeat what it hears or to listen continuously. I'm currently experimenting with the Speech Recognition module, but I'm struggling with capturing spontaneous sentences rather than just relying on pre-programmed prompts. Could you provide some insights or direction on this matter? Thank you!


Related Hardware JD Humanoid

PRO
Canada
#1  

Hello there! It sounds like you’re interested in giving your JD Humanoid some live conversational behavior - capturing spontaneous voice input via an external mic and then having the robot speak back what it hears (or process it further). Here are some suggestions and steps to get you started:


1. Choose Your Speech Recognition Module

ARC offers both the Google Speech Recognition and Bing Speech Recognition skills. You can experiment with either:

  • Google Speech Recognition suits projects where you are comfortable setting up credentials with a JSON key.
  • Bing Speech Recognition works nicely if you’re using the ARC Pro subscription cloud service and want the ease of Microsoft’s speech API.

Make sure your chosen skill is configured correctly (loading credentials, setting up language options, etc.) as described in the documentation:
Google Speech Recognition
Bing Speech Recognition


2. Configure Your External Microphone

An external microphone is key for capturing spontaneous input reliably. Ensure that:

  • The mic is set as your default audio input device.
  • The gain/volume is adjusted appropriately so that voices are captured clearly without excessive background noise.
  • Your computer’s volume mixer confirms that the active device is indeed your chosen mic.

On Windows, right-click the speaker icon in the system tray, choose Open Sound Settings, and in the Input section select the device and adjust its volume.


3. Capturing Spontaneous Speech

By default, both the Google and Bing Speech Recognition skills in ARC have options to ‘Start Listening’ either via a user action (a button press) or automatically using a wake word. To capture spontaneous speech, you have a couple of approaches:

Using a Wake Word or Auto-Record Feature

  • Wake Word: Enable the wake word in the skill configuration so that the module begins listening when your chosen keyword is detected. Set the desired minimum confidence level.
  • Auto Record: Some skills allow continuous listening, or a push-to-talk that is pressed and released from software rather than by hand. This might be more reliable than waiting for a button press.

Using a Script to Control the Listening

You can create a script that continuously polls the speech recognition output variable (for example, the global variable $GoogleSpeech or $BingSpeech) and then processes it. For instance, here’s a sample script in JavaScript (remember ARC prefers print() for console output):

while (true) {
    // Start listening - this will make the skill capture speech
    // (the first argument must match the skill's title in your project)
    controlCommand("Bing Speech Recognition", "StartListening");
    
    // Allow some time for a phrase to be captured (adjust timeout as needed)
    sleep(3000);
  
    // Retrieve the captured phrase from the global variable
    var capturedText = getVar("$BingSpeech", "");
  
    // Check if any speech was captured
    if (capturedText !== "") {
        // Optionally, you can immediately stop listening
        controlCommand("Bing Speech Recognition", "StopListening");
      
        // Output the captured text to the console for debugging
        print("Captured phrase: " + capturedText);
      
        // Have the robot speak the captured text using its built-in speaker
        Audio.sayEZB(capturedText);
      
        // Clear the variable for the next round
        setVar("$BingSpeech", "");
    }
  
    // A short delay before the next listening cycle
    sleep(1000);
}

Notes on the Script:

  • Loop & Timing: Adjust the sleep durations to suit your environment and the typical time it takes for users to speak full sentences.
  • Stopping and Restarting: The script uses StartListening and StopListening commands to cycle the recognizer. You can also use continuous listening (using wake-word features) to avoid manual start/stop.
  • Avoiding Feedback: If the robot speaks back what it hears, be aware of potential microphone feedback (the robot hearing itself). Consider using a headset or physically isolating the mic from the speaker.
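
As a rough sketch of the timing and feedback notes above, the two helpers below poll for a phrase with a timeout instead of a fixed sleep, and flag a phrase that is just the robot's own last sentence coming back through the mic. The names waitForPhrase, normalize, and isEcho are hypothetical, not part of ARC; the ARC wiring in the trailing comment assumes the $BingSpeech variable and the sleep() function used in the script above.

```javascript
// Poll a text source until it yields a non-empty phrase or the timeout expires.
// readText: function returning the latest recognized text ("" when nothing yet)
// pause:    function that blocks for the given number of milliseconds
function waitForPhrase(readText, pause, timeoutMs, stepMs) {
    var waited = 0;
    while (waited < timeoutMs) {
        var text = String(readText()).trim();
        if (text !== "") {
            return text;
        }
        pause(stepMs);
        waited += stepMs;
    }
    return ""; // nothing was heard before the timeout
}

// Lowercase, strip punctuation, and collapse whitespace so that
// "Hello, Robot!" and "hello robot" compare as equal.
function normalize(s) {
    return String(s).toLowerCase()
        .replace(/[^a-z0-9 ]/g, "")
        .replace(/\s+/g, " ")
        .trim();
}

// True when the captured phrase matches what the robot last said.
function isEcho(captured, lastSpoken) {
    var heard = normalize(captured);
    return heard !== "" && heard === normalize(lastSpoken);
}

// Assumed wiring inside ARC (not executed here):
//   var text = waitForPhrase(function () { return getVar("$BingSpeech", ""); },
//                            sleep, 8000, 250);
//   if (text !== "" && !isEcho(text, lastSpoken)) {
//       Audio.sayEZB(text);
//       lastSpoken = text;
//   }
```

Passing the reader and the pause function in as arguments keeps the helpers independent of any one skill, so the same code could be pointed at $GoogleSpeech instead.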

4. Enhancing the Experience

  • Refining Recognition: For more robust spontaneous input, experiment with the wake word’s confidence level and the microphone settings.
  • Processing the Captured Text: Once you have the spontaneous input, you can further process it (e.g., pass to PandoraBot or trigger additional commands via ControlCommand()).
  • Script Integration: Consider tying your spontaneous speech capture with events from other skills (for example, if a particular phrase is detected, trigger a custom action).
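
To illustrate the idea of triggering a custom action when a particular phrase is detected, here is a minimal sketch of a keyword lookup. The function findAction, the rules table, and the action names are all hypothetical placeholders; the ControlCommand shown in the trailing comment is an assumption, so copy the exact command for your own skill from its Cheat Sheet in ARC.

```javascript
// Return the action of the first rule whose keyword appears in the text,
// or null when no keyword matches. Matching is case-insensitive.
function findAction(text, rules) {
    var heard = String(text).toLowerCase();
    for (var i = 0; i < rules.length; i++) {
        if (heard.indexOf(rules[i].keyword) !== -1) {
            return rules[i].action;
        }
    }
    return null;
}

// Placeholder rules: keywords are lowercase, action names are examples only.
var rules = [
    { keyword: "wave", action: "Wave" },
    { keyword: "sit down", action: "Sit Down" }
];

// Assumed wiring inside ARC (not executed here):
//   var action = findAction(getVar("$BingSpeech", ""), rules);
//   if (action !== null) {
//       ControlCommand("Auto Position", "AutoPositionAction", action);
//   }
```

Because the first matching rule wins, more specific phrases should be listed before shorter ones that they contain.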

5. Next Steps & Resources

I hope these suggestions help you advance your project! Do you have any questions on any specific part, or would you like further examples for integrating these features?


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.