Asked

Reducing Speech Recognition Latency Tips

I am currently using the "Bing Speech Recognition" skill in the Synthiam ARC environment and am looking for ways to reduce the latency between when speech input ends and when the text is processed and available. At the moment, there's approximately a 7-second delay when utilizing a "max recording length" of 5 seconds.

I am considering implementing a manual start/stop mechanism to improve this performance. Could someone advise if there is an existing robot skill within Synthiam ARC that can interface with a manually operated remote switch? Specifically, I am looking for a method to execute the following command: ControlCommand("Bing Speech Recognition", "StopListening"). Any suggestions or guidance on how to effectively achieve this setup would be greatly appreciated.


Related Hardware EZ-B v4


PRO
Canada
#1  

Great question - push-to-talk is the best way to remove the extra latency from fixed-length recordings. You already have the right idea: use a manual start/stop so the Bing Speech skill only records while the user intends to speak. Below I outline the recommended approach (preferred), an alternative using On Variable Changed, wiring notes, Bing Speech settings to minimize latency, and links to the Synthiam docs.


Option A - Preferred: Script skill (push-to-talk loop)

  1. Project -> Add robot skill -> Scripting -> Script (choose JavaScript).
  2. Open the Script editor and paste this (replace d0 with the digital port you wired to):
// Push-to-talk for Bing Speech Recognition using a button on d0
// Button released = high (pulled up). Button pressed = low (to ground).
while (true) {
  // Wait for button press (low)
  Digital.wait(d0, false, 10);

  // Start Bing listening
  controlCommand("Bing Speech Recognition", "StartListening");

  // Short delay: debounces the button and gives the skill time to start on very quick taps
  sleep(200);

  // Wait for button release (high)
  Digital.wait(d0, true, 10);

  // Stop listening and let Bing process immediately
  controlCommand("Bing Speech Recognition", "StopListening");

  // Optional small delay to avoid immediate restart
  sleep(100);
}

Notes:

  • Use d0, d1, etc. as unquoted constants (do not use "d0").
  • Use sleep(ms) (milliseconds) - ARC’s built-in sleep is preferred.
  • The short sleep after StartListening gives the skill time to initialize for very quick taps.
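
If you would rather press once to start and press again to stop, the same loop can be adapted. This is only a rough sketch under the same wiring assumption (pressed = low); toggleCommand and runToggleLoop are hypothetical helper names, not part of ARC, and the script tracks the listening state itself rather than asking the skill.

```javascript
// Press-to-toggle sketch for Bing Speech Recognition.
// Assumption: button wired as above (released = high, pressed = low).
// toggleCommand and runToggleLoop are illustrative names, not ARC functions.

// Pick the command to send based on whether we think the skill is listening.
function toggleCommand(isListening) {
  return isListening ? "StopListening" : "StartListening";
}

function runToggleLoop(port) {
  var listening = false;
  while (true) {
    // Wait for button press (low)
    Digital.wait(port, false, 10);

    controlCommand("Bing Speech Recognition", toggleCommand(listening));
    listening = !listening;

    // Crude debounce, then wait for release (high) before re-arming
    sleep(200);
    Digital.wait(port, true, 10);
  }
}

// Only start the loop inside ARC, where Digital and d0 are defined.
if (typeof Digital !== "undefined") {
  runToggleLoop(d0);
}
```

One caveat with this approach: if the skill stops on its own (for example when the max recording length is reached), the script's listening flag will be out of sync until the next press.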

Option B - Use On Variable Changed (if you already have a variable source)

If your Read Digital skill (or an external skill) sets a global variable (for example $MyButton) on press/release, you can use On Variable Changed to run a small script:

Example On Variable Changed JS script:

// This runs when $MyButton changes
var val = getVar("$MyButton");
if (val == "0" || val == 0) {
  // pressed (low)
  controlCommand("Bing Speech Recognition", "StartListening");
} else {
  // released (high)
  controlCommand("Bing Speech Recognition", "StopListening");
}

(Confirm what variable Read Digital exposes - otherwise prefer the Script skill approach.)
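
Because the variable's exact value type is not confirmed, a slightly more defensive version of the script above may help. This sketch assumes the variable holds 0 (pressed) or 1 (released), either as a number or a string, and ignores anything else; commandForButton is a hypothetical helper name and $MyButton is the same example variable as above.

```javascript
// Map a button variable value to a Bing Speech command.
// Assumption: 0 / "0" = pressed (low), 1 / "1" = released (high).
// Any other value (unset, empty, unexpected text) returns null and is ignored.
function commandForButton(val) {
  var s = String(val).trim();
  if (s === "0") {
    return "StartListening";
  }
  if (s === "1") {
    return "StopListening";
  }
  return null;
}

// Only runs inside ARC, where getVar and controlCommand are defined.
if (typeof getVar === "function") {
  var buttonCommand = commandForButton(getVar("$MyButton"));
  if (buttonCommand !== null) {
    controlCommand("Bing Speech Recognition", buttonCommand);
  }
}
```

Ignoring unknown values avoids sending StopListening when the variable holds something unexpected, which the plain if/else version would do.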

Wiring and hardware notes

  • Use a momentary push-button wired so that:
    • Released = pulled high (to Vcc via pull-up)
    • Pressed = connects the digital port to GND (reads low)
  • Typical wiring: Button between D0 and GND, plus pull-up resistor from D0 to 3.3V/5V (or enable internal pull-up if your controller supports it).
  • If you want a remote (wireless) switch, use the receiver output into an EZ-B digital input the same way (or use a relay to ground the pin on press).
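
Mechanical buttons and some wireless receiver outputs can chatter for a few milliseconds when they change state. If the simple sleep-based debounce is not enough, a sample-counting debouncer is one option. This is a generic sketch, not an ARC feature: makeDebouncer is a hypothetical helper, and it assumes you feed it boolean readings of the pin at a steady interval (for example every 10 ms).

```javascript
// Sample-counting debouncer (illustrative helper, not part of ARC).
// The reported state only changes after `requiredSamples` consecutive
// readings that differ from the current stable state.
function makeDebouncer(requiredSamples, initialState) {
  var stable = initialState;
  var count = 0;
  return function (raw) {
    if (raw === stable) {
      // Reading agrees with the stable state: discard any pending change.
      count = 0;
    } else {
      count = count + 1;
      if (count >= requiredSamples) {
        stable = raw;
        count = 0;
      }
    }
    return stable;
  };
}

// Example with the wiring above: released = true (high), pressed = false (low).
var debouncePin = makeDebouncer(3, true);
var readings = [false, true, false, false, false];
var stableStates = [];
for (var i = 0; i < readings.length; i++) {
  stableStates.push(debouncePin(readings[i]));
}
// stableStates is [true, true, true, true, false]:
// the single glitch is ignored, and three lows in a row register as a press.
```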

Bing Speech / ARC settings to reduce latency

  • In the Bing Speech skill settings:
    • Turn off wake word if you will rely on push-to-talk.
    • Set the max recording length higher than your longest expected phrase so the timeout never cuts a recording short. With push-to-talk, StopListening is what ends the capture; a short max length forces transcription at the timeout instead.
    • Use the skill’s All Recognized Script (or watch $BingSpeech variable) - that script runs immediately when transcription returns.
  • StopListening makes the Bing skill send the captured audio for recognition immediately, so the time from stop to transcript is much shorter than waiting for a timeout.
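
To check how much latency you actually gain, you could time the gap between StopListening and the recognized text. The sketch below is one possible way to do that, not a built-in feature: $PttStopTimeMs is a hypothetical variable name, elapsedMs is an illustrative helper, and it assumes your push-to-talk script stores Date.now() in that variable right after it sends StopListening.

```javascript
// Stop-to-transcript timing sketch (illustrative, not an ARC feature).
//
// Part 1 (assumed): in the push-to-talk script, right after StopListening:
//   setVar("$PttStopTimeMs", Date.now());
//
// Part 2: place the code below in the skill's All Recognized Script.

// Return milliseconds since the recorded stop time, or null if none is usable.
function elapsedMs(stopTimeMs, nowMs) {
  var stop = Number(stopTimeMs);
  if (!isFinite(stop) || stop <= 0) {
    return null;
  }
  return nowMs - stop;
}

// Only runs inside ARC, where getVar and print are defined.
// Assumes Part 1 has already run at least once so the variable exists.
if (typeof getVar === "function") {
  var stopToTextMs = elapsedMs(getVar("$PttStopTimeMs"), Date.now());
  if (stopToTextMs !== null) {
    print("Stop-to-transcript: " + stopToTextMs + " ms");
  }
}
```

Comparing this number with and without push-to-talk gives you a concrete figure to set against the roughly 7 seconds you are seeing now.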

Other tips to improve responsiveness

  • Use a wired microphone on the PC running ARC or a low-latency audio input device.
  • Avoid overlapping speech while the robot is speaking; you can prevent new starts by monitoring $IsSpeaking (getVar("$IsSpeaking")).
  • If you need even lower end-to-end latency, keep the audio format consistent (mono, 16 kHz, 16-bit is typically supported) and avoid heavy CPU load on the ARC machine.
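
The $IsSpeaking check mentioned above could be added as a small guard before StartListening. This is a sketch only: I am assuming the variable may come back as true/false or 1/0 (as a value or as text), and canStartListening is a hypothetical helper name.

```javascript
// Guard: do not start listening while the robot is speaking.
// Assumption: $IsSpeaking may be true/false or 1/0, as a value or as text.
// canStartListening is an illustrative helper, not part of ARC.
function canStartListening(isSpeakingValue) {
  var s = String(isSpeakingValue).trim().toLowerCase();
  var speaking = (s === "true" || s === "1");
  return !speaking;
}

// Only runs inside ARC, where getVar and controlCommand are defined.
if (typeof getVar === "function") {
  if (canStartListening(getVar("$IsSpeaking"))) {
    controlCommand("Bing Speech Recognition", "StartListening");
  }
}
```

In the push-to-talk loop, this check would go right before the StartListening call, so a press made while the robot is talking is simply skipped.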


Questions for you

  • Which EZ-B digital port do you plan to use (d0..d23)?
  • Do you want a momentary push-button (hold to talk) or a toggle button (press to start, press to stop)?
  • Would you like me to make a ready-to-paste Script skill that matches your port and preferred behavior?

If you tell me the port and whether you want hold-to-talk or press-to-toggle, I’ll post the exact script and short setup steps (kebab menu settings) you can drop into your project.


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.