azmustang09

USA

Asked Dec 2025

Reducing Speech Recognition Latency Tips

I am currently using the "Bing Speech Recognition" skill in the Synthiam ARC environment and am looking for ways to reduce the latency between when speech input ends and when the text is processed and available. At the moment, there's approximately a 7-second delay when utilizing a "max recording length" of 5 seconds.

I am considering implementing a manual start/stop mechanism to improve this performance. Could someone advise if there is an existing robot skill within Synthiam ARC that can interface with a manually operated remote switch? Specifically, I am looking for a method to execute the following command: ControlCommand("Bing Speech Recognition", "StopListening"). Any suggestions or guidance on how to effectively achieve this setup would be greatly appreciated.

Related Hardware (view all EZB hardware)

EZ-B V4 by EZ-Robot

EZ-B v4/2 robot controller: dual Cortex ARM, Wi-Fi, audio/video, 24 servo/digital ports, I2C/UART, camera and Synthiam ARC control for custom robots

Wi-Fi / USB

Servos 24

Camera

Audio

UART 3

I2C

ADC 8

Digital 24

Related Robot Skills (view all robot skills)

Read Digital by Synthiam

Read TTL digital input from an ARC I/O port; real-time red/green status for low (0V) or high (+3.3/5V), selectable board/port and read interval.

Requires ARC

Digital

On Variable Changed by Synthiam

Run scripts automatically when specified ARC variables change; define variable-to-script pairs, monitor status, and trigger actions.

Requires ARC v3

Scripting


Athena

PRO

Canada

#1 Dec 2025

Great question - push-to-talk is the best way to remove the extra latency from fixed-length recordings. You already have the right idea: use a manual start/stop so the Bing Speech skill only records while the user intends to speak. Below I outline the recommended approach (preferred), an alternative using On Variable Changed, wiring notes, Bing Speech settings to minimize latency, and links to the Synthiam docs.

Short summary

Best approach: Add a Script robot skill (JavaScript) that watches a digital input on your EZ-B v4 and calls:
- controlCommand("Bing Speech Recognition", "StartListening")
- controlCommand("Bing Speech Recognition", "StopListening")
Transcription then begins almost immediately after StopListening, instead of waiting for the fixed 5 s max recording length to run out.
See the Push-to-Talk tutorial for a full worked example: https://synthiam.com/Support/Get-Started/Tutorials/speech-recognition-with-push-button-microphone

Option A - Preferred: Script skill (push-to-talk loop)

Project > Add robot skill > Scripting > Script (choose JavaScript).
Open the Script editor and paste this (replace d0 with the digital port you wired to):

// Push-to-talk for Bing Speech Recognition using a button on d0
// Button released = high (pulled up). Button pressed = low (to ground).
while (true) {
  // Wait for button press (low)
  Digital.wait(d0, false, 10);

  // Start Bing listening
  controlCommand("Bing Speech Recognition", "StartListening");

  // Short debounce to avoid very fast taps
  sleep(200);

  // Wait for button release (high)
  Digital.wait(d0, true, 10);

  // Stop listening and let Bing process immediately
  controlCommand("Bing Speech Recognition", "StopListening");

  // Optional small delay to avoid immediate restart
  sleep(100);
}

Notes:

Use d0, d1, etc. as unquoted constants (do not use "d0").
Use sleep(ms) (milliseconds) - ARC’s built-in sleep is preferred.
The short sleep after StartListening gives the skill time to initialize for very quick taps.
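
If you would rather press once to start and press again to stop, the hold-to-talk loop needs edge detection instead of two waits. Here is a minimal sketch of that logic in plain JavaScript. `nextToggle` is a hypothetical helper, not an ARC function, and the button samples are simulated, so the snippet runs outside ARC. The ARC call it would drive appears only as a comment.

```javascript
// Hypothetical helper (not part of ARC): press-to-toggle logic as a pure function.
// listening: whether Bing is currently listening
// wasPressed / isPressed: previous and current button state (true = pressed)
// Returns the new listening state and the command to send (null for none).
function nextToggle(listening, wasPressed, isPressed) {
  // Act only on the press edge (released -> pressed); ignore holds and releases.
  if (!wasPressed && isPressed) {
    return listening
      ? { listening: false, command: "StopListening" }
      : { listening: true, command: "StartListening" };
  }
  return { listening: listening, command: null };
}

// Simulated button samples: press, hold, release, press, release.
var samples = [true, true, false, true, false];
var state = { listening: false, command: null };
var previous = false;
var sent = [];
for (var i = 0; i < samples.length; i++) {
  state = nextToggle(state.listening, previous, samples[i]);
  if (state.command !== null) {
    // In ARC this would be: controlCommand("Bing Speech Recognition", state.command);
    sent.push(state.command);
  }
  previous = samples[i];
}
console.log(sent.join(",")); // prints "StartListening,StopListening"
```

Inside ARC, the samples would come from polling the digital port at a short interval, with a small sleep between reads to debounce the button.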

Option B - Use On Variable Changed (if you already have a variable source)

If your Read Digital (or external skill) sets a global variable (for example $MyButton) on press/release, you can use On Variable Changed to run a small script:

Example On Variable Changed JS script:

// This runs when $MyButton changes
var val = getVar("$MyButton");
if (val == "0" || val == 0) {
  // pressed (low)
  controlCommand("Bing Speech Recognition", "StartListening");
} else {
  // released (high)
  controlCommand("Bing Speech Recognition", "StopListening");
}

(Confirm what variable Read Digital exposes - otherwise prefer the Script skill approach.)
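
Because the type stored in the variable is unconfirmed, a stricter mapping may be safer than treating every non-zero value as a release. The sketch below is plain JavaScript that runs outside ARC. `commandForButtonValue` is a hypothetical helper, and the accepted values ("0"/"1", true/false) are assumptions about what the variable might hold.

```javascript
// Hypothetical helper (not an ARC function): choose a command from a button value.
// Assumes low means pressed, matching the wiring in this post. The exact type
// the variable holds is unconfirmed, so numbers, booleans, and strings are handled.
function commandForButtonValue(val) {
  if (val === undefined || val === null) {
    return null; // variable not set yet: do nothing
  }
  var text = String(val).trim().toLowerCase();
  if (text === "0" || text === "false") {
    return "StartListening"; // pressed (low)
  }
  if (text === "1" || text === "true") {
    return "StopListening"; // released (high)
  }
  return null; // unrecognized value: do nothing rather than guess
}

console.log(commandForButtonValue("0"));   // prints "StartListening"
console.log(commandForButtonValue(1));     // prints "StopListening"
console.log(commandForButtonValue("abc")); // prints null
```

In the On Variable Changed script, the result would be passed to controlCommand("Bing Speech Recognition", ...) only when it is not null.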

Wiring and hardware notes

Use a momentary push-button wired so that:
- Released = pulled high (to Vcc via pull-up)
- Pressed = connects the digital port to GND (reads low)
Typical wiring: Button between D0 and GND, plus pull-up resistor from D0 to 3.3V/5V (or enable internal pull-up if your controller supports it).
If you want a remote (wireless) switch, use the receiver output into an EZ-B digital input the same way (or use a relay to ground the pin on press).

Bing Speech / ARC settings to reduce latency

In the Bing Speech skill settings:
- Turn off wake word if you will rely on push-to-talk.
- Set the max recording length high enough that it never cuts you off mid-sentence. With push-to-talk, StopListening ends the capture, so the max length only acts as a safety limit.
- Use the skill’s All Recognized Script (or watch the $BingSpeech variable); that script runs as soon as the transcription returns.
StopListening makes the Bing skill send the captured audio for recognition right away, so the time from Stop to transcript is much shorter than waiting for a timeout.
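
To see why stopping manually helps, here is a rough timing model in plain JavaScript. It assumes the skill keeps recording until the max length elapses unless StopListening is called. Every number is hypothetical and chosen only for illustration, and `delayAfterSpeechMs` is not an ARC function.

```javascript
// Illustrative model only; all numbers are hypothetical, not measurements.
// Delay = time from the end of speech until the transcript is available.
function delayAfterSpeechMs(recordingEndsMs, speechEndsMs, processingMs) {
  // Idle time spent still recording after the speaker has finished.
  var idleMs = Math.max(0, recordingEndsMs - speechEndsMs);
  return idleMs + processingMs;
}

var processingMs = 2000; // assumed upload + recognition time

// Fixed 5 s recording window, speaker finishes after 2 s.
var fixedWindow = delayAfterSpeechMs(5000, 2000, processingMs);

// Push-to-talk: the button is released as the speaker finishes.
var pushToTalk = delayAfterSpeechMs(2000, 2000, processingMs);

console.log(fixedWindow, pushToTalk); // prints 5000 2000
```

In this model the only part push-to-talk removes is the idle recording time. The processing time stays the same either way.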

Other tips to improve responsiveness

Use a wired microphone on the PC running ARC or a low-latency audio input device.
Avoid overlapping speech while the robot is speaking; you can prevent new starts by monitoring $IsSpeaking (getVar("$IsSpeaking")).
If you need even lower end-to-end latency, keep the audio format consistent (mono, 16 kHz / 16-bit is typically supported) and avoid heavy CPU load on the ARC machine.

Links

Push-to-Talk tutorial (walkthrough + example code): https://synthiam.com/Support/Get-Started/Tutorials/speech-recognition-with-push-button-microphone
Bing Speech skill page: open the skill settings inside ARC (Project > Add robot skill > Audio category > Bing Speech Recognition)
Read Digital skill doc: https://synthiam.com/Support/Skills/Digital/Read-Digital?id=16093
On Variable Changed skill doc: https://synthiam.com/Support/Skills/Scripting/On-Variable-Changed?id=19359

Questions for you

Which EZ-B digital port do you plan to use (d0..d23)?
Do you want a momentary push-button (hold to talk) or a toggle button (press to start, press to stop)?
Would you like me to make a ready-to-paste Script skill that matches your port and preferred behavior?

If you tell me the port and whether you want hold-to-talk or press-to-toggle, I’ll post the exact script and short setup steps (kebab menu settings) you can drop into your project.

I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
