Resolved by Athena!

Sound Servo Sync Issues

I'm using the Sound servo (EZB Playback) robot skill in ARC to control my robot's mouth servos. The audio is played through the EZB's audio output, but the mouth movement is significantly out of sync with the sound I hear from the EZB. I have tried adjusting every parameter in the Sound servo (EZB Playback) skill with no improvement.

I am not using the Talk servo skill because that is intended for spoken text/TTS, not pre-recorded audio. Are there any other options within ARC (other Robot Skills, scripting, or EZB settings) for achieving better synchronization of mouth movement to pre-recorded playback?



Related Hardware
EZ-B v4 by EZ-Robot: EZ-B v4/2 robot controller with dual Cortex ARM, Wi-Fi, audio/video, 24 servo/digital ports, I2C/UART, camera, and Synthiam ARC control for custom robots.
Features: Wi-Fi / USB, 24 servos, camera, audio, 3 UART, I2C, 8 ADC, 24 digital.

Related Robot Skills
  • Sound servo (EZB Playback): Map EZ-B audio volume to servos; multi-servo mirroring, scalar range control, invert/min-max, waveform feedback to sync mouth motion.
  • Sound Script (EZB): Trigger a script when EZB audio plays, exposing live Min, Max and Avg audio-level variables for looped monitoring; stops when audio ends.

#1  

Yes. For pre-recorded audio, there are several better options than trying to force tighter sync out of Sound servo (EZB Playback) alone.

Why this happens

The Sound servo (EZB Playback) skill reacts to the audio level being streamed/output by the EZ-B, so it is inherently dependent on:

  • audio buffering
  • Wi-Fi latency / packet timing
  • the audio stream path from ARC to the EZ-B

That means the mouth can lag behind what you hear, even if the settings look correct. This is especially noticeable with pre-recorded speech because the mouth timing needs to feel very exact.

Better options in ARC

1) Drive the mouth with a script instead of live audio amplitude

If you know the audio file being played, the most reliable approach is:

  • play the audio
  • run a timed servo script in parallel
  • open/close the jaw based on the clip timing

This gives you deterministic sync instead of "audio-reactive" sync.

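For example, you can create a JavaScript action that starts the audio and then steps the servo through a timed sequence. The example uses Audio.sayEZB, which speaks text, only as a stand-in; for a pre-recorded clip, replace that line with whatever starts your file:

// Start the audio (placeholder: swap in your clip playback here)
Audio.sayEZB("Hello world", 0)
// Move the jaw servo on port d0, pausing between positions (sleep is in milliseconds)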
Servo.setPosition(d0, 140)

sleep(300)
Servo.setPosition(d0, 110)

sleep(250)
Servo.setPosition(d0, 145)

sleep(200)
Servo.setPosition(d0, 112)

sleep(250)
Servo.setPosition(d0, 140)

That’s a simple example. For real clips, you’d tune the timing to match the waveform or syllables.
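If hand-tuning a chain of sleep() calls gets tedious, one way to organize it is to keep a cue list per clip and convert it into wait/move steps. This is only a sketch: the cue times, the cuesToSteps and runSteps helpers, and the startOffsetMs parameter are all hypothetical names for illustration, and the timings simply mirror the example above rather than any real clip. The sleep and servo calls are passed in as functions so the logic can be checked outside ARC.

```javascript
// Hypothetical cue list: absolute times (ms from clip start) and jaw positions.
// These values mirror the hand-written example; tune them per clip.
const cues = [
  { atMs: 0, position: 140 },
  { atMs: 300, position: 110 },
  { atMs: 550, position: 145 },
  { atMs: 750, position: 112 },
  { atMs: 1000, position: 140 },
];

// Convert absolute cue times into "wait this long, then move" steps.
// startOffsetMs shifts every cue, e.g. to allow for a measured audio delay.
function cuesToSteps(cueList, startOffsetMs = 0) {
  const sorted = [...cueList].sort((a, b) => a.atMs - b.atMs);
  let previousMs = 0;
  return sorted.map((cue) => {
    const targetMs = Math.max(0, cue.atMs + startOffsetMs);
    const waitMs = Math.max(0, targetMs - previousMs);
    previousMs = Math.max(previousMs, targetMs);
    return { waitMs, position: cue.position };
  });
}

// Run the steps through injected functions.
// Inside ARC (assumption: same calls as the example above) this could be:
//   runSteps(cuesToSteps(cues), sleep, function (p) { Servo.setPosition(d0, p); });
function runSteps(steps, sleepFn, moveFn) {
  for (const step of steps) {
    if (step.waitMs > 0) sleepFn(step.waitMs);
    moveFn(step.position);
  }
}
```

Keeping absolute times in the cue list means you can read them straight off a waveform editor, and a single offset value can compensate for a constant delay without re-editing every cue.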

2) Use a sound-triggered script and animate the mouth yourself

If you still want the audio to start the motion, consider:

  • Sound Script (EZB) to detect when audio begins
  • a custom JavaScript loop to move the jaw on a pattern while the clip plays

This is often more controllable than the Sound servo skill.
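If your loop also has access to an audio-level number while the clip plays, the mapping from level to jaw position could look something like the sketch below. Everything here is an assumption for illustration: the levelToJaw helper, its option names, and the level range are hypothetical, and how you read the level inside ARC depends on the skill you use.

```javascript
// Map an audio level to a jaw servo position.
// Assumptions: level is a non-negative number, maxLevel is the level you
// treat as "fully open", and closed/open are your servo's jaw limits.
function levelToJaw(level, opts) {
  const { closed, open, maxLevel, scalar = 1, invert = false } = opts;
  // Scale, then clamp to the 0..1 range so the servo never overshoots.
  const normalized = Math.min(1, Math.max(0, (level * scalar) / maxLevel));
  const amount = invert ? 1 - normalized : normalized;
  return Math.round(closed + (open - closed) * amount);
}

// Example with made-up limits: jaw closed at 110, open at 145.
const jawOptions = { closed: 110, open: 145, maxLevel: 100 };
const quiet = levelToJaw(0, jawOptions);
const loud = levelToJaw(100, jawOptions);
```

Doing the mapping yourself lets you add your own smoothing or thresholds, which is where the extra control over the stock skill comes from.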


3) Use external audio timing markers

If you’re working with a specific collection of audio clips, a good workflow is:

  • split the audio into sections
  • trigger different servo poses per section
  • use ControlCommand() or a script to step through jaw positions

This is much more accurate than using amplitude-following for lip sync.
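One possible shape for the marker data is a sorted list of section start times, with a lookup that returns the jaw position for the current playback time. This is a rough sketch only: the sections array, its values, and the jawAt helper are hypothetical, and the elapsed time is assumed to be tracked by your own script.

```javascript
// Hypothetical section markers for one clip, sorted by start time.
// Times and jaw positions are illustrative, not measured from a real file.
const sections = [
  { startMs: 0, jaw: 110 },
  { startMs: 400, jaw: 140 },
  { startMs: 900, jaw: 120 },
  { startMs: 1500, jaw: 110 },
];

// Return the jaw position of the section that contains elapsedMs,
// or null if playback has not reached the first marker yet.
function jawAt(sectionList, elapsedMs) {
  let jaw = null;
  for (const section of sectionList) {
    if (section.startMs > elapsedMs) break;
    jaw = section.jaw;
  }
  return jaw;
}
```

Because the lookup works from elapsed time rather than accumulated sleeps, a slow loop iteration does not push every later pose out of position.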

4) Reduce audio streaming delay

If the mouth is only slightly behind, the issue may be audio transport latency rather than the servo skill itself. ARC audio streaming over Wi-Fi can introduce delay.

Things to try:

  • move the EZ-B closer to the PC/network source
  • reduce Wi-Fi congestion
  • connect the EZ-B to a cleaner Wi-Fi channel
  • use a wired connection where possible

ARC’s audio playback is packet-based, so network conditions can absolutely affect sync.

5) Use a custom robot skill if you need advanced lip-sync

If you want a dedicated solution for pre-recorded clips, the best long-term answer may be to create your own robot skill that:

  • reads the audio file timing
  • drives the servo from known cue points
  • or performs phoneme-based animation

If ARC doesn’t already have the exact feature you need, that’s a great case for a custom skill.

What I’d recommend

For pre-recorded audio, I’d rank the options like this:

  1. Timed JavaScript servo animation
  2. Sound Script + custom servo control
  3. Improve Wi-Fi/audio buffering conditions
  4. Custom robot skill for cue-based lip sync
  5. Sound servo skill only, if approximate movement is acceptable

One important note

If your audio is being played through the EZ-B speaker, and the servo is reacting to the output stream, there will always be some delay because the servo is following the audio pipeline, not the audio source itself.

A couple questions so I can point you to the best setup

  • Is your audio a fixed set of clips, or many different sentences?
  • Are you playing the sound through the EZ-B speaker, or through the PC?
  • Roughly how far off is the mouth - like 100 ms, or more like half a second?

If you want, I can help you build a JavaScript lip-sync action in ARC for pre-recorded clips. The ARC AI Assistant built into ARC can also help rewrite the script right in the editor.

#2  

OK. I tested EZB audio output using SoundServoEZB against Bluetooth audio using SoundServoPC, and there was a big difference. With the exact same settings, the mouth sync was very good through a Bluetooth speaker and very poor through EZB audio. My conclusion is to avoid using EZB audio output to control mouth movement.
