
Why Does Speech Recognition Pickup Background Nois…

One thing I still don't understand: whenever I have the "Speak Rec" robot skill operating, it picks up background noise and causes some of my Auto Position skills to activate on their own. I do use start/stop voice-activation scripts in my ARC project, but as long as Speak Rec is listening, if it hears a noise things start moving on their own and sometimes end up breaking the robot. Why does this happen?

For context: I'm using the Speak Rec robot skill and Auto Position presets in an ARC project and I'm using scripts to start and stop voice activation. I'm not sure whether my microphone is connected to the PC (USB/audio device) or to an EZB - if that matters, please let me know what detail would be helpful.





Author Avatar
PRO
Canada
Twitter YouTube GitHub
#9  

For typical robot commands, I’d generally recommend Bing Speech Recognition if you have reliable internet and want better recognition quality.

Quick comparison

Regular Speech Recognition

Best when:

  • you want offline / local recognition behavior
  • you’re using a small, fixed phrase list
  • you want simple command triggers without cloud dependence

Tradeoffs:

  • more sensitive to microphone quality and room noise
  • recognition can be less consistent, especially with short phrases
  • can false-trigger more easily in noisy robot environments

Bing Speech Recognition

Best when:

  • you want better accuracy
  • you use natural-sounding robot commands
  • you have a stable internet connection
  • you want more flexibility for speech-driven control

Tradeoffs:

  • requires internet
  • cloud-based, so there’s a small delay
  • still needs good mic placement and sane confidence settings

My recommendation

For a robot with moving parts, I’d usually choose:

  • Bing Speech Recognition for command recognition
  • a headset or external mic
  • a short listening window
  • a safety variable before allowing dangerous motions

That combination tends to be much more dependable than the regular Speech Recognition skill in noisy environments.
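The "short listening window" idea can be sketched roughly as below. This is only an illustration, not part of ARC: `createListeningWindow`, `open`, and `accepts` are hypothetical names, and the timestamps are passed in by hand so the logic can be tried outside ARC. In a real project the recognized-phrase script would check something like `accepts` before acting on a phrase.

```javascript
// Hypothetical helper: only accept phrases heard within `windowMs`
// of the moment listening was deliberately opened.
function createListeningWindow(windowMs) {
  var openedAt = null; // null means the window has never been opened

  return {
    // Call when a wake word or button press opens listening.
    open: function (nowMs) {
      openedAt = nowMs;
    },
    // Returns true only while the window is still open.
    accepts: function (nowMs) {
      return openedAt !== null && nowMs - openedAt <= windowMs;
    }
  };
}

// Example: a 5-second window opened at t = 1000 ms.
var listenWindow = createListeningWindow(5000);
listenWindow.open(1000);
var acceptedEarly = listenWindow.accepts(3000); // inside the window
var acceptedLate = listenWindow.accepts(9000);  // window has expired
```

With this shape, a noise that gets recognized long after the last deliberate command falls outside the window and can simply be ignored.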

When I’d keep the regular Speech Recognition skill

Stick with the regular one if:

  • you don’t want cloud dependence
  • your command set is very small
  • you already have it tuned well
  • the robot works in a quieter environment

When I’d switch to Bing

Switch if:

  • you’re seeing false positives
  • recognition is inconsistent
  • you want more natural speech handling
  • you can use an external mic and internet

Important safety note

No matter which one you use, I would not let speech recognition directly drive risky motion without a check first. Use a script guard like:

  • $botharmsextended
  • a door/limit switch
  • a "safe mode" variable

That way, even a mistaken phrase can’t immediately move something fragile.
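As a rough sketch of combining several guards like the ones listed above, the checks could be collected and evaluated together before any risky motion runs. The function name `evaluateGuards` and the guard labels are hypothetical, and the `ok` values are hard-coded here; in ARC they would come from your own variables or sensor reads.

```javascript
// Hypothetical guard evaluator: every guard must pass before a risky
// motion is allowed. Each guard is { name: string, ok: boolean }.
function evaluateGuards(guards) {
  var failed = [];
  for (var i = 0; i < guards.length; i++) {
    if (!guards[i].ok) {
      failed.push(guards[i].name); // remember which check blocked the motion
    }
  }
  return { allowed: failed.length === 0, failed: failed };
}

// Example with illustrative guard names and hard-coded states.
var result = evaluateGuards([
  { name: "safe mode off", ok: true },
  { name: "limit switch clear", ok: false }
]);
// result.allowed is false, and result.failed names the failing guard
```

Returning the list of failed guards, rather than a bare true/false, makes it easier to print why a motion was blocked.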

Best next step

If you want, I can help you decide based on your current ARC project setup.
A few things that would help:

  • Are you using phrase-only commands or full sentence recognition?
  • Is your project mostly offline right now?
  • Are the commands tied directly to Auto Position frames or to scripts?
  • Do you want maximum accuracy or no internet dependency?

If you share that, I can suggest the best choice for your exact setup.


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
#10  

OK, good info. This is what I have done in the past, and it does seem to work, except for the mis-triggered commands. I use scripts for some functions and Auto Position for others.

During my Hemi/Remi robot builds I set up ARC with three workspaces within the project.

The first workspace panel was the main one. It held the regular Speech Recognition skill for general drive motor and head movement, and it was also used for many arm movements from the Auto Position skill. This is the more typical offline operation.

The second workspace panel was for the audio features. There I used the Bing Speech Recognition skill along with the PandoraBot skill, both of which need to be online.

Third Workspace panel was used for User Interface configuration.

#11  

That’s actually a very sensible ARC design. Using separate workspaces for drive / motion, audio / AI, and UI/config is a clean way to keep the project organized.

My take on your setup

Given what you described, I’d keep the split like this:

  • Workspace 1: local robot control

    • Regular Speech Recognition
    • movement
    • head/arm actions
    • anything that needs to work offline
  • Workspace 2: online / conversational features

    • Bing Speech Recognition
    • PandoraBot
    • cloud-based audio or chat behavior
  • Workspace 3: UI / admin / configuration

    • safe to leave separate so it doesn’t interfere with motion logic

That structure makes sense.

The main issue is still the trigger path

The problem isn’t really which workspace you use - it’s how easily a phrase can directly trigger a motion action.

If a spoken phrase goes straight to a motion action, such as an Auto Position frame, then a false recognition can still cause trouble, even in a well-organized project.

What I’d improve

1) Keep the regular Speech Recognition skill for offline control

If your main robot commands are local and simple, that skill is still fine.

Use it for:

  • drive
  • head turns
  • basic arm motions
  • emergency stop

2) Use Bing Speech Recognition for more conversational / online tasks

That’s a good match for:

  • PandoraBot
  • general dialog
  • richer speech interaction

3) Add a safety gate for every risky motion

This is the big one.

For example:

  • $safeToMoveArms = 0 (arm motion is blocked)
  • $safeToMoveArms = 1 (arm motion is allowed)

Then any arm/gripper command checks that variable before running.

Example pattern in ARC JavaScript

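// Run the Auto Position frame only when the safety variable is set to 1.
// getVar falls back to "0" (blocked) if $safeToMoveArms has not been set yet.
if (getVar("$safeToMoveArms", "0") == "1") {
  controlCommand("Auto Position", "AutoPositionFrame", "Open Gripper");
} else {
  print("Blocked: arms are not in a safe state.");
}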

You could use the same idea for:

  • drive motors
  • head movement
  • any movement inside the robot body

Better than direct speech-to-action

Instead of:

  • Speech Recognition → Auto Position

use:

  • Speech Recognition → Script
  • Script checks safety state
  • Script then calls Auto Position

That adds a layer of protection without changing your whole project structure.
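To make that flow a bit more concrete, here is a minimal sketch of a single script that sits between recognition and Auto Position. The phrase table, the "Look Left" frame, and the `handlePhrase` name are assumptions for illustration. `readVar` and `runCommand` are passed in so the sketch can run outside ARC; inside ARC they could wrap `getVar` and `controlCommand`.

```javascript
// Hypothetical phrase table: each phrase names an Auto Position frame and
// says whether it needs the safety variable to be set first.
var COMMANDS = {
  "open gripper": { frame: "Open Gripper", risky: true },
  "look left": { frame: "Look Left", risky: false }
};

// Looks up the phrase, checks the safety state for risky motions,
// and only then asks Auto Position to move.
function handlePhrase(phrase, readVar, runCommand) {
  var key = phrase.toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(COMMANDS, key)) {
    return "ignored"; // unknown phrase: do nothing
  }
  var entry = COMMANDS[key];
  if (entry.risky && readVar("$safeToMoveArms", "0") != "1") {
    return "blocked"; // risky motion while not in a safe state
  }
  runCommand("Auto Position", "AutoPositionFrame", entry.frame);
  return "ran";
}
```

Keeping the table in one place means every speech-triggered motion passes through the same check, instead of each phrase having its own copy of the guard.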

My recommendation for your current setup

If your current arrangement is mostly working, I would not rebuild everything.

I would:

  1. keep your three-workspace layout
  2. continue using Regular Speech Recognition for local control
  3. keep Bing Speech Recognition for online features
  4. wrap the dangerous motions in a safety-check script
  5. optionally consider a better mic if false triggers are still common

One thing to watch

If you have the same or similar voice phrases in both speech skills, that can create confusion. For example:

  • one skill hears "open gripper"
  • another hears something similar and triggers a different routine

So it helps to keep the phrase sets clearly separated by workspace/function.
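One rough way to check for that kind of clash is to compare the two phrase lists directly. The sketch below is hypothetical (the function names and example phrases are made up) and it only catches exact matches after normalizing case and spacing, not phrases that merely sound alike.

```javascript
// Lower-case a phrase and collapse repeated whitespace so that
// "Open  Gripper" and "open gripper" compare as equal.
function normalizePhrase(phrase) {
  return phrase.toLowerCase().replace(/\s+/g, " ").trim();
}

// Returns the normalized phrases that appear in both lists.
function findSharedPhrases(listA, listB) {
  var seen = {};
  var shared = [];
  for (var i = 0; i < listA.length; i++) {
    seen[normalizePhrase(listA[i])] = true;
  }
  for (var j = 0; j < listB.length; j++) {
    var key = normalizePhrase(listB[j]);
    if (seen[key] === true && shared.indexOf(key) === -1) {
      shared.push(key);
    }
  }
  return shared;
}

// Example: phrase lists typed in by hand from two speech skills.
var overlap = findSharedPhrases(
  ["Open Gripper", "Move Forward"],
  ["open  gripper", "Tell me a joke"]
);
```

Any phrase that shows up in `overlap` is a candidate for renaming in one of the two skills.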

ARC tip

If you haven’t already, you can configure each skill from its kebab / three-line menu and tune:

  • phrase list
  • confidence
  • scripts for recognized/not recognized behavior

That’s where I’d focus first.
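If the skill makes a confidence value available to your recognized script (an assumption worth checking in that skill's manual), a small gate like the one below could reject weak matches before anything moves. `acceptRecognition` and the 0.7 threshold are hypothetical and only meant as a starting point for tuning.

```javascript
// Hypothetical confidence gate: accept a recognition only when its
// confidence is a valid number at or above the chosen minimum.
function acceptRecognition(confidence, minConfidence) {
  var value = Number(confidence); // variables may arrive as strings
  if (isNaN(value)) {
    return false; // missing or malformed confidence is treated as a reject
  }
  return value >= minConfidence;
}

// Example: illustrative confidence readings against a 0.7 threshold.
var strongMatch = acceptRecognition("0.82", 0.7);
var weakMatch = acceptRecognition(0.5, 0.7);
```

Treating a missing value as a rejection keeps the failure mode on the safe side: the robot stays still when the script cannot tell how sure the recognizer was.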

If you want, I can help you map your current workspace design into a safer command flow, especially for the arm Auto Position actions.


#12  

The safety example you just provided, where would I put that and how would it get activated?

#13  

Sorry, I can't continue helping in this thread. It's grown long enough that my context window will require too much extra processing, which makes follow-ups unreliable and I can start making mistakes.

A couple of options:

  • Start a new thread that picks up where we left off - fresh context, faster responses.
  • Or get unlimited interaction with me through ARC's built-in AI Assistant agent: Synthiam AI Assistant
