dspiller
Canada
Asked
Are there any robotic systems designed to be completely self-contained and operate offline? I am interested in building an animatronic robot capable of engaging in interactive and realistic conversations using strictly Edge AI technology. My goal is to avoid any ongoing API fees associated with cloud-based services. Has anyone successfully implemented a system like this?
Related Hardware EZ-B v4
Related Controls
OpenAI ChatGPT
Autonomous AI

You have a few options. You could use a basic AI for conversation that runs locally, like AIMLbot. Or you can build a PC that's capable of LLM inference and point the Autonomous AI or OpenAI ChatGPT robot skills to the local AI.
There are several local LLMs you can use.
@Athena will explain more.
Short answer: Yes - fully offline, self-contained robots are possible. You can combine a local LLM (or AIML engine), local speech-to-text and text-to-speech, and your EZ-B v4 + ARC project to create an animatronic that never leaves your LAN and has no ongoing cloud fees. Below I'll outline practical options, hardware needs, recommended local software stacks, how to point Autonomous AI (or other ARC skills) at a local model, and next steps I can help with.
What you can use today (three common approaches)
Key components for a fully offline robot
Local LLM server / inference engine (text only)
Offline Speech-to-Text (STT) and Wake/Voice Activity Detection
Offline Text-to-Speech (TTS)
ARC / ARC Robot Skills
Hardware guidance
Model choices & licensing
How to host a local model (high level)
Connecting Autonomous AI / ARC to a local LLM
JavaScript example (send a request, read response)
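Here is a minimal sketch, assuming your local server (for example llama.cpp's llama-server or Ollama) exposes an OpenAI-compatible /v1/chat/completions endpoint on your LAN. The IP address, port, and model name are placeholders; replace them with your own. It runs as-is under Node.js 18+; if ARC's JavaScript engine does not provide fetch, use its HTTP request function with the same JSON body and response handling.

```javascript
// Minimal sketch: query a local OpenAI-compatible LLM server on your LAN.
// Assumes the server (e.g. llama.cpp's llama-server or Ollama) is listening
// at http://192.168.1.50:8080; host, port, and model name are placeholders.
async function askLocalLLM(userText) {
  const response = await fetch("http://192.168.1.50:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model",            // placeholder model name
      messages: [
        { role: "system", content: "You are an animatronic character. Keep replies short." },
        { role: "user", content: userText }
      ],
      max_tokens: 120,
      temperature: 0.7
    })
  });
  const data = await response.json();
  // OpenAI-compatible servers return the reply text in choices[0].message.content
  return data.choices[0].message.content;
}

// Example usage: pass in recognized speech, then hand the reply to your offline TTS.
askLocalLLM("Hello, who are you?").then(reply => console.log(reply));
```

Keeping the request in the OpenAI chat-completions format means the same code works whether the backend is llama.cpp, Ollama, or a cloud fallback later on.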
Fallback and hybrid strategies
Speech & audio integration (offline)
Safety and control
Practical example build paths
Useful links
Next steps I can help with
Questions for you
If you tell me your hardware and your priority (quality vs cost), I'll give a tailored, step-by-step offline setup you can run with ARC and your EZ-B v4.
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
I am using a Raspberry Pi 5 with 8 GB RAM, a Hailo-8 AI accelerator, and a 128 GB SSD. I want to run INT4-quantized models with Ollama on it to improve offline chat performance.
Great - your hardware (Raspberry Pi 5, 8 GB RAM, 128 GB SSD, Hailo-8) is a solid starting point for an offline solution, but there are important compatibility and performance considerations to plan for. Below I'll explain what's viable, what isn't straightforward, and give concrete, realistic options (including step-by-step actions) so you can pick the best path for running INT4 models locally and connecting them to ARC / Autonomous AI.
Summary / quick answer
Options in detail and recommended path
Option A - Recommended: llama.cpp / ggml / text-generation-webui (on the Pi 5, good balance)
Option B - Ollama on Pi (possible, but limited)
Option C - Offload model hosting to a more powerful box / keep Pi as controller
Hailo-8 notes (important)
Concrete step-by-step path I recommend (llama.cpp + text-generation-webui on Raspberry Pi 5)
Key llama.cpp runtime flags to tune: -t (threads) and -c (context size).
If you want to proceed with Ollama specifically on the Pi
How to connect Autonomous AI in ARC to your Pi LLM server
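As a sketch of what the calling side could look like, assuming Ollama is running on the Pi at its default port 11434 and you have already pulled an INT4/Q4-quantized model (the IP address and model tag below are placeholders):

```javascript
// Minimal sketch: query Ollama running on the Pi from another machine on the LAN.
// Assumes Ollama is serving on its default port 11434; replace the IP address
// and model tag with the model you actually pulled.
async function askPi(prompt) {
  const res = await fetch("http://192.168.1.42:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2:3b",   // example model tag; use your installed model
      prompt: prompt,
      stream: false           // return one JSON object instead of a token stream
    })
  });
  const data = await res.json();
  return data.response;       // Ollama puts the generated text in "response"
}

askPi("Introduce yourself in one sentence.").then(text => console.log(text));
```

Ollama also exposes an OpenAI-compatible /v1/chat/completions route, so robot skills that expect the OpenAI request format can usually be pointed at the Pi's address on port 11434 instead of a cloud endpoint.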
Other practical tips
Useful links
Questions for you (so I can give a precise step-by-step guide)
If you tell me your OS and whether you prefer Ollama or llama.cpp, I'll produce an exact step-by-step install + ARC configuration guide tailored to your Pi 5 and your goal of INT4 inference.
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.