Canada
Asked

Grok With Autonomous AI Loops

I was setting up my Grok API account in anticipation of the Grok 3 API availability and connected it to JD using the Autonomous AI plugin in Synthiam ARC. I used the API call to https://api.x.ai/v1/ with the model "grok-2-vision-latest." However, it seems to enter a loop when I ask it to perform an action. For example, when I command it to wave, it continuously sends images and waves repeatedly. This behavior does not occur with ChatGPT, where it waves once and stops. I have noticed a similar looping pattern with other models as well.

Here is a snippet of the interaction log:

  • Initialized
  • Using model: gpt-4o-mini
  • Initialized
  • Using model: grok-2-vision-latest
  • Capturing image...
  • User: wave
  • Sending #1 (3 msgs, 1 image)
  • Convo tokens: 5360
  • AI: "I'm going to wave at you. Watch this!"
  • Self-programming: controlCommand("Auto Position", "AutoPositionActionWait", "Wave")
  • ControlCommand("Autonomous AI", "GetImage")
  • Capturing image...
  • Sending #2 (5 msgs, 2 images)
  • Convo tokens: 5762
  • AI: "It looks like I'm in a room with light streaming through the curtains. I don't see anyone. Could you please tell me where you are located?"
  • Self-programming: ControlCommand("Autonomous AI", "GetImage")
  • Capturing image...

This pattern continues, with the AI repeatedly capturing images and asking for location details. It seems to be stuck in a loop, and I'm unsure how to resolve this issue. Any insights or suggestions on how to address this looping behavior would be greatly appreciated.
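To make the failure mode concrete, here is a hypothetical sketch (not ARC's actual plugin code) of the request cycle the log suggests: each time the model's reply includes a self-programmed GetImage command, the plugin captures and sends another image, so a model that always asks to "look again" never terminates on its own. All names here are illustrative assumptions.

```python
# Hypothetical simulation of the autonomous loop seen in the log.
# model_reply(messages) -> (text, wants_image). If every reply requests
# another image, the cycle only stops when an external guard kicks in.

def run_autonomous_loop(model_reply, max_turns=5):
    """Return the number of turns taken before the model stops (or the guard fires)."""
    messages = [{"role": "user", "content": "wave"}]
    for turn in range(max_turns):
        text, wants_image = model_reply(messages)
        messages.append({"role": "assistant", "content": text})
        if not wants_image:            # ChatGPT-style behavior: act once, then stop
            return turn + 1
        messages.append({"role": "user", "content": "<captured image>"})
    return max_turns                   # Grok-style behavior: still looping at the cap

# A model that always requests another image never exits by itself:
always_looping = lambda msgs: ("Let me look again.", True)
print(run_autonomous_loop(always_looping))  # -> 5 (stopped only by max_turns)
```

This is why a per-conversation image-request limit (or a system prompt telling the model not to request images after completing an action) is one plausible mitigation.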



PRO
Canada
#17  

I get it. For development or just having a play, sure, the costs are minimal. But take that into production on a home robot that runs around the house all day checking security, vacuuming floors, controlling devices, and following instructions, and that $3 (or $5 Canadian) a day gets a little pricey every month. I'm also not a big fan of having a robot that runs around my house collecting audio and video of me and my family and then sharing my private data with a third party.
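For a rough sense of where that "$3 a day" comes from, here is a back-of-the-envelope sketch. The rate, request size, and request frequency below are illustrative assumptions (the token count is loosely based on the "Convo tokens" figures in the log above), not quoted vendor pricing.

```python
# Rough monthly-cost estimate for an always-on robot polling a vision LLM.
# All three inputs are assumptions for illustration, not real vendor rates.

PRICE_PER_M_TOKENS = 5.00    # assumed blended USD price per 1M tokens
TOKENS_PER_REQUEST = 5_500   # roughly the "Convo tokens" seen in the log
REQUESTS_PER_DAY   = 120     # e.g. one image/query every few minutes while active

daily_tokens = TOKENS_PER_REQUEST * REQUESTS_PER_DAY
daily_cost   = daily_tokens / 1_000_000 * PRICE_PER_M_TOKENS
monthly_cost = daily_cost * 30

print(f"~${daily_cost:.2f}/day, ~${monthly_cost:.2f}/month")  # ~$3.30/day, ~$99.00/month
```

Under these assumptions a robot that checks in every few minutes lands right around $3 a day, or roughly $100 a month.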

PRO
Synthiam
#18   — Edited

Both of those are points we agree on. I'm guessing the costs are only going to come down; energy and resource costs will fall over time as competition rises. But I doubt we'll see local inference without future breakthroughs.

I’m guessing we’ll start to see personal data containers for cloud systems. That will protect and isolate privacy.

Due to costs, we're quite a bit away from commercial products built on LLM technology.

I am a believer that the LLM is the correct approach for generic and adaptive tasks. But I'm guessing we'll see a hybrid of local and cloud inference. Something I've been considering lately is how that data could train a local model for your unique use case; it can learn from the cloud process.

PRO
Synthiam
#19   — Edited

Whoa - look at this. I'm watching a YouTube video about Claude 3.7 from Fireship. He showed a graph of Stack Overflow website traffic since ChatGPT was released. I haven't visited Stack Overflow since ChatGPT either. Surprising, not surprising. But surprising nonetheless.

User-inserted image

PRO
Canada
#20  

When I am bored I copy and paste Reddit questions into ChatGPT and then paste the answer back into Reddit. I think they need a bot like Athena hooked in to answer all the stupid questions you get on there. Although Stack Overflow has OverflowAI, and that didn't help their site visits anyway.

#21  

Hi DJ, just a curious question here. You earlier mentioned the Kinect and the RealSense for depth perception of objects and navigation; can't the EZB camera do any of that, since you've shown us how to program it to detect objects, colors, and faces? I could also just be missing something. :)

PRO
Synthiam
#22   — Edited

Depth with a single camera is like closing one eye and trying to reach for an object. Object-relational reasoning is the only way to determine depth with a single camera: using knowledge about the object and its size in the viewport to estimate how close it might be to the viewer. That's not an accurate way of determining 3D position. While LLMs can "sort of" do what our brains do to rationalize how close an object is by its size, it's not reliable.

I'll expand on that. If you close one eye and see a coffee cup at a specific size, your brain rationalizes how far it must be from you based on its size. This happens only because your brain knows how big a coffee cup should be. That is the only way it can rationalize a distance, by the object's size with a single camera or eye. With two eyes or two cameras, the brain or software can calculate how far away the object is by analyzing the difference between the two images.

A dual camera, dual IR, or laser camera is required to determine distance accurately. This is done by comparing two images: identify points of interest in both, compare their locations in each image, and use the known distance between the two cameras to calculate the object's distance. You can reproduce this effect by alternately closing and opening each eye while looking at a cup on a table. The cup will appear to shift position between eyes; that apparent shift between the two views is what gets used to calculate the distance from the viewer.
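The two-camera comparison above is classic stereo triangulation: depth is the camera baseline times the focal length, divided by the disparity (how far the same point shifts between the two images). The baseline, focal length, and pixel coordinates below are illustrative assumptions.

```python
# Stereo-triangulation sketch: depth from the pixel shift (disparity) of the
# same point between two horizontally separated cameras.

def stereo_depth(baseline_m, focal_px, x_left_px, x_right_px):
    """Depth (m) = baseline * focal length / disparity."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("point must shift left-to-right between the two views")
    return baseline_m * focal_px / disparity

# Cameras 0.06 m apart, focal length 700 px; the cup appears at x=420 in the
# left image and x=378 in the right image, a 42 px shift:
print(stereo_depth(0.06, 700, 420, 378))  # -> 1.0 m
```

Note that unlike the single-camera case, nothing here depends on knowing what the object is or how big it should be; the geometry alone gives the distance.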

PRO
USA
#23  

This is an interesting convo. I'll just add that I think OpenAI's pricing structure will have to change to stay in the game. China is hammering home with DeepSeek R1 costing under 10% of OpenAI's rate: approximately $0.14 per 1 million tokens vs $7.50 with OpenAI. With Sam being involved in the $500 billion Stargate project, it seems unlikely he will be interested in getting the pricing down. With the focus on AGI or ASI, and being in bed with Nvidia, there is doubt he cares about pricing. But if he wants OpenAI to stay relevant, he would need to make it free for regular customers and charge big $$$ for commercial use, or change how OpenAI functions to make it more efficient like DeepSeek, and therefore cheaper. Every day is an AI battle; so many pop up.
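Taking the two rates quoted in the post at face value (they are the post's figures, not verified current pricing), the gap is even larger than "under 10%":

```python
# Quick ratio check of the per-million-token rates quoted in the post.
# These numbers come from the post itself, not from current vendor price lists.

deepseek_per_m = 0.14   # USD per 1M tokens, as quoted
openai_per_m   = 7.50   # USD per 1M tokens, as quoted

ratio = deepseek_per_m / openai_per_m
print(f"{ratio:.1%} of the OpenAI rate")  # -> 1.9% of the OpenAI rate
```

So by the post's own numbers, DeepSeek's quoted rate is under 2% of OpenAI's, not just under 10%.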

PRO
Synthiam
#24  

Has anyone tried Autonomous AI with DeepSeek yet? The few times I tried, the servers wouldn't respond. I don't think they have the infrastructure to support their demand.