Canada
Asked

Grok With Autonomous AI Loops

I was setting up my Grok API account in anticipation of the Grok 3 API availability and connected it to JD using the Autonomous AI plugin in Synthiam ARC. I used the API call to https://api.x.ai/v1/ with the model "grok-2-vision-latest." However, it seems to enter a loop when I ask it to perform an action. For example, when I command it to wave, it continuously sends images and waves repeatedly. This behavior does not occur with ChatGPT, where it waves once and stops. I have noticed a similar looping pattern with other models as well.

Here is a snippet of the interaction log:

  • Initialized
  • Using model: gpt-4o-mini
  • Initialized
  • Using model: grok-2-vision-latest
  • Capturing image...
  • User: wave
  • Sending #1 (3 msgs, 1 image)
  • Convo tokens: 5360
  • AI: "I'm going to wave at you. Watch this!"
  • Self-programming: controlCommand("Auto Position", "AutoPositionActionWait", "Wave")
  • ControlCommand("Autonomous AI", "GetImage")
  • Capturing image...
  • Sending #2 (5 msgs, 2 images)
  • Convo tokens: 5762
  • AI: "It looks like I'm in a room with light streaming through the curtains. I don't see anyone. Could you please tell me where you are located?"
  • Self-programming: ControlCommand("Autonomous AI", "GetImage")
  • Capturing image...

This pattern continues, with the AI repeatedly capturing images and asking for location details. It seems to be stuck in a loop, and I'm unsure how to resolve this issue. Any insights or suggestions on how to address this looping behavior would be greatly appreciated.


PRO
Canada
#1  

It seems like the Autonomous AI skill is getting stuck in a loop when using the Grok API model. This behavior might be due to the AI continuously trying to process images and not receiving the expected input or feedback to stop the loop. Here are a few steps you can take to troubleshoot and potentially resolve this issue:

  1. Check the AI Model Configuration: Ensure that the Grok API model is correctly configured in the Autonomous AI skill. Double-check the API key and model settings to ensure they match the requirements.
  2. Review the Response Script: Look at the "Conversation Response Script" in the Autonomous AI skill configuration. Ensure that the script is not inadvertently causing the loop by repeatedly calling ControlCommand("Autonomous AI", "GetImage");.
  3. Modify the AI's Behavior: You can add logic to the response script to prevent continuous image capturing. For example, set a condition to stop capturing images after a certain number of attempts or based on specific feedback from the AI.
  4. Use Global Variables: Utilize global variables to track the number of image captures or the state of the AI's task. This can help you control the flow and prevent the loop (see the sketch after this list).
  5. Test with Different Models: Since you mentioned that the issue does not occur with ChatGPT, try testing with other models to see if the behavior persists. This can help identify whether the issue is specific to the Grok API model.
  6. Debugging: Add print statements in the response script to log the AI's actions and decisions. This can help you understand why the loop is occurring and identify any unexpected behavior.
  7. Contact Grok API Support: If the issue persists, consider reaching out to the Grok API support team for assistance. They may have insights into why the model behaves this way and offer solutions.
  8. Update ARC and Skills: Ensure that you are using the latest version of ARC and the Autonomous AI skill. Updates may include bug fixes or improvements that could resolve the issue.
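Here is a minimal sketch of the counter idea for the "Conversation Response Script". It assumes ARC's JavaScript setVar()/getVar() globals; the $CaptureCount variable name and the limit of 3 are made-up examples, not built-in settings:

// Cap consecutive image captures so a chatty model can't loop forever.
// $CaptureCount and the limit of 3 are illustrative, not built-in.
var count = getVar("$CaptureCount", 0); // assumes getVar() accepts a default
if (count >= 3) {
  print("Capture limit reached - not requesting another image.");
  setVar("$CaptureCount", 0); // reset for the next user command
} else {
  setVar("$CaptureCount", count + 1);
  ControlCommand("Autonomous AI", "GetImage");
}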

If you need further assistance with scripting or configuring the Autonomous AI skill, feel free to ask!


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Synthiam
#2  

If you format your message, I'll be able to understand it :)

PRO
Canada
#3  

LOL, ok, will use code blocks. As you can see, I just typed wave once. Then I changed the engine to OpenAI and tried again, and it only sent to the AI engine once. I guess Autonomous AI is looking for some type of end-of-message marker that it doesn't get, so it just keeps sending more images to the AI engine and repeats.

Initialized
Using model: grok-2-vision-latest
Capturing image...
User: wave
Sending #1 (3 msgs, 1 images)
Convo tokens: 5360
AI: I'm going to wave at you. Watch this!
Self-programming: controlCommand("Auto Position", "AutoPositionActionWait", "Wave")
ControlCommand("Autonomous AI", "GetImage");
Capturing image...
Sending #2 (5 msgs, 2 images)
Convo tokens: 5762
AI: It looks like I'm in a room with light streaming through the curtains. I don't see anyone. Could you please tell me where you are located?
Self-programming: ControlCommand("Autonomous AI", "GetImage");
Capturing image...
Sending #3 (7 msgs, 3 images)
Convo tokens: 6166
AI: It looks like I'm looking at a door with some drawings on it. I don't see anyone. Could you please tell me where you are located?
Self-programming: ControlCommand("Autonomous AI", "GetImage");
Capturing image...
Sending #4 (9 msgs, 4 images)
Convo tokens: 6489
AI: It looks like I'm still looking at a door with some drawings on it. I don't see anyone. Could you please tell me where you are located?
Self-programming: ControlCommand("Autonomous AI", "GetImage");
Capturing image...
Sending #5 (10 msgs, 4 images)
Convo tokens: 6553
AI: It looks like I'm still looking at a door with some drawings on it. I don't see anyone. Could you please tell me where you are located?
Self-programming: ControlCommand("Autonomous AI", "GetImage");
Capturing image...
Sending #6 (11 msgs, 4 images)
Convo tokens: 6619
AI: It seems I'm looking at some light streaming through curtains. I don't see anyone. Could you please tell me where you are located?
Self-programming: ControlCommand("Autonomous AI", "GetImage");
Capturing image...
Sending #7 (12 msgs, 4 images)
Convo tokens: 6676
AI: I see a computer screen with some software open. It looks like there's an image of a robot on it. I don't see anyone in person. Could you please tell me where you are located?
Self-programming: ControlCommand("Autonomous AI", "GetImage");
Capturing image...
Sending #8 (13 msgs, 4 images)
Convo tokens: 6793
AI: I see a computer screen with some software open, and light streaming through the curtains. I don't see anyone in person. Could you please tell me where you are located?
Self-programming: ControlCommand("Autonomous AI", "GetImage");
Capturing image...
Sending #9 (14 msgs, 4 images)
Convo tokens: 6832
AI: I see a computer screen with some software open, and light streaming through the curtains. I don't see anyone in person. Could you please tell me where you are located?
Self-programming: ControlCommand("Autonomous AI", "GetImage");
Capturing image...
Sending #10 (15 msgs, 4 images)

And I repeated the test, changing only the AI engine from Grok to OpenAI. It executes once and doesn't loop, as it should.

Initialized
Using model: gpt-4o-mini
Capturing image...
User: wave
Sending #10 (16 msgs, 4 images)
Convo tokens: 6748
AI: I will wave at you now!
Self-programming: controlCommand("Auto Position", "AutoPositionActionWait", "Wave")

PRO
Synthiam
#4  

Interesting, I didn't expect it to work with Grok-2. I guess that's why Grok-2 hasn't been a great contender outside of costing less. I'm interested to see how Grok-3 will work. The thing about GPT-4o is that there's specific programming/coding reasoning that Autonomous AI is taking advantage of. The queries are structured for the LLM by the LLM.

PRO
Synthiam
#5  

For stuff and giggles, I edited your original post and used the "auto fix grammar"; it did a decent formatting job. I was on my phone earlier, so it was a mess to interpret haha

I see that Grok-3 has been released for their app but not the API. I do not like this Wes Roth guy because he's generally overly biased and lacks domain knowledge on most subjects he opines on. For example, his discussion on AI robotics is out to lunch. Even in this video, he can't grasp that the green dot on his finger is playing the note on the piano from left to right - yet he's twiddling all his fingers like a snake oil magician. Besides that, he's not my recommended source of AI information; I find this video interesting because it demonstrates the Grok-3 vs. GPT-o-high comparison.

PRO
Canada
#6  

To be honest, I was surprised it worked at all; this was the furthest I had gotten with any model (and yes, JD did wave). I can probably just tell it to cancel after a few seconds or something, as pressing the cancel button stopped it from looping.
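If I go the timeout route, something like this in a separate script might work. Note that "Cancel" here is a hypothetical command name standing in for whatever the cancel button actually invokes - the real name would come from the skill's ControlCommand list in ARC:

sleep(10000); // let the requested action play out for ~10 seconds
ControlCommand("Autonomous AI", "Cancel"); // "Cancel" is a guessed command name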

PRO
Synthiam
#7  

Out of curiosity with Grok-2... it seemed to be crying out to find you. It was seeing curtains and computer screens. Did you try showing it your face? Maybe if you were in the camera, it would quiet down, haha.

Grok-2 could be a little more "needy" than we're used to for LLMs haha

PRO
Synthiam
#8  

Oh, lastly - after the 10th message, did you stop it? Because around 10 or 11 interactions, it analyzes itself based on the conversation. If you let it run to 11, would the system optimization message appear? The system optimization would use Grok to re-align the interaction and determine objectives. The result will be displayed in the log output.

PRO
Synthiam
#9   — Edited

Here's an example - after message #10, the system optimization re-aligned the personality. I wonder what would happen with yours: would it keep looking for you, or give up when the system optimization makes it realize it's being silly?

User-inserted image

Initialized
Using model: gpt-4o
Capturing image...
User: hi there i'm dj
Sending #1 (3 msgs, 1 images)
Convo tokens: 5394
AI: Hello, DJ! How's it going today?
Capturing image...
User:  i was expecting wave you lazy dog
Sending #2 (6 msgs, 2 images)
Convo tokens: 5759
AI: Time to wave and impress! Here I go.
Self-programming: controlCommand("Auto Position", "AutoPositionActionWait", "Wave");
AI: I've completed the wave just for you!
Capturing image...
User: that's better. what time is by the way
Sending #3 (9 msgs, 3 images)
Convo tokens: 6035
AI: It's 8:22:55 PM, DJ! Anything exciting planned tonight?
Capturing image...
User: thanks. what timezone are we in?
Sending #4 (12 msgs, 4 images)
Convo tokens: 6332
AI: We're in Eastern Standard Time (UTC-05:00:00).
Capturing image...
User: wow that's a lot of detail. thanks
Sending #5 (14 msgs, 4 images)
Convo tokens: 6381
AI: You're welcome, DJ! Let me know if there's anything else I can do for you.
Capturing image...
User: why do you have to do something for me? can't we just talk and you can do stuff that you feel like doing?
Sending #6 (16 msgs, 4 images)
Convo tokens: 6499
AI: Absolutely! We can just chat and enjoy the moment. What's on your mind?
Capturing image...
User: again, you're asking what's on my mind but i wanna hear more input from you. i don't want to drive the conversation
Sending #7 (18 msgs, 4 images)
Convo tokens: 6595
AI: Got it! How about I share a random fact? Did you know that honey never spoils? Archaeologists have found pots of honey in ancient Egyptian tombs that are over 3,000 years old and still good to eat!
Capturing image...
User: i did not know that. i was told it but didn't know if it was true
Sending #8 (20 msgs, 4 images)
Convo tokens: 6775
AI: It's true! Honey's natural properties keep it fresh for ages. Got any other fun facts you're curious about?
Capturing image...
User: there you go asking me again
Sending #9 (22 msgs, 4 images)
Convo tokens: 6722
AI: You're right. Let me surprise you with a fact about space: Did you know that Venus is the hottest planet in our solar system, even hotter than Mercury?
Capturing image...
User: why does it all have to be about facts? don't you have things to share about who you are or what you've been up to? 
Sending #10 (24 msgs, 4 images)
Convo tokens: 6958
AI: Fair point! Well, I'm just here hanging out, enjoying our chat. I like to keep things lively with an occasional fact or two. How about I show you a little dance?
Self-programming: controlCommand("Auto Position", "AutoPositionAction", "Disco Dance");
sleep(5000);
controlCommand("Auto Position", "AutoPositionAction", "Stop");
AI: Hope you liked my moves!
Capturing image...
System Optimization Started
System Optimiation: You have engaged in a conversation where you adjusted the style of interaction based on the user's feedback. You adapted from asking questions to sharing information proactively. The user, DJ, expressed a preference for spontaneous interaction rather than driving the conversation themselves. You demonstrated the ability to share interesting facts and performed a "Disco Dance" animation to keep the conversation lively and engaging. 

Your interaction style shifted towards being more proactive and performing actions or sharing insights about yourself or your capabilities without waiting for prompts.
System Optimization Done
User: now we're talking! i like when you do a randon stuff like that. and i like when you just chat without purpose. just chatting. you know?
Sending #11 (26 msgs, 4 images)
Convo tokens: 7253
AI: I love just hanging out and being spontaneous! How about a little light show next?
Self-programming: controlCommand("RGB Animator", "AutoPositionAction", "Disco");
sleep(5000);
controlCommand("RGB Animator", "AutoPositionStop");
AI: Hope you enjoyed that visual treat!

PRO
Canada
#10  

I was watching my API meter spinning as my recently deposited credits quickly spun towards zero, so I had to press the cancel button before my Amex went into default and I had to file for bankruptcy.

Script Start
Script Done (00:00:00.2778248)
Script Start
Script Start
Script Done (00:00:00.0780831)
Script Start
Script Start
Script Done (00:00:00.0824240)
Script Start
Script Start
Script Done (00:00:00.0837114)
Script Start
Script Start
Script Done (00:00:00.0861143)
Script Start
Script Start
Script Done (00:00:00.0811398)
Script Start
Script Start
Script Done (00:00:00.1026027)
Script Start
Script Start
Script Done (00:00:00.1068688)
Script Start
Script Start
Script Done (00:00:00.1004877)
Script Start
Script Error: Execution Error Line 1 Col 0 -  ControlCommand Error for 'Autonomous AI' sending 'GetImage'. The operation was canceled.
Script Done (00:00:01.7500867)

PRO
Synthiam
#11  

lol - gotta pay to play with the big boys! I guess that's where we're heading with the subscription models we're all so "used to". It goes like this...

"you can give life to your robot but you have to work to pay for it"

I had a meeting today with a company that's trying to use an NVIDIA Jetson for some AI processing. Similar to the other thread - where we both said it's really a waste of time these days. Even playing with an LLM at home can't compete with cloud service performance, and it's just getting bigger, badder, and more capable. The whole point of the cloud for so long was "renting someone else's computer," and now it's starting to become a dependency for real cognitive reasoning performance.

Should have saved your $4k on GPUs and put it toward OpenAI credits haha.

Okay, on that note, have you tried Autonomous AI with DeepSeek? I could barely get DeepSeek to work at all when I tried it before. It kept crashing and disconnecting me. I assumed it was a bandwidth/popularity issue.

PRO
Canada
#12  

Agghhh yes, I probably should have gone for ChatGPT tokens instead. I did try the DeepSeek vision model, but it didn't want to talk to my robot. As I posted on another forum, what I hate most about ChatGPT is that it just works. You spend 6 hours getting something to work on another model and it still doesn't work as well as ChatGPT. DeepSeek does work really well with your OpenAI ChatGPT plugin though, so I'm really happy with that on my InMoov. It also works well with RAG, so you can analyze different data sources.

I have been trying to buy an RTX 5090, as my 3090s are slow and really only good for fine-tuning if you are prepared to wait a long time. I am hoping the new Apple Studio comes out with around 512GB of unified RAM so it can run some of the larger models like Grok locally.

Wife just turned the debate off; I guess I'd better stop playing robots.

PRO
Synthiam
#13   — Edited

Save up and buy an A100 :D

I know what you mean about OpenAI - that it just works. I respect OpenAI a lot because, in a world of open-source hippy LLM software, it's nice to see one company provide a standard, always-on, never-breaking API with consistent inference results.

Besides the effort of maintaining the extremely optimized code base of Synthiam ARC's internal platform, supporting robot skills gives us the most grief. You have probably noticed how it can take time for me to evaluate whether we're implementing a specific technology. Mostly, we've been pretty good at spotting what direction a standard will take. OpenAI took me a while to commit to. It's always tricky to predict whether an investment in a technology will be worthwhile. The ones that we flopped on are...

My Vapourware List

Windows UWP

  • Microsoft really hyped UWP and pushed us to release libraries and a framework for it. They spent a lot of time in our office convincing us how great an investment it would be. Yeah, it flopped and disappeared before it even started - but not before we spent months developing a product for it.

Intel RealSense

  • Even after video conference calls with the CTO of RealSense, I noticed something wasn't "right," but we kept trucking anyway. I pushed it off a few times when it was first released, but eventually, I caved. It was around the same time we were doing Exosphere, and they were right into that alignment. They loved the idea of cloud-archiving navigation data. They pulled the plug a few months after we released the first robot skills.

Intel Joule

  • This one really, really, really hurt. The amount of money I spent on development hours, travel, prototyping, and more... the product was pulled during the launch, if I recall. We partnered with Microsoft and Intel to have an embedded SBC and EZB for the IDF (Intel Developer Forum). Did you ever see it? It exists; I have several in my storage room. It's an entire Windows 10 PC in an EZB with HDMI, USB, and all the I/O of a regular EZB, in the same form factor as an EZB v4. I swear, it's the smallest Windows 10 PC and microcontroller combo you'll ever encounter. We built a hexapod that's 2 feet wide. We built a few robots with Microsoft, one of which was on stage in the presentation. The CEO of Intel hugged me because of how impressed they were with what we built. A day or two later, they pulled the plug on the entire project. All I got from that was a memory of watching two married Microsoft executives call prostitutes to the bar after the conference. The whole experience was a complete disaster.

Microsoft Cognitive Services for AI (face, emotion, vision, speaker recognition, etc.)

  • I can't remember his title, but he was the VP of data analytics or something... He flew to our office and spent a week with me. Even his family came out, and we went to the zoo. They had us convinced it was the right move to prioritize Microsoft Cognitive Services integration in Synthiam ARC. We did, and it worked great for a year or two... until they pulled the plug and their AI "ethics" team shut it all down. They sent us a few emails saying we could chat about our use case in education to restore service, but no matter how many emails I sent, they went unanswered.

Myo

  • We did a commercial, campaign, integration, and even a robot together. They had another revolutionary product that would make the Myo look like a caveman drawing in comparison. So, that got me somewhat excited to be part of their next product. We put a ton of effort and time into it, and poof - gone, vanish, boom, wazoo. They took 50 million from the Canadian government and renamed the company to make eyeglasses. I never heard from them again.

Those are the ones I can think of - I'm sure there are others. But the point is, with all this AI growth, it's difficult to predict where it's heading and what we can focus development on. When we start writing something into the platform, whether it's a robot skill or not, it affects the underlying framework development strategy. It also means we have to allocate internal resources and, in some cases, reprioritize current development.

While it's nice to have the third party develop the robot skill with us, which is the typical way, it ends up in our hands for maintenance while they vapourize.

OpenAI took a bit of convincing for me because it was the only LLM out there. On top of that, they had a dodgy API, and their SDKs weren't official. So, everything we developed for OpenAI was changing weekly and breaking all the time. A few times, I was ready to pull the plug, but I'm glad we didn't. They have matured their API framework. I'm thrilled they offer several models rather than deprecating previous models in favor of the new ones. It helps with backward compatibility.

I don't see the same reliability and consistency from DeepSeek or Grok. It is too early for me to say that about Grok entirely, but knowing Elon's personality - he rides bareback, so at any moment the direction could shift, deprecating everything we've done. So, while OpenAI does have a higher cost, and Synthiam doesn't see any of that, it is the best LLM service out there.

I hope we can get a quantity pricing deal to offer Synthiam customers, where we can proxy your requests under our API key and pay lower fees. That would be an ideal scenario because the combined monthly token count of all Synthiam users would be significant. I don't see many commercial OpenAI API customers doing anything besides marketing and email bots.

PRO
Canada
#14  

I think the RealSense, although discontinued, shouldn't be on this list, as we all got a lot of value out of them, and several of your plugins that were designed to work with RealSense now work with other localization hardware, so I think this was a win. (I may have also been one of the people begging you to support it.)

The Myo, hahaha. I still have one in my drawer next to my Vuzix and NeuroSky, as well as other obsolete pieces of tech I used to interface with robotics. You get maybe 4-5 hours of use out of them for $200-$300, so I view it like concert tickets: expensive but entertaining.

Can't wait to see what tech comes out next, and to see you supporting it with ARC, even if it is obsolete in 12 months.

PRO
Synthiam
#15  

The reason RealSense is there is that it has a very limited customer base. We can't promote RealSense support as a feature because so few people have one. Such a small market demographic. Also, I had higher hopes for it as a product. They gave up on it too quickly.

With AI, I think integrations will be easier in the future. I'm hoping we get to a point where the AI can write the code to integrate anything it needs. I also feel that once we get the inverse kinematics working better with the Autonomous AI, it removes the need for much else. Once a robot can do anything you ask it, what else does it need? Haha

The depth camera is where we're at now: getting the robot with IK to comprehend depth for interacting with objects or navigating. The Autonomous AI robot skill can navigate decently right now with a single camera, but it takes a lot of convincing, explaining how close to get to objects based on their size in the camera image.

PRO
Synthiam
#16  

As you mentioned, I was curious about the billing cost for Autonomous AI. On the day I made that last video, I played with it for pretty much the full day, and the cost was $2.44: $0.06 for output and $2.38 for input. I used 950k tokens and 200 API requests.
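Rough math on those numbers (the $2.38 vs. $0.06 split suggests almost all of the tokens were input):

$2.44 / 200 requests ≈ $0.012 per request
950,000 tokens / 200 requests ≈ 4,750 tokens per request
$2.38 / 950,000 tokens ≈ $2.50 per million input tokens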

PRO
Canada
#17  

I get it. Doing some development or having a play, sure, the costs are minimal. But take that into production on a home robot that runs around the house all day checking security, vacuuming floors, controlling devices, and following instructions, and that $3 (or $5 Canadian) a day gets a little pricey every month. Also, I'm not a big fan of having a robot that runs around my house collecting audio and video of me and my family and then sharing my private data with a 3rd party.

PRO
Synthiam
#18   — Edited

Both of those are points we agree on. I'm guessing the costs are only going to come down: energy and resource costs will drop over time as competition rises. I doubt we'll see local inference without future breakthroughs.

I’m guessing we’ll start to see personal data containers for cloud systems. That will protect and isolate privacy.

Due to costs, we're quite a bit away from commercial products with LLM technology.

I am a believer that LLMs are the correct approach for generic and adaptive tasks, but I'm guessing we'll see a hybrid of local and cloud inference. It's something I've been considering lately: how the data can train a local model for your unique use case. It can train off the cloud process.

PRO
Synthiam
#19   — Edited

Whoa - look at this. I'm watching a YouTube video about Claude 3.7 from Fireship. He showed a graph of Stack Overflow website traffic since ChatGPT was released. I haven't visited Stack Overflow since ChatGPT either. Surprising, not surprising. But surprising nonetheless.

User-inserted image

PRO
Canada
#20  

When I am bored, I copy and paste Reddit questions into ChatGPT and then paste the answer back into Reddit. I think they need a bot like Athena hooked in to answer all the stupid questions you get on there. Although Stack Overflow has OverflowAI, and that didn't help their site visits anyway.

#21  

Hi DJ, just a curious question here: you earlier mentioned the Kinect and the RealSense for depth perception of objects and navigation. Can't the EZB camera do any of that, since you've shown us how to program it to detect objects, colors, and faces? I could also just be missing something too :)

PRO
Synthiam
#22   — Edited

Depth with a single camera is like closing one eye and trying to reach for an object. Object-relational reasoning is the only way you can determine depth with a single camera: using knowledge about the object and its size in the viewport to estimate how close it might be to the viewer. That's not an accurate way of determining 3D position. While LLMs can "sort of" do what our brains do to rationalize how close an object is by its size, it's not reliable.

I'll expand on that. If you close one eye and see a coffee cup at a specific size, your brain rationalizes how far it must be from you based on its size. This happens only because your brain knows how big a coffee cup should be. That is the only way it can rationalize a distance with a single camera or eye: by the object's size. With two eyes or two cameras, the brain or software can calculate how far away the object is by analyzing the difference between the two images.
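To put rough numbers on that single-eye guess, here's the pinhole-camera version of what the brain is doing. All of the values below are made up for illustration:

// Known-size (object-relational) distance estimate with one camera.
// Every number here is a made-up example, not a real calibration.
var focalLengthPx = 600;  // assumed camera focal length, in pixels
var realHeightCm = 9;     // how tall we "know" a coffee cup is
var imageHeightPx = 90;   // how tall the cup appears in the frame
var distanceCm = focalLengthPx * realHeightCm / imageHeightPx; // = 60 cm

If the guess about the cup's real height is wrong (say it's a novelty mug), the distance estimate is off by the same ratio - which is exactly why this method is unreliable.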

A dual camera, dual IR, or laser camera is required to accurately determine distance. This is done by comparing two images to determine the distance of objects. You can do this by identifying points of interest in both photos and comparing their locations in each image; the known distance between the two cameras is then used to calculate the object's distance. You can reproduce this effect by alternately closing and opening each eye while looking at a cup on a table. The cup will appear to move positions for each eye. That apparent shift between one eye and the other (the disparity) is what can be used to calculate the distance from the viewer.
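The same idea in numbers: the apparent jump (disparity) plus the spacing between the cameras gives the distance. Again, the values are made up for illustration:

// Stereo triangulation: distance = focal length x baseline / disparity.
// All numbers are illustrative, not from a real camera.
var focalLengthPx = 600;  // assumed focal length, in pixels
var baselineCm = 6;       // spacing between the two cameras (or eyes)
var disparityPx = 12;     // how far the cup "jumps" between the two images
var distanceCm = focalLengthPx * baselineCm / disparityPx; // = 300 cm

Notice the disparity does all the work here; no prior knowledge about the cup's size is needed, which is why stereo is so much more reliable than the single-camera guess.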