I was setting up my Grok API account in anticipation of Grok 3 API availability and connected it to JD using the Autonomous AI plugin in Synthiam ARC. I pointed the API at https://api.x.ai/v1/ with the model "grok-2-vision-latest". However, it seems to enter a loop when I ask it to perform an action. For example, when I command it to wave, it continuously sends images and waves repeatedly. This behavior does not occur with ChatGPT, where it waves once and stops. I have noticed a similar looping pattern with other models as well.
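For reference, this is roughly the shape of the request I expect the plugin is making behind the scenes; a minimal sketch assuming x.ai's API is OpenAI-compatible (the base URL and model name are what I configured, everything else is illustrative):

// Minimal sketch of an OpenAI-style chat request against the x.ai endpoint.
// Assumes the API is OpenAI-compatible; XAI_API_KEY is a placeholder.
async function askGrok() {
  const response = await fetch("https://api.x.ai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer " + process.env.XAI_API_KEY
    },
    body: JSON.stringify({
      model: "grok-2-vision-latest",
      messages: [{ role: "user", content: "wave" }]
    })
  });
  const data = await response.json();
  console.log(data.choices[0].message.content);
}

askGrok();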
Here is a snippet of the interaction log:
- Initialized
- Using model: gpt-4o-mini
- Initialized
- Using model: grok-2-vision-latest
- Capturing image...
- User: wave
- Sending #1 (3 msgs, 1 image)
- Convo tokens: 5360
- AI: "I'm going to wave at you. Watch this!"
- Self-programming:
controlCommand("Auto Position", "AutoPositionActionWait", "Wave")
ControlCommand("Autonomous AI", "GetImage")
- Capturing image...
- Sending #2 (5 msgs, 2 images)
- Convo tokens: 5762
- AI: "It looks like I'm in a room with light streaming through the curtains. I don't see anyone. Could you please tell me where you are located?"
- Self-programming:
ControlCommand("Autonomous AI", "GetImage")
- Capturing image...
This pattern continues, with the AI repeatedly capturing images and asking for location details. It seems to be stuck in a loop, and I'm unsure how to resolve this issue. Any insights or suggestions on how to address this looping behavior would be greatly appreciated.
It seems like the Autonomous AI skill is getting stuck in a loop when using the Grok API model. This behavior might be due to the AI continuously trying to process images and not receiving the expected input or feedback to stop the loop. Here are a few steps you can take to troubleshoot and potentially resolve this issue:
1. Check the AI model configuration.
2. Review the response script. For example, make sure a call like ControlCommand("Autonomous AI", "GetImage"); isn't being re-issued after every reply.
3. Modify the AI's behavior.
4. Use global variables to track whether an image has already been requested (a rough sketch follows this list).
5. Test with different models.
6. Debugging: add logging so you can see what triggers each new image capture.
7. Contact Grok API support.
8. Update ARC and the robot skills.
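For example, here is a rough sketch of the global-variable idea in ARC's JavaScript scripting. It assumes ARC's setVar(), getVar(), and print() helpers and uses the ControlCommand syntax from your log; adjust the variable and skill names to match your project:

// Initialize once, for example in an init script (the $ImageRequested name is made up):
setVar("$ImageRequested", 0);

// Then, in the response script, guard the image request so it only fires once per command:
if (getVar("$ImageRequested") == 0) {
  setVar("$ImageRequested", 1);
  ControlCommand("Autonomous AI", "GetImage");
} else {
  print("Image already requested - skipping to avoid a loop.");
}

Resetting $ImageRequested back to 0 when a new user command arrives would allow the next request to capture a fresh image.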
If you need further assistance with scripting or configuring the Autonomous AI skill, feel free to ask!
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
If you format your message I’ll be able to understand it
LOL ok, will use code. As you can see, I just typed "wave" once. Then I went and changed the engine to OpenAI and tried again, and it only sent to the AI engine once. I guess Autonomous AI is looking for some type of end-of-message signal, doesn't get it, and so it just keeps sending more images to the AI engine and repeating.
And I repeated the test, changing only the AI engine from Grok to OpenAI. It executes once and doesn't loop, as it should.
Interesting - I didn't expect it to work with Grok-2. I guess that's why Grok-2 hasn't been a great contender outside of costing less. I'm interested to see how Grok-3 will work. The thing about GPT-4o is that there's specific programming/coding reasoning that Autonomous AI takes advantage of. The queries are structured for the LLM by the LLM.
For stuff and giggles, I edited your original post and used the "auto fix grammar" feature; it did a decent formatting job. I was on my phone earlier, so it was a mess to interpret haha
I see that Grok-3 has been released for their app but not the API. I do not like this Wes Roth guy because he's generally overly biased and lacks domain knowledge on most subjects he opines on. For example, his discussion on AI robotics is out to lunch. Even in this video, he can't grasp that the green dot on his finger is playing the note on the piano from left to right - yet he's twiddling all his fingers like a snake oil magician. Besides that, he's not my recommended source of AI information; I find this video interesting because it demonstrates the Grok-3 vs. GPT-o-high comparison.
To be honest, I was surprised it worked at all; this was the furthest I had gotten with any model (and yes, JD did wave). I can probably just tell it to cancel after a few seconds or something, since pressing the cancel button stopped it from looping.
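Something like this rough sketch is what I had in mind. I don't know offhand which ControlCommand the Autonomous AI skill actually exposes for stopping, so "Cancel" below is just a placeholder for whatever its cheat sheet lists, and I'm assuming ARC's sleep(ms) helper:

// Rough idea: let the wave run, then force the skill to stop after a few seconds.
// "Cancel" is a placeholder command name - check the Autonomous AI skill's cheat sheet for the real one.
ControlCommand("Auto Position", "AutoPositionActionWait", "Wave");
sleep(5000); // wait a few seconds (assumes ARC's sleep(ms) helper)
ControlCommand("Autonomous AI", "Cancel");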
Out of curiosity with grok-2... it seemed to be crying out to find you. It was seeing curtains and computer screens. Did you try showing it your face? Maybe if you were in the camera, it would quiet down, haha.
Grok-2 could be a little more "needy" than we're used to for LLMs haha
Oh, lastly - after the 10th message, did you stop it? Because around 10 or 11 interactions in, it analyzes itself based on the conversation. If you had let it go past 11, would the system optimization message have appeared? The system optimization would use Grok to re-align the interaction and determine objectives. The result is displayed in the log output.
Here's an example - after message #10, the system optimization re-aligned the personality. I wonder what would happen with yours: whether it would keep looking for you or give up when the system optimization makes it realize it's being silly.
I was watching my API meter spinning as my recently deposited credits quickly spun towards zero so I had to press the cancel button before my Amex went into default and I had to file for bankruptcy.
lol - gotta pay to play with the big boys! I guess that's where we're heading with the subscription models we're all so "used to". It goes like this...
"you can give life to your robot but you have to work to pay for it"
I had a meeting today with a company that's trying to use an NVIDIA Jetson for some AI processing. Similar to the other thread, where we both said it's really a waste of time these days. Even playing with an LLM at home can't compete with cloud service performance, and it's just getting bigger, badder and more capable. The whole point of the cloud for so long was "renting someone else's computer", and now it's starting to become a dependency for real cognitive-reasoning performance.
Should have saved your $4k on GPUs and put it toward OpenAI credits haha.
Okay, on that note, have you tried Autonomous AI with DeepSeek? I could barely get DeepSeek to work at all when I tried it before. It kept crashing and disconnecting me. I assumed it was a bandwidth/popularity issue.
Agghhh, yes, I probably should have gone for ChatGPT tokens instead. I did try the DeepSeek vision model, but it didn't want to talk to my robot. As I posted on another forum, what I hate most about ChatGPT is that it just works. You spend 6 hours getting something to work on another model, and it still doesn't work as well as ChatGPT. DeepSeek does work really well with your OpenAI ChatGPT plugin though, so I'm really happy with that on my InMoov. It also works well with RAG, so you can analyze different data sources.
I have been trying to buy an RTX 5090, as my 3090s are slow and really only good for fine-tuning if you are prepared to wait a long time. I am hoping the new Apple Studio comes out with around 512GB of unified RAM so it can run some of the larger models like Grok locally.
Wife just turned the debate off, I guess I better stop playing robots.
Save up and buy an A100.
I know what you mean about OpenAI - that it just works. I respect OpenAI a lot because, in a world of open-source hippy LLM software, it's nice to see one company provide a standard, always-on, never-breaking API with consistent inference results.
Besides the effort to maintain an extremely optimized code base for Synthiam ARC's internal platform, supporting the robot skills gives the most grief. You have probably noticed how it can take time for me to evaluate whether we're implementing a specific technology. Mostly, we've been pretty good at spotting what direction a standard will take. OpenAI took me a while to commit to. It's always tricky to predict whether an investment in a technology will be worthwhile. The ones that we flopped on are...
My Vapourware List
Windows UWP
Intel RealSense
Intel Joule
Microsoft Cognitive Services for AI (face, emotion, vision, speaker recognition, etc.)
Myo
Those are the ones I can think of - I'm sure there are others. But the point is, with all this AI growth, it's difficult to predict where it's heading and what we can focus development on. When we start writing something into the platform, whether it's a robot skill or not, it affects the underlying framework development strategy. It also means we have to allocate internal resources and, in some cases, reprioritize current development.
While it's nice to have a third party develop the robot skill with us, which is the typical way, it ends up in our hands for maintenance when they vapourize.
OpenAI took a bit of convincing for me because it was the only LLM out there. And on top of that, they had a dodgy API, and their SDKs weren't official. So, everything we developed for OpenAI was changing weekly and breaking all the time. A few times, I was ready to pull the plug, but I'm glad we didn't. They have matured their API framework. I'm thrilled they offer several models rather than deprecating previous models in favor of the new ones. It helps with backward compatibility.
I don't see the same reliability and consistency from DeepSeek or Grok. It is too early for me to say that about Grok entirely, but knowing Elon's personality - he rides bareback, so at any moment the direction could shift, deprecating everything we've done. So, while OpenAI does have a higher cost, and Synthiam doesn't see any of that, it is the best LLM service out there.
I hope we can get a quantity pricing deal to offer Synthiam customers, where we can proxy your requests under our API key and pay lower fees. That would be an ideal scenario because the combined monthly token count of all Synthiam users would be significant. I don't see many commercial OpenAI API customers doing anything besides marketing and email bots.
I think the RealSense, although discontinued, shouldn't be on this list, as we all got a lot of value out of them, and several of your plugins that were designed to work with the RealSense now work with other location hardware, so I think this was a win. (I may have also been one of the people begging for you to support it.)
The Myo, hahaha - I still have one in my drawer next to my Vuzix and NeuroSky, as well as other obsolete pieces of tech I used to interface with robotics. You get maybe 4-5 hours of use out of them for $200-$300, so I view it like concert tickets: expensive but entertaining.
Can't wait to see what tech comes out next and to see you supporting it with ARC, even if it is obsolete in 12 months.
The reason RealSense is there is that it has a very limited customer base. We can't promote RealSense support as a feature because so few people have one; it's such a small market demographic. Also, I had higher hopes for it as a product. They gave up on it too quickly.
With AI, I think integrations will be easier in the future. I'm hoping we get to a point where the AI can write the code to integrate anything it needs. I also feel that once we get the inverse kinematics working better with the Autonomous AI, it removes the need for much else. Once a robot can do anything you ask it, what else does it need? Haha
The depth camera is where we're at now: getting the robot with IK to comprehend depth for interacting with objects or navigating. The Autonomous AI robot skill can navigate decently right now with a single camera, but it takes a lot of convincing, like explaining how close to get to objects based on their size in the camera image.
As you mentioned, I was curious about the billing cost for Autonomous AI. On the day I made that last video, I played with it for pretty much the full day, and the cost was $2.44. The output cost is $0.06, and the input cost is $2.38. I used 950k tokens and 200 API requests.
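For context, some quick math from those numbers (treating nearly all of the 950k tokens as input, which is what the $2.38 vs. $0.06 split suggests):

// Quick math from the day's usage above; assumes the 950k tokens are mostly input.
const totalCost = 2.44;   // dollars for the day
const inputCost = 2.38;   // dollars of that was input
const tokens = 950000;
const requests = 200;

console.log((totalCost / requests).toFixed(4));       // ~0.0122 dollars per request
console.log((inputCost / (tokens / 1e6)).toFixed(2)); // ~2.51 dollars per million input tokens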
I get it. For doing some development or having a play, sure, the costs are minimal, but take that into production on a home robot that runs around the house all day checking security, vacuuming floors, controlling devices and following instructions, and that $3 (or $5 Canadian) a day gets a little pricey every month. I'm also not a big fan of having a robot that runs around my house collecting audio and video of me and my family and then sharing my private data with a 3rd party.
Both of those are points we agree on. I'm guessing the costs are only going to come down. Energy and resource costs will drop over time as competition rises. I doubt we'll see local inference without future breakthroughs.
I’m guessing we’ll start to see personal data containers for cloud systems. That will protect and isolate privacy.
Due to costs, we're quite a bit away from commercial products with LLM technology.
I am a believer that an LLM is the correct approach for generic and adaptive tasks, but I'm guessing we'll see a hybrid of local and cloud inference. Something I've been considering lately is how the data can be used to train a local model for your unique use case. It can train off the cloud process.
Whoa - look at this. I'm watching a YouTube video about Claude 3.7 from Fireship. He showed a graph of Stack Overflow website traffic since ChatGPT was released. I haven't visited Stack Overflow since ChatGPT either. Surprising, not surprising. But surprising nonetheless.
When I am bored, I copy and paste Reddit questions into ChatGPT and then paste the answer back into Reddit. I think they need a bot like Athena hooked in to answer all the stupid questions you get on there. Although Stack Overflow has OverflowAI, and that didn't help their site visits anyway.
Hi DJ, just a curious question here: you earlier mentioned the Kinect and the RealSense for depth perception of objects and navigation. Can't the EZ-B camera do any of that, since you've shown us how to program it to detect objects, colors, and faces? I could also just be missing something.
Depth with a single camera is like closing one eye and trying to reach for an object. Object-relational reasoning is the only way you can determine depth with a single camera: using knowledge about the object and its size in the viewport to determine how close it might be to the viewer. That's not an accurate way of determining 3D positioning. While LLMs can "sort of" do what our brains do to rationalize how close an object is by its size, it's not reliable.
I'll expand on that. If you close one eye and see a coffee cup at a specific size, your brain rationalizes how far it must be from you based on its size. This happens only because your brain knows how big a coffee cup should be. That is the only way it can rationalize a distance, by the object's size with a single camera or eye. With two eyes or two cameras, the brain or software can calculate how far away the object is by analyzing the difference between the two images.
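To put numbers on that single-eye reasoning, here's roughly the arithmetic. This is a sketch with made-up values using the usual pinhole-camera relationship, not anything specific to ARC:

// Monocular estimate: distance from an object's known real-world size.
// All values are illustrative, not from a real calibration.
const realWidthCm = 9;       // how wide we "know" a coffee cup is
const focalLengthPx = 700;   // camera focal length expressed in pixels
const widthInImagePx = 50;   // how wide the cup appears in the frame

// distance = focal length * real width / apparent width
const distanceCm = focalLengthPx * realWidthCm / widthInImagePx;
console.log(distanceCm + " cm away, roughly"); // 126 - only as good as the "known size" guess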
A dual-camera, dual-IR, or laser camera is required to determine distance reliably. This is done by comparing two images to determine the distance of objects: you identify points of interest in both photos, compare their locations in each image, and use the known distance between the two cameras to calculate the object's distance. You can reproduce this effect by closing and opening each eye while looking at a cup on a table. The cup will appear to shift position for each eye. That shift between one eye and the other is what can be used to calculate the distance from the viewer.
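And here is the two-camera version of the same arithmetic, where the shift (disparity) between the two views replaces the need to know the object's size. Again, the numbers are illustrative; a real system gets focal length and baseline from calibration:

// Stereo estimate: depth from the disparity between the left and right images.
const focalLengthPx = 700;   // focal length in pixels
const baselineCm = 6;        // distance between the two cameras
const leftX = 420;           // cup's x position in the left image (pixels)
const rightX = 390;          // cup's x position in the right image (pixels)

const disparityPx = leftX - rightX;                       // 30 px shift between the views
const depthCm = focalLengthPx * baselineCm / disparityPx; // depth = focal * baseline / disparity
console.log(depthCm + " cm to the cup");                  // 140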