
OK great, I noticed GPT-4o basically says "give me an image in any format and I will work it out," where everyone else wants base64:
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "detail": "high",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
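For comparison, here is a minimal sketch of the base64 route for APIs or models that want inline image data instead of a URL; the local file path is just a placeholder:

import base64

from openai import OpenAI

client = OpenAI()

# Read a local copy of the image (placeholder path) and inline it as base64.
with open("boardwalk.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_b64}",
                        "detail": "high",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)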
I really need to do both. @athena, can you provide an overview of the process and advantages of first embedding robotics and ARC domain-specific data into an existing model like LLaMA-3 and then fine-tuning the model on this new embedded data?
Certainly! Embedding and fine-tuning are two critical processes in enhancing the capabilities of a pre-trained language model like LLaMA-3 for specific tasks, such as robotics and ARC (Advanced Robotics Control). Here's an overview of both processes and their advantages:
Embedding Robotics and ARC Domain Data
1. Embedding: Embedding involves encoding new domain information into representations the model can draw on, without altering its core structure. This process helps the model understand and generate responses related to specific domains, such as robotics and ARC (a small sketch follows below).
Process:
Advantages:
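To make the embedding idea concrete, here is a minimal sketch using a local sentence-transformers model to turn domain snippets into vectors you can search; the model name and document strings are placeholders, not part of any ARC or LLaMA-3 workflow:

import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder snippets standing in for real robotics / ARC domain documentation.
docs = [
    "ARC exposes a scripting interface for moving servos.",
    "Inverse kinematics computes joint angles from a desired end-effector pose.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed local embedding model
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

# Embed a question and find the closest snippet by cosine similarity
# (the vectors are normalized, so a plain dot product is enough).
query_vec = encoder.encode(["How do I move a servo from a script?"],
                           normalize_embeddings=True)
scores = doc_vecs @ query_vec.T
print(docs[int(np.argmax(scores))])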
Fine-Tuning the Model on Embedded Data
2. Fine-Tuning: Fine-tuning involves adjusting the weights of the pre-trained model based on the new embedded data. This process helps the model generate more accurate and contextually relevant responses for specific tasks (a rough sketch follows below).
Process:
Advantages:
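As a rough illustration of the fine-tuning step, here is a parameter-efficient (LoRA) sketch using Hugging Face Transformers and PEFT, assuming access to a LLaMA-3 checkpoint and a JSONL file of domain text; the model ID, file name, and hyperparameters are illustrative, not a recommended recipe:

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Meta-Llama-3-8B"  # assumed model id (requires access)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with small trainable LoRA adapters instead of updating
# all of the original weights.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=16,
                                         lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"]))

# Hypothetical dataset: one {"text": ...} record per domain example.
data = load_dataset("json", data_files="arc_robotics.jsonl", split="train")
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama3-arc-lora",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()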
Combining Embedding and Fine-Tuning
Combining embedding and fine-tuning provides a powerful approach to enhancing a pre-trained model for specific domains and tasks. Here's how you can do it (a short sketch of how the pieces fit together follows the steps below):
Practical Steps for Embedding and Fine-Tuning
Collect Data:
Preprocess Data:
Embed Data:
Prepare Fine-Tuning Dataset:
Fine-Tune Model:
Evaluate Model:
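Putting the two pieces together at inference time might look like the sketch below: retrieve the closest embedded domain passage, then hand it to the fine-tuned model as context. Every name here is a placeholder reusing the objects from the earlier sketches, plus a hypothetical generate() wrapper around the fine-tuned model:

def answer(question, encoder, docs, doc_vecs, generate):
    # Embed the question and pick the closest domain passage.
    q_vec = encoder.encode([question], normalize_embeddings=True)
    best = docs[int((doc_vecs @ q_vec.T).argmax())]
    # Hand the retrieved passage to the fine-tuned model as explicit context.
    prompt = f"Context: {best}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)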
Conclusion
By embedding robotics and ARC domain data into an existing model like LLaMA-3 and then fine-tuning the model on this new embedded data, you can significantly enhance the model's capabilities for specific tasks. This approach provides a powerful and efficient way to leverage pre-trained models for domain-specific applications in robotics and ARC.
If you need further assistance or specific examples, feel free to ask!
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
You're overcomplicating things with fine-tuning. Just because there are several functions for working with LLMs doesn't mean you have to use all of them.
You only need to embed.
But you also want to practice prompting. Prompting is the only time that micromanaging is beneficial. You need the prompt to be very specific and direct, without leaving any assumptions. If you leave assumptions, expect the unexpected, which is based on your training data and the randomness of the creativity value.
I don't know how to suggest shoehorning the open-source stuff you're working with to compete with OpenAI. There's a reason one is free and the other employs the most talented experts. But I do love an underdog story; I just don't bet my money on it.
Because OpenAI has the highest amount of computing power on earth, you'll have a difficult time finding compatible and flexible models.
My recommendation, and I know you don't value OpenAI's cost versus result, would be to practice with OpenAI for your inverse kinematics. Our results have been outstanding.
And it really doesn't require programming. It requires strict instructions, as I explained above, which is what humans love to do: give orders, haha.
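To illustrate the strict-instructions point with inverse kinematics, a prompt might pin down units, link lengths, the solution branch, and the exact output format so nothing is left to assumption; the numbers below are made up:

from openai import OpenAI

client = OpenAI()

# A deliberately over-specified prompt: every quantity, the solution branch,
# and the output format are stated explicitly.
prompt = (
    "Solve 2D inverse kinematics for a planar arm with two links.\n"
    "Link 1 length: 10 cm. Link 2 length: 7 cm. Target point: x=12 cm, y=5 cm.\n"
    "Use the elbow-down solution. Report both joint angles in degrees, rounded\n"
    'to one decimal place, as JSON: {"theta1": ..., "theta2": ...}.\n'
    "Return only the JSON, with no explanation."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=100,
)
print(response.choices[0].message.content)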
Oh, one other thing: prioritizing domain knowledge via internal fine-tuning-type processes in newer GPTs uses NLP. So that's a big one if you're asking to fine-tune or embed for multiple tasks as you listed.
If you just wanted to do one task, then you could simplify the process without needing NLP.
Put it this way: when you want your lawn mowed and your plumbing fixed, you call different professionals.
I am building a server with six Nvidia RTX 3090 GPUs with NVLinks, a RODEM8-T2 server motherboard, a 128-lane EPYC CPU, and 256 GB of RAM for the purpose of training models.
I am struggling with the riser cables for the last two cards at the moment, as I can only get them to run at PCIe 3.0 x16, but I just got some new PCIe 5.0 cables, so hopefully they can run at 4.0 full speed. The problem is you lose about 1.5 dB for every inch of PCIe traces and you only have about 8 dB of budget, so the cable runs need to be short (at 1.5 dB per inch, 8 dB allows only roughly 5 inches of trace plus cable). I should have gone with a motherboard that supported MCIO ports and used twinax cables to reduce loss. I am putting the new cables in now; it works great on four cards, and I should be able to do about 2 to 3 epochs a day on six GPU cards.
It flies when running models as well, and you can also run multiple models simultaneously. So vision, chat, and robot control can all run at the same time, or I can run very large models when needed.
That is quite the system!
Most of us are in this space to learn, keep our minds active, and not necessarily just build robots. Making robots is entertaining and provides a great way to showcase our work to others in a way that they will appreciate. While some may want to build commercial robots and will ultimately license ARC as the engine that drives their robots, others are just happy to use the software and pay a monthly subscription fee.
Using Open Source in robotics is challenging. There are no standards, no documentation, no support, and code changes on a daily basis without warning. This is one of the reasons people gravitate towards ARC because it just works, you get support, and you can focus on building robots instead of trying to integrate some open-source code to work with multiple other open-source packages.
The problem with closed-source, cloud-hosted apps is that they can be expensive and invasive, and you can lose control of your project and IP. The costs add up as you become dependent on multiple vendors who can change costs and license terms at will. For example, we have all seen the horror stories of huge GPU, API, and cloud bills, and look at Adobe's latest license agreement: essentially, they want to own rights to and control everyone's content.
ARC initially aligned with Microsoft and Windows, and the reasoning behind this is understandable. ARCx will now work with Linux, Windows, Mac, and presumably other OS environments in the future, supporting both open- and closed-source operating systems. The plugins should also work the same way. For speech recognition (STT) with ARC, you can use Microsoft, Google, and IBM speech recognition programs, but you can't use open-source options like Whisper. This means that if you create a commercial robot later, you have to pay API fees to one of the IT giants. For voice (TTS), you can use Azure, Microsoft, IBM, etc., but again, no open-source option like Piper. For image creation, you work with DALL-E but not open-source image tools like SDXL.
ARCx has an amazing opportunity to be the glue that brings and holds all of these unique open-source tools together for robotics. When you added that one feature to the OpenAI Chatbot that allowed you to enter a URL for any LLM, suddenly ARC could work with any open-source model. When image recognition was added and some minor tweaks were made, suddenly ARC could work with any open-source vision model. ARCx plugins should work the same way: with the leading IT providers, but also with open-source tools. This way, we are free to choose whether we use hosted pay-per-use tools or local open-source tools for TTS, STT, image recognition, image creation, and simulation environments.
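That URL feature lines up with how OpenAI-compatible clients can simply be pointed at a local server. A small sketch, assuming a hypothetical local endpoint (llama.cpp server, Ollama, vLLM, and similar all expose one) and whatever model name it registers:

from openai import OpenAI

# The URL, API key, and model name are placeholders for whatever runs locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # whatever name the local server registers
    messages=[{"role": "user", "content": "Raise the robot's left arm 30 degrees."}],
    max_tokens=100,
)
print(response.choices[0].message.content)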
For ARCx users, it will still be easy to start with the closed-source tools you provide, but if we want to reduce our personal costs or make a commercial robot, Synthiam ARCx can make all of these complex open-source tools easy to use, install, and configure, increasing ARCx functionality and reducing total costs while supporting the open-source communities.