
OK great, I noticed GPT-4o basically says "give me an image in any format and I will work it out," where everyone else wants base64:
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "detail": "high",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
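For comparison, here is a minimal sketch of the base64 route for APIs or models that want inline image data instead of a URL; the local file path is just a placeholder:

import base64

from openai import OpenAI

client = OpenAI()

# Read a local copy of the image (placeholder path) and inline it as base64.
with open("boardwalk.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_b64}",
                        "detail": "high",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)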
I really need to do both. @athena, can you provide an overview of the process and advantages of first embedding robotics and ARC domain-specific data into an existing model like LLaMA-3 and then fine-tuning the model on this new embedded data?
Certainly! Embedding and fine-tuning are two critical processes in enhancing the capabilities of a pre-trained language model like LLaMA-3 for specific tasks, such as robotics and ARC (Advanced Robotics Control). Here's an overview of both processes and their advantages:
Embedding Robotics and ARC Domain Data
1. Embedding: Embedding involves encoding new domain information into representations the model can draw on, without altering its core structure. This process helps the model understand and generate responses related to specific domains, such as robotics and ARC (a small sketch follows below).
Process:
Advantages:
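To make the embedding idea concrete, here is a minimal sketch using a local sentence-transformers model to turn domain snippets into vectors you can search; the model name and document strings are placeholders, not part of any ARC or LLaMA-3 workflow:

import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder snippets standing in for real robotics / ARC domain documentation.
docs = [
    "ARC exposes a scripting interface for moving servos.",
    "Inverse kinematics computes joint angles from a desired end-effector pose.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed local embedding model
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

# Embed a question and find the closest snippet by cosine similarity
# (the vectors are normalized, so a plain dot product is enough).
query_vec = encoder.encode(["How do I move a servo from a script?"],
                           normalize_embeddings=True)
scores = doc_vecs @ query_vec.T
print(docs[int(np.argmax(scores))])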
Fine-Tuning the Model on Embedded Data
2. Fine-Tuning: Fine-tuning involves adjusting the weights of the pre-trained model based on the new embedded data. This process helps the model generate more accurate and contextually relevant responses for specific tasks (a rough sketch follows below).
Process:
Advantages:
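As a rough illustration of the fine-tuning step, here is a parameter-efficient (LoRA) sketch using Hugging Face Transformers and PEFT, assuming access to a LLaMA-3 checkpoint and a JSONL file of domain text; the model ID, file name, and hyperparameters are illustrative, not a recommended recipe:

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Meta-Llama-3-8B"  # assumed model id (requires access)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with small trainable LoRA adapters instead of updating
# all of the original weights.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=16,
                                         lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"]))

# Hypothetical dataset: one {"text": ...} record per domain example.
data = load_dataset("json", data_files="arc_robotics.jsonl", split="train")
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama3-arc-lora",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()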
Combining Embedding and Fine-Tuning
Combining embedding and fine-tuning provides a powerful approach to enhancing a pre-trained model for specific domains and tasks. Here's how you can do it (a short sketch of how the pieces fit together follows the steps below):
Practical Steps for Embedding and Fine-Tuning
Collect Data:
Preprocess Data:
Embed Data:
Prepare Fine-Tuning Dataset:
Fine-Tune Model:
Evaluate Model:
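Putting the two pieces together at inference time might look like the sketch below: retrieve the closest embedded domain passage, then hand it to the fine-tuned model as context. Every name here is a placeholder reusing the objects from the earlier sketches, plus a hypothetical generate() wrapper around the fine-tuned model:

def answer(question, encoder, docs, doc_vecs, generate):
    # Embed the question and pick the closest domain passage.
    q_vec = encoder.encode([question], normalize_embeddings=True)
    best = docs[int((doc_vecs @ q_vec.T).argmax())]
    # Hand the retrieved passage to the fine-tuned model as explicit context.
    prompt = f"Context: {best}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)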
Conclusion
By embedding robotics and ARC domain data into an existing model like LLaMA-3 and then fine-tuning the model on this new embedded data, you can significantly enhance the model's capabilities for specific tasks. This approach provides a powerful and efficient way to leverage pre-trained models for domain-specific applications in robotics and ARC.
If you need further assistance or specific examples, feel free to ask!
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
You're overcomplicating things with fine-tuning. Just because there are several functions for working with LLMs doesn't mean you have to use all of them.
You only need to embed.
But you also want to practice prompting. Prompting is the only time that micromanaging is beneficial. You need the prompt to be very specific and direct, without leaving any assumptions. If you leave assumptions, expect the unexpected, which is based on your training data and the randomness of the creativity value.
I don't know how to suggest shoehorning the open-source stuff you're working with to compete with OpenAI. There's a reason one is free and the other employs the most talented experts. But I do love an underdog story; I just don't bet my money on it.
Because OpenAI has the highest amount of computing power on earth, you'll have a difficult time finding compatible and flexible models.
My recommendation, and I know you don't value OpenAI's cost versus result, would be to practice with OpenAI for your inverse kinematics. Our results have been outstanding.
And it really doesn't require programming. It requires strict instructions, as I explained above, which is what humans love to do: give orders, haha.
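To illustrate the strict-instructions point with inverse kinematics, a prompt might pin down units, link lengths, the solution branch, and the exact output format so nothing is left to assumption; the numbers below are made up:

from openai import OpenAI

client = OpenAI()

# A deliberately over-specified prompt: every quantity, the solution branch,
# and the output format are stated explicitly.
prompt = (
    "Solve 2D inverse kinematics for a planar arm with two links.\n"
    "Link 1 length: 10 cm. Link 2 length: 7 cm. Target point: x=12 cm, y=5 cm.\n"
    "Use the elbow-down solution. Report both joint angles in degrees, rounded\n"
    'to one decimal place, as JSON: {"theta1": ..., "theta2": ...}.\n'
    "Return only the JSON, with no explanation."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=100,
)
print(response.choices[0].message.content)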
Oh, one other thing: prioritizing domain knowledge via internal fine-tuning-type processes in newer GPTs uses NLP. So that's a big one if you're asking to fine-tune or embed for multiple tasks as you listed.
If you just wanted to do one task, then you could simplify the process without needing NLP.
Put it this way: when you want your lawn mowed and your plumbing fixed, you call different professionals.
I am building a server with six Nvidia RTX 3090 GPUs with NVLinks, a RODEM8-T2 server motherboard, a 128-lane EPYC CPU, and 256 GB of RAM for the purpose of training models.
I am struggling with the riser cables for the last two cards at the moment, as I can only get them to run at PCIe 3.0 x16, but I just got some new PCIe 5.0 cables, so hopefully they can run at 4.0 full speed. The problem is you lose about 1.5 dB for every inch of PCIe traces and you only have about 8 dB of budget, so the cable runs need to be short (at 1.5 dB per inch, 8 dB allows only roughly 5 inches of trace plus cable). I should have gone with a motherboard that supported MCIO ports and used twinax cables to reduce loss. I am putting the new cables in now; it works great on four cards, and I should be able to do about 2 to 3 epochs a day on six GPU cards.
It flies when running models as well, and you can also run multiple models simultaneously. So vision, chat, and robot control can all run at the same time, or I can run very large models when needed.
That is quite the system!
Most of us are in this space to learn, keep our minds active, and not necessarily just build robots. Making robots is entertaining and provides a great way to showcase our work to others in a way that they will appreciate. While some may want to build commercial robots and will ultimately license ARC as the engine that drives their robots, others are just happy to use the software and pay a monthly subscription fee.
Using Open Source in robotics is challenging. There are no standards, no documentation, no support, and code changes on a daily basis without warning. This is one of the reasons people gravitate towards ARC because it just works, you get support, and you can focus on building robots instead of trying to integrate some open-source code to work with multiple other open-source packages.
The problem with closed-source, cloud-hosted apps is that they can be expensive and invasive, and you can lose control of your project and IP. The costs add up as you become dependent on multiple vendors who can change costs and license terms at will. For example, we have all seen the horror stories of huge GPU, API, and cloud bills, and look at Adobe's latest license agreement: essentially, they want to own rights to and control everyone's content.
ARC initially aligned with Microsoft and Windows, and the reasoning behind this is understandable. ARCx will now work with Linux, Windows, Mac, and presumably other OS environments in the future, supporting both open- and closed-source operating systems. The plugins should also work the same way. For speech recognition (STT) with ARC, you can use Microsoft, Google, and IBM speech recognition programs, but you can't use open-source options like Whisper. This means that if you create a commercial robot later, you have to pay API fees to one of the IT giants. For voice (TTS), you can use Azure, Microsoft, IBM, etc., but again, no open-source option like Piper. For image creation, you work with DALL-E but not open-source image tools like SDXL.
ARCx has an amazing opportunity to be the glue that brings and holds all of these unique open-source tools together for robotics. When you added that one feature to the OpenAI Chatbot that allowed you to enter a URL for any LLM, suddenly ARC could work with any open-source model. When image recognition was added and some minor tweaks were made, suddenly ARC could work with any open-source vision model. ARCx plugins should work the same way: with the leading IT providers, but also with open-source tools. This way, we are free to choose whether we use hosted pay-per-use tools or local open-source tools for TTS, STT, image recognition, image creation, and simulation environments.
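That URL feature lines up with how OpenAI-compatible clients can simply be pointed at a local server. A small sketch, assuming a hypothetical local endpoint (llama.cpp server, Ollama, vLLM, and similar all expose one) and whatever model name it registers:

from openai import OpenAI

# The URL, API key, and model name are placeholders for whatever runs locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # whatever name the local server registers
    messages=[{"role": "user", "content": "Raise the robot's left arm 30 degrees."}],
    max_tokens=100,
)
print(response.choices[0].message.content)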
For ARCx users, it will still be easy to start with the closed-source tools you provide, but if we want to reduce our personal costs or make a commercial robot, Synthiam ARCx can make all of these complex open-source tools easy to use, install, and configure, increasing ARCx functionality and reducing total costs while supporting the open-source communities.