Canada
Asked
Resolved by DJ Sures!

Using Openai Skill With Other Products

OK, great. I noticed GPT-4o says "give me any image in any format and I will work it out," whereas everyone else wants base64.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg";,
            "detail": "high"
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0].message.content)




PRO
Canada
#1  

Curious: is the image sent as a JPG, PNG, etc., or is it converted to base64 and sent as text? It looks like LM Studio will only take a photo in base64 format, while GPT-4o will take a PNG or JPG.

PRO
Synthiam
#2   — Edited

JPEG binary encoded to ASCII via base64 (OpenAI specification).

#3  

That example is for a URL, which requires a web server you do not have. If you hosted a web server on the internet with images, you could use that example. Instead, the proper usage is base64-encoding the binary and including it with the message.

Additionally, the message JSON is assembled by OpenAI's own SDK/API, not formatted and created by the robot skill, since the skill uses their SDK and therefore their standard. Because the message works with OpenAI, we can assume the third-party system you're using has issues.
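
A rough sketch of that base64 approach in Python, based on the example above (the local file path is just a placeholder):

import base64
from openai import OpenAI

client = OpenAI()

# Read a local JPEG and encode it to base64 text.
with open("camera.jpg", "rb") as f:  # placeholder path
    b64_image = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    # The image travels inside the message as a data URL.
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{b64_image}",
                        "detail": "high",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)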

PRO
Synthiam
#4  

I think this conversation is starting to get off topic, as it's about third-party products using the same OpenAI protocol. I'll make a new thread for it.

PRO
Synthiam
#5  

Okay, here we go... let me see. This is how the image is sent using the SDK for the OpenAI API:

        using (var api = new OpenAIService(aiOptions)) {

          var chat = new ChatCompletionCreateRequest();
          chat.Messages = new List<ChatMessage>();

          chat.Messages.Add(new ChatMessage() {
            Role = "user",
            Contents = new List<MessageContent>() {
               OpenAI.ObjectModels.RequestModels.MessageContent.ImageBinaryContent(_cameraImage, "JPEG")
            }
          });

          chat.Temperature = Convert.ToInt32(_cf.STORAGE[ConfigTitles.SETTING_TEMPERATURE]) / 10f;

          if (_cf.STORAGE[ConfigTitles.MODEL].ToString().StartsWith("other", StringComparison.InvariantCultureIgnoreCase))
            chat.Model = _cf.STORAGE[ConfigTitles.MODEL_OTHER].ToString();
          else
            chat.Model = _cf.STORAGE[ConfigTitles.MODEL].ToString();

          // Setup system message
          // -------------------------------------------------------------------------        
          if (string.IsNullOrWhiteSpace(requestStr)) {

            chat.Messages.Add(ChatMessage.FromSystem("Describe this image"));
          } else {

            chat.Messages.Add(ChatMessage.FromSystem(requestStr));
          }

          // Send open ai message and get response
          // -------------------------------------------------------------------------
          var responsePhrase = await api.ChatCompletion.CreateCompletion(chat);

Like Synthiam support says, there's no way the JSON is "created manually by the robot skill." The API has a specification for the JSON format, and the SDK fulfills that specification; both are by OpenAI. The output of the SDK will be a formatted document in the form the OpenAI API requires.

If you're using a third-party product that claims to be compatible with OpenAI, I'd challenge them that something isn't compatible.
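
For reference, the content element the SDK ends up sending looks roughly like this (base64 payload truncated here):

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/jpeg;base64,/9j/4AAQ...",
    "detail": "auto"
  }
}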

PRO
Canada
#6  

OK, thanks. I don't know C#, but looking at the code it appears it is sending a binary-encoded image and not the base64 image that my tool wants to receive.

OpenAI.ObjectModels.RequestModels.MessageContent.ImageBinaryContent(_cameraImage, "JPEG")

OpenAI Python Example Base64

 return base64.b64encode(image_file.read()).decode('utf-8')

https://platform.openai.com/docs/guides/vision

PRO
Synthiam
#7  

No - it means taking an IMAGE in BINARY FORMAT. It's essentially the same call your Python example is showing; Python is a different language, so the commands will be different. Also, it appears that Python example isn't using an OpenAI SDK for the API.

This is the OpenAI command code you're asking about:

    /// <summary>
    ///    Static helper method to create MessageContent from binary image
    ///    OpenAI currently supports PNG, JPEG, WEBP, and non-animated GIF
    /// </summary>
    /// <param name="binaryImage">The image binary data as byte array</param>
    /// <param name="imageType">The type of image</param>
    /// <param name="detail">The detail property</param>
    /// <returns></returns>
    public static MessageContent ImageBinaryContent(
        byte[] binaryImage,
        string imageType,
        string? detail = "auto"
    )
    {
        return new()
        {
            Type = "image_url",
            ImageUrl = new()
            {
                Url = string.Format(
                    "data:image/{0};base64,{1}",
                    imageType,
                    Convert.ToBase64String(binaryImage)
                ),
                Detail = detail
            }
        };
    }
PRO
Synthiam
#8   — Edited

I asked for the OpenAI robot skill to be updated. I noticed it was using uppercase JPEG, which should be lowercase, although that shouldn't matter. But maybe your open-source thing cares.

i.e., it was

"data:image/JPEG;base64,{1}",

and is now

"data:image/jpeg;base64,{1}",

shrug

PRO
Canada
#9  

Thanks DJ. It no longer gives me an image format error in LM Studio with the Phi-3 Vision model, and if I reset everything it works after a few attempts, with some messed-up responses at first. I think that may be the model/tool on my end (I need to look into this, but it works fine locally). It did successfully describe the image a couple of times, so this is a great start.
It did work with openrouter.ai with various models. That is a model-routing engine that uses the OpenAI format, so it should work with all the vision models they have. I tested Google Gemini via OpenRouter, and we can use that engine (it won't work direct with Gemini, since they don't support the OpenAI format). It also worked with Claude with vision via OpenRouter; the model name I used is anthropic/claude-3-opus. I could not get that one to work directly either (maybe config issues, as I've never used it via API before).

I will try some other models directly, or maybe try LiteLLM. It is an open-source OpenAI API gateway, so it is good at cleaning up compatibility issues.

If anyone wants to try images with hosted models, here are my OpenRouter settings; all you have to change is the model name for the service or model you are using. (It also works with OpenAI GPT-4o, so you only have one bill.)

User-inserted image
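
If the screenshot doesn't load, the same idea with the plain OpenAI Python client looks roughly like this; the key is a placeholder, and the only things that change per provider are the base URL and model name:

from openai import OpenAI

# OpenRouter speaks the OpenAI protocol; LM Studio's local server usually
# listens on http://localhost:1234/v1 (check your own install).
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="anthropic/claude-3-opus",  # the model name mentioned above
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=50,
)
print(response.choices[0].message.content)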

PRO
Canada
#10  

I increased the camera image size from 640x480 to 1920x1080 and increased the context length to 8172 so it would have lots of RAM to play with. I lowered the temperature a bit and tweaked a few other settings. Things are working much better. I guess I need to clean up the room.

{ "index": 0, "message": { "role": "assistant", "content": " room. The room is messy with clothes and blankets on the bed, a white wall in the background, a doorway to the left of the bed, and a person sitting on the edge of the bed." }, "finish_reason": "stop" }

PRO
Synthiam
#11  

That's so weird that the lowercase worked. The documentation says JPEG in capitals, but content types are usually specified in lowercase. Oh well. This is a new industry, so I guess we have to give them slack for poor coding:)

PRO
Canada
#12  

Well, when I say it's working, I mean it works if you keep sending images and ignore the garbage, and eventually you get one back that describes what it sees. This has to be on my end though, as it works all the time with the various paid services I have tried now.

I'll keep trying other local vision models. It would be great if I could send an image every second, since apart from electricity it's free (with ARCx maybe even stream, since it's so much faster). That way I can analyze what I get back, perform actions when it sees items it is triggered on, and build a table of items it sees along with location information.

"Has anyone seen my wallet, my keys, my phone, the remote?" "Yeah, you left it on the kitchen table, the BBQ," etc.

My wife likes the idea as well, it being local on my computer versus the cloud. She walked into the room when I was playing with OpenAI, and it described her and everything around her; she was not impressed that it had all this data.
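
Roughly what I have in mind for that loop, as an untested sketch against a local OpenAI-compatible endpoint (the endpoint, file path, model name, and trigger words are all placeholders):

import base64
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")  # placeholder local server
TRIGGERS = ["wallet", "keys", "phone", "remote"]  # items to watch for

while True:
    # Grab the latest camera frame from disk and encode it.
    with open("latest_frame.jpg", "rb") as f:  # placeholder path
        b64 = base64.b64encode(f.read()).decode("utf-8")

    reply = client.chat.completions.create(
        model="local-vision-model",  # placeholder model name
        max_tokens=60,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "List the objects you see, very briefly."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    ).choices[0].message.content.lower()

    # Log any watched items together with when they were seen.
    for item in TRIGGERS:
        if item in reply:
            print(f"Saw {item} at {time.strftime('%H:%M:%S')}")

    time.sleep(1)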

PRO
Canada
#13  

I tried a few more local models. The Microsoft Phi-3 Vision model is the one that gave me most of the problems, so don't use it (I went with that one because it actually came from a reputable company, but it turns out it sucks big time).

So far this one seemed to work the best for vision with ARC and LM Studio locally: Lewdiculous/Eris_PrimeV4.69-Vision-32k-7B-GGUF-Imatrix. It is meant for role-play games, so don't have a conversation with it; it will drive you crazy. Create a prompt that says to keep the description VERY SHORT, and maybe limit tokens so it doesn't waffle on. https://huggingface.co/Lewdiculous/Eris_PrimeV4.69-Vision-32k-7B-GGUF-Imatrix

This should run on any modern video card with 16GB or more of VRAM, for example an Nvidia RTX 3080/3090/4080/4090 or AMD RX 6800XT/6900XT/7900XT, and will give you results in under 1 second.

PRO
Synthiam
#14   — Edited

It does stream. That’s how all data is transferred. If you want to run it in a loop, do so. You don’t need to handle processing in the action script. Use the variable response inline after the control command.

if you want the robot to move based on image data, tell the ai.

write to the ai in the description that it is an arm. Describe the lengths. Describe the servo positions. Describe the servo ports. And provide the current servo values.

and ask the ai to return a list of new servo values. Parse the list and move the servos.

remember - you don’t need anything special - you’re in charge of what the ai does by telling it. That’s what is great about llm’s. You don’t have to be a programmer - you just have to be good at explaining tasks.
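
For example, a system prompt along these lines (the link lengths, ports, and positions here are made up for illustration):

# Hypothetical system prompt describing the arm and demanding a parseable reply.
ARM_PROMPT = """You control a 3-joint robot arm.
Link lengths: shoulder-to-elbow 120 mm, elbow-to-wrist 100 mm, wrist-to-gripper 60 mm.
Servos: D0 = shoulder, D1 = elbow, D2 = wrist. Valid positions are 1-180.
Current positions: D0=90, D1=45, D2=120.
When asked to move, reply ONLY with a comma-separated list of PORT=POSITION pairs,
for example: D0=95,D1=60,D2=110. No other text."""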

PRO
Canada
#15  

Sadly, my local AI LLMs are nowhere near as smart as GPT-4o or Google DeepMind/PaLM-E at local robot control. They are lucky if they know what a servo is, let alone how to calculate positions and servo values to perform inverse kinematics. The good news is you can fine-tune existing models like Llama 2/3 and, given enough data, teach them about robotics, sensors, actuators, etc. I guess if I train one on all of these papers and the Synthiam documentation, it should make a good starting point: https://github.com/GT-RIPL/Awesome-LLM-Robotics

BTW, is there an "ARC user manual" in PDF format? I could scrape the data from the website, I guess, but a nice collection of training data would be helpful if you have one.

PRO
Synthiam
#16  

I don't believe fine-tuning is the terminology you're looking to use. Embedding would be more appropriate.

  • Fine-tuning is prioritizing vectors of an existing model
  • Embedding is adding vectors to an existing model

The great thing about an LLM is that, as a programmer, you don't need to program:). You talk to it as if you're talking to a person. So if you want it to do something, you provide the details in verbal instructions. You must also specify the response you expect. The LLM list you provided looks like there are several attempts you can play with.

Parsing the response will be your only "programming" effort, because you're most likely going to ask for the servo responses as a comma-separated list of PORT=POSITION.
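
A minimal parsing sketch for that format (hypothetical; feed the resulting values into whatever moves your servos):

# Parse a reply like "D0=95,D1=60,D2=110" into {"D0": 95, "D1": 60, "D2": 110}.
def parse_servo_reply(reply):
    positions = {}
    for pair in reply.strip().split(","):
        port, value = pair.split("=")
        positions[port.strip()] = int(value)
    return positions

print(parse_servo_reply("D0=95,D1=60,D2=110"))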

PRO
Canada
#17  

I really need to do both. @athena can you provide an overview of the process and advantages of first embedding robotics and ARC domain-specific data into an existing model like Llama-3 and then fine-tuning the model on this new embedded data?

PRO
Canada
#18  

Certainly! Embedding and fine-tuning are two critical processes in enhancing the capabilities of a pre-trained language model like LLaMA-3 for specific tasks, such as robotics and ARC (Advanced Robotics Control). Here's an overview of both processes and their advantages:

Embedding Robotics and ARC Domain Data

1. Embedding: Embedding involves adding new information to the existing model without altering its core structure. This process helps the model understand and generate responses related to specific domains, such as robotics and ARC.

Process:

  • Data Collection: Gather a comprehensive dataset related to robotics and ARC. This can include research papers, technical documentation, tutorials, and user manuals.
  • Preprocessing: Clean and preprocess the data to ensure it is in a format suitable for embedding. This may involve tokenization, normalization, and removing irrelevant information.
  • Embedding: Use the preprocessed data to create embeddings that represent the new information. These embeddings are then integrated into the existing model.

Advantages:

  • Domain-Specific Knowledge: Embedding allows the model to understand and generate responses related to specific domains without altering its core structure.
  • Efficiency: Embedding is generally faster and requires less computational resources compared to fine-tuning.
  • Flexibility: You can embed multiple domains into a single model, making it versatile for various applications.

Fine-Tuning the Model on Embedded Data

2. Fine-Tuning: Fine-tuning involves adjusting the weights of the pre-trained model based on the new embedded data. This process helps the model generate more accurate and contextually relevant responses for specific tasks.

Process:

  • Data Preparation: Use the embedded data to create a fine-tuning dataset. This dataset should include input-output pairs that represent the desired behavior of the model.
  • Training: Fine-tune the model using the prepared dataset. This involves adjusting the model's weights to minimize the error between the predicted and actual outputs.
  • Evaluation: Evaluate the fine-tuned model to ensure it performs well on the specific tasks. This may involve using validation datasets and performance metrics.

Advantages:

  • Improved Accuracy: Fine-tuning helps the model generate more accurate and contextually relevant responses for specific tasks.
  • Task-Specific Optimization: Fine-tuning allows you to optimize the model for specific tasks, such as controlling servos or interpreting sensor data in robotics.
  • Enhanced Performance: Fine-tuning can significantly improve the model's performance on domain-specific tasks compared to the pre-trained model.

Combining Embedding and Fine-Tuning

Combining embedding and fine-tuning provides a powerful approach to enhancing a pre-trained model for specific domains and tasks. Here's how you can do it:

  1. Embed Domain Data: Start by embedding the robotics and ARC domain data into the pre-trained model. This provides the model with the necessary knowledge to understand and generate responses related to these domains.
  2. Fine-Tune on Embedded Data: Fine-tune the model using the embedded data to optimize its performance for specific tasks. This helps the model generate more accurate and contextually relevant responses.

Practical Steps for Embedding and Fine-Tuning

  1. Collect Data:

    • Gather a comprehensive dataset related to robotics and ARC, including research papers, technical documentation, tutorials, and user manuals.
  2. Preprocess Data:

    • Clean and preprocess the data to ensure it is in a suitable format for embedding and fine-tuning.
  3. Embed Data:

    • Use the preprocessed data to create embeddings and integrate them into the pre-trained model.
  4. Prepare Fine-Tuning Dataset:

    • Create a fine-tuning dataset using the embedded data, including input-output pairs that represent the desired behavior of the model.
  5. Fine-Tune Model:

    • Fine-tune the model using the prepared dataset to optimize its performance for specific tasks.
  6. Evaluate Model:

    • Evaluate the fine-tuned model using validation datasets and performance metrics to ensure it performs well on the specific tasks.

Conclusion

By embedding robotics and ARC domain data into an existing model like LLaMA-3 and then fine-tuning the model on this new embedded data, you can significantly enhance the model's capabilities for specific tasks. This approach provides a powerful and efficient way to leverage pre-trained models for domain-specific applications in robotics and ARC.
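
As a concrete illustration of the embedding step only, here is a minimal retrieval-style sketch (the model name and documentation chunks are placeholders; the retrieved chunk would then be pasted into the LLM prompt as context):

from openai import OpenAI

client = OpenAI()
chunks = ["ARC camera skill overview ...", "Servo control basics ..."]  # placeholder docs

def embed(texts):
    # Turn each text chunk into a vector.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

doc_vecs = embed(chunks)
question_vec = embed(["How do I move a servo?"])[0]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# Pick the documentation chunk most similar to the question.
best = max(range(len(chunks)), key=lambda i: cosine(question_vec, doc_vecs[i]))
print("Most relevant chunk:", chunks[best])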

If you need further assistance or specific examples, feel free to ask!


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Synthiam
#19  

You're overcomplicating things with fine-tuning. Just because there are several functions for working with LLMs doesn't mean you have to use all of them.

you only need to embed

but - you also want to practice prompting. Prompting is the only time that micromanaging is beneficial. You need the prompt to be very specific and direct without leaving any assumptions. If you leave assumptions, expect the unexpected - which is based on your training data and the randomness of the creativity value.

I don’t know how to suggest shoehorning the open source stuff you’re working with to compete with OpenAI. There’s a reason one is free and the other employs the most talented experts. But I do love an underdog story - I just don’t bet my money on it:)

Because open ai has the highest amount of computing power on earth - you’ll have a difficult time finding compatible and flexible models.

My recommendation - and I know you don't value OpenAI's cost vs. result - would be to practice with OpenAI for your inverse kinematics. Our results have been outstanding.

and it really doesn't require programming. It requires strict instructions as I explained above - which is what humans love to do: give orders, haha.

PRO
Synthiam
#20  

Oh, one other thing: prioritizing domain knowledge via internal fine-tuning-type processes in newer GPTs uses NLP. So that's a big one if you're asking to fine-tune or embed for multiple tasks as you listed.

if you just wanted to do one task, then you could simplify the process without needing nlp.

Put it this way, when you want your lawn mowed and plumbing fixed, you call different professionals.

PRO
Canada
#21  

I am building a server with six Nvidia RTX 3090 GPUs with NVLink, a RODEM8-T2 server motherboard, and a 128-lane EPYC CPU with 256GB of RAM, for the purpose of training models.

User-inserted image

User-inserted image

PRO
Canada
#22  

I am struggling with the riser cables for the last two cards at the moment, as I can only get them to run at PCIe 3.0 x16, but I just got some new PCIe 5.0 cables, so hopefully they can run at 4.0 full speed. The problem is you lose about 1.5dB for every inch of PCIe trace and you really only have about 8dB of margin, so the cable runs need to be short. I should have gone with a motherboard that supported MCIO ports and used twinax cables to reduce loss. I'm putting the new cables in now; it works great on 4 cards, but I should be able to do about 2 to 3 epochs a day on 6 GPU cards.

This flies running models as well, and you can also run multiple models simultaneously, so vision, chat, and robot control can all run at the same time, or I can run very large models when needed.

PRO
Synthiam
#23  

that is quite the system!

PRO
Canada
#24   — Edited

Most of us are in this space to learn, keep our minds active, and not necessarily just build robots. Making robots is entertaining and provides a great way to showcase our work to others in a way that they will appreciate. While some may want to build commercial robots and will ultimately license ARC as the engine that drives their robots, others are just happy to use the software and pay a monthly subscription fee.

Using Open Source in robotics is challenging. There are no standards, no documentation, no support, and code changes on a daily basis without warning. This is one of the reasons people gravitate towards ARC because it just works, you get support, and you can focus on building robots instead of trying to integrate some open-source code to work with multiple other open-source packages.

The problem with closed-source, cloud-hosted apps is that they can be expensive and invasive, and you can lose control of your project and IP. The costs add up as you become dependent on multiple vendors who can increase costs and change license terms at will. For example, we have all seen the horror stories of huge GPU, API, and cloud bills, and look at Adobe's latest license agreement: essentially, they want rights to and control over everyone's content.

ARC initially aligned with Microsoft and Windows, and the reasoning behind this is understandable. ARCx will now work with Linux, Windows, Mac, and presumably other OS environments in the future, supporting both open- and closed-source operating systems. The plugins should work the same way. For speech recognition (STT) with ARC, you can use Microsoft, Google, and IBM speech recognition programs, but you can't use open-source options like Whisper. This means that if you create a commercial robot later, you have to pay API fees to one of the IT giants. For voice (TTS), you can use Azure, Microsoft, IBM, etc., but again, no open-source option like Piper. For image creation, you work with DALL-E but not open-source image tools like SDXL.

ARCx has an amazing opportunity to be the glue that brings and holds all of these unique open-source tools together for robotics. When you added that one feature to the OpenAI chatbot skill that allowed you to enter a URL for any LLM, suddenly ARC could work with any open-source model. When image recognition was added and some minor tweaks were made, suddenly ARC could work with any open-source vision model. ARCx plugins should work the same way: with the leading IT providers, but also with open-source tools. This way, we are free to choose whether we use hosted pay-per-use tools or local open-source tools for TTS, STT, image recognition, image creation, and simulation environments.

For ARCx users, it will still be easy to start with the closed-source tools you provide, but if we want to reduce our personal costs or make a commercial robot, Synthiam ARCx can make all of these complex open-source tools easy to use, install, and configure, increasing ARCx functionality and reducing total costs while supporting the open-source communities.

PRO
Synthiam
#25  

Don't forget the robot skills are open source. I don't think people look at GitHub or follow the link. If something isn't open, then I post it. Usually it's a pain to maintain the GitHub repo when most of our users have no interest in programming - and I support that. I don't know why someone should need to program to make a robot (shrug). I mean, you don't need to program to do anything else with a computer.

PRO
Canada
#26  

Please don't get me wrong, the last thing I want is for ARC to be open source, and I do appreciate that the skills are open source. As I mentioned, there are lots of challenges with open source, including no support. If I said "this doesn't work with my open-source models" and ARC were free open source, you would probably have said go suck a lemon; instead, you opened a new thread as a support ticket and said let's get this working. SUPER HAPPY CUSTOMER.

#27  

Quote:

most of our users have no interest in programming - and I support that. I don’t know why someone should need to program to make a robot shrug I mean, you don’t need to program to do anything else with a computer.
@DJ, I get what you say. When I found your platform EZ Builder years ago, I was thrilled that I would not have to learn how to write code to get my robot to work the way I wanted. I thought I could just build my robot with my hands and do some simple EZ Scripting to get it to move and talk a little. After I got into EZ Script I realized how much it could do and how easy it was to understand. I wanted my robot to do more, but I found that I really did have to learn a little more code to get it to act the way I wanted. I did the extra work of learning more about EZ Script and was amazed at how my robot came to life.

I'm very grateful that your creation, EZ Script and EZ Builder (now ARC), was out there when I needed it. It gave me the courage to do more and push my boundaries. I do know that your creation has tons of lines of scripting behind the EZ Commands and Skills in EZ Builder that made my simple lines of EZ Script work. Now that I'm learning and working with JavaScript so I can move on to ARCx, I can really appreciate what must be going on behind the curtain of the old EZ Builder and now ARC.

As a footnote, after learning enough JavaScript to get me going, I've converted all my EZ Scripts over in preparation for ARCx. I've moved on to the more powerful (your words, DJ) JavaScript and am very happy I did. A lot of the robot performance bugs I've been putting up with over the years are now gone; not because EZ Script was at fault, but because I was able to streamline my scripts and learn how they really run and affect my robot.

PRO
Synthiam
#28  

Ah okay, I see both sides of your perspectives. Dave, the good news is the ability to code in ARCx isn't removed. It's assisted by AI, meaning you can ask Athena to help write code in the editor.

Regarding what Nink wants to do - such as the weed-killer robot - it'll be easier with ARCx than ARC, but you can do that today with ARC. You'd simply need to be good at prompt engineering.

Such as defining, in the prompt, the abilities the robot has to move and identify. You would also need to provide an updated timestamp on every query so it knows when things happen and the time between them.

I don't think people realize how powerful the GPT skill is today on its own. All you have to do is give it instructions in a prompt and, in response, perform the actions it suggests. Rinse, repeat:)