Canada
Asked
Resolved by DJ Sures!

Using Openai Skill With Other Products

OK, great. I noticed that GPT-4o seems to say "give me any image in any format and I will work it out", whereas everyone else wants base64:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            "detail": "high"
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0].message.content)


PRO
Canada
#1  

Curious: is the image sent as a JPG, PNG, etc., or is it converted to base64 and sent as text? It looks like LM Studio will only take a photo in base64 format, whereas GPT-4o will take a PNG or JPG.

PRO
Synthiam
#2   — Edited

The JPEG binary is encoded to ASCII via base64 (per the OpenAI specification).

#3  

That example is for a URL, and you do not have a web server. If you hosted a web server on the internet with images, you could use that example. Instead, the proper usage is to base64-encode the binary and include it with the message.

Additionally, the message JSON is assembled by OpenAI's API. The message is not formatted and created by the robot skill, as it uses their SDK API for their standard. Because the message works with OpenAI, we can assume the third-party system that you're using has issues.
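To make the "base64 encode the binary and include it with the message" step concrete, here is a minimal Python sketch. It is not the robot skill's code: the function name `build_image_message` and the placeholder bytes are assumptions for illustration, and the message layout simply follows the `image_url` shape from the example in the original question.

```python
import base64
import json

def build_image_message(image_bytes: bytes, image_type: str = "jpeg",
                        prompt: str = "Describe this image") -> dict:
    """Build one chat message carrying an inline base64 image (hypothetical helper)."""
    # base64 turns the raw bytes into ASCII text that is safe to embed in JSON
    b64_text = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:image/{image_type};base64,{b64_text}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# Placeholder bytes stand in for a real camera frame
message = build_image_message(b"\xff\xd8\xff\xe0 not a real image")
print(json.dumps(message, indent=2))
```

The point is that the image never travels as a separate binary attachment: it is text inside the `url` field, where a web address would otherwise go.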

PRO
Synthiam
#4  

I think this conversation is starting to get off topic, as it's about third-party products using the same OpenAI protocol. I'll make a new thread for it.

PRO
Synthiam
#5  

Okay here we go.... Let me see. This is how the image is sent using the SDK for the Open AI API...

        using (var api = new OpenAIService(aiOptions)) {

          var chat = new ChatCompletionCreateRequest();
          chat.Messages = new List<ChatMessage>();

          chat.Messages.Add(new ChatMessage() {
            Role = "user",
            Contents = new List<MessageContent>() {
               OpenAI.ObjectModels.RequestModels.MessageContent.ImageBinaryContent(_cameraImage, "JPEG")
            }
          });

          chat.Temperature = Convert.ToInt32(_cf.STORAGE[ConfigTitles.SETTING_TEMPERATURE]) / 10f;

          if (_cf.STORAGE[ConfigTitles.MODEL].ToString().StartsWith("other", StringComparison.InvariantCultureIgnoreCase))
            chat.Model = _cf.STORAGE[ConfigTitles.MODEL_OTHER].ToString();
          else
            chat.Model = _cf.STORAGE[ConfigTitles.MODEL].ToString();

          // Setup system message
          // -------------------------------------------------------------------------        
          if (string.IsNullOrWhiteSpace(requestStr)) {

            chat.Messages.Add(ChatMessage.FromSystem("Describe this image"));
          } else {

            chat.Messages.Add(ChatMessage.FromSystem(requestStr));
          }

          // Send open ai message and get response
          // -------------------------------------------------------------------------
          var responsePhrase = await api.ChatCompletion.CreateCompletion(chat);

Like Synthiam support says, there's no way the JSON is "created manually by the robot skill". The API has a specification for the JSON format, and the SDK fulfills that specification; both are by OpenAI. The output of the SDK will be a formatted document that the OpenAI API requires.

If you're using a third-party product that claims to be compatible with OpenAI, I'd challenge them that something isn't compatible.

PRO
Canada
#6  

OK, thanks. I don't know C#, but looking at the code, it appears to be sending this as a binary-encoded image and not the base64 image that my tool wants to receive.

OpenAI.ObjectModels.RequestModels.MessageContent.ImageBinaryContent(_cameraImage, "JPEG")

OpenAI Python Example Base64

 return base64.b64encode(image_file.read()).decode('utf-8')

https://platform.openai.com/docs/guides/vision

PRO
Synthiam
#7  

No, it means the command takes an IMAGE in BINARY FORMAT as its input. It's essentially the same command that your Python is showing. Python is a different language, so the commands will be different. Also, it appears that the Python isn't using an OpenAI SDK for the API.

This is the OpenAI command code you're asking about:

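    /// <summary>
    ///    Static helper method to create MessageContent from binary image
    ///    OpenAI currently supports PNG, JPEG, WEBP, and non-animated GIF
    /// </summary>
    /// <param name="binaryImage">The image binary data as byte array</param>
    /// <param name="imageType">The type of image</param>
    /// <param name="detail">The detail property</param>
    /// <returns></returns>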
    public static MessageContent ImageBinaryContent(
        byte[] binaryImage,
        string imageType,
        string? detail = "auto"
    )
    {
        return new()
        {
            Type = "image_url",
            ImageUrl = new()
            {
                Url = string.Format(
                    "data:image/{0};base64,{1}",
                    imageType,
                    Convert.ToBase64String(binaryImage)
                ),
                Detail = detail
            }
        };
    }
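
For anyone who wants to check what actually arrives at the receiving end, the string produced by a helper like the one above can be taken apart again. The following Python sketch assumes the `data:image/...;base64,...` layout shown in the C# code; the function name `decode_data_url_payload` and the sample bytes are made up for illustration and are not part of any SDK.

```python
import base64

JPEG_MAGIC = b"\xff\xd8\xff"  # every JPEG file starts with these three bytes

def decode_data_url_payload(data_url: str) -> bytes:
    """Return the raw image bytes carried in a base64 data URL (hypothetical helper)."""
    header, _, payload = data_url.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)

# Stand-in bytes with a JPEG-style header, not a full image
sample_bytes = JPEG_MAGIC + b"\xe0\x00\x10JFIF"
sample_url = "data:image/jpeg;base64," + base64.b64encode(sample_bytes).decode("ascii")

decoded = decode_data_url_payload(sample_url)
print(sample_url[:27])                  # data:image/jpeg;base64,/9j/
print(decoded == sample_bytes)          # True
print(decoded.startswith(JPEG_MAGIC))   # True
```

A base64 payload that begins with `/9j/` is a quick sign that the text really is an encoded JPEG, since those four characters are the encoding of the three JPEG header bytes.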
PRO
Synthiam
#8   — Edited

I asked for the OpenAI robot skill to be updated. I noticed it was using uppercase JPEG when it should be lowercase, although that shouldn't matter. But maybe it does to your open source thing.

I.e., it was

"data:image/JPEG;base64,{1}",

and is now

"data:image/jpeg;base64,{1}",

shrug
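
As a rough illustration of why the case change could matter to some receivers: media type names are defined as case-insensitive, but a consumer that compares strings exactly would still reject the uppercase form. The Python sketch below is hypothetical. `accepts_strict` and `accepts_tolerant` are invented stand-ins, and nothing here claims to reproduce how any particular third-party server parses the field.

```python
def media_type_of(data_url: str) -> str:
    """Extract the media type, e.g. 'image/JPEG', from a data URL."""
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    header = data_url[len("data:"):].split(",", 1)[0]
    return header.split(";", 1)[0]

def accepts_strict(data_url: str) -> bool:
    # Hypothetical strict consumer: exact, case-sensitive match
    return media_type_of(data_url) == "image/jpeg"

def accepts_tolerant(data_url: str) -> bool:
    # Media types are case-insensitive, so normalize before comparing
    return media_type_of(data_url).lower() == "image/jpeg"

old_url = "data:image/JPEG;base64,/9j/"
new_url = "data:image/jpeg;base64,/9j/"
print(accepts_strict(old_url), accepts_tolerant(old_url))  # False True
print(accepts_strict(new_url), accepts_tolerant(new_url))  # True True
```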