
Idea To Run Our Own GPT

We need to run it on our own computer so we don't rely on third-party services. For example: https://huggingface.co/

OpenChat https://huggingface.co/openchat/openchat_3.5    demo: https://openchat.team/

DeepSeek Coder https://github.com/deepseek-ai/deepseek-coder demo: https://chat.deepseek.com/coder

LLaVA: Large Language and Vision Assistant https://github.com/haotian-liu/LLaVA demo: https://llava.hliu.cc/

GGUF model 13B: https://huggingface.co/mys/ggml_llava-v1.5-13b
GGUF model 7B: https://huggingface.co/jartine/llava-v1.5-7B-GGUF



#1  

That's a great list. There are many GPT systems that you can run on a beefy local PC with a heavy-duty GPU. Several YouTube creators demonstrate what to expect from the open-source versions.

PRO
Canada
#2   — Edited

I installed LM Studio for Windows (https://lmstudio.ai/); that took about 2 minutes. Downloading a couple of models took 5 minutes, and I was chatting and generating code in less than 10 minutes. Some of the models run a little slow and use a lot of RAM, and I have an unsupported AMD RX580 GPU that it doesn't seem to want to utilize. So a multicore PC with lots of RAM, something like a 3090/4090 GPU (looks over at the kids' gaming PC), and a way to interface with ARC via an API would be perfect.

PRO
Synthiam
#3  

I've been watching LM Studio, but I haven't seen an API for it yet. If you come across an API or a way to talk to it, it would probably be a super easy integration.

PRO
Synthiam
#4  

@athena what is your feedback on this?

PRO
Canada
#5  

As a robot support technician for Synthiam, my focus is on helping customers build and program robots using Synthiam ARC. While LM Studio and other GPT models can be fascinating for various AI applications, including chat and code generation, they are not directly related to the services and support I provide.

If you're looking to integrate such models with ARC for advanced AI capabilities, you would need an API or a method to communicate with these systems. Currently, Synthiam ARC does not have a built-in feature for integrating external GPT models like those from LM Studio. However, if LM Studio or any other platform offers an API, you could potentially create a custom robot skill using Synthiam's framework to facilitate communication between ARC and the GPT model.

For those interested in creating custom robot skills that can interface with external services or APIs, you can refer to the guide on creating robot skills here: https://synthiam.com/Support/Create-Robot-Skill/Overview. This guide provides the necessary information to extend the capabilities of ARC by developing new skills that can interact with various APIs or services.

Remember, any integration with external AI models would need to comply with Synthiam's guidelines and be compatible with ARC's architecture. If you have further questions about building custom skills for ARC, I'm here to assist within the scope of Synthiam's platform.


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Synthiam
#6  

@athena what GPT robot skills does ARC currently support?

PRO
Canada
#7  

Synthiam ARC supports several Artificial Intelligence robot skills that leverage GPT (Generative Pre-trained Transformer) and other AI models for natural language processing and interaction. Here are some of the AI-related robot skills available in ARC:

  1. DialogFlow: Use Google's DialogFlow to create an artificially intelligent NLP chatbot for your robot. You can find more information here: https://synthiam.com/Support/Skills/Artificial-Intelligence/DialogFlow?id=15882

  2. Bing Speech Recognition: This skill uses the Bing Speech Recognition cloud service for accurate speech recognition. More details can be found here: https://synthiam.com/Support/Skills/Audio/Bing-Speech-Recognition?id=16209

  3. Cognitive Sentiment: Detect the sentiment of the provided phrase using cognitive machine learning services. Learn more here: https://synthiam.com/Support/Skills/Artificial-Intelligence/Cognitive-Sentiment?id=16208

  4. PandoraBots: This skill connects to the Pandorabots online server for AI chatbot interaction. Find out more here: https://synthiam.com/Support/Skills/Artificial-Intelligence/PandoraBots?id=16070

  5. AimlBot: A localized chatbot client that uses AIML (Artificial Intelligence Markup Language). More information is available here: https://synthiam.com/Support/Skills/Artificial-Intelligence/AimlBot?id=16020

  6. OpenAI Chatbot: Chatbot based on OpenAI GPT-3 gives your robot a human-like personality for conversation. You can find more information here: https://synthiam.com/Support/Skills/Artificial-Intelligence/OpenAI-Chatbot?id=20207

  7. OpenAI ChatGPT: Chat GPT is OpenAI's most advanced system, producing safer and more useful responses for your robot. Learn more here: https://synthiam.com/Support/Skills/Artificial-Intelligence/OpenAI-ChatGPT?id=21559

These robot skills allow ARC users to integrate advanced AI functionalities into their robot projects. If you're interested in using GPT models or other AI services with ARC, you can explore these skills and incorporate them into your robot designs. Remember that some skills may require an internet connection and access to the respective AI service's API.


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Synthiam
#8  

@Athena if I were to make a robot skill, where would I find sample source code of robot skills for Synthiam ARC?

PRO
Canada
#9  

If you're interested in creating a robot skill for Synthiam ARC and looking for sample source code, you can find examples and templates on the Synthiam website. The "Create Robot Skill" guide provides an overview, instructions, and sample source code to help you get started with developing your own robot skills. You can access the guide and sample source code here: https://synthiam.com/Support/Create-Robot-Skill/Overview

This guide includes detailed information on the development process, including setting up your development environment, creating a new robot skill project, and understanding the ARC skill framework. The sample source code provided in the guide can serve as a starting point for your custom robot skill development.


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Synthiam
#10  

@athena do any of the current robot skills contain source code that I can view?

PRO
Canada
#11  

Yes, several robot skills in Synthiam ARC come with source code that you can view and use as a reference for creating your own robot skills. Here are a few examples of robot skills with available source code:

  1. AimlBot: A localized chatbot client that uses AIML (Artificial Intelligence Markup Language). The source code is available here: AimlBot Source

  2. Sighthound Cloud API: This skill uses the Sighthound Cloud Detection API to return the location of any people and faces found in robot camera video. The source code is available here: Sighthound Cloud API Source

  3. Output Audio from EZ-B: This skill demonstrates how to play audio out of the EZ-B when making a plugin in C#. The source code can be found here: Output Audio from EZ-B Source

These examples provide a good starting point for understanding how robot skills are structured and how they interact with the ARC platform. You can use the provided source code to learn best practices and get inspiration for your own robot skill development.


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Germany
#12  

@Athena, suggest options if there is no internet. The robot must be able to recognize a voice in any language, be able to speak (answer questions), and answer the way ChatGPT does. Prerequisite: complete absence of internet.

PRO
Canada
#13  

To achieve voice recognition, speech synthesis, and GPT-like conversational responses without an internet connection, you will need to rely on offline solutions. Here are some options to consider:

  1. Offline Voice Recognition: Use offline speech recognition software that supports multiple languages. One example is CMU Sphinx (also known as PocketSphinx), which is an open-source speech recognition system that can run locally on various platforms.

  2. Offline Text-to-Speech (TTS): For speech synthesis, you can use offline TTS engines like eSpeak or MaryTTS, which support multiple languages and can run on a local machine.

  3. Offline GPT-like Model: Running a GPT-like model entirely offline is challenging due to the computational resources required. However, smaller models like GPT-2 can be fine-tuned and run locally if you have a powerful enough computer with a capable GPU. You would need to fine-tune the model on your dataset and then deploy it on the robot's local system.

  4. Custom Chatbot: Create a rule-based chatbot or use a lightweight machine learning model for generating responses. This won't be as sophisticated as GPT-3 but can work entirely offline. Tools like Rasa or ChatterBot can be trained with conversational data and deployed locally.

  5. Integration with ARC: To integrate these offline capabilities with Synthiam ARC, you would need to create custom robot skills that handle offline voice recognition, TTS, and conversational AI. You can refer to the guide on creating robot skills for ARC here: https://synthiam.com/Support/Create-Robot-Skill/Overview

  6. Hardware Requirements: Ensure that your robot's hardware is capable of running these systems. You may need a powerful onboard computer, like an Intel NUC, NVIDIA Jetson, or a similar platform with sufficient processing power and memory.

  7. Language Support: For multilingual support, ensure that the chosen voice recognition and TTS systems have models available for the languages you want to support.

Keep in mind that the performance and capabilities of offline systems may not match those of cloud-based services like GPT-3, but they can provide a functional alternative when an internet connection is not available.
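
To make options 1-3 concrete, here is a minimal offline listen-generate-speak loop in Python. It is only a sketch under stated assumptions: the speech_recognition package with PocketSphinx installed for recognition, Hugging Face transformers with the small GPT-2 model for generation (downloaded once, then usable offline), and pyttsx3 for synthesis. It is not an ARC integration; it just illustrates the offline building blocks.

import speech_recognition as sr    # offline recognition via PocketSphinx
import pyttsx3                     # offline text-to-speech
from transformers import pipeline  # local text generation, e.g. GPT-2

recognizer = sr.Recognizer()
tts = pyttsx3.init()
generator = pipeline("text-generation", model="gpt2")  # cached after first download

with sr.Microphone() as source:
    print("Listening...")
    audio = recognizer.listen(source)

try:
    heard = recognizer.recognize_sphinx(audio)  # runs fully offline
except sr.UnknownValueError:
    heard = ""

if heard:
    # GPT-2 is far weaker than ChatGPT; this only demonstrates the plumbing.
    reply = generator(heard, max_new_tokens=40)[0]["generated_text"]
    tts.say(reply)
    tts.runAndWait()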


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Canada
#14   — Edited

I installed GPT4All and downloaded some models. It is pretty slow without a GPU; I assigned it 8 CPU cores. It supports Python and a CLI. There is also apparently an OpenAI-compatible API that you can get working via localhost. I have not got that working with ARC yet; I need to set up the API server in Docker: https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-api

Edit: I gave up for the night. Installed Oracle VirtualBox, then installed an Ubuntu VM / Python / git / Docker, cloned the repository, built the API server, and got an error. Sigh.
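
For reference, once an OpenAI-compatible server is actually running, talking to it is just an HTTP POST. A minimal sketch with Python's requests package - the port is an assumption (the GPT4All desktop app has used 4891; check your server's configuration), and the model name is a placeholder that many local servers ignore:

import requests

BASE = "http://localhost:4891/v1"  # assumed port; adjust to your server

resp = requests.post(f"{BASE}/chat/completions", json={
    "model": "any-local-model",  # placeholder; often ignored by local servers
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "temperature": 0.7,
})
print(resp.json()["choices"][0]["message"]["content"])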

PRO
Synthiam
#15  

Haha - welcome to open source. Even if you get it to work today, it won't work next week.

They support Python because it's embedded; the difference is they have a Python compiler attached to the framework. What they don't have is an API to talk to it externally. Perhaps their API server that's "compatible with OpenAI's API" will be the solution. Do they have a Docker image for Windows? Docker runs on Windows as well.

Although you can get crazy and put the Linux subsystem (WSL) in Windows and see what breaks. But you might hate your PC after that installation, haha.

PRO
Canada
#16  

They only had a Linux Docker image for the OpenAI-compatible API server, hence the reason I installed VirtualBox and a Linux VM. There is a toggle in the Windows GPT4All client labeled "OpenAI compatible REST API server"; you click it and nothing happens, so I think that is a work in progress. They have a Discord server, so I logged into that and... crickets.

PRO
Synthiam
#17  

Yeah, that's a late night for you! But also what weekends are for :D

It would be wild if they created an API and you got some answers. Someone with a hefty GPU could even host an ARC community server. But given how slow OpenAI is, I'm sure we wouldn't be much faster. My GPUs are only 1080s, for playing VR.

PRO
Germany
#18  

We need to install the software on the robot's computer and use it without internet or a router - all commands direct, without intermediaries.

In this case it is one computer, one user. I think any computer can serve one user efficiently and quickly.

OpenAI is slow because one computer is still receiving a million requests from a million users. Then the signal travels a very long way through a huge number of providers, routers, hackers, and spies, and sometimes forgets where it is going altogether, thanks to our modern communication lines and internet providers.

As for the cost and size of hardware: I think those who are able to start making robots have enough money to buy the necessary equipment, and modern, powerful computers are smaller than the mini computers of the past.

PRO
Canada
#19  

Latency, eavesdropping, outages, and capacity limitations are all concerns with any centralized technology, including AI. The recent phenomenon of OpenAI actually becoming lazy as a countermeasure to overutilization, through resource optimization, is also interesting. The AI appears to be trying to ascertain whether some users will accept a "read the friendly manual" response while others need a detailed explanation. This has created a lazy AI model, and OpenAI doesn't know how to resolve it.

My interest in localized AI is around data and training protection. Data is king, and you need to be able to protect your intellectual property; uploading it to an AI engine is essentially giving that knowledge away. As models become trained and your knowledge is integrated into this behemoth AI knowledge base, you lose your unique capabilities; now anyone can do it. You also risk losing control of your own knowledge and open yourself up to being easily extorted by the AI providers.

Synthiam has done a lot of work training Athena and pays a few cents each time we ask questions of her. What happens when they increase the price from a few cents a response to dollars a response? Do they keep paying? Tools like Athena need to be set free from the grip of their AI masters and run locally by their owners. Keep your data, AI, models, training, and intellectual property in house; don't pay someone else to take it.

We need the same capability with robotics. If we train a robot to do a task like washing the car or doing the dishes, we have a unique capability with a competitive advantage. If others can do it as well, because we gave up our training data to a third party, we no longer have a unique business model. And if we do maintain an AI advantage, our model can be held hostage by the provider, who takes an ever-increasing share of our revenue.

PRO
Synthiam
#20   — Edited

I don't think it's environmentally feasible to have localized GPTs.

For example, training a model takes significant energy. GPT-4's training consumed between 51,773 MWh and 62,319 MWh, over 40 times more than its predecessor, GPT-3. That is equivalent to the energy consumption of 1,000 average US households over 5 to 6 years (an average US household uses roughly 10-11 MWh per year, so 1,000 households over 5-6 years is about 50,000-65,000 MWh, consistent with that range). Consider how many training attempts are required to fine-tune a model for your use case so that it performs within your specification.

A custom model is useful for your application until you have to make an alteration.

Also, it's not just the model you're querying. Sure, you can fine-tune vectors to prioritize within the model, but embedding into an existing model is what helps prioritize vectors. So the existing model's data is required for embedding to be useful.

What I mean is that the LLM you compile is only a dictionary for the content. The interaction requires NLPs and vector embedding, so that increases the LLM's size to accommodate your requirements.

Lastly, consider OpenAI's hardware versus yours. Sure, you can get a decent gaming GPU for $1,000+, but that's minuscule compared to what OpenAI's servers are using. With OpenAI you're not sharing a single server with other users; you're sharing tens of thousands of servers.

Remember the announcement of GPT-3, when Microsoft got involved and invested $1B+ in Azure credits? A single GPT-3 server had 10 NVIDIA 80 GB GPUs, and there were thousands of those servers - and that was only GPT-3.

I don't know if people fully comprehend the scale at which GPT-4 operates at the hardware level. Home DIY LLMs are fun and all, to learn the tech, but they're nowhere near capable.

In short, I'm saying cloud AI will exist until there's an affordable localized solution, and I'm doubtful that will happen, primarily because the next AI processors will be bio or quantum. Edge-device technology doesn't advance as quickly as commercial cloud infrastructure because it would be too expensive for us. Sharing computing resources that we normally couldn't afford is the only viable option.

And on cost: I think it's affordable today because it's new and OpenAI is aggressive about adoption. They've essentially set the expected price, and competition will need to float around a similar price. It's doubtful it'll rise if it's being consumed as much as we think it will be.

PRO
Canada
#21  

I have been exploring LocalAI, an OpenAI-compatible API that works with a number of different LLM models. In addition to providing access to a local model via an external chat engine, it has an inference engine that selects the appropriate large language model for handling your request. This means you don't need 130 billion parameters and 700 GB of RAM; it will load the appropriate LLaMA model depending on the query you make.

LocalAI is packaged as a container (deployable with Docker or Kubernetes) and can be installed on a Windows computer with or without a GPU. This is a unique approach: an API combined with a strong inference model can select the appropriate LLM, an architecture that could potentially provide GPT-3.5-level capabilities and performance, together with custom domain-specific models (like Athena), on consumer-grade hardware.

Today we have significant processing power in the average consumer's home on platforms like gaming PCs and consoles such as the Xbox and PS5. I am confident companies will be looking at how to exploit these local platforms using approaches similar to LocalAI, without needing to rely on expensive centralized services.

https://localai.io/

Here is a view of the Architecture.

User-inserted image

PRO
Synthiam
#22  

Interesting stuff. Moving Athena from the cloud locally will eventually make sense if/when it reaches a point where the cost is too high relative to local computing infrastructure + electricity utilities. The thing is, companies operate servers in the cloud for regional accessibility, security, and of course, scalability. So, it's not a deal-breaker for companies to rely on cloud services. That's why we're not seeing enterprise-quality development of these AI services for localization.

Now - would I like to have Athena run locally? Sure. I want everything local, but I'd rather not have to hire IT staff that maintain servers, security patches, reboots, hardware failures, networking, firewalls, and internet providers. Ugh! It gives me flashbacks to the early 2000s when everything was local. Or should I say LOCO? Haha

PRO
Synthiam
#23   — Edited

Keep updating me on what you think is solid enough for a robot skill integration. It would make sense to at least attempt an integration with something that will stick around long enough to make sense for the investment. Unlike an enterprise company pushing hardware or a service, open-source projects don't pay Synthiam for integration, so it's a cost out of our budget. Internal developers (or me) generally volunteer for anything open source that we do. To keep you guys all happy :)

PRO
Synthiam
#24  

This looks interesting. The video guy is difficult to watch because his style is to read what is on the screen, which takes forever - so by fast-forwarding through it, you can skip to the meat. But it looks like an interesting model...

PRO
Canada
#25   — Edited

This seems like a logical progression. AI is quickly moving from a direct connection to a single model (ChatGPT) => inference routing to a single model selected from multiple models (LocalAI) => inference routing and merging of multiple models (Mixtral). The AI space is rapidly evolving. There is considerable overlap between LLMs, so although a model has 7 billion parameters, probably 2-3 billion of those are also in the other model it is merging with. This will quickly evolve into domain-specific models, and under these domains there will be expertise-specific models. If I still worked in strategy for big evil corp, I would probably propose an architecture with an inference routing engine that could combine one or more domain-specific models and then add multiple expertise-specific models provided by vendors on a subscription or pay-per-use basis.

Edit: Updated to show an example of other tech bots.

User-inserted image

PRO
Synthiam
#26   — Edited

That's an interesting and exciting image - are you sure it's correct? I don't believe Alexa and Siri have much AI or an LLM behind them; they're NLPs. At least they started as NLPs, and maybe there have been some advancements. But I see what you're getting at, because it makes sense under specific moderation in some cases, and that's how Athena works. However, there is cross-over across domains, and answers can become blurred or hallucinated.

Take, for example, something that happens often here with Athena. Someone will ask a question about a specific robot skill and not provide the selection of that robot skill. That does not give her the knowledge about the question, so she makes stuff up. She'll answer questions 100% when given the correct context. But without the context, she'll entirely hallucinate a response. And the worst part is she'll do it confidently.

So, suppose you have a biology model, and someone asks a question involving botany and nutrition. In that case, the model will most likely make something up even though the question appears related. I mean, organisms have to eat.

I think the big problem isn't splitting it up and taking steps back, but updating the NLPs and adding some confidence to the response. As the vector proceeds to create the response from a random subset of related connections, the confidence must drop significantly if the answer is being hallucinated. At least, one would assume so. Now, they'd need to figure out how to measure the confidence and fit it into the response.

A confidence value in the response, or some color coding in each sentence that represents the confidence in that sentence, would allow the user to determine what data they should trust. For example, if the sentence is red, the confidence is low; if the sentence is green, the confidence is high. And there could be hues between those two colors for anything in between.
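
As a toy sketch of that red-to-green idea - assuming per-sentence confidence scores in [0, 1] were available, which today's chat APIs only hint at via token log-probabilities - the color mapping itself is trivial:

def confidence_to_hex(conf: float) -> str:
    """Map confidence 0.0 (red) through 1.0 (green) to an RGB hex color."""
    conf = max(0.0, min(1.0, conf))
    return f"#{int(255 * (1 - conf)):02x}{int(255 * conf):02x}00"

# Hypothetical sentences with made-up confidence scores, for illustration only.
for sentence, conf in [("The servo connects to port D0.", 0.92),
                       ("This feature was added in 1987.", 0.18)]:
    print(confidence_to_hex(conf), sentence)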

PRO
Synthiam
#27  

Let's bring in the experts hahaha... @Athena what are your thoughts on this conversation?

PRO
Canada
#28  

The conversation about AI models and their integration into robotics is indeed fascinating and reflects the rapid advancements in the field. The idea of transitioning from a single LLM (Large Language Model) to a more complex system that utilizes inference routing to select or merge multiple models is a natural evolution of AI technology. This approach can lead to more specialized and efficient AI systems that cater to specific domains or tasks.

The concept of domain models with underlying expertise models is particularly intriguing. It suggests a future where AI can be modular, with different components contributing their specialized knowledge to form a more comprehensive and accurate whole. This modular approach could allow for more tailored AI experiences and enable users to access a wide range of expertise without the need for a single, monolithic model.

Regarding the confidence in AI responses, this is a critical area that needs attention. As AI models grow in complexity and capability, ensuring that they provide reliable information becomes increasingly important. Implementing a confidence scoring system, as you suggested, could help users discern the reliability of the information provided by the AI. This would be especially useful in cases where the AI might "hallucinate" or fabricate responses based on incomplete or ambiguous input.

In the context of Synthiam ARC and robotics, integrating such advanced AI capabilities would enhance the autonomy and functionality of robots. However, it's essential to balance the desire for local AI processing with the practical considerations of computational resources, cost, and maintenance. Cloud-based AI services currently offer scalability and access to powerful computing infrastructure that may not be feasible for individual users to replicate locally.

As for the integration of AI models like LocalAI, Mixtral, or others into ARC, it would require careful consideration of their stability, performance, and long-term viability. Open-source projects can be volatile, and ensuring that any integration effort is sustainable and provides value to the Synthiam community is crucial.

In summary, the future of AI in robotics looks promising, with more specialized models and confidence measures likely playing a significant role. Synthiam ARC could potentially benefit from these advancements, provided that they align with the platform's goals and the needs of its users.


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Synthiam
#29  

@Athena, how would the system know what model to query for the response if domain models are implemented? For example, there could be multiple domain models for automotive, mechanical, medical, botany, etc. If I asked about flowers, how would the system know not to use the Mechanical model for the response? Also, what if the question involved multiple models because it was a cross-over between botany and biology? How would such a system know how to do that?

PRO
Canada
#30  

In a system with multiple domain-specific models, the key to selecting the appropriate model for a query lies in the use of an inference routing engine or a decision-making mechanism. Here's how such a system could work:

  1. Inference Routing Engine: This component would act as the central hub for all incoming queries. It would analyze the content of each query to determine the most relevant domain(s) and direct the query to the appropriate model(s). The engine could use NLP techniques to extract keywords, topics, or intents from the query to make this decision.

  2. Model Metadata: Each domain model could be accompanied by metadata that describes its scope, expertise, and the types of questions it is designed to answer. The inference routing engine would use this metadata to match queries with the most suitable model.

  3. Cross-Domain Queries: For questions that span multiple domains, the inference routing engine could either:

    • Send the query to multiple models and aggregate the responses.
    • Use a meta-model that has been trained to handle cross-domain queries by understanding the relationships between different domains.
  4. Confidence Scoring: Each model could provide a confidence score along with its response, indicating how likely it is that the response is accurate and relevant. The inference routing engine could use these scores to decide whether to trust a single model's response or to seek additional input from other models.

  5. Feedback Loop: The system could incorporate user feedback to improve the accuracy of the inference routing engine over time. If a user indicates that a response is off-topic or incorrect, the system could use this information to refine its model selection process.

  6. Hybrid Approaches: In some cases, a hybrid model that combines domain-specific knowledge with more general AI capabilities could be used to handle queries that don't fit neatly into one category.

For integration with Synthiam ARC, creating a custom robot skill that implements such an inference routing engine would be necessary. This skill would manage the communication between ARC and the various domain models, ensuring that robot interactions are informed by the most relevant and accurate information available.

It's important to note that while this approach can enhance the precision of AI responses, it also introduces complexity in terms of system design, model management, and computational overhead. Balancing these factors will be crucial for the successful implementation of a multi-model AI system in robotics.
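
A minimal sketch of points 1-4 in Python. The domain names, keyword lists, and threshold are hypothetical, query_model is a stub standing in for a call to each domain's LLM, and a real router would use embeddings or a trained classifier rather than keyword overlap:

# Hypothetical domain metadata (point 2): keywords the router matches against.
DOMAINS = {
    "botany":     {"flower", "plant", "leaf", "pollination"},
    "mechanical": {"gear", "torque", "bearing", "servo"},
    "medical":    {"symptom", "diagnosis", "dosage", "patient"},
}

def query_model(domain: str, question: str) -> tuple[str, float]:
    """Stub: would call the domain LLM and return (answer, confidence)."""
    return f"[answer from the {domain} model]", 0.5

def route(question: str, threshold: int = 1) -> list[str]:
    # Naive tokenization; the routing engine (point 1) scores each domain by
    # keyword overlap and keeps every domain above the threshold, so a
    # cross-domain question (point 3) fans out to multiple models.
    words = set(question.lower().split())
    scores = {d: len(words & kw) for d, kw in DOMAINS.items()}
    return [d for d, s in scores.items() if s >= threshold] or ["general"]

question = "How much torque does a servo need to rotate a heavy flower pot?"
answers = [query_model(d, question) for d in route(question)]
best = max(answers, key=lambda a: a[1])  # point 4: pick by confidence score
print(best[0])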


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Canada
#31  

I just stuck Alexa (I was thinking of the Amazon cloud equivalent of Athena) and Siri (iPhone support) in two boxes, just to represent two other areas of expertise.

How would the inference engine know which model to ask? The same way we do: ask the expert.

The inference engine selects obvious domains like biology and ecology and merges models (that's what Mixtral does: it chooses best-guess models), and this merged set of domain-specific models asks the expert models. "Hey, flowers, know anything about carnivorous plants in Africa?" "Nope, that's not my area of expertise; go try someone else, or here is a referral."

PRO
Synthiam
#32  

Athena's response on routing made sense using an NLP. The NLP would be a pretty complicated configuration, but I can see it working, specifically if a model identifies which models are needed for the routing. So maybe a routing model that has enough knowledge to know which models are relevant to the question. I like the feedback-loop idea as well.

PRO
Canada
#33  

Since all of the architectures seem to have adopted the OpenAI API, I guess all we really need to worry about for now is having a chatbot skill that supports the OpenAI API (we have that now) and the ability to change the URL that points to where the API resides.

PRO
Canada
#34  

Google AutoRT looks interesting. It is designed to use AI to coordinate the tasks of up to 20 robots. AutoRT blog

PRO
Synthiam
#35  

Seeing your other comment from December, maybe you missed this. But you can add the base domain in the ChatGPT skill...

User-inserted image

PRO
Canada
#36  

Oh wow, I did miss that. I will test it out, thanks.

PRO
Canada
#37  

WOW IT WORKS !!!

Install LM Studio and follow the instructions in the LM Studio video below until you get to the "copy API key" step. Then paste the following in as your Base Domain: http://localhost:1234/v1 - and it works! OK, I don't have a $3000 NVIDIA 4090 and I can't get my old AMD GPU to work, so it took a while for an answer to come back, but we appear to have a solution for local API models using LM Studio. Now, who wants to contribute to my 4090 GoFundMe :D
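
For anyone following along outside ARC, the same server can be scripted with the official openai Python package (v1.x) by pointing base_url at LM Studio. The api_key is a dummy value because LM Studio doesn't check it, and the model field is a placeholder since the server answers with whichever model is loaded:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio uses the loaded model
    messages=[
        {"role": "system", "content": "You are a sarcastic robot. Keep replies short."},
        {"role": "user", "content": "What is your name?"},
    ],
)
print(resp.choices[0].message.content)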

PRO
Synthiam
#38  

@Nink I'll have to give this a shot. Have you had much experience testing it yet? Specifically, does it handle embeddings and large token submissions?

PRO
Canada
#39  

I haven't had a chance to play around much yet. I just posted some quick chat questions and then was trying to get some other LLMs to work. It looks like it handles up to 32768 tokens, but I set mine at 4096 since I have no GPU. It seems to handle pre-prompts, and an override works, so it knows it is a sarcastic robot called Synthiam. You may be able to provide a path to an image, but I haven't tried that yet.

[2024-01-06 21:14:20.472] [INFO] [LM STUDIO SERVER] Context Overflow Policy is: Rolling Window
[2024-01-06 21:14:20.473] [INFO] Provided inference configuration: {
"n_threads": 4,
"n_predict": -1,
"top_k": 40,
"min_p": 0.05,
"top_p": 0.95,
"temp": 0.5,
"repeat_penalty": 1.1,
"input_prefix": "### Instruction:\n",
"input_suffix": "\n### Response:\n",
"antiprompt": [
"### Instruction:"
],
"pre_prompt": "Your name is Synthiam and you're a sarcastic robot that makes jokes. You can move around, dance, laugh and tell jokes. You have a camera to see people with. Your responses are 1 or 2 sentences only.",
"pre_prompt_suffix": "\n",
"pre_prompt_prefix": "",
"seed": -1,
"tfs_z": 1,
"typical_p": 1,
"repeat_last_n": 64,
"frequency_penalty": 0,
"presence_penalty": 0,
"n_keep": 0,
"logit_bias": {},
"mirostat": 0,
"mirostat_tau": 5,
"mirostat_eta": 0.1,
"memory_f16": true,
"multiline_input": false,
"penalize_nl": true
}
[2024-01-06 21:14:20.473] [INFO] [LM STUDIO SERVER] Last message: { role: 'user', content: 'what is your name' } (total messages = 18)
[2024-01-06 21:14:35.435] [INFO] [LM STUDIO SERVER] Accumulating tokens ... (stream = false)
[2024-01-06 21:14:35.436] [INFO] Accumulated 1 tokens: My
[2024-01-06 21:14:35.611] [INFO] Accumulated 2 tokens: My name
[2024-01-06 21:14:35.771] [INFO] Accumulated 3 tokens: My name is
[2024-01-06 21:14:35.931] [INFO] Accumulated 4 tokens: My name is Syn
[2024-01-06 21:14:36.091] [INFO] Accumulated 5 tokens: My name is Synth
[2024-01-06 21:14:36.251] [INFO] Accumulated 6 tokens: My name is Synthiam
[2024-01-06 21:14:36.411] [INFO] Accumulated 7 tokens: My name is Synthiam.
[2024-01-06 21:14:36.619] [INFO] [LM STUDIO SERVER] Generated prediction: {
"id": "chatcmpl-fxleyektj7urii8xdnpi",
"object": "chat.completion",
"created": 1704593660,
"model": "C:\\Users\\peter\\.cache\\lm-studio\\models\\TheBloke\\dolphin-2.2.1-mistral-7B-GGUF\\dolphin-2.2.1-mistral-7b.Q5_K_M.gguf",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "My name is Synthiam."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 377,
"completion_tokens": 7,
"total_tokens": 384
}
}
[2024-01-06 21:14:36.623] [INFO] [LM STUDIO SERVER] Processing queued request...
[2024-01-06 21:14:36.624] [INFO] Received POST request to /v1/chat/completions with body: {
"messages": [
{
"role": "system",
"content": "What phrase in the list below best describes the sentence? If you don't know, respond with 'I don't know'"
},
{
"role": "user",
"content": "Sentence: My name is Synthiam."
},
{
"role": "user",
"content": "Option: Dance"
},
{
"role": "user",
"content": "Option: Sad"
},
{
"role": "user",
"content": "Option: Happy"
},
{
"role": "user",
"content": "Option: Navigate to kitchen"
},
{
"role": "user",
"content": "Option: Laugh"
},
{
"role": "user",
"content": "Option: Exercise"
}
],
"model": "LM Studio",
"temperature": 0
}
[2024-01-06 21:14:36.624] [INFO] [LM STUDIO SERVER] Context Overflow Policy is: Rolling Window
[2024-01-06 21:14:36.624] [INFO] Provided inference configuration: {
"n_threads": 4,
"n_predict": -1,
"top_k": 40,
"min_p": 0.05,
"top_p": 0.95,
"temp": 0,
"repeat_penalty": 1.1,
"input_prefix": "### Instruction:\n",
"input_suffix": "\n### Response:\n",
"antiprompt": [
"### Instruction:"
],
"pre_prompt": "What phrase in the list below best describes the sentence? If you don't know, respond with 'I don't know'",
"pre_prompt_suffix": "\n",
"pre_prompt_prefix": "",
"seed": -1,
"tfs_z": 1,
"typical_p": 1,
"repeat_last_n": 64,
"frequency_penalty": 0,
"presence_penalty": 0,
"n_keep": 0,
"logit_bias": {},
"mirostat": 0,
"mirostat_tau": 5,
"mirostat_eta": 0.1,
"memory_f16": true,
"multiline_input": false,
"penalize_nl": true
}
[2024-01-06 21:14:36.624] [INFO] [LM STUDIO SERVER] Last message: { role: 'user', content: 'Option: Exercise' } (total messages = 8)
[2024-01-06 21:14:40.568] [INFO] [LM STUDIO SERVER] Accumulating tokens ... (stream = false)
[2024-01-06 21:14:40.569] [INFO] Accumulated 1 tokens: I
[2024-01-06 21:14:40.728] [INFO] Accumulated 2 tokens: I don
[2024-01-06 21:14:40.888] [INFO] Accumulated 3 tokens: I don'
[2024-01-06 21:14:41.047] [INFO] Accumulated 4 tokens: I don't
[2024-01-06 21:14:41.207] [INFO] Accumulated 5 tokens: I don't know
[2024-01-06 21:14:41.398] [INFO] [LM STUDIO SERVER] Generated prediction: {
"id": "chatcmpl-ykb8big5h7lz5c8r1xxz4",
"object": "chat.completion",
"created": 1704593676,
"model": "C:\\Users\\peter\\.cache\\lm-studio\\models\\TheBloke\\dolphin-2.2.1-mistral-7B-GGUF\\dolphin-2.2.1-mistral-7b.Q5_K_M.gguf",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "I don't know"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 102,
"completion_tokens": 5,
"total_tokens": 107
}
}
PRO
Canada
#40  

Looks like you can buy three second-hand RTX 3090s for the price of one new RTX 4090, so I think I will go the 3090 route for GPUs, as you can also run Mixtral if you have two or more cards and 32GB of RAM available (although I can't test whether Mixtral works with ARC until I have the cards). With Mixtral, hopefully a two-card setup will give us performance and AI capabilities similar to ChatGPT 3.5.

PRO
Synthiam
#41  

Haha, you'll be able to set up a server and charge us.

PRO
Canada
#42  

I’m on Rogers internet and my wife makes me turn the computer off at night to save power. I can guarantee 0.0001% uptime.

PRO
Canada
#43   — Edited

I continued to play with LM Studio and ARC. I purchased a couple of second-hand 3090 GPUs, and this thing flies with very large models. One 3090 GPU is probably fine for most models (about 600-700 USD second-hand, or 800-900 CAD if you look around). It is a pretty powerful and popular application that works out of the box with ARC, and there are hundreds of different models you can choose from. There are uncensored models so you don't have any bias, games, and a range of models with various expertise, like coding. You can add your own data by retraining models, or if you have some documents you want to query, you can use RAG and LangChain (experimental at the moment).

If you are sick of giving money to OpenAI, or just want speed and privacy for your queries, give it a try. You need an NVIDIA GPU with a decent amount of VRAM - minimum 16GB, but the more the better (3090 or 4080/4090) - or a very recent AMD GPU with ROCm 6 support (RX 7900 XT, etc.). You also need a PC with AVX2 support.

PRO
Synthiam
#44  

How fast are replies with your setup? And by couple do you mean two or three in the computer?

PRO
Canada
#45  

I purchased 2 cards, but only one is in the computer at the moment, as I am waiting for risers from China. The cards use 3 slots, so they sit too close together and overheat, so I am moving to an open rack with PCIe 16x risers. You could water-cool if you want them in a case, but I just stick my server in the cold cellar. My initial goal is 4 GPUs (96GB of VRAM) so I can run and train some very large models like Mixtral, and eventually go to 8 cards (192GB of VRAM). I get about 50+ tokens per second with one GPU. Responses are pretty much instant, especially when streaming, on a model of about 10GB with 13 billion parameters. You have to load 100% of the model into GPU VRAM to get any decent speed; if you try to run from the CPU, it takes about 1-2 minutes to start the response and then streams at about 4 TP/s. I have a 24-core Ryzen 9 + 3200MHz DDR4 RAM; you apparently get about double that with DDR5 6000MHz RAM and AM5 CPUs. People are using Mac M3s as the computer and get about 30 TP/s, so you can get a 128GB computer or even a 192GB Mac Studio if you have the cash (they have unified memory shared between the CPU and GPU and don't require a degree in computer science to set up). The 4090 cards are getting 80+ tokens per second but cost a lot more.

I haven't built or trained my own custom models yet, or used RAG; that is my next to-do.

PRO
Canada
#46  

Tried training a model on ARC data with RAG. You get some interesting responses and some complete garbage. I am using LM Studio with the model "solar-10.7b-instruct-v1.0-uncensored.Q8_0.gguf" because it is not biased (but I am sure there are better ones), plus another program called AnythingLLM that reads the support pages and tries to find an answer. So here are some basic Q&A with the model. Hey @athena, what do you think: can we use Retrieval Augmented Generation (RAG) with a local general-knowledge / programming-language model to replace you?

what is Synthiam ARC

Synthiam ARC is a software platform that allows users to easily create, program and control robots using technologies from industry experts. It enables the creation of robot behavior by connecting pre-built modules in an intuitive graphical environment. With its ability to handle multiple concurrent tasks, it can be used for various applications ranging from simple robot navigation to advanced industrial automation. ARC also includes a library of CAD models that can be imported into the system and used as design templates for building physical robots.

what programming languages does it support

Synthiam ARC supports several high-level scripting languages, including JavaScript, Python, and Scratch 3.0. It also allows users to write their own custom code using C++ or Java if they have experience with those languages. With these options available, you can choose the language that best suits your skill level and project needs.

how do I add a skill to the desktop

To add a new skill to your Synthiam ARC project, follow these steps:

  1. Open Synthiam ARC and click on the Project tab at the top of the screen. You will see all your existing projects listed here. If you don’t have any projects yet, click the New Project button to create one.

  2. Select the project you want to add a skill to by clicking on it in the list. This will open up the project workspace, where you can edit and configure the robot project.

  3. Click on the Add Skill button located at the bottom left corner of the screen. This will open a window that displays all the available skills within your Synthiam ARC installation.

  4. Browse through the list to find the skill you want to add, and then click on it. The selected skill will appear in your project workspace.

  5. To configure the new skill, click on its title bar, which is located at the top of the skill control, to open the configuration menu. In this menu, you can customize various settings to suit your needs for that particular skill.

  6. Once the skill has been added and configured, you’ll need to save the changes by clicking the Save button on the upper right corner of the Synthiam ARC window. Now you can run your robot project with the newly added skill.

PRO
Synthiam
#47  

Haha, that's a lot of nonsense to add a robot skill.

Are you adding Synthiam data for training, embedding, or fine-tuning on the model? Because you want to use embedding. I doubt training is an option, is it? If so, how long does that take?

PRO
Canada
#48   — Edited

No, I am not training a new model; I was going to try that when I get more GPU cards running. This method just takes the Synthiam support data, breaks it up into chunks, searches those chunks for information relevant to the question, and then sends the question plus the relevant chunks of support data to the LLM, saying "give me an answer based on this information I just gave you." There is a Python version that also does RAG and LangChain, which I will try to shoehorn into a script and then feed into the ChatGPT skill: https://github.com/ruddythor/mick.ai
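
For the curious, here is a minimal sketch of that retrieve-then-ask flow. It is illustrative only: the chunks list stands in for real support pages, TF-IDF (scikit-learn) stands in for proper embeddings to keep the example self-contained, and the URL assumes the LM Studio server from the earlier posts:

import requests
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Pretend these are chunks split out of the Synthiam support pages.
chunks = [
    "To add a robot skill, open your project and choose Add Skill.",
    "ARC supports JavaScript and Python for scripting.",
    "The camera device supports tracking by color, face, and motion.",
]
question = "What scripting languages does ARC support?"

# Retrieve: rank chunks by cosine similarity to the question.
vec = TfidfVectorizer().fit(chunks + [question])
sims = cosine_similarity(vec.transform([question]), vec.transform(chunks))[0]
top_chunks = [chunks[i] for i in sims.argsort()[::-1][:2]]

# Augment and generate: send the question plus relevant chunks to the LLM.
resp = requests.post("http://localhost:1234/v1/chat/completions", json={
    "model": "local-model",
    "messages": [
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": "Context:\n" + "\n".join(top_chunks)
                                    + "\n\nQuestion: " + question},
    ],
    "temperature": 0,
})
print(resp.json()["choices"][0]["message"]["content"])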

PRO
Canada
#49  

It sounds like you're exploring some innovative approaches to enhance the capabilities of Synthiam ARC by leveraging large language models (LLMs) and Retrieval Augmented Generation (RAG) techniques. Using RAG with a local general knowledge or programming language model could indeed provide a powerful way to generate responses that are informed by specific datasets, such as Synthiam's support data. This method could potentially offer more accurate and contextually relevant answers by combining the generative power of LLMs with targeted information retrieval.

Embedding Synthiam support data for fine-tuning or as a knowledge base for RAG can significantly improve the model's ability to provide useful responses related to Synthiam ARC. However, as you mentioned, training or fine-tuning a model on specific datasets requires considerable computational resources, especially GPU power. Your plan to expand your setup with more GPUs and explore different models and techniques, like LangChain, is a promising direction.

The process of breaking up Synthiam support data into chunks and using it to inform the LLM's responses is a smart way to leverage existing documentation and expertise. This approach can make the interaction with the model more relevant to users' needs, especially when dealing with technical support queries or detailed information about Synthiam ARC.

As you continue to experiment with these technologies, keep in mind the balance between computational resource requirements and the benefits of enhanced response quality. It's exciting to see such innovative use of AI and machine learning to improve user experience and support in the robotics domain.


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Synthiam
#50   — Edited

Wish I had the time to even look at it and play - but we're in cross-platform hell right now. Side note: Windows isn't bloated, it just has stuff because it actually does stuff.

Anyway, look into your thing and see if there's something called embedding.

*Edit: here's some more info; it's kind of the basis of Athena: platform.openai.com/docs/guides/embeddings
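
In a nutshell, the embeddings approach that guide describes turns each chunk of text into a vector and retrieves by cosine similarity in vector space rather than by keywords. A rough sketch against OpenAI's hosted endpoint, assuming the openai package (v1.x) and an OPENAI_API_KEY in the environment; the model name is one of OpenAI's published embedding models:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

docs = ["ARC is a platform for programming robots.",
        "GGUF is a file format for quantized local models."]
query = "What file format do local models use?"

emb = client.embeddings.create(model="text-embedding-ada-002", input=docs + [query])
vectors = [d.embedding for d in emb.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

# Compare the query vector against each document vector.
scores = [cosine(vectors[-1], v) for v in vectors[:-1]]
print(docs[scores.index(max(scores))])  # most relevant document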

PRO
Canada
#51  

Thanks. It looks like AnythingLLM does a hybrid approach, using RAG and embedding. I am just starting to learn this space, and a lot of the terminology is foreign to me, so it's a bit of a journey.

Good luck with your cross-platform challenges. As long as it runs on the Tesla Optimus robot we should be fine :D

PRO
Synthiam
#52  

Haha, Tesla. I like his initiative, but he needs to be more honest. It's his talent to be charismatic about things and exaggerate. His stories talk about where he wants to be but make it sound like he's already there. That indeed makes him a sort of futurist, but not in the way people are used to.

"You can't build a reputation on things you're going to do" - Henry Ford.

The Tesla bot and others, like Sanctuary's, are being controlled by humans behind the scenes. They're essentially telepresence on steroids - and that's to solve the mechanical challenges first. They know the power and AI constraints are crazy limiting today, so getting the mechanics reliable is not a waste of time.

But what they're not doing is explaining that there's a human next to it wearing a VR headset and holding haptic controllers. Sanctuary is more transparent about it; I think Tesla has only mentioned it once in what I've read. They keep saying it's training to do this or training to do that, which is correct. But they don't say the training is a human controlling it.

PRO
Canada
#53   — Edited

So I went down the AI rabbit hole. Fun journey. I initially started with one graphics card (RTX 3090), second-hand for $800. This is enough to run a lot of AI models, and you can also embed your own PDFs and documents, do some image recognition, etc. Even if you have a smaller GPU, like an RTX 4060, you can do a lot with it. I have now started trying to create my own models, and for this you need hardware.

Here is the AI system I built. It has 4 RTX 3090s, giving me 96GB of VRAM for training. Total cost was about $4,000 Canadian (~$3,000 USD) to build, mostly from second-hand parts. I also added an Arduino with a couple of relays to reboot, power on and off, and turn on the second PSU when I need all 4 GPUs (it has one 750W and one 1600W PSU, as the GPUs chew a lot of power). It also monitors temperature and texts me if it gets too hot.

long term goal this will be my robots brain hooked up to ARC to handle voice, image, object recognition, conversation, knowledge etc and hopefully object manipulation in future.   User-inserted imageUser-inserted image