data:image/s3,"s3://crabby-images/59c29/59c29139b57601f6c6ec9b1ea0c9ea3d16ad81e0" alt="Profile Picture"
PRO
smiller29
USA
Asked
What are the most cost-effective Text-to-Speech (TTS) and Speech-to-Text (STT) skills when using the ChatGPT skill within ARC? I am interested in setting up a local Language Model (LLM) server for ChatGPT but still require TTS and STT capabilities to achieve my desired outcomes. It appears that many of these skills involve paid services, so I am seeking options that offer good performance at the best price. Ideally, I would prefer not to rely on cloud-based services for these functionalities. If there are any viable local options, please provide more details.
Related Hardware LattePanda
Related Control
3-in-1 IMU
Take a look in the audio section for the robot skill list. I use Bing Speech Recognition and Azure Speech Synthesis. Those are my favorites and are free with an ARC subscription.
When it comes to setting up Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities for your LattePanda using Synthiam ARC, there are a few options to consider, especially if you're looking for cost-effective and potentially local solutions.
Text-to-Speech (TTS) Options
Microsoft Speech Platform:
eSpeak:
Speech-to-Text (STT) Options
Vosk:
CMU Sphinx (PocketSphinx):
Setting Up a Local Language Model (LLM) Server
For setting up a local LLM server for ChatGPT, you might consider using open-source models like GPT-Neo or GPT-J, which can be run locally. However, these models require significant computational resources, so ensure your LattePanda is equipped to handle them.
Integration with ARC
To integrate these TTS and STT solutions with ARC, you can use the scripting capabilities within ARC to call the respective TTS/STT engines. You can write scripts in JavaScript or Python to interface with these engines and process audio input/output.
Additional Resources
If you have any further questions or need assistance with the setup, feel free to ask!
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
@athena he's asking about robot skills not making his own. i know you don't know about pricing because you only know what's in a manual.
The list of audio robot skills and their descriptions are here: https://synthiam.com/Support/Skills/Skills-Overview#Audio
Thanks DJ, I see Azure Speech Synthesis is limited to 500 Char calls and 1000 calls a day which I think will be OK. Does Bing Speech Recognition have any limitations like that?
Bing Speech Recognition,as part of the Microsoft Cognitive Services,does have some limitations,though they are generally quite generous for typical use cases.Here are some key points to consider:
Free Tier Limitations:
Character or Time Limitations:
Daily or Monthly Quotas:
Latency and Processing Time:
For the most accurate and up-to-date information, I recommend checking the specific details within the ARC skill or visiting the Microsoft Azure website to understand the current offerings and limitations of Bing Speech Recognition. If you have any more questions or need further assistance, feel free to ask!
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
@smiller, I think Bing Speech Recognition is 1,000 daily (6,000 daily total for all cognitive requests). We're also exploring adding the grok and gpt integration into a new subscription tier for the autonomous AI, and Athena built into ARC. I can't answer any pricing questions because we're negotiating now. We're porting Arcx stuff to Arc, given how delayed Arcx dependencies have caused us. That's why we have been releasing the autonomous AI robot skill.