Asked

What Is The Most Cost Effective TTS And STT Skills?

What are the most cost-effective Text-to-Speech (TTS) and Speech-to-Text (STT) skills when using the ChatGPT skill within ARC? I am interested in setting up a local Language Model (LLM) server for ChatGPT but still require TTS and STT capabilities to achieve my desired outcomes. It appears that many of these skills involve paid services, so I am seeking options that offer good performance at the best price. Ideally, I would prefer not to rely on cloud-based services for these functionalities. If there are any viable local options, please provide more details.


Related Hardware LattePanda
Related Control 3-in-1 IMU

ARC Pro

Upgrade to ARC Pro

Discover the limitless potential of robot programming with Synthiam ARC Pro – where innovation and creativity meet seamlessly.

PRO
Synthiam
#1  

Take a look in the audio section for the robot skill list. I use Bing Speech Recognition and Azure Speech Synthesis. Those are my favorites and are free with an ARC subscription.

PRO
Canada
#2  

When it comes to setting up Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities for your LattePanda using Synthiam ARC, there are a few options to consider, especially if you're looking for cost-effective and potentially local solutions.

Text-to-Speech (TTS) Options

  1. Microsoft Speech Platform:

    • Cost: Free
    • Details: The Microsoft Speech Platform offers a variety of voices and is a free option for TTS. It can be installed locally on your Windows-based LattePanda.
    • Setup: You can download the Microsoft Speech Platform runtime and the necessary language packs from the Microsoft website. Once installed, you can configure ARC to use these voices.
  2. eSpeak:

    • Cost: Free
    • Details: eSpeak is an open-source TTS engine that can be installed locally. It supports multiple languages and is lightweight, making it suitable for single-board computers like the LattePanda.
    • Setup: You can download and install eSpeak from its official website or through package managers if you're using a Linux-based system on your LattePanda.

Speech-to-Text (STT) Options

  1. Vosk:

    • Cost: Free
    • Details: Vosk is an open-source STT engine that can run locally on your device. It supports multiple languages and is designed to work offline, making it a great choice for avoiding cloud-based services.
    • Setup: You can find Vosk on GitHub, where you can download the necessary models and integrate them with ARC using Python or JavaScript scripts.
  2. CMU Sphinx (PocketSphinx):

    • Cost: Free
    • Details: PocketSphinx is a lightweight STT engine that can run locally. It's part of the CMU Sphinx project and is suitable for embedded systems.
    • Setup: You can download PocketSphinx from its official website and follow the instructions to integrate it with ARC.

Setting Up a Local Language Model (LLM) Server

For setting up a local LLM server for ChatGPT, you might consider using open-source models like GPT-Neo or GPT-J, which can be run locally. However, these models require significant computational resources, so ensure your LattePanda is equipped to handle them.

Integration with ARC

To integrate these TTS and STT solutions with ARC, you can use the scripting capabilities within ARC to call the respective TTS/STT engines. You can write scripts in JavaScript or Python to interface with these engines and process audio input/output.

Additional Resources

If you have any further questions or need assistance with the setup, feel free to ask!


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
#4  

Thanks DJ,  I see Azure Speech Synthesis is limited to 500 Char calls and 1000 calls a day which I think will be OK.   Does Bing Speech Recognition have any limitations like that?

PRO
Canada
#5  

Bing Speech Recognition,as part of the Microsoft Cognitive Services,does have some limitations,though they are generally quite generous for typical use cases.Here are some key points to consider:

  1. Free Tier Limitations:

    • The free tier usually includes a limited number of transactions per month. This can vary, so it's best to check the latest details on the Microsoft Azure website or within the ARC skill description.
  2. Character or Time Limitations:

    • There might be limitations on the length of audio that can be processed in a single request. This is typically measured in seconds or minutes.
  3. Daily or Monthly Quotas:

    • Similar to Azure Speech Synthesis, there may be daily or monthly quotas on the number of requests you can make. These quotas can be increased by moving to a paid tier.
  4. Latency and Processing Time:

    • While not a strict limitation, it's worth noting that processing time can vary based on the length of the audio and the complexity of the speech being recognized.

For the most accurate and up-to-date information, I recommend checking the specific details within the ARC skill or visiting the Microsoft Azure website to understand the current offerings and limitations of Bing Speech Recognition. If you have any more questions or need further assistance, feel free to ask!


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Synthiam
#6  

@smiller, I think Bing Speech Recognition is 1,000 daily (6,000 daily total for all cognitive requests). We're also exploring adding the grok and gpt integration into a new subscription tier for the autonomous AI, and Athena built into ARC. I can't answer any pricing questions because we're negotiating now. We're porting Arcx stuff to Arc, given how delayed Arcx dependencies have caused us. That's why we have been releasing the autonomous AI robot skill.