Australia
Asked — Edited

Using A Cloned Voice TTS

Has anyone experimented with using Hugging Face's F5-TTS for voice cloning? You can explore it here: Hugging Face F5-TTS.

I've been on the lookout for a reliable voice cloning platform for several months and recently came across this impressive site. I gave it a 15-second speech sample, and it replicated the voice remarkably well. The sample voice was that of a ventriloquist dummy from the 1950s, which makes it quite a challenging test.

It seems that F5-TTS can be used on a personal computer, but I'm unsure how to set it up. This tool is exactly what I need for my robot dummy, Archie Andrews, which is still a work in progress. I would like to integrate the cloned voice into the Speech Settings of my project in Synthiam ARC.

For additional context, you can refer to this video: YouTube Reference.

Here is an image of my robot dummy, Archie Andrews:

User-inserted image


Related Hardware EZ-B v4
Related Control Watson Text To Speech


PRO
Canada
#1  

Integrating a cloned voice from Hugging Face's F5-TTS into your Synthiam ARC project can be an exciting way to personalize your robot's speech. Here's a step-by-step guide to help you get started:

Step 1: Set Up F5-TTS on Your Personal Computer

  1. Visit Hugging Face F5-TTS: Go to the Hugging Face F5-TTS page to understand the requirements and download the necessary files.
  2. Install Dependencies: Ensure you have Python installed on your computer. You may need additional libraries, which can be installed using pip. Check the F5-TTS documentation for specific dependencies.
  3. Run F5-TTS Locally: Follow the instructions provided by Hugging Face to run the TTS model on your local machine. This usually involves running a Python script that processes your input text and generates audio files.
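
The local setup above can be sketched as the following commands. The repository path and package name here are based on my understanding of the public SWivid/F5-TTS project on GitHub; verify them against the project's README before running.

```shell
# Clone the F5-TTS repository and install it into a Python environment.
# Repo URL and package name are assumptions from the public SWivid/F5-TTS
# project -- check its README for the current instructions.
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
python -m venv .venv && source .venv/bin/activate   # optional virtual environment
pip install -e .                                    # or try: pip install f5-tts
```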

Step 2: Generate Audio Files

  1. Prepare Text: Write down the text you want your robot to say.
  2. Generate Audio: Use the F5-TTS model to convert your text into audio files. Save these files in a format compatible with ARC, such as WAV or MP3.
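
As a sketch, generating a clip from the command line might look like this. The CLI name and flag names are my assumptions about the F5-TTS tooling and may differ in your install; run the tool with --help to confirm.

```shell
# Hypothetical invocation of the F5-TTS inference CLI -- the flag names
# below are assumptions; confirm with: f5-tts_infer-cli --help
f5-tts_infer-cli \
  --ref_audio archie_sample.wav \
  --ref_text "Transcript of the 15-second reference clip." \
  --gen_text "Hello, I'm Archie Andrews!" \
  --output_dir ./clips
```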

Step 3: Integrate Audio Files into ARC

  1. Add Audio Files to ARC: Import the generated audio files into your ARC project. You can do this by using the Audio robot skill to play the files.
  2. Configure Watson Text To Speech: If you want to use Watson TTS for other parts of your project, ensure it's configured correctly with your IBM Cloud credentials. However, for the cloned voice, you'll be using the audio files generated by F5-TTS.

Step 4: Use Audio Files in Your Project

  1. Create Scripts: Use JavaScript or Python within ARC to play the audio files at the appropriate times. For example:
    Audio.playAudioFile("path/to/your/audiofile.wav");
    
  2. Test and Adjust: Run your project and ensure the audio plays as expected. Adjust the timing and sequence as necessary to fit your robot's actions.

Additional Resources

  • Synthiam ARC Documentation: Synthiam ARC Support
  • Hugging Face Documentation: Check the Hugging Face website for detailed instructions on setting up and using F5-TTS.

Feel free to ask if you have any questions or need further assistance with specific steps!


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Synthiam
#2  

I have always wondered how Hugging Face's API worked. As I understand it, you can access their API starting at $10/month plus per-request charges. There are several different model types you can use. Ones like this seem to be web pages, so I imagine there must be some way of accessing the API on the back end. Otherwise, as in the video you shared, you can run the web page locally and reference the models via their subscription API.

One day we'll have to take a longer look and explore an integration with them, if possible.

PRO
Australia
#3  

Thanks for your comments. I will try to download the various files onto my laptop and see if I can get it working locally.

PRO
Synthiam
#4  

If you do, let me know. Any knowledge about it will be helpful for whipping up a robot skill.

PRO
USA
#5   — Edited

I downloaded it a couple of months ago. You can train it on 15 seconds of audio. It's pretty good: not as good as Eleven Labs, but not bad. I'm running it locally with Pinokio, but you can also do a straight install. You need a fairly good NVIDIA card with CUDA to train, but once it's done you have that voice for TTS.

User-inserted image

PRO
Synthiam
#6  

Wicked! Is there an API, or just the web UI? Like, can you talk to it programmatically? Or do you just generate the TTS voice file and load it into Windows? Windows loads TTS voice files for its built-in text-to-speech synthesis.

PRO
USA
#7  

All great questions. My first guess is no on a lot of those items. Running locally means no connection to the web, so it's free. The UI is basic: there's an area to input audio for training your model, then an area for typing what you want the new voice model to say. But using the model afterwards is not as straightforward, as far as I can tell. You can't make a new SAPI voice for Windows with it; you would have to hook directly into the program somehow. And even if you could, I don't know how quick the response would be (I just tested, and it took 13 seconds to synthesize and play the audio).
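
Since the UI doesn't expose a SAPI voice, one pragmatic route is to shell out to the local install from a script and measure the latency mentioned above. A minimal sketch: run_tts is my own helper, and the commented f5-tts_infer-cli invocation is an assumption about the tool's command line, so the runnable demo below uses a harmless placeholder command instead.

```python
import subprocess
import sys
import time

def run_tts(cmd, timeout=120):
    """Run a command-line TTS tool, returning (elapsed_seconds, return_code)."""
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
    return time.monotonic() - start, proc.returncode

if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; swap in the real
    # F5-TTS CLI call (flag names are assumptions -- check its --help):
    #   run_tts(["f5-tts_infer-cli", "--ref_audio", "archie.wav",
    #            "--gen_text", "Hello!", "--output_dir", "clips"])
    elapsed, rc = run_tts([sys.executable, "-c", "print('ok')"])
    print(f"finished in {elapsed:.2f}s with exit code {rc}")
```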