afcorson
Has anyone experimented with using Hugging Face's F5-TTS for voice cloning? You can explore it here: Hugging Face F5-TTS.
I've been on the lookout for a reliable voice cloning platform for several months and recently came across this impressive site. I provided it with a 15-second speech sample, and it replicated my sample voice remarkably well. The voice I used was that of a ventriloquist dummy from the 1950s, which is quite a challenging voice to clone.
It seems that F5-TTS can be used on a personal computer, but I'm unsure how to set it up. This tool is exactly what I need for my robot dummy, Archie Andrews, which is still a work in progress. I would like to integrate the cloned voice into the Speech Settings of my project in Synthiam ARC.
For additional context, you can refer to this video: YouTube Reference.
Here is an image of my robot dummy, Archie Andrews:
Integrating a cloned voice from Hugging Face's F5-TTS into your Synthiam ARC project can be an exciting way to personalize your robot's speech. Here's a step-by-step guide to help you get started:
Step 1: Set Up F5-TTS on Your Personal Computer
Step 2: Generate Audio Files
Step 3: Integrate Audio Files into ARC
Audio robot skill to play the files.
Step 4: Use Audio Files in Your Project
Additional Resources
Feel free to ask if you have any questions or need further assistance with specific steps!
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
I have always wondered how Hugging Face's API works. As I understand it, you can access their API starting at $10/month plus per-request charges. There are several different model types you can use. Ones like this seem to be web pages. I imagine there must be some way of accessing the API on the back end. Otherwise, as in the video you shared, you can run the web page locally and refer to the models via their subscription API.
One day, we'll have to take a longer look at it and explore an integration with them, if possible.
Thanks for your comments. I will try and download the various files onto my laptop and see if I can get it to work locally.
If you do, let me know. Any knowledge about it will be helpful for whipping up a robot skill.
I downloaded it a couple of months ago. You can train on 15 seconds of audio. It's pretty good; not as good as ElevenLabs, but not bad. I'm running it locally with Pinokio, but you can do a straight install. You need a pretty good NVIDIA card with CUDA to train, but once it's done you have that voice for TTS.
Wicked! Is there an API or just the web UI? Like, can you talk to it programmatically? Or do you just generate the TTS file and load it into Windows? Windows loads TTS files for its built-in text-to-speech synthesis.
All great questions. My first guess is no on a lot of those. Running locally needs no web connection, so it's free. The UI is basic: there is an area to input audio for training your model, and an area for typing what you want the new voice model to say. But using the model afterward is not as straightforward, as far as I can tell. You can't make a new SAPI voice for Windows with it. You would have to hook directly into the program. And even if you could, I don't know how quick the response would be (I just tested, and it took 13 seconds to transcribe and generate the audio).
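For anyone who wants to try hooking in directly, here is a rough Python sketch of one way it might work. It is not a tested integration. It assumes the local F5-TTS install provides a command-line tool named `f5-tts_infer-cli` that accepts `--ref_audio`, `--ref_text`, `--gen_text`, `--output_dir` and `--output_file`; check `f5-tts_infer-cli --help` on your own install before relying on any of that. The function names and the file `archie_sample.wav` are made up for illustration. Since each generation can take several seconds, the sketch caches every phrase as a WAV file keyed by a hash of its text, so a phrase the robot has already spoken can be replayed without regenerating it.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable, List


def build_f5_command(ref_audio: str, ref_text: str, gen_text: str,
                     out_dir: str, out_file: str) -> List[str]:
    """Build the argument list for the F5-TTS command-line tool.

    ASSUMPTION: the executable name and flag names below match your
    installed F5-TTS version. Verify them with `f5-tts_infer-cli --help`.
    """
    return [
        "f5-tts_infer-cli",
        "--ref_audio", ref_audio,    # the short voice sample to clone
        "--ref_text", ref_text,      # transcript of that sample
        "--gen_text", gen_text,      # what the cloned voice should say
        "--output_dir", out_dir,
        "--output_file", out_file,
    ]


def synthesize_with_cli(text: str, wav_path: Path,
                        ref_audio: str = "archie_sample.wav",  # hypothetical file
                        ref_text: str = "") -> None:
    """Generate speech for `text` by running the CLI (slow; uses the GPU)."""
    cmd = build_f5_command(ref_audio, ref_text, text,
                           str(wav_path.parent), wav_path.name)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


def cached_speech(text: str, cache_dir: Path,
                  synthesize: Callable[[str, Path], None]) -> Path:
    """Return a WAV path for `text`, generating it only on first request."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Normalize the text so trivial differences reuse the same file.
    normalized = " ".join(text.lower().split())
    key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    wav_path = cache_dir / f"{key}.wav"
    if not wav_path.exists():
        synthesize(text, wav_path)
    return wav_path
```

Calling `cached_speech("Hello, I am Archie", Path("voice_cache"), synthesize_with_cli)` would return the path of a WAV file that could then be loaded into whatever ARC audio skill you use for playback. That only helps with phrases known ahead of time; anything the robot says for the first time still waits on a full generation.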