Hello y'all, I've created a C# project and I'm already able to do basically the same as the Bing Speech Recognition plugin does, but I also need to make my EZ-B (JD) "speak" in another language. I've installed a Microsoft voice, but that's a pretty bad one, and the Azure platform offers two very nice voices to be used.
Right now I'm stuck on how to send the voice audio received from Azure to the EZ-B. Azure offers a variety of audio formats.
Has anyone tried this before? I've gone through some of the tutorials in the SDK but couldn't find one that does something like that.
Thanks! Gilvan
HTML5 Speech Synthesis works pretty well and has a variety of voices to offer...
https://stackoverflow.com/questions/25336428/html5-speech-synthesis?rq=1
This would be the full documentation
https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
Thanks @Mickey666Maus, that sounds great. But my question is about how to send the received audio to the EZ-B. I guess I'd have the same problem there.
@Gilvan,
Check the UniversalBot code: http://synthiam.com/Products/ARC
Browsing the code will give you the information needed to send the sound data to the EZ-B.
I've done that before, but I can't find the code.
This is related to the Microsoft Cognitive API; maybe DJ is working, or will be working, on supporting that feature too.
@DJ ?
@Mickey666Maus,
The API you mentioned (the JavaScript example code) is only supported within the browser, and not in all browsers, although Chrome handles it pretty well.
Some complaints: http://ejb.github.io/2015/06/07/html5-speech-synthesis-api.html
Even if you manage to launch the Chrome engine (V8) like ARC does with the Blockly editor, you don't have a way to extract the voice audio.
https://stackoverflow.com/questions/21905583/record-html5-speechsynthesisutterance-generated-speech-to-file
Still a neat idea for the web...
@ptp the API works quite well... and to me it is kind of a good workaround to get different voices and languages going. But you are totally right, it only works from within a browser, so I was kind of pointing in the wrong direction I guess!
This is a working example of a web-based client, which can also send data over to ARC but cannot be called from within ARC!
http://www.downtown-tattoo.de/robotics/test123.html
To connect to ARC's server you would just have to make an XMLHttpRequest() to call e.g. a ControlCommand() in ARC...
But the limitation is clearly that ARC cannot send data to the browser, I guess? At least I did not find a solution for this!
I updated the plugin tutorial to include instructions on how to output audio: https://synthiam.com/Tutorials/UserTutorials/146/24
I also created a plugin with complete example and source code: https://synthiam.com/redirect/legacy?table=plugin&id=202
DJ,
Nice!
Do you have plans to add the Microsoft/Bing TTS feature to the existing plugin, i.e. the EZ-Robot Bing plugin?
Microsoft TTS? As in the speech synthesis? It has an output stream... Just do this..
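Something along those lines, as a minimal sketch: the built-in System.Speech synthesizer can render straight into a Stream instead of the sound card. The wrapper class below is only illustrative (it is not DJ's original snippet) and it needs a reference to the System.Speech assembly.

```csharp
using System.IO;
using System.Speech.Synthesis;

static class TtsStreamDemo
{
    // Renders text to speech into a MemoryStream instead of the default audio
    // device; the result is a standard WAV (RIFF) byte array you can forward anywhere.
    public static byte[] SpeakToWavBytes(string text)
    {
        using (var ms = new MemoryStream())
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToWaveStream(ms);
            synth.Speak(text);
            return ms.ToArray();
        }
    }
}
```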
@DJ,
Not that.
This: https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/bingvoiceoutput
@Mickey666Maus,
you are mixing up concepts.
I checked the JavaScript code: the speech recognition and TTS run in the browser, and after the speech recognition is done, the browser calls api.ai.
All the work is done in the browser; the server only serves the page.
XMLHttpRequest is a JavaScript class used to GET or POST data to a web server.
You have a similar function in EZ-Script.
There is no limitation: a browser is a client application; you want another client application, i.e. ARC, to call a client?
Of course there are some exceptions. I remember some years ago I used Windows DDE (old stuff) https://en.wikipedia.org/wiki/Dynamic_Data_Exchange to interact with Internet Explorer, Microsoft Excel, etc.
To allow other clients to connect to an application, the application needs to expose a protocol, interfaces, methods, etc.
@ptp, the example I provided will give you the ability to pipe any audio through the EZ-B. Simply pipe PCM data at the sample rate specified in the example into any of the PlayData() overrides. It doesn't matter where you get the data from. If you get it from the link provided, cool. If you get it from a microphone, cool. If you get it from summoning a spirit ghost in the form of compatible audio data, cool.
Here is the information as a step-by-step recap in case you haven't viewed the links I provided. Have fun!
Source Code: OutputAudioFromEZ-BSource.zip
*Dependency: In addition to referencing ARC.exe and EZ-B.DLL, this plugin requires the NAudio.DLL library to be added as a project reference. Remember to UNSELECT "Copy Files"!
This plugin provides the following examples:
Output Text to Speech: you can output text to speech easily as well, along the lines of the sketch below.
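A minimal sketch of that text-to-speech path: the sample-rate value and the SoundV4.PlayData() member are assumptions based on this thread, so check DJ's linked source for the exact constants and overloads.

```csharp
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Synthesis;
using EZ_B;

static class EzbSpeech
{
    // Assumption: this must match the sample rate specified in DJ's example;
    // treat it as a placeholder and prefer an SDK constant if one is exposed.
    const int EzbSampleRate = 14700;

    public static void Speak(EZB ezb, string text)
    {
        using (var ms = new MemoryStream())
        using (var synth = new SpeechSynthesizer())
        {
            // Render the speech as raw 8-bit mono PCM straight into memory.
            synth.SetOutputToAudioStream(ms,
                new SpeechAudioFormatInfo(EzbSampleRate, AudioBitsPerSample.Eight, AudioChannel.Mono));
            synth.Speak(text);

            ms.Position = 0;

            // Assumption: PlayData() (the overload family mentioned above) accepts
            // the PCM stream and plays it through the EZ-B v4 speaker.
            ezb.SoundV4.PlayData(ms);
        }
    }
}
```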
@DJ,
The code is clear, and I believe it helps/answers the initial request (from Gilvan Gomes).
I believe Gilvan is trying to code a plugin to use the Bing/Azure/Microsoft Cognitive TTS services with the EZ-B.
To avoid overlapping features between a user plugin and an EZ-Robot plugin, I asked if you have plans to create a Bing TTS plugin or extend the existing Bing Recognition plugin.
No immediate plans - it looks pretty simple though, specifically since the service returns a byte array of compatible audio. Simply pump it through the examples and voila, you've got Bing TTS.
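As a rough sketch of that flow: the token and synthesis endpoints, headers, and output-format string below are my reading of the linked Bing voice output documentation rather than verified code, so double-check them against the docs (and against the current Azure Speech endpoints).

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

static class BingTts
{
    // Endpoint URLs as listed in the linked (legacy) Bing Speech documentation.
    const string TokenUri = "https://api.cognitive.microsoft.com/sts/v1.0/issueToken";
    const string SynthesizeUri = "https://speech.platform.bing.com/synthesize";

    public static async Task<byte[]> SynthesizeAsync(string subscriptionKey, string text)
    {
        using (var http = new HttpClient())
        {
            // 1) Exchange the subscription key for a short-lived access token.
            var tokenRequest = new HttpRequestMessage(HttpMethod.Post, TokenUri);
            tokenRequest.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            var tokenResponse = await http.SendAsync(tokenRequest);
            tokenResponse.EnsureSuccessStatusCode();
            string token = await tokenResponse.Content.ReadAsStringAsync();

            // 2) POST SSML and request PCM audio back. The voice name and output
            //    format are examples taken from the documentation.
            string ssml =
                "<speak version='1.0' xml:lang='en-US'>" +
                "<voice xml:lang='en-US' xml:gender='Female' " +
                "name='Microsoft Server Speech Text to Speech Voice (en-US, ZiraRUS)'>" +
                text +
                "</voice></speak>";

            var request = new HttpRequestMessage(HttpMethod.Post, SynthesizeUri)
            {
                Content = new StringContent(ssml, Encoding.UTF8, "application/ssml+xml")
            };
            request.Headers.Add("Authorization", "Bearer " + token);
            request.Headers.Add("X-Microsoft-OutputFormat", "riff-16khz-16bit-mono-pcm");
            // The docs also list X-Search-AppId, X-Search-ClientID and User-Agent
            // headers; add them if the service requires them.

            var response = await http.SendAsync(request);
            response.EnsureSuccessStatusCode();

            // The returned byte array is a WAV (RIFF) file; resample it to the
            // EZ-B's expected rate (e.g. with NAudio) before handing it to PlayData().
            return await response.Content.ReadAsByteArrayAsync();
        }
    }
}
```

From there it is the same PlayData() path as in the tutorial above.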
I don't think it belongs in the Bing Recognition plugin. It would be a plugin of its own. The Bing Recognition plugin is for speech recognition, not speech synthesis. The configurations of the two are quite different from the user's perspective.
Correct, I only asked because it is in line with your Microsoft effort/partnership.
Correct, I assumed that mainly due to the name and whether they share the same keys.
I think someone else should get some fame and publicity for building a plugin using Microsoft services. I know EZ-Robot would promote it and Microsoft would be happy to see the engagement.
@ptp yes, I guess I was mixing up concepts! I was actually trying to get the same thing done that @Gilvan Gomes was trying to do, since the TTS voices on Windows are kind of hard to change, e.g. from English to German to Portuguese...
If there is anything that can be done to have a variety of voices/languages available it would be highly appreciated!
That's so awesome, guys! Thanks @Mickey666Maus and @ptp for all the comments. And I can't thank you enough for the tutorial, @DJ-Sures!
In my research project, JD will work with blind people, which is why speech is so important, and the Microsoft Cognitive Services APIs have everything I need to complement JD's features. Once I'm done with the project itself, I'm going to publish the plugin (if no one else does it before then).
Thanks again! Gilvan
What a great initiative - very impressive! I'm also certain a large number of community members would appreciate the plugin. EZ-Robot will also share the plugin in a newsletter.
Also, if you happen to create a video or media regarding the research project, I'll be sure to have it added to a newsletter. The EZ-Robot newsletter reaches pretty high-profile people - I'm certain you will get quality viewers.
@DJ-Sures we'll probably publish an article with the results of the project/experiments, so I'll send that to you with other media once I have it later this year.
Thank you! Gilvan