Microsoft Cognitive Speech - Text To Speech

Hello y'all, I've created a C# project and I'm already able to do basically the same as the Bing Speech Recognition plugin does, but I also need to make my EZ-B (JD) "speak" in another language. I've installed a Microsoft voice, but it's a pretty bad one, and the Azure platform offers two very nice voices I could use.

Right now I'm stuck on how to send the audio received from Azure to the EZ-B. Azure offers a variety of audio formats.

Has anyone tried this before? I've gone through some of the tutorials in the SDK but couldn't find one that does something like that.

Thanks! Gilvan



#1  

HTML5 Speech Synthesis works pretty well and has a variety of voices to offer...

https://stackoverflow.com/questions/25336428/html5-speech-synthesis?rq=1

#2  

This would be the full documentation

https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

#3  

Thanks @Mickey666Maus, that sounds great. But my question is about how to send the received audio to the EZ-B. I guess I'd have the same problem there.

PRO
USA
#4  

@Gilvan,

Check the UniversalBot code http://synthiam.com/Products/ARC

By browsing the code you'll find the information needed to send the sound data to the EZ-B.

I've done that before, but I can't find the code.

PRO
USA
#5  

This is related to the Microsoft Cognitive API; maybe DJ is working, or will be working, on supporting that feature too.

@DJ ?

PRO
USA
#6  

@Mickey666Maus,

The API you mentioned

JavaScript example code:


var speech = new SpeechSynthesisUtterance('Ola Brazil!');
speech.lang = 'pt-BR';
window.speechSynthesis.speak(speech);

is only supported within the browser (and not in all browsers), although Chrome handles it pretty well.

Some complaints: http://ejb.github.io/2015/06/07/html5-speech-synthesis-api.html

Even if you manage to launch the Chrome engine (V8), like ARC does with the Blockly editor, you don't have a way to extract the generated audio.

https://stackoverflow.com/questions/21905583/record-html5-speechsynthesisutterance-generated-speech-to-file

Still a neat idea for the web...

#7  

@ptp the API works quite well... and to me it is a good workaround to get different voices and languages going. But you are totally right, it only works from within a browser, so I guess I was pointing in the wrong direction!

This is a working example of a web-based client, which can also send data over to ARC but cannot be called from within ARC!

http://www.downtown-tattoo.de/robotics/test123.html

#8  

To connect to ARC's server you would just have to make an XMLHttpRequest() to call, e.g., a ControlCommand() in ARC...

But the limitation is clearly that ARC cannot send data to the browser, I guess? At least I did not find a solution for this! :)

PRO
Synthiam
#9  

I updated the plugin tutorial to include instructions on how to output audio: https://synthiam.com/Tutorials/UserTutorials/146/24

I also created a plugin with complete example and source code: https://synthiam.com/redirect/legacy?table=plugin&id=202

PRO
USA
#10  

DJ,

Nice!

Do you have plans to add the Microsoft/Bing TTS feature to the existing plugin (i.e. the EZ-Robot Bing plugin)?

*** EDITED ***

PRO
Synthiam
#11  

Microsoft TTS? As in the speech synthesis? It has an output stream... Just do this:


      using (MemoryStream s = EZBManager.EZBs[0].SpeechSynth.SayToStream("I am speaking out of the EZ-B"))
        EZBManager.EZBs[0].SoundV4.PlayData(s);

PRO
USA
#12  

@DJ,

Not that.

This: https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/bingvoiceoutput

PRO
USA
#13  

@Mickey666Maus,

You are mixing up concepts.

Quote:

http://www.downtown-tattoo.de/robotics/test123.html

I checked the JavaScript code: both the speech recognition and the TTS run in the browser, and after the speech recognition is done the browser calls api.ai.

All the work is done in the browser; the server only serves the page.

Quote:

To connect to the ARCs server you would just have to make an XMLHttpRequest();

XMLHttpRequest is a JavaScript class used to GET or POST data to a web server.

You have a similar function in EZ-Script:


HttpGet( url ) 

Quote:

But the limitation is clearly that ARC cannot send data to the browser I guess?

There is no limitation: a browser is a client application. Do you want another client application (i.e. ARC) to call a client?

Of course there are some exceptions. I remember some years ago I used Windows DDE (old stuff), https://en.wikipedia.org/wiki/Dynamic_Data_Exchange, to interact with Internet Explorer, Microsoft Excel, etc.

To allow other clients to connect to an application, the application needs to expose a protocol, interfaces, methods, etc.

PRO
Synthiam
#14  

@ptp, the example I provided will give you the ability to pipe any audio through the EZ-B. Simply pipe PCM data at the sample rate specified in the example into any of the PlayData() overloads. It doesn't matter where you get the data from. If you get it from the link provided, cool. If you get it from a microphone, cool. If you get it from summoning a spirit ghost in the form of compatible audio data, cool.

Here is a step-by-step recap in case you haven't viewed the links I provided. Have fun!

Source Code: OutputAudioFromEZ-BSource.zip

*Dependency: In addition to adding ARC.exe and EZ-B.DLL, this plugin requires the NAudio.DLL library to be added as a project reference. Remember to UNSELECT "Copy Files"!

This plugin provides the following examples:

  1. Load audio from MP3 or WAV file

// MP3
NAudio.Wave.Mp3FileReader mp3 = new NAudio.Wave.Mp3FileReader(openFileDialog1.FileName);

// WAV
NAudio.Wave.WaveStream wav = new NAudio.Wave.WaveFileReader(openFileDialog1.FileName);

  2. Convert the audio file to uncompressed PCM data at the sample rate and sample size supported by the EZ-B

NAudio.Wave.WaveFormatConversionStream pcm = new NAudio.Wave.WaveFormatConversionStream(new NAudio.Wave.WaveFormat(EZ_B.EZBv4Sound.AUDIO_SAMPLE_BITRATE, 8, 1), mp3);

  3. Compress PCM data with gzip to be stored in project STORAGE

                using (MemoryStream ms = new MemoryStream()) {

                  using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
                    pcm.CopyTo(gz);

                  _cf.STORAGE[ConfigTitles.COMPRESSED_AUDIO_DATA] = ms.ToArray();
                }

  4. Play audio data from compressed project STORAGE

        using (MemoryStream ms = new MemoryStream(compressedAudioData))
        using (GZipStream gz = new GZipStream(ms, CompressionMode.Decompress))
          EZBManager.EZBs[0].SoundV4.PlayData(gz);

  5. Supports ControlCommand() for Play and Stop of audio to be used in external scripts

    public override object[] GetSupportedControlCommands() {

      List<string> items = new List<string>();

      items.Add(ControlCommands.StartPlayingAudio);
      items.Add(ControlCommands.StopPlayingAudio);

      return items.ToArray();
    }

    public override void SendCommand(string windowCommand, params string[] values) {

      if (windowCommand.Equals(ControlCommands.StartPlayingAudio, StringComparison.InvariantCultureIgnoreCase))
        playStoredAudio();
      else if (windowCommand.Equals(ControlCommands.StopPlayingAudio, StringComparison.InvariantCultureIgnoreCase))
        stopPlaying();
      else
        base.SendCommand(windowCommand, values);
    }

  6. Changes the status of the button when audio is playing globally from anywhere in ARC on EZ-B #0

    public FormMain() {

      InitializeComponent();

      EZBManager.EZBs[0].SoundV4.OnStartPlaying += SoundV4_OnStartPlaying;
      EZBManager.EZBs[0].SoundV4.OnStopPlaying += SoundV4_OnStopPlaying;
    }

    private void FormMain_FormClosing(object sender, FormClosingEventArgs e) {

      EZBManager.EZBs[0].SoundV4.OnStartPlaying -= SoundV4_OnStartPlaying;
      EZBManager.EZBs[0].SoundV4.OnStopPlaying -= SoundV4_OnStopPlaying;
    }

    private void SoundV4_OnStopPlaying() {

      Invokers.SetText(btnPlayAudio, "Play");
    }

    private void SoundV4_OnStartPlaying() {

      Invokers.SetText(btnPlayAudio, "Stop");
    }

Output Text to Speech

You can output text to speech easily as well, using the following code example...


      using (MemoryStream s = EZBManager.EZBs[0].SpeechSynth.SayToStream("I am speaking out of the EZ-B"))
        EZBManager.EZBs[0].SoundV4.PlayData(s);
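
To make "compatible PCM" concrete: the conversion step in the recap targets 8-bit, single-channel audio at EZ_B.EZBv4Sound.AUDIO_SAMPLE_BITRATE. NAudio's WaveFormatConversionStream does that conversion for you. The dependency-free sketch below only illustrates the idea. It assumes the source audio is raw 16-bit signed little-endian mono PCM and that the 8-bit samples are unsigned (the usual WAV convention). PcmSketch and To8BitUnsignedMono are hypothetical names, not part of EZ-B.DLL or NAudio, and the nearest-neighbour resampling has no low-pass filter, so quality will be worse than a real resampler.

```csharp
using System;

public static class PcmSketch
{
    // Hypothetical helper: converts 16-bit signed little-endian mono PCM
    // into 8-bit unsigned mono PCM, resampling by nearest neighbour.
    public static byte[] To8BitUnsignedMono(byte[] pcm16, int sourceRate, int targetRate)
    {
        if (pcm16 == null) throw new ArgumentNullException(nameof(pcm16));
        if (sourceRate <= 0) throw new ArgumentOutOfRangeException(nameof(sourceRate));
        if (targetRate <= 0) throw new ArgumentOutOfRangeException(nameof(targetRate));

        int sourceSamples = pcm16.Length / 2; // two bytes per 16-bit sample
        int targetSamples = (int)((long)sourceSamples * targetRate / sourceRate);
        byte[] result = new byte[targetSamples];

        for (int i = 0; i < targetSamples; i++)
        {
            // Pick the source sample closest (rounded down) to this output position.
            int src = (int)((long)i * sourceRate / targetRate);

            // Reassemble the little-endian 16-bit sample.
            short sample = (short)(pcm16[2 * src] | (pcm16[2 * src + 1] << 8));

            // Keep the top 8 bits (-128..127) and shift to unsigned (0..255).
            result[i] = (byte)((sample >> 8) + 128);
        }

        return result;
    }
}
```

In a plugin you would presumably pass EZ_B.EZBv4Sound.AUDIO_SAMPLE_BITRATE as targetRate, wrap the result in a MemoryStream, and hand it to PlayData() as in the examples above.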

PRO
USA
#15  

@DJ,

The code is clear, and I believe it answers the initial request (Gilvan Gomes).

I believe Gilvan is trying to code a plugin to use the Bing/Azure/Microsoft Cognitive TTS services with the EZ-B.

To avoid overlapping features between a user plugin and an EZ-Robot plugin, I asked whether you have plans to create a Bing TTS plugin or extend the existing Bing Recognition plugin.

PRO
Synthiam
#16  

No immediate plans - looks pretty simple though, specifically since the service returns a byte array of compatible audio. Simply pump it through the examples and voila, you've got Bing TTS.

I don't think it belongs in the Bing Recognition plugin. It would be a plugin of its own. The Bing Recognition plugin is for speech recognition, not speech synthesis. The configuration of the two is quite different from the user's perspective.
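
One detail worth checking before pumping the returned bytes through: if you request a RIFF/WAV output format from the service (an assumption here, since the thread does not say which format is used), the byte array starts with a WAV header rather than raw samples. The sketch below walks the RIFF chunks and returns only the contents of the "data" chunk. WavSketch and ExtractDataChunk are hypothetical names, and the code relies only on the standard RIFF/WAVE chunk layout.

```csharp
using System;
using System.Text;

public static class WavSketch
{
    // Hypothetical helper: returns the raw sample bytes stored in the
    // "data" chunk of a RIFF/WAVE byte array.
    public static byte[] ExtractDataChunk(byte[] wav)
    {
        if (wav == null || wav.Length < 12 ||
            Encoding.ASCII.GetString(wav, 0, 4) != "RIFF" ||
            Encoding.ASCII.GetString(wav, 8, 4) != "WAVE")
            throw new ArgumentException("Not a RIFF/WAVE byte array.", nameof(wav));

        long pos = 12; // first chunk follows the 12-byte RIFF header
        while (pos + 8 <= wav.Length)
        {
            string id = Encoding.ASCII.GetString(wav, (int)pos, 4);

            // Chunk size is a 32-bit little-endian unsigned integer.
            uint size = (uint)(wav[pos + 4] | (wav[pos + 5] << 8) |
                               (wav[pos + 6] << 16) | (wav[pos + 7] << 24));
            long start = pos + 8;

            if (id == "data")
            {
                // Clamp in case the declared size exceeds the bytes received.
                long available = Math.Min((long)size, wav.Length - start);
                byte[] data = new byte[available];
                Array.Copy(wav, start, data, 0, available);
                return data;
            }

            // Skip this chunk; chunks are padded to an even length.
            pos = start + size + (size & 1);
        }

        throw new ArgumentException("No data chunk found.", nameof(wav));
    }
}
```

The extracted bytes still have whatever sample rate and sample size the service produced, so they would need converting to the EZ-B's format before PlayData(), for example with the NAudio conversion shown in the recap.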

PRO
USA
#17  

Quote:

looks pretty simple though.

Correct, I only asked because it is in line with your Microsoft effort/partnership.

Quote:

I don't think it belongs in the Bing Recognition plugin. It would be a plugin of its own.

Correct, I assumed that mainly because of the name and the possibility of sharing the same keys.

PRO
Synthiam
#18  

I think someone else should get some fame and publicity for building a plugin using Microsoft services. I know EZ-Robot would promote it and Microsoft would be happy to see the engagement :D

#19  

@ptp yes, I guess I was mixing concepts! I was actually trying to get the same thing done that @Gilvan Gomes was trying to do, since the TTS voices on Windows are kind of hard to change, e.g., from English to German to Portuguese...

If there is anything that can be done to have a variety of voices/languages available it would be highly appreciated!

#20  

That's so awesome, guys! Thanks @Mickey666Maus and @ptp for all the comments. And I can't thank you enough for the tutorial, @DJ-Sures!

In my research project, JD will work with blind people, which is why speech is so important, and the Microsoft Cognitive Services APIs have all I need to complement JD's features. Once I'm done with the project itself, I'm going to publish the plugin (if no one has done it by then).

Thanks again! Gilvan

PRO
Synthiam
#21  

What a great initiative - very impressive! I'm also certain a large number of community members would appreciate the plugin. EZ-Robot will also share the plugin in a newsletter.

Also, if you happen to create a video or other media about the research project, I'll be sure to have it added to a newsletter. The EZ-Robot newsletter reaches pretty high-profile people - I'm certain you will get quality viewers.

#22  

@DJ-Sures we'll probably publish an article with the results of the project/experiments, so I'll send that to you along with other media once I have it later this year.

Thank you! :) Gilvan