
Watson Speech To Text

Watson Speech to Text is a cloud-native solution that uses deep-learning AI algorithms to apply knowledge about grammar, language structure, and audio/voice signal composition to create customizable speech recognition for optimal text transcription.

How To Add This Control To Your Project
  1. Make sure you have the latest version of ARC installed.
  2. Select the Get button on this page to download the archive file.
  3. Double-click the downloaded archive file to run the installer.
  4. The installer will add this control to ARC.
  5. Load ARC and press the Project -> Add Control button from the menu.
  6. Choose the Audio category tab.
  7. Press the Watson Speech To Text icon to add the control to your project.

Manual

Version 10 (2020-10-20)
==================
Minor changes

Version 9 (2020-10-08)
==================
I decided to break the plugin into multiple plugins to make it easier to troubleshoot, improve, and fix bugs.
This plugin will become Watson Speech to Text.

Documentation (WIP)
You will need an IBM Cloud account (the free tier is enough).

Watson Speech To Text:
https://www.ibm.com/cloud/watson-speech-to-text


Register a new account:
https://cloud.ibm.com/registration


Log in to your account:
https://cloud.ibm.com/login


Dashboard:

Press "Create Resource"
Search for Speech To Text resource:
User-inserted image


Quote: "The Lite plan gets you started with 500 minutes per month at no cost."
After creating the service and retrieving its credentials, you are ready to configure the plugin.
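If you want to sanity-check the credentials outside of ARC first, the sketch below uses IBM's official Python SDK (pip install ibm-watson); the API key and service URL are placeholders you replace with the values shown on the service's "Manage" page:

    # Minimal credential check with IBM's Python SDK (pip install ibm-watson).
    # Replace the placeholder API key and URL with the values from the
    # service's "Manage" page in the IBM Cloud dashboard.
    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    authenticator = IAMAuthenticator("YOUR_API_KEY")      # placeholder
    stt = SpeechToTextV1(authenticator=authenticator)
    stt.set_service_url("YOUR_SERVICE_URL")               # placeholder

    # Listing the available models is a cheap call that proves the key
    # and URL are valid.
    for model in stt.list_models().get_result()["models"]:
        print(model["name"], model["rate"])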

Plugin Configuration:

Press "change":


1) You will need to configure the model (i.e. the language). Choose a high sample rate (e.g. 16000 Hz) for better capture quality.
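For reference, Watson's "Broadband" models are meant for audio sampled at 16 kHz or higher, while the "Narrowband" models expect 8 kHz, which is why a 16000 Hz capture rate pairs with better transcription quality. A minimal sketch of a one-shot recognition call against a broadband model, reusing the authenticated stt client from the sketch above (the WAV file name is a placeholder):

    # Transcribe a 16 kHz WAV file with a 16 kHz ("Broadband") model.
    # "sample.wav" is a placeholder; pick the model that matches your language.
    with open("sample.wav", "rb") as audio_file:
        result = stt.recognize(
            audio=audio_file,
            content_type="audio/wav",
            model="en-US_BroadbandModel",
        ).get_result()

    for chunk in result["results"]:
        print(chunk["alternatives"][0]["transcript"])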


2) You will need to select a VAD (Voice Activity Detection) engine. This engine handles the speech detection; the plugin supports two different engines: the Speech API available on Microsoft Windows, and the WebRTC engine.

Windows VAD engine configuration:

You will need to choose an available Microsoft Speech Recognizer. If your Windows installation does not have one available, you will need to use the WebRTC VAD engine.

The WebRTC modes are: 0 - Quality, 1 - Low Bit Rate, 2 - Aggressive, 3 - Very Aggressive.
The higher the value, the more aggressively the engine filters out non-speech; lower values are more permissive but produce more false positives (noise flagged as speech).
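The same aggressiveness modes are exposed by the open-source py-webrtcvad package, so you can experiment with them outside of ARC. A small sketch, assuming 16-bit mono PCM at 16 kHz split into 30 ms frames (WebRTC VAD only accepts 10, 20 or 30 ms frames):

    # Classify fixed-size PCM frames as speech or silence with WebRTC VAD
    # (pip install webrtcvad).
    import webrtcvad

    SAMPLE_RATE = 16000                                # 8000, 16000, 32000 or 48000 Hz
    FRAME_MS = 30                                      # 10, 20 or 30 ms
    FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 16-bit mono samples

    vad = webrtcvad.Vad(3)                             # 0 = Quality ... 3 = Very Aggressive

    def speech_frames(pcm: bytes):
        """Yield (byte_offset, is_speech) for every complete frame in the buffer."""
        for offset in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
            frame = pcm[offset:offset + FRAME_BYTES]
            yield offset, vad.is_speech(frame, SAMPLE_RATE)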

Speech Tab:


1) The plugin will be listening (i.e. capturing audio) when the Listening checkbox is checked.
2) The plugin will show the audio visualization when the Visualize checkbox is checked.
Listening and visualizing do not require a configured Watson service account.

To use the Watson cloud services, you will need to check the STT (Speech To Text) checkbox.

Green areas are flagged as speech; orange areas are silence/noise.
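Conceptually, checking STT means the speech-flagged (green) audio is sent to the Watson service and the transcript comes back to ARC. The rough sketch below glues together the VAD helper and the stt client from the earlier sketches; it is only an illustration of that pipeline, not the plugin's actual code, and it naively concatenates the speech frames into a single utterance:

    # Send only the speech-flagged frames to Watson (simplified illustration,
    # not the plugin's implementation).
    import io

    def transcribe_speech_only(pcm: bytes) -> str:
        speech = b"".join(
            pcm[offset:offset + FRAME_BYTES]
            for offset, is_speech in speech_frames(pcm)
            if is_speech
        )
        if not speech:
            return ""
        result = stt.recognize(
            audio=io.BytesIO(speech),
            content_type="audio/l16; rate=16000",      # raw 16-bit PCM at 16 kHz
            model="en-US_BroadbandModel",
        ).get_result()
        return " ".join(r["alternatives"][0]["transcript"] for r in result["results"])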

#1  
When I load the skill, this error appears:

(error screenshots)



What am I doing wrong?

Thanks!
#4  
Nice, thanks PTP! I really appreciate you keeping this plugin updated :-)
#5  
Quick feedback:
I got delayed with some issues relating to the development tools upgrade, but I'm almost done.

@Nink:
No problem.

There are new improvements; one of them is a translation service that I plan to add to the plugin.
#6  
Hello. I am waiting for the update as well. 

Thanks for all of your good work!
#7  
New version released! Still doing tests and trying to catch new bugs.
#8  
New version released; please check the "breaking changes".

I'll start releasing the other Watson services plugins soon: TextToSpeech, Visual Recognition, Assistant, and Language Translator.