Watson Speech To Text

Name: Watson Speech To Text
Author: ptp

Watson Speech to Text is a cloud-native solution that uses deep-learning AI algorithms to apply knowledge about grammar, language structure, and audio/voice signal composition to create customizable speech recognition for optimal text transcription.

Requires ARC v11 (Updated 11/3/2020)

How to add the Watson Speech To Text robot skill

1. Load the most recent release of ARC (Get ARC).
2. Press the Project tab from the top menu bar in ARC.
3. Press Add Robot Skill from the button ribbon bar in ARC.
4. Choose the Audio category tab.
5. Press the Watson Speech To Text icon to add the robot skill to your project.

Don't have a robot yet?

Follow the Getting Started Guide to build a robot and use the Watson Speech To Text robot skill.

How to use the Watson Speech To Text robot skill

Version 11 (2020-11-03)

Compatibility with the new version of ARC.

Version 10 (2020-10-20)

Minor changes

Version 9 (2020-10-08)

I decided to split the plugin into multiple plugins to simplify troubleshooting, improvements, and bug fixes. This plugin will become Watson Speech to Text.

Documentation (WIP)

You will need an IBM Cloud account (a Free Tier account works).

Watson Speech To Text: https://www.ibm.com/cloud/watson-speech-to-text

In the IBM Cloud dashboard, press "Create Resource" and search for the Speech To Text resource.

Quote:
The Lite plan gets you started with 500 minutes per month at no cost.

After creating (or managing) the service's credentials, you are ready to configure the plugin.
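To check your credentials outside ARC, here is a minimal Python sketch of a direct call to the Watson Speech to Text REST API. It is not the plugin's code. It assumes the service credentials give you an API key and a service URL, uses HTTP Basic auth with the literal user name `apikey` (as IBM Cloud services accept), and only builds the request. `build_recognize_request` and the default model are illustrative, hypothetical choices.

```python
import base64
import urllib.parse
import urllib.request


def build_recognize_request(service_url: str, api_key: str, wav_bytes: bytes,
                            model: str = "en-US_BroadbandModel") -> urllib.request.Request:
    """Build (but do not send) a POST to the /v1/recognize endpoint.

    Hypothetical helper: service_url and api_key are assumed to come from
    the service credentials created in the IBM Cloud dashboard.
    """
    # IBM Cloud accepts Basic auth with user "apikey" and the API key as password.
    token = base64.b64encode(f"apikey:{api_key}".encode("ascii")).decode("ascii")
    # The model is passed as a query parameter.
    query = urllib.parse.urlencode({"model": model})
    url = f"{service_url.rstrip('/')}/v1/recognize?{query}"
    return urllib.request.Request(
        url,
        data=wav_bytes,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "audio/wav",  # the body is a WAV file
        },
    )
```

To actually send it, pass the request to `urllib.request.urlopen` and parse the JSON body; the transcript of the first result is under `results[0]["alternatives"][0]["transcript"]`. Model names such as `en-US_BroadbandModel` were current when this plugin was written; check IBM's model list for your service instance.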

Plugin Configuration:

Press "change":

Configure the model (i.e. the language) and choose a high sample rate, e.g. 16000 Hz, for better-quality capture.
You will also need to select a VAD (Voice Activity Detection) engine, which handles speech detection. The plugin supports two engines: the Speech API available on Microsoft Windows and the WebRTC engine.
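The sample rate ties into the model choice. Watson's previous-generation models come in Broadband variants for audio sampled at 16 kHz or higher and Narrowband variants for 8 kHz audio, which is one reason a higher capture rate helps. The sketch below assumes 16-bit mono PCM capture; `pick_model` and `bytes_per_second` are hypothetical helpers, and not every language offers both variants.

```python
def pick_model(language: str, sample_rate_hz: int) -> str:
    """Pick a previous-generation Watson model name for a capture rate (sketch)."""
    # Broadband models target audio sampled at 16 kHz or more;
    # Narrowband models target 8 kHz (telephone-quality) audio.
    if sample_rate_hz >= 16000:
        return f"{language}_BroadbandModel"
    return f"{language}_NarrowbandModel"


def bytes_per_second(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 1) -> int:
    """Raw PCM data rate: samples per second * bytes per sample * channels."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


print(pick_model("en-US", 16000))  # en-US_BroadbandModel
print(bytes_per_second(16000))     # 32000
```

At 16000 Hz, 16-bit mono audio is 32,000 bytes per second, twice the data of 8000 Hz capture, which is the extra detail the Broadband models can use.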

Windows VAD engine configuration:

Choose an available Microsoft Speech Recognizer. If your Windows installation does not have one, use the WebRTC VAD engine instead.

WebRTC VAD engine configuration:

Modes are: 0-Quality, 1-LowBitRate, 2-Aggressive, 3-Very Aggressive. The higher the value, the more aggressively the WebRTC engine filters out non-speech: fewer noise segments are flagged as speech, at the cost of missing more actual speech.
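The WebRTC VAD does not judge arbitrary chunks of audio: it classifies fixed-size frames of 16-bit mono PCM that are 10, 20 or 30 ms long, at 8000, 16000, 32000 or 48000 Hz. The Python sketch below (hypothetical helper names, not the plugin's code) shows how captured audio would be cut into valid frames and how the mode number maps to the names above.

```python
from typing import List

VALID_RATES = (8000, 16000, 32000, 48000)   # sample rates the WebRTC VAD accepts
VALID_FRAME_MS = (10, 20, 30)               # frame lengths the WebRTC VAD accepts
VALID_MODES = {0: "Quality", 1: "LowBitRate", 2: "Aggressive", 3: "Very Aggressive"}


def frame_size_bytes(sample_rate_hz: int, frame_ms: int) -> int:
    """Size in bytes of one VAD frame of 16-bit mono PCM."""
    if sample_rate_hz not in VALID_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate_hz}")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame length: {frame_ms} ms")
    samples = sample_rate_hz * frame_ms // 1000
    return samples * 2  # 2 bytes per 16-bit sample


def split_frames(pcm: bytes, sample_rate_hz: int, frame_ms: int = 30) -> List[bytes]:
    """Cut raw PCM into full VAD frames, dropping a trailing partial frame."""
    size = frame_size_bytes(sample_rate_hz, frame_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm) - size + 1, size)]


def mode_name(mode: int) -> str:
    """Map a WebRTC VAD mode number (0-3) to its name."""
    if mode not in VALID_MODES:
        raise ValueError("WebRTC VAD mode must be 0, 1, 2 or 3")
    return VALID_MODES[mode]
```

At 16000 Hz a 30 ms frame is 480 samples, or 960 bytes. If you experiment in Python, the third-party `py-webrtcvad` package exposes the same engine as `webrtcvad.Vad(mode)` with `is_speech(frame, sample_rate)`; that package is an outside example, not necessarily what this plugin uses internally.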

Speech Tab:

The plugin listens (i.e. captures audio) when the Listening checkbox is checked.
The plugin shows an audio visualization when the Visualize checkbox is checked. Listening and visualizing do not require a configured Watson service account.

To use the Watson cloud service, check the STT (Speech To Text) checkbox.
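The three checkboxes can be read as independent switches, with only STT needing Watson credentials. The Python sketch below models that under one stated assumption, which is not confirmed by this page: visualization and Watson transcription both depend on Listening, because they need captured audio. `SpeechTabState` and `actions_for_chunk` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechTabState:
    listening: bool  # "Listening" checkbox: capture audio
    visualize: bool  # "Visualize" checkbox: draw the audio visualization
    stt: bool        # "STT" checkbox: send audio to Watson Speech to Text


def actions_for_chunk(state: SpeechTabState) -> List[str]:
    """What happens to one chunk of audio, under the assumption above."""
    if not state.listening:
        return []  # assumed: nothing is captured, so nothing else can run
    actions = ["capture"]
    if state.visualize:
        actions.append("visualize")
    if state.stt:
        actions.append("send_to_watson")  # the only step needing Watson credentials
    return actions
```

For example, Listening plus Visualize with STT unchecked yields `["capture", "visualize"]`, which matches the statement that you can listen and visualize without a Watson account.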

In the visualization, green areas are flagged as speech and orange areas as silence/noise.
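A VAD engine makes one speech/non-speech decision per audio frame, so colored areas like these can be produced by merging consecutive frames with the same decision into runs. The Python sketch below shows that merging step; it assumes fixed 30 ms frames, and `to_segments` is a hypothetical helper rather than the plugin's drawing code.

```python
from typing import List, Tuple


def to_segments(flags: List[bool], frame_ms: int = 30) -> List[Tuple[int, int, str]]:
    """Collapse per-frame VAD decisions into (start_ms, end_ms, colour) runs."""
    segments: List[Tuple[int, int, str]] = []
    for i, is_speech in enumerate(flags):
        colour = "green" if is_speech else "orange"  # green = speech, orange = silence/noise
        start = i * frame_ms
        end = start + frame_ms
        if segments and segments[-1][2] == colour:
            # Same decision as the previous frame: extend the current run.
            segments[-1] = (segments[-1][0], end, colour)
        else:
            segments.append((start, end, colour))
    return segments


print(to_segments([False, True, True, False]))
# [(0, 30, 'orange'), (30, 90, 'green'), (90, 120, 'orange')]
```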