
IBM Watson Services Plugin

The IBM Watson Services plugin created by @PTP currently allows you to perform Speech to Text, Text to Speech, and visual recognition using Watson Services.

You can download and install the plugin here: https://www.ez-robot.com/EZ-Builder/Plugins/view/251

Before you can use the plugin, you will need to sign up for a free 30-day trial of IBM Speech to Text here https://www.ibm.com/watson/services/speech-to-text and a free 30-day trial of IBM Text to Speech here https://www.ibm.com/watson/services/text-to-speech/

I will create some examples as this moves forward and try to answer any how-to questions.

Thanks for creating this, @PTP. As with all your plugins, this is an excellent piece of work and a showcase of your talents.

User-inserted image


#1  

I'm almost done with the Vision Recognition and Conversation Services.

They are simple to use; Watson handles all the setup and plumbing for you.

I'm working on a demo, but it requires additional hardware and low-level code, so I'm busy...

If anyone has ideas for a demo using a Six or JD, please contribute.

#2  

Great work @ptp, as always. I look forward to trying it out.

I'm running a series of human-robot interaction experiments with a couple of students at my lab in about a month. We could see if we could integrate this module into one of the scenarios with the JD. If it works out, we can add some performance metrics, which will give you an idea of how the system performs in an interactional setting. If nothing else, I can give you a few videos of people interacting with the system. That might be a bit more interesting than a video. Let me know what you think :)

#3  

@larschrjensen,

Thanks for the feedback, and yes, more feedback is welcome; it helps generate and fuel more ideas.

#4  

@ptp I've been playing around with the plugin and it works quite well. I've noticed that you can't set the silence threshold below 2000 ms. This effectively means there is at least a 2+ second pause between an utterance and a response, which is quite long considering that people normally expect a response within 300 ms. Is it possible to change this? I realize this could affect the performance and send off utterances prematurely, but it would make the plugin a bit more flexible.

#5  

@larschrjensen,

Done; the minimum value is now set to 500 ms.

The audio capture buffer is 100 ms, and the handler gets notifications every 100 ms.

Let me know if that is OK; I can change it to 300 ms.

I was in the middle of something, so I hope no harm was done.

#7  

Just noticed the visual recognition update. Nice. I had a quick play and it looks great; I will need to learn how to train a model in Watson Studio.

Curious: is there a way to make ControlCommand() wait until visual recognition is complete? I found I was reading the previous variables rather than the new ones. I tried a Sleep() and that seemed to work, but I was setting it to 1000 ms and sometimes that was not long enough. I also played with setting $WatsonClassifyTypes[0] = "wait" and then waiting until it != "wait", but I kept getting stuck in a loop. I am not sure if there is a better way. I was triggering visual recognition off the speech-to-text phrase "What do you see".

#8  

@Nink,

1) You can add your script to the VR script; it will be executed when the visual recognition is complete.

User-inserted image


SayWait("I see")
$size = GetArraySize("$WatsonClassifyClasses") - 1
Repeat($ix, 0, $size, 1)
  # Adjust the minimum score, e.g. 0.60
  if ($WatsonClassifyScores[$ix] >= 0.60)
    SayWait($WatsonClassifyClasses[$ix])
  EndIf
EndRepeat

2) Add a small VR script:


$VisualRecognitionCounter=$VisualRecognitionCounter+1

then monitor the variable in another script. The variable must be initialized in an init script.
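For example, a rough sketch of the monitoring script (the capture-trigger ControlCommand is commented out because its exact control and command names depend on your project):

# Init script, run once:
$VisualRecognitionCounter = 0

# Monitoring script:
$lastCount = $VisualRecognitionCounter
# Trigger the capture here, e.g.:
# ControlCommand("IBM Watson Services", ...)
:waitLoop
Sleep(100)
if ($VisualRecognitionCounter = $lastCount)
  Goto(waitLoop)
EndIf
# The Watson variables now hold the results of the new call
SayWait("Recognition complete")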

3) There is an internal CaptureId variable. I'll expose the variable in the next update. And you will be able to monitor the $WatsonPictureCaptureId variable.
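Once exposed, a sketch of that could be as simple as remembering the current id and blocking until it changes (assuming EZ-Script's WaitForChange() is available in your EZ-Builder version, and that the variable is updated when a new capture completes):

$myId = $WatsonPictureCaptureId
# Trigger the capture here, then block until the id changes:
WaitForChange($WatsonPictureCaptureId)
# New results are now available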

To avoid being caught between two different calls, the VR script (option 1) is the best option. After the VR call is done, the variables are assigned with the results and the VR script is executed. If there is another VR call, its results are queued until the VR script finishes. While your VR script is being executed, the results (i.e. the variables) are constant and relate to the current call.