The IBM Watson Services plugin created by @PTP currently allows you to perform Speech to Text and Text to Speech as well as visual recognition using Watson Services.
You can download and install the plugin here https://www.ez-robot.com/EZ-Builder/Plugins/view/251
Before you can use the plugin, you will need to apply for a free 30-day trial of the IBM Speech to Text service here https://www.ibm.com/watson/services/speech-to-text and a free 30-day trial of IBM Text to Speech here https://www.ibm.com/watson/services/text-to-speech/
I will create some examples as this moves forward and try to answer any how-to questions.
Thanks for creating this, PTP. As with all your plugins, this is an excellent piece of work and a showcase of your talents.
They are simple to use; Watson handles all the setup and plumbing for you.
I'm working on a demo, but it requires additional hardware and low-level code, so I'm busy...
If anyone has ideas for a demo using a Six or JD, please contribute.
I'm running a series of human-robot interaction experiments with a couple of students at my lab about a month from now. We could see if we can integrate this module into one of the scenarios with the JD. If it works out, we can add some performance metrics, which will give you an idea of how the system performs in an interactional setting. If nothing else, I can give you a few videos of people interacting with the system. That might be a bit more interesting than a plain demo video. Let me know what you think.
Thanks for the feedback, and yes, more feedback is welcome; it helps to generate/fuel more ideas.
Done, the minimum value is set to 500 ms.
The audio capture buffer is 100 ms, and the handler gets notifications every 100 ms.
Let me know if that's OK; I can change it to 300 ms.
I was in the middle of something; I hope no harm was done.
Curious: is there a way to get ControlCommand to wait until visual recognition is complete? I found I was reading the previous variables and not the new ones.
I tried a sleep and that seemed to work, but I was setting it to 1000 and sometimes that was not long enough. I played with setting $WatsonClassifyTypes[0] = "wait" and then waiting until it != "wait", but I kept getting stuck in a loop. I am not sure if there is a better way. I was triggering visual recognition off the speech to text "What do you see".
1)
You can add your script to the VR script; it will be executed when the visual recognition is complete.
Code:
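For example (a minimal, hypothetical sketch: only the $WatsonClassifyTypes array is confirmed in this thread, and the empty-result check is my own addition), the VR script could simply speak the top result:
Code:
# Runs when a visual recognition call completes, so the result
# variables always belong to the current capture.
if (GetArraySize("$WatsonClassifyTypes") > 0)
  SayWait("I can see " + $WatsonClassifyTypes[0])
else
  SayWait("I did not recognize anything")
endif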
2)
Add a small VR script:
Code:
Then monitor the variable in another script. The variable must be initialized in an init script.
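A minimal sketch of this option, assuming a user-defined $WatsonVrDone flag (the flag name is mine, not the plugin's):
Code:
# --- Init script (run once at startup) ---
$WatsonVrDone = 0

# --- Small VR script inside the plugin (runs when recognition completes) ---
$WatsonVrDone = $WatsonVrDone + 1

# --- Monitoring script ---
WaitForChange($WatsonVrDone)
SayWait("New result: " + $WatsonClassifyTypes[0])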
3)
There is an internal CaptureId variable.
I'll expose the variable in the next update, and you will be able to monitor the $WatsonPictureCaptureId variable.
To avoid being caught between two different calls, the VR script (option 1) is the best choice. After the VR is done, the variables are assigned with the results and the VR script is executed. If there is another VR call, its results are queued until the VR script finishes. While your VR script is being executed, the results (i.e. the variables) are constant and related to the current call.
Any thoughts?
I fixed a memory leak. Let me know if the new version fixes the problem.
Please send me an email; I wish to discuss something with you.
My addy is in my profile.
Thanks
I took a picture of a playing card (Seven of Clubs).
Then I performed a few quick tests:
IBM Watson Visual Recognition Services:
Microsoft Computer Vision API :
Google:
It's obvious these results don't help.
I've used the EZ-Robot Camera v2; it's important to use the same camera for training and recognition.
Using the ARC camera snapshot control, I took 10-15 pictures of each card (7, 8, 9, Jack, King of Clubs).
It's important to test different light conditions, angles, orientations, etc.
I created a zip file for each card's image folder.
Then I ran the following script:
Code:
This will upload the image files and create a custom classifier named PlayingCards with 5 classes (SevenOfClubs ... KingOfClubs).
Training starts after the upload and can take a few minutes, depending on the number of classes, pictures, picture size, etc.
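Purely as a sketch of the idea (the ControlCommand name, argument order and paths below are assumptions, not the plugin's documented syntax; check the plugin's ControlCommand list in ARC for the real call), the upload/create step might look something like:
Code:
# One zip file per class; the zip file name is assumed to become the class name.
ControlCommand("IBM Watson Services", CreateClassifier, "PlayingCards", "C:\Cards\SevenOfClubs.zip", "C:\Cards\EightOfClubs.zip", "C:\Cards\NineOfClubs.zip", "C:\Cards\JackOfClubs.zip", "C:\Cards\KingOfClubs.zip")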
Code:
While the classifier is still training, the status comes back as training.
Running the same code again a few minutes later shows it's ready to be used.
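A hypothetical status check (both the command name and the status variable below are assumptions) could be as simple as:
Code:
# Ask the plugin for the classifier status and print it.
ControlCommand("IBM Watson Services", GetClassifiers)
Print("PlayingCards status: " + $WatsonClassifierStatus)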
The following code uses the default Watson classifiers:
Code:
The following code uses the above custom classifier.
Code:
Note: PlayingCards_978301467 is the classifierId, PlayingCards is the classifier name.
The following code uses the default and the above custom classifier.
Code:
Note: you can pass multiple classifier ids
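To make the three variants concrete, here is a hedged sketch (the ControlCommand name is an assumption; "default" as the built-in classifier id and the PlayingCards_978301467 classifierId follow the Watson Visual Recognition API convention mentioned above):
Code:
# Default Watson classifiers only:
ControlCommand("IBM Watson Services", Classify)

# Custom classifier only (use the classifierId, not the classifier name):
ControlCommand("IBM Watson Services", Classify, "PlayingCards_978301467")

# Default plus the custom classifier (multiple classifier ids can be passed):
ControlCommand("IBM Watson Services", Classify, "default", "PlayingCards_978301467")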
Testing the initial picture (Seven of Clubs) with the custom classifier:
SevenOfClubs class has the highest score.
The free service (i.e. Lite) only allows one custom classifier (e.g. "PlayingCards") and does not allow updates to an existing classifier.
So the only solution is to delete the existing classifier and create it again with more classes, create another one for a different purpose, or, if you feel confident, move to the Standard service.
Bear in mind that good training requires pictures; the more the merrier.
I did a few tests (with 10 pictures per card); an eight of clubs can get a higher score at a specific angle/lighting. The solution is to train with more pictures.
Taking the pictures takes time, and the small details are very important: playing with the card, putting something underneath to create a distorted image, changing the distance/orientation from the camera, playing with the light, etc., and maybe introducing some minor errors (fingers while holding it?).
Although the Lite plan does not allow incremental updates, I plan to use my assistants (e.g. kids) to take more pictures, and then I'll try to create the classifier with all the zip files (a big upload).
Now the next question is: how do I get it to recognize multiple cards at once? Bounding boxes?
We need an offline solution to detect objects and then apply VR to each object.
There are a few ideas using OpenCV and TensorFlow.
TensorFlow with Watson:
https://medium.com/ibm-watson/dont-miss-your-target-object-detection-with-tensorflow-and-watson-488e24226ef3
TensorFlow on Linux is well supported; you will find a lot of examples with Linux and Python, some of them on a Raspberry Pi. Windows is a different story.
I'm exploring the Windows implementation, but so far it's not possible to build an ARC skill plugin: most libraries are x64 and ARC is an x86 (32-bit) application.
The solution is to build a 64-bit application to handle the TensorFlow logic and create a communication layer between that application and the ARC skill plugin.
I photographed all the cards if you want them, but some need a lot more photos (or I have too many bad-quality photos), as I am getting a lot of false positives. I wish the free version of Watson VR would allow us to retrain when we have an issue (no, it is not an ace of clubs, it is an ace of spades).
I can send a physical deck I used to anyone who wants one. I purchased them at Dollarama (a Canadian dollar store); they are the Victoria brand.
Here is my edit of ptp's script so others don't need to retype it. Note my pictures' location is different, so you will have to change the path, but a search and replace in your favourite editor is easier than retyping and debugging (get a zip file name wrong and it will not work).
Code:
It can now take anywhere from 2 to 20 seconds to come back with an answer when you show a card.
Confident I was ready, I pulled out JD and held up a card in front of him. Not a clue (about 5% accuracy).
This tells me my training method is bad, or we need a very controlled environment.
I think we could probably do a controlled read if we used a static base with controlled lighting. For example, a game of Blackjack (21) between JD, Six and me as the dealer, with JD and Six sitting in specific positions and cards placed on the playing board in predetermined locations.
The other alternative is to create an automated method for reading cards and taking photos. Perhaps it is something like: hand JD a card and then have a script run so he turns around in a circle and changes the angle of the card while taking pictures, so the lighting, angle and background keep changing. Then, when he is finished (say 100 photos), upload the photos he took and ask for another card.
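A rough sketch of what that capture loop could look like (servo ports, positions, counts and the snapshot command name are all assumptions; adjust them for your own robot and controls):
Code:
$photoCount = 0
:takePhoto
  # vary the card angle and head position a little on every pass
  Servo(D0, GetRandom(40, 140))
  Servo(D1, GetRandom(70, 110))
  Sleep(500)
  # snapshot command name assumed; use the ARC camera/snapshot control's real command
  ControlCommand("Camera", CameraSnapshot)
  Sleep(500)
  $photoCount = $photoCount + 1
if ($photoCount < 100)
  goto(takePhoto)
endif
SayWait("Done, give me the next card")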
Thinking ....
Edit - It turns out JD's dexterity is not good enough to read a card he is holding (can't get a good enough angle/distance), so it will require a custom bot to read cards (more servos and perhaps a 360 servo). Also, it appears that not just the same model of camera is good enough; it has to be the exact same camera. If I take photos with JD and try to read them on the EZ-B + Camera it doesn't work, and vice versa.
I'm still working on it and researching other options. I've compiled TensorFlow from source code; it took me 3 failed attempts, and the successful compilation took 4 hours (Core i7) and almost 6 GB of disk space to obtain a 64-bit C library (dll and lib).
I can't find that brand; maybe you can suggest one available on both amazon.ca and amazon.com. That way we could share pictures.
I took pictures too... I wanted to test two different resolutions: 320x240 and 640x480. Unfortunately, the high-resolution pictures came out at the low resolution; the snapshot control ignored the camera control setting.
I'll add picture/snapshot functionality to the Watson plugin.
Can you share a zip of one of your cards?
NineofHearts.zip
JD, a record player, an iPad playing a Netflix movie.
Yes, the wrapper works on top of tensorflow.dll (libtensorflow.dll), available from the TensorFlow project.
https://github.com/tensorflow/tensorflow/issues/10817
sources:
https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-windows-x86_64-1.2.0.zip
http://ci.tensorflow.org/view/Nightly/job/nightly-libtensorflow-windows/lastSuccessfulBuild/artifact/lib_package/libtensorflow-cpu-windows-x86_64.zip
The tensorflow.dll is 64-bit.
Can you explain your plan B?
Regarding the playing cards, we don't need to exchange picture files publicly; let's find a common deck available on both Amazons.
https://www.amazon.ca/gp/offer-listing/B01A61ZQI8/ref=sr_1_1_olp?ie=UTF8&qid=1522329419&sr=8-1&keywords=victoria+playing+cards&condition=new
Plan B was to just take 100 photos of each card (1 photo every second) in the camera app, with the background and angle changing using the iPad and record player, then move the robot's head up and down as I do it while creating shadows and changing the lighting. The logic was that I could get through this in about 2 hours.
I have been thinking about a good use case for the ability to read cards. Obviously a QR code can be read easily, so this makes me wonder if this is a good application for vision recognition, but at least we are learning and it will make a good demo.
Colour cluster: red cards positive, black cards negative.
Is it a red card or a black card? This should be a relatively easy problem to solve. Assume red.
Red suit cluster: hearts positive, diamonds negative.
Again, a relatively easy problem to solve. Assume hearts.
Hearts suit cluster: picture positive, numeric negative. Assume picture.
Hearts picture cluster: is it the jack, queen, king or ace of hearts?
If we could do this and then somehow feed the results back into machine learning, I wonder whether it would be accurate and quick to process.
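A rough sketch of that cascade in EZ-Script (the classifier ids, the ControlCommand name and the $WatsonVrDone completion flag are assumptions; only $WatsonClassifyTypes is confirmed in this thread):
Code:
# Stage 1: colour (red vs black); placeholder classifier id
ControlCommand("IBM Watson Services", Classify, "ColourCluster_0000000001")
WaitForChange($WatsonVrDone)
$colour = $WatsonClassifyTypes[0]

# Stage 2: pick the next classifier in the tree based on the first result
if ($colour = "Red")
  ControlCommand("IBM Watson Services", Classify, "RedSuitCluster_0000000002")
else
  ControlCommand("IBM Watson Services", Classify, "BlackSuitCluster_0000000003")
endif
WaitForChange($WatsonVrDone)
SayWait("Suit cluster result: " + $WatsonClassifyTypes[0])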
https://www.youtube.com/watch?v=_zZe27JYi8Y
and return the objects' names and their x,y coordinates?
Thomas
Subjects:
1) Playing cards
I'll get these playing cards:
https://www.amazon.com/Bicycle-Playing-Card-Deck-2-Pack/dp/B010F6BAES
They are available on Amazon US, UK and CA. It will be easier to share (*not publicly*) the training pictures.
2) Plugin roadmap
Soon I plan to update the plugin to support training with negative samples and the Assistant (formerly Conversation) API, although there are some gaps to fill.
3) Offline object detection
This will take a little longer; I'm still researching. Most implementations (e.g. libraries) are 64-bit and require some resources (memory & CPU, and when possible a GPU), and some plumbing is needed between ARC/plugins and an external application.
@Thomas:
Watson:
I presume you managed to find and set up the Watson visual recognition.
Machine Learning:
The main problem is the hype around machine learning... we are far from real machine learning and AI, and robots seeing and taking over the world...
What we have is massive data dumped onto massive computational hardware to obtain some clues...
TensorFlow:
With a Core i7 without an NVIDIA GPU, using a custom Windows x64 compilation with the AVX2 extension enabled (not available in all CPUs: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions), the quickest model (less accurate) took 3.5 seconds and a more accurate model took 45 seconds to classify/detect objects in a single photo.
Unless I'm doing something wrong, less time requires a decent GPU and/or a different model.
I'll post more details in another thread.
Found a small problem: in the car example, under dialogue, there was an "anything else" true dialogue with the statement I am not sure about that, you can "Turn on my lights"; the quotation marks caused the script to crash, so I just deleted that dialogue response as it was causing errors. All good.
Fixed. EZ-Script does not like quotes; quotes are now replaced with ( ' ).
One simple example:
Init script (Run before the Watson Assistant)
Code:
Assistant Script:
Code:
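For anyone following along, a hypothetical outline of those two scripts (the device flags are mine, and the assistant response variable name is an assumption, not the plugin's documented one):
Code:
# --- Init script (run before the Watson Assistant) ---
$RadioOn = 0
$LightsOn = 0

# --- Assistant script (runs after each assistant response) ---
# $WatsonAssistantResponse is assumed to hold the assistant's reply text.
SayWait($WatsonAssistantResponse)
if ($WatsonAssistantResponse = "Turning on the radio")
  $RadioOn = 1
endif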
I called him Jarvis.
Listen to what happens when you try to switch on the radio when the radio is already on:
Code:
1522815312 is a previous CaptureId.
I like the new previous response test variable, as I get lost in where we are in the conversation: are we in the middle of a conversation or starting a new one?
Just like the ones that cheesy real estate people wear for their phone. Connects to your PC.
PTP did some amazing work allowing you to configure dB levels and delays with the voice capture plugin he wrote, so this makes it really easy to tune the appropriate responses.
Which service/script is not assigning the variables?
Eventually I restarted my Windows VM and the problem went away. 8 GB is assigned to Windows, with only EZ-Robot running in the VM (Mac Parallels). I will monitor it today. Maybe I will try running native Win 10 and EZ-Robot on a bare-metal PC.
Called from the speech to text plugin, before speech to text:
Code:
After speech to text:
Code:
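As a purely illustrative sketch (the recognized-text variable and the assistant command name are assumptions), the "after speech to text" hand-off could look something like:
Code:
# Forward whatever Watson Speech to Text recognized to the Watson Assistant.
if ($WatsonSpeechText != "")
  ControlCommand("IBM Watson Services", SendAssistantMessage, $WatsonSpeechText)
endif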
Put in the Watson Assist script in the plugin:
Code:
Import as a robotest.json file into Watson Assist:
Code:
Has anyone tested the Kinect microphone array?
It can't be added to a small robot... and sooner or later they will stop selling it...
I use a PS3 Eye on my desktop and it works really well with voice recognition (great mic), although it would be good to get a Kinect working, especially if we get a Unity plugin.
I'm curious how good the Kinect is in a noisy/open environment.
@Nink:
I have a PS3 Eye, Asus Xtion, Kinect 1 and Kinect 2.
All of them have microphone arrays, and the Kinect has a sound localization API.
One can use it to turn the robot's head towards the sound/person.
Does the PS3 Eye work at long distances and/or with environmental noise?
I'm evaluating a few microphone arrays, and one of the cheapest solutions is to use the PS3 Eye with a Raspberry Pi Zero and forward the sound to the PC (a Wi-Fi microphone).
Post #19
https://www.ez-robot.com/Community/Forum/Thread?threadId=10781&page=2
This doesn't solve the depth sensing issue though.
I am happy to go two-for-one on a depth sensor (I will buy two and send you one) if you want to work on something. I have been waiting for EZ-Robot to provide a LIDAR to do SLAM. This, in conjunction with the Unity work going on, would be exciting. Maybe we should just look at getting a couple of https://click.intel.com/realsense.html for now.
Off topic or on topic (not sure any more): I have my latest photos of Bicycle cards, Ace to 6. I can send you a link to the cards offline since you own a deck, but the results still are not good. I think I need to string together multiple AI searches, but the time delay is an issue, and Watson Visual Recognition does not seem to support a VR pipeline or linked requests, so I have to call multiple VR requests from a single EZB script based on the previous VR outcome, which takes A LONG TIME. First find the suit (hearts, diamonds, clubs, spades) => when the suit is derived, find picture or number (check if it's a number or picture card) => then the actual card (number in suit), and it still gets it wrong. Maybe I'll work on it this weekend if I have time.
http://lisajamhoury.com/portfolio/kinectron/
https://kinectron.github.io/docs/intro.html
https://github.com/kinectron/kinectron
https://www.seeedstudio.com/ReSpeaker-4-Mic-Array-for-Raspberry-Pi-p-2941.html
"this 4-Mics version provides a super cool LED ring, which contains 12 APA102 programmable LEDs. With that 4 microphones and the LED ring, Raspberry Pi would have ability to do VAD(Voice Activity Detection), estimate DOA(Direction of Arrival) and show the direction via LED ring, just like Amazon Echo or Google Home"
Thank you for your work on the IBM Watson Plugin.
I tried to play with it a little bit, and it seems that IBM Watson now only provides an API key and URL for new accounts. So no more credentials with username/password.
https://console.bluemix.net/docs/services/watson/getting-started-tokens.html#tokens-for-authentication :
This link also mentions the migration: https://www.ibm.com/watson/developercloud/text-to-speech/api/v1/curl.html?curl#introduction
I could only try visual recognition, and I get the following error:
IBM.WatsonDeveloperCloud.Http.Exceptions.ServiceResponseException : The API query failed with status code Unauthorized :....
Not sure if it's my bad, or if some big changes were made to the Watson .NET SDK:
I'm investigating... my services are working.
Do you know how to convert an existing service (user/password) to the new authentication method?
I can't find an upgrade option.
@ptp I'm also working on a personal plugin that includes the Watson .NET SDK, using 2.11.0, and I'm having some difficulties with the System.Net.Http dependencies...
Did you encounter similar problems?
Although the London location is not available, the new service is located in Washington DC.
Is obsolete
The Watson SDK supports both the old and new methods; use the new method.
The SDK TTS sample is working with the new TokenOptions. Does it not work for you?
I'll update the plugin to support both the user/password and the API key/token methods.
Tokens seem more secure in advanced scenarios where a proxy service creates and delivers time-limited tokens; if an attacker gets a token, it is not a major issue: it has a time limit and is not the service's API key.
What version of the SDK and dependencies are you using?
I opened an issue here with more details: https://github.com/watson-developer-cloud/dotnet-standard-sdk/issues/307
I tested with a .NET Core console project.
I'll check with a .NET Framework project.
I found the issue, and it is working with a Windows Forms .NET 4.6.1 project.
I'll redo it from zero to confirm the fix, and I'll update the open Watson ticket.
I've updated your ticket.
I'm starting to understand what's going on.
TextToSpeech 2.11.0 isn't working with any version of System.Net.Http other than 4.0.0.
When I run your code, I have to edit the App.config and uncomment the dependentAssembly entry for System.Net.Http; otherwise it loads 4.1.0.
If I do that, I can run your code without any problem and it works great.
But as I want to integrate it into ARC, I'm changing the project to a Class Library rather than a Windows one. I add the debugger option, compiler settings, plugin.xml, path, etc.
I can load the plugin in ARC; however, it loads System.Net.Http 4.1.0 even though the App.config is set up correctly. Does that mean ARC is using its own System.Net.Http and overriding the plugin's 4.0.0?