Darknet YOLO (Obj Detection)

by ptp

You only look once (YOLO) is a state-of-the-art, real-time object detection system. This skill uses Tiny YOLOv3, a very small model suited to constrained environments (CPU only, no GPU).

How to add the Darknet YOLO (Obj Detection) robot skill

  1. Load the most recent release of ARC (Get ARC).
  2. Press the Project tab from the top menu bar in ARC.
  3. Press Add Robot Skill from the button ribbon bar in ARC.
  4. Choose the Camera category tab.
  5. Press the Darknet YOLO (Obj Detection) icon to add the robot skill to your project.

Don't have a robot yet?

Follow the Getting Started Guide to build a robot and use the Darknet YOLO (Obj Detection) robot skill.

How to use the Darknet YOLO (Obj Detection) robot skill

Darknet YOLO website: https://pjreddie.com/darknet/yolo/

Requirements: You only need a camera control. The detection is done offline (no cloud services).

  1. Start the camera.
  2. Check the Running check box.

Detection then runs continuously. Whenever the detection results change, the On Changes script is executed (see the configuration area):

  1. Press Config.
  2. Edit the On Changes script.
  3. Write the On Changes script in JavaScript.

You can also run the detection on demand from JavaScript:

controlCommand("Darknet YOLO", "Run");

The above command runs the configured on demand script.

An example script:

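var numberOfRegions = getVar('$YOLONumberOfRegions');
if (numberOfRegions == 0)
{
   Audio.sayWait('No regions found');
}
else
{
   Audio.sayWait('Found ' + numberOfRegions + ' regions');
   // Parallel arrays: classes[ix] is the label, scores[ix] its confidence
   var classes = getVar('$YOLOClasses');
   var scores = getVar('$YOLOScores');
   for (var ix = 0; ix < classes.length; ix++)
   {
      // Report the score (not the class name) as a rounded percentage
      Audio.sayWait('Found ' + classes[ix] + ' with score: ' + Math.round(scores[ix] * 100) + '%');
   }
}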
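
The announcement logic in the script above can be pulled into a plain function so it can be checked outside ARC. The following is a minimal sketch, not part of the skill: describeDetections is a hypothetical helper name, and it assumes $YOLOClasses and $YOLOScores are parallel arrays with scores between 0 and 1, as the example script suggests.

```javascript
// Hypothetical helper (not part of the skill): builds the sentences to speak.
// Assumes classes and scores are parallel arrays and scores are in the 0..1 range.
function describeDetections(classes, scores) {
  var sentences = [];
  if (!classes || classes.length === 0) {
    sentences.push('No regions found');
    return sentences;
  }
  sentences.push('Found ' + classes.length + ' regions');
  for (var ix = 0; ix < classes.length; ix++) {
    var percent = Math.round(scores[ix] * 100);
    sentences.push('Found ' + classes[ix] + ' with score: ' + percent + '%');
  }
  return sentences;
}

// Inside ARC the result could be spoken like this (left as a comment because
// getVar and Audio.sayWait only exist inside ARC):
//   var lines = describeDetections(getVar('$YOLOClasses'), getVar('$YOLOScores'));
//   for (var i = 0; i < lines.length; i++) Audio.sayWait(lines[i]);
```

Keeping the text building separate from the speech call makes it easier to test the wording before the robot starts talking.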

PRO
USA
#1   — Edited

There is a file size limit for plugin uploads, so a file required to operate the plugin is missing from the package.

Please download the following file:

https://pjreddie.com/media/files/yolov3-tiny.weights

And copy to the plugin folder:

C:\ProgramData\ARC\Plugins\d1db5da7-8805-41eb-8a65-a548e2fe60f6

Expected plugin folder content: User-inserted image

PRO
USA
#2  

hi ptp,

Intriguing item and website.

I have a https://pixycam.com/

EzAng

PRO
Synthiam
#3  

What's the file size of your entire plugin package? including the .weights file?

And - this is amazing

PRO
USA
#4  

DJ: 35.5 Mb User-inserted image Thanks DJ!

PRO
Synthiam
#5  

Okay great - I'll have amin update the file size on the website for ya

PRO
USA
#6   — Edited

Pedro, this is awesome. Love to see a video of the plugin running. How does it compare in speed to the GPU version shown on their site (Pascal Titan X)?

PRO
USA
#7  

@Ezang:

Quote:

Intriguing item and website.
Indeed, "Darknet" is a strange/dark name for an AI framework. There are also nightmares: https://pjreddie.com/darknet/nightmare/ It's analogous to Google's DeepDream project, so they played with the words dream... nightmare.

Quote:

I have a https://pixycam.com/
The PixyCam is useful to pair with an underpowered microcontroller, e.g. an 8-bit Arduino. If you are using ARC, the camera control has many more features, plus you have additional CPU power.

@Fxrtst:

Quote:

Love to see a video of the plug in running.
I will do it soon, the plugin requires additional TLC:)

Quote:

How does it compare in speed to the GPU version shown on their site (Pascal Titan X )
Unfortunately not great, there is a reason why NVIDIA GPUs cost a few $$$$

It's important to explain what the plugin does.

Frameworks: Darknet is an open source neural network framework written in C and CUDA. It is similar to TensorFlow and Caffe, although it is mainly used for object detection and has a different architecture. The frameworks cover both the training and inference processes. You can run both without CUDA (CPU only), but be prepared for a huge difference in speed.

Datasets: To train a neural network you need input data, e.g. images, sounds, etc. This plugin ships with a model trained on the following dataset: https://cocodataset.org/, one of the biggest publicly available. Each dataset requires additional metadata (labels, categories) and image optimization (resizing, image filters), so creating one is not an easy task. Each dataset contains specific categories, e.g. people, birds, dogs. COCO has 80 object categories; other datasets usually have fewer than 30.

Model: The model is the output of training on a dataset. Models are not interchangeable between frameworks, so you'll find separate COCO models for TensorFlow, YOLO, etc. Training takes time, a huge amount of time if you don't have GPU power. Although the models are framework specific, you can convert between them (there are some issues and requirements, and sometimes additional scripting is needed). For YOLO there is a tool called DarkFlow (everything is dark :)).

So YOLO detection with the full COCO model (245 MB) takes almost 50 seconds (the first time) to detect objects in an image on an Intel NUC i7 (my machine). Without a CUDA card with 8 GB you can't expect FPS, only FPM (frames per minute).
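
To put that figure in perspective, detection time converts to throughput with simple arithmetic. The sketch below only restates the 50 seconds quoted above; the function names are illustrative and not part of the plugin.

```javascript
// Illustrative helpers: convert a per-image detection time into throughput.
function framesPerMinute(secondsPerFrame) {
  return 60 / secondsPerFrame;
}

function framesPerSecond(secondsPerFrame) {
  return 1 / secondsPerFrame;
}

// About 50 seconds per image is roughly 1.2 frames per minute,
// or 0.02 frames per second.
var fpm = framesPerMinute(50);
var fps = framesPerSecond(50);
```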

I plan to test the AtomicPI (similar to the LattePanda). What can you expect with an Atom processor? We will see. Everyone agrees the game changer is the GPU, and only NVIDIA has the stuff!

To alleviate the frustration, the YOLO guys trained a tiny version of the model with a different configuration/parameters. With the tiny version you can get some FPS but, once again, the GPU blows away the CPU performance.

TensorFlow also released a different engine (TensorFlow Lite) plus TF Lite models, which allows you to run lite models on microcontrollers, embedded computers (Pi), mobile phones and regular CPUs.

To summarize: The plugin ships with the tiny COCO model (35 MB). Later I'll add the possibility to download the full model (250 MB), which is more accurate but very slow. The plugin was built without CUDA support, so it does not matter whether you have a CUDA GPU.

Let's hope it can be useful running on a Latte panda.

PRO
USA
#8  

I found a bug: while running in continuous mode, I don't stop the detection process, so while the On Changes script is running (Text To Speech says the results), the detection keeps queueing results.

PRO
USA
#9   — Edited

Debug:

Darknet - offline

09:32:39.250>Info>>Cleared

It repeatedly says out loud: no regions found:

09:32:39.276>Debug>>Detection Took:71 seconds

Regions found: 0

then:

09:32:52.701>Debug>>Detection Took:72 seconds

Regions found: 1

..class=[person] confidence=[0.5682923] X=[106] Y=[77]

It says "a person" for me, and a cup,

a banana is not recognized

works a little  :-(

EzAng

PRO
Canada
#10  

@ptp upload file size is now increased so you should be fine :)

PRO
USA
#11   — Edited

ok, I will try it -  did not get an update yet

EzAng

PRO
USA
#12   — Edited

@ptp Got it. I won't be installing my main computer into any robot soon, as it seems the more power the better with the full YOLO version. I have a beast desktop with 32 cores, 128 GB RAM, and I was one of the lucky ones to snag an Nvidia 3090Ti with 24 GB VRAM and 10,496 CUDA cores. But as I said, I won't be putting it into a robot any time soon!

I like the idea of getting your plugin to maybe work with the Latte Panda. I have version 1 of the Panda installed in the bartender robot and it's extremely impressive. This plugin will be a great addition to vision systems on robots.

Nice job as always!!!!

PRO
USA
#13   — Edited

Hello again, back,

Quote:

@ptp upload file size is now increased so you should be fine
Is your version 2 up to date, or is an update coming?

works ok

EzAng

#14   — Edited

OMG.... @fxrtst What are you up to??? :D

This is a great plugin, I never thought that YOLO will make it into the heavenly realm of plugins!! Great work @ptp

Germany
#15   — Edited

Your YOLO port is nice and so perfect for my next robot generation, but the bug is annoying because the voice repeats non-stop what the cam saw. I can only stop the voice if I close the ez-robot software.

PRO
USA
#16  

working on a fix / update.

PRO
USA
#17  

@amin: Thanks, 35Mb file uploaded with success.

@Smarty: Fixed.

@All: The model file is now included with the plugin.

PRO
Canada
#18  

wow  PTP you are amazing. This is excellent. FIVE STARS

PRO
USA
#19  

Your program works well,

I have been using DJ's Train Vision Object By Script =  works well also

thanks for all your work

EzAng

Germany
#20   — Edited

@ptp

Now I have another problem. If I use "run detection only", all is fine: a perfect live video. If I unmark "run detection only", the video takes 15 seconds to show another frame, so there is no smooth live video. YOLOv2 had both together: perfect video frames and voice playback (with the non-stop playback problem). YOLOv3 now has a frame problem. :D

PRO
USA
#21  

@Smarty, New update with minor optimizations.

Regarding the delay: before (v2), the detection results were queued during the script execution, and that was the cause of the bug, i.e. after you stopped the detection the queue was still being processed.

To solve the bug I stop the detection while the OnChanges script is being executed. I presume the 15 seconds must be the delay from processing the script.
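
The idea of pausing detection while the script runs can be sketched in a few lines. This is not the plugin's code (the plugin is .NET and runs detection on its own thread); it is a simplified single-threaded illustration in which results that arrive while the handler is busy are dropped instead of queued. createDetectionGate and its members are hypothetical names.

```javascript
// Sketch of the "pause while the script runs" idea; not the plugin's code.
function createDetectionGate(onChanges) {
  var busy = false;
  var dropped = 0;
  return {
    // Called for every new detection result.
    offer: function (result) {
      if (busy) {
        // The handler is still running: drop the result instead of queueing it.
        dropped++;
        return false;
      }
      busy = true;
      try {
        onChanges(result);
      } finally {
        busy = false;
      }
      return true;
    },
    droppedCount: function () {
      return dropped;
    }
  };
}
```

Dropping results keeps the robot talking about what the camera sees now rather than working through a backlog, at the cost of missing anything that appears only while the script is running.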

Can you add the following JavaScript code to the OnChanges script:

Quote:

//do nothing

and check whether the delay is still noticeable?

If you are using EZ-Script, I recommend changing to JavaScript; EZ-Script is very slow.

Post your EZ-Script if you need help converting to Javascript.

PRO
USA
#22   — Edited

YOLO is the acronym of the phrase "you only live once"  lol

thanks again for the app, control

EzAng.

#23   — Edited

In this case it actually means "You Only Look Once", of course referring to the urban slang, but it also describes the way the algorithm works. Kind of a cool tagline for a sophisticated mathematical operation!! :D

PRO
USA
#24  

How are you Mickey?

What are you up to?

EzAng

#25   — Edited

I have a few questions for PTP.

  1. Without a GPU how many frames per second/ per minute are people getting with this arrangement?
  2. I have a GPU on my laptop, but you said that this code won't utilize it. Is that right?
  3. What about the Latte Panda, how can such a tiny machine run this CPU/GPU intensive code?
  4. How do they get object classification to run so fast on products like HuskyLens? I get about 1 or 2 FPS on that device.
PRO
USA
#26  

1) NUC Core i7: 4 FPS with a dummy JavaScript script, i.e. comments only.

User-inserted image

2) Correct. This version does not use the GPU. It also uses the tiny model, which is less accurate but lighter.

3) No tests yet. I've used an Atomic PI (similar to the Latte Panda entry model) but it is running ROS. I got a new one and I plan to install Windows, ARC and the plugin. My guess is the performance will be worse.

4) I'll address in another post.

PRO
USA
#27  

Quote:

4. How do they get object classification to run so fast on products like HuskyLens? I get about 1 or 2 FPS on that device.
HuskyLens uses an SoC (system on chip), the Kendryte K210, and the video capture is handled/processed directly on the PCB. You can read more about the chip here: https://www.seeedstudio.com/blog/2019/09/12/get-started-with-k210-hardware-and-programming-environment/ https://hackaday.com/2019/11/04/how-smart-are-ai-chips-really/

So basically it is a dual-core RISC-V chip with a KPU:

Quote:

KPU is a general-purpose neural network processor with built-in convolution, batch normalization, activation, and pooling operations. It can detect faces or objects in real time
The K210 is not new (2019) and comes from a Chinese manufacturer. It's a good choice for IoT scenarios, i.e. no PC, with power and budget constraints. I don't like DFRobot's approach: they mentioned and advertised it as an open source product, but later they changed to "to be open source later".

So if you are designing solutions for IoT and pairing with other microcontrollers, it's a good choice: everything is glued together (camera on board), serial communication, product support, etc.

PRO
USA
#28  

Regarding robots in general: if you plan to have an embedded computer, an operating system and additional software, e.g. ARC or ROS, you will need additional hardware: a GPU or a TPU.

If you are building real robots or creating products:

  1. You don't pick a Windows desktop for a robot.
  2. For an embedded computer you pick ARM and not an Intel architecture. The Intel architecture is too complicated. It only works if you have a good design (more money) to accommodate battery power, heat dissipation, space and additional hardware, e.g. GPUs.

That is my opinion, and it is also the reason why ROS works very well on Linux. If you need to develop a Windows driver or something low level, it is a pain in the neck: you need to deal with all the safety protections, e.g. driver certificates and closed APIs. Also, a good portion of the CPU is used for the user interface and other user features not relevant to a robot.

ARC/Builder's user base expects easy (EZ) software to run the robots, with a friendly operating system, i.e. Windows, plus a friendly off-the-shelf controller (EZ-Robot) with no extra soldering or extra changes.

There was a Raspberry Pi version, but it is now gone. It is a similar scenario for EZ-Robot controller replacements, i.e. Arduino firmwares: you can do a lot of new stuff, but it requires coding, which is not an easy task for most forum users, so most people will wait for Synthiam to add the required features.

Regarding TPUs, there are some low-cost solutions running on Linux, and some are being ported (being fixed) to run on Windows.

Hopefully they will become more Windows friendly and, combined with LattePandas / UP boards, will become a solution for Windows + ARC users.

#30   — Edited

ptp,

I Downloaded and installed your skill and it works PERFECTLY!!! In the past I installed YOLO/Darknet and it took me 2 weeks to get it right. This time it took less than two minutes! I get 8 FPS.

I made some changes to your javascript code that works better for me. I wanted to just announce the object/objects that it sees.

Again thank you for all your hard work in getting this skill up and going!!!!!

--Thomas

var numberOfRegions = getVar('$YOLONumberOfRegions');
if (numberOfRegions == 0)
{
   // Nothing detected: stay silent
}
else
{
   var classes = getVar('$YOLOClasses');
   for (var ix = 0; ix < classes.length; ix++)
   {
      Audio.sayWait('I see a ' + classes[ix]);
   }
}
#31  

PS. Do you have a list of all of the objects that it can detect?

#32  

PPS. FEATURE REQUEST: Can you give us the coordinates of the bounding boxes, or at least the center of the bounding boxes?

PRO
USA
#34  

Quote:

PPS. FEATURE REQUEST: Can you give us the coordinates of the bounding boxes, or at least the center of the bounding boxes?
Yes. It's possible, I'll add in the next update.

PRO
USA
#36   — Edited

I like the code with a few things turned off; I commented them out:

var numberOfRegions = getVar('$YOLONumberOfRegions');
if (numberOfRegions == 0)
{
   // Audio.sayWait('No regions found');
}
else
{
   // Audio.sayWait('Found ' + numberOfRegions + ' regions');
   // Audio.sayWait('Found ');
   var classes = getVar('$YOLOClasses');
   for (var ix = 0; ix < classes.length; ix++)
   {
      // Audio.sayWait("I see a " + (classes[ix]));  or
      Audio.sayWait(classes[ix]);
   }
}

So the audio only comes out with the item it detects
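
When the same class is detected several times in one frame, announcing each region separately gets repetitive. A possible variation, sketched here as a plain function, is to count each class once and announce the totals. summarizeClasses is a hypothetical helper, and the input is assumed to be the array read from $YOLOClasses.

```javascript
// Hypothetical helper: collapses repeated class names into "count name" phrases,
// keeping the order in which each class first appeared.
function summarizeClasses(classes) {
  var counts = {};
  var order = [];
  for (var ix = 0; ix < classes.length; ix++) {
    var name = classes[ix];
    if (!Object.prototype.hasOwnProperty.call(counts, name)) {
      counts[name] = 0;
      order.push(name);
    }
    counts[name]++;
  }
  var phrases = [];
  for (var i = 0; i < order.length; i++) {
    phrases.push(counts[order[i]] + ' ' + order[i]);
  }
  return phrases;
}

// Inside ARC (comment only, since getVar and Audio.sayWait exist only in ARC):
//   Audio.sayWait('I see ' + summarizeClasses(getVar('$YOLOClasses')).join(', '));
```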

EzAng

PRO
Synthiam
#37  

I get this error during detection


17:46:40.200>Debug>>Id=[9c45498b-14a1-4044-81e3-a8feedd06984] Version=[2020.10.13.1]
17:46:40.338>Debug>>Native Id=[342AD9BC-8A21-4908-8048-F5575614F95F] Version=[2020.9.9.1]
17:46:40.344>Info>>Process new configuration isSetConfiguration=[True]
17:46:59.476>Debug>>Detection Took:1264 seconds
# Regions found: 1
..class=[tvmonitor] confidence=[0.9448678] X=[84] Y=[108] 

17:46:59.519>Error>> Exception=[
System.Runtime.InteropServices.ExternalException (0x80004005): A generic error occurred in GDI+.
   at System.Drawing.Image.Save(String filename, ImageCodecInfo encoder, EncoderParameters encoderParams)
   at System.Drawing.Image.Save(String filename, ImageFormat format)
   at YOLO.Plugin.MainForm.RunDetector(Bitmap bitmap)]
17:47:05.510>Debug>>Finished script=[YOLO.OnChanges] took=[00:00:06.0018869]
17:47:28.520>Info>>Process new configuration isSetConfiguration=[False]
17:47:30.346>Debug>>Detection Took:336 seconds
# Regions found: 1
..class=[tvmonitor] confidence=[0.9381291] X=[85] Y=[108] 

17:47:30.362>Error>> Exception=[
System.Runtime.InteropServices.ExternalException (0x80004005): A generic error occurred in GDI+.
   at System.Drawing.Image.Save(String filename, ImageCodecInfo encoder, EncoderParameters encoderParams)
   at System.Drawing.Image.Save(String filename, ImageFormat format)
   at YOLO.Plugin.MainForm.RunDetector(Bitmap bitmap)]

I'm guessing the detection is done in a new task (new thread)? If so, you'll have to make a copy of the bitmap if it's not being manipulated in the OnNewFrame event. Working with any camera image has to be done either in the new frame event, or a copy of itself needs to be made to work in another thread.

PRO
USA
#38  

It works here, pretty well

EzAng

PRO
USA
#39  

@DJ:

var cameraBitmap = new Bitmap(bitmap);
this.RunDetection(cameraBitmap);

Maybe that is not enough to copy the bitmap? Do you recommend another method to copy the bitmap?

PRO
Synthiam
#40   — Edited

That won't copy the bitmap - it'll create a new object wrapped around the memory of the bitmap. The bitmap memory actually never changes in ARC. The memory is allocated when the camera starts and is re-used for every frame. A new Bitmap(bitmap) will create a wrapper around the memory that's being used.

An old version of ARC (when ARC days) used to create a new bitmap and dispose it for every frame. But that was super expensive on garbage collection. With the new method, the memory is reused - and that's also why a camera image can exist on many skill controls without a ton of cpu being used or memory. It's because every instance of that bitmap is actually referencing the same memory location and only the screen needs to be refreshed when the memory updates :)

So the solution for yours is actually quite easy: I would recommend taking the camera's bitmap and "drawing" it to a bitmap for your own detection thread. That way you can keep your own bitmap or dispose of it however you wish, and it lives for as long as you want it to.

Something like...



Bitmap _myBitmap;
FormCameraDevice _camera;

void main() {

  // Pseudocode placeholder: obtain the current camera control instance
  FormCameraDevice tmpCamera = /* get the current camera instance control */;
  tmpCamera.Camera.OnStart += Camera_OnStart;
  tmpCamera.Camera.OnStop += Camera_OnStop;
  _camera = tmpCamera;
}

private void Camera_OnStart() {

  // Allocate our own bitmap once, matching the camera's size and pixel format
  _myBitmap = new Bitmap(_camera.Camera.CaptureWidth, _camera.Camera.CaptureHeight, _camera.Camera.GetPixelFormat);
}

private void Camera_OnStop() {

  if (_myBitmap != null) {

    _myBitmap.Dispose();
    _myBitmap = null;
  }
}

void Camera_OnNewFrame() {

  // Draw the camera frame into our own bitmap so it no longer shares the camera's memory
  using (Graphics g = Graphics.FromImage(_myBitmap))
    g.DrawImageUnscaled(_camera.Camera.GetCurrentBitmapManaged, 0, 0);

  // now do what you wish with _myBitmap
}

OR the fastest way is to use memcpy: get an instance of your Bitmap's BitmapData and memcpy the Camera.GetCurrentBitmapUnmanaged.Data to your bitmap's Scan0.

PRO
USA
#41   — Edited

DJ: a quick search (https://docs.microsoft.com/en-us/dotnet/api/system.drawing.bitmap.-ctor) and I'm guessing it is a shallow copy, not a deep copy: the "shell" object is different but the byte buffer is the same, so sooner or later it becomes an issue. I'll change the code; I'm looking for an elegant way (less boilerplate code) to generate a deep clone.
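
The shallow versus deep copy distinction being discussed can be illustrated outside .NET. The sketch below is only an analogy in JavaScript using typed arrays; it does not use or describe the System.Drawing API, and the variable names are made up for illustration.

```javascript
// Stands in for the camera's pixel buffer, which is reused for every frame.
var frame = new Uint8Array([10, 20, 30, 40]);

// Shallow: a new object that views the SAME underlying memory.
var wrapper = new Uint8Array(frame.buffer);

// Deep: a new object with its OWN memory; the values are copied once.
var deepCopy = new Uint8Array(frame);

// The "next camera frame" overwrites the shared buffer.
frame[0] = 99;

// wrapper[0] now reads 99 because it shares memory with frame,
// while deepCopy[0] still reads 10 because it owns its data.
```

A detection thread that holds only the wrapper can see its pixels change mid-run, which is why a real copy is needed before handing a frame to another thread.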

EDITED *** I did not see the previous post **  Thanks!

PRO
Synthiam
#42  

Here - you might find these handy... They exist in EZ_B.Camera


/// Copy the data from a managed bitmap to an unmanaged image. This does not create the destination image.
/// This checks the dimensions and pixel format are the same
/// Your dst image must have already been created with the same dimensions and pixel format as the src
/// This is a whole memory copy, which is a duplicate. The images will not share the same memory space
public static void CopyManagedBitmapToUnmanaged(Bitmap src, AForge.Imaging.UnmanagedImage dst) 

/// Copy the data from a managed bitmap to an unmanaged image. This does not create the destination image.
/// This does NOT check the dimensions and pixel format are the same (for performance)
/// Your dst image must have already been created with the same dimensions and pixel format as the src
/// This is a whole memory copy, which is a duplicate. The images will not share the same memory space
public static void CopyManagedBitmapToUnmanagedUnsafe(Bitmap src, AForge.Imaging.UnmanagedImage dst) 

/// Copy the data from an unmanaged image to a bitmap. This does not create a bitmap.
/// This checks the dimensions and pixel format are the same
/// Your dst bitmap must have already been created with the same dimensions and pixel format as the src
/// This is a whole memory copy, which is a duplicate. The images will not share the same memory space
public static void CopyUnmanagedImageToBitmap(AForge.Imaging.UnmanagedImage src, Bitmap dst) 

/// Copy the data from an unmanaged image to a bitmap. This does not create a bitmap.
/// This does NOT check if your dimensions and pixel format is the same (for performance)
/// Your dst bitmap must have already been created with the same dimensions and pixel format as the src
/// This is a whole memory copy, which is a duplicate. The images will not share the same memory space
public static void CopyUnmanagedImageToBitmapUnsafe(AForge.Imaging.UnmanagedImage src, Bitmap dst)

/// Copies the memory of one bitmap to another.
/// This checks if the dimensions and pixel type are the same
/// The source and destination bitmaps must have the same size and pixel format, otherwise this won't work and may produce unmanaged code errors
/// This is a whole memory copy, which is a duplicate. The images will not share the same memory space
public static void CopyBitmapMemory(Bitmap bmpFrom, Bitmap bmpTo) 

/// Copies the memory of one bitmap to another.
/// This does NOT check if the dimensions and pixel type are the same (for performance)
/// The source and destination bitmaps must have the same size and pixel format, otherwise this won't work and may produce unmanaged code errors
/// This is a whole memory copy, which is a duplicate. The images will not share the same memory space
public static void CopyBitmapMemoryUnsafe(Bitmap bmpFrom, Bitmap bmpTo) 

#43   — Edited

Very nice work ptp.  I am interested to know what tools/languages you used to build this, if you have the time that is. I love Yolo...it seems to work pretty well in near dark lighting conditions too.  I've been using the 80 class version.  My favorite is when it recognizes my cats, plants, phones, and tvs.  I don't know why but it continues to amuse me, I think because I know I would never be able to do it without a NN.  It feels like magic.  I keep hoping someone in the industry will build some more Yolo-like models with a lot more classes.  It seems like I read about a Yolo9000 but I was never able to find anything I could use.  If anyone finds a model with a lot more classes, I'd love to hear about it.  I haven't tried v4 or v5 yet...I don't think anyone has published one beyond v3 that will deploy on a Movid.

PRO
Synthiam
#44  

Ptp did a great job - works well! If you’re interested in having a larger trained dataset, there’s a global version here: https://synthiam.com/Support/Skills/Camera/Cognitive-Vision?id=16211

it uses a worldwide database of trained stuff. And you get a description of the scene that’s neat. You can feed that into nlp for topics of the surroundings.

#45  

Hey PTP this looks amazing,  will give it a try, hopefully works well when people detected coming through main door and use enhanced script to have some fun with intruders!

PRO
USA
#46  

Sorry for the delay, I've been underwater with work.

@Guys: Thanks for the good words

@Martin:

Quote:

I am interested to know what tools/languages you used to build this
Short answer: Visual Studio Community/Enterprise 2019

ARC plugins are .NET. The main plugin DLL is a Visual Studio C# .NET class library project, and I have two additional projects in C++. ARC is a 32-bit application, so when you combine low-level code (C++) or external native libraries with .NET you need to take that into consideration. Sometimes you need to compile from source and fix or tweak the open source code to use the Microsoft build environment.

If you need more details my email is in the profile.

PRO
USA
#47   — Edited

@DJ:

Quote:

If you’re interested in having a larger trained dataset, there’s a global version here: https://synthiam.com/Support/Skills/Camera/Cognitive-Vision?id=16211
When I started playing with object detection, I wanted something to monitor a live video feed and trigger actions based on objects. All the cloud APIs have limits, so it is not feasible to use the online services.

I presume your skill has a limit cap?

Q) What is the limit (number of requests) per ARC account?

Also, the model in this plugin is a Tiny version optimized for CPU, so the accuracy is lower than the full models (NVIDIA GPUs).

The biggest challenge is to find models optimized for our needs. For example, I'm using this model to track the delivery man. I don't expect a train, horse, sheep, cow, elephant, bear, zebra or giraffe in the camera :) But the model supports those categories.

The solution is to train your own model with your own images. I'm capturing pictures of the delivery guy, and I want to expand the capture to the trucks. Maybe later I can train a model to detect UPS, FedEx, USPS, DHL and Amazon trucks :)

Until then... I've a trigger to alert me if an Elephant arrives at the door.
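
A trigger like that can be sketched as a small check over the detection results. This is an illustration only, not the author's actual script: shouldAlert is a hypothetical helper, the inputs are assumed to be the parallel arrays from $YOLOClasses and $YOLOScores, and the 0.5 threshold in the comment is an arbitrary example value.

```javascript
// Hypothetical helper: true when the watched class appears with enough confidence.
function shouldAlert(classes, scores, watchedClass, minScore) {
  for (var ix = 0; ix < classes.length; ix++) {
    if (classes[ix] === watchedClass && scores[ix] >= minScore) {
      return true;
    }
  }
  return false;
}

// Inside ARC (comment only, since getVar and Audio.sayWait exist only in ARC):
//   if (shouldAlert(getVar('$YOLOClasses'), getVar('$YOLOScores'), 'elephant', 0.5)) {
//     Audio.sayWait('There is an elephant at the door');
//   }
```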

#48  

Spotting Elephants here in Alabama could be useful as its the mascot for the University of Alabama.  BTW, there is no pressure to ever answer anything from me, timely or at all. I am just thrilled at any answer at all on any timeframe.

For me, I am proceeding along the following path with darknet object detection:

1.  I implemented Darknet as you know, and it is returning good bounding boxes and probabilities for the 80 classes.  When I mentioned having a better model...I meant I wanted more classes.

2.  I implemented a skill to get the robot to tell me what it sees by saying "What do you see?".

3.  I am wondering if there is a way to augment DarkNet with AlexNet.  To this end, I first implemented the AlexNet model (1000 classes). The problem here is that AlexNet classifies an image as a single thing, so I need to figure out an algorithm for picking subsets of an image that might contain single interesting objects. Until I figure out how to do that, AlexNet is not all that useful to me unless the bot is leaning over, staring directly at something and trying to identify or pick it up.  Also, a huge number of the 1000 classes are still biology or other fairly useless classes like you pointed out.  There are a lot of alternatives to AlexNet that all have these same issues: a single object and too many useless classes like species.  Species aside, does anyone know a good way to segment an image so parts of it can be classified?

4.  Here are the use cases I want to focus on next... more verbal questions and answers (about what is seen) like "Where is the cat?", "How far away is the cat?" (depth sensor), "How big is the cat." (some trig with distance and bounding box), "What color is the cat?" (image processing, tougher one for me), "Shoot the cat." (lasers), "Go to/chase the cat" (nav/drive), "Point at the cat." (servos), "What is next to the cat", "How many cats do you see?", "Look at the cat", "What is the cat on top of?" (table, tv, etc.) and others.  You get the idea.  While some of these sound challenging or error prone, almost all of these are achievable.  I'd like to make a vid when I get some of these going.
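
The "How big is the cat" case (some trig with distance and bounding box) could look roughly like the following. This is a sketch under stated assumptions, not something the skill provides: it assumes a simple pinhole camera, an object roughly centred and facing the camera, a known horizontal field of view, a distance from a separate depth sensor, and a bounding box width in pixels. estimateWidth and its parameters are hypothetical names.

```javascript
// Hypothetical helper: estimates real-world width from a bounding box.
// The result is in the same unit as the distance.
function estimateWidth(distance, hfovDegrees, boxWidthPx, imageWidthPx) {
  var hfovRadians = hfovDegrees * Math.PI / 180;
  // Width of the whole scene visible at that distance.
  var sceneWidth = 2 * distance * Math.tan(hfovRadians / 2);
  // The object covers the same fraction of the scene as the box does of the image.
  return sceneWidth * (boxWidthPx / imageWidthPx);
}
```

For example, with a 90 degree field of view, an object 2 m away whose box spans half of the image width comes out at about 2 m wide. The estimate degrades for objects near the image edges or seen at an angle.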

#49  

Hi,

I have been using this skill for some time and I am very glad that you created it! I do have a couple of questions:

  1. is there any EASY way to REMOVE objects from its list? I want to remove objects that my robot will never encounter.
  2. is there any way from within ARC to define new objects? I understand that defining new objects (using thousands of photos) is a very CPU intensive operation.
  3. is it possible to add other DarkNet/Yolo objects from other datasets?
PRO
USA
#50  

Quote:

1. is there any EASY way to REMOVE objects from its list? I want to remove objects that my robot will never encounter.
Are you asking for an "ignore" list / blacklist? That is possible. If you are asking to remove classes, e.g. elephants, horses, zebras, to speed up the detection, that is not possible.
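
Until such an option exists in the skill, an ignore list can be approximated inside the On Changes script by filtering the results after detection. The sketch below is a hypothetical helper, not a skill feature; it assumes the parallel arrays from $YOLOClasses and $YOLOScores, and, as noted above, it does not make the detection itself any faster.

```javascript
// Hypothetical helper: drops detections whose class is on the ignore list,
// keeping the class and score arrays aligned.
function filterIgnored(classes, scores, ignoreList) {
  var kept = { classes: [], scores: [] };
  for (var ix = 0; ix < classes.length; ix++) {
    if (ignoreList.indexOf(classes[ix]) === -1) {
      kept.classes.push(classes[ix]);
      kept.scores.push(scores[ix]);
    }
  }
  return kept;
}

// Inside ARC (comment only, since getVar exists only in ARC):
//   var kept = filterIgnored(getVar('$YOLOClasses'), getVar('$YOLOScores'),
//                            ['elephant', 'horse', 'zebra']);
```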

Quote:

2. is there any way from within ARC to define new objects?
 ARC has a camera control that allows training new custom objects using a camera: https://synthiam.com/Support/Skills/Camera/Camera-Device?id=16120#objectTracking

Quote:

3. is it possible to add other DarkNet/Yolo objects from other datasets?
It's possible to improve the plugin so you can specify a custom YOLO dataset. There are a few YOLO framework implementation versions, i.e. v2, v3, v4, v5. If you have one in particular, please share the URL and I can check whether it is compatible with the plugin.

It's not possible to use multiple datasets in a single inference process.

#51  
  1. Thanks. :)
  2. Does the ARC camera control support YOLO objects?
  3. I'll take a look. Thanks! :)
#52  

Can I port the On Changes Script over to EZ-Script, Blockly, or Python, or is only JavaScript supported for this skill?

Thomas Messerschmidt

PRO
Synthiam
#53   — Edited

There's a standard dialog for editing scripts - it's the same editor in all ARC scripts. You can select the language you wish to use by a tab on the top. There's more information on this page about how the script editor works and languages: https://synthiam.com/Support/Programming/code-editor/edit-scripts

Scroll to the bottom, and you can read that relevant section of the page. You can use the support section to find additional information about using ARC.

*edit: or this step of the getting started guide is quite popular: https://synthiam.com/Support/Get-Started/how-to-make-a-robot/choose-skill-level

#54  

So I assume you meant that just because the "On Changes Script" was written in JavaScript, it could have just as easily been written in the other 3 languages. I had assumed that there was JavaScript code used that would not work in the other languages. I guess I could have tried rewriting it myself. I've been a bit overwhelmed trying to get the last two Simone articles out.

Thanks.

PRO
Synthiam
#55  

Yeah, that's precisely what you'll have to do. Why would you want it in another language? The JavaScript compiler is 100 times faster (or something comparable) than EZ-Script.