2D/3D Camera Aggregation And Correlation

The requirements for such a skill as I see it would be as follows:

1. Have a configuration for the FOV, resolution, location, and orientation of one or more camera-like devices that return arrays of data representing pixels. These devices could be one or more of the following:
   a. 2D greyscale cameras
   b. 2D color cameras
   c. 3D depth cameras
   d. 2D thermal array sensors
2. Accept arrays at runtime representing the pixel data from all the devices through an API, along with orientation updates (if any) for cameras on servos.
3. Provide an API to retrieve (query) subsets of the aggregated data in an efficient way. An example: query for the box of data in a given area, depth range, temperature range, and/or even a color range. (A rough sketch of such a configuration and query follows below.)

There are many use cases for a feature like this: obstacle avoidance, object detection, mapping, displays, conversation, question answering, etc. If anyone wants to collaborate with me on this and related ideas, feel free to contact me via my email. Thanks.
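
To make requirements 1 and 3 concrete, here is a rough sketch in Python. Every name and field below is mine and purely illustrative, not an existing ARC API:

```python
# Hypothetical configuration for one camera-like device (requirement 1);
# a robot would carry one entry like this per sensor.
camera_config = {
    "device_id": "depth_front",
    "type": "3d_depth",                  # 2d_greyscale | 2d_color | 3d_depth | 2d_thermal
    "resolution": (640, 480),            # width, height in pixels
    "hfov_deg": 60.0,                    # horizontal field of view
    "vfov_deg": 45.0,                    # vertical field of view
    "position_m": (0.0, 0.10, 0.50),     # location on the robot, in meters
    "orientation_deg": (0.0, 0.0, 0.0),  # yaw/pitch/roll; updated at runtime for servo mounts
}

# Hypothetical query against the aggregated data (requirement 3): "the box of
# data in a given area, depth range, temperature range, and/or color range".
# result = fusion.query(
#     region=(0.0, -0.5, 0.5, 1.0, 1.0, 1.5),  # a 3D box in the robot frame
#     depth_range=(0.3, 2.0),                  # meters
#     temperature_range=(90.0, 105.0),         # Fahrenheit
# )
```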

Synthiam
#1   — Edited

This appears to be a duplicate of: https://synthiam.com/Community/FeatureRequests/Support-for-Orbbec-Astra-Embedded-S-20103

Reviewing the desired "return an array of data": that is out of scope for ARC skills for usability reasons. ARC skills relieve users from having to program at that level of interaction. Please review what an ARC skill is: technologies or features wrapped for easy accessibility so they can be used by non-programmers. The suggestion I have for your skill feature request is to describe an outcome feature that solves a problem.

Please provide the information requested when creating the feature request. Here is a copy and paste:

Quote:

Describe the outcome of your feature request by providing an explanation of how it will help your robot perform a new function. Also, describe how not having the feature is holding your robot development back. The more information you can provide, the better it helps us develop a solution for you and others with similar requests.
Additional information on creating a skill is here: https://synthiam.com/Support/Skills/Skills-Overview

#2  

I would see this as a very different and higher-level skill. This skill would have nothing to do with any specific sensor (camera, depth, thermal imager, etc.). It would deal with aggregation and overlapping FOVs, and would provide a query capability usable in a wide range of use cases. That makes it more similar to mapping or navigation, in that it would take data as input from multiple other skills.

I never really intended to request an Orbbec skill, and only mentioned it because someone asked what sensor I was using.  The Orbbec skill is a sideshow.

#3  

I saw your instructions the first time and wrote another page's worth of info to describe the outcomes. Unfortunately, your UI had a very short length limit that prevented me from posting it in its original form, so I had to edit it down to fit the space allowed.

Synthiam
#4  

Your feature request is to recreate the functionality of existing dual FOV depth tracking cameras in a DIY hardware/software package?

#5  

I don't know of any depth cam that allows its data to be combined with a thermal cam or other visual cams that have "pixels" of various sizes, located at various places, some stationary, some moving, on one bot. This skill is not about any one sensor. It is about aggregation, situational awareness, or whatever else you want to call it. It's not about depth, although depth can play a big part. For example, I would like to be able to get the color "visual" image of a "warm" object in a room by saying "Show me everything between 90 and 105 degrees Fahrenheit". That is not the depth image, and it is not the thermal image directly; it is a color image of the subset of the space within a given temperature range (a rough sketch follows below). From that, you could classify the image to answer "What is the warm object to the left?" or "How far away is the warm object?" or so many other use cases. This is not idle speculation. Some robots will be doing this stuff very soon.
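
To sketch what I mean (purely illustrative Python; it assumes the thermal array has already been resampled onto the color camera's pixel grid, which is exactly the registration work this skill would handle internally):

```python
import numpy as np

def query_temperature_range(color_img, thermal_f, lo_f, hi_f):
    """Return the color pixels whose registered thermal reading falls
    inside [lo_f, hi_f] degrees Fahrenheit.

    color_img : HxWx3 uint8 array from the color camera
    thermal_f : HxW float array of temperatures, resampled onto the
                same pixel grid as color_img
    """
    mask = (thermal_f >= lo_f) & (thermal_f <= hi_f)
    result = np.zeros_like(color_img)
    result[mask] = color_img[mask]  # keep only the "warm" pixels
    return result, mask

# "Show me everything between 90 and 105 Fahrenheit"
# warm_view, warm_mask = query_temperature_range(color, thermal, 90.0, 105.0)
```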

Synthiam
#6   — Edited

I'm having difficulty following the feature request. I really do want to make your ARC experience positive, even for free non-subscription users - everyone should be able to program amazing robots. You'd like a software skill in ARC that supports a camera that doesn't exist? Are you asking for both the hardware and software as a feature request?

#7  

I would suggest we let this and any new idea have some space to breathe and develop, without rushing to any premature conclusions or categorizations on day 1.  This will allow others to have some time to digest and add their own experiences and requirements/desires/ideas to any new idea.  As far as improving my own ARC experience, that's the biggest suggestion I could make to improve the community and encourage ideas.

Anyway...I will now try to address your (DJ) questions/points one by one to provide more clarity on the concept. I get that it is not simple or clearly defined yet. I am also not attempting to dictate the design, either.

To be clear, this is a software-only skill. It's a "Fusion Skill" that would fuse data from:

1) Sensor-based skills (all of these produce "boxes" of pixels)
   a) 2D cams
   b) 3D cams
   c) Thermal imaging cams
2) Feature detection skills (many of these output bounding "boxes" or operate on an image from one of those bounds)
   a) YOLO
   b) AlexNet
   c) Face detection, etc.
   d) Models for age/gender/emotion/eyes/pose detection
   e) Other models TBD

The key is to bring all of the above together in a useful way. This would centralize all the spatial/math complexity so other skills can concentrate on their specific tasks (a sketch of the kind of math that would be hidden follows).
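
To make that concrete, here is a minimal sketch of the projection math the skill would hide, assuming a simple pinhole model derived from each camera's configured FOV (function names are mine, not an existing API):

```python
import numpy as np

def pixel_to_point(u, v, depth_m, width, height, hfov_deg, vfov_deg):
    """Back-project one depth pixel (u, v) into a 3D point in the
    camera's own frame, using a pinhole model built from the FOV."""
    fx = (width / 2) / np.tan(np.radians(hfov_deg) / 2)   # focal length in pixels
    fy = (height / 2) / np.tan(np.radians(vfov_deg) / 2)
    x = (u - width / 2) * depth_m / fx
    y = (v - height / 2) * depth_m / fy
    return np.array([x, y, depth_m])

def to_robot_frame(point_cam, rotation, translation):
    """Move a camera-frame point into the shared robot frame using the
    camera's configured orientation (3x3 rotation) and location."""
    return rotation @ point_cam + np.asarray(translation)
```

Once every sensor's pixels land in that one shared frame, overlapping FOVs and mixed resolutions stop being the user's problem.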

Quote:

Putting one or more cameras (2D or 3D) on a bot and getting data from them adds value but by themselves are often not very useful. Feature detectors like YOLO and face detection add value too, but once again are not useful in isolation.
The skill would be used to answer high-level, useful questions about the data. It could also serve lower-level needs like obstacle avoidance or extracting an image of interest from the combined dataset (more on that another day). I do this a lot now: extracting a "box" or "boxes" from the data and using it for obstacle detection. You can call the box a zone, sector, array, image, or whatever you wish. In my opinion, people will want to be able to access the data in some form or API (JSON, XML, or images). I would. Moving forward, I would like to retrieve "boxes" in a lot of other ways that use two or more sensors, like retrieving the color image for a hot box, or the depth region of a YOLO box (sketched below).
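
Here is a minimal sketch of the "depth region of a YOLO box" case, assuming the depth image is registered to the same pixel grid the detector ran on (the function name is hypothetical):

```python
import numpy as np

def depth_region_for_box(depth_img, box):
    """Cut out the depth pixels that fall inside a detector's bounding box.

    depth_img : HxW float array of depths in meters, registered to the
                pixel grid the detector ran on
    box       : (x, y, w, h) from YOLO, face detection, etc.
    """
    x, y, w, h = box
    region = depth_img[y:y + h, x:x + w]
    valid = region[region > 0]  # drop missing/invalid depth readings
    distance = float(np.median(valid)) if valid.size else None
    return region, distance     # e.g. "How far away is the cat?"
```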

To your point about "out of scope for usability": I understand it, but surely not everything that involves data, or that isn't entirely easy, is out of scope. For example, I noticed a user requested that the YOLO skill return a list (an array) of bounding boxes (x, y, w, h). Same idea. All the image models pretty much work that way (a sample of that kind of payload follows below). People can wrap things up in a pretty package, and that is great and should be done to the extent possible, but there will always be many more use cases where people need access to the data. Drawing boxes on an image, as many tools do when demonstrating face or object detection, is only for people. An image with boxes adds zero value to the robot. Without access to the data, the skills will not be useful beyond the few use cases foreseen by their makers.
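
For reference, the kind of list payload I mean would look something like this (field names are illustrative, not an existing ARC or YOLO format):

```python
# One entry per detected object, straight from the detector skill.
detections = [
    {"label": "cat",    "confidence": 0.91, "x": 412, "y": 230, "w": 96,  "h": 64},
    {"label": "person", "confidence": 0.87, "x": 101, "y": 55,  "w": 140, "h": 320},
]
```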

Some Example Use Cases (note: most of these cannot be answered using a single sensor or detector; I phrase them in English just to communicate and get people thinking about the possibilities; the skill could use NLP or some other interface, and that should be up to the builder; a sketch of how one of these could decompose follows the list):

- Which cat is taller/wider/bigger/closer?
- Is the cat real (or just a picture)?
- How far away is the table? What is on the table?
- What is the temperature of the television/stove/person/dog?
- How tall is the plant to your left?
- Is the space in front of the chair flat and clear (so the robot can go there)?
- Which person in the scene is older/happier/male/female/real/fake?
- Extract a thermal/visual/depth image at a resolution of choice for a known object (cat, dog, person, etc.) or a bounding box of choice.
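
As one illustration, here is how "Which cat is closer?" could decompose against the fused data (still a sketch; it reuses the hypothetical depth_region_for_box() from my earlier example):

```python
def closer_detection(detections, depth_img):
    """Among same-label detections (e.g. two "cat" boxes), pick the one
    whose box holds the smallest median depth."""
    scored = []
    for d in detections:
        box = (d["x"], d["y"], d["w"], d["h"])
        _, dist = depth_region_for_box(depth_img, box)
        if dist is not None:
            scored.append((dist, d))
    return min(scored, key=lambda p: p[0])[1] if scored else None

# cats = [d for d in detections if d["label"] == "cat"]
# closest_cat = closer_detection(cats, depth_img)
```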

There are many more...I don't have time to spell all of them out, nor would I imagine that I could think of all of them.  All of the ones mentioned are feasible with current sensors and models and some math.  To me, this skill is what makes all the other sensors and models useful and usable.

Some Benefits to the Average Robot Builder:

1) Abstraction from the complexities of geometry, trig, matrices, and other math.
2) Abstraction from caring about the details of different camera fields of view, resolutions, overlap, etc. All of this could be handled internally in the skill so a user can ask high-level questions/queries and extract any image or subset they want, in a resolution of their own choosing, across the spectrum of sensors.
3) Smarter bots in less time.
4) General 3D perception and situational awareness.
5) Augmented reality. I am not doing this, as my data is not currently good for that, but I have seen others doing it.

This idea is obviously not a small undertaking.  There is a need in my opinion and I can see no benefit to ignoring it any longer.  Navigation, Mapping, Movement, Obstacle Avoidance, and some other higher-level skills were needed and were not small undertakings either.  I would imagine they all evolved over time.  This skill is similar in that regard.

Background on my efforts so far:

I've been working with all the different types of cams (2D, 3D, thermal, and SLAM) for some time. Once again, each sensor by itself is only a first step and adds little value.

I have been thinking about and working on this kind of fusion of 2D/3D/thermal data and models like YOLO for some time as well, and would like to help if I can, so everyone doesn't have to spend as much time wrestling with it as I have. I am far from having all these issues figured out, but I strongly believe it is worth pursuing and that I can add value. As the idea becomes an area of focus for more people, I can imagine they'll improve it greatly as they give their input. I believe a much better name can be found too.

As far as requesting features for sensors like the Orbbec or any other sensor, I doubt I would ever submit a request, as I think the need for skills like that is obvious. Each would typically be a wrapper or "facade pattern" on top of the manufacturer's SDK or driver anyway. I am not going to waste anyone's time pointing out obvious things, so I will typically try to point out non-obvious obstacles on the robotics path ahead.

Lastly, when I can make a few vids, I hope to be able to demo a few of the use cases mentioned.  It is and will be an area of ongoing focus for me.

On that point...I gotta stop typing and get back to bot building. Sorry for the long post, people.