DeepLabv3+ segmented image boundary - tensorflow

I am able to apply DeepLabV3+ to segment images, but I would also like to get a boundary around each individual detection.
For example, in the segmentation mask above, I cannot distinguish between the two children on the horse. If I could draw a boundary around each individual child, or give each of them a different color, I would be able to tell them apart. Please let me know if there is any way to configure DeepLab to achieve that.

You are confusing two tasks: semantic segmentation and instance segmentation.
DeepLabV3+ (and many similar deep nets) solve the semantic segmentation problem: labeling each pixel with the class it belongs to. You got a very nice result where all pixels belonging to "person" were colored pink. Semantic segmentation algorithms do not care how many "person"s there are in the image and make no attempt to label each person separately. As long as all "person" pixels are labeled as such, the task is considered well done.
On the other hand, what you are looking for is instance segmentation: labeling each "person" as a unique individual in the image. This is a far more complex task: not only must you label all "person" pixels as "person", you must also group those pixels into the different instances present in the image.
Since instance segmentation is a more difficult task, you would need different models/nets to accomplish it.
I suggest Mask R-CNN as a good starting point for instance segmentation algorithms.
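If you want to experiment quickly, below is a minimal sketch of running a pretrained Mask R-CNN through torchvision. This assumes a recent PyTorch/torchvision install rather than the TensorFlow DeepLab codebase, and the image file name is just a placeholder:

```python
# Minimal instance-segmentation sketch with a pretrained Mask R-CNN
# (assumes PyTorch + torchvision; "horse_kids.jpg" is a hypothetical input file).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("horse_kids.jpg").convert("RGB"))  # CxHxW float in [0, 1]

with torch.no_grad():
    prediction = model([image])[0]  # dict with 'boxes', 'labels', 'scores', 'masks'

# Keep confident detections; each mask is a separate instance (e.g. each child).
for score, label, mask in zip(prediction["scores"], prediction["labels"], prediction["masks"]):
    if score > 0.5:
        binary_mask = mask[0] > 0.5  # HxW boolean mask for this single instance
        print(label.item(), score.item(), binary_mask.sum().item())
```

Each entry in prediction["masks"] is a separate instance, so the two children would come back as two distinct masks rather than one merged "person" region.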


Anchor Boxes for Object detection

Can someone help me understand how anchor boxes are represented in YOLOv5?
In the official code they are specified as:
[10,13, 16,30, 33,23] # P3/8
[30,61, 62,45, 59,119] # P4/16
[116,90, 156,198, 373,326] # P5/32
I understand that P3, P4 and P5 are levels of the feature pyramid, but what do the numbers correspond to? I'd appreciate it if someone could clarify:
What these numbers specify.
Their significance.
Why they change from layer to layer.
Thanks.
An anchor box is just a prior scale and aspect ratio for the objects to be detected. The FPN (Feature Pyramid Network) has three outputs, and each output's role is to detect objects at a particular scale. For example:
P3/8 is for detecting smaller objects.
P4/16 is for detecting medium objects.
P5/32 is for detecting bigger objects.
So when you want to detect smaller objects you need smaller anchor boxes, for medium objects you use medium-scale anchor boxes, and so on.
You may wonder why the larger feature maps get the smaller anchor boxes. Each downsampling step shrinks the feature map, and after many downsamplings small objects can disappear entirely; that is why small objects are detected on the larger (less downsampled) feature maps with smaller anchor boxes.
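To make the numbers concrete, each consecutive pair in the lists above is an anchor (width, height) in pixels at the network's input resolution. Here is a small sketch of how they could be parsed; the dictionary layout and variable names are my own, not taken from the YOLOv5 code:

```python
# Parse the YOLOv5 anchor lists into (width, height) pairs per pyramid level.
# The strides (8, 16, 32) come from the P3/8, P4/16, P5/32 comments.
anchors = {
    8:  [10, 13, 16, 30, 33, 23],       # P3/8  -> small objects
    16: [30, 61, 62, 45, 59, 119],      # P4/16 -> medium objects
    32: [116, 90, 156, 198, 373, 326],  # P5/32 -> large objects
}

for stride, flat in anchors.items():
    pairs = list(zip(flat[0::2], flat[1::2]))  # [(w1, h1), (w2, h2), (w3, h3)]
    print(f"stride {stride}: anchors (w, h) = {pairs}")
```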

training images? Considerations for selection

I'm relatively new and am still learning the basics. I've used NVIDIA DIGITS in the past, and am now looking at Tensorflow. While I've been able to fumble my way around creating some models for a few projects I'm working on, I really want to start diving deeper into what I'm doing, how I'm doing it, and ultimately a better understanding of why.
One area that I would like to start with is the images that I'm using for training and testing. Can anyone point me to a blog, an article, or a paper, or give me some insight into what I need to consider when selecting images to train a new model on? Up until recently, I've been using datasets that have already been curated and are available for download. Let's say I'm going to start working on a project that involves object detection of ships from a variety of distances and angles.
So my thoughts would be:
1) I need a large quantity of images.
2) The images need to contain ships of the different types I would like to detect (let's just say one class, ships; I don't care what type of ship).
3) I also need images that cover a wide variety of distances and perspectives for the different types of ships.
Ultimately, my thoughts are that the images need to reflect the distance, perspective, and types of ships I would ideally want to identify from the video. Seems simple enough.
However, there are a number of questions:
Do the images need to be the same/similar resolution as the camera I'll be using, for best results?
Do the images all need to be the same resolution?
Can I use a single image and just digitally zoom out to give the illusion of different distances?
I'm sure there are a number of other questions that I'm not asking, or should be asking. Are there any guidelines for putting together a solid collection of images for training and validation?
I recommend thinking the problem through end to end: for example, would you need to classify ship models as a next step? I also recommend going through well-known public datasets and actually working with them: how the data is structured and stored, how labels are handled, how preprocessing is done, and so on.
More importantly, what are you trying to achieve? Talking to experts on the topic helps greatly when preparing your own dataset.
Use open-source images if you can, e.g. Flickr, Google, ImageNet.
No, you don't need them to be the same resolution.
It is not ideal to digitally zoom images in or out to simulate different distances. Preprocessing and data augmentation already do this to create scaled representations of the same class (see the sketch below), which is why I would recommend a hands-on approach with an existing dataset first.
Yes, what you need is many different representations of each class and a roughly balanced dataset across classes. If you define your data structure well at the beginning, it will save you a ton of time, as you won't have to make changes often.
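Since you mentioned TensorFlow, here is a minimal, illustrative sketch of the kind of flip/rotation/zoom augmentation such a pipeline typically applies; the layer choices and factors are my own, not a recommendation for your specific dataset:

```python
# Illustrative augmentation pipeline: random flips, rotations and zooms stand in
# for manually zooming source photos to fake different distances.
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.05),  # small rotations
    tf.keras.layers.RandomZoom(0.3),       # up to +/-30% zoom, i.e. varied apparent distance
])

# Example: apply to a batch of images during training (NHWC float tensors).
images = tf.random.uniform((8, 224, 224, 3))
augmented = augment(images, training=True)
print(augmented.shape)  # (8, 224, 224, 3)
```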

Basics of face Sculpting in Blender

I mean, the basics.
1) I have seen in online videos that people model a character (or anything) from one object only, extruding, loop cutting, scaling, and so on. Why don't they design the different objects separately (like hands separately, legs separately, body separately) and then join them together to make one object?
2) What does the texturing department need to see so that they don't send the model back to the modelling department? I mean things like the meshes (polygons) over the model's face needing to be quads rather than triangles while modelling a character.
What basics should I know? Is there a checklist, or are there fundamentals I should look at before modelling a character?
Please correct me if I am wrong, and answer both my questions. Thanks.
It may be common but it definitely isn't mandatory to have a model as one solid mesh. Some models will have parts of the body underneath clothing removed to reduce the poly count. How the model is to be used will be a big factor in how you model it: for a single image it is easy to get away with multiple parts, while a character that will be animated in a cartoony animation could be stretched and distorted in ways that could show holes in a model made of multiple pieces. When working in a team, there may be rules in place determining whether a solid or multi-part model is considered acceptable.
An example of an animated model made from multiple parts is Sintel, the main character in the Sintel short animation.
There is nothing stopping you from making a library of separate body parts and joining them together when you make your model. Be aware that this can bring complications: if you model an arm with 12 verts and then make your hand with 15, you have to fiddle around to merge them together.
You will also find some extra freedom to work with multiple body parts during the sculpting phase, as you are creating a high-density mesh that is then used as a template to model a clean mesh over. That second step is called retopology.
It is more likely that the rigging department will send a model back for fixing than the texturing department. When adding a rig and deforming the mesh in different ways, any parts that deform badly will be revealed and need fixing.
[...] (like hands separately, legs separately, body separately) and then
join them together to make one object [...]
Some modelers I know do precisely this: they block in the design using broad primitive shapes, slice in some edge loops and add broad details, merge everything together, sculpt it a bit further with high-res sculpting tools, and finally retopologize everything.
The main modelers I know who do this, however, model in a way that tries to adhere as close as possible to the concept artist's illustration. They're not creating their own models from scratch but are instead given top/front/back/side illustrations of a character, for example, and are just trying to match it as closely as possible.
When you start modeling everything in small pieces, it helps to have that concept illustration since you can get lost in the topology otherwise and fusing organic meshes together can be difficult to do in a clean way.
[...] why don't they design different objects separately? [...]
Again they sometimes do, but one of the appeals of creating organic meshes by keeping it seamless the entire time is that you can start to focus on how edge loops propagate across the entire model. It helps to know that the base of a finger is a hexagon, for example, in figuring out how to cleanly propagate and terminate the edge loops for a hand, and likewise have a strategy for the hand to cleanly propagate and terminate edge loops as it joins into the forearm.
It can be hard to get the topology to match up cleanly if you designed everything in small pieces and then had to figure out how to merge it all together. Polygonal modeling is very topology-oriented. It tends to require as much thinking about the wireframe and edge flows as it does the shape of the model, since it needs to be a certain way for everything to subdivide cleanly and smoothly and animate predictably with subdivision surfaces.
I used to work with developers who took one glance at the topology-dominated workflow of polygonal modeling and immediately wanted to seek alternatives, like voxel sculpting. With voxels you could potentially model everything in pieces and fuse it all together in a nice, smooth, organic way without thinking about topology whatsoever.
However, that loses sight of the key appeal of polygonal meshes. Their wire flow forms a control lattice with a very finite number of control points for the artist to animate and move around to predictably control the shape of their model. You immediately lose that with a voxel representation: while voxels free the artist from thinking about how the topology works and how the wireframe flows through the model, they also give up all the control benefits that come with it. So if people use voxel sculpting, they often end up meticulously retopologizing everything at the end anyway to gain back the coarse, predictable control they have with polygonal meshes.
[...] I mean things like the meshes (polygons) over the model's face
needing to be quads rather than triangles while modelling a
character. [...]
This is all in the context of subdivision surfaces, the most popular of which are variants of Catmull-Clark. Those favor quads for the most predictable subdivision. It's much easier for the artist to predict how everything will look and deform if they favor, as much as possible, uniform grids of quadrangles wrapped around their model, with 4-valence vertices and every polygon having 4 points. Only where they need to "join" these quad grids together might they create some funky topology: a 5-valence vertex here, a 3-valence vertex there, a 5-sided polygon here, a triangle there. Those cases tend to deform a bit unpredictably (or at least unintuitively), so artists try to avoid them as much as possible.
That is because, when artists model polygonal meshes this way, they are not just trying to create a statue with a nice shape. If that were all they wanted, they'd save themselves a lot of grief by avoiding individual vertices/edges/polygons in the first place and using something like Sculptris instead. They are designing not only shapes but also a control lattice: a wire flow and a set of control points they can easily move around in the future to get predictable behavior out of their control cage. They're essentially designing controls, almost an "interactive GUI/rig" for themselves, through how they lay out the topology.
2) What does the texturing department need to see so that they don't
send the model back to the modelling department?
Generally, how a mesh is modeled shouldn't affect the texture department's work much at all if they're working with UV maps and painting textures over them (at that point it doesn't really matter whether the model has clean wire flows, since all the texture artists do is paint images over the 2D UV map or directly onto the 3D model).
However, if the modeler does the UV mapping, then regardless of whether the mesh uses quads and clean wire flows, a poor UV map will make the resulting textures look distorted. So the UV maps need to be made with minimal distortion, though that is usually easy to do automatically these days.
The other exception is if the department doesn't use UV maps and instead uses, say, PTex from Disney. PTex really favors quads. In the original paper at least, it only worked with quads.

Registration of 3D surface mesh to 3D image volume

I have an accurate surface mesh of an implant that I'd like to rigidly align, as accurately as possible, to a computed tomography scan (a scalar volume) that contains the exact same object. I've tried detecting edges in the image volume with a Canny filter and running iterative closest point (ICP) between the detected edge points and the mesh vertices, but it isn't working. I also tried voxelizing the mesh and using image-volume registration methods (Mattes mutual information), which yields very inconsistent results.
Any other suggestions?
Thank you.
Generally, a mesh and a volume are two different data structures. You have to either convert the mesh to a volume or the volume to a mesh.
I would recommend segmenting the volume data first, to extract the structure you want to register. A Canny filter alone might not be enough to delineate the boundary clearly; I would suggest level-set methods and active contour models, both of which are frequently used in medical image processing. For these two topics, professor Chunming Li's work is a good reference.
After segmenting the volume, you should be able to reconstruct a mesh from it with marching cubes. The vertices of the two meshes can then be registered with a simple ICP algorithm.
However, this is a workaround rather than true registration, and the segmentation step tends to take a lot of time.
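As a rough sketch of that segment, marching cubes, then ICP pipeline, something like the following could work. It assumes scikit-image and Open3D are available, uses a simple intensity threshold in place of a proper level-set segmentation, and the file names, threshold, and correspondence distance are placeholders:

```python
# Segment the CT volume, extract a surface with marching cubes,
# then rigidly register the implant mesh to it with point-to-point ICP.
import numpy as np
import open3d as o3d
from skimage import measure

volume = np.load("ct_volume.npy")                    # placeholder: ZxYxX scalar CT volume
implant = o3d.io.read_triangle_mesh("implant.ply")   # placeholder: implant surface mesh

# Crude segmentation: threshold the (bright) implant, then run marching cubes.
verts, faces, normals, _ = measure.marching_cubes(volume, level=2000.0)

target = o3d.geometry.PointCloud()
target.points = o3d.utility.Vector3dVector(verts)    # NOTE: still in voxel coordinates;
                                                      # scale by voxel spacing in practice.
source = o3d.geometry.PointCloud()
source.points = o3d.utility.Vector3dVector(np.asarray(implant.vertices))

result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=5.0,                  # tune to the expected misalignment
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

print(result.transformation)  # 4x4 rigid transform aligning the implant mesh to the CT
```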

using HAAR training for post-it note recognition

I need to be able to detect a variety of coloured post-it notes in a Microsoft Kinect video stream. I have tried using Emgu CV for edge detection, but it doesn't seem to locate the vertices/edges reliably. I have also tried colour segmentation/detection, but considering the variety of colours, that may not be robust enough.
I am now attempting to use Haar classification. Can anyone suggest the best variety of positive/negative images to use? For example, for the positive images, should I take pictures of many differently coloured post-it notes in various lighting conditions and orientations? Seeing as it is quite a simple shape (a square), is using Haar classification over-complicating things?
Haar classifiers are typically used on grayscale images and trigger primarily on morphological, edge-like features. If you want to find post-it notes in an image, the easiest method would be to look at color (since they come in very distinct colors). Have you tried training an SVM or random-forest classifier to detect post-it notes based on color alone? Once you've identified areas in the image that are probably post-it notes, you could start looking at things like shape as additional validation that you are indeed looking at a post-it note.
Take a look at the following as an example of how to find rectangles in an image using the Hough transform:
https://opencv-code.com/tutorials/automatic-perspective-correction-for-quadrilateral-objects/
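As a sketch of the colour-then-shape approach suggested above, using OpenCV's Python bindings (the equivalent calls exist in Emgu CV); the HSV range, area threshold, and file names are illustrative placeholders:

```python
# Color segmentation followed by a quadrilateral check.
import cv2
import numpy as np

frame = cv2.imread("kinect_frame.png")          # placeholder: one Kinect color frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Illustrative HSV range for a yellow post-it; repeat with other ranges per color.
mask = cv2.inRange(hsv, np.array([20, 80, 80]), np.array([35, 255, 255]))

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) < 500:                # skip tiny blobs
        continue
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) == 4:                        # roughly quadrilateral -> candidate note
        x, y, w, h = cv2.boundingRect(approx)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detected.png", frame)
```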