What are the differences between "instance detection" and "semantic segmentation"?

I am working on semantic segmentation using deep learning, and I have come across the terms: semantic segmentation, instance detection, object detection and object segmentation.
What are the differences between them?

Some of the usage of these terms is either subjective or context-dependent, but as far as I can tell, a plausible reading of them is:
instance detection - given an instance (i.e. an image of a specific object), you need to detect it in an image / image set. The result can be "image i has instance X", a segmentation of the instance in all of its occurrences, or anything in between.
object detection - depending on context, this can be the same as instance detection, or it can mean that, given a specific class of objects, you want to detect all objects of this class that occur in an image / image set.
object segmentation - take object detection and add segmentation of the object in the images it occurs in.
semantic segmentation - attempt to segment the given image(s) into semantically meaningful parts. This usually means labeling each pixel with a class from a predefined class list (see the toy contrast with instance labeling sketched below).
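To make the last distinction concrete, here is a toy illustration (my own sketch, not from the original posts; the arrays are invented) of how the two kinds of output differ:

```python
import numpy as np

# Semantic segmentation: one class id per pixel (0 = background, 1 = person).
# Two adjacent people collapse into a single "person" region.
semantic_mask = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
])

# Instance segmentation: one instance id per pixel, so the two people
# stay separate (1 = first person, 2 = second person).
instance_mask = np.array([
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [0, 0, 0, 0],
])

print(np.unique(semantic_mask))  # [0 1]   -> just "background" and "person"
print(np.unique(instance_mask))  # [0 1 2] -> background plus two distinct people
```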
Another question about image segmentation terminology can be found here and might be of some interest to you.

Related

Detecting Any Object in my path using Tensorflow

Can I use the TensorFlow Object Detection API to detect any object that comes into my product's path, so that I can stop its movement? I have done custom object detection before, but here I can't train on every object that might block the product's path. So is it possible to use the TensorFlow API as a kind of collision detection?
With object detection you can identify objects along with their location and extent in an image. This would be an option for checking whether specific objects are blocking your path. There is also the option of detecting/segmenting unknown objects (as described here). However, what you are after sounds more like depth estimation or even SLAM.
One example of depth estimation is monodepth, a neural network that estimates depth for each pixel from a single camera image. You can use that to verify whether your path is clear or whether something in front of your product is blocking it.
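As a rough sketch of that idea (assuming you already have a per-pixel depth map in metres, e.g. from a monodepth-style network; the region of interest and the threshold below are arbitrary choices):

```python
import numpy as np

def path_is_clear(depth_map, min_distance_m=1.5):
    """Look at the bottom-centre region of a per-pixel depth map (roughly
    'straight ahead' of the product) and report whether everything there
    is farther away than min_distance_m."""
    h, w = depth_map.shape
    roi = depth_map[h // 2:, w // 3: 2 * w // 3]
    return float(roi.min()) >= min_distance_m

# Fake depth map for illustration: 10 m everywhere, with an obstacle at 0.8 m.
depth = np.full((480, 640), 10.0)
depth[300:400, 280:360] = 0.8
print(path_is_clear(depth))   # -> False, something is blocking the path
```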
The other one, SLAM (simultaneous localization and mapping), might be a bit over the top for just checking whether you can drive somewhere. Anyway, SLAM solves the task of navigating an unknown environment by building an internal model of the world while simultaneously estimating its own location inside that model, in order to solve navigation tasks.

Is it necessary to label every object of a class on an image?

I labeled a bunch of images for training a Faster R-CNN network for object detection with one class. There are on the order of hundreds or thousands of objects of this class in every image. Do I have to label all of them?
For now I have labeled about 20 to 80 instances of the object in every image, picking the objects where I think recognition is easy.
When I start training the network with this dataset, the loss jumps between 0.9 and 20,000,000.
Normally the loss should keep getting smaller, but in my case it decreases and then has extremely high spikes.
Yes, you should label every instance of the object in each training image, because whatever you don't label is treated as background (an implicit class, often labeled -1). If you leave an instance of the object unlabeled, it is treated as background, so the model gets conflicting signals when trying to distinguish the two classes, namely the background class (-1) and the object class (1, for example).
If there are too many instances of the object in each image, you could also cut the images into smaller ones (like 1000 parts, each containing ~100 objects), as sketched below.
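A rough sketch of that tiling step (my own example, not part of the original answer; the [x1, y1, x2, y2] box format and the 1000-pixel tile size are assumptions):

```python
def tile_image(image, boxes, tile=1000):
    """Cut an image (H, W, C array) into non-overlapping tile x tile crops
    and shift the ground-truth boxes [x1, y1, x2, y2] into each crop's
    coordinate frame. Boxes straddling a tile border are simply dropped here;
    a real pipeline would clip boxes or use overlapping tiles instead."""
    h, w = image.shape[:2]
    crops = []
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            crop = image[y0:y0 + tile, x0:x0 + tile]
            kept = [[x1 - x0, y1 - y0, x2 - x0, y2 - y0]
                    for x1, y1, x2, y2 in boxes
                    if x1 >= x0 and x2 <= x0 + tile
                    and y1 >= y0 and y2 <= y0 + tile]
            crops.append((crop, kept))
    return crops
```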

DeepLabv3+ segmented image boundary

I am able to apply DeepLabV3+ to segment the images, but I would also like to get the boundary around each individual detection.
For example, in the image segmentation mask above, I cannot distinguish between the two children on the horse. If I could draw the boundary around each individual child, or give each of them a different color, I would be able to distinguish them. Please let me know if there is any way to configure DeepLab to achieve that.
You are confusing two tasks: semantic segmentation and instance segmentation.
DeepLabV3+ (and many similar deep nets) solves the semantic segmentation problem: labeling each pixel with the class it belongs to. You got a very nice result in which all pixels belonging to "person" were colored pink. Semantic segmentation algorithms do not care how many "person"s there are in the image and make no attempt to label each person separately. As long as all "person" pixels are labeled as such, the task is considered well done.
What you are looking for, on the other hand, is instance segmentation: labeling each "person" as a unique person in the image. This is a far more complex task: not only must you label all "person" pixels as "person", you also need to group those pixels into the distinct instances in the image.
Since instance segmentation is a more difficult task, you would need different models/nets to accomplish it.
I suggest Mask R-CNN as a good starting point for instance segmentation algorithms.
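For example, a rough inference sketch with torchvision's pretrained Mask R-CNN (the exact weights argument differs between torchvision versions, and the 0.5 score threshold is arbitrary):

```python
import torch
import torchvision

# Pretrained Mask R-CNN; older torchvision versions use pretrained=True instead.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)      # stand-in for a real RGB image scaled to [0, 1]
with torch.no_grad():
    output = model([image])[0]       # one result dict per input image

# Unlike DeepLab's single class map, every detected instance gets its own mask,
# so two overlapping people come out as two separate entries.
keep = output["scores"] > 0.5
masks = output["masks"][keep]        # (num_instances, 1, H, W) soft masks
labels = output["labels"][keep]      # COCO class ids, 1 == person
print(masks.shape, labels)
```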

Can TF object detection API be used to detect objects that are bounded by another?

This question was split off from this thread.
Can the TF object detection API be used to detect two objects where one is enclosed / bounded by the other?
Ex. face vs person - face is within the bounds of the person
The answer is yes. You essentially don't have to worry about this, though for various reasons it may become more challenging for the algorithm if you have an object enclosing another object that is almost the same size. Settings like "face within person" or even "sunglasses within face" should be just fine.
Yes, you can. You don't have to do anything extra for that. Once detection happens, if one object is found inside another object, the API will still give you both in the output, since it performs multi-object detection.
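If you then want to know which face belongs to which person, one simple post-processing option is a containment check on the returned boxes (a sketch with made-up coordinates, boxes as [x1, y1, x2, y2]):

```python
def box_contains(outer, inner, tol=0.9):
    """True if at least `tol` of `inner`'s area lies inside `outer`."""
    ix1, iy1 = max(outer[0], inner[0]), max(outer[1], inner[1])
    ix2, iy2 = min(outer[2], inner[2]), min(outer[3], inner[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    inner_area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return inner_area > 0 and inter / inner_area >= tol

person_box = [100, 50, 300, 400]
face_box = [180, 70, 240, 140]
print(box_contains(person_box, face_box))   # -> True: this face sits inside this person
```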

Object Oriented implementation of graph data structures

I have been reading quite a bit about graph data structures lately, as I intend to write my own UML tool. As far as I can see, what I want can be modeled as a simple graph consisting of vertices and edges. Vertices will have a few values and so are best represented as objects. Edges do not, as far as I can see, need to be either directed or weighted, but I do not want to choose an implementation that makes it impossible to add such properties later on.
Being educated in pure object-oriented programming, the first thing that comes to my mind is representing vertices and edges as classes, for example:
Class: Vertice
- Array arrayOfEdges;
- String name;
Class: Edge
- Vertice from;
- Vertice to;
This gives me the possibility of introducing weights, direction, and so on later. Now, when I read up on implementing graphs, it seems that this is a very uncommon solution. Earlier questions here on Stack Overflow suggest adjacency lists and adjacency matrices, but being completely new to graphs, I have a hard time understanding why those are better than my approach.
The most important aspects of my application are the ability to easily determine which vertex was clicked and moved, and the ability to add and remove vertices and edges between the vertices. Will this be easier to accomplish with one implementation than with another?
My language of choice is Objective-C, but I do not believe that this should be of any significance.
Here are the two basic graph types along with their typical implementations (a minimal sketch contrasting them follows the list):
Dense Graphs:
Adjacency Matrix
Incidence Matrix
Sparse Graphs:
Adjacency List
Incidence List
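For a quick feel of the difference, here is the same tiny undirected graph (edges A-B and B-C) in both representations; Python is used purely for illustration, even though the question is about Objective-C:

```python
vertices = ["A", "B", "C"]

# Adjacency matrix: O(1) edge lookup, O(V^2) memory, and adding or removing
# a vertex means resizing the whole matrix -- a natural fit for dense graphs.
adjacency_matrix = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

# Adjacency list: memory proportional to V + E and cheap vertex insertion or
# removal -- the usual choice for sparse graphs such as a UML diagram.
adjacency_list = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B"],
}
```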
In the graph framework (closed source, unfortunately) that I've been writing (>12k loc graph implementations + >5k loc unit tests and still counting) I've been able to implement (Directed/Undirected/Mixed) Hypergraphs, (Directed/Undirected/Mixed) Multigraphs, (Directed/Undirected/Mixed) Ordered Graphs, (Directed/Undirected/Mixed) KPartite Graphs, as well as all kinds of Trees, such as Generic Trees, (A,B)-Trees, KAry-Trees, Full-KAry-Trees (trees to come: VP-Trees, KD-Trees, BKTrees, B-Trees, R-Trees, Octrees, …).
And all without a single vertex or edge class. Purely generics. And with little to no redundant implementations.
Oh, and as if this wasn't enough they all exist as mutable, immutable, observable (NSNotification), thread-unsafe and thread-safe versions.
How? Through excessive use of Decorators.
Basically, all graphs start out mutable, thread-unsafe and not observable. I then use decorators to add all kinds of flavors to them (resulting in no more than 35 classes right now, vs. 500+ if implemented without decorators).
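To illustrate the decorator idea in the abstract (this is not the framework's code; Python is used for brevity and all names are invented):

```python
class Graph:
    """Plain mutable, thread-unsafe, non-observable graph (the base flavour)."""
    def __init__(self):
        self._adjacency = {}

    def add_vertex(self, v):
        self._adjacency.setdefault(v, set())

    def add_edge(self, u, v):
        self.add_vertex(u)
        self.add_vertex(v)
        self._adjacency[u].add(v)
        self._adjacency[v].add(u)

    def neighbors(self, v):
        return set(self._adjacency.get(v, ()))


class ObservableGraph:
    """Decorator: forwards everything to the wrapped graph and notifies
    listeners on mutation, without subclassing Graph."""
    def __init__(self, graph):
        self._graph = graph
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def add_edge(self, u, v):
        self._graph.add_edge(u, v)
        for listener in self._listeners:
            listener("edge_added", u, v)

    def __getattr__(self, name):          # delegate everything else to the base graph
        return getattr(self._graph, name)


g = ObservableGraph(Graph())
g.add_listener(lambda event, *args: print(event, args))
g.add_edge("A", "B")                      # prints: edge_added ('A', 'B')
print(g.neighbors("A"))                   # -> {'B'}
```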
While I cannot give any actual code, my graphs are basically implemented as incidence lists, using mainly NSMutableDictionaries and NSMutableSets (and NSMutableArrays for my ordered trees).
My Undirected Sparse Graph has nothing but these ivars, e.g.:
NSMutableDictionary *vertices;
NSMutableDictionary *edges;
The ivar vertices maps vertices to adjacency maps of vertices to incident edges ({"vertex": {"vertex": "edge"}})
And the ivar edges maps edges to incident vertex pairs ({"edge": {"vertex", "vertex"}}), with Pair being a pair data object holding an edge's head vertex and tail vertex.
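In plain key/value form the layout looks roughly like this (Python dicts standing in for the NSMutableDictionaries described above; all names are illustrative):

```python
# Undirected sparse graph: three vertices, two edges (v1-v2 and v1-v3).
vertices = {
    "v1": {"v2": "e1", "v3": "e2"},   # adjacency map: neighbour -> incident edge
    "v2": {"v1": "e1"},
    "v3": {"v1": "e2"},
}
edges = {
    "e1": ("v1", "v2"),               # each edge -> its pair of endpoint vertices
    "e2": ("v1", "v3"),
}

# Common queries become dictionary lookups:
print(list(vertices["v1"]))       # neighbours of v1 -> ['v2', 'v3']
print(vertices["v1"].get("v2"))   # edge between v1 and v2 -> 'e1'
print(edges["e1"])                # endpoints of e1 -> ('v1', 'v2')
```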
Mixed Sparse Graphs would have a slightly different mapping of adjacency/incidence lists, and so would Directed Sparse Graphs, but you should get the idea.
A limitation of this implementation is that every vertex and every edge needs to have an object associated with it. And to make things a bit more interesting, each vertex object needs to be unique, and so does each edge object, as dictionaries don't allow duplicate keys. Also, objects need to implement NSCopying. NSValueTransformers or value encapsulation are a way to sidestep these limitations, though (the same goes for the memory overhead from dictionary key copying).
While the implementation has its downsides, there's a big benefit: immense versatility!
There's hardly any type of graph I can think of that's impossible to achieve with what I already have. Instead of building each type of graph from custom-built parts, you basically go to your box of Lego bricks and assemble the graphs just the way you need them.
Some more insight:
Every major graph type has its own Protocol, here are a few:
HypergraphProtocol
MultigraphProtocol [tagging protocol] (allows parallel edges)
GraphProtocol (allows directed & undirected edges)
UndirectedGraphProtocol [tagging protocol] (allows only undirected edges)
DirectedGraphProtocol [tagging protocol] (allows only directed edges)
ForestProtocol (allows sets of disjoint trees)
TreeProtocol (allows trees)
ABTreeProtocol (allows trees of a-b children per vertex)
FullKAryTreeProtocol [tagging protocol] (allows trees of either 0 or k children per vertex)
The protocol nesting implies inheritance (of both protocols and implementations).
If there's anything else you'd like more insight into, feel free to leave a comment.
Ps: To give credit where credit is due: the architecture was highly influenced by the JUNG Java graph framework (55k+ loc).
Pps: Before choosing this type of implementation I had written a smaller sibling of it with just undirected graphs, which I wanted to expand to also support directed graphs. My implementation was pretty similar to the one you are providing in your question. This is what gave my first (rather naïve) project an abrupt end back then (see: Subclassing a set of inter-dependent classes in Objective-C and ensuring type-safety): adding simple directedness to my graph caused my entire code to break apart. (I didn't even use the solution I posted back then, as it would have just postponed the pain.) Now, with the generic implementation, I have more than 20 graph flavors implemented, with no hacks at all. It's worth it.
If all you want is to draw a graph and be able to move its nodes around on the screen, though, you'd be fine just implementing a generic graph class that can later be extended with specific directedness if needed.
An adjacency matrix will have a bit more difficulty than your object model in adding and removing vertices (but not edges), since this involves adding and removing rows and columns from a matrix. There are tricks you could use to do this, like keeping empty rows and columns, but it will still be a bit complicated.
When moving a vertex around the screen, the edges will also be moved. This also gives your object model a slight advantage, since it will have a list of connected edges and will not have to search through the matrix.
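As a rough illustration of that bookkeeping, removing a vertex from a matrix representation means deleting a whole row and column and renumbering the remaining vertices (numpy used only for illustration):

```python
import numpy as np

# Adjacency matrix for three vertices (0-1 and 1-2 are connected).
adjacency = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])

# Removing vertex 1 deletes its row and column; every later vertex index shifts.
i = 1
adjacency = np.delete(np.delete(adjacency, i, axis=0), i, axis=1)
print(adjacency)   # -> [[0 0], [0 0]]: vertices 0 and 2 remain, now unconnected

# In the object model you would instead just detach the vertex from the edge
# objects that reference it -- no renumbering of the other vertices is needed.
```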
Both models have an inherent directedness to the edges, so if you want to have undirected edges, then you will have to do additional work either way.
I would say that overall there is not a whole lot of difference. If I were implementing this, I would probably do something similar to what you are doing.
If you're using Objective-C, I assume you have access to Core Data, which would probably be a great place to start. I understand you're creating your own graph; the strength of Core Data is that it can do a lot of the checking you're talking about for free if you set up your schema properly.