How many objects can these Computer Vision APIs detect?

https://learn.microsoft.com/fr-fr/azure/cognitive-services/computer-vision/concept-object-detection
https://cloud.google.com/vision/docs/object-localizer
I would like to know how many and which objects are recognizable using these APIs, and I can't find any mention of that.
I found that the Google API uses https://developers.google.com/knowledge-graph/ , which is based on schema.org types, but I don't really understand what that's all about.

I'm sorry, but as far as I know there is no fixed list of classes that Azure Computer Vision is able to detect.
By the way, even if there were one, the list evolves on a regular basis (with no announced schedule).
In any case, there are limitations (see doc here):
It's important to note the limitations of object detection so you can avoid or mitigate the effects of false negatives (missed objects) and limited detail.
Objects are generally not detected if they're small (less than 5% of the image).
Objects are generally not detected if they're arranged closely together (a stack of plates, for example).
Objects are not differentiated by brand or product names (different types of sodas on a store shelf, for example). However, you can get brand information from an image by using the Brand detection feature.
If you want to detect specific objects, I would highly suggest using Custom Vision (doc / overview here), not Computer Vision: with Custom Vision you can train a model on your own images to match exactly what you are trying to detect.
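As a practical way to explore this, here is a minimal sketch (assuming the Python SDK azure-cognitiveservices-vision-computervision, with placeholder endpoint, key and image URL) that simply prints whatever object classes the service returns for an image:

```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Placeholder credentials: replace with your own resource's endpoint and key.
client = ComputerVisionClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    CognitiveServicesCredentials("<your-key>"),
)

# Detect objects in a publicly reachable image and print whatever classes come back.
result = client.detect_objects("https://example.com/some-image.jpg")
for obj in result.objects:
    print(obj.object_property, obj.confidence, obj.rectangle)
```

Running this over a few representative images is the quickest way to see which labels the service currently emits, since there is no published class list.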

Related

Is There a Quick, Efficient Way to Add Large Numbers of Labels in Either ArcGIS or QGIS?

In 2007, when I was young and foolish and before I knew about Open Street Map, I started an urban historical map project. I was working in Illustrator, it was going to be an interactive Flash piece, and my process was to draw the maps first, with the thought that I'd label some, but not all, of the streets later on.
As we know, Flash began to die around 2010, and I put the project away for a number of years. I picked it up again a couple of years ago and continued my earlier practice of just drawing streets and water features, this time with the intention of making it a conventional web map. Now I'm pretty close to finishing the drawing of a five-layer (1871, 1903, 1932, 1952 and 2016) historical map of a medium-sized city, though it still lacks labels.
My problem now is how to add large numbers of labels, many of them duplicates. There could be as many as 10,000 for all five layers, though as a practical matter I may have to settle for a smallish fraction of that number. Based on web searches I gather my workflow is unusual and that mine is therefore an unusual problem.
I've exported my maps and brought them into QGIS and played with the software a little. The process of adding labels to objects doesn't seem terribly efficient or user-friendly, but that's probably due to my unfamiliarity with the program.
So my question is this: Are there any tricks to speed up the painful process of adding large numbers of duplicate labels in either QGIS or ArcGIS? Since so many of the streets exist in all five layers, functionality like the ability to select multiple objects in different layers and edit their attributes simultaneously in the Attribute Table would be a godsend. (Doesn't seem possible.) So would the ability to copy the attributes from one object and paste them onto other objects. Or the ability to do either of these things in Illustrator via a plugin and then export the data along with the shapes to a GIS program.
Thanks for your help!
If I understand the issue correctly, I think there are several different solutions.
Typically, for a spatial layer in ArcGIS or QGIS, you define how to label all features in a layer once, by defining a label scheme to use across all features, whether 1 or 1 million. This assumes that each feature in the layer has one or more attributes in the associated table for the layer.
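If you go the QGIS route, a minimal PyQGIS sketch of that idea might look like the following; the layer name 'streets_1903' and the attribute column 'name' are placeholders, and it assumes QGIS 3.x:

```python
from qgis.core import (QgsPalLayerSettings, QgsProject,
                       QgsVectorLayerSimpleLabeling)

# Hypothetical layer/field names: adjust to your own project.
layer = QgsProject.instance().mapLayersByName('streets_1903')[0]

settings = QgsPalLayerSettings()
settings.fieldName = 'name'  # every feature is labeled from this attribute column

layer.setLabeling(QgsVectorLayerSimpleLabeling(settings))
layer.setLabelsEnabled(True)
layer.triggerRepaint()
```

Once the attribute table is populated, that one scheme labels every street in the layer, so the manual work reduces to filling in the attribute values rather than placing labels.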
How are you converting the Illustrator vectors to a spatial layer? DXF?
You will likely get better/faster responses to this question by posting it to the GIS Stack Exchange: https://gis.stackexchange.com/

Training images: considerations for selection

I'm relatively new and am still learning the basics. I've used NVIDIA DIGITS in the past, and am now looking at Tensorflow. While I've been able to fumble my way around creating some models for a few projects I'm working on, I really want to start diving deeper into what I'm doing, how I'm doing it, and ultimately a better understanding of why.
One area that I would like to start with is the images that I'm using for training and testing. Can anyone point me to a blog, an article, or a paper, or give me some insight into what I need to consider when selecting images to train a new model on? Up until recently, I've been using datasets that have already been selected and that are available for download. Let's say I'm going to start working on a project that involves object detection of ships from a variety of distances and angles.
So my thoughts would be:
1) I need a large quantity of images.
2) The images need to contain ships of the different types I would like to detect. (Let's just say one class, ships; I don't care what type of ship.)
3) I also need images that cover a great variety of distances and perspectives for the different types of ships.
Ultimately, my thoughts are that the images need to reflect the distance, perspective, and types of ships I would ideally want to identify from the video. Seems simple enough.
However, there are a number of questions:
Do the images need to be the same or similar resolution as the camera I'll be using, for best results?
Do the images all need to be the same resolution?
Can I use a single image and just digitally zoom out on it to give the illusion of different distances?
I'm sure there are a number of other questions that I'm not asking, or should be asking. Are there any guidelines for putting together a solid collection of images for training and validation?
I recommend thinking it through end to end: for example, would you need to classify ship models as a next step? I also recommend going through well-known public datasets and actually working with them: the structure, how to store data and labels, how to handle preprocessing, and so on.
More importantly, what are you trying to achieve? Talking to experts on the topic helps greatly while preparing your own dataset.
Use open-source images if you can, e.g. Flickr, Google, ImageNet.
No, you don't need them to be the same resolution.
It is not ideal to zoom in/out on images to use them as different categories. Preprocessing and data augmentation already do this to create more distant representations of the same class. This is why I would recommend a hands-on approach with an existing dataset first.
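As a rough illustration (assuming TensorFlow 2.x / Keras, which the question mentions; the 'ships' directory layout is made up), resizing to a common input size and letting random zoom/flip augmentation produce the "different distance" variants could look like this:

```python
import tensorflow as tf

# Augmentation pipeline: random zoom and flips stand in for manually
# creating zoomed-out copies of the same photo.
augment = tf.keras.Sequential([
    tf.keras.layers.Resizing(224, 224),          # images do NOT need to share a resolution
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomZoom(height_factor=(-0.3, 0.3)),
])

# Hypothetical directory layout: ships/ships/*.jpg, ships/background/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "ships", image_size=(256, 256), batch_size=32)
train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```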
Yes, what you need is many different representations of the classes, and a roughly balanced dataset across classes. If you define your data structure well in the beginning, it will save you a ton of time, as you won't have to make changes often.

Comparative Analysis of Object Recognition or Tracking Methods with Multiple Kinects

First of all, I have looked into the list of all branches of StackExchange, and this seems to be the one best suited for this question.
I am looking for a comparative analysis (can be both theoretical and implementation-oriented) between various popular methods of object recognition and tracking using a Microsoft Kinect 360. These methods do not have to include specialised Kinect API features like hand gesture recognition or skeleton tracking. Could you point me to some technical literature that tackles this subject? I have found quite a few papers that talk about doing object detection and training based on the PCL library. I have also found some papers on using just the RGB image of the Kinect to do classic object tracking. But to put things into perspective, I want to know which method gives good performance and is less challenging to implement when a set of Kinects (possibly with overlapping projection cones) is used to do object recognition and/or tracking.
In my opinion, building a unified point cloud from multiple Kinects (assuming I have multiple buses) in order to analyse, label and recognise/classify the objects would have too high a processing overhead. Would it be a viable alternative to do foreground extraction on the depth images separately and then somehow identify duplicates?
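For what it's worth, the per-sensor foreground-extraction idea is cheap to prototype. Here is a minimal Python/NumPy/OpenCV sketch that assumes each Kinect's depth frames arrive as 16-bit arrays in millimetres (frame acquisition, e.g. via libfreenect, is left out):

```python
import numpy as np
import cv2

def extract_foreground(depth_frame, background_depth, min_diff_mm=50):
    """Return a binary mask of pixels significantly closer to the camera
    than a previously captured empty-scene background depth frame."""
    diff = background_depth.astype(np.int32) - depth_frame.astype(np.int32)
    mask = (diff > min_diff_mm).astype(np.uint8) * 255
    # Clean up speckle noise typical of Kinect depth data
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    return mask
```

Running this per Kinect keeps the per-frame cost low; merging or de-duplicating the detections across sensors would then only have to deal with a handful of blobs rather than full point clouds.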

Scaling up SURF lookups

I am currently trying to recognise DVD covers in generic photos. My initial test involved using 100 DVD covers and 10 test cases of photos that contained them, and with some tweaking of the find_obj.cpp example in OpenCV I was able to get recognition working.
However now I need to do this on a much larger database, and I am aware that the FLANN method will not scale up well to meet this requirement. How do people here recommend I scale up my SURF recognition in an SQL database?
If you really want to scale your system by several orders of magnitude, nearest-neighbor search (FLANN) will not be sufficient.
In such a case, what you need is to build a visual vocabulary (a.k.a. bag of words) by quantizing your descriptors, and to create an inverted index.
I recommend referring to the Scalable Recognition with a Vocabulary Tree paper, which is the reference publication on this topic.
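As a rough sketch of that pipeline (flat k-means quantization plus an inverted index, not the hierarchical vocabulary tree from the paper), something like the following could work; descriptors_per_cover is a hypothetical list of (cover_id, SURF descriptor array) pairs that you would extract yourself, e.g. with OpenCV's contrib SURF:

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(descriptor_arrays, vocab_size=10000):
    """Quantize all training descriptors into a visual vocabulary."""
    kmeans = MiniBatchKMeans(n_clusters=vocab_size, batch_size=1000, random_state=0)
    kmeans.fit(np.vstack(descriptor_arrays))
    return kmeans

def build_inverted_index(descriptors_per_cover, kmeans):
    """Map each visual word to the set of covers containing it."""
    index = defaultdict(set)
    for cover_id, desc in descriptors_per_cover:
        for word in kmeans.predict(desc):
            index[word].add(cover_id)
    return index

def query(index, kmeans, query_descriptors):
    """Vote for covers sharing visual words with the query photo."""
    votes = defaultdict(int)
    for word in kmeans.predict(query_descriptors):
        for cover_id in index[word]:
            votes[cover_id] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])  # most-voted covers first
```

The inverted index (word id to cover ids) is also the part that maps naturally onto an SQL table, so lookups stay fast even as the cover database grows.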

Graph Database: TinkerPop/Blueprints vs W3C Linked data

Looking for an infrastructure for network analysis for heterogeneous (multiple node types (multi-mode), multiple edge type (multi-relation) and multiple descriptive features (multi-featured)) networks, I've noticed that there are two standard stacks in the Graph Database world:
On one hand we have the TinkerPop/Blueprints property graph model. It is supported by Neo4j, OrientDB, GraphDB, Dex, Titan, InfiniteGraph, etc.
The TinkerPop stack includes the Blueprints property graph model interface, the Gremlin graph traversal language, and the Furnace graph algorithms package.
On the other hand we have W3C's Linked Data technology stack, which is supported by AllegroGraph, 4store, Oracle Database Semantic Technologies, OWLIM, SYSTap BigData, etc.
Semantic data is represented using RDF/RDFS/OWL and can be queried using SPARQL. On top of that, the stack offers rules and reasoning capabilities.
Now, suppose that I want to represent heterogeneous data in a graph database, and analyse such data (statistics, relations discovery, structure, evolution, etc.) (I know these terms are wide and vague) - What are the relative strengths of each model for various types of network analysis tasks? Do these two models complement each other?
A couple of things: your exemplars of Linked Data stacks are all triple stores. You would start building a Linked Data application by first getting your triple store set up, but calling a database a Linked Data stack is incorrect, IMO. That's also an incomplete list of triple stores; there are also Sesame, Jena, Mulgara, and Stardog. Sesame and Jena kind of pull double duty: they're the two de facto standard Java APIs for the semantic web, but both provide triple stores that come bundled with the APIs. I also know that both Cray and IBM are working on triple stores, but I don't know much about either at this point. I do know that Stardog works well with the TinkerPop stack, and that it's basically a matter of dropping it in and starting to write Gremlin queries against the RDF.
I think the strengths of RDF/OWL are that you 1) get a real query language, 2) they're W3C standards, and 3) you get reasoning, if the triple store supports it, for free (more or less; you still have to write an ontology).
With RDF/OWL/SPARQL being standards, it is quite easy to pick up and move to a new triple store with a different feature set should you need to; your data is already in a common format that everyone understands, and any application logic encoded as queries is completely portable. In most cases you'd be writing against either the Sesame or Jena APIs, or working over the SPARQL protocol, so you might only need to change your config/init. I think that's a big win in the early prototyping phases.
I also think that RDF/OWL, especially combined with reasoning and the kinds of complex SPARQL queries you can create with the new SPARQL 1.1, really suits itself well to building complicated analytic applications. Also, I think the impression most people have that RDF triple stores don't scale is no longer correct; most triple stores at this point easily scale into the billions of triples and have very competitive throughput numbers as well.
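To give a flavor of how portable the query logic is, here is a tiny self-contained sketch using Python's rdflib; the graph and the ex: namespace are invented purely for illustration, and the same SPARQL could be sent over the SPARQL protocol to any of the stores mentioned above:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

# Toy heterogeneous network: two node types and one edge type.
EX = Namespace("http://example.org/net#")
g = Graph()
g.add((EX.alice, RDF.type, EX.Person))
g.add((EX.acme, RDF.type, EX.Organization))
g.add((EX.alice, EX.worksFor, EX.acme))

# Plain SPARQL: find every Person and the Organization they work for.
results = g.query("""
    PREFIX ex: <http://example.org/net#>
    SELECT ?person ?org WHERE {
        ?person a ex:Person ;
                ex:worksFor ?org .
        ?org a ex:Organization .
    }
""")
for person, org in results:
    print(person, "->", org)
```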
So based on what I think you might be doing, I think semweb might be a better bet for you. I did a similar project a few years back using RDF & RDFS for the backend fronted by a simple Pylons based webapp and was very happy with the results.