Different types of augmentations to be used for ANPR?

I am trying to do automatic number plate recognition (ANPR), but I have very little data. To increase the amount of data, I tried translation, rotation, and adding noise. Can someone suggest other data augmentation techniques that can be used for ANPR?

You can also use: rectification, aspect-ratio changes, centring, scaling, mirroring, cropping, and horizontal and vertical flipping.
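A few of these can be sketched with NumPy alone. This is only a minimal illustration, assuming a plate crop stored as a 2D greyscale array; the helper names (center_crop, pad_to_aspect, scale_nearest) are made up for this example, and a real pipeline would more likely use an image library with proper interpolation. Note that a mirrored plate no longer reads as the same character sequence, so mirroring probably suits the plate-localisation stage better than the character-reading stage.

```python
import numpy as np

def center_crop(img, out_h, out_w):
    """Cut an out_h x out_w window from the middle of the image."""
    h, w = img.shape[:2]
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return img[top:top + out_h, left:left + out_w]

def pad_to_aspect(img, ratio):
    """Zero-pad (centred) so that width / height is approximately `ratio`."""
    h, w = img.shape[:2]
    target_w = max(w, int(round(h * ratio)))
    target_h = max(h, int(round(target_w / ratio)))
    out = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    top, left = (target_h - h) // 2, (target_w - w) // 2
    out[top:top + h, left:left + w] = img
    return out

def scale_nearest(img, factor):
    """Rescale by `factor` using nearest-neighbour sampling."""
    h, w = img.shape[:2]
    new_h, new_w = max(1, int(h * factor)), max(1, int(w * factor))
    rows = (np.arange(new_h) / factor).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / factor).astype(int).clip(0, w - 1)
    return img[rows][:, cols]

# Stand-in for a real plate crop: random 20 x 60 greyscale pixels.
rng = np.random.default_rng(0)
plate = rng.integers(0, 256, size=(20, 60), dtype=np.uint8)

variants = {
    "crop": center_crop(plate, 16, 48),
    "pad": pad_to_aspect(plate, 4.0),
    "half": scale_nearest(plate, 0.5),
    "mirror": np.fliplr(plate),
}
for name, v in variants.items():
    print(name, v.shape)
```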

Related

Creating a good training set for one-class detection

I am training a one-class (hands) object detector on the egohands data set. My problem is that it detects way too many things as hands. It feels like it is detecting everything that is skin-colored as a hand.
I assume the most likely explanation is that my training set is poor: every single image in the set contains hands, and almost no other skin-toned elements appear in the images. I guess it is also necessary to show the network images that do not contain what you are trying to detect?
I just want to verify that my assumptions are right before investing lots of time in creating a better training set. I am therefore very grateful for every hint about what I am doing wrong.

Preprocessing is a critical step in object detection. Take extra care, as detection networks are sensitive to geometric transformations.
Some proven data augmentation methods include:
1. Random geometric transformation for random cropping (with constraints)
2. Random expansion
3. Random horizontal flip
4. Random resize (with random interpolation)
5. Random color jittering for brightness, hue, saturation, and contrast
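The reason geometric transformations need care in detection is that the bounding boxes have to be transformed together with the pixels. A minimal NumPy sketch of two items from the list (horizontal flip and random expansion), assuming boxes are stored as [xmin, ymin, xmax, ymax] in pixel coordinates; the function names are invented for this example:

```python
import numpy as np

def hflip_with_boxes(img, boxes):
    """Mirror the image left-right and mirror the box x-coordinates with it."""
    w = img.shape[1]
    flipped = img[:, ::-1]
    out = boxes.copy()
    out[:, 0] = w - boxes[:, 2]  # new xmin comes from the old xmax
    out[:, 2] = w - boxes[:, 0]  # new xmax comes from the old xmin
    return flipped, out

def expand_with_boxes(img, boxes, ratio, rng, fill=0):
    """Paste the image at a random spot on a larger canvas and shift the boxes."""
    h, w = img.shape[:2]
    new_h, new_w = int(h * ratio), int(w * ratio)
    top = int(rng.integers(0, new_h - h + 1))
    left = int(rng.integers(0, new_w - w + 1))
    canvas = np.full((new_h, new_w) + img.shape[2:], fill, dtype=img.dtype)
    canvas[top:top + h, left:left + w] = img
    out = boxes.copy()
    out[:, [0, 2]] += left
    out[:, [1, 3]] += top
    return canvas, out

rng = np.random.default_rng(0)
img = np.zeros((100, 200, 3), dtype=np.uint8)   # height 100, width 200
boxes = np.array([[10.0, 20.0, 50.0, 60.0]])    # one hand annotation

flipped_img, flipped_boxes = hflip_with_boxes(img, boxes)
big_img, big_boxes = expand_with_boxes(img, boxes, 2.0, rng)
print(flipped_boxes)   # [[150.  20. 190.  60.]]
print(big_img.shape)   # (200, 400, 3)
```

Expansion makes objects relatively smaller in the frame and adds background area that contains no object.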

training images? Considerations for selection

I'm relatively new and am still learning the basics. I've used NVIDIA DIGITS in the past, and am now looking at Tensorflow. While I've been able to fumble my way around creating some models for a few projects I'm working on, I really want to start diving deeper into what I'm doing, how I'm doing it, and ultimately a better understanding of why.
One area that I would like to start with is the images that I'm using for training and testing. Can anyone point me to a blog, an article, a paper, or give me some insight into what I need to consider when selecting images to train a new model on? Up until recently, I've been using datasets that have already been selected and that are available for download. Let's say I'm going to start working on a project that involves object detection of ships from a variety of distances and angles.
So my thoughts would be
1) I need a large quantity of images.
2) The images need to contain ships of the different types I would like to detect. (Let's just say one class, ships; I don't care what type of ship.)
3) I also need to have images that have a great variety of distance perspective for the different types of ships.
Ultimately, my thoughts are that the images need to reflect the distance, perspective, and types of ships I would ideally want to identify from the video. Seems simple enough.
However, there are a number of questions:
Do the images need to be the same or a similar resolution as the camera I'll be using, for best results?
Do the images all need to be the same resolution?
Can I use a single image and just digitally zoom out on it to give the illusion of different distances?
I'm sure there are a number of other questions that I'm not asking, or should be asking. Are there any guidelines available for building a solid collection of images for training and validation?

I recommend thinking the project through end to end. For example, would you need to classify ship models as a next step? I recommend going through well-known public datasets and actually working with their structure: how to store data and labels, how to handle preprocessing, etc.
More importantly, what are you trying to achieve? Talking to experts on the topic helps greatly while preparing your own dataset.
Use open source images if you can, e.g. flickr, google, imagenet.
No, you don't need them to be the same resolution.
It is not ideal to zoom images in or out to use them in different categories. Image preprocessing and data augmentation already do this to create more distant representations of the same class. This is why I would recommend a hands-on approach with an existing dataset first.
Yes, what you need is many, different representations of classes, and a roughly balanced dataset of classes. If you define your data structure well in the beginning, it will save you a ton of time as you won't have to make changes often.
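Whether a dataset is "roughly balanced" is easy to check once the labels are stored in a consistent structure. A small sketch using only the standard library; the label names are made up for illustration:

```python
from collections import Counter

def class_shares(labels):
    """Return the fraction of samples that belongs to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (1.0 = balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical per-image labels for a ship dataset.
labels = ["cargo", "cargo", "sailboat", "cargo", "ferry", "sailboat"]

shares = class_shares(labels)
ratio = imbalance_ratio(labels)
print(shares)
print(ratio)   # 3.0: "cargo" has three times as many samples as "ferry"
```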

Cloud Vision API poorly recognizes 7-segment numbers

The simplest example of what I'm trying to recognize:
I use DOCUMENT_TEXT_DETECTION, but the response I get contains hieroglyphic-like characters.
If I set Eng as a language hint in the ImageContext parameter (via the addAllLanguageHints method), I get 111 as the result. Better, but still bad.
Is there any way to indicate that digits are to be recognised, or to otherwise improve the results?
Also, how is the setRepeatedField option in ImageContext used? I could not find any examples of its use.
Thanks in advance.

Even if it doesn't work out of the box ... what you'd need is to classify images using custom labels when the default labels don't suffice. Cloud AutoML Vision (select Vision from that blue drop-down menu) lets you train custom models, which can be used to recognize that font. And since the number of possible shapes on a 7-segment display is quite limited, it shouldn't be too difficult to train. A calculator with a better display might also work better. The LCD above looks a little cheap, with those huge gaps and cut-off segment ends, but one can nevertheless train a model to read it.
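To illustrate how small the label space is: with the conventional segment names a to g, each digit corresponds to exactly one set of lit segments, so there are only ten patterns to learn. The sketch below is not a Cloud Vision or AutoML call; it assumes some earlier, hypothetical step has already decided which segments are lit in each digit position.

```python
# Conventional 7-segment naming: a = top, b = top right, c = bottom right,
# d = bottom, e = bottom left, f = top left, g = middle.
SEGMENTS_TO_DIGIT = {
    frozenset("abcdef"): "0",
    frozenset("bc"): "1",
    frozenset("abdeg"): "2",
    frozenset("abcdg"): "3",
    frozenset("bcfg"): "4",
    frozenset("acdfg"): "5",
    frozenset("acdefg"): "6",
    frozenset("abc"): "7",
    frozenset("abcdefg"): "8",
    frozenset("abcdfg"): "9",
}

def decode(lit_segments_per_digit):
    """Map each set of lit segments to a digit; '?' marks an unknown pattern."""
    return "".join(
        SEGMENTS_TO_DIGIT.get(frozenset(lit), "?")
        for lit in lit_segments_per_digit
    )

print(decode(["bc", "abdeg", "abcdg"]))   # 123
print(decode(["bc", "xyz"]))              # 1?
```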

Encoding invariance for deep neural network

I have a set of data in the form of 2D matrices (like greyscale pictures),
and I use a CNN as the classifier.
I would like to know whether there is any study or experience on the accuracy impact
of changing the encoding from the traditional encoding.
I suppose there is an impact; the question is rather which transformations of the encoding leave the accuracy invariant and which ones degrade it.
To clarify, this mainly concerns the quantization of the raw data into input data.
EDIT:
Quantizing the raw data into input data is already a pre-processing step that adds or removes some features (even minor ones). The impact of this quantization on accuracy in real DNN computation does not seem very clear.
Maybe some research is available.

I'm not aware of any research specifically dealing with quantization of input data, but you may want to check out some related work on quantization of CNN parameters: http://arxiv.org/pdf/1512.06473v2.pdf. Depending on what your end goal is, the "Q-CNN" approach may be useful for you.
My own experience with using various quantizations of the input data for CNNs has been that there's a heavy dependency between the degree of quantization and the model itself. For example, I've played around with using various interpolation methods to reduce image sizes and reducing the color palette size, and in the end, I discovered that each variant required a different tuning of hyper-parameters to achieve optimal results. Generally, I found that minor quantization of data had a negligible impact, but there was a knee in the curve where throwing away additional information dramatically impacted the achievable accuracy. Unfortunately, I'm not aware of any way to determine what degree of quantization will be optimal without experimentation, and even deciding what's optimal involves a trade-off between efficiency and accuracy which doesn't necessarily have a one-size-fits-all answer.
On a theoretical note, keep in mind that CNNs need to be able to find useful, spatially-local features, so it's probably reasonable to assume that any encoding that disrupts the basic "structure" of the input would have a significantly detrimental effect on the accuracy achievable.
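For concreteness, here is one simple kind of input quantization: reducing the number of grey levels. This is only a sketch of uniform quantization on a flat list of 8-bit pixel values, not the exact procedure used in the experiments described above. It shows how the reconstruction error grows as levels are removed; the effect on model accuracy still has to be measured by training.

```python
def quantize(values, levels, lo=0, hi=255):
    """Map each value in [lo, hi] to the centre of one of `levels` equal bins."""
    step = (hi - lo + 1) / levels
    out = []
    for v in values:
        bin_index = min(int((v - lo) / step), levels - 1)
        out.append(int(lo + (bin_index + 0.5) * step))
    return out

def mean_abs_error(original, quantized):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)

pixels = list(range(256))   # every possible 8-bit grey value once

errors = {}
for levels in (256, 64, 16, 4, 2):
    errors[levels] = mean_abs_error(pixels, quantize(pixels, levels))
    print(levels, errors[levels])
```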

In usual practice -- a discrete classification task in classic implementation -- it will have no effect. However, the critical point is in the initial computations for back-propagation. The classic definition depends only on strict equality of the predicted and "base truth" classes: a simple right/wrong evaluation. Changing the class coding has no effect on whether or not a prediction is equal to the training class.
However, this function can be altered. If you change the code to have something other than a right/wrong scoring, something that depends on the encoding choice, then encoding changes can most definitely have an effect. For instance, if you're rating movies on a 1-5 scale, you likely want 1 vs 5 to contribute a higher loss than 4 vs 5.
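A small sketch of the difference, using the movie-rating example. The function names are made up; zero_one_loss stands for the classic right/wrong scoring and ordinal_loss for a scoring that depends on the numeric class codes.

```python
def zero_one_loss(pred, truth):
    """Classic right/wrong scoring: only equality of the labels matters."""
    return 0 if pred == truth else 1

def ordinal_loss(pred, truth):
    """Scoring that depends on the numeric encoding of the classes."""
    return abs(pred - truth)

# Right/wrong scoring treats every mistake the same.
print(zero_one_loss(1, 5), zero_one_loss(4, 5))   # 1 1
# Distance-based scoring penalises 1 vs 5 more than 4 vs 5.
print(ordinal_loss(1, 5), ordinal_loss(4, 5))     # 4 1

# Re-encoding the classes (here: reversing the scale codes 1..5 to 5..1, then
# swapping two codes) leaves the 0/1 loss untouched but changes the ordinal loss.
recode = {1: 5, 2: 4, 3: 1, 4: 2, 5: 3}
same_01 = zero_one_loss(recode[4], recode[5]) == zero_one_loss(4, 5)
same_ord = ordinal_loss(recode[1], recode[5]) == ordinal_loss(1, 5)
print(same_01, same_ord)   # True False
```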
Does this reasonably deal with your concerns?
I see now. My answer above is useful ... but not for what you're asking. I had my eye on the classification encoding; you're wondering about the input.
Please note that asking for off-site resources is a classic off-topic question category. I am unaware of any such research -- for what little that is worth.
Obviously, there should be some effect, as you're altering the input data. The effect would be dependent on the particular quantization transformation, as well as the individual application.
I do have some limited-scope observations from general big-data analytics.
In our typical environment, where the data were scattered with some inherent organization within their natural space (F dimensions, where F is the number of features), we often use two simple quantization steps: (1) Scale all feature values to a convenient integer range, such as 0-100; (2) Identify natural micro-clusters, and represent all clustered values (typically no more than 1% of the input) by the cluster's centroid.
This speeds up analytic processing somewhat. Given the fine-grained clustering, it has little effect on the classification output. In fact, it sometimes improves the accuracy minutely, as the clustering provides wider gaps among the data points.
Take with a grain of salt, as this is not the main thrust of our efforts.
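The two quantization steps could look roughly like this. It is a simplified sketch, not the procedure actually used: in particular, grouping points by grid cell is only a stand-in (an assumption made for this example) for identifying "natural micro-clusters", and the function names are invented.

```python
from collections import defaultdict

def scale_to_int_range(column, lo=0, hi=100):
    """Step 1: linearly rescale one feature column to integers in [lo, hi]."""
    cmin, cmax = min(column), max(column)
    if cmax == cmin:
        return [lo for _ in column]
    return [round(lo + (v - cmin) * (hi - lo) / (cmax - cmin)) for v in column]

def snap_to_cell_centroids(points, cell=2):
    """Step 2 (simplified): points sharing a grid cell are replaced by their centroid."""
    cells = defaultdict(list)
    for p in points:
        cells[tuple(v // cell for v in p)].append(p)
    centroids = {
        key: tuple(sum(coord) / len(members) for coord in zip(*members))
        for key, members in cells.items()
    }
    return [centroids[tuple(v // cell for v in p)] for p in points]

scaled = scale_to_int_range([2.0, 4.0, 6.0])
print(scaled)    # [0, 50, 100]

points = [(10, 10), (11, 11), (50, 50)]
snapped = snap_to_cell_centroids(points)
print(snapped)   # [(10.5, 10.5), (10.5, 10.5), (50.0, 50.0)]
```

Isolated points are their own centroid, so only values that actually sit close together are changed.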

What is the advantage of the paperboat format in performance optimization of ML?

The PaperBoat format claims to provide a better dataset representation for machine learning routines. I'd like to understand the nature of its optimization. I understand that using an integer representation for model attributes means faster processing of the data set. What are the other improvements?
Also, how do I tune an ML algorithm to work with this file format?

I don't know if this format really provides better representation, but I can speculate why it can be more efficient.
First, as they state in the format description, "Having data of the same precision consecutive enables hardware vectorization." Consider also Wikipedia: "Vector processing techniques have since been added to almost all modern CPU designs".
Second, their format allows you to mix sparse and non-sparse features, but since all sparse features are placed consecutively, it is easy to treat them as a sparse matrix and to optimize learning methods such as conjugate gradient.
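To make the idea concrete, here is a sketch of a row stored as one contiguous block of same-precision dense values followed by a block of sparse (index, value) pairs. The layout is hypothetical and is not taken from the PaperBoat specification. Python itself does not vectorize these loops; the point is only that each block is homogeneous, which is what lets a compiled implementation use SIMD instructions on the dense part and sparse-matrix routines on the rest.

```python
from array import array

def dot_mixed(dense, sparse_idx, sparse_val, w_dense, w_sparse):
    """Dot product of one mixed dense/sparse row with a weight vector."""
    # Dense block: one tight loop over contiguous values of a single type.
    dense_part = sum(x * w for x, w in zip(dense, w_dense))
    # Sparse block: only the stored (index, value) pairs are visited.
    sparse_part = sum(v * w_sparse[i] for i, v in zip(sparse_idx, sparse_val))
    return dense_part + sparse_part

# Hypothetical row: three dense float features, then five sparse features
# of which only positions 1 and 4 are non-zero.
dense = array("f", [1.0, 2.0, 3.0])
sparse_idx = array("I", [1, 4])
sparse_val = array("f", [0.5, 2.0])

w_dense = array("f", [1.0, 1.0, 1.0])
w_sparse = array("f", [0.0, 2.0, 0.0, 0.0, 4.0])

result = dot_mixed(dense, sparse_idx, sparse_val, w_dense, w_sparse)
print(result)   # 15.0
```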
"How to tune an ML algorithm to work with this file format?"
What do you mean by ML algorithm tuning? The learning algorithm doesn't know, and doesn't need to know, anything about the file format of the dataset, and knowing the file format can't increase or decrease accuracy. In theory, you can speed up a concrete optimization algorithm (like gradient descent) if you can rely on some properties of the data (and I guess Ismion PaperBoat does this), but I don't think you can tune it yourself.
