How to get bounding box for each symbol on number plate - tensorflow

I want to train some neural network to detect symbols on a car license plate.
I have 10k pictures of plates and 10k strings containing the text shown on each plate. For example, this picture is named "В394ТТ64.png" (the other pictures have roughly the same quality and size, but different shadows/contrast/lighting and so on).
So, what do I want to do?
I want to automatically create PASCAL VOC XML files containing information about each symbol on a plate, and then train a neural network to detect the symbols and their classes. I already know which symbols appear in each picture, but I don't know how to get the bounding box coordinates.
I tried using OpenCV and binary segmentation, but the lighting, shadows, size, and noise in the pictures vary too much.
I also tried to find trained neural networks that can detect symbols, or to train one myself, but failed.
So, how can I get a bounding box for each symbol on a license plate?

There are multiple methods to do this.
Mainly, you will have to go over your image and do object detection on each segment of the image.
In your case that should be easier, since the plate is already a defined area; you could move from left to right in strides.
Using an MNIST-trained classifier, you can classify the number in each image segment. If you get a result with a probability of, say, 90%, you take the coordinates of that segment as your bounding box coordinates.
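A minimal sketch of that sliding-window idea, assuming a Keras classifier trained on 28x28 grayscale glyph crops (the model file name, window size, and thresholds are illustrative):

```python
import numpy as np
import tensorflow as tf

# Hypothetical model: a Keras classifier trained on 28x28 grayscale
# character crops (MNIST-style). The file name is illustrative.
model = tf.keras.models.load_model("symbol_classifier.h5")

def sliding_window_boxes(plate_gray, win_w=28, stride=4, min_prob=0.9):
    """Slide a full-height window across a grayscale plate image and keep
    the windows the classifier is confident about as bounding boxes."""
    h, w = plate_gray.shape
    boxes = []
    for x in range(0, w - win_w + 1, stride):
        crop = plate_gray[:, x:x + win_w].astype(np.float32) / 255.0
        # resize each crop to the classifier's 28x28 input
        crop = tf.image.resize(crop[..., None], (28, 28)).numpy()
        probs = model.predict(crop[None, ...], verbose=0)[0]
        if probs.max() >= min_prob:
            boxes.append((x, 0, x + win_w, h, int(probs.argmax())))
    return boxes  # one (xmin, ymin, xmax, ymax, class_id) per hit
```

Overlapping windows will fire on the same symbol, so in practice you would also suppress duplicates (e.g., keep the highest-probability window among overlapping ones).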
You can of course reuse known architectures such as R-CNN or YOLO.
Here you can find a nice overview.
Good luck

Found another way to solve this problem.
I wrote a script that generates different images of number plates plus a PASCAL VOC XML file for each image (the XML-writing part is sketched below), and generated 10k images.
Then I augmented them so they look more like "real world" images. Now I have 14k images: 4k from the original set and 10k augmented.
I trained an ssd_mobilenet model on them.
After that, I used auto-annotation with the trained model to detect boxes on real images.
I trained the model one more time, and that's it.
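Since the generator places each glyph itself, it knows every box exactly. A minimal sketch of writing those boxes out as PASCAL VOC XML (all names are illustrative, not the author's actual script):

```python
import xml.etree.ElementTree as ET

def write_voc_xml(filename, img_w, img_h, symbols, out_path):
    """symbols: list of (label, xmin, ymin, xmax, ymax) tuples, known
    exactly because the generator placed each glyph itself."""
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = filename
    size = ET.SubElement(ann, "size")
    ET.SubElement(size, "width").text = str(img_w)
    ET.SubElement(size, "height").text = str(img_h)
    ET.SubElement(size, "depth").text = "3"
    for label, xmin, ymin, xmax, ymax in symbols:
        obj = ET.SubElement(ann, "object")
        ET.SubElement(obj, "name").text = label
        box = ET.SubElement(obj, "bndbox")
        ET.SubElement(box, "xmin").text = str(xmin)
        ET.SubElement(box, "ymin").text = str(ymin)
        ET.SubElement(box, "xmax").text = str(xmax)
        ET.SubElement(box, "ymax").text = str(ymax)
    ET.ElementTree(ann).write(out_path)

# e.g. write_voc_xml("В394ТТ64.png", 320, 80,
#                    [("В", 10, 12, 40, 68), ("3", 44, 12, 72, 68)],
#                    "В394ТТ64.xml")
```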

Related

How do I train a CNN to learn bounding boxes from labeled images?

I am trying to detect faces of specific people in images. Two people. I have 2k images of each of the two people that are labelled. These are normal snapshots, so there are other people in the images. In some cases, both people appear in the same image.
I used Adobe Lightroom's face detection to label the images. In retrospect, this was a mistake; it's very limited. For example, after labelling enough images, LR guesses at the labels that should be applied. These guesses are rather good. However, you must confirm each guess before it can be used to select photos.
Since I already have a substantial corpus of labeled images, I hoped there was a way I could learn the bounding boxes from the labeled images, rather than use something like labelImg to manually redo work I have already performed.
Ideally, I'm looking for a TensorFlow model that I can load and run on the labelled images where the output is the bounding box. If this is a fool's errand, I would also like to know that.
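One minimal way to bootstrap such boxes is to run an off-the-shelf face detector over the already-labelled photos and keep its proposals, for example OpenCV's bundled Haar cascade; a sketch, with illustrative paths and parameters:

```python
import cv2

# OpenCV ships this cascade file; parameters here are illustrative.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_boxes(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # returns a list of (x, y, w, h) rectangles, one per detected face
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```

The detector does not know which face is which person, so you would still need to match boxes to your existing labels (e.g., via a small classifier on the cropped faces).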

What kind of Neural network to use for classifing a dotted image to a number depending on the number of and size of dots?

I am currently trying to train a CNN to classify images consisting of dots into a class whose value depends on the number and size of the dots: more dots should fall in a high-number class and fewer dots in a low-number class.
I wonder if there is an alternative to a CNN for this task. I started designing a CNN since it is an image problem, but then realized that, unlike other object-classification problems, these images don't really have the same properties, such as edges, that object images have.
The main goal is to get a number out of the network when the input is an image of this kind; I have no preference for how to do it, except that it must be a machine-learning solution.
This is how the images look. I can use two different kinds: the original image, or a binary grayscale (black-and-white) version. [Example images: binary black-and-white image; original image]
You can convert the image to binary, where each pixel is 0 or 1; assume 0 is the background and 1 is the dots. You can then sum all the 1s in the image to get your class value, and to normalize the output you can divide it by some constant.
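A minimal sketch of that pixel-counting baseline (the threshold and the normalizing constant are assumptions to tune for your images):

```python
import numpy as np

def dot_pixel_score(gray_image, threshold=128, norm=1000.0):
    # 1 where a pixel is brighter than the threshold (a "dot"), else 0
    binary = (np.asarray(gray_image) > threshold).astype(np.uint8)
    # normalized count of foreground pixels
    return binary.sum() / norm
```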
If you want a machine-learning solution, just feed that binary image into a single Dense layer; it then becomes a regression problem, not classification.
Your output activation function should be ReLU, and your loss function MSE.
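A minimal sketch of that setup in Keras (the 64x64 input size is an assumption):

```python
import tensorflow as tf

# Single Dense output unit, ReLU activation, MSE loss: a regression
# from the flattened binary image to the dot "score".
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="relu"),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(binary_images, dot_scores, epochs=10)
```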

convolutional neural network image recognition

I am currently working on a project with a convolutional network using TensorFlow. I have set up the network and now I need to train it, but I have no idea what the training images should look like, e.g., what percentage of the image the object should occupy.
It's a cigarette that I have to detect, and I have taken around 280 individual pictures where the cigarette is about 2-5% of the image. I'm thinking of scrapping those pictures and taking new ones where the cigarette is about 30-50% of the image.
All the pictures are taken outside, in a street environment.
So my question is: is there any kind of rule regarding good pictures in a training set?
I will report back when I have tried my own solution.
The object you are trying to recognise is too small. Of the samples, I think the first one will be the best bet for you. A convolutional neural network works by doing convolution operations on image pixels. In the second picture, the background is too large compared to the object you are trying to recognise, and training on such data will not help you.
Just trying to answer your question about rules:
Make sure the cigarette occupies most of the image; 50% to 90% works in my experience (a sketch of one way to crop toward that is below). You can still identify cigarettes at 2-3% of the area, but you would need millions of images with varying backgrounds.
A CNN learns from the input images. Looking at the samples you shared (I guess all the images are taken on roadside platforms and grass areas), the CNN may not learn to find the cigarette; instead, if the background occupies most of the image, it will learn to detect the common background. Please make sure to include different background patterns.
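If you have (or draw) boxes for your existing pictures, you could crop them toward that 50% target instead of reshooting everything. A hedged sketch (the function and the target fraction are illustrative, and it assumes you already have a box per image):

```python
from PIL import Image

def crop_around_object(img_path, box, target_frac=0.5):
    """Crop so the object box fills roughly `target_frac` of the result.
    `box` = (left, top, right, bottom) has to come from your own labels."""
    img = Image.open(img_path)
    left, top, right, bottom = box
    obj_w, obj_h = right - left, bottom - top
    scale = (1.0 / target_frac) ** 0.5  # area fraction -> side-length scale
    crop_w, crop_h = obj_w * scale, obj_h * scale
    cx, cy = (left + right) / 2.0, (top + bottom) / 2.0
    crop_box = (int(max(0, cx - crop_w / 2)),
                int(max(0, cy - crop_h / 2)),
                int(min(img.width, cx + crop_w / 2)),
                int(min(img.height, cy + crop_h / 2)))
    return img.crop(crop_box)
```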

small object detection with faster-RCNN in tensorflow-models

I'm attempting to train a Faster R-CNN model for small-digit detection. I'm using the newly released TensorFlow Object Detection API and so far have been fine-tuning a pre-trained faster_rcnn_resnet101_coco from the model zoo. All my training attempts have resulted in models with high precision but low recall. Out of the ~120 objects (digits) on each image, only ~20 are ever detected, but when detected the classification is accurate. (Also, I am able to train a simple convnet from scratch on my cropped images with high accuracy, so the problem is in the detection aspect of the model.) Each digit is on average 60x30 pixels in the original images (and probably about half that size after the image is resized before being fed into the model). Here is an example image with detected boxes of what I'm seeing:
What is odd to me is how it is able to correctly detect neighboring digits but completely miss the rest that are very similar in terms of pixel dimensions.
I have tried adjusting the hyperparameters around anchor box generation and first_stage_max_proposals, but nothing has improved the results so far. Here is an example config file I have used. What other hyperparameters should I try adjusting? Any other suggestions on how to diagnose the problem? Should I be looking into other architectures, or is my task doable with Faster R-CNN and/or SSD?
In the end, the immediate problem was that I was not using the visualizer correctly. By updating the parameters of visualize_boxes_and_labels_on_image_array as described by Johnathan in the comments, I was able to see that I am at least detecting more boxes than I had thought.
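For reference, that update looks roughly like this. The visualizer's defaults (max_boxes_to_draw=20, min_score_thresh=0.5) hide low-score detections, which matches the "only ~20 objects detected" symptom; image_np, boxes, classes, scores, and category_index stand in for the usual outputs of a detection run:

```python
from object_detection.utils import visualization_utils as vis_util

# With the defaults, at most 20 boxes above score 0.5 are ever drawn,
# even if the model actually found more.
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,                      # the image as a numpy array
    boxes,                         # model outputs from a detection run
    classes.astype(int),
    scores,
    category_index,
    use_normalized_coordinates=True,
    max_boxes_to_draw=200,         # default: 20
    min_score_thresh=0.1)          # default: 0.5
```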
I checked your config file: you are decreasing the resolution of your images to 1024, so the region of each digit will not contain many pixels and you are losing information. What I suggest is to train the model with another dataset of smaller images; you can, for example, crop each image into four areas.
If you have a good GPU, increase the max dimension in the image_resizer, but I guess you will run out of memory.
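For reference, the resizer block in the pipeline config looks like this (these are the stock Faster R-CNN values; raise max_dimension only if memory allows):

```
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}
```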

Can I find the region of the found categories in TensorFlow?

We have been using TensorFlow for image classification, and we have all seen the results for the Admiral Grace Hopper image; we get:
military uniform (866): 0.647296
suit (794): 0.0477196
academic gown (896): 0.0232411
bow tie (817): 0.0157356
bolo tie (940): 0.0145024
I was wondering if there is any way to get the coordinates for each category within the image.
TensorFlow doesn't have sample code yet for image detection and localization, but it's an open research problem with different approaches using deep nets; for example, you can look up the papers on the algorithms OverFeat and YOLO (You Only Look Once).
Also, there is usually some preprocessing on the object coordinate labels, or postprocessing to suppress duplicate detections. Often a second, different network is used to classify the object once it's detected.