Multi-Class Image Segmentation - How to start - tensorflow

I intend to perform multi-class semantic image segmentation on a very large dataset I've created.
The set consists of images (jpg or png) and masks (png or coco-json) labeled with 5 classes.
Now I'm struggling to find an entry point into the training part, as I find very few resources that describe multi-class segmentation on one's own dataset.
I would be very grateful for any directions or tips on where or what to look for.
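If the masks are stored as colour PNGs, a common first step (an assumption here, since the question does not say how classes are encoded in the PNGs) is to turn each mask into a map of integer class ids. That is the target format a sparse loss such as Keras' SparseCategoricalCrossentropy expects. A minimal numpy sketch, where the 5-colour PALETTE is hypothetical and must be replaced by the colours your masks actually use:

```python
import numpy as np

# Hypothetical palette: one RGB colour per class (5 classes).
# Replace with the colours actually used in your PNG masks.
PALETTE = np.array([
    [0, 0, 0],        # class 0
    [255, 0, 0],      # class 1
    [0, 255, 0],      # class 2
    [0, 0, 255],      # class 3
    [255, 255, 0],    # class 4
], dtype=np.uint8)


def rgb_mask_to_class_ids(mask_rgb: np.ndarray, palette: np.ndarray = PALETTE) -> np.ndarray:
    """Convert an H x W x 3 colour mask into an H x W map of integer class ids.

    Pixels whose colour is not in the palette get the value 255, so they can
    be excluded from the loss later.
    """
    class_ids = np.full(mask_rgb.shape[:2], 255, dtype=np.uint8)
    for idx, colour in enumerate(palette):
        # True where all three channels match this class colour.
        matches = np.all(mask_rgb == colour, axis=-1)
        class_ids[matches] = idx
    return class_ids


def class_ids_to_one_hot(class_ids: np.ndarray, num_classes: int) -> np.ndarray:
    """One-hot encode an H x W class-id map; unmatched (255) pixels stay all-zero."""
    one_hot = np.zeros(class_ids.shape + (num_classes,), dtype=np.float32)
    valid = class_ids < num_classes
    rows, cols = np.nonzero(valid)
    one_hot[rows, cols, class_ids[valid]] = 1.0
    return one_hot
```

A mask can be loaded with e.g. `np.array(PIL.Image.open(path).convert("RGB"))`. The one-hot helper is only needed if you prefer a categorical (non-sparse) loss.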

Related

Having a trained classifier like VGG16 how to automate image segmentation?

I have a trained classifier: VGG16 on, say, ImageNet (or on my own image database and classes). I want to segment my images automatically, knowing that they contain classes my classifier knows. How can I automate image segmentation?
For this you can extract Grad-CAM features. Keras has already published an official code example for Grad-CAM extraction in its documentation; you can find it here.
So the steps for your task are:
Extract Grad-CAM from the images
Based on Grad-CAM create a segmentation mask using simple image processing technique
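The core Grad-CAM computation and the thresholding step can be sketched with numpy alone. In the Keras example, the feature maps of the last convolutional layer and the gradients of the class score with respect to them come from `tf.GradientTape`; here both are assumed to already be numpy arrays of shape (h, w, K), and the 0.5 threshold is an arbitrary assumption you would tune:

```python
import numpy as np


def grad_cam_heatmap(feature_maps: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """Combine conv feature maps (h x w x K) with the gradients of the class
    score w.r.t. those maps (same shape) into a Grad-CAM heatmap in [0, 1]."""
    # Channel weights: global-average-pool the gradients over the spatial axes.
    weights = grads.mean(axis=(0, 1))                            # shape (K,)
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))   # shape (h, w)
    cam = np.maximum(cam, 0.0)  # ReLU: keep only evidence for the class
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam


def heatmap_to_mask(heatmap: np.ndarray, out_h: int, out_w: int,
                    threshold: float = 0.5) -> np.ndarray:
    """Upsample the coarse heatmap to image size (nearest neighbour) and threshold it."""
    h, w = heatmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    upsampled = heatmap[rows[:, None], cols[None, :]]
    return upsampled >= threshold
```

Running this once per class of interest and assigning each pixel to the class with the highest heatmap value turns the binary masks into a multi-class one.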
With this method you can easily create segmentation masks for images, but the masks may not be very accurate, because the heatmap is computed at the low resolution of the last convolutional layer; see this picture,
which is for the Xception model (ImageNet).
I hope this helps.

How to get bounding box for each symbol on number plate

I want to train some neural network to detect symbols on a car license plate.
I have 10k pictures of plates, and 10k strings containing the text shown on the plates. For example, this picture has the name "В394ТТ64.png" (the other pictures have roughly the same quality and size, but different shadows, contrast, lighting and so on).
So, what do I want to do?
I want to automatically create PASCAL VOC XML files containing information about each symbol on a plate. Then I want to train a neural network to detect symbols and their classes. I already know which symbols appear in each picture, but I don't know how to get the bounding box coordinates.
I tried to use OpenCV and binary segmentation, but lighting, shadows, size and noise vary too much between pictures.
I also tried to find a trained neural network that can detect symbols, or to train one myself, but failed.
So, how can I get bounding box for each symbol on a license plate?
There are multiple methods to do this.
Mainly, you will have to go over your image and classify each segment of the image.
In your case this should be easier, as the plate is already a well-defined area: you can probably move a window from left to right in strides.
Using an MNIST-trained classifier, you can classify the digit in each image part. If you get a result with a probability of, e.g., 90%, take the coordinates of that part of the image as your bounding box coordinates.
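A minimal sketch of that left-to-right scan, assuming a hypothetical `classify` callable that wraps your trained classifier (e.g. an MNIST-style digit model) and returns a 1-D array of class probabilities for one window. The full-height window, the window width and the 0.9 threshold are assumptions:

```python
import numpy as np
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int, int, float]  # (x_min, y_min, x_max, y_max, class_id, prob)


def sliding_window_boxes(plate: np.ndarray,
                         classify: Callable[[np.ndarray], np.ndarray],
                         win_w: int, stride: int,
                         min_prob: float = 0.9) -> List[Box]:
    """Slide a full-height window from left to right across a plate image and
    keep every window whose most likely class reaches min_prob."""
    height, width = plate.shape[:2]
    boxes: List[Box] = []
    for x in range(0, width - win_w + 1, stride):
        probs = classify(plate[:, x:x + win_w])  # 1-D array of class probabilities
        cls = int(np.argmax(probs))
        p = float(probs[cls])
        if p >= min_prob:
            boxes.append((x, 0, x + win_w, height, cls, p))
    return boxes
```

Because neighbouring windows overlap, one symbol usually produces several boxes; merging them (e.g. with non-maximum suppression) is the usual next step before writing them out.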
You can of course also reuse known architectures such as R-CNN or YOLO.
Here you can find a nice overview.
Good luck
I found another way to solve this problem.
I wrote a script that generates different images of number plates, plus an XML file for each image. I generated 10k images.
Then I augmented them so they look more like "real world" images. Now I have 14k images: 4k from the original set and 10k augmented.
I trained an ssd_mobilenet model.
Afterwards, I used auto-annotation to detect boxes on the real images.
Then I trained the model one more time, and that's it.
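Because such a script renders the plates itself, it knows where each symbol was drawn, so the boxes come for free. The XML half of the script might look like the sketch below; the rendering code is not shown, and the function name and example values are hypothetical. The element names follow the PASCAL VOC annotation layout (`annotation`, `size`, `object`, `bndbox`):

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (label, x_min, y_min, x_max, y_max) in pixel coordinates
Symbol = Tuple[str, int, int, int, int]


def voc_annotation(filename: str, width: int, height: int, symbols: List[Symbol]) -> str:
    """Build a minimal PASCAL VOC annotation with one <object> per plate symbol."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for label, x_min, y_min, x_max, y_max in symbols:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", x_min), ("ymin", y_min),
                           ("xmax", x_max), ("ymax", y_max)):
            ET.SubElement(box, tag).text = str(value)
    # encoding="unicode" returns a str, so Cyrillic labels are kept as-is.
    return ET.tostring(root, encoding="unicode")
```

Write the returned string to a `.xml` file next to the image with `open(path, "w", encoding="utf-8")` so labels such as "В" survive.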

Deep learning training with nonidentical images?

I am reconstructing some images using dual photography. Next, I want to train a network to reconstruct clean images by removing noise (a denoising autoencoder).
The inputs for training the network are the reconstructed images, whereas the outputs are the ground truth, i.e. computer-based standard test images. The input (e.g. Lena) is not an exact version of Lena: the image is shifted in position and has some artifacts.
If I use my reconstructed image as the training input and the Lena test image (the computer standard test image) as the training output, will it work?
I only want to know whether it would work if input and output are shifted relative to each other, or if some details are missing in one of them (due to cropping).
It depends on many factors like your images for training and the architecture of the network.
However, what you want is a network that learns the noise or low-level information, and for this purpose Generative Adversarial Networks (GANs) are very popular. You can read about them here. If you have tried your approach and the results are not satisfactory, then try GANs such as DCGAN (Deep Convolutional GAN).
Also, share your outcomes with the community if you would like.
Denoising Autoencoders! Love it!
There is no reason not to train your model with those images. The autoencoder, if well trained, will eventually learn the transformation, provided there is enough data.
However, if you have the 'positive' (clean) images, I strongly recommend creating your own noisy images from them and training in that controlled setting. This simplifies your problem and makes it easier to solve.
What is stopping you from doing just that?
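Generating such controlled training pairs from clean images can be as simple as the sketch below. The corruption used here (a small random wrap-around shift plus Gaussian noise) is only an assumption meant to mimic the shifts and artifacts described in the question; swap in whatever degradation your reconstruction actually produces:

```python
import numpy as np


def make_noisy_pair(clean: np.ndarray, rng: np.random.Generator,
                    noise_std: float = 0.1, max_shift: int = 2):
    """Create one (noisy_input, clean_target) pair from a clean image in [0, 1].

    The corruption (random shift + Gaussian noise) is a stand-in for the real
    reconstruction artifacts.
    """
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # Wrap-around shift of the rows and columns.
    shifted = np.roll(clean, shift=(int(dy), int(dx)), axis=(0, 1))
    noisy = shifted + rng.normal(0.0, noise_std, size=clean.shape)
    return np.clip(noisy, 0.0, 1.0), clean
```

Calling it repeatedly with a single `np.random.default_rng(seed)` gives a fresh corruption of the same clean image each time, so a handful of test images can yield many training pairs.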

How do different input image sizes/resolutions affect the output quality of semantic image segmentation networks?

While trying to perform image segmentation on images from one dataset (KITTI) with a deep learning network trained on another dataset (Cityscapes) I realized that there is a big difference in subjectively perceived quality of the output (and probably also when benchmarking the (m)IoU).
This raised the question of whether and how the size/resolution of an input image affects the output of a semantic segmentation network that has been trained on images of a different size or resolution.
I attached two images and their corresponding output images from this network: https://github.com/hellochick/PSPNet-tensorflow (using the provided weights).
The first image is from the Cityscapes dataset (test set), with a width and height of (2048, 1024). The network has been trained on the training and validation images from this dataset.
CityScapes original image
CityScapes output image
The second image is from the KITTI dataset with a width and height of (1242,375):
KITTI original image
KITTI output image
As one can see, the shapes in the first segmented image are clearly defined while in the second one a detailed separation of objects is not possible.
Neural networks in general are fairly robust to variations in scale, but they certainly aren't perfect. Although I don't have references at hand, a number of papers have shown that scale does indeed affect accuracy.
In fact, training your network on a dataset with images at varying scales is almost certainly going to improve it.
Also, many of today's image segmentation networks explicitly build constructs into the architecture to handle scale better; the pyramid pooling module of PSPNet, the network you are using, is one example.
Since you probably don't know exactly how these networks were trained, I would suggest resizing your images to approximately the shape the network was trained on. Resizing an image with standard image resize functions is a very common preprocessing step.
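Resizing the input before inference and mapping the prediction back to the original size can be sketched as below. In TensorFlow you would normally just call `tf.image.resize`; the numpy version only keeps the example self-contained, and the target size you pass in (e.g. something close to the Cityscapes frames the weights were trained on) is your choice:

```python
import numpy as np


def bilinear_resize(image: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resize an H x W or H x W x C image (pixel-centre alignment)."""
    in_h, in_w = image.shape[:2]
    # Map each output pixel centre back to a fractional input coordinate.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0).reshape(-1, 1)
    wx = (xs - x0).reshape(1, -1)
    if image.ndim == 3:  # add a channel axis so the weights broadcast over C
        wy, wx = wy[..., None], wx[..., None]
    img = image.astype(np.float64)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def nearest_resize_labels(labels: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize for an H x W predicted label map (no label mixing)."""
    in_h, in_w = labels.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return labels[rows[:, None], cols[None, :]]
```

Use bilinear interpolation for the input image, but nearest neighbour for the predicted label map: interpolating class ids would produce meaningless in-between labels.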
Since the images you are referencing are large, I'd also guess that whatever data input pipeline you're feeding them through is already resizing them on your behalf. Many neural networks of this type are trained on relatively small inputs (around 256x256, for example), and the input image is cropped and centered as necessary before training or prediction. Processing very large images is extremely compute-intensive and hasn't been found to improve accuracy much.

deep learning for shape localization and recognition

There is a set of images, each of which contains different shape entities, such as those shown in the following figure. I am trying to localize and recognize these different shapes, for instance by adding a bounding box around each shape and maybe even labeling it. What are the major research papers/deep learning models that have been able to solve this kind of problem?
Object detection papers such as R-CNN, Faster R-CNN, YOLO and SSD would help you solve this if you are set on using a deep learning approach.
It's easy to say this is a trivial problem that can be solved with tools in OpenCV and that deep learning is overkill, but I can see many reasons to use deep learning tools, and that does not answer your question.
Let's assume your shapes have different scales and rotations. Your main image shown above is very large for the training process, and it would need a lot of training samples to reach good accuracy on the test samples. In this case it is better to train a convolutional neural network on small images (e.g. 128x128) with only one shape per image, and then use the slide trick (a sliding window)!
This project will have three main steps:
Generate test and train samples; each image should contain only one shape
Train a classifier to recognize a single shape within each input image
Use the slide trick! Break your original image containing many shapes into overlapping blocks of size 128x128, and pass each block to the model trained in the second step.
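Step 1 can be automated with synthetic data. A minimal sketch, assuming two hypothetical classes (circle and square) drawn at random scale, position and rotation on a 128x128 canvas; the real shapes, line styles and noise from your images would still need to be added:

```python
import numpy as np

SHAPES = ("circle", "square")  # hypothetical class list; extend with your own shapes


def make_shape_sample(rng: np.random.Generator, size: int = 128):
    """Draw one random shape on a blank size x size image; return (image, class_id)."""
    class_id = int(rng.integers(len(SHAPES)))
    radius = rng.uniform(0.15, 0.35) * size
    # Keep the centre at least `radius` away from every border.
    cy, cx = rng.uniform(radius, size - radius, size=2)
    yy, xx = np.mgrid[0:size, 0:size].astype(np.float64)
    dy, dx = yy - cy, xx - cx
    if SHAPES[class_id] == "circle":
        inside = dx ** 2 + dy ** 2 <= radius ** 2
    else:
        # Rotate the pixel offsets by a random angle, then test against an
        # axis-aligned square whose corners lie on the circle of `radius`.
        theta = rng.uniform(0.0, np.pi / 2)
        u = np.cos(theta) * dx + np.sin(theta) * dy
        v = -np.sin(theta) * dx + np.cos(theta) * dy
        half = radius / np.sqrt(2.0)
        inside = (np.abs(u) <= half) & (np.abs(v) <= half)
    return inside.astype(np.float32), class_id
```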
In this way, at the end you will have a label for each shape from your trained model, and you will also have the location of each shape from the slide trick.
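Step 3 could look like the sketch below. `classify` is a hypothetical wrapper around the classifier from step 2 that returns class probabilities for one 128x128 block. The sketch also assumes the classifier was trained with an extra "no shape" class (`background_class`), because otherwise it is forced to assign a shape to empty blocks. The stride of 64 (50% overlap) and the 0.9 threshold are assumptions:

```python
import numpy as np
from typing import Callable, List, Optional, Tuple

Detection = Tuple[int, int, int, int, int, float]  # (x, y, width, height, class_id, prob)


def sliding_window_detect(image: np.ndarray,
                          classify: Callable[[np.ndarray], np.ndarray],
                          block: int = 128, stride: int = 64,
                          min_prob: float = 0.9,
                          background_class: Optional[int] = None) -> List[Detection]:
    """Classify overlapping block x block tiles and return the confident,
    non-background ones together with their positions."""
    height, width = image.shape[:2]
    detections: List[Detection] = []
    for y in range(0, height - block + 1, stride):
        for x in range(0, width - block + 1, stride):
            probs = classify(image[y:y + block, x:x + block])
            cls = int(np.argmax(probs))
            if cls == background_class:
                continue  # tile judged to contain no shape
            if probs[cls] >= min_prob:
                detections.append((x, y, block, block, cls, float(probs[cls])))
    return detections
```

Each shape is typically found by several overlapping blocks, so the resulting boxes should be merged (e.g. with non-maximum suppression) before reporting them.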
For the classifier, you can use exactly the CNN structure of TensorFlow's MNIST tutorial.
Here is a paper that applies exactly the same method to fingerprint images to extract local features:
A direct fingerprint minutiae extraction approach based on convolutional neural networks