Extract 2D image patches from a training image and mask for semantic segmentation - TensorFlow

I was wondering what the best approach to this problem would be. I have a 6000-by-6000 image with a 6000-by-6000 mask. I want to crop the image into several sub-images before training, and came across `extract_patches_2d` in scikit-learn. It looks like the right tool for the job, but I have one issue: if I run it on a single image, how can I be sure that it will use the same patch locations for the image mask as well?
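For what it's worth, the pattern I'm hoping will work is to pass the same `random_state` to both calls (the random patch locations depend only on the array shape and the seed), or to stack the mask onto the image as an extra channel and extract once. A rough sketch; the patch size and `max_patches` values are placeholders:

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

# image: (6000, 6000, 3) RGB array, mask: (6000, 6000) label array.
seed = 42  # identical seed + identical spatial shape => identical patch locations

img_patches = extract_patches_2d(image, (256, 256), max_patches=500, random_state=seed)
mask_patches = extract_patches_2d(mask, (256, 256), max_patches=500, random_state=seed)

# Alternative: extract once from a stacked array so alignment is guaranteed.
stacked = np.concatenate([image, mask[..., None]], axis=-1)
patches = extract_patches_2d(stacked, (256, 256), max_patches=500, random_state=seed)
img_patches, mask_patches = patches[..., :3], patches[..., 3]
```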

Related

Is there an unsupervised Tensorflow/Pytorch model that can detect objects that keep on changing?

I've been looking for a way to detect objects that keep changing through a series of images.
Since lighting conditions can vary a lot, a basic image-processing script won't do it.
I'm looking for a way to train a model on images taken at intervals, so that it later outputs a mask with the locations of the objects that keep changing.
I've searched for unsupervised deep learning models that do this or similar tasks, but I found nothing.
Is this a problem that can be solved with AI, or is there a better solution?
The following image clarifies the intended pipeline:

Deep learning training with nonidentical images?

I am reconstructing some images using dual photography. Next, I want to train a network to reconstruct clean images by removing noise (a denoising autoencoder).
The input for training the network is the reconstructed images, while the output is the ground truth, i.e. computer-standard test images. The input, e.g. Lena, is somehow not an exact version of Lena: the image is shifted in position and has some artifacts.
If I keep my reconstructed image as the input and the Lena test image (a computer-standard test image) as the training output, will it work?
I just want to know whether it would still work if the input/output are shifted relative to each other, or if some details are missing in one of them (due to cropping).
It depends on many factors, such as your training images and the architecture of the network.
However, what you want is a network that learns the noise, or low-level information, and for this purpose Generative Adversarial Networks (GANs) are very popular. Maybe after you have tried your approach, if the results are not satisfactory, try using GANs, such as DCGAN (Deep Convolutional GAN).
Also, share your outcomes with the community if you'd like.
Denoising Autoencoders! Love it!
There is no reason not to train your model on those images. If well trained, and given enough data, the autoencoder will eventually learn the transformation.
However, if you have the clean ('positive') images, I strongly recommend creating your own noisy images from them and training in that controlled setting. You will simplify your problem and it will be easier to solve.
What is stopping you from doing just that?
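A minimal sketch of building such controlled training pairs; the shift range and noise level are assumptions to tune to your setup:

```python
import numpy as np

def make_noisy_pair(clean, shift_max=4, noise_std=0.05, rng=None):
    """Build a (noisy, clean) training pair from a clean image.
    Mimics the mismatch described above: a small random translation
    plus additive Gaussian noise. Assumes float images in [0, 1]."""
    if rng is None:
        rng = np.random.default_rng()
    dy, dx = rng.integers(-shift_max, shift_max + 1, size=2)
    shifted = np.roll(clean, (dy, dx), axis=(0, 1))  # crude wrap-around shift
    noisy = shifted + rng.normal(0.0, noise_std, size=clean.shape)
    return np.clip(noisy, 0.0, 1.0), clean
```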

Different results in image classifier using JPG vs. BMP

I trained an image classifier with TensorFlow using a bunch of JPG images.
Let's say I have 3 classifiers, ClassifierA, ClassifierB, ClassifierC.
When testing the classifiers, 90% of my test images are classified with no issues at all, but in some cases I get misclassifications due to image quality.
For example, the image below is the same picture saved as BMP and as JPG; you'll see small differences due to the format quality.
- When I test the BMP version using tf.image.decode_bmp, I get a misclassification, say ClassifierA at 70%.
- When I test the JPG version using tf.image.decode_jpeg, I get the right one: ClassifierB at 90%.
- When I test the JPG version using tf.image.decode_jpeg with dct_method="INTEGER_ACCURATE", I get the right one with a much better score: ClassifierB at 99%.
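For reference, the three decode paths look roughly like this ("image.bmp" and "image.jpg" stand in for the same picture saved in both formats):

```python
import tensorflow as tf

bmp = tf.image.decode_bmp(tf.io.read_file("image.bmp"), channels=3)
jpg_fast = tf.image.decode_jpeg(tf.io.read_file("image.jpg"), channels=3)
jpg_accurate = tf.image.decode_jpeg(tf.io.read_file("image.jpg"), channels=3,
                                    dct_method="INTEGER_ACCURATE")
```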
What could be the issue here? Such difference between BMP and JPG, and how can I solve this if there's a solution?
Update 1: I retrained my classifier, applying different effects and randomly varying the quality at which I save the dataset images.
Now I get the right output, but the percentages still vary a lot, for example 44% with BMP and over 90% with JPG.
This is a fabulous question, and an even more fabulous observation. I'm going to use this in my own work in the future!
I expect you have just identified a rather fascinating issue with the dataset. It appears that your model is overfitting to features specific to JPG compression. The solution is to increase data augmentation; in particular, randomly convert your training samples between the various formats.
This issue also makes me think that sharpening and blurring operations would make good data augmentation features. It's common to alter the color, contrast, rotation, scale, orientation, and translation of an image to augment the training dataset, but I don't commonly see blur and sharpness used. I suspect these two augmentation techniques would go a long way toward resolving your issue by themselves.
In case the OP (or others reading this) are not terribly familiar with what "data augmentation" is, I'll define it. It is common to warp your training images in various ways to generate endlessly unique images from your (otherwise finite) dataset. For example, randomly flipping the image left/right is quite simple, common, and effectively doubles your dataset. Changing contrast and brightness settings further alter your images. Adding these and other data augmentation transformations to your pipeline creates a much richer dataset and trains a network that is more robust to these common variations in images.
It's important that the data augmentation techniques you use produce realistic variations. For example, rotating an image is quite a realistic augmentation technique. If your training image is a cat standing horizontally, it's realistically possible that a future sample might be a cat at a 25-degree angle.
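As a sketch of the format-randomization idea in TensorFlow (the quality range and jitter deltas are values you would tune, not recommendations):

```python
import tensorflow as tf

def augment(image):
    """Randomly re-encode as JPEG at varying quality so the model cannot
    rely on compression artifacts, plus standard photometric jitter.
    Assumes a float32 image in [0, 1] with 3 channels."""
    image = tf.image.random_jpeg_quality(image, 30, 100)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return tf.clip_by_value(image, 0.0, 1.0)

# Typical use inside an input pipeline:
# dataset = dataset.map(lambda x, y: (augment(x), y))
```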

(TensorFlow) What is the correct NN for this task?

I'm about to start developing a neural net with TensorFlow, but before I get in too deep, I was hoping to get some feedback on exactly what type of neural net I will need for this (if a net is the right way to go about this at all).
I need the NN to take in an image and output another image. This will be used for path mapping on a robot I'm working on. The input image will be a disparity map, and the output will be a "driveable map" (an image that shows what in the scene can be driven on and what can't).
I have built a dataset using Unity 3D. Here is an example pair from the set: a disparity map and the corresponding driveable map.
As you can probably see, white represents the area where my robot can drive and black is where it can't. I will need the NN to take a disparity map, and give me back a "driveable map". Can this be done?
Thanks!
Sorry, I'm not an expert, but since there hasn't been a response on this: if you are still looking, the vocabulary I would use to describe this type of problem is disparity maps and segmentation. Your best bet may be a specific type of segmentation network: the U-Net.
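A minimal U-Net-style sketch in Keras, just to show the shape of the idea; the depth, filter counts, and input size are placeholders to tune:

```python
import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet(input_shape=(256, 256, 1)):
    """Disparity map in, per-pixel driveable probability out."""
    inp = tf.keras.Input(shape=input_shape)
    # Encoder: downsample while widening features.
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck.
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(p2)
    # Decoder with skip connections (the defining U-Net feature).
    u2 = layers.UpSampling2D()(b)
    c3 = layers.Conv2D(32, 3, padding="same", activation="relu")(
        layers.Concatenate()([u2, c2]))
    u1 = layers.UpSampling2D()(c3)
    c4 = layers.Conv2D(16, 3, padding="same", activation="relu")(
        layers.Concatenate()([u1, c1]))
    out = layers.Conv2D(1, 1, activation="sigmoid")(c4)  # white = driveable
    return tf.keras.Model(inp, out)

model = tiny_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```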

How can I detect and localize objects using TensorFlow and a convolutional neural network?

My problem statement is as follows:
" Object Detection and Localization using Tensorflow and convolutional neural network "
What I did:
I completed cat detection in images using the tflearn library. I successfully trained a model on 25,000 images of cats, and it works fine with good accuracy.
What I want to do:
If my image contains two or more objects, for example a cat and a dog together, the result should be 'cat and dog', and in addition I have to find the exact locations of the two objects in the image (bounding boxes).
I came across several high-level libraries like Darknet and SSD, but was not able to grasp the concepts behind them.
Please guide me on an approach to solving this problem.
Note: I am using supervised learning techniques.
You have several ways to go about it.
The most straightforward way is to get some suggested bounding boxes using a bounding-box proposal algorithm such as selective search, and to run the classification net you already trained on each of the suggestions. This is the approach taken by R-CNN.
For more advanced algorithms based on this approach, I suggest you read about Fast R-CNN and Faster R-CNN.
Look at Object detection with R-CNN? for some basic explanation.
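A rough sketch of that R-CNN-style pipeline, using OpenCV's selective search for the proposals; `classify_crop` stands in for the classifier you already trained and is a hypothetical placeholder:

```python
import cv2  # requires the opencv-contrib-python package for ximgproc

# 1) Propose candidate regions with selective search.
image = cv2.imread("scene.jpg")
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()
rects = ss.process()  # array of (x, y, w, h) proposals

# 2) Run your existing classifier on each proposal.
detections = []
for (x, y, w, h) in rects[:200]:           # limit to the top proposals
    crop = image[y:y + h, x:x + w]
    label, score = classify_crop(crop)     # hypothetical: your trained cat/dog net
    if score > 0.9:
        detections.append((label, (x, y, w, h), score))
# In practice you would also apply non-maximum suppression to merge
# overlapping boxes for the same object.
```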
Darknet and SSD are based on a different approach; if you want to understand them, you can read about them at
http://www.cs.unc.edu/~wliu/papers/ssd.pdf
https://pjreddie.com/media/files/papers/yolo.pdf
Image localization is a complex problem with many different implementations achieving the same result with different efficiency.
There are two main types of implementation:
- localizing objects with regression
- single-shot detectors
Read this https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html to get a better idea.
Cheers
I have done a similar project (detection + localization) on Indian currencies using PyTorch and ResNet34. Below is the link to my Kaggle notebook; I hope you find it helpful. I manually collected images from the internet, drew bounding boxes around the objects, and saved the annotation files (Pascal VOC format) using the "LabelImg" annotation tool.
https://www.kaggle.com/shweta2407/objectdetection-on-custom-dataset-resnet34