What to expect from deep learning object detection on black and white pictures? - tensorflow

With TensorFlow, I want to train an object detection model on my own images, starting from the ssd_inception_v2_coco model. The problem is that all my pictures are black and white. What performance can I expect? Should I try to colorize my B&W pictures first? Or, conversely, should I retrain the base network on "uncolorized" (grayscale) images? Are there general guidelines for handling B&W images in deep learning object detection?

I wouldn't go through the trouble of colorizing if you are planning on using a pretrained model. I would expect that explicitly colorizing your images as a pre-processing step would help very little (if at all) since in theory the features that a colorizing network learns can also be learned by the detection network.
If you are planning on fine-tuning a detection network that was pretrained on an RGB dataset, make sure you either (i) replace the first convolution in the network with a convolutional layer that expects a single-channel input, or (ii) pad your image with two all-zero channels.
You may get slightly worse detection performance simply because a grayscale image carries one channel instead of three, so the color cues that an RGB-trained network can rely on are missing.
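The two options above can be sketched with NumPy. This is a minimal illustration, not part of any TensorFlow API: the function names are hypothetical, the "replicate" mode (copying the gray channel three times) is a common variant that the answer does not mention, and collapsing a pretrained kernel by summing over its input channels is one assumed way of building the single-channel layer in option (i).

```python
import numpy as np

def gray_to_three_channels(gray, mode="replicate"):
    """Turn an (H, W) grayscale image into an (H, W, 3) array."""
    gray = np.asarray(gray)
    if gray.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    if mode == "replicate":
        # Assumed variant: copy the single channel into all three channels.
        return np.repeat(gray[:, :, np.newaxis], 3, axis=2)
    if mode == "zero_pad":
        # Option (ii): data in the first channel, zeros in the other two.
        out = np.zeros(gray.shape + (3,), dtype=gray.dtype)
        out[:, :, 0] = gray
        return out
    raise ValueError(f"unknown mode: {mode}")

def collapse_first_conv_kernel(kernel_rgb):
    """Option (i), assumed recipe: sum a (kh, kw, 3, out) kernel over its
    input-channel axis to get a (kh, kw, 1, out) kernel."""
    return np.asarray(kernel_rgb).sum(axis=2, keepdims=True)
```

With the summed kernel, a single-channel convolution gives the same response as the original RGB kernel applied to a replicated grayscale image, so the pretrained first-layer filters are reused rather than discarded.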


Image Detector with tensorflow

I want to build a simple image detector for custom binary shapes in images.
I could train and use one of the models from the object detection model zoo, such as ssd_inception_v2, but that would be extremely inefficient because those models are hundreds of megabytes in size.
I can't imagine shipping that in my simple app. Can anybody suggest how to solve this?
I have already built excellent small classifiers for my images, but I can't build a small, efficient detector (one that gives the shapes' positions as detection boxes).
I think what you need is transfer learning. I would take one of the lightweight models, such as MobileNetV2, and retrain it on my dataset. It should be pretty quick. If you want to decrease the model size even further, feel free to take only the first few layers of the CNN and retrain those. That is a bit more work, since you need to rewrite the part of the network you want to use and load it with the pretrained weights.
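As a rough illustration of why MobileNet-style models are small, the sketch below compares the weight count of a standard convolution with that of a depthwise separable convolution, the building block those models rely on. The function names and the 256-channel example layer are assumptions chosen for illustration, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 256 input and 256 output channels.
standard = standard_conv_params(3, 256, 256)         # 589824
separable = depthwise_separable_params(3, 256, 256)  # 67840
print(standard, separable, round(standard / separable, 1))
```

For this layer the separable version needs roughly 8.7 times fewer weights, which is the kind of saving that makes such backbones practical inside a small app.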

Deep learning training with nonidentical images?

I am reconstructing some images using dual photography. Next, I want to train a network to reconstruct clear images by removing noise (a denoising autoencoder).
The input for training the network is the reconstructed images, whereas the target output is the ground truth, i.e. standard computer test images. The input, e.g. Lena, is not an exact version of Lena: the image is shifted in position and has some artifacts.
If I keep my reconstructed image as the input and the Lena test image as the training output, will it work?
I only want to know whether training would work if the input and output are shifted relative to each other, or if some details are missing in one of them (due to cropping).
It depends on many factors like your images for training and the architecture of the network.
However, what you want is a network that learns the noise or low-level information, and for this purpose Generative Adversarial Networks (GANs) are very popular. You can read about them here. If your own approach does not give satisfactory results, try a GAN such as DCGAN (Deep Convolutional GAN).
Also, share your outcomes with the community if you would like.
Denoising Autoencoders! Love it!
There is no reason not to train your model with those images. The autoencoder, if well trained, will eventually learn the transformation if there is enough data.
However, if you have the 'positive' (clean) images, I strongly recommend that you create your own noisy images from them and train in that controlled setting. You will simplify your problem and it will be easier to solve.
What is stopping you from doing just that?
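Creating your own noisy images from the clean ones could look like the sketch below. It assumes additive Gaussian noise and images stored as floats in [0, 1]; the function name, `sigma`, and `seed` are illustrative choices, and the artifacts of a real dual-photography reconstruction (shifts, cropping) are not modeled here.

```python
import numpy as np

def make_noisy_pairs(clean_images, sigma=0.1, seed=0):
    """Return (noisy, clean) training pairs.

    clean_images: float array with values in [0, 1], e.g. shape (N, H, W).
    sigma: standard deviation of the assumed Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    clean = np.asarray(clean_images, dtype=np.float32)
    noise = rng.normal(0.0, sigma, size=clean.shape).astype(np.float32)
    # Clip so the noisy images stay in the same value range as the clean ones.
    noisy = np.clip(clean + noise, 0.0, 1.0)
    return noisy, clean
```

The autoencoder is then trained with `noisy` as input and `clean` as target, so every input is pixel-aligned with its ground truth.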

Tensorflow object detection: why is the location in image affecting detection accuracy when using ssd mobilenet v1?

I'm training a model to detect meteors within a picture of the night sky. I have a fairly small dataset of about 85 images, each annotated with a bounding box. I'm using transfer learning, starting from the ssd_mobilenet_v1_coco_11_06_2017 checkpoint with TensorFlow 1.4. I'm resizing images to 600x600 pixels during training, and I'm using data augmentation in the pipeline configuration to randomly flip the images horizontally and vertically and to rotate them by 90 degrees. After 5000 steps, the model converges to a loss of about 0.3 and will detect meteors, but it seems to matter where in the image the meteor is located. Do I have to train the model by giving examples of every possible location? I've attached a sample of a detection run where I tiled a meteor over the entire image and received various levels of detection (filtered to 50%). How can I improve this?
detected meteors in image example
It could very well be your data and I think you are making a prudent move by improving the heterogeneity of your dataset, BUT it could also be your choice of model.
It is worth noting that ssd_mobilenet_v1_coco has the lowest COCO mAP relative to the other models in the TensorFlow Object Detection API model zoo. You aren't trying to detect a COCO object, but the mAP numbers are a reasonable approximation of generic model accuracy.
At the highest possible level, the choice of model is largely a tradeoff between speed and accuracy. The model you chose, ssd_mobilenet_v1_coco, favors speed over accuracy. Consequently, I would recommend that you try one of the Faster RCNN models (e.g., faster_rcnn_inception_v2_coco) before you spend a significant amount of time preprocessing images.
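The tiling experiment described in the question is a handy way to compare models for location sensitivity. A minimal sketch of the tiling step, assuming NumPy arrays for the background and the meteor crop, is given here; the function name and box format are hypothetical, and running the detector on the result is left out.

```python
import numpy as np

def tile_crop(background, crop, stride):
    """Paste `crop` onto a copy of `background` on a regular grid.

    Returns the tiled image and the pasted boxes as
    (ymin, xmin, ymax, xmax) in pixels. If stride is smaller than the
    crop size, later pastes overwrite parts of earlier ones.
    """
    canvas = background.copy()
    crop_h, crop_w = crop.shape[:2]
    img_h, img_w = background.shape[:2]
    boxes = []
    for y in range(0, img_h - crop_h + 1, stride):
        for x in range(0, img_w - crop_w + 1, stride):
            canvas[y:y + crop_h, x:x + crop_w] = crop
            boxes.append((y, x, y + crop_h, x + crop_w))
    return canvas, boxes
```

Matching each detection score against the known boxes shows which regions of the frame are weak, and the same tiled image can be fed to each candidate model for a side-by-side comparison.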

How do different input image sizes/resolutions affect the output quality of semantic image segmentation networks?

While trying to perform image segmentation on images from one dataset (KITTI) with a deep learning network trained on another dataset (Cityscapes) I realized that there is a big difference in subjectively perceived quality of the output (and probably also when benchmarking the (m)IoU).
This raised the question of whether, and how, the size/resolution of an input image affects the output of a semantic segmentation network that was trained on images of a different size or resolution.
I attached two images and their corresponding output images from this network: https://github.com/hellochick/PSPNet-tensorflow (using provided weights).
The first image is from the CityScapes dataset (test set) with a width and height of (2048,1024). The network has been trained with training and validation images from this dataset.
CityScapes original image
CityScapes output image
The second image is from the KITTI dataset with a width and height of (1242,375):
KITTI original image
KITTI output image
As one can see, the shapes in the first segmented image are clearly defined while in the second one a detailed separation of objects is not possible.
Neural networks in general are fairly robust to variations in scale, but they certainly aren't perfect. Although I don't have references available off the top of my head, there have been a number of papers showing that scale does indeed affect accuracy.
In fact, training your network on a dataset with images at varying scales is almost certainly going to improve it.
Also, many of the image segmentation networks used today explicitly build constructs into the network to improve this at the level of the network architecture.
Since you probably don't know exactly how these networks were trained I would suggest that you resize your images to match the approximate shape that the network you are using was trained on. Resizing an image using normal image resize functions is quite a normal preprocessing step.
Since the images you are referencing are large, I would also guess that whatever input pipeline you are feeding them through is already resizing or cropping them on your behalf. Many neural networks are trained on fixed-size inputs far smaller than these images (a few hundred pixels per side), and the input image is cropped and centered as necessary before training or prediction. Processing very large images at full resolution is extremely compute-intensive and hasn't been found to improve the accuracy much.
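As a sketch of the resizing step, the snippet below scales an image so that its height matches the height the network was trained on while keeping the aspect ratio. Matching on height is an assumption (matching the apparent object size may matter more), the function names are hypothetical, and nearest-neighbour sampling is used only to keep the example dependency-free; a real pipeline would normally use its image library's resize function.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    in_h, in_w = image.shape[:2]
    # For every output row/column, pick the source row/column it maps to.
    rows = (np.arange(out_h) * in_h / out_h).astype(int)
    cols = (np.arange(out_w) * in_w / out_w).astype(int)
    return image[rows][:, cols]

def scale_to_height(image, target_h):
    """Resize so the height equals target_h, preserving the aspect ratio."""
    in_h, in_w = image.shape[:2]
    out_w = int(round(in_w * target_h / in_h))
    return resize_nearest(image, target_h, out_w)

# A KITTI-sized frame (375 rows, 1242 columns) scaled to the Cityscapes height.
kitti_like = np.zeros((375, 1242, 3), dtype=np.uint8)
resized = scale_to_height(kitti_like, 1024)
```

Because the two datasets have different aspect ratios, the resized frame ends up wider than a Cityscapes image, so it would still need cropping or tiling before it matches the training shape exactly.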

deep learning for shape localization and recognition

There is a set of images, each of which contains different shape entities, such as shown in the following figure. I am trying to localize and recognize these different shapes. For instance, adding a bounding box for each different shape and maybe even label it. What are the major research papers/deep learning models that have been able to solve this kind of problem?
Object detection papers such as R-CNN, Faster R-CNN, YOLO, and SSD would help you solve this if you were bent on using a deep learning approach.
It’s easy to say this is a trivial problem that can be solved with tools in OpenCV and deep learning is overkill, but I can see many reasons to use deep learning tools and that does not answer your question.
We assume that your shapes have different scales and rotations. Your main image shown above is very large for training, and a network trained on images of that size would need a lot of training samples to reach good accuracy on test samples. In this case it is better to train a convolutional neural network on small images (like 128x128) with only one shape per image, and then use the sliding-window trick.
This project will have three main steps:
1. Generate test and train samples; each image should contain only one shape.
2. Train a classifier to recognize the single shape within each input image.
3. Use the sliding-window trick: break your original image, which contains many shapes, into overlapping blocks of size 128x128, and pass each block to the model trained in the second step.
In this way you end up with a label for each shape from your trained model, and the position of the block that produced it gives you the location of each shape.
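The third step can be sketched as follows. This is a minimal illustration, assuming the image is a NumPy array; `classify` is a hypothetical stand-in for the classifier trained in the second step, and the stride and threshold values are arbitrary.

```python
import numpy as np

def sliding_windows(image, size=128, stride=64):
    """Yield (y, x, block) for overlapping size x size blocks."""
    img_h, img_w = image.shape[:2]
    for y in range(0, img_h - size + 1, stride):
        for x in range(0, img_w - size + 1, stride):
            yield y, x, image[y:y + size, x:x + size]

def detect(image, classify, size=128, stride=64, threshold=0.5):
    """Run classify(block) -> (label, score) on every block.

    Returns (label, score, (ymin, xmin, ymax, xmax)) for each block whose
    score reaches the threshold.
    """
    hits = []
    for y, x, block in sliding_windows(image, size, stride):
        label, score = classify(block)
        if score >= threshold:
            hits.append((label, score, (y, x, y + size, x + size)))
    return hits
```

Because the blocks overlap, one shape can trigger several neighbouring blocks, so the hits usually need to be merged afterwards (for example with non-maximum suppression). Blocks are only taken where they fit completely, so a strip along the right and bottom edges may go unscanned unless the image is padded first.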
For the classifier you can use exactly the CNN structure from TensorFlow's MNIST tutorial.
Here is a paper that applies exactly this method to fingerprint images to extract local features:
A direct fingerprint minutiae extraction approach based on convolutional neural networks