Image Selection for Training Visual Recognition - tensorflow

I am training a classifier for recognizing certain objects in an image. I am using the Watson Visual Recognition API but I would assume that the same question applies to other recognition APIs as well.
I've collected 400 pictures of something - e.g. dogs.
Before I train Watson, I can delete pictures that may throw things off. Should I delete pictures of:
Multiple dogs
A dog with another animal
A dog with a person
A partially obscured dog
A dog wearing glasses
Also, would dogs on a white background make for better training samples?
Watson also takes negative examples. Would cats and other small animals be good negative examples? What else?

You are right that this is a general issue for all kinds of custom classifiers and recognizers - be it vize.it, Clarifai, IBM Watson, or training a neural network on your own, say in Caffe. (Sorted by the number of example images you need to use.)
The important thing you need to ask is how are you going to use the classifier? What are the real images you will feed the machine to predict the objects shown? As a general rule, your training images should be as similar to predict-time images as possible - both in what they depict (kinds and variety of objects) and how they depict it (e.g. backgrounds). Neural networks are super-powerful and if you feed them enough images, they will learn even the hard cases.
Maybe you want to find dog images in user's folders - which will include family photos, screenshots and document scans. Reflect that variety in the training set. Ask the user if a dog with another animal should be tagged as a dog photo.
Maybe you want to find dog images on a wilderness photo trap. Just use various images taken by that photo trap (or several photo traps, if it's a whole network).
In short — tailor your sample images to the task at hand!

Related

How to build an image generation model for interior room design?

I want to build an image generator model for interior room design.
This model should be able to generate an interior image of living room, bedroom, hall, kitchen, bathroom etc.
I have searched about it and found the following websites:
https://interiorai.com/
https://image.computer/
And I made this picture when visiting https://image.computer.
[Image: Contemporary living room with a large TV and 3 windows]
The above result was exactly what I want, but the free account is restricted to 10 image credits.
Also, the input doesn't have to be a sentence; options are enough for me (e.g. style: modern, type: living room, equipment: [TV: wide, curtain: white, windows: 3]).
So I decided to google for a pretrained interior-design generator model, and finally gave up.
I would like to build a TensorFlow (or Keras) model that acts just like image.computer. Please help me find a model or build one.
Thanks

Reverse Image search (for image duplicates) on local computer

I have a bunch of poor quality photos that I extracted from a PDF. Somebody I know has the good quality photos somewhere on her computer (Mac), but it's my understanding that it will be difficult to find them.
I would like to
loop through each poor quality photo
perform a reverse image search using each poor quality photo as the query image and using this person's computer as the database to search for the higher quality images
and create a copy of each high quality image in one destination folder.
Example pseudocode
for each image in poorQualityImages:
    search ./macComputer for a higherQualityImage of image
    copy higherQualityImage to ./higherQualityImages
I need to perform this action once.
I am looking for a tool, github repo or library which can perform this functionality more so than a deep understanding of content based image retrieval.
There's a post on reddit where someone was trying to do something similar
imgdupes is a program which seems like it almost achieves this, but I do not want to delete the duplicates; I want to copy the highest quality duplicate to a destination folder.
Update
Emailed my previous image processing prof and he sent me this
Off the top of my head, nothing out of the box.
No guaranteed solution here, but you can narrow the search space. You'd need a little program that outputs the MSE or SSIM similarity index between two images, and then write another program or shell script that scans the hard drive and computes the MSE between each image on the hard drive and each query image, then check the images with the top X percent similarity score.
Something like that. Still not maybe guaranteed to find everything you want. And if the low quality images are of different pixel dimensions than the high quality images, you'd have to do some image scaling to get the similarity index. If the poor quality images have different aspect ratios, that's even worse.
So I think it's not hard but not trivial either. The degree of difficulty is partly dependent on the nature of the corruption in the low quality images.
UPDATE
Github project I wrote which achieves what I want
What you are looking for is called image hashing. In this answer you will find a basic explanation of the concept, as well as a go-to github repo for plug-and-play application.
Basic concept of Hashing
From the repo page: "We have developed a new image hash based on the Marr wavelet that computes a perceptual hash based on edge information with particular emphasis on corners. It has been shown that the human visual system makes special use of certain retinal cells to distinguish corner-like stimuli. It is the belief that this corner information can be used to distinguish digital images that motivates this approach. Basically, the edge information attained from the wavelet is compressed into a fixed length hash of 72 bytes. Binary quantization allows for relatively fast hamming distance computation between hashes. The following scatter plot shows the results on our standard corpus of images. The first plot shows the distances between each image and its attacked counterpart (e.g. the intra distances). The second plot shows the inter distances between altogether different images. While the hash is not designed to handle rotated images, notice how slight rotations still generally fall within a threshold range and thus can usually be matched as identical. However, the real advantage of this hash is for use with our mvp tree indexing structure. Since it is more descriptive than the dct hash (being 72 bytes in length vs. 8 bytes for the dct hash), there are much fewer false matches retrieved for image queries."
Another blogpost for an in-depth read, with an application example.
Available Code and Usage
A github repo can be found here. There are obviously more to be found.
After importing the package you can use it to generate and compare hashes:
>>> from PIL import Image
>>> import imagehash
>>> hash = imagehash.average_hash(Image.open('test.png'))
>>> print(hash)
d879f8f89b1bbf
>>> otherhash = imagehash.average_hash(Image.open('other.bmp'))
>>> print(otherhash)
ffff3720200ffff
>>> print(hash == otherhash)
False
>>> print(hash - otherhash)
36
The demo script find_similar_images, also on the mentioned github, illustrates how to find similar images in a directory.
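If this hash-based route fits your case, a rough end-to-end sketch of the "scan the Mac and copy the best match" step could look like the following. The directory names, the Hamming-distance threshold of 8, and the file-extension filter are all assumptions you would adapt:

import os, shutil
from PIL import Image
import imagehash

query_dir = "poorQualityImages"          # folder with the PDF extractions
search_root = "/Volumes/macComputer"     # placeholder: the drive to scan
out_dir = "higherQualityImages"
os.makedirs(out_dir, exist_ok=True)

# Hash every query image once up front
query_hashes = {name: imagehash.average_hash(Image.open(os.path.join(query_dir, name)))
                for name in os.listdir(query_dir)}

for root, _, files in os.walk(search_root):
    for name in files:
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        path = os.path.join(root, name)
        try:
            h = imagehash.average_hash(Image.open(path))
        except OSError:
            continue                      # skip unreadable or non-image files
        for qname, qhash in query_hashes.items():
            if h - qhash <= 8:            # small Hamming distance = probably the same picture
                shutil.copy(path, os.path.join(out_dir, name))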
Premise
I'll focus my answer on the image processing part, as I believe implementation details, e.g. traversing a file system, are not the core of your problem. Also, all that follows is just my humble opinion; I am sure that there are better ways to retrieve your image of which I am not aware. Anyway, I agree with what your prof said and I'll follow the same line of thought, so I'll share some ideas on possible similarity indexes you might use.
Answer
MSE and SSIM - This is a possible solution, as suggested by your prof. As I assume the low quality images also have a different resolution than the good ones, remember to downsample the good ones (and not upsample the bad ones).
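For instance, a minimal version of that comparison might look like the sketch below, using scikit-image; the file names are placeholders and SSIM is computed on grayscale images for simplicity:

from skimage.io import imread
from skimage.transform import resize
from skimage.metrics import mean_squared_error, structural_similarity

query = imread("poor_quality.jpg", as_gray=True)       # low quality query image
candidate = imread("candidate.jpg", as_gray=True)      # image found on the Mac

# Downsample the good candidate to the query's size (not the other way around)
candidate = resize(candidate, query.shape, anti_aliasing=True)

mse = mean_squared_error(query, candidate)             # lower = more similar
ssim = structural_similarity(query, candidate, data_range=1.0)  # higher = more similar
print(mse, ssim)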
Image subtraction (1-norm distance) - Subtract two images -> if they are equal you'll get a black image. If they are slightly different, the non-black pixels (or the sum of the pixel intensity) can be used as a similarity index. This is actually the 1-norm distance.
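A hedged sketch of this idea (again with placeholder file names), using the mean absolute difference so the value does not depend on image size:

import numpy as np
from skimage.io import imread
from skimage.transform import resize

a = imread("poor_quality.jpg", as_gray=True)
b = resize(imread("candidate.jpg", as_gray=True), a.shape, anti_aliasing=True)

# 0 for identical images, larger values for more dissimilar ones
l1_distance = np.mean(np.abs(a - b))
print(l1_distance)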
Histogram distance - You can refer to this paper: https://www.cse.huji.ac.il/~werman/Papers/ECCV2010.pdf. Comparing two images' histograms might be potentially robust for your task. Check out this question too: Comparing two histograms
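The paper uses a more sophisticated cross-bin distance, but as a rough illustration a simple histogram intersection on grayscale values could look like this (bin count and file names are arbitrary):

import numpy as np
from skimage.io import imread

a = imread("poor_quality.jpg", as_gray=True)
b = imread("candidate.jpg", as_gray=True)

hist_a, _ = np.histogram(a, bins=64, range=(0, 1), density=True)
hist_b, _ = np.histogram(b, bins=64, range=(0, 1), density=True)

# Histogram intersection: ~1.0 for near-identical distributions, ~0.0 for disjoint ones
intersection = np.sum(np.minimum(hist_a, hist_b)) * (1.0 / 64)
print(intersection)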
Embedding learning - As I see you included tensorflow, keras or pytorch as tags, let's consider deep learning. This paper came to my mind: https://arxiv.org/pdf/1503.03832.pdf The idea is to learn a mapping from the image space to a Euclidean space - i.e. compute an embedding of the image. In the embedding hyperspace, images are points. This paper learns an embedding function by minimizing the triplet loss. The triplet loss is meant to maximize the distance between images of different classes and minimize the distance between images of the same class. You could train the same model on a dataset like ImageNet. You could augment the dataset by lowering the quality of the images, in order to make the model "invariant" to differences in image quality (e.g. down-sampling followed by up-sampling, image compression, adding noise, etc.). Once you can compute embeddings, you could compute the Euclidean distance (as a substitute of the MSE). This might work better than using MSE/SSIM as similarity indexes. Repo of FaceNet: https://github.com/timesler/facenet-pytorch. Another general purpose approach (not related to faces) which might help you: https://github.com/zegami/image-similarity-clustering.
Siamese networks for predicting a similarity score - I am referring to this paper on face verification: http://bmvc2018.org/contents/papers/0410.pdf. The siamese network takes two images as input and outputs a value in [0, 1]. We can interpret the output as the probability that the two images belong to the same class. You can train a model of this kind to predict 1 for image pairs of the following kind: (good quality image, artificially degraded image). To degrade the image, again, you can combine e.g. down-sampling followed by up-sampling, image compression, adding noise, etc. Let the model predict 0 for image pairs of different classes (e.g. different images). The output of the network can be used as a similarity index.
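A minimal Keras sketch of such a siamese model is shown below; the layer sizes, input resolution and the absolute-difference merge are arbitrary choices, not taken from the paper:

import tensorflow as tf
from tensorflow.keras import layers

# Shared encoder used by both branches of the siamese network
encoder = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
])

img_a = layers.Input(shape=(128, 128, 1))
img_b = layers.Input(shape=(128, 128, 1))
feat_a, feat_b = encoder(img_a), encoder(img_b)

# Absolute feature difference -> probability that the two inputs show the same picture
diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([feat_a, feat_b])
score = layers.Dense(1, activation="sigmoid")(diff)

siamese = tf.keras.Model([img_a, img_b], score)
siamese.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# Train with pairs labelled 1 for (good image, degraded copy) and 0 for unrelated images.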
Remark 1
These different approaches can also be combined. They all provide you with similarity indexes, so you can very easily average the outcomes.
Remark 2
If you only need to do it once, the effort you need to put into implementing and training deep models might not be justified. I would not suggest it. Still, you can consider it if you can't find any other solution and that Mac is REALLY FULL of images and a manual search is not possible.
If you look at the documentation of imgdupes you will see there is the following option:
--dry-run
dry run (do not delete any files)
So if you run imgdupes with --dry-run you will get a listing of all the duplicate images but it will not actually delete anything. You should be able to process that output to move the images around as you need.
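As a hedged sketch of that post-processing step, assuming you save the dry-run listing to a file and that it groups duplicate paths with blank lines between groups (check your imgdupes version's actual output format), you could keep the largest file of each group as a proxy for quality:

import os, shutil

dest = "higherQualityImages"
os.makedirs(dest, exist_ok=True)

# duplicates.txt: saved imgdupes --dry-run output (assumed blank-line-separated groups)
with open("duplicates.txt") as f:
    groups = [g.splitlines() for g in f.read().split("\n\n") if g.strip()]

for group in groups:
    paths = [p.strip() for p in group if p.strip() and os.path.isfile(p.strip())]
    if not paths:
        continue
    best = max(paths, key=os.path.getsize)   # biggest file as a proxy for "highest quality"
    shutil.copy(best, dest)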
Try the similar image finder I have developed to address this problem.
There is an explanation and the algorithm there, so you can implement your own version if needed.

Using AI to detect damaged parts

I need to use computer vision to detect damaged parts of cars. I have images of a car before and after damage; how do I use computer vision/AI to detect that, in this case, the left headlight and bumper are damaged? I have a dataset of 70 similar image pairs.
I tried image processing, by overlaying the images on top of each other to detect damage. But not all images in the dataset fit when overlaid.
I can use Mask R-CNN to detect the damaged region, but how do I reduce it to the parts being damaged?
Before Damage
After Damage
Check out Mask R-CNN. You can train a model with multiple images of damage on cars. Just annotate your data, then train it. Once you have trained, you can then use the splash feature to only highlight areas that you want, i.e. damage. It's fairly easy to set up, and it seems perfect in your case.
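For reference, running a pretrained Mask R-CNN for inference only takes a few lines; the sketch below uses torchvision's COCO-pretrained model rather than the Matterport repo the splash feature comes from, and you would still need to fine-tune it on your own annotated "damage" class before the detections mean anything for this task:

import torch, torchvision
from PIL import Image
from torchvision.transforms.functional import to_tensor

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

img = to_tensor(Image.open("after_damage.jpg").convert("RGB"))   # placeholder file name
with torch.no_grad():
    out = model([img])[0]        # dict with 'boxes', 'labels', 'scores', 'masks'

keep = out["scores"] > 0.7       # confidence threshold is an arbitrary choice
print(out["boxes"][keep], out["labels"][keep])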

How object detection training works while backpropagating?

I'm using tensorflow to train a Faster R-CNN Inception v2 model.
Let's say I have 6000 images:
from img 1 to 3000: each image has both a dog and a cat, but I only labeled the dog.
from img 3001 to 6000: each image has both a dog and a cat, but I only labeled the cat.
So each image has a dog and a cat in it, but I only labeled the dog in half of them, and labeled the cat in the other half.
When creating the dataset I don't shuffle the images, so I'll have the first 3000 imgs labeled with dogs, then the other 3000 imgs labeled with cats.
My questions are:
Does the order of the images affect the result? Would it change if I create a dataset with the dogs first then the cats? Would that be different if I shuffle the data so I mix cats and dogs?
When backpropagating, does the fact that I didn't label the cats while I labeled the dogs (and vice versa) affect the result? Does the model unlearn because I have unlabeled dogs and cats? Do I get the same result as having 3000 images with both dog and cat labeled in each image?
The reason I don't label both dogs and cats in each image is because I have images from a fixed camera where sometimes you see different dogs or the same dog moving around, while a cat is sleeping. So labeling the sleeping cat every time would mean having the same image as input multiple times (and of course it takes a lot of time to label). How could I solve this? Should I crop the image before creating the dataset? Is it enough if I create an eval dataset where I have both dogs and cats labeled in each image, and a train dataset where I only have the object (dog) labeled and not the cat?
Thanks
1- Yes the order of images affects the result[1], and more significantly it will affect the speed at which your algorithm will learn. In its essence your algorithm is trying to learn a configuration of weights which minimise your loss function for all examples which you show to it. It does this by arranging these weights into a configuration which detects those features in the data which discriminate between cats and dogs. But it does this only by considering one batch of inputs at a time. Each image in a batch is considered individually and back-prop decides how the weights should be altered so that the algorithm better detects the cat/dog in that image. It then averages all of these alterations for every image in the batch and makes that adjustment.
If your batch contains all of your images then the order would not matter; it would make the adjustment that it expects will provide the greatest net reduction in your loss function for all data. But if the batch contains less than all of the data (which it invariably does) then it makes an adjustment that helps detect the dogs/cats only in the images in that batch. This means if you show it more cats than dogs, it will decide that a feature belonging equally to both cats and dogs actually produces an increased probability that the animal in question is a cat, which is false, because in the instances where that feature was detected the animal was more often a cat. This will correct itself over time as the ratio of cats:dogs evens out, but it will arrive at its final configuration much more slowly, because it has to learn and unlearn non-helpful features in the data.
As an example, in your setup by the time your algorithm has observed half of the data, all it has learned is that "all things that look like a cat or a dog are dogs". Those features which discriminate between cats and dogs in the images have not been helpful in reducing your loss function. Actually it will have mis-learnt features common to both cats and dogs as being dog-specific, and will have to unlearn them later as it sees more data.
In terms of the overall outcome: during the training process you are essentially traversing a highly dimensional optimisation space following its gradient until the configuration of weights arrives at a local minimum in this space from which the magnitude of the barrier to escape exceeds that which is allowed by your learning rate. Showing one class then the other will lead to a more meandering path towards the global minimum and thus increase the likelihood of becoming stuck in a sub-optimal local minimum. [2]
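To make the shuffling point concrete, with a tf.data input pipeline mixing the two halves is essentially one line (the record file name, buffer size and batch size below are placeholders):

import tensorflow as tf

# Shuffle across the whole training set so each batch mixes dog- and cat-labelled images
dataset = tf.data.TFRecordDataset("train.record")      # placeholder TFRecord file
dataset = (dataset
           .shuffle(buffer_size=6000, reshuffle_each_iteration=True)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))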
2- If all of the images in your data set contain a dog, you really want to label that dog in every image. This does three things:
Doubles the size of your data set (more data = better results).
Prevents falsely penalising the model for accurately detecting a dog in the images where you have not labelled the dog.
Prevents the algorithm from detecting unrelated features in the images.
Doubling the size of your data set is good for obvious reasons. But by showing inputs that contain a dog without labelling that dog you are essentially telling your algorithm that that image contains no dog[3], which is false. You are essentially changing the patterns you are asking the algorithm to detect from those which can separate cat/dog vs. no-cat/dog and cat vs. dog to those which can separate labelled dogs vs. unlabelled dogs, which are not helpful features for your task.
Lastly, by failing to label half of the dogs, your algorithm will learn to discriminate between those dogs which are labelled and those which are not. This means instead of learning features common to dogs, it will learn features that separate those dogs in the labelled images from those in the unlabelled images. These could be background features in the images or small generalisations which appear more strongly in the labelled dogs than the unlabelled ones by chance.
3- This question is a little more difficult, and there is no easy solution to your problem here. Your model can only learn features to which it is exposed during training and this means if you only show it one image of a cat (or several images in which the representation of the cat is identical) your model will learn features specific to this one image. This will lead quickly to the common problem of over-fitting, where your model learns features which are specific to your training examples and do not generalise well to other instances of cats.
It would not be sufficient to crop out the cat during training and then simply include the cat in the eval data set because you will be asking the model to detect features to which it has not been exposed during training and thus has not learned.
You want to include your labelled cat in every instance in which it appears in your data and regularise your network to limit over-fitting. In addition to this in the presence of data poverty it is often beneficial to use pre-training to learn cat specific features from unlabelled data prior to training, and/or to use data augmentation to artificially enhance the diversity of your data.
These suggestions are likely to improve your results, but the reality is that sourcing large, diverse data sets that comprehensively incorporate the features that are key to identifying your object is a major part of building a successful deep learning model. It depends on how uniform the instances of the cat are in your data, but if your model has only ever seen a cat from the front, it's not going to recognise a cat from the back.
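As a small illustration of the augmentation and pre-training suggestions, in Keras this could be as simple as the following (layer choices and parameters are arbitrary examples, not tuned values):

import tensorflow as tf
from tensorflow.keras import layers

# Data augmentation: artificially increase the diversity of the cat examples
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.2),
    layers.RandomContrast(0.2),
])

# "Pre-training": reuse ImageNet weights as a frozen backbone and fine-tune only the head
backbone = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                             input_shape=(224, 224, 3), pooling="avg")
backbone.trainable = False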
TLDR:
1 YES, shuffle them.
2 YES, label them all.
3 Get better data. Or: pretrain, regularise and augment data.
[1] This does depend on the size of the batches in which you feed your data into the model.
[2] This is based on my own intuition and I am happy to be corrected here.
[3] This depends to some extent on how your loss function handles images in which there is no dog.

using HAAR training for post-it note recognition

I need to be able to detect a variety of coloured post-it notes via a Microsoft Kinect video stream. I have tried using EmguCV for edge detection, but it doesn't seem to locate the vertices/edges. I have also tried colour segmentation/detection; however, considering the variety of colours, that may not be robust enough.
I am attempting to use HAAR classification. Can anyone suggest the best variety of positive/negative images to use? For example, for the positive images should I take pictures of many different coloured post-it notes in various lighting conditions and orientations? Seeing as it is quite a simple shape (a square), is using HAAR classification over-complicating things?
Haar classifiers are typically used on black and white images and trigger primarily on morphological, edge-like features. It seems like if you want to find post-it notes in an image, the easiest method would be to look at colours (since they come in very distinct colours). Have you tried training an SVM or Random Forest classifier to detect post-it notes based on just colour? Once you've identified areas in the image that are probably post-it notes, you could start looking at things like the shape as additional validation that you are indeed looking at a post-it note.
Take a look at the following as an example of how to find rectangles in an image using hough transform:
https://opencv-code.com/tutorials/automatic-perspective-correction-for-quadrilateral-objects/
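To make the colour-first suggestion concrete, here is a hedged OpenCV sketch that keeps strongly saturated regions and then checks for roughly quadrilateral contours; the HSV thresholds, minimum area and file name are guesses that would need tuning for your lighting and note colours:

import cv2
import numpy as np

frame = cv2.imread("kinect_frame.png")                 # placeholder: one frame from the Kinect
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Keep strongly saturated, reasonably bright pixels of any hue
mask = cv2.inRange(hsv, (0, 80, 80), (179, 255, 255))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) < 500:                       # ignore small blobs
        continue
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) == 4:                               # roughly a quadrilateral -> candidate note
        cv2.drawContours(frame, [approx], -1, (0, 255, 0), 2)

cv2.imwrite("detected_notes.png", frame)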