Reverse Image search (for image duplicates) on local computer - tensorflow

I have a bunch of poor quality photos that I extracted from a PDF. Somebody I know has the good quality photos somewhere on her computer (Mac), but it's my understanding that it will be difficult to find them.
I would like to:
loop through each poor quality photo,
perform a reverse image search using each poor quality photo as the query image and this person's computer as the database to search for the higher quality images,
and create a copy of each high quality image in one destination folder.
Example pseudocode
for each image in poorQualityImages:
    search ./macComputer for a higherQualityImage of image
    copy higherQualityImage to ./higherQualityImages
I need to perform this action once.
I am looking for a tool, github repo or library which can perform this functionality more so than a deep understanding of content based image retrieval.
There's a post on reddit where someone was trying to do something similar.
imgdupes is a program which seems like it almost achieves this, but I do not want to delete the duplicates; I want to copy the highest quality duplicate to a destination folder.
Update
I emailed my previous image processing prof and he sent me this:
Off the top of my head, nothing out of the box. No guaranteed solution here, but you can narrow the search space. You’d need a little program that outputs the MSE or SSIM similarity index between two images, and then write another program or shell script that scans the hard drive and computes the MSE between each image on the hard drive and each query image, then check the images with the top X percent similarity score. Something like that. Still not maybe guaranteed to find everything you want.
And if the low quality images are of different pixel dimensions than the high quality images, you’d have to do some image scaling to get the similarity index. If the poor quality images have different aspect ratios, that’s even worse.
So I think it’s not hard but not trivial either. The degree of difficulty is partly dependent on the nature of the corruption in the low quality images.
UPDATE
Github project I wrote which achieves what I want

What you are looking for is called image hashing. In this answer you will find a basic explanation of the concept, as well as a go-to GitHub repo for a plug-and-play application.
Basic concept of Hashing
From the repo page: "We have developed a new image hash based on the Marr wavelet that computes a perceptual hash based on edge information with particular emphasis on corners. It has been shown that the human visual system makes special use of certain retinal cells to distinguish corner-like stimuli. It is the belief that this corner information can be used to distinguish digital images that motivates this approach. Basically, the edge information attained from the wavelet is compressed into a fixed length hash of 72 bytes. Binary quantization allows for relatively fast hamming distance computation between hashes. The following scatter plot shows the results on our standard corpus of images. The first plot shows the distances between each image and its attacked counterpart (e.g. the intra distances). The second plot shows the inter distances between altogether different images. While the hash is not designed to handle rotated images, notice how slight rotations still generally fall within a threshold range and thus can usually be matched as identical. However, the real advantage of this hash is for use with our mvp tree indexing structure. Since it is more descriptive than the dct hash (being 72 bytes in length vs. 8 bytes for the dct hash), there are much fewer false matches retrieved for image queries."
Another blogpost for an in-depth read, with an application example.
Available Code and Usage
A github repo can be found here. There are obviously more to be found.
After importing the package you can use it to generate and compare hashes:
>>> from PIL import Image
>>> import imagehash
>>> hash = imagehash.average_hash(Image.open('test.png'))
>>> print(hash)
d879f8f89b1bbf
>>> otherhash = imagehash.average_hash(Image.open('other.bmp'))
>>> print(otherhash)
ffff3720200ffff
>>> print(hash == otherhash)
False
>>> print(hash - otherhash)
36
The demo script find_similar_images, also on the mentioned GitHub repo, illustrates how to find similar images in a directory.
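To connect this to the original question, here is a rough sketch of how such a search loop could look with this package (phash is just one of the hash functions it offers, the folder names are the stand-ins from the question's pseudocode, and the distance threshold is a guess you would have to tune):
import os
import shutil
from PIL import Image
import imagehash

QUERY_DIR = "./poorQualityImages"      # stand-in path from the question's pseudocode
SEARCH_DIR = "./macComputer"           # stand-in for the drive to scan
DEST_DIR = "./higherQualityImages"
EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".tiff")
MAX_DISTANCE = 10                      # Hamming-distance threshold; tune experimentally

def hash_file(path):
    """Compute a perceptual hash, returning None for unreadable files."""
    try:
        return imagehash.phash(Image.open(path))
    except Exception:
        return None

# Hash every candidate image on the drive once.
candidates = {}
for root, _, files in os.walk(SEARCH_DIR):
    for name in files:
        if name.lower().endswith(EXTENSIONS):
            path = os.path.join(root, name)
            h = hash_file(path)
            if h is not None:
                candidates[path] = h

os.makedirs(DEST_DIR, exist_ok=True)

# For each poor quality query image, copy the closest match under the threshold.
for name in os.listdir(QUERY_DIR):
    if not name.lower().endswith(EXTENSIONS):
        continue
    query_hash = hash_file(os.path.join(QUERY_DIR, name))
    if query_hash is None or not candidates:
        continue
    best_path, best_dist = min(
        ((path, query_hash - h) for path, h in candidates.items()),
        key=lambda item: item[1],
    )
    if best_dist <= MAX_DISTANCE:
        shutil.copy2(best_path, DEST_DIR)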

Premise
I'll focus my answer on the image processing part, as I believe implementation details, e.g. traversing a file system, are not the core of your problem. Also, all that follows is just my humble opinion; I am sure that there are better ways to retrieve your images of which I am not aware. Anyway, I agree with what your prof said and I'll follow the same line of thought, so I'll share some ideas on possible similarity indexes you might use.
Answer
MSE and SSIM - This is a possible solution, as suggested by your prof. As I assume the low quality images also have a different resolution than the good ones, remember to downsample the good ones (and not upsample the bad ones).
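For instance, a minimal sketch of these two measures with NumPy and scikit-image, assuming 8-bit grayscale images already resized to the same shape:
import numpy as np
from skimage.metrics import structural_similarity

def mse(a, b):
    """Mean squared error between two equally sized grayscale images (lower = more similar)."""
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def ssim(a, b):
    """Structural similarity index (higher = more similar); assumes 8-bit images."""
    return structural_similarity(a, b, data_range=255)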
Image subtraction (1-norm distance) - Subtract two images -> if they are equal you'll get a black image. If they are slightly different, the non-black pixels (or the sum of the pixel intensity) can be used as a similarity index. This is actually the 1-norm distance.
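A sketch of that 1-norm distance, again assuming two equally sized grayscale arrays:
import numpy as np

def l1_distance(a, b):
    """Sum of absolute pixel differences (1-norm); 0 means the images are identical."""
    return np.abs(a.astype(np.int64) - b.astype(np.int64)).sum()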
Histogram distance - You can refer to this paper: https://www.cse.huji.ac.il/~werman/Papers/ECCV2010.pdf. Comparing two images' histograms might be potentially robust for your task. Check out this question too: Comparing two histograms
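A basic sketch of a histogram comparison with OpenCV (plain grayscale histograms and the correlation metric, which is far simpler than the Earth Mover's Distance discussed in the linked paper):
import cv2

def histogram_similarity(path_a, path_b, bins=64):
    """Correlation between grayscale histograms; 1.0 means identical histograms."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    hist_a = cv2.calcHist([img_a], [0], None, [bins], [0, 256])
    hist_b = cv2.calcHist([img_b], [0], None, [bins], [0, 256])
    cv2.normalize(hist_a, hist_a)
    cv2.normalize(hist_b, hist_b)
    return cv2.compareHist(hist_a, hist_b, cv2.HISTCMP_CORREL)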
Embedding learning - As I see you included tensorflow, keras and pytorch as tags, let's consider deep learning. This paper came to my mind: https://arxiv.org/pdf/1503.03832.pdf. The idea is to learn a mapping from the image space to a Euclidean space, i.e. to compute an embedding of the image. In the embedding hyperspace, images are points. This paper learns an embedding function by minimizing the triplet loss. The triplet loss is meant to maximize the distance between images of different classes and minimize the distance between images of the same class. You could train the same model on a dataset like ImageNet. You could augment the dataset by lowering the quality of the images, in order to make the model "invariant" to differences in image quality (e.g. down-sampling followed by up-sampling, image compression, adding noise, etc.). Once you can compute the embedding, you can compute the Euclidean distance (as a substitute for the MSE). This might work better than using MSE/SSIM as similarity indexes. Repo of FaceNet: https://github.com/timesler/facenet-pytorch. Another general purpose approach (not related to faces) which might help you: https://github.com/zegami/image-similarity-clustering.
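A lighter-weight variant of this idea that needs no training at all: use a pretrained ImageNet backbone as a fixed embedding function and compare images by the distance between their feature vectors. A minimal sketch with tf.keras (ResNet50 is just one convenient choice here, not the FaceNet model from the paper; assumes a recent TF 2.x):
import numpy as np
import tensorflow as tf

# Pretrained backbone with the classification head removed; global average
# pooling turns the final feature map into a fixed-length embedding vector.
model = tf.keras.applications.ResNet50(include_top=False, weights="imagenet", pooling="avg")
preprocess = tf.keras.applications.resnet50.preprocess_input

def embed(path):
    """Load an image, resize it and return its embedding vector."""
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]
    return model.predict(preprocess(x), verbose=0)[0]

def euclidean_distance(path_a, path_b):
    """Smaller distance means more similar images."""
    return float(np.linalg.norm(embed(path_a) - embed(path_b)))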
Siamese networks for predicting similarity score - I am referring to this paper on face verification: http://bmvc2018.org/contents/papers/0410.pdf. The siamese network takes two images as input and outputs a value in the [0, 1] range. We can interpret the output as the probability that the two images belong to the same class. You can train a model of this kind to predict 1 for image pairs of the following kind: (good quality image, artificially degraded image). To degrade the image, again, you can combine e.g. down-sampling followed by up-sampling, image compression, adding noise, etc. Let the model predict 0 for image pairs of different classes (e.g. different images). The output of the network can be used as a similarity index.
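A minimal sketch of such a siamese model in tf.keras (the small CNN encoder and the input size are placeholders of my own; generating the (good image, degraded image) pairs with label 1 and mismatched pairs with label 0 is left to you):
import tensorflow as tf
from tensorflow.keras import layers, Model

IMG_SHAPE = (128, 128, 3)  # placeholder input size

def build_encoder():
    """Small CNN that maps an image to a fixed-length embedding."""
    inputs = layers.Input(IMG_SHAPE)
    x = inputs
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(128, activation="relu")(x)
    return Model(inputs, x, name="encoder")

encoder = build_encoder()
img_a = layers.Input(IMG_SHAPE)
img_b = layers.Input(IMG_SHAPE)

# Shared weights: the same encoder processes both images.
emb_a, emb_b = encoder(img_a), encoder(img_b)
diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([emb_a, emb_b])
score = layers.Dense(1, activation="sigmoid")(diff)  # similarity score in [0, 1]

siamese = Model([img_a, img_b], score)
siamese.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# siamese.fit([pairs_a, pairs_b], labels, ...)  # labels: 1 = same image (degraded), 0 = different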
Remark 1
These different approaches can also be combined. They all provide you with similarity indexes, so you can very easily average the outcomes.
Remark 2
If you only need to do it once, the effort you would need to put into implementing and training deep models might not be justified, so I would not suggest it. Still, you can consider it if you can't find any other solution and that Mac is REALLY FULL of images and a manual search is not possible.

If you look at the documentation of imgdupes you will see there is the following option:
--dry-run
dry run (do not delete any files)
So if you run imgdupes with --dry-run you will get a listing of all the duplicate images but it will not actually delete anything. You should be able to process that output to move the images around as you need.
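For example, a rough sketch of such post-processing in Python, under the assumption that the dry-run output lists each duplicate group as a block of file paths separated by blank lines (the fdupes-style format), and using file size as a crude proxy for quality:
import shutil
from pathlib import Path

DEST_DIR = Path("./higherQualityImages")
DEST_DIR.mkdir(exist_ok=True)

def copy_largest_per_group(lines):
    """From each duplicate group, copy the largest file (a crude proxy for quality)."""
    group = []
    for line in list(lines) + [""]:        # trailing "" flushes the last group
        line = line.strip()
        if line:
            group.append(Path(line))
            continue
        existing = [p for p in group if p.is_file()]
        if existing:
            best = max(existing, key=lambda p: p.stat().st_size)
            shutil.copy2(best, DEST_DIR)
        group = []

# Usage: run imgdupes with --dry-run, save its output to dupes.txt, then run this script.
with open("dupes.txt") as f:
    copy_largest_per_group(f)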

Try the similar image finder I developed to address this problem.
There is an explanation and the algorithm there, so you can implement your own version if needed.

Related

How to design a neural network for input of various size?

I'm wondering how to design a neural network where the input data can have different shapes, since the network has a fixed number of nodes in the input layer.
This typically comes up when I want to train an image classification network on pictures with unknown (varying) resolution, or when I want to classify text of varying length.
For example, for images I can of course have some preprocessing pipeline which resizes the image, but I may lose some information with it; in the case of text, the "resizing" would be even harder to perform.
Is there any trick, how to design such a network?
Three possibilities come to mind.
The easiest is zero-padding: basically, you take a rather big input size and just add zeroes if your concrete input is too small. Of course, this is pretty limited and certainly not useful if your input ranges from a few words to full texts.
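For text, a minimal sketch of this padding with Keras (pad_sequences pads token-id lists to a fixed length):
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Token-id sequences of different lengths (toy example).
sequences = [[4, 10, 5], [7, 2], [3, 8, 9, 1, 6]]

# Pad (or truncate) everything to the same length so it fits a fixed input layer.
padded = pad_sequences(sequences, maxlen=6, padding="post", value=0)
print(padded.shape)  # (3, 6)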
Recurrent NNs (RNN) are a very natural NN to choose if you have texts of varying size as input. You input words as word vectors (or embeddings) just one after another and the internal state of the RNN is supposed to encode the meaning of the full string of words. This is one of the earlier papers.
Another possibility is using recursive NNs. This is basically a form of preprocessing in which a text is recursively reduced to a smaller number of word vectors until only one is left - your input, which is supposed to encode the whole text. This makes a lot of sense from a linguistic point of view if your input consists of sentences (which can vary a lot in size), because sentences are structured recursively. For example, the word vector for "the man", should be similar to the word vector for "the man who mistook his wife for a hat", because noun phrases act like nouns, etc. Often, you can use linguistic information to guide your recursion on the sentence. If you want to go way beyond the Wikipedia article, this is probably a good start.
In case of images, instead of asking your NN to recognize what's in your whole image, you can ask it what's in a particular part of your picture, e.g. a 256x256 window. You train for that, and then apply it to partly overlapping rolling windows over your whole image. If your to-be-recognized pattern varies in size a lot, you can resize it and use your NN again.
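A minimal sketch of the rolling-window idea with TensorFlow (tf.image.extract_patches does the overlapped cropping; the 256 window and 128 stride are arbitrary choices):
import tensorflow as tf

def sliding_windows(image, size=256, stride=128):
    """Split an image tensor of shape (H, W, C) into overlapping size x size crops."""
    patches = tf.image.extract_patches(
        images=image[tf.newaxis, ...],          # add a batch dimension
        sizes=[1, size, size, 1],
        strides=[1, stride, stride, 1],
        rates=[1, 1, 1, 1],
        padding="VALID",
    )
    channels = image.shape[-1]                  # assumes the channel count is statically known
    return tf.reshape(patches, [-1, size, size, channels])

# Each crop can then be classified independently, e.g. model.predict(sliding_windows(img))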

Should the size of the photos be the same for deep learning?

I have lots of images (about 40 GB).
My images are small but they don't all have the same size.
My images aren't of natural scenes because I made them from a signal, so all pixels are important and I can't crop or delete any pixel.
Is it possible to use deep learning for this kind of image with different shapes?
All pixels are important, please take this into consideration.
I want a model which does not depend on a fixed size input image. Is it possible?
Without knowing what you're trying to learn from the data, it's tough to give a definitive answer:
You could pad all the data at the beginning (or end) of the signal so they're all the same size. This allows you to keep all the important pixels, but adds irrelevant information to the image that the network will most likely ignore.
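A minimal sketch of that padding option with NumPy (zero-padding at the bottom/right of each image; the toy sizes are arbitrary):
import numpy as np

def pad_to(img, target_h, target_w):
    """Zero-pad a (H, W) image at the bottom/right so every sample has the same shape."""
    h, w = img.shape
    return np.pad(img, ((0, target_h - h), (0, target_w - w)), mode="constant")

batch = [np.random.rand(30, 40), np.random.rand(25, 55)]   # toy images of different sizes
target_h = max(img.shape[0] for img in batch)
target_w = max(img.shape[1] for img in batch)
padded = np.stack([pad_to(img, target_h, target_w) for img in batch])
print(padded.shape)  # (2, 30, 55)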
I've also had good luck with activations, where you take a pretrained network and pull features from the image at a certain part of the network regardless of size (as long as it's larger than the network input size), then run them through a classifier.
https://www.mathworks.com/help/deeplearning/ref/activations.html#d117e95083
Or you could window your data, and only process smaller chunks at one time.
https://www.mathworks.com/help/audio/examples/cocktail-party-source-separation-using-deep-learning-networks.html

Tiled Instance Normalization

I am currently implementing a few image style transfer algorithms in TensorFlow, but I would like to do it in tiles, so I don't have to run the entire image through the network. Everything works fine; however, each image is normalized differently, according to its own statistics, which results in tiles with slightly different characteristics.
I am certain that the only issue is instance normalization, since if I feed the true values (obtained from the entire image) to each tile calculation the result is perfect; however, I still have to run the entire image through the network to calculate these values. I also tried calculating these values using a downsampled version of the image, but resolution suffers a lot.
So my question is: is it possible to estimate mean and variance values for instance normalization without feeding the entire image through the network?
You can take a random sample of the pixels of the image, and use the sample mean and sample variance to normalize the whole image. It will not be perfect, but the larger the sample, the better. A few hundred pixels will probably suffice, maybe even less, but you need to experiment.
Use tf.random_uniform() to get random X and Y coordinates, and then use tf.gather_nd() to get the pixel values at the given coordinates.
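A minimal sketch of that sampling approach (using the TF 2 names tf.random.uniform and tf.gather_nd; in TF 1, as in the answer, the former is tf.random_uniform):
import tensorflow as tf

def sampled_moments(image, num_samples=512):
    """Estimate per-channel mean/variance of an (H, W, C) image from random pixels."""
    h, w = tf.shape(image)[0], tf.shape(image)[1]
    ys = tf.random.uniform([num_samples], maxval=h, dtype=tf.int32)
    xs = tf.random.uniform([num_samples], maxval=w, dtype=tf.int32)
    pixels = tf.cast(tf.gather_nd(image, tf.stack([ys, xs], axis=1)), tf.float32)  # (num_samples, C)
    return tf.nn.moments(pixels, axes=[0])                                         # mean, variance per channel

def normalize_tile(tile, mean, variance, eps=1e-5):
    """Instance-normalize a tile using statistics estimated from the full image."""
    return (tf.cast(tile, tf.float32) - mean) / tf.sqrt(variance + eps)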

Should deep learning classification be used to classify details such as liquid level in bottle

Can deep learning classification be used to precisely label/classify both an object and one of its features? For example, to identify the bottle (like Grants Whiskey) and the liquid level in the bottle (in 10 percent steps, like 50% full). Is this a problem that can best be solved using deep learning frameworks (TensorFlow etc.), or is some other approach more effective?
Well, this should be quite possible if the liquor is well colored. If not (e.g. gin, vodka), I'd say you have no chance with today's technology when observing the object from a natural view angle and distance.
For colored liquor, I'd train two detectors. One for detecting the bottle, and a second one to detect the liquor given the bottle. The ratio between the two will be your percentage.
Some of the proven state-of-the-art deep learning-based object detectors (just Google them):
Multibox
YOLO
Faster RCNN
Or non-deep-learning-based:
Deformable part model
EDIT:
I was asked to elaborate more. Here is an example:
The box detector e.g. draws a box in the image at [0.1, 0.2, 0.5, 0.6] (min_height, min_width, max_height, max_width) which is the relative location of your bottle.
Now you crop the bottle from the original image and feed it to the second detector. The second detector draws e.g. [0.2, 0.3, 0.7, 0.8] in your cropped bottle image, the location indicates the fluid it has detected. Now (0.7 - 0.2) * (0.8 - 0.3) = 0.25 is the relative area of the fluid with respect to the area of the bottle, which is what OP is asking for.
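For completeness, the arithmetic as a tiny helper (the box coordinates are the hypothetical values from the example above):
def box_area(box):
    """Area of a (min_y, min_x, max_y, max_x) box given in relative coordinates."""
    min_y, min_x, max_y, max_x = box
    return max(0.0, max_y - min_y) * max(0.0, max_x - min_x)

# Liquid box relative to the cropped bottle image, as in the example above.
liquid_box = (0.2, 0.3, 0.7, 0.8)
print(box_area(liquid_box))  # 0.25, i.e. the fluid covers 25% of the bottle crop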
EDIT 2:
I entered this reply assuming the OP wants to use deep learning. I'd agree other methods should be considered if the OP is still unsure about deep learning. For bottle detection, deep learning-based methods have been shown to outperform traditional methods by a large margin. Bottle detection happens to be one of the classes in the PASCAL VOC challenge. See the results comparison here: http://rodrigob.github.io/are_we_there_yet/build/detection_datasets_results.html#50617363616c20564f43203230313020636f6d7034
For the liquid detection however, deep learning might be slightly overkill. E.g. if you know what color you are looking for, even a simple color filter will give you "something"....
The rule of thumb for deep learning is: if it is visible in the image, i.e. an expert can tell you the answer solely based on the image, then the chances are very high that you can learn this with deep learning, given enough annotated data.
However, you are quite unlikely to have the data required for such a task, so I would ask myself whether I can simplify the problem. For example, you could take gin, vodka and so on and use SIFT to find the bottle again in a new scene, then RANSAC for bottle detection, and cut the bottle out of the image.
Then I would try gradient features to find the edge at the liquid level. Finally, you can calculate the percentage with (liquid edge - bottle bottom) / (bottle top - bottle bottom).
Identifying the bottle label should not be hard to do - it's even available "on tap" for cheap (these guys actually use it to identify wine bottle labels on their website): https://aws.amazon.com/blogs/aws/amazon-rekognition-image-detection-and-recognition-powered-by-deep-learning/
Regarding the liquid level, this might be a problem AWS wouldn't be able to solve straight away - but I'm sure it's possible to make a custom CNN for it. Alternatively, you could use good old humans on Amazon Mechanical Turk to do the work for you.
(Note: I do not work for Amazon)

Automatic feature extraction from chess board positions

I am working on a project where I take a chess board position (FEN string converted to binary) and its evaluation score and feed them to a neural network. My aim is to make the neural network differentiate between good and bad positions.
How I encode the position: There are 12 unique pieces in chess, i.e. pawn, rook, knight, bishop, queen and king, for white as well as black. I encode each piece using 4 bits, with 0000 denoting an empty square. So the 64 squares are encoded into 256 bits, and I use 6 more bits to denote game state like whose turn it is to move, king-castle status, etc.
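For illustration, a sketch of one possible implementation of this encoding (the piece-to-code mapping and the exact meaning of the 6 game-state bits are my own placeholders, since they are not fully specified above):
# One possible implementation of the 4-bits-per-square encoding described above.
PIECE_CODES = {p: i + 1 for i, p in enumerate("PNBRQKpnbrqk")}  # 1..12, 0 = empty square

def fen_to_bits(fen):
    board, side, castling, *_ = fen.split()
    bits = []
    for ch in board.replace("/", ""):
        if ch.isdigit():
            bits.extend([0, 0, 0, 0] * int(ch))          # run of empty squares
        else:
            code = PIECE_CODES[ch]
            bits.extend(int(b) for b in format(code, "04b"))
    bits.append(1 if side == "w" else 0)                  # side to move
    for flag in "KQkq":                                   # castling rights (4 bits)
        bits.append(1 if flag in castling else 0)
    bits.append(0)                                        # spare game-state bit (placeholder)
    return bits                                           # 256 + 6 = 262 bits

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(len(fen_to_bits(start)))  # 262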
Problem: Since the input space for chess positions is neither smooth nor uni-modal (one small change in the board position can result in a huge change in the evaluation score), the neural network doesn't learn well. Now, the next logical thing is to somehow extract useful features (like material difference, center control, etc.) and feed them to the network.
I do not want to hand pick the features as I want the network to learn everything by itself. Therefore I am thinking of extracting features automatically using autoencoders. Is there any better way to accomplish this?
Summary : What is the best way to automatically extract features from a chess board position so that it can be fed into a neural network?
UPDATE: To generate training data, I have modified Stockfish to dump its evaluation process into a log file. So every new move (position) it considers is written to a file as an FEN string along with its eval score.
Neural networks can give an approximation of any function. The only consideration is the dimensionality of the search space, which constrains the amount of data you need to get a good approximation.
For a supervised network (you use autoencoders, so I think you use some variant of backpropagation), it's difficult for me to imagine how you plan to do the training using single positions, because you need similar positions in your training set. Maybe your approach is different, but I'm convinced that the second strategy (using features) is more promising. I think using raw positions requires a huge amount of training data to get good results.
For features, take a look here, and at the classical work of Shannon.
I also took useful information from the source code of Crafty.
But you have to extract this information from the FEN string.
Autoencoders are a way to reduce the data (good because it increases performance). The use of Principal Component Analysis seems to be better, as reported here.
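A minimal sketch of the PCA route with scikit-learn, using random placeholder data in place of the real 262-bit position vectors:
import numpy as np
from sklearn.decomposition import PCA

# X: one row per position, e.g. the 262-bit vectors described in the question
# (random placeholder data here).
X = np.random.randint(0, 2, size=(1000, 262))

pca = PCA(n_components=64)          # keep 64 components; tune via explained variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)              # (1000, 64)
print(pca.explained_variance_ratio_.sum())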
I hope this can help you.