I'm using SURF to extract features from images and match them to others. My problem is that some images have in excess of 20000 features, which slows matching down to a crawl.
Is there a way I can extract only the n most significant features from that set?
I tried computing MSER for the image and only using features that are within those regions. That gives me a reduction of anywhere from 5% to 40% without affecting matching quality negatively, but it's unreliable and still not enough.
I could additionally size the image down, but that seems to affect the quality of the features severely in some cases.
SURF offers a few parameters (hessian threshold, octaves and layers per octave) but I couldn't find anything on how changing these would affect feature significance.
After some research and testing I have found that the Hessian value for each feature is a rough estimate of its strength; however, simply taking the top n features sorted by Hessian value is not optimal.
I achieved better results by doing the following until the number of features is below the target of n:
Size the image down, if it is overly large
Only features that lie in MSER regions are considered
For features that lie very close to each other, only the feature with the higher Hessian value is considered
Of the n features per image that I want to save, 75% are the features with the highest Hessian values
The remaining features are taken randomly from the remainder, weighted by the distribution of the Hessian values computed through a histogram
Now I only need to find a suitable n, but around 1500 seems enough currently.
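For illustration, here is a rough Python/OpenCV sketch of that selection heuristic (my own; it needs opencv-contrib-python with the non-free modules for SURF, the file name and parameter values are placeholders, and I weight the random picks directly by Hessian response rather than by a histogram of responses):

import cv2
import numpy as np

def select_strongest(keypoints, descriptors, n=1500, top_fraction=0.75):
    # kp.response holds the Hessian value OpenCV assigns to each SURF keypoint
    order = np.argsort([-kp.response for kp in keypoints])
    n_top = int(n * top_fraction)
    chosen = list(order[:n_top])
    # Fill the remaining slots at random, weighted by Hessian response
    rest = order[n_top:]
    if len(rest) > 0 and n > n_top:
        responses = np.array([keypoints[i].response for i in rest], dtype=np.float64)
        weights = responses / responses.sum()
        extra = np.random.choice(rest, size=min(n - n_top, len(rest)), replace=False, p=weights)
        chosen.extend(extra)
    return [keypoints[i] for i in chosen], descriptors[chosen]

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp, desc = surf.detectAndCompute(cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE), None)
kp, desc = select_strongest(kp, desc, n=1500)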
Related
I have a bunch of poor quality photos that I extracted from a PDF. Somebody I know has the good quality photos somewhere on her computer (Mac), but it's my understanding that it will be difficult to find them.
I would like to
loop through each poor quality photo
perform a reverse image search using each poor quality photo as the query image and using this persons computer as the database to search for the higher quality images
and create a copy of each high quality image in one destination folder.
Example pseudocode
for each image in poorQualityImages:
search ./macComputer for a higherQualityImage of image
copy higherQualityImage to ./higherQualityImages
I need to perform this action once.
I am looking for a tool, github repo or library which can perform this functionality more so than a deep understanding of content based image retrieval.
There's a post on reddit where someone was trying to do something similar
imgdupes is a program which seems like it almost achieves this, but I do not want to delete the duplicates; I want to copy the highest quality duplicate to a destination folder.
Update
I emailed my previous image processing prof and he sent me this:
Off the top of my head, nothing out of the box. No guaranteed solution here, but you can narrow the search space. You’d need a little program that outputs the MSE or SSIM similarity index between two images, and then write another program or shell script that scans the hard drive and computes the MSE between each image on the hard drive and each query image, then check the images with the top X percent similarity score.

Something like that. Still not maybe guaranteed to find everything you want. And if the low quality images are of different pixel dimensions than the high quality images, you’d have to do some image scaling to get the similarity index. If the poor quality images have different aspect ratios, that’s even worse.

So I think it’s not hard but not trivial either. The degree of difficulty is partly dependent on the nature of the corruption in the low quality images.
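For what it's worth, a minimal sketch of the comparison the email describes, assuming scikit-image (this is my illustration, not the prof's code; the resizing step handles the different pixel dimensions he mentions):

import numpy as np
from skimage.io import imread
from skimage.transform import resize
from skimage.metrics import structural_similarity

def compare(query_path, candidate_path):
    query = imread(query_path, as_gray=True)
    candidate = imread(candidate_path, as_gray=True)
    # Scale the candidate to the query's dimensions before comparing
    candidate = resize(candidate, query.shape, anti_aliasing=True)
    mse = float(np.mean((query - candidate) ** 2))
    ssim = structural_similarity(query, candidate, data_range=1.0)
    return mse, ssim  # low MSE / high SSIM means more similar

Running this against every image on the drive and keeping the top X percent by similarity would be the scan he suggests.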
UPDATE
Github project I wrote which achieves what I want
What you are looking for is called image hashing. In this answer you will find a basic explanation of the concept, as well as a go-to github repo for a plug-and-play application.
Basic concept of Hashing
From the repo page: "We have developed a new image hash based on the Marr wavelet that computes a perceptual hash based on edge information with particular emphasis on corners. It has been shown that the human visual system makes special use of certain retinal cells to distinguish corner-like stimuli. It is the belief that this corner information can be used to distinguish digital images that motivates this approach. Basically, the edge information attained from the wavelet is compressed into a fixed length hash of 72 bytes. Binary quantization allows for relatively fast hamming distance computation between hashes. The following scatter plot shows the results on our standard corpus of images. The first plot shows the distances between each image and its attacked counterpart (e.g. the intra distances). The second plot shows the inter distances between altogether different images. While the hash is not designed to handle rotated images, notice how slight rotations still generally fall within a threshold range and thus can usually be matched as identical. However, the real advantage of this hash is for use with our mvp tree indexing structure. Since it is more descriptive than the dct hash (being 72 bytes in length vs. 8 bytes for the dct hash), there are much fewer false matches retrieved for image queries."
Another blogpost for an in-depth read, with an application example.
Available Code and Usage
A github repo can be found here. There are obviously more to be found.
After importing the package you can use it to generate and compare hashes:
>>> from PIL import Image
>>> import imagehash
>>> hash = imagehash.average_hash(Image.open('test.png'))
>>> print(hash)
d879f8f89b1bbf
>>> otherhash = imagehash.average_hash(Image.open('other.bmp'))
>>> print(otherhash)
ffff3720200ffff
>>> print(hash == otherhash)
False
>>> print(hash - otherhash)
36
The demo script find_similar_images, also on the mentioned github repo, illustrates how to find similar images in a directory.
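To connect this to the pseudocode in the question, here is a hedged sketch of the whole loop (my own illustration; the paths mirror the pseudocode, and the use of phash instead of average_hash as well as the distance threshold are assumptions):

import os
import shutil
from PIL import Image
import imagehash

EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp', '.tiff')

def index_images(root):
    # Hash every image found under the search root
    index = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(EXTENSIONS):
                path = os.path.join(dirpath, name)
                try:
                    index[path] = imagehash.phash(Image.open(path))
                except OSError:
                    pass  # unreadable or non-image file
    return index

def copy_best_matches(query_dir, search_root, dest_dir, max_distance=12):
    os.makedirs(dest_dir, exist_ok=True)
    index = index_images(search_root)
    if not index:
        return
    for name in os.listdir(query_dir):
        if not name.lower().endswith(EXTENSIONS):
            continue
        query_hash = imagehash.phash(Image.open(os.path.join(query_dir, name)))
        # Hash subtraction gives the Hamming distance between the two hashes
        best_path, best_hash = min(index.items(), key=lambda item: query_hash - item[1])
        if query_hash - best_hash <= max_distance:
            shutil.copy(best_path, dest_dir)

copy_best_matches('./poorQualityImages', './macComputer', './higherQualityImages')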
Premise
I'll focus my answer on the image processing part, as I believe implementation details, e.g. traversing a file system, are not the core of your problem. Also, all that follows is just my humble opinion; I am sure that there are better ways to retrieve your image of which I am not aware. Anyway, I agree with what your prof said and I'll follow the same line of thought, so I'll share some ideas on possible similarity indexes you might use.
Answer
MSE and SSIM - This is a possible solution, as suggested by your prof. As I assume the low quality images also have a different resolution than the good ones, remember to downsample the good ones (and not upsample the bad ones).
Image subtraction (1-norm distance) - Subtract the two images: if they are equal you'll get a black image; if they are slightly different, the non-black pixels (or the sum of the pixel intensities) can be used as a similarity index. This is actually the 1-norm distance (a small sketch of this and the histogram distance follows this list).
Histogram distance - You can refer to this paper: https://www.cse.huji.ac.il/~werman/Papers/ECCV2010.pdf. Comparing two images' histograms might be potentially robust for your task. Check out this question too: Comparing two histograms
Embedding learning - As I see you included tensorflow, keras or pytorch as tags, let's consider deep learning. This paper came to my mind: https://arxiv.org/pdf/1503.03832.pdf. The idea is to learn a mapping from the image space to a Euclidean space, i.e. to compute an embedding of the image. In the embedding hyperspace, images are points. This paper learns an embedding function by minimizing the triplet loss. The triplet loss is meant to maximize the distance between images of different classes and minimize the distance between images of the same class. You could train the same model on a dataset like ImageNet. You could augment the dataset by lowering the quality of the images, in order to make the model "invariant" to differences in image quality (e.g. down-sampling followed by up-sampling, image compression, adding noise, etc.). Once you can compute embeddings, you could compute the Euclidean distance (as a substitute for the MSE). This might work better than using MSE/SSIM as similarity indexes. Repo of FaceNet: https://github.com/timesler/facenet-pytorch. Another general purpose approach (not related to faces) which might help you: https://github.com/zegami/image-similarity-clustering.
Siamese networks for predicting similarity score - I am referring to this paper on face verification: http://bmvc2018.org/contents/papers/0410.pdf. The siamese network takes two images as input and outputs a value in [0, 1]. We can interpret the output as the probability that the two images belong to the same class. You can train a model of this kind to predict 1 for image pairs of the following kind: (good quality image, artificially degraded image). To degrade the image, again, you can combine e.g. down-sampling followed by up-sampling, image compression, adding noise, etc. Let the model predict 0 for image pairs of different classes (e.g. different images). The output of the network can be used as a similarity index.
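Since points 2 and 3 are the easiest to try, here is a small sketch of both (my own illustration; the fixed resize resolution and the bin count are arbitrary choices):

import numpy as np
from PIL import Image

def load_gray(path, size=(256, 256)):
    # Common size and greyscale so the images can be compared pixel by pixel
    return np.asarray(Image.open(path).convert('L').resize(size), dtype=np.float64)

def l1_distance(path_a, path_b):
    a, b = load_gray(path_a), load_gray(path_b)
    return np.abs(a - b).mean()  # 0 for identical images, larger means more different

def histogram_distance(path_a, path_b, bins=64):
    a, b = load_gray(path_a), load_gray(path_b)
    ha, _ = np.histogram(a, bins=bins, range=(0, 255), density=True)
    hb, _ = np.histogram(b, bins=bins, range=(0, 255), density=True)
    return np.abs(ha - hb).sum()  # L1 distance between the normalized histograms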
Remark 1
These different approaches can also be combined. They all provide you with similarity indexes, so you can very easily average the outcomes.
Remark 2
If you only need to do it once, the effort you need to put into implementing and training deep models might not be justified. I would not suggest it. Still, you can consider it if you can't find any other solution and that Mac is REALLY FULL of images and a manual search is not possible.
If you look at the documentation of imgdupes you will see there is the following option:
--dry-run
dry run (do not delete any files)
So if you run imgdupes with --dry-run you will get a listing of all the duplicate images but it will not actually delete anything. You should be able to process that output to move the images around as you need.
Try the similar image finder I have developed to address this problem.
There is an explanation and the algorithm there, so you can implement your own version if needed.
I'm currently writing an SPH Solver using CUDA on https://github.com/Mathiasb17/sph_opengl.
I have pretty good results and performance, but in my mind they still seem pretty weird for some reason:
https://www.youtube.com/watch?v=_DdHN8qApns
https://www.youtube.com/watch?v=Afgn0iWeDoc
In some implementations, I saw that a particle does not contribute to its own internal forces (which would be 0 anyway due to the formulas), but it does contribute to its own density.
My simulations work "pretty fine" (I don't like "pretty fine", I want it perfect), and in my implementation a particle does not contribute to its own density.
Besides, when I change the code so it does contribute to its own density, the resulting simulation becomes way too unstable (particles explode).
I asked this to a lecturer in physics based animation, he told me a particle should not contribute to its density, but did not give me specific details about this assertion.
Any idea of how it should be?
As long as you calculate the density with the summation formula instead of the continuity equation, yes, you need to include the particle's self-contribution.
Here is why:
SPH is an interpolation scheme, which allows you to interpolate a specific value at any position in space over a particle cloud. Any position means you are not restricted to evaluating it on a particle, but anywhere in space. If you do so, obviously you need to consider all particles within the influence radius. From this point of view, it is easy to see that when interpolating a quantity at a particle's own position, that particle is simply one of the particles within the influence radius, so its own contribution must be included.
For other quantities like forces, where the derivative of some quantity is approximated, you don't need to apply self-contribution (that would lead to the evaluation of 0/0).
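For reference, the density summation in question (standard SPH notation; this formula is my addition, not part of the original answer) is

\rho_i = \sum_j m_j \, W(\lVert \mathbf{r}_i - \mathbf{r}_j \rVert, h),

where the sum runs over all particles within the kernel support, including j = i; the self-contribution is simply m_i W(0, h).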
To discover the source of the instability:
check if the kernel is normalised (see the condition below this list)
are the stiffness of the liquid and the time step size compatible (for the weakly compressible case)?
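Regarding the first check, the normalisation condition (my formulation, not the answerer's) is

\int W(\mathbf{r}, h)\, d\mathbf{r} = 1,

and a quick discrete sanity check is that \sum_j \frac{m_j}{\rho_j} W(\lVert \mathbf{r}_i - \mathbf{r}_j \rVert, h) \approx 1 for particles in the bulk of the fluid (it drops below 1 near free surfaces).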
In many models the number of channels is kept in powers of 2. Also the batch-sizes are described in powers of 2. Is there any reason behind this design choice?
There is nothing significant about keeping the number of channels and the batch size as powers of 2. You can use any number you want.
While both could probably be optimized for speed (cache-alignment? optimal usage of CUDA cores?) to be powers of two, I am 95% certain that 99.9% do it because others used the same numbers / it worked.
For both hyperparameters you could choose any positive integer. So what would you try? Keep in mind, each complete evaluation takes at least several hours. Hence I guess if people play with these parameters, they do something like a binary search: starting from one number, they keep doubling it as long as results improve, until an upper bound is found. At some point the differences are minor and then it is irrelevant what you choose. And people will wonder less if you write that you used a batch size of 64 than if you write that you used 50. Or 42.
I have a set of data, 2D matrices (like greyscale pictures), and I use a CNN as the classifier.
I would like to know if there is any study/experience on the accuracy impact of changing the encoding from the traditional encoding.
I suppose yes; the question is rather which transformations of the encoding leave the accuracy invariant and which ones deteriorate it.
To clarify, this concerns mainly the quantization process of the raw data into input data.
EDIT:
Quantizing the raw data into input data is already a pre-processing of the data, adding or removing some features (even minor ones). The impact of this quantization process on accuracy in real DNN computation does not seem very clear. Maybe there is some research available.
I'm not aware of any research specifically dealing with quantization of input data, but you may want to check out some related work on quantization of CNN parameters: http://arxiv.org/pdf/1512.06473v2.pdf. Depending on what your end goal is, the "Q-CNN" approach may be useful for you.
My own experience with using various quantizations of the input data for CNNs has been that there's a heavy dependency between the degree of quantization and the model itself. For example, I've played around with using various interpolation methods to reduce image sizes and reducing the color palette size, and in the end, I discovered that each variant required a different tuning of hyper-parameters to achieve optimal results. Generally, I found that minor quantization of data had a negligible impact, but there was a knee in the curve where throwing away additional information dramatically impacted the achievable accuracy. Unfortunately, I'm not aware of any way to determine what degree of quantization will be optimal without experimentation, and even deciding what's optimal involves a trade-off between efficiency and accuracy which doesn't necessarily have a one-size-fits-all answer.
On a theoretical note, keep in mind that CNNs need to be able to find useful, spatially-local features, so it's probably reasonable to assume that any encoding that disrupts the basic "structure" of the input would have a significantly detrimental effect on the accuracy achievable.
In usual practice -- a discrete classification task in classic implementation -- it will have no effect. However, the critical point is in the initial computations for back-propagation. The classic definition depends only on strict equality of the predicted and "base truth" classes: a simple right/wrong evaluation. Changing the class coding has no effect on whether or not a prediction is equal to the training class.
However, this function can be altered. If you change the code to have something other than a right/wrong scoring, something that depends on the encoding choice, then encoding changes can most definitely have an effect. For instance, if you're rating movies on a 1-5 scale, you likely want 1 vs 5 to contribute a higher loss than 4 vs 5.
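A toy illustration of this point (mine, not the answerer's): zero-one accuracy ignores the class coding entirely, while an ordinal loss on a 1-5 rating scale does not.

import numpy as np

y_true = np.array([5, 4, 1, 3])
y_pred = np.array([4, 4, 5, 3])

zero_one_accuracy = np.mean(y_true == y_pred)    # 0.5, unchanged by relabelling the classes
ordinal_loss = np.mean(np.abs(y_true - y_pred))  # 1.25, penalises 1-vs-5 far more than 4-vs-5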
Does this reasonably deal with your concerns?
I see now. My answer above is useful ... but not for what you're asking. I had my eye on the classification encoding; you're wondering about the input.
Please note that asking for off-site resources is a classic off-topic question category. I am unaware of any such research -- for what little that is worth.
Obviously, there should be some effect, as you're altering the input data. The effect would be dependent on the particular quantization transformation, as well as the individual application.
I do have some limited-scope observations from general big-data analytics.
In our typical environment, where the data were scattered with some inherent organization within their natural space (F dimensions, where F is the number of features), we often use two simple quantization steps: (1) Scale all feature values to a convenient integer range, such as 0-100; (2) Identify natural micro-clusters, and represent all clustered values (typically no more than 1% of the input) by the cluster's centroid.
This speeds up analytic processing somewhat. Given the fine-grained clustering, it has little effect on the classification output. In fact, it sometimes improves the accuracy minutely, as the clustering provides wider gaps among the data points.
Take with a grain of salt, as this is not the main thrust of our efforts.
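For what it's worth, a rough sketch of those two steps as I read the description above (my own code, using scikit-learn's DBSCAN as one possible way to find micro-clusters; the eps value is an arbitrary assumption):

import numpy as np
from sklearn.cluster import DBSCAN

def quantize(X, eps=1.0):
    # Step 1: scale every feature to the integer range 0-100
    mins, maxs = X.min(axis=0), X.max(axis=0)
    scaled = np.round(100.0 * (X - mins) / np.maximum(maxs - mins, 1e-12))

    # Step 2: collapse tight micro-clusters onto their centroids
    labels = DBSCAN(eps=eps, min_samples=2).fit_predict(scaled)
    out = scaled.copy()
    for label in np.unique(labels):
        if label == -1:
            continue  # points not in any micro-cluster stay as they are
        mask = labels == label
        out[mask] = np.round(scaled[mask].mean(axis=0))
    return out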
I know where the chi-square distribution comes from, and I also know how to apply the chi-square test.
However, I can't figure out why the chi-square test can be used to check for a significant difference between the expected frequencies and the observed frequencies.
Because the distribution of the chi-squared statistic approaches the chi-squared distribution as the size of the sample grows. The true distribution of this statistic is complex and depends on the base distribution (the distribution whose fit we are checking). But with large samples it approaches the chi-squared distribution for any base distribution (this is one of the central limit theorems). Therefore, we may not care about the base distribution and can use the chi-squared test universally, provided that the sample size is large enough.
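For completeness, the statistic in question (my addition) is

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},

where O_i and E_i are the observed and expected frequencies in cell i; as the sample size grows, its distribution approaches a chi-squared distribution with k - 1 degrees of freedom (minus one more for each parameter estimated from the data), regardless of the base distribution.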