Tiled Instance Normalization - tensorflow

I am currently implementing a few image style transfer algorithms in TensorFlow, but I would like to do it in tiles, so I don't have to run the entire image through the network at once. Everything works fine; however, each tile is normalized differently, according to its own statistics, which results in tiles with slightly different characteristics.
I am certain that the only issue is instance normalization, since if I feed the true values (obtained from the entire image) to each tile calculation the result is perfect. However, I would still have to run the entire image through the network to calculate these values. I also tried calculating these values from a downsampled version of the image, but the results suffer a lot.
So my question is: is it possible to estimate mean and variance values for instance normalization without feeding the entire image through the network?

You can take a random sample of the pixels of the image, and use the sample mean and sample variance to normalize the whole image. It will not be perfect, but the larger the sample, the better. A few hundred pixels will probably suffice, maybe even less, but you need to experiment.
Use tf.random_uniform() to get random X and Y coordinates, and then use tf.gather_nd() to get the pixel values at the given coordinates.
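For example, here is a minimal sketch of that idea using the TF 1.x API (the function name and sample count are illustrative choices, not from any particular library):

```python
import tensorflow as tf

def sampled_moments(image, num_samples=512):
    """Estimate per-channel mean/variance from a random sample of pixels.

    image: a [H, W, C] tensor; num_samples is a guess, tune it as suggested above.
    """
    h = tf.shape(image)[0]
    w = tf.shape(image)[1]
    # Random X and Y coordinates via tf.random_uniform().
    ys = tf.random_uniform([num_samples], maxval=h, dtype=tf.int32)
    xs = tf.random_uniform([num_samples], maxval=w, dtype=tf.int32)
    coords = tf.stack([ys, xs], axis=1)            # [num_samples, 2]
    # Pixel values at those coordinates via tf.gather_nd().
    pixels = tf.gather_nd(image, coords)           # [num_samples, C]
    mean, variance = tf.nn.moments(pixels, axes=[0])
    return mean, variance
```

You would then feed the returned mean and variance into every tile's instance-normalization step instead of computing per-tile statistics.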

Related

Reverse Image search (for image duplicates) on local computer

I have a bunch of poor quality photos that I extracted from a pdf. Somebody I know has the good quality photos somewhere on her computer (a Mac), but it's my understanding that it will be difficult to find them.
I would like to:
1. loop through each poor quality photo
2. perform a reverse image search using each poor quality photo as the query image and this person's computer as the database to search for the higher quality images
3. create a copy of each high quality image in one destination folder
Example pseudocode:
for each image in poorQualityImages:
    search ./macComputer for a higherQualityImage of image
    copy higherQualityImage to ./higherQualityImages
I need to perform this action once.
I am looking for a tool, github repo or library which can perform this functionality more so than a deep understanding of content based image retrieval.
There's a post on reddit where someone was trying to do something similar
imgdupes is a program which seems like it almost achieves this, but I do not want to delete the duplicates, I want to copy the highest quality duplicate to a destination folder
Update
Emailed my previous image processing prof and he sent me this
Off the top of my head, nothing out of the box.
No guaranteed solution here, but you can narrow the search space. You'd need a little program that outputs the MSE or SSIM similarity index between two images, and then write another program or shell script that scans the hard drive and computes the MSE between each image on the hard drive and each query image, then check the images with the top X percent similarity score.
Something like that. Still not maybe guaranteed to find everything you want. And if the low quality images are of different pixel dimensions than the high quality images, you'd have to do some image scaling to get the similarity index. If the poor quality images have different aspect ratios, that's even worse.
So I think it's not hard but not trivial either. The degree of difficulty is partly dependent on the nature of the corruption in the low quality images.
UPDATE
Github project I wrote which achieves what I want
What you are looking for is called image hashing. In this answer you will find a basic explanation of the concept, as well as a go-to github repo for a plug-and-play application.
Basic concept of Hashing
From the repo page: "We have developed a new image hash based on the Marr wavelet that computes a perceptual hash based on edge information with particular emphasis on corners. It has been shown that the human visual system makes special use of certain retinal cells to distinguish corner-like stimuli. It is the belief that this corner information can be used to distinguish digital images that motivates this approach. Basically, the edge information attained from the wavelet is compressed into a fixed length hash of 72 bytes. Binary quantization allows for relatively fast hamming distance computation between hashes. The following scatter plot shows the results on our standard corpus of images. The first plot shows the distances between each image and its attacked counterpart (e.g. the intra distances). The second plot shows the inter distances between altogether different images. While the hash is not designed to handle rotated images, notice how slight rotations still generally fall within a threshold range and thus can usually be matched as identical. However, the real advantage of this hash is for use with our mvp tree indexing structure. Since it is more descriptive than the dct hash (being 72 bytes in length vs. 8 bytes for the dct hash), there are much fewer false matches retrieved for image queries."
Another blogpost for an in-depth read, with an application example.
Available Code and Usage
A github repo can be found here. There are obviously more to be found.
After importing the package you can use it to generate and compare hashes:
>>> from PIL import Image
>>> import imagehash
>>> hash = imagehash.average_hash(Image.open('test.png'))
>>> print(hash)
d879f8f89b1bbf
>>> otherhash = imagehash.average_hash(Image.open('other.bmp'))
>>> print(otherhash)
ffff3720200ffff
>>> print(hash == otherhash)
False
>>> print(hash - otherhash)
36
The demo script find_similar_images, also in the mentioned github repo, illustrates how to find similar images in a directory.
Premise
I'll focus my answer on the image processing part, as I believe implementation details (e.g. traversing a file system) are not the core of your problem. Also, all that follows is just my humble opinion; I am sure there are better ways to retrieve your images of which I am not aware. Anyway, I agree with what your prof said, and I'll follow the same line of thought, so I'll share some ideas on possible similarity indexes you might use.
Answer
MSE and SSIM - This is a possible solution, as suggested by your prof. As I assume the low quality images also have a different resolution than the good ones, remember to downsample the good ones (and not upsample the bad ones).
Image subtraction (1-norm distance) - Subtract two images -> if they are equal you'll get a black image. If they are slightly different, the non-black pixels (or the sum of the pixel intensity) can be used as a similarity index. This is actually the 1-norm distance.
Histogram distance - You can refer to this paper: https://www.cse.huji.ac.il/~werman/Papers/ECCV2010.pdf. Comparing two images' histograms might be potentially robust for your task. Check out this question too: Comparing two histograms
Embedding learning - As I see you included tensorflow, keras or pytorch as tags, let's consider deep learning. This paper came to my mind: https://arxiv.org/pdf/1503.03832.pdf. The idea is to learn a mapping from the image space to a Euclidean space, i.e. compute an embedding of the image. In the embedding hyperspace, images are points. This paper learns an embedding function by minimizing the triplet loss. The triplet loss is meant to maximize the distance between images of different classes and minimize the distance between images of the same class. You could train the same model on a dataset like ImageNet. You could augment the dataset by lowering the quality of the images, in order to make the model "invariant" to differences in image quality (e.g. down-sampling followed by up-sampling, image compression, adding noise, etc.). Once you can compute embeddings, you can compute the Euclidean distance (as a substitute for the MSE). This might work better than using MSE/SSIM as similarity indexes. Repo of FaceNet: https://github.com/timesler/facenet-pytorch. Another general purpose approach (not related to faces) which might help you: https://github.com/zegami/image-similarity-clustering.
Siamese networks for predicting similarity score - I am referring to this paper on face verification: http://bmvc2018.org/contents/papers/0410.pdf. The siamese network takes two images as input and outputs a value in [0, 1]. We can interpret the output as the probability that the two images belong to the same class. You can train a model of this kind to predict 1 for image pairs of the following kind: (good quality image, artificially degraded image). To degrade the image, again, you can combine e.g. down-sampling followed by up-sampling, image compression, adding noise, etc. Let the model predict 0 for image pairs of different classes (e.g. different images). The output of the network can be used as a similarity index.
Remark 1
These different approaches can also be combined. They all provide you with similarity indexes, so you can very easily average the outcomes.
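For illustration, here is a minimal sketch of such a combination, assuming OpenCV and a recent scikit-image are available (the function name, the 256x256 working size and the 64-bin histogram are my own choices, not from the answer above):

```python
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim

def similarity_indexes(query_path, candidate_path, size=(256, 256)):
    """Toy sketch: combine MSE, SSIM and a histogram distance into one score."""
    a = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    b = cv2.imread(candidate_path, cv2.IMREAD_GRAYSCALE)
    # Resize both to a common size so the indexes are comparable.
    a = cv2.resize(a, size, interpolation=cv2.INTER_AREA)
    b = cv2.resize(b, size, interpolation=cv2.INTER_AREA)

    # MSE mapped to [0, 1], where 1 means identical.
    mse = np.mean((a.astype(np.float32) - b.astype(np.float32)) ** 2)
    mse_score = 1.0 - mse / 255.0 ** 2

    # SSIM, roughly in [0, 1] for natural images.
    ssim_score = ssim(a, b)

    # Histogram correlation, in [-1, 1].
    ha = cv2.calcHist([a], [0], None, [64], [0, 256])
    hb = cv2.calcHist([b], [0], None, [64], [0, 256])
    hist_score = cv2.compareHist(ha, hb, cv2.HISTCMP_CORREL)

    # Naive average of the three indexes.
    return (mse_score + ssim_score + hist_score) / 3.0
```

You could then rank every candidate image on the Mac by this score and copy the top match for each query image.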
Remark 2
If you only need to do it once, the effort you need to put into implementing and training deep models might not be justified. I would not suggest it. Still, you can consider it if you can't find any other solution and that Mac is REALLY FULL of images and a manual search is not possible.
If you look at the documentation of imgdupes you will see there is the following option:
--dry-run
dry run (do not delete any files)
So if you run imgdupes with --dry-run you will get a listing of all the duplicate images but it will not actually delete anything. You should be able to process that output to move the images around as you need.
Try the similar image finder I have developed to address this problem.
There is an explanation and the algorithm there, so you can implement your own version if needed.

Why does SSD resize random crops during data augmentation?

The SSD paper details its random-crop data augmentation scheme as:
Data augmentation: To make the model more robust to various input object sizes and shapes, each training image is randomly sampled by one of the following options:
– Use the entire original input image.
– Sample a patch so that the minimum jaccard overlap with the objects is 0.1, 0.3, 0.5, 0.7, or 0.9.
– Randomly sample a patch.
The size of each sampled patch is [0.1, 1] of the original image size, and the aspect ratio is between 1/2 and 2. We keep the overlapped part of the ground truth box if the center of it is in the sampled patch. After the aforementioned sampling step, each sampled patch is resized to fixed size and is horizontally flipped with probability of 0.5, in addition to applying some photo-metric distortions similar to those described in [14].
https://arxiv.org/pdf/1512.02325.pdf
My question is: what is the reasoning for resizing crops that range in aspect ratios between 0.5 and 2.0?
For instance if your input image is 300x300, reshaping a crop with AR=2.0 back to square resolution will severely stretch objects (square features become rectangular, circles become ellipses, etc.) I understand small distortions may be good to improve generalization, but training the network on objects distorted up to 2x in either dimension seems counter-productive. Am I misunderstanding how random-crop works?
[Edit] I completely understand that augmented images need to be the same size as the original -- I'm more wondering why the authors don't fix the Aspect Ratio to 1.0 to preserve object proportions.
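For reference, here is a rough sketch of how I understand the sampling-and-resize step (it treats "size" as an area fraction; the jaccard-overlap constraints and ground-truth box handling are left out):

```python
import math
import random
import cv2

def ssd_style_random_crop(image, out_size=300):
    """Sample a patch with area fraction in [0.1, 1] and aspect ratio in
    [0.5, 2], then resize it to a fixed square (simplified sketch)."""
    h, w = image.shape[:2]
    for _ in range(50):                       # retry until a patch fits
        area = random.uniform(0.1, 1.0) * h * w
        ratio = random.uniform(0.5, 2.0)      # patch width / patch height
        ph = int(round(math.sqrt(area / ratio)))
        pw = int(round(math.sqrt(area * ratio)))
        if 0 < ph <= h and 0 < pw <= w:
            y = random.randint(0, h - ph)
            x = random.randint(0, w - pw)
            patch = image[y:y + ph, x:x + pw]
            # The resize to a fixed square is what stretches non-square patches.
            return cv2.resize(patch, (out_size, out_size))
    return cv2.resize(image, (out_size, out_size))
```

The final resize to a square is where a 2:1 patch gets squashed, which is exactly the distortion the question is about.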
GPU architecture forces us to use batches to speed up training, and all images in a batch must be the same size. Using less-distorted image crops could make each training sample more useful, but training would be much slower.
Personally, I consider that any transformation makes sense as long as you, as a human, can still identify the object/subject, and as long as it makes sense in the receptive field of the network. I also guess that the aspect-ratio changes might help the network learn some kind of perspective distortion (look at the cow in fig. 5, it's kind of "compressed"). Objects like a cup, a tree, or a chair are still identifiable even when stretched. Otherwise you could also argue that some point-controlled or skew transforms don't make sense either.
Then, if you are working with images other than natural images, without perspective, it is probably not a good idea to do so. If your images show objects of a fixed, known size, as in a microscope or other medical imaging device, and if your object has a more or less fixed size (let's say a cell), then it's probably not a good idea to apply strong scale distortion (like a cell twice as large); maybe a cell stretched into an ellipse actually makes more sense then.
With this library, you can perform strong augmentations, but not all of them make sense if you look at the image here:

Should the size of the photos be the same for deep learning?

I have lots of images (about 40 GB).
My images are small, but they don't all have the same size.
My images aren't of natural scenes; I made them from a signal, so all pixels are important and I can't crop or delete any pixels.
Is it possible to use deep learning for this kind of image with different shapes?
All pixels are important; please take this into consideration.
I want a model which does not depend on a fixed input image size. Is it possible?
Without knowing what you're trying to learn from the data, it's tough to give a definitive answer:
You could pad all the data at the beginning (or end) of the signal so the images are all the same size. This lets you keep all the important pixels, but adds irrelevant information to the image that the network will most likely ignore (a minimal padding sketch is shown after these options).
I've also had good luck with activations, where you take a pretrained network and pull features from the image at a certain part of the network regardless of size (as long as it's larger than the network input size). Then run them through a classifier.
https://www.mathworks.com/help/deeplearning/ref/activations.html#d117e95083
Or you could window your data, and only process smaller chunks at one time.
https://www.mathworks.com/help/audio/examples/cocktail-party-source-separation-using-deep-learning-networks.html
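Here is a minimal sketch of the padding option, assuming the images arrive as NumPy arrays and that target_h/target_w are the maximum height and width over the dataset (both assumptions, not part of the original answer):

```python
import numpy as np

def pad_to_size(img, target_h, target_w, fill=0):
    """Zero-pad a [H, W] or [H, W, C] array at the bottom/right so every
    image reaches a common size without dropping any original pixels."""
    pad_h = target_h - img.shape[0]
    pad_w = target_w - img.shape[1]
    assert pad_h >= 0 and pad_w >= 0, "target must be at least the image size"
    pads = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pads, mode="constant", constant_values=fill)
```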

transform a path along an arc

I'm trying to transform a path along an arc.
My project is running on OS X 10.8.2 and the painting is done via Core Animation in CALayers.
There is a waveform in my project which is painted by a path. There are about 200 sample points which are mirrored to the bottom side. These are painted 60 times per second and updated to a song position.
Please ignore the white line, it is just a rotation indicator.
What I am trying to achieve is drawing a waveform along an arc. "Up" should point to the middle. It does not need to go all the way around. The waveform should be painted along the green circle. Please take a look at the sketch provided below.
I'm not sure how to achieve this in a performant manner. There are many points per second that need coordinate correction.
I tried coming up with some ideas of my own:
1) There is the possibility to add linear transformations to paths, which, I think, will not help me here. The only thing I can think of is adding a point, rotating the path with a transformation, adding another point, rotating again, and so on. But this would be very slow, I think.
2) Drawing the path into an image and bending it would surely lead to image artifacts.
3) Maybe the best idea would be to precompute sample points on an arc, then save a vector to the center for each. Take the y-coordinates of the waveform, place them on the sample points and move them along the vector to the center.
But maybe I am just not seeing some kind of easy solution to this problem. Help is really appreciated and fresh ideas are very welcome. Thank you in advance!
IMHO, the most efficient way to go (in terms of CPU usage) would be to use some form of pre-computed approach that would take into account the resolution of the display.
Cleverly precomputed values
I would go for the mathematical transformation (from linear to polar) and combine two facts:
There is no need to perform expensive mathematical computation
There is no need to render two points that are too close to each other
I have no ready-made algorithm for you, but you could use a pre-computed sin or cos table, and match the data range to the display size in order to work with integers.
For instance imagine we have some data ranging from 0 to 1E6 and we need to display the sin value of each point in a 100 pix height rectangle. We can use a pre-computed sin table and work with integers. This way displaying the sin value of a point would be much quicker. This concept can be refined to get a nicer result.
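As a rough illustration of the precomputed approach (written as Python-style pseudocode rather than Core Animation code; the arc span, radius and sample count are assumptions):

```python
import math

NUM_SAMPLES = 200                      # waveform samples per frame (from the question)
ARC_START = math.radians(210)          # assumed arc span along the green circle
ARC_END = math.radians(330)
BASE_RADIUS = 100.0                    # assumed radius of the green circle

# One-time precomputation: an outward unit vector for every sample position.
angles = [ARC_START + (ARC_END - ARC_START) * i / (NUM_SAMPLES - 1)
          for i in range(NUM_SAMPLES)]
unit_vecs = [(math.cos(a), math.sin(a)) for a in angles]

def arc_points(amplitudes, cx, cy):
    """Per frame: each waveform value becomes a radial offset from the circle,
    so only one multiply-add per coordinate is needed at 60 fps."""
    return [(cx + ux * (BASE_RADIUS + amp), cy + uy * (BASE_RADIUS + amp))
            for (ux, uy), amp in zip(unit_vecs, amplitudes)]
```

Mirroring the waveform to the inside of the circle is then just a matter of using BASE_RADIUS - amp for the second half of the path.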
Also, there are some ways to retain only significant points of a curve so that the displayed curve actually looks like the original (see the Ramer–Douglas–Peucker algorithm on wikipedia). But I found it to be inefficient for quickly displaying ever-changing data.
Using multicore rendering
You could compute different areas of the curve using multiple cores (can be tricky)
Or you could use pre-computing on several cores, and one core to finish the job.

Improving Speed of Histogram Back Projection

I am currently using OpenCV's built-in patch-based histogram back projection (cv::calcBackProjectPatch()) to identify regions of a target material in an image. With an image resolution of 640 x 480 and a window size of 10 x 10, processing a single image requires ~1200 ms. While the results are great, this is far too slow for a real-time application (which should have a processing time of no more than ~100 ms).
I have already tried reducing the window size and switching from CV_COMP_CORREL to CV_COMP_INTERSECT to speed up the processing, but have not seen any appreciable speed up. This may be explained by the OpenCV documentation (emphasis mine):
Each new image is measured and then converted into an image array over a chosen ROI. Histograms are taken from this image in an area covered by a "patch" with an anchor at its center, as shown in the picture below. The histogram is normalized using the parameter norm_factor so that it may be compared with hist. The calculated histogram is compared to the model histogram hist using the function cvCompareHist() with the comparison method=method. The resulting output is placed at the location corresponding to the patch anchor in the probability image dst. This process is repeated as the patch is slid over the ROI. Iterative histogram update by subtracting trailing pixels covered by the patch and adding newly covered pixels to the histogram can save a lot of operations, though it is not implemented yet.
This leaves me with a few questions:
Is there another library that supports iterative histogram updates?
How significant of a speed-up should I expect from using an iterative update?
Are there any other techniques for speeding up this type of operation?
As mentioned, integral histograms will definitely improve speed in OpenCV.
Please take a look at a sample implementation at the following link:
http://smsoftdev-solutions.blogspot.com/2009/08/integral-histogram-for-fast-calculation.html
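For intuition, here is a minimal sketch of the integral-histogram idea in NumPy (the grayscale input and bin count are assumptions; the linked post has a full implementation):

```python
import numpy as np

def integral_histogram(gray, n_bins=16):
    """Build a cumulative (integral) histogram: cell [y, x, b] holds the count
    of bin b over the rectangle from (0, 0) up to, but not including, (y, x)."""
    bins = (gray.astype(np.int32) * n_bins) // 256           # bin index per pixel
    one_hot = np.eye(n_bins, dtype=np.int32)[bins]           # [H, W, n_bins]
    return np.pad(one_hot, ((1, 0), (1, 0), (0, 0))).cumsum(0).cumsum(1)

def window_histogram(integral, y, x, h, w):
    """Histogram of any h x w window in O(n_bins), independent of window size."""
    return (integral[y + h, x + w] - integral[y, x + w]
            - integral[y + h, x] + integral[y, x])
```

With the integral built once per frame, every 10 x 10 window's histogram costs only a few additions per bin, instead of being rebuilt from all 100 pixels at each patch position.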