Is there any available DM script that can compare two images and know the difference - dm-script

Is there any available DM script that can compare two images and know the difference?
I mean the script can compare two or more images, and it can determine the similarity of two images, for example the 95% area of one image is same as another image, then the similarity of these two images is 95%.
The script can compare brightness and contrast distribution of images.
Thanks,

This question is a bit ill-defined, as "similarity" between images depends a lot on what you want.
If by "95% of the area is the same" you mean that 95% of the pixels are of identical value in images A & B, you can simply create a mask and sum() it to count the number of pixels, i.e.:
sum( abs(A-B)==0 ? 1 : 0 )
However, this will utterly fail if the images A & B are shifted with respect to each other even by a single pixel. It will also fail, if A & B are of same contrast but different absolute value.

I guess the intended question was to find similarity of two images in a fuzzy way.
For these, one way is to do crosscorrelation. DM has this function. Like this,
image xcorr= CrossCorrelate(ref,img)
From xcorr, the peak position gives x- and y- shift between the two, the peak intensity gives "similarity" of the two.
If you know there is no shift between the two, you can just do the sum and multiplication,
number similarity1=sum(img1*img2)
Another way to do similarity is calculate Euclidian distance of the two:
number similarity2=sqrt(sum((img1-img2)**2)).
"similarity2" calculates the "pure" similarity. "similarity1" is the pure similarity plus the mean intensity of img1 and img2. The difference is essentially this,
(a-b)**2=a**2+b**2-2*a*b.
The left term is "similarity2", the last term on the right is the "crosscorrelation" or "similarity1".
I think "similarity1" is called cross-correlation, "similarity2" is called correlation coefficient.
In example comparing two diffraction patterns, if you want to compute the degree of similarity, use "similarity2". If you want to compute the degree of similarity plus a certain character of the diffraction pattern, use "similarity1".

Related

How to perform non-maximum supression of keypoints conditioned on another channel

The standard non maximum suppression or finding peaks in a 2D image involves two steps:
Do a max pool with a kernel size, as maxpooled_image.
Then select the pixels where pixel_value == maxpooled_image value
However, let us say I have an additional channel, value2. Consider two strong pixels that belong in one NMS window. Now, in the standard case, only one of these pixels will be chosen. However, I'd like to add an additional condition that if the value2 are sufficiently different by some threshold (dth), then select both pixels, but if the difference between the value2 value of pixel1 and pixel2 is small, then pick only the brighter pixel.
How do I achieve this? in numpy

Determine the running time of an algorithm with two parameters

I have implemented an algorithm that uses two other algorithms for calculating the shortest path in a graph: Dijkstra and Bellman-Ford. Based on the time complexity of the these algorithms, I can calculate the running time of my implementation, which is easy giving the code.
Now, I want to experimentally verify my calculation. Specifically, I want to plot the running time as a function of the size of the input (I am following the method described here). The problem is that I have two parameters - number of edges and number of vertices.
I have tried to fix one parameter and change the other, but this approach results in two plots - one for varying number of edges and the other for varying number of vertices.
This leads me to my question - how can I determine the order of growth based on two plots? In general, how can one experimentally determine the running time complexity of an algorithm that has more than one parameter?
It's very difficult in general.
The usual way you would experimentally gauge the running time in the single variable case is, insert a counter that increments when your data structure does a fundamental (putatively O(1)) operation, then take data for many different input sizes, and plot it on a log-log plot. That is, log T vs. log N. If the running time is of the form n^k you should see a straight line of slope k, or something approaching this. If the running time is like T(n) = n^{k log n} or something, then you should see a parabola. And if T is exponential in n you should still see exponential growth.
You can only hope to get information about the highest order term when you do this -- the low order terms get filtered out, in the sense of having less and less impact as n gets larger.
In the two variable case, you could try to do a similar approach -- essentially, take 3 dimensional data, do a log-log-log plot, and try to fit a plane to that.
However this will only really work if there's really only one leading term that dominates in most regimes.
Suppose my actual function is T(n, m) = n^4 + n^3 * m^3 + m^4.
When m = O(1), then T(n) = O(n^4).
When n = O(1), then T(n) = O(m^4).
When n = m, then T(n) = O(n^6).
In each of these regimes, "slices" along the plane of possible n,m values, a different one of the terms is the dominant term.
So there's no way to determine the function just from taking some points with fixed m, and some points with fixed n. If you did that, you wouldn't get the right answer for n = m -- you wouldn't be able to discover "middle" leading terms like that.
I would recommend that the best way to predict asymptotic growth when you have lots of variables / complicated data structures, is with a pencil and piece of paper, and do traditional algorithmic analysis. Or possibly, a hybrid approach. Try to break the question of efficiency into different parts -- if you can split the question up into a sum or product of a few different functions, maybe some of them you can determine in the abstract, and some you can estimate experimentally.
Luckily two input parameters is still easy to visualize in a 3D scatter plot (3rd dimension is the measured running time), and you can check if it looks like a plane (in log-log-log scale) or if it is curved. Naturally random variations in measurements plays a role here as well.
In Matlab I typically calculate a least-squares solution to two-variable function like this (just concatenates different powers and combinations of x and y horizontally, .* is an element-wise product):
x = log(parameter_x);
y = log(parameter_y);
% Find a least-squares fit
p = [x.^2, x.*y, y.^2, x, y, ones(length(x),1)] \ log(time)
Then this can be used to estimate running times for larger problem instances, ideally those would be confirmed experimentally to know that the fitted model works.
This approach works also for higher dimensions but gets tedious to generate, maybe there is a more general way to achieve that and this is just a work-around for my lack of knowledge.
I was going to write my own explanation but it wouldn't be any better than this.

Creating an MLP that learns based on GPS coordinates

I have some data that tells me the amount of hours water is available for particular towns.
You can see it here
I want to use train a Multilayer Perceptron based on that data, to take a set of coordinates and indicate the approximate number of hours for which that coordinate will have water.
Does this make sense?
If so, am I correct in saying, there has to be two input layers? One for lat and one for long. And the output layer should be the number of hours.
Would love some guidance.
I would solve that differently:
Just create an ArrayList of WaterInfo:
WaterInfo contains lat,lon, waterHours.
Then for a given coordinate search the closest WaterInfo in the list.
Since you have not many elements, just do a brute force search, to find the closest.
You further can optimize, to find the three closest WaterInfo points, and calculate the weithted average of WaterHours. As weight you use the air distance from current position to Waterinfo position.
To answer your question:
"Does this makes sense"?
From the goal to get a working solution: NO!
Ask yourself, why do you want to use MLP for this task.
Further i doubt that using two layers for lat / long makes sense.
A coordinate (lat/lon) is one point on the world, so that should be one layer in the model. You can convert the lat/lon coord to a cell identifier: Span a grid over Brazil; with cell width 10 or 50km; now convert a lat/long coordinate to a cellId: Like E4 on a chess board, you will calculate one integer value representing the cell. (There are other solutions to get an unique number, too, choose one you like)
Now you have a modell geoCellID -> waterHours, which better represents the real world situation.

How to depict multidimentional vectors on two-dinesional plot?

I have a set of vectors in multidimensional space (may be several thousands of dimensions). In this space, I can calculate distance between 2 vectors (as a cosine of the angle between them, if it matters). What I want is to visualize these vectors keeping the distance. That is, if vector a is closer to vector b than to vector c in multidimensional space, it also must be closer to it on 2-dimensional plot. Is there any kind of diagram that can clearly depict it?
I don't think so. Imagine any twodimensional picture of a tetrahedron. There is no way of depicting the four vertices in two dimensions with equal distances from each other. So you will have a hard time trying to depict more than three n-dimensional vectors in 2 dimensions conserving their mutual distances.
(But right now I can't think of a rigorous proof.)
Update:
Ok, second idea, maybe it's dumb: If you try and find clusters of closer associated objects/texts, then calculate the center or mean vector of each cluster. Then you can reduce the problem space. At first find a 2D composition of the clusters that preserves their relative distances. Then insert the primary vectors, only accounting for their relative distances within a cluster and their distance to the center of to two or three closest clusters.
This approach will be ok for a large number of vectors. But it will not be accurate in that there always will be somewhat similar vectors ending up at distant places.

Need help generating discrete random numbers from distribution

I searched the site but did not find exactly what I was looking for... I wanted to generate a discrete random number from normal distribution.
For example, if I have a range from a minimum of 4 and a maximum of 10 and an average of 7. What code or function call ( Objective C preferred ) would I need to return a number in that range. Naturally, due to normal distribution more numbers returned would center round the average of 7.
As a second example, can the bell curve/distribution be skewed toward one end of the other? Lets say I need to generate a random number with a range of minimum of 4 and maximum of 10, and I want the majority of the numbers returned to center around the number 8 with a natural fall of based on a skewed bell curve.
Any help is greatly appreciated....
Anthony
What do you need this for? Can you do it the craps player's way?
Generate two random integers in the range of 2 to 5 (inclusive, of course) and add them together. Or flip a coin (0,1) six times and add 4 to the result.
Summing multiple dice produces a normal distribution (a "bell curve"), while eliminating high or low throws can be used to skew the distribution in various ways.
The key is you are going for discrete numbers (and I hope you mean integers by that). Multiple dice throws famously generate a normal distribution. In fact, I think that's how we were first introduced to the Gaussian curve in school.
Of course the more throws, the more closely you approximate the bell curve. Rolling a single die gives a flat line. Rolling two dice just creates a ramp up and down that isn't terribly close to a bell. Six coin flips gets you closer.
So consider this...
If I understand your question correctly, you only have seven possible outcomes--the integers (4,5,6,7,8,9,10). You can set up an array of seven probabilities to approximate any distribution you like.
Many frameworks and libraries have this built-in.
Also, just like TokenMacGuy said a normal distribution isn't characterized by the interval it's defined on, but rather by two parameters: Mean μ and standard deviation σ. With both these parameters you can confine a certain quantile of the distribution to a certain interval, so that 95 % of all points fall in that interval. But resticting it completely to any interval other than (−∞, ∞) is impossible.
There are several methods to generate normal-distributed values from uniform random values (which is what most random or pseudorandom number generators are generating:
The Box-Muller transform is probably the easiest although not exactly fast to compute. Depending on the number of numbers you need, it should be sufficient, though and definitely very easy to write.
Another option is Marsaglia's Polar method which is usually faster1.
A third method is the Ziggurat algorithm which is considerably faster to compute but much more complex to program. In applications that really use a lot of random numbers it may be the best choice, though.
As a general advice, though: Don't write it yourself if you have access to a library that generates normal-distributed random numbers for you already.
For skewing your distribution I'd just use a regular normal distribution, choosing μ and σ appropriately for one side of your curve and then determine on which side of your wanted mean a point fell, stretching it appropriately to fit your desired distribution.
For generating only integers I'd suggest you just round towards the nearest integer when the random number happens to fall within your desired interval and reject it if it doesn't (drawing a new random number then). This way you won't artificially skew the distribution (such as you would if you were clamping the values at 4 or 10, respectively).
1 In testing with deliberately bad random number generators (yes, worse than RANDU) I've noticed that the polar method results in an endless loop, rejecting every sample. Won't happen with random numbers that fulfill the usual statistic expectations to them, though.
Yes, there are sophisticated mathematical solutions, but for "simple but practical" I'd go with Nosredna's comment. For a simple Java solution:
Random random=new Random();
public int bell7()
{
int n=4;
for (int x=0;x<6;++x)
n+=random.nextInt(2);
return n;
}
If you're not a Java person, Random.nextInt(n) returns a random integer between 0 and n-1. I think the rest should be similar to what you'd see in any programming language.
If the range was large, then instead of nextInt(2)'s I'd use a bigger number in there so there would be fewer iterations through the loop, depending on frequency of call and performance requirements.
Dan Dyer and Jay are exactly right. What you really want is a binomial distribution, not a normal distribution. The shape of a binomial distribution looks a lot like a normal distribution, but it is discrete and bounded whereas a normal distribution is continuous and unbounded.
Jay's code generates a binomial distribution with 6 trials and a 50% probability of success on each trial. If you want to "skew" your distribution, simply change the line that decides whether to add 1 to n so that the probability is something other than 50%.
The normal distribution is not described by its endpoints. Normally it's described by it's mean (which you have given to be 7) and its standard deviation. An important feature of this is that it is possible to get a value far outside the expected range from this distribution, although that will be vanishingly rare, the further you get from the mean.
The usual means for getting a value from a distribution is to generate a random value from a uniform distribution, which is quite easily done with, for example, rand(), and then use that as an argument to a cumulative distribution function, which maps probabilities to upper bounds. For the standard distribution, this function is
F(x) = 0.5 - 0.5*erf( (x-μ)/(σ * sqrt(2.0)))
where erf() is the error function which may be described by a taylor series:
erf(z) = 2.0/sqrt(2.0) * Σ∞n=0 ((-1)nz2n + 1)/(n!(2n + 1))
I'll leave it as an excercise to translate this into C.
If you prefer not to engage in the exercise, you might consider using the Gnu Scientific Library, which among many other features, has a technique to generate random numbers in one of many common distributions, of which the Gaussian Distribution (hint) is one.
Obviously, all of these functions return floating point values. You will have to use some rounding strategy to convert to a discrete value. A useful (but naive) approach is to simply downcast to integer.