Normal Distribution function - objective-c

edit
So based on the answers so far (thanks for taking your time) I'm getting the sense that I'm probably NOT looking for a Normal Distribution function. Perhaps I'll try to re-describe what I'm looking to do.
Let's say I have an object that returns a number from 0 to 10, and that number controls "speed". However, instead of 10 being the top speed, I need 5 to be the top speed, and anything lower or higher should slow down accordingly (with easing, thus the bell curve).
I hope that's clearer ;/
-original question
These are the times I wish I remembered something from math class.
I'm trying to figure out how to write a function in Obj-C where I define the boundaries, e.g. (0 - 10), and then if x = foo, y = ? ... where x runs something like 0,1,2,3,4,5,6,7,8,9,10 and y runs 0,1,2,3,4,5,4,3,2,1,0, but only on a curve.
Something like the attached image.
I tried googling for Normal Distribution but it's way over my head. I was hoping to find some site that lists useful algorithms like these, but wasn't very successful.
So can anyone help me out here? And if there are some good sites that show useful mathematical functions, I'd love to check them out.
TIA!!!
-added
I'm not looking for a random number, I'm looking for.. ex: if x=0 y should be 0, if x=5 y should be 5, if x=10 y should be 0.... and all those other not so obvious in between numbers
alt text http://dizy.cc/slider.gif

Okay, your edit really clarifies things. You're not looking for anything to do with the normal distribution, just a nice smooth little ramp function. The one Paul provides will do nicely, but is tricky to modify for other values. It can be made a little more flexible (my code examples are in Python, which should be very easy to translate to any other language):
def quarticRamp(x, b=10, peak=5):
    if not 0 <= x <= b:
        raise ValueError #or return 0
    return peak*x*x*(x-b)*(x-b)*16/(b*b*b*b)
Parameter b is the upper bound for the region you want to have a slope on (10, in your example), and peak is how high you want it to go (5, in the example).
Personally I like a quadratic spline approach, which is marginally cheaper computationally and has a different curve to it (this curve is really nice to use in a couple of special applications that don't happen to matter at all for you):
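def quadraticSplineRamp(x, a=0, b=10, peak=5):
    if not a <= x <= b:
        raise ValueError #or return 0
    if x > (b+a)/2:
        x = a + b - x  # mirror the right half onto the left half
    z = 2*(x-a)/(b-a)  # 0 at a, 1 at the midpoint (divide by the width, not by b)
    if z > 0.5:
        return peak * (1 - 2*(z-1)*(z-1))
    else:
        return peak * (2*z*z)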
This is similar to the other function, but takes a lower bound a (0 in your example). The logic is a little more complex because it's a somewhat-optimized implementation of a piecewise function.
The two curves have slightly different shapes; you probably don't care what the exact shape is, and so could pick either. There are an infinite number of ramp functions meeting your criteria; these are two simple ones, but they can get as baroque as you want.
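As one more illustration of how many ramps fit the criteria, here is a minimal sketch of a raised-cosine bump. The name cosine_ramp and its parameters are made up for this example; it is not one of the functions discussed above, just another curve that is 0 at both ends, reaches peak in the middle, and eases in and out.

```python
import math

def cosine_ramp(x, a=0.0, b=10.0, peak=5.0):
    """Raised-cosine bump: 0 at a and b, peak at the midpoint."""
    if not a <= x <= b:
        return 0.0
    # Map [a, b] onto one full cosine period so the curve eases in and out.
    phase = 2.0 * math.pi * (x - a) / (b - a)
    return peak * 0.5 * (1.0 - math.cos(phase))

if __name__ == "__main__":
    for x in range(11):
        print(x, round(cosine_ramp(x), 3))
```

Unlike the quartic, this one costs a cosine per call, so it is a matter of taste rather than an improvement.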

The thing you want to plot is the probability density function (pdf) of the normal distribution. You can find it on the mighty Wikipedia.
Luckily, the pdf for a normal distribution is not difficult to implement - some of the other related functions are considerably worse because they require the error function.
To get a plot like you showed, you want a mean of 5 and a standard deviation of about 1.5. The median is obviously the centre, and figuring out an appropriate standard deviation given the left & right boundaries isn't particularly difficult.
A function to calculate the y value of the pdf given the x coordinate, standard deviation and mean might look something like:
/* needs #include <math.h> for sqrt, exp and M_PI */
double normal_pdf(double x, double mean, double std_dev) {
    return( 1.0/(sqrt(2*M_PI)*std_dev) *
            exp(-(x-mean)*(x-mean)/(2*std_dev*std_dev)) );
}
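Note that a pdf is normalized to have area 1, so with a standard deviation of 1.5 its peak is only about 0.27, not 5. If the goal is a bell shape whose maximum is a chosen value, one option is to drop the normalization factor and scale the exponential directly. A minimal Python sketch, where gaussian_speed and its default values are assumptions for illustration:

```python
import math

def gaussian_speed(x, mean=5.0, std_dev=1.5, peak=5.0):
    # Unnormalized Gaussian: exp(...) equals 1 at x == mean,
    # so the maximum of the curve is exactly `peak`.
    return peak * math.exp(-((x - mean) ** 2) / (2.0 * std_dev ** 2))

if __name__ == "__main__":
    for x in range(11):
        print(x, round(gaussian_speed(x), 3))
```

The curve never quite reaches 0 at x=0 or x=10; it only gets close, and how close depends on std_dev.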

A normal distribution is never equal to 0.
Please make sure that what you want to plot is indeed a
normal distribution.
If you're only looking for this bell shape (with the tangent and everything)
you can use the following formula:
x^2*(x-10)^2 for x between 0 and 10
0 elsewhere
(Divide by 125 if you need to have your peak at 5.)
double bell(double x) {
    if ((x < 10) && (x>0))
        return x*x*(x-10.)*(x-10.)/125.;
    else
        return 0.;
}

Well, there's good old Wikipedia, of course. And Mathworld.
What you want is a random number generator for "generating normally distributed random deviates". Since Objective C can call regular C libraries, you either need a C-callable library like the GNU Scientific Library, or for this, you can write it yourself following the description here.

Try simulating rolls of dice by generating random numbers between 1 and 6. If you add up the rolls from 5 independent dice rolls, you'll get a surprisingly good approximation to the normal distribution. You can roll more dice if you'd like and you'll get a better approximation.
Here's an article that explains why this works. It's probably more mathematical detail than you want, but you could show it to someone to justify your approach.
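A minimal sketch of the dice idea in Python, using a fixed seed so the run is repeatable. The function names and the roll count are assumptions for illustration. Keep in mind that this produces random samples whose histogram is bell-shaped; it does not give a deterministic y for each x.

```python
import random
from collections import Counter

def dice_sum(num_dice=5, rng=random):
    # Sum of independent fair six-sided dice.
    return sum(rng.randint(1, 6) for _ in range(num_dice))

def histogram(num_rolls=20000, num_dice=5, seed=0):
    # Count how often each total occurs over many simulated rolls.
    rng = random.Random(seed)
    return Counter(dice_sum(num_dice, rng) for _ in range(num_rolls))

if __name__ == "__main__":
    counts = histogram()
    for total in range(5, 31):
        print(f"{total:2d} {'#' * (counts[total] // 50)}")
```

The printed bars bulge around 17 and 18, the midpoint between the smallest total (5) and the largest (30).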

If what you want is the value of the probability density function, p(x), of a normal (Gaussian) distribution of mean mu and standard deviation sigma at x, the formula is
p(x) = exp( -((x-mu)^2)/(2*sigma^2) ) / (sigma * sqrt(2*pi))
where pi is the area of a circle divided by the square of its radius (approximately 3.14159...). Using the C standard library math.h, this is:
#include <math.h>
double normal_pdf(double x, double mu, double sigma) {
    double n = sigma * sqrt(2 * M_PI); // normalization factor
    double p = exp( -pow(x-mu, 2) / (2 * pow(sigma, 2)) ); // unnormalized pdf
    return p / n;
}
Of course, you can do the same in Objective-C.
For reference, see the Wikipedia or MathWorld articles.

It sounds like you want to write a function that yields a curve of a specific shape. Something like y = f(x), for x in [0:10]. You have a constraint on the max value of y, and a general idea of what you want the curve to look like (somewhat bell-shaped, y=0 at the edges of the x range, y=5 when x=5). So roughly, you would call your function iteratively with the x range, with a step that gives you enough points to make your curve look nice.
So you really don't need random numbers, and this has nothing to do with probability unless you want it to (as in, you want your curve to look like the outline of a normal distribution or something along those lines).
If you have a clear idea of what function will yield your desired curve, the code is trivial - a function to compute f(x) and a for loop to call it the desired number of times for the desired values of x. Plot the x,y pairs and you're done. So that's your algorithm - call a function in a for loop.
The contents of the routine implementing the function will depend on the specifics of what you want the curve to look like. If you need help on functions that might return a curve resembling your sample, I would direct you to the reading material in the other answers. :) However, I suspect that this is actually an assignment of some sort, and that you have been given a function already. If you are actually doing this on your own to learn, then I again echo the other reading suggestions.

y=-1*abs(x-5)+5
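This one-liner is a straight-sided tent, so it reproduces the y values listed in the question exactly but without any easing. A small sketch that generalizes it to other bounds (the name tent and its parameters are made up for this example):

```python
def tent(x, lo=0.0, hi=10.0, peak=5.0):
    # Linear rise to `peak` at the midpoint, then a linear fall.
    # With the defaults this is the same as 5 - abs(x - 5).
    mid = (lo + hi) / 2.0
    half_width = (hi - lo) / 2.0
    return max(0.0, peak - peak * abs(x - mid) / half_width)

print([tent(x) for x in range(11)])
```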

Related

Determine the running time of an algorithm with two parameters

I have implemented an algorithm that uses two other algorithms for calculating the shortest path in a graph: Dijkstra and Bellman-Ford. Based on the time complexity of the these algorithms, I can calculate the running time of my implementation, which is easy giving the code.
Now, I want to experimentally verify my calculation. Specifically, I want to plot the running time as a function of the size of the input (I am following the method described here). The problem is that I have two parameters - number of edges and number of vertices.
I have tried to fix one parameter and change the other, but this approach results in two plots - one for varying number of edges and the other for varying number of vertices.
This leads me to my question - how can I determine the order of growth based on two plots? In general, how can one experimentally determine the running time complexity of an algorithm that has more than one parameter?
It's very difficult in general.
The usual way you would experimentally gauge the running time in the single variable case is, insert a counter that increments when your data structure does a fundamental (putatively O(1)) operation, then take data for many different input sizes, and plot it on a log-log plot. That is, log T vs. log N. If the running time is of the form n^k you should see a straight line of slope k, or something approaching this. If the running time is like T(n) = n^{k log n} or something, then you should see a parabola. And if T is exponential in n you should still see exponential growth.
You can only hope to get information about the highest order term when you do this -- the low order terms get filtered out, in the sense of having less and less impact as n gets larger.
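The single-variable procedure can be sketched as follows. The workload (comparing every pair once) is a stand-in chosen for the demo, not anything from the question; the point is only the counter and the least-squares slope of log T against log N.

```python
import math

def count_pair_comparisons(n):
    # Stand-in workload (an assumption for the demo): touch every pair once
    # and count each touch as one fundamental operation.
    ops = 0
    for i in range(n):
        for j in range(i + 1, n):
            ops += 1
    return ops

def loglog_slope(sizes, costs):
    # Ordinary least-squares slope of log(cost) against log(size).
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in costs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

sizes = [100, 200, 400, 800]
costs = [count_pair_comparisons(n) for n in sizes]
print(loglog_slope(sizes, costs))
```

The slope comes out slightly above 2 because the count is n(n-1)/2, and the lower-order term still has a small effect at these sizes.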
In the two variable case, you could try to do a similar approach -- essentially, take 3 dimensional data, do a log-log-log plot, and try to fit a plane to that.
However this will only really work if there's really only one leading term that dominates in most regimes.
Suppose my actual function is T(n, m) = n^4 + n^3 * m^3 + m^4.
When m = O(1), then T(n) = O(n^4).
When n = O(1), then T(m) = O(m^4).
When n = m, then T(n) = O(n^6).
In each of these regimes, "slices" along the plane of possible n,m values, a different one of the terms is the dominant term.
So there's no way to determine the function just from taking some points with fixed m, and some points with fixed n. If you did that, you wouldn't get the right answer for n = m -- you wouldn't be able to discover "middle" leading terms like that.
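To see the three regimes numerically, one can measure the apparent exponent (the log-log slope between two sizes) along each slice of the example function. This is only a sketch; local_slope and the sample sizes are assumptions for illustration.

```python
import math

def T(n, m):
    # The example cost function from the text.
    return n**4 + n**3 * m**3 + m**4

def local_slope(f, lo, hi):
    # Slope of log f between two sizes, i.e. the apparent exponent.
    return (math.log(f(hi)) - math.log(f(lo))) / (math.log(hi) - math.log(lo))

fixed_m = local_slope(lambda n: T(n, 1), 1000, 2000)   # vary n, m = 1
fixed_n = local_slope(lambda m: T(1, m), 1000, 2000)   # vary m, n = 1
diagonal = local_slope(lambda k: T(k, k), 1000, 2000)  # vary both, n = m
print(fixed_m, fixed_n, diagonal)
```

Both fixed slices report an exponent near 4, while the diagonal reports one near 6, which neither of the two single-parameter plots would reveal.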
I would recommend that the best way to predict asymptotic growth when you have lots of variables / complicated data structures, is with a pencil and piece of paper, and do traditional algorithmic analysis. Or possibly, a hybrid approach. Try to break the question of efficiency into different parts -- if you can split the question up into a sum or product of a few different functions, maybe some of them you can determine in the abstract, and some you can estimate experimentally.
Luckily, two input parameters are still easy to visualize in a 3D scatter plot (the third dimension is the measured running time), and you can check whether it looks like a plane (in log-log-log scale) or is curved. Naturally, random variations in the measurements play a role here as well.
In Matlab I typically calculate a least-squares solution to two-variable function like this (just concatenates different powers and combinations of x and y horizontally, .* is an element-wise product):
x = log(parameter_x);
y = log(parameter_y);
% Find a least-squares fit
p = [x.^2, x.*y, y.^2, x, y, ones(length(x),1)] \ log(time)
Then this can be used to estimate running times for larger problem instances, ideally those would be confirmed experimentally to know that the fitted model works.
This approach works also for higher dimensions but gets tedious to generate, maybe there is a more general way to achieve that and this is just a work-around for my lack of knowledge.
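For readers without Matlab, here is a dependency-free Python sketch of the simplest version of this fit: a plane log(time) = a*log(n) + b*log(m) + c, without the quadratic terms used above. The helper names and the synthetic timings are assumptions for illustration, and the normal equations are used only because the system is tiny.

```python
import math

def solve_linear(A, b):
    # Gaussian elimination with partial pivoting on a small dense system.
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_plane(ns, ms, times):
    # Least squares for log(time) = a*log(n) + b*log(m) + c via normal equations.
    rows = [[math.log(n), math.log(m), 1.0] for n, m in zip(ns, ms)]
    ys = [math.log(t) for t in times]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear(AtA, Aty)

# Synthetic measurements (an assumption for the demo): time = 3 * n^2 * m.
grid = [(n, m) for n in (10, 20, 40, 80) for m in (5, 10, 20, 40)]
ns = [n for n, _ in grid]
ms = [m for _, m in grid]
times = [3.0 * n**2 * m for n, m in grid]
a, b, c = fit_plane(ns, ms, times)
print(a, b, math.exp(c))
```

The fitted a and b are the apparent exponents of n and m. If real measurements leave large, structured residuals around the plane, that is a hint that more than one term matters.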
I was going to write my own explanation but it wouldn't be any better than this.

scipy.optimize.fmin_l_bfgs_b returns 'ABNORMAL_TERMINATION_IN_LNSRCH'

I am using scipy.optimize.fmin_l_bfgs_b to solve a gaussian mixture problem. The means of mixture distributions are modeled by regressions whose weights have to be optimized using EM algorithm.
sigma_sp_new, func_val, info_dict = fmin_l_bfgs_b(func_to_minimize, self.sigma_vector[si][pj],
args=(self.w_vectors[si][pj], Y, X, E_step_results[si][pj]),
approx_grad=True, bounds=[(1e-8, 0.5)], factr=1e02, pgtol=1e-05, epsilon=1e-08)
But sometimes I got a warning 'ABNORMAL_TERMINATION_IN_LNSRCH' in the information dictionary:
func_to_minimize value = 1.14462324063e-07
information dictionary: {'task': b'ABNORMAL_TERMINATION_IN_LNSRCH', 'funcalls': 147, 'grad': array([ 1.77635684e-05, 2.87769808e-05, 3.51718654e-05,
6.75015599e-06, -4.97379915e-06, -1.06581410e-06]), 'nit': 0, 'warnflag': 2}
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 6 M = 10
This problem is unconstrained.
At X0 0 variables are exactly at the bounds
At iterate 0 f= 1.14462D-07 |proj g|= 3.51719D-05
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
6 1 21 1 0 0 3.517D-05 1.145D-07
F = 1.144619474757747E-007
ABNORMAL_TERMINATION_IN_LNSRCH
Line search cannot locate an adequate point after 20 function
and gradient evaluations. Previous x, f and g restored.
Possible causes: 1 error in function or gradient evaluation;
2 rounding error dominate computation.
Cauchy time 0.000E+00 seconds.
Subspace minimization time 0.000E+00 seconds.
Line search time 0.000E+00 seconds.
Total User time 0.000E+00 seconds.
I do not get this warning every time, but sometimes. (Most get 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL' or 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH').
I know that it means the minimum cannot be reached in this iteration. I googled this problem. Someone said it often occurs because the objective and gradient functions do not match. But here I do not provide a gradient function, because I am using 'approx_grad'.
What are the possible reasons that I should investigate? What does it mean by "rounding error dominate computation"?
======
I also find that the log-likelihood does not monotonically increase:
########## Convergence !!! ##########
log_likelihood_history: [-28659.725891322563, 220.49993177669558, 291.3513633060345, 267.47745327823907, 265.31567762171181, 265.07311121000367, 265.04217683341682]
It usually starts to decrease at the second or third iteration, even when 'ABNORMAL_TERMINATION_IN_LNSRCH' does not occur. I do not know whether this problem is related to the previous one.
SciPy calls the original L-BFGS-B implementation, which is Fortran 77 (old but beautiful and superfast code). The problem here is that the supposed descent direction is actually going up. It starts on line 2533 (link to the code at the bottom):
      gd = ddot(n,g,1,d,1)
      if (ifun .eq. 0) then
         gdold=gd
         if (gd .ge. zero) then
c                               the directional derivative >=0.
c                               Line search is impossible.
            if (iprint .ge. 0) then
                write(0,*)' ascent direction in projection gd = ', gd
            endif
            info = -4
            return
         endif
      endif
In other words, you are telling it to go down the hill by going up the hill. The code tries something called line search a total of 20 times in the descent direction that you provide and realizes that you are NOT telling it to go downhill, but uphill. All 20 times.
The guy who wrote it (Jorge Nocedal, who by the way is a very smart guy) put 20 because pretty much that's enough. Machine epsilon for double precision is about 2.2E-16 (the log above prints 2.220D-16), so I think 20 is actually a little too much. So, my money for most people having this problem is that your gradient does not match your function.
Now, it could also be that "2. rounding errors dominate computation". By this, he means that your function is a very flat surface on which increases are of the order of machine epsilon (in which case you could perhaps rescale the function).
Now, I was thinking that maybe there should be a third option, when your function is too weird. Oscillations? I could see something like $\sin({\frac{1}{x}})$ causing this kind of problem. But I'm not a smart guy, so don't assume that there's a third case.
So I think the OP's solution should be that your function is too flat. Or look at the fortran code.
https://github.com/scipy/scipy/blob/master/scipy/optimize/lbfgsb/lbfgsb.f
Here's line search for those who want to see it. https://en.wikipedia.org/wiki/Line_search
Note. This is 7 months too late. I put it here for future's sake.
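To make "rounding errors dominate computation" concrete, here is a small pure-Python sketch of a forward-difference derivative, which is the kind of estimate a numerically approximated gradient with step epsilon relies on. The two test functions are hypothetical and chosen only to show the effect: both have derivative 2 at x = 1, but one carries a large constant offset.

```python
def forward_diff(f, x, eps):
    # One-sided finite-difference estimate of f'(x).
    return (f(x + eps) - f(x)) / eps

def plain_parabola(x):
    return x * x

def offset_parabola(x):
    # Hypothetical objective: a large constant swamps the part that varies,
    # so changes of order 1e-8 fall below the spacing of doubles near 1e8.
    return 1e8 + x * x

x0 = 1.0  # true derivative is 2.0 for both functions
good = forward_diff(plain_parabola, x0, 1e-8)
bad = forward_diff(offset_parabola, x0, 1e-8)
coarse = forward_diff(offset_parabola, x0, 1e-3)
print(good, bad, coarse)
```

The second estimate is far from 2 even though the function is perfectly smooth, while a larger step (or removing the offset) recovers it. A line search fed with such a gradient can easily fail to find a decrease, so the right step size depends on how the objective is scaled.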
As pointed out in the answer by Wilmer E. Henao, the problem is probably in the gradient. Since you are using approx_grad=True, the gradient is calculated numerically. In this case, reducing the value of epsilon, which is the step size used for numerically calculating the gradient, can help.
I also got the error "ABNORMAL_TERMINATION_IN_LNSRCH" using the L-BFGS-B optimizer.
While my gradient function pointed in the right direction, I rescaled the actual gradient of the function by its L2-norm. Removing that or adding another appropriate type of rescaling worked. Before, I guess that the gradient was so large that it went out of bounds immediately.
The problem from OP was unbounded if I read correctly, so this will certainly not help in this problem setting. However, googling the error "ABNORMAL_TERMINATION_IN_LNSRCH" yields this page as one of the first results, so it might help others...
I had a similar problem recently. I sometimes encounter the ABNORMAL_TERMINATION_IN_LNSRCH message after using fmin_l_bfgs_b function of scipy. I try to give additional explanations of the reason why I get this. I am looking for complementary details or corrections if I am wrong.
In my case, I provide the gradient function, so approx_grad=False. My cost function and the gradient are consistent. I double-checked it and the optimization actually works most of the time. When I get ABNORMAL_TERMINATION_IN_LNSRCH, the solution is not optimal, not even close (even this is a subjective point of view). I can overcome this issue by modifying the maxls argument. Increasing maxls helps to solve this issue to finally get the optimal solution. However, I noted that sometimes a smaller maxls, than the one that produces ABNORMAL_TERMINATION_IN_LNSRCH, results in a converging solution. A dataframe summarizes the results. I was surprised to observe this. I expected that reducing maxls would not improve the result. For this reason, I tried to read the paper describing the line search algorithm but I had trouble to understand it.
The line "search algorithm generates a sequence of nested intervals {Ik} and a sequence of iterates αk ∈ Ik ∩ [αmin ; αmax] according to the [...] procedure". If I understand correctly, the maxls argument specifies the length of this sequence. At the end of the maxls iterations (or fewer if the algorithm terminates earlier), the line search stops. A final trial point is generated within the final interval Imaxls. I would say that the formula does not guarantee an αmaxls that respects the two update conditions, the minimum decrease and the curvature, especially when the interval is still wide. My guess is that in my case, after 11 iterations, the generated interval I11 is such that a trial point α11 respects both conditions. But even though I12 is smaller and still contains acceptable points, α12 does not. Finally, after 24 iterations, the interval is very small and the generated αk respects the update conditions.
Is my understanding / explanation accurate?
If so, I would then be surprised that when maxls=12, since the generated α11 is acceptable but not α12, why α11 is not chosen in this case instead of α12?
Pragmatically, I would recommend to try a few higher maxls when getting ABNORMAL_TERMINATION_IN_LNSRCH.

Fitting curves to a set of points

Basically, I have a set of up to 100 co-ordinates, along with the desired tangents to the curve at the first and last point.
I have looked into various methods of curve fitting, by which I mean an algorithm which takes the inputted data points and tangents and outputs the equation of the curve, such as the Gaussian method and interpolation, but I really struggled to understand them.
I am not asking for code (if you choose to give it, that's acceptable though :) ), I am simply looking for help with this algorithm. It will eventually be converted to Objective-C for an iPhone app, if that changes anything.
EDIT:
I know the order of all of the points. They are not too close together, so passing through all points is necessary - aka interpolation (unless anyone can suggest something else). And as far as I know, an algebraic curve is what I'm looking for. This is all being done on a 2D plane by the way
I'd recommend to consider cubic splines. There is some explanation and code to calculate them in plain C in Numerical Recipes book (chapter 3.3)
Most interpolation methods originally work with functions: given a set of x and y values, they compute a function which computes a y value for every x value, meeting the specified constraints. As a function can only ever compute a single y value for every x value, such a curve cannot loop back on itself.
To turn this into a real 2D setup, you want two functions which compute x resp. y values based on some parameter that is conventionally called t. So the first step is computing t values for your input data. You can usually get a good approximation by summing over euclidean distances: think about a polyline connecting all your points with straight segments. Then the parameter would be the distance along this line for every input pair.
So now you have two interpolation problems: one to compute x from t and the other to compute y from t. You can formulate this as a spline interpolation, e.g. using cubic splines. That gives you a large system of linear equations which you can solve iteratively up to the desired precision.
The result of a spline interpolation will be a piecewise description of a suitable curve. If you wanted a single equation, then a lagrange interpolation would fit that bill, but the result might have odd twists and turns for many sets of input data.
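The parameterization step described above (summing Euclidean distances along the polyline) can be sketched like this. The function name and sample points are made up for illustration; the spline fit itself is not shown.

```python
import math

def chord_length_parameters(points):
    # t[0] = 0; each later t adds the straight-line distance
    # from the previous point, i.e. distance along the polyline.
    ts = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        ts.append(ts[-1] + math.hypot(x1 - x0, y1 - y0))
    return ts

points = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0), (0.0, 14.0)]
ts = chord_length_parameters(points)
xs = [p[0] for p in points]  # data for the x(t) interpolation problem
ys = [p[1] for p in points]  # data for the y(t) interpolation problem
print(ts)
```

The pairs (ts, xs) and (ts, ys) are then two ordinary one-dimensional interpolation problems. The desired end tangents would enter as derivative conditions on x(t) and y(t) at the first and last t.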

Optimize MATLAB code (nested for loop to compute similarity matrix)

I am computing a similarity matrix based on Euclidean distance in MATLAB. My code is as follows:
for i=1:N % M,N is the size of the matrix x for whose elements I am computing similarity matrix
    for j=1:N
        D(i,j) = sqrt(sum((x(:,i)-x(:,j)).^2)); % D is the similarity matrix
    end
end
Can anyone help with optimizing this, i.e. reducing the for loops? My matrix x is of dimension 256x30000.
Thanks a lot!
--Aditya
The function to do so in MATLAB is called pdist. Unfortunately, it is painfully slow and doesn't take MATLAB's vectorization abilities into account.
The following is code I wrote for a project. Let me know what kind of speed up you get.
Qx=repmat(dot(x,x,2),1,size(x,1));
D=sqrt(Qx+Qx'-2*x*x');
Note though that this will only work if your data points are in the rows and your dimensions the columns. So for example lets say I have 256 data points and 100000 dimensions then on my mac using x=rand(256,100000) and the above code produces a 256x256 matrix in about half a second.
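The two lines above rely on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b. A small pure-Python sketch of the same idea, with points stored one per row as in that answer (the function name is made up for this example):

```python
import math

def pairwise_distances(points):
    # Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b for every pair of rows.
    sq_norms = [sum(v * v for v in p) for p in points]
    n = len(points)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # D is symmetric, so fill both halves at once
            dot = sum(a * b for a, b in zip(points[i], points[j]))
            d2 = sq_norms[i] + sq_norms[j] - 2.0 * dot
            # Clamp tiny negative values that rounding can produce.
            D[i][j] = D[j][i] = math.sqrt(max(d2, 0.0))
    return D

pts = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(pairwise_distances(pts))
```

The clamp matters because the expanded form subtracts nearly equal numbers when two points are close, so the squared distance can come out slightly negative. In MATLAB, taking sqrt of such a value yields a complex result.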
There's probably a better way to do it, but the first thing I noticed was that you could cut the runtime in half by exploiting the symmetry D(i,j)==D(j,i)
You can also use the function norm(x(:,i)-x(:,j),2)
I think this is what you're looking for.
D=zeros(N);
jIndx=repmat(1:N,N,1);iIndx=jIndx'; %'# fix SO's syntax highlighting
D(:)=sqrt(sum((x(iIndx(:),:)-x(jIndx(:),:)).^2,2));
Here, I have assumed that the data matrix x is initialized as an NxM array, where M is the number of dimensions of the system and N is the number of points. So if your ordering is different, you'll have to make changes accordingly.
To start with, you are computing twice as much as you need to here, because D will be symmetric. You don't need to calculate the (i,j) entry and the (j,i) entry separately. Change your inner loop to for j=1:i, and add in the body of that loop D(j,i)=D(i,j);
After that, there's really not much redundancy left in what that code does, so your only room left for improvement is to parallelize it: if you have the Parallel Computing Toolbox, convert your outer loop to a parfor and before you run it, say matlabpool(n), where n is the number of threads to use.

approximating log10[x^k0 + k1]

Greetings. I'm trying to approximate the function
Log10[x^k0 + k1], where .21 < k0 < 21, 0 < k1 < ~2000, and x is integer < 2^14.
k0 & k1 are constant. For practical purposes, you can assume k0 = 2.12, k1 = 2660. The desired accuracy is 5*10^-4 relative error.
This function is virtually identical to Log[x], except near 0, where it differs a lot.
I have already come up with a SIMD implementation that is ~1.15x faster than a simple lookup table, but would like to improve it if possible, which I think is very hard due to the lack of efficient instructions.
My SIMD implementation uses 16bit fixed point arithmetic to evaluate a 3rd degree polynomial (I use least squares fit). The polynomial uses different coefficients for different input ranges. There are 8 ranges, and range i spans (64)2^i to (64)2^(i + 1).
The rationale behind this is that the derivatives of Log[x] drop rapidly with x, meaning a polynomial will fit it more accurately, since polynomials are an exact fit for functions that have a derivative of 0 beyond a certain order.
SIMD table lookups are done very efficiently with a single _mm_shuffle_epi8(). I use SSE's float to int conversion to get the exponent and significand used for the fixed point approximation. I also software pipelined the loop to get ~1.25x speedup, so further code optimizations are probably unlikely.
What I'm asking is if there's a more efficient approximation at a higher level?
For example:
Can this function be decomposed into functions with a limited domain like
log2((2^x) * significand) = x + log2(significand)
hence eliminating the need to deal with different ranges (table lookups). The main problem I think is adding the k1 term kills all those nice log properties that we know and love, making it not possible. Or is it?
Iterative method? don't think so because the Newton method for log[x] is already a complicated expression
Exploiting locality of neighboring pixels? - if the range of the 8 inputs fall in the same approximation range, then I can look up a single coefficient, instead of looking up separate coefficients for each element. Thus, I can use this as a fast common case, and use a slower, general code path when it isn't. But for my data, the range needs to be ~2000 before this property hold 70% of the time, which doesn't seem to make this method competitive.
Please, give me some opinion, especially if you're an applied mathematician, even if you say it can't be done. Thanks.
You should be able to improve on least-squares fitting by using Chebyshev approximation. (The idea is, you're looking for the approximation whose worst-case deviation in a range is least; least-squares instead looks for the one whose summed squared difference is least.) I would guess this doesn't make a huge difference for your problem, but I'm not sure -- hopefully it could reduce the number of ranges you need to split into, somewhat.
If there's already a fast implementation of log(x), maybe compute P(x) * log(x) where P(x) is a polynomial chosen by Chebyshev approximation. (Instead of trying to do the whole function as a polynomial approx -- to need less range-reduction.)
I'm an amateur here -- just dipping my toe in as there aren't a lot of answers already.
One observation:
You can find an expression for how large x needs to be as a function of k0 and k1, such that the term x^k0 dominates k1 enough for the approximation:
x^k0 +k1 ~= x^k0, allowing you to approximately evaluate the function as
k0*Log(x).
This would take care of all x's above some value.
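Rather than deriving the cutoff in closed form, it can also be found by a direct scan, sketched below in Python. The helper names are made up; the constants k0 = 2.12, k1 = 2660 and the 5*10^-4 relative tolerance are the ones given in the question.

```python
import math

def exact(x, k0, k1):
    return math.log10(x ** k0 + k1)

def approx(x, k0):
    # Drops k1 entirely: log10(x^k0) = k0 * log10(x).
    return k0 * math.log10(x)

def first_x_within(k0, k1, rel_tol, x_max=2 ** 14):
    # Smallest integer x (from 2 upward) whose relative error is within rel_tol.
    # For k1 > 0 the error shrinks as x grows, so every larger x also qualifies.
    for x in range(2, x_max):
        e = exact(x, k0, k1)
        if abs(approx(x, k0) - e) <= rel_tol * abs(e):
            return x
    return None

threshold = first_x_within(2.12, 2660.0, 5e-4)
print(threshold)
```

For these constants the cutoff lands in the few-hundreds range, so most of the 14-bit input domain could use the cheap k0*Log(x) form and only the small values need special handling.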
I recently read how the sRGB model compresses physical tristimulus values into stored RGB values.
It is basically very similar to the function I am trying to approximate, except that it's defined piecewise:
k0 x, x < 0.0031308
k1 x^0.417 - k2 otherwise
I was told the constant addition in Log[x^k0 + k1] was to make the beginning of the function more linear. But that can easily be achieved with a piece wise approximation. That would make the approximation a lot more "uniform" - with only 2 approximation ranges. This should be cheaper to compute due to no longer needing to compute an approximation range index (integer log) and doing SIMD coefficient lookup.
For now, I conclude this will be the best approach, even though it doesn't approximate the function precisely. The hard part will be proposing this change and convincing people to use it.
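For reference, the sRGB encoding mentioned above looks like this with the standard sRGB constants filled in (12.92 for the linear segment, 1.055 and 0.055 for the power segment, exponent 1/2.4, which is the 0.417 above). This is a sketch of the sRGB curve itself, not of the Log10[x^k0 + k1] function, and the function name is made up.

```python
def srgb_encode(linear):
    # Standard sRGB transfer function for a linear value in [0, 1]:
    # a short linear segment near zero, then a power law.
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1.0 / 2.4) - 0.055

for value in (0.0, 0.001, 0.0031308, 0.18, 1.0):
    print(value, srgb_encode(value))
```

The two pieces meet at the breakpoint to within rounding of the published constants, which is what keeps the curve from having a visible jump.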