Vectorizing Code for efficient implementation - optimization

Below is some IIR filter code. I need to vectorize it so that I can write efficient NEON code.
Example of vectorization
Non-vectorized code
for(i=0;i<100;i++)
    a[i] = a[i]*b[i]; // only one independent multiplication per iteration, so it cannot
                      // take advantage of multiple multiplication units
Vectorized code
for(i=0;i<25;i++)
{
    a[i*4]   = a[i*4]  *b[i*4];   // four independent multiplications can use
    a[i*4+1] = a[i*4+1]*b[i*4+1]; // multiple multiplication units to perform the
    a[i*4+2] = a[i*4+2]*b[i*4+2]; // operation in parallel
    a[i*4+3] = a[i*4+3]*b[i*4+3];
}
Please help me vectorize the for loop below so that it can be implemented efficiently using the vector capability of the hardware (my hardware can perform 4 multiplications simultaneously).
main()
{
    for(j=0;j<NUMBQUAD;j++)
    {
        for(i=2;i<SAMPLES+2;i++)
        {
            w[i] = x[i-2] + a1[j]*w[i-1] + a2[j]*w[i-2];
            y[i-2] = w[i] + b1[j]*w[i-1] + b2[j]*w[i-2];
        }
        w[0] = 0;
        w[1] = 0;
    }
}

Once you have fixed (or verified) the equations, you should notice that each round of the equations contains 4 independent multiplications. The task then becomes finding the smallest suitable set of instructions to permute the input vectors x[...], y[...], w[...] into registers such as:
q0 = | w[i-1] | w[i-2] | w[i-1] | w[i-2]|
q1 = | a1[j] | a2[j] | b1[j] | b2[j] | // vld1.32 {d0,d1}, [r1]!
q2 = q0 .* q1
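To make the lane layout above concrete, here is a minimal Python sketch of one biquad update in which the four products are formed as a single 4-lane multiply. The function name `biquad_step_lanes` and the tuple-based "registers" are illustrative assumptions, not NEON code; on real hardware the element-wise product would be one vector multiply, followed by pairwise adds.

```python
def biquad_step_lanes(x_in, w1, w2, a1, a2, b1, b2):
    """One biquad update; q0, q1, q2 mirror the register layout in the text."""
    q0 = (w1, w2, w1, w2)                      # state lanes
    q1 = (a1, a2, b1, b2)                      # coefficient lanes
    q2 = tuple(s * c for s, c in zip(q0, q1))  # stands in for one vector multiply
    w0 = x_in + q2[0] + q2[1]                  # w[i]   = x + a1*w[i-1] + a2*w[i-2]
    y0 = w0 + q2[2] + q2[3]                    # y[i-2] = w[i] + b1*w[i-1] + b2*w[i-2]
    return w0, y0


print(biquad_step_lanes(1.0, 2.0, 3.0, 0.5, 0.25, 2.0, 4.0))
```

The multiply is fully parallel, but the two sums are still serial (y0 needs w0), which is why the permutation and reduction steps dominate the instruction count in this layout.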
A potentially much more effective approach is wavefront parallelism, obtained by inverting the for loops so that two biquad stages are processed in each pass over the samples:
x0 = *x++;
w0 = x0 + a*w1 + b*w2; // pipeline warming stage
y0 = w0 + c*w1 + d*w2; //
[REPEAT THIS]
// W2 = W1; W1 = W0;
W0 = y0 + A*W1 + B*W2;
Y0 = W0 + C*W1 + D*W2;
// w2 = w1; w1 = w0;
x0 = *x++;
*output++= Y0;
w0 = x0 + a*w1 + b*w2;
y0 = w0 + c*w1 + d*w2;
[REPEAT ENDS]
W0 = y0 + A*W1 + B*W2; // pipeline cooling stage
Y0 = W0 + C*W1 + D*W2;
*output++= Y0;
While there are still dependencies along the chain x0->w0->y0->W0->Y0, inside the loop the lower-case and upper-case expressions are independent of each other, which gives an opportunity for full 2-way parallelism. One can also try to get rid of the shifts w2=w1; w1=w0; by unrolling the loop and renaming registers manually.
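The interleaving can be checked against a plain stage-by-stage cascade with a small Python sketch. The function names are hypothetical, and the sketch assumes (as the pseudo-code does, where y0 feeds W0) that the second stage consumes the output of the first; it models the data flow only, not NEON instructions.

```python
def biquad_cascade_reference(x, stages):
    """Run each stage over the whole signal, one stage after another."""
    signal = list(x)
    for a1, a2, b1, b2 in stages:
        w1 = w2 = 0.0
        out = []
        for sample in signal:
            w0 = sample + a1 * w1 + a2 * w2
            out.append(w0 + b1 * w1 + b2 * w2)
            w2, w1 = w1, w0
        signal = out
    return signal


def biquad_pair_interleaved(x, stage_lo, stage_hi):
    """Two stages in one pass; stage_hi runs one sample behind stage_lo."""
    a, b, c, d = stage_lo
    A, B, C, D = stage_hi
    w1 = w2 = W1 = W2 = 0.0
    out = []
    if not x:
        return out
    # pipeline warming stage: only the first stage has data
    w0 = x[0] + a * w1 + b * w2
    y0 = w0 + c * w1 + d * w2
    for sample in x[1:]:
        # upper-case group: second stage, consumes the previous y0
        W0 = y0 + A * W1 + B * W2
        out.append(W0 + C * W1 + D * W2)
        W2, W1 = W1, W0
        # lower-case group: first stage, independent of the group above
        w2, w1 = w1, w0
        w0 = sample + a * w1 + b * w2
        y0 = w0 + c * w1 + d * w2
    # pipeline cooling stage: drain the last first-stage output
    W0 = y0 + A * W1 + B * W2
    out.append(W0 + C * W1 + D * W2)
    return out
```

Inside the loop the two groups share no values, so a vector unit could evaluate them side by side; the sequential order here is only an artifact of Python.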

Related

Linear programming if/then modification to cost function?

I'm setting up a linear programming optimization model using CPLEX and am wondering if it's possible to accomplish a modification of the cost function dependent upon which binary decision variables are 'active' in an arbitrary solution. This is mostly a question about how to formulate the LP model (if it's even possible), but responses in the context of CPLEX are welcome or even preferred.
Say I have an LP problem in canonical form:
minimize c^T x
s.t. Ax <= b
With cost function:
c = [c_1, c_2,...,c_100]
All variables are binary. I have this basic setup modeled and running effectively in CPLEX.
Now say I have a subset of variables:
efficiency_set = [x_1, x_2,...,x_5]
With the condition:
if any x_n in efficiency_set == 1
then c_n for all other x_n in the set = 0.9 * c_n
Essentially there is a dependency where if any x_n in the efficiency set is 'active', it becomes 10% less expensive for other variables in the set to appear in the solution.
I thought that CPLEX indicator constraints were what I was looking for, but after reading through the documentation, I don't think I can enforce an on-the-fly change to the cost function with them (I could be wrong). So I feel like it needs to be done through the formulation of the LP, but I can't reason out how to accomplish it. Any ideas? Thanks.
CPLEX offers many APIs; let me answer with the easiest one, OPL.
Your canonical form can be written as:
int n=3;
int m=4;
range N=1..n;
range M=1..m;
float A[N][M]=[[1,4,9,6],[8,5,0,8],[2,9,0,2]];
float B[M]=[3,1,3,0];
float C[N]=[1,1,1];
dvar boolean x[N];
minimize sum(i in N) C[i]*x[i];
subject to
{
forall(j in M) sum(i in N) A[i,j]*x[i]>=B[j];
}
and then you can write logical constraints:
int n=3;
int m=4;
range N=1..n;
range M=1..m;
float A[N][M]=[[1,4,9,6],[8,5,0,8],[2,9,0,2]];
float B[M]=[3,1,3,0];
float C[N]=[1,1,1];
{int} efficiencySet={1,2};
dvar boolean activeEfficiencySet;
dvar boolean x[N];
minimize sum(i in N) C[i]*x[i]*(1-0.1*activeEfficiencySet*(i not in efficiencySet));
subject to
{
forall(j in M) sum(i in N) A[i,j]*x[i]>=B[j];
activeEfficiencySet==(1<=sum(i in efficiencySet) x[i]);
}
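Note that the objective above multiplies two decision variables (x[i] and activeEfficiencySet). If a strictly linear model is wanted, one standard option is to introduce an auxiliary binary z[i] for each product and tie it to its factors with three linear inequalities. The Python sketch below is not CPLEX code; it only enumerates the binary cases to show that the inequalities leave exactly one feasible value, z = x * active. The names `feasible_z` and `discounted_cost` are illustrative assumptions.

```python
from itertools import product


def feasible_z(x, active):
    """Binary z values allowed by z <= x, z <= active, z >= x + active - 1."""
    return [z for z in (0, 1)
            if z <= x and z <= active and z >= x + active - 1]


def discounted_cost(c, x, z, discount=0.1):
    """Linear form of c * x * (1 - discount * active) once z stands for x * active."""
    return c * x - discount * c * z


for x, active in product((0, 1), repeat=2):
    z = feasible_z(x, active)[0]
    print(x, active, z, discounted_cost(10.0, x, z))
```

With z[i] in place of the product, the objective becomes a sum of C[i]*x[i] - 0.1*C[i]*z[i] terms over the discounted indices, which is linear in the decision variables.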
Using Alex's data, I have written the program in docplex (the CPLEX Python API):
from docplex.mp.model import Model
n = 3
m = 4
A = {}
A[0, 0] = 1
A[0, 1] = 4
A[0, 2] = 9
A[0, 3] = 6
A[1, 0] = 8
A[1, 1] = 5
A[1, 2] = 0
A[1, 3] = 8
A[2, 0] = 2
A[2, 1] = 9
A[2, 2] = 0
A[2, 3] = 2
B = {}
B[0] = 3
B[1] = 1
B[2] = 3
B[3] = 0
C = {}
C[0] = 1
C[1] = 1
C[2] = 1
efficiencySet = [0, 1]
mdl = Model(name="")
activeEfficiencySet = mdl.binary_var()
x = mdl.binary_var_dict(range(n), name="x")
# constraint 1:
for j in range(m):
    mdl.add_constraint(mdl.sum(A[i, j] * x[i] for i in range(n)) >= B[j])
# constraint 2: the flag is 1 exactly when some variable in efficiencySet is 1
mdl.add(activeEfficiencySet == (mdl.sum(x[i] for i in efficiencySet) >= 1))
# objective function:
lst = []
for i in range(n):
    if i not in efficiencySet:
        lst.append(C[i] * x[i] * (1 - 0.1 * activeEfficiencySet))
    else:
        lst.append(C[i] * x[i])
mdl.minimize(mdl.sum(lst))
mdl.solve()
for i in range(n):
    print(str(x[i]) + " : " + str(x[i].solution_value))
print(activeEfficiencySet.solution_value)

Algorithm behind Inkscape's auto-smooth nodes

I am generating smooth curves by interpolating (lots of) points. I want to have local support (i.e. only few points determine the smooth curve locally), so I do not want to use a classical interpolation spline. Bezier curves to me would be a natural solution and Inkscape's auto-smooth nodes (http://tavmjong.free.fr/INKSCAPE/MANUAL/html/Paths-Editing.html#Paths-Node-AutoSmooth) do pretty well what I want to have. But I have trouble finding the implementation in the source or some reference to the underlying algorithm.
Is anybody here aware of the algorithm or familiar enough with Inkscape's source so they can point me in the right direction?
For context: I am calculating a smooth path for a pen plotter but can not wait to have all supporting points available.
The code is here and I've implemented a version in Python using the svgpathtools library in a gist
Here is a diagram showing the method.
Given three points a, b, and c, where b is auto-smooth and has two control points u and v:
Let x be a unit vector perpendicular to the angle bisector of ∠abc
u = b - x * (1/3)|ba|
v = b + x * (1/3)|bc|
As far as I know, there is nothing special about the constant 1/3 and you could vary it to have larger or smaller curvature.
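The construction above can be sketched in a few lines of Python. This is not Inkscape's code: the function name `auto_smooth_handles`, the parameter `k` (the 1/3 constant) and the choice to return `b` for both handles in degenerate cases are assumptions made for illustration.

```python
import math


def auto_smooth_handles(a, b, c, k=1.0 / 3.0):
    """Return the handles (u, v) of node b given its neighbours a and c (2D tuples)."""
    ba = (a[0] - b[0], a[1] - b[1])
    bc = (c[0] - b[0], c[1] - b[1])
    len_ba = math.hypot(*ba)
    len_bc = math.hypot(*bc)
    if len_ba == 0 or len_bc == 0:
        return b, b  # assumption: a coincident neighbour gives retracted handles
    ua = (ba[0] / len_ba, ba[1] / len_ba)
    uc = (bc[0] / len_bc, bc[1] / len_bc)
    # ua + uc points along the bisector, so uc - ua is perpendicular to it
    x = (uc[0] - ua[0], uc[1] - ua[1])
    len_x = math.hypot(*x)
    if len_x == 0:
        return b, b  # assumption: a and c lie in the same direction from b
    x = (x[0] / len_x, x[1] / len_x)
    u = (b[0] - x[0] * k * len_ba, b[1] - x[1] * k * len_ba)
    v = (b[0] + x[0] * k * len_bc, b[1] + x[1] * k * len_bc)
    return u, v


print(auto_smooth_handles((0.0, 0.0), (1.0, 1.0), (2.0, 0.0)))
```

Because the handles of b depend only on a, b and c, they can be computed as soon as the point after b is known, which suits streaming use.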
Per @fang's comment below, it may be better to use a Catmull-Rom interpolating spline instead, which both interpolates and has the local control property. See more here
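For reference, a uniform Catmull-Rom segment between two points can be rewritten as a cubic Bezier using only the four surrounding points. The sketch below assumes the uniform parameterization (tangent at each point equal to half the chord between its neighbours); the function name is illustrative.

```python
def catmull_rom_to_bezier(p0, p1, p2, p3):
    """Cubic Bezier control points for the uniform Catmull-Rom segment from p1 to p2."""
    c1 = tuple(b + (c - a) / 6.0 for a, b, c in zip(p0, p1, p2))  # p1 + (p2 - p0)/6
    c2 = tuple(c - (d - b) / 6.0 for b, c, d in zip(p1, p2, p3))  # p2 - (p3 - p1)/6
    return p1, c1, c2, p2


print(catmull_rom_to_bezier((0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)))
```

Each segment depends on four consecutive points only, so a segment can be emitted as soon as the point following it has arrived.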
For stitching together cubic Bezier curves that interpolate (more like natural cubic splines), see the original answer below.
===================================================================
The following is JavaScript-like pseudo-code that computes a series of (up to) cubic Bezier curves that together form one smooth curve passing through the given points. Note that bezier in the code below is assumed to be a function that computes (the polynomial form of) a cubic Bezier through given control points (an already known algorithm). Note also that the algorithm below is for 1D curves; it is easily adjusted for 2D curves (i.e. compute the x and y coordinates separately).
function bezierThrough( knots )
{
    var i, points, segments, computePoints;
    computePoints = function computePoints( knots ) {
        var i, p1, p2, a, b, c, r, m, n = knots.length-1;
        p1 = new Array(n);
        p2 = new Array(n);
        /*tridiagonal coefficients a, b, c and rhs vector r*/
        a = new Array(n);
        b = new Array(n);
        c = new Array(n);
        r = new Array(n);
        /*left most segment*/
        a[0] = 0;
        b[0] = 2;
        c[0] = 1;
        r[0] = knots[0] + 2*knots[1];
        /*internal segments*/
        for(i=1; i<n-1; i++)
        {
            a[i] = 1;
            b[i] = 4;
            c[i] = 1;
            r[i] = 4*knots[i] + 2*knots[i+1];
        }
        /*right segment*/
        a[n-1] = 2;
        b[n-1] = 7;
        c[n-1] = 0;
        r[n-1] = 8*knots[n-1] + knots[n];
        /*solves Ax=b with the Thomas algorithm (from Wikipedia)*/
        for(i=1; i<n; i++)
        {
            m = a[i] / b[i-1];
            b[i] = b[i] - m*c[i - 1];
            r[i] = r[i] - m*r[i-1];
        }
        p1[n-1] = r[n-1] / b[n-1];
        for(i=n-2; i>=0; --i)
            p1[i] = (r[i]-c[i]*p1[i+1]) / b[i];
        /*we have p1, now compute p2*/
        for (i=0;i<n-1;i++)
            p2[i] = 2*knots[i+1] - p1[i+1];
        p2[n-1] = (knots[n]+p1[n-1])/2;
        return [p1, p2];
    };
    if ( 1 === knots.length )
    {
        segments = [knots[0]];
    }
    else if ( 2 === knots.length )
    {
        segments = [bezier([knots[0], knots[1]])];
    }
    else
    {
        segments = [];
        points = computePoints(knots);
        for(i=0; i<knots.length-1; i++)
            segments.push(bezier([knots[i], points[0][i], points[1][i], knots[i+1]]));
    }
    return segments;
}
see also related post
Adapted code from here

Why is my naive line drawing algorithm faster than Bresenham

I implemented both the naive line drawing algorithm and Bresenham's algorithm. When I run the program with 1000 lines, the naive line drawing algorithm is faster than Bresenham's. Could anyone explain why?
Here is my code for both methods
def simpleLine(x1, y1, x2, y2):
    dy = y2-y1
    dx = x2-x1
    x = x1
    m = dy/dx
    b = y1-m*x1
    if(x1>x2):
        x1,x2 = x2,x1
        x=x1
    while(x<=x2):
        y=m*x+b
        PutPixle(win,x,round(y))
        x=x+1

def BresenhamLine(x1, y1, x2, y2):
    dx = abs(x2 - x1)
    dy = abs(y2 - y1)
    p = 2 * dy - dx
    duady = 2 * dy
    duadydx = 2 * (dy - dx)
    x = x1
    y = y1
    xend = x2
    if(x1 > x2):
        x, y,xend = x2, y2,x1
    while(x < xend):
        PutPixle(win,x,y)
        x =x+1
        if(p<0):
            p = p + 2*dy
        else:
            y = y-1 if y1>y2 else y+1
            p = p+2*(dy-dx)
Bresenham's algorithm was invented for languages and machines with performance characteristics different from those of your Python environment: in particular, low-level languages on systems where floating point math is much more expensive than integer math and branches.
In Python, your simple version is faster even though it uses floating point and rounding, because it executes fewer Python operations per pixel and the interpreter overhead of each operation is large. Any difference in speed between a single integer operation and a single floating point operation is dwarfed by that overhead.
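One way to see this on a given machine is to time both inner loops with the drawing call stubbed out. The sketch below is an illustration under assumptions, not the code from the question: both functions are simplified to lines with x1 < x2 and slope between 0 and 1, the endpoint is included, and `put` is a hypothetical callback standing in for PutPixle. Results will vary with the interpreter and its version.

```python
import timeit


def naive_line(x1, y1, x2, y2, put):
    """Float version: one multiply, one add and one round() per pixel."""
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    for x in range(x1, x2 + 1):
        put(x, round(m * x + b))


def bresenham_line(x1, y1, x2, y2, put):
    """Integer version (0 <= slope <= 1, x1 < x2): a compare, a branch and adds per pixel."""
    dx, dy = x2 - x1, y2 - y1
    p = 2 * dy - dx
    y = y1
    for x in range(x1, x2 + 1):
        put(x, y)
        if p < 0:
            p += 2 * dy
        else:
            y += 1
            p += 2 * (dy - dx)


def collect(line_fn, *ends):
    """Run a line function and return the pixels it would draw."""
    pixels = []
    line_fn(*ends, lambda x, y: pixels.append((x, y)))
    return pixels


if __name__ == "__main__":
    ends = (0, 0, 1000, 250)
    sink = lambda x, y: None  # stand-in for PutPixle so only the loop is timed
    for fn in (naive_line, bresenham_line):
        seconds = timeit.timeit(lambda: fn(*ends, sink), number=200)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

If the two timings come out close, that supports the point above: per-pixel interpreter overhead, not the integer-versus-float distinction, sets the pace.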

Check if Bezier Curve is sub-curve of another Bezier

I want to check if a cubic Bezier curve is a sub-curve of another Bezier.
I think I understand basically how to do this: express the Beziers as two cubics, in x and y, then test if the cubics are scalings or translations of each other. If the scalings and translations match, that tells us the curves are sub-segments of the same curve and gives us t0 prime and t1 prime of curve B in curve A's space.
But I can't quite work out how to check the cubics for equivalence.
Answer based on the following comment:
Say we take a Bezier curve and split it up using de Casteljau's algorithm. Obviously the result is a lot of sub-curves of the original curve. The question is how to go back and recover the t values, and the fact that the curves are part of the same curve, given only their 4 control points.
Short answer: unless you have an infinite precision machine, you can't.
So we're stuck with "error threshold" testing. Given a master curve A and a "hopefully subcurve" curve B, run through the things that need to be true if B is a subcurve of A:
If B is a true subcurve then its start and end points lie on curve A. So check if that's true, within some error threshold. If they don't, then B is not a subcurve of A.
If B is a true subcurve then the derivatives at B's start and end points match the derivatives at the corresponding points on A (same direction; the magnitudes differ by the factor t1 - t0 when B covers the interval [t0, t1] on A). So check if that's true, within some error threshold. If they don't match, B is not a subcurve of A.
If B is a true subcurve then the second derivatives at B's start and end points match the second derivatives at the corresponding points on A (up to the factor (t1 - t0)^2). So check if that's true, within some error threshold. If they don't match, B is not a subcurve of A.
If all of these hold, we can be reasonably sure that B is a subcurve of A.
Also, since we need to come up with t values in order to check whether a point lies on A, and what derivative of A is at that point, we already know the t values that define the interval on A that maps to the full curve B.
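The endpoint and derivative checks can be tried out on a known subcurve with the Python sketch below. It builds B from A with the polar form (blossom) of a cubic Bezier, then compares positions and first derivatives. All function names are illustrative. Because B(u) = A(t0 + (t1 - t0)u), the derivative of B equals the derivative of A scaled by (t1 - t0), so the comparison includes that factor.

```python
def lerp(p, q, t):
    """Linear interpolation between two points given as tuples."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))


def blossom(ctrl, r, s, t):
    """Polar form of a cubic Bezier: de Casteljau with a separate parameter per level."""
    p0, p1, p2, p3 = ctrl
    q0, q1, q2 = lerp(p0, p1, r), lerp(p1, p2, r), lerp(p2, p3, r)
    return lerp(lerp(q0, q1, s), lerp(q1, q2, s), t)


def evaluate(ctrl, t):
    """Point on the curve at parameter t."""
    return blossom(ctrl, t, t, t)


def derivative(ctrl, t):
    """First derivative at t: a quadratic Bezier over the scaled control differences."""
    p0, p1, p2, p3 = ctrl
    d0, d1, d2 = (tuple(3 * (b - a) for a, b in zip(p, q))
                  for p, q in ((p0, p1), (p1, p2), (p2, p3)))
    return lerp(lerp(d0, d1, t), lerp(d1, d2, t), t)


def subcurve(ctrl, t0, t1):
    """Control points of the piece of the curve over [t0, t1]."""
    return (blossom(ctrl, t0, t0, t0), blossom(ctrl, t0, t0, t1),
            blossom(ctrl, t0, t1, t1), blossom(ctrl, t1, t1, t1))


def close(p, q, eps=1e-9):
    """Component-wise comparison within an error threshold."""
    return all(abs(a - b) < eps for a, b in zip(p, q))


A = ((0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0))
t0, t1 = 0.25, 0.75
B = subcurve(A, t0, t1)
scaled = tuple((t1 - t0) * d for d in derivative(A, t0))
print(close(evaluate(B, 0.0), evaluate(A, t0)), close(derivative(B, 0.0), scaled))
```

In a real test B is given rather than constructed, so t0 and t1 first have to be recovered by root finding.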
Here's the working code.
(You can find cubic root finders quite easily)
/*
A = p3 + 3.0 * p1 - 3.0 * p2 - p0;
B = 3.0 * p0 - 6.0 * p1 + 3.0 * p2;
C = 3.0 * p1 - 3.0 * p0;
D = p0;
*/
bool CurveIsSubCurve(BezierCurve bez, BezierCurve sub, double epsilon, double *t)
{
    int Nr;
    double tcand[6];
    int i, ii;
    double ts[6], te[6];
    int Ns = 0;
    int Ne = 0;
    Vector2 p;
    /*
    Take two bites at the cherry. The points may have slight errors, and a small
    error in x or y could represent a big error in t. However with any luck
    either x or y will be close
    */
    Nr = cubic_roots(bez.Ax(), bez.Bx(), bez.Cx(), bez.Dx() - sub.P0().x, tcand);
    Nr += cubic_roots(bez.Ay(), bez.By(), bez.Cy(), bez.Dy() - sub.P0().y, tcand + Nr);
    for(i=0;i<Nr;i++)
    {
        p = bez.Eval(tcand[i]);
        if(fabs(p.x - sub.P0().x) < epsilon && fabs(p.y - sub.P0().y) < epsilon)
        {
            ts[Ns++] = tcand[i];
        }
    }
    /* same thing for the sub curve end point */
    Nr = cubic_roots(bez.Ax(), bez.Bx(), bez.Cx(), bez.Dx() - sub.P3().x, tcand);
    Nr += cubic_roots(bez.Ay(), bez.By(), bez.Cy(), bez.Dy() - sub.P3().y, tcand + Nr);
    for(i=0;i<Nr;i++)
    {
        p = bez.Eval(tcand[i]);
        if(fabs(p.x - sub.P3().x) < epsilon && fabs(p.y - sub.P3().y) < epsilon)
        {
            te[Ne++] = tcand[i];
        }
    }
    /* do an all by all to get matches (Ns, Ne will be small, but if
    we have a degenerate, i.e. a loop, the loop intersection point is
    where the mother curve is quite likely to be cut, so test everything) */
    for(i = 0; i < Ns; i++)
    {
        double s,d;
        double Ax, Bx, Cx, Dx;
        double Ay, By, Cy, Dy;
        for(ii=0;ii<Ne;ii++)
        {
            s = (te[ii] - ts[i]);
            d = ts[i];
            /* now substitute back */
            Ax = bez.Ax() *s*s*s;
            Bx = bez.Ax() *2*s*s*d + bez.Ax()*s*s*d + bez.Bx()*s*s;
            Cx = bez.Ax()*s*d*d + bez.Ax()*2*s*d*d + bez.Bx()*2*s*d + bez.Cx() * s;
            Dx = bez.Ax() *d*d*d + bez.Bx()*d*d + bez.Cx()*d + bez.Dx();
            Ay = bez.Ay() *s*s*s;
            By = bez.Ay() *2*s*s*d + bez.Ay()*s*s*d + bez.By()*s*s;
            Cy = bez.Ay()*s*d*d + bez.Ay()*2*s*d*d + bez.By()*2*s*d + bez.Cy() * s;
            Dy = bez.Ay() *d*d*d + bez.By()*d*d + bez.Cy()*d + bez.Dy();
            if(fabs(Ax - sub.Ax()) < epsilon && fabs(Bx - sub.Bx()) < epsilon &&
               fabs(Cx - sub.Cx()) < epsilon && fabs(Dx - sub.Dx()) < epsilon &&
               fabs(Ay - sub.Ay()) < epsilon && fabs(By - sub.By()) < epsilon &&
               fabs(Cy - sub.Cy()) < epsilon && fabs(Dy - sub.Dy()) < epsilon)
            {
                if(t)
                {
                    t[0] = ts[i];
                    t[1] = te[ii];
                }
                return true;
            }
        }
    }
    return false;
}

Find control point on piecewise quadratic Bezier curve

I need to write a program to generate and display a piecewise quadratic Bezier curve that interpolates each set of data points (I have a txt file contains data points). The curve should have continuous tangent directions, the tangent direction at each data point being a convex combination of the two adjacent chord directions.
0.1 0,
0 0,
0 5,
0.25 5,
0.25 0,
5 0,
5 5,
10 5,
10 0,
9.5 0
The above are the data points I have. Does anyone know what formula I can use to calculate the control points?
You will need to go with a cubic Bezier to nicely handle multiple slope changes such as occur in your data set. With quadratic Beziers there is only one control point between data points, and so each curve segment must lie entirely on one side of the connecting line segment.
Hard to explain, so here's a quick sketch of your data (black points) and quadratic control points (red) and the curve (blue). (Pretend the curve is smooth!)
Look into Cubic Hermite curves for a general solution.
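For the quadratic case asked about, the single control point of a segment must lie on the tangent line through each of its two end points, so it can be found as the intersection of those two lines. The Python sketch below assumes the tangent at each data point is the convex combination of the adjacent chord directions described in the question; the function names and the weight `lam` are illustrative choices.

```python
def chord_tangent(prev_pt, pt, next_pt, lam=0.5):
    """Tangent at pt: (1 - lam) * (pt - prev_pt) + lam * (next_pt - pt), 0 <= lam <= 1."""
    return tuple((1 - lam) * (p - q) + lam * (r - p)
                 for q, p, r in zip(prev_pt, pt, next_pt))


def quadratic_control_point(p0, t0, p1, t1, eps=1e-12):
    """Intersect the lines p0 + s*t0 and p1 + u*t1; return None if they are parallel."""
    cross = t0[0] * t1[1] - t0[1] * t1[0]
    if abs(cross) < eps:
        return None
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    s = (dx * t1[1] - dy * t1[0]) / cross
    return (p0[0] + s * t0[0], p0[1] + s * t0[1])


print(quadratic_control_point((0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (1.0, -1.0)))
```

When the lines are parallel, or the intersection falls behind one of the end points, no quadratic segment follows both tangents; that is the situation where a cubic segment is needed.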
From here: http://blog.mackerron.com/2011/01/01/javascript-cubic-splines/
To produce interpolated curves like these:
You can use this CoffeeScript class (which compiles to JavaScript):
class MonotonicCubicSpline
  # by George MacKerron, mackerron.com
  # adapted from:
  # http://sourceforge.net/mailarchive/forum.php?thread_name=
  # EC90C5C6-C982-4F49-8D46-A64F270C5247%40gmail.com&forum_name=matplotlib-users
  # (easier to read at http://old.nabble.com/%22Piecewise-Cubic-Hermite-Interpolating-
  # Polynomial%22-in-python-td25204843.html)
  # with help from:
  # F N Fritsch & R E Carlson (1980) 'Monotone Piecewise Cubic Interpolation',
  # SIAM Journal of Numerical Analysis 17(2), 238 - 246.
  # http://en.wikipedia.org/wiki/Monotone_cubic_interpolation
  # http://en.wikipedia.org/wiki/Cubic_Hermite_spline
  constructor: (x, y) ->
    n = x.length
    delta = []; m = []; alpha = []; beta = []; dist = []; tau = []
    for i in [0...(n - 1)]
      delta[i] = (y[i + 1] - y[i]) / (x[i + 1] - x[i])
      m[i] = (delta[i - 1] + delta[i]) / 2 if i > 0
    m[0] = delta[0]
    m[n - 1] = delta[n - 2]
    to_fix = []
    for i in [0...(n - 1)]
      to_fix.push(i) if delta[i] == 0
    for i in to_fix
      m[i] = m[i + 1] = 0
    for i in [0...(n - 1)]
      alpha[i] = m[i] / delta[i]
      beta[i] = m[i + 1] / delta[i]
      dist[i] = Math.pow(alpha[i], 2) + Math.pow(beta[i], 2)
      tau[i] = 3 / Math.sqrt(dist[i])
    to_fix = []
    for i in [0...(n - 1)]
      to_fix.push(i) if dist[i] > 9
    for i in to_fix
      m[i] = tau[i] * alpha[i] * delta[i]
      m[i + 1] = tau[i] * beta[i] * delta[i]
    @x = x[0...n] # copy
    @y = y[0...n] # copy
    @m = m
  interpolate: (x) ->
    for i in [(@x.length - 2)..0]
      break if @x[i] <= x
    h = @x[i + 1] - @x[i]
    t = (x - @x[i]) / h
    t2 = Math.pow(t, 2)
    t3 = Math.pow(t, 3)
    h00 = 2 * t3 - 3 * t2 + 1
    h10 = t3 - 2 * t2 + t
    h01 = -2 * t3 + 3 * t2
    h11 = t3 - t2
    y = h00 * @y[i] +
        h10 * h * @m[i] +
        h01 * @y[i + 1] +
        h11 * h * @m[i + 1]
    y