Is there a vDSP function to do the following operation?

Sorry if this is obvious. I'm just getting into the Accelerate framework and trying to go beyond the very simple stuff. I'm staring at the vDSP reference, but I'm not sure how the following operation would be phrased or what it would be called in technical terms, so I'm having trouble finding it. What's the best way to do this with vDSP? In pseudocode, for i from 0 to some N:
O[i] = A + B * (sum of I[0] through I[i])
Thanks!
To clarify: I and O are both vectors of floats (A and B are scalars), and speed is critical.

It turns out this is equivalent to:
vDSP_vrsum(I, 1, &B, O, 1, N);
vDSP_vsadd(O, 1, &A, O, 1, N);
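Because the exact indexing of vDSP_vrsum (in particular how it treats the first element) is worth confirming against Apple's vDSP documentation, a plain scalar loop for the formula in the question is handy for checking the two calls above on a small input. This is a minimal sketch; the helper name is hypothetical and it computes an inclusive running sum, so O[0] = A + B*I[0].

```c
#include <stddef.h>

/* Scalar reference for O[i] = A + B * (I[0] + I[1] + ... + I[i]).
   Hypothetical helper name; meant for checking a vDSP-based version. */
static void scaled_running_sum_ref(const float *I, float A, float B,
                                   float *O, size_t N) {
    float running = 0.0f;  /* inclusive prefix sum of I */
    for (size_t i = 0; i < N; ++i) {
        running += I[i];
        O[i] = A + B * running;
    }
}
```

Algebraically this matches the two-pass vDSP version (scaled running sum, then add A), but the last bits of the results can differ because the rounding order is not the same.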

Related

Does anyone know how to solve the following equation?

While reading this paper http://articles.adsabs.harvard.edu/cgi-bin/nph-iarticle_query?1976ApJ...209..214B&data_type=PDF_HIGH&whole_paper=YES&type=PRINTER&filetype=.pdf
I tried to solve eq. (49) numerically. It looks like a Fokker-Planck equation, but the finite difference method I tried doesn't work: it's unstable.
Does anyone know how to solve it?

Computational Science Stack Exchange is where you can ask and hope for an answer, or you could try its physics cousin. The equation you quote is an integro-differential equation, fairly non-linear, that looks like a Fokker-Planck equation, but it is definitely not the typical Fokker-Planck equation.
What you can try is to discretize the spatial part of g(x,t) using finite differences or finite elements. After all, 0 < x < x_max and you have boundary conditions. You also have to discretize the corresponding integral. So maybe finite elements are more appropriate? Finite elements means you write g(x, t) as a series in a well-chosen basis of simple, compactly supported functions B_j(x), j = 1, ..., N, on the interval [0, x_max]:
g(x,t) = sum_{j=1..N} g_j(t) B_j(x)
That turns your function into a (large) vector g_j(t) = g(x_j, t) (for a nodal basis, where B_j(x_i) is 1 if i = j and 0 otherwise), for j = 1, 2, ..., N. As a result, you obtain a non-linear system of ODEs
dg_j(t)/dt = Q_j(g_1(t), g_2(t), ..., g_N(t)), j = 1, ..., N
After that, use something like a Runge-Kutta method to integrate the ODE system numerically.
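A minimal method-of-lines sketch of that last step: the grid turns g(x,t) into a vector, and a classical fourth-order Runge-Kutta step advances dg_j/dt = Q_j(g). The right-hand side rhs() here is a hypothetical stand-in (plain diffusion D g_xx with g = 0 at both ends), not eq. (49); the paper's drift, diffusion and integral terms would go there instead. Function names and N_MAX are assumptions for illustration.

```c
#include <stddef.h>

#define N_MAX 256  /* assumed upper bound on the number of grid points */

/* Hypothetical stand-in for Q(g): D * g_xx by central differences on a
   uniform grid, with g = 0 just outside both ends. Eq. (49) replaces this. */
static void rhs(const double *g, double *dg, size_t n, double dx, double D) {
    for (size_t j = 0; j < n; ++j) {
        double left  = (j > 0)     ? g[j - 1] : 0.0;
        double right = (j + 1 < n) ? g[j + 1] : 0.0;
        dg[j] = D * (left - 2.0 * g[j] + right) / (dx * dx);
    }
}

/* One classical RK4 step for dg/dt = Q(g), updating g in place (n <= N_MAX). */
static void rk4_step(double *g, size_t n, double dt, double dx, double D) {
    double k1[N_MAX], k2[N_MAX], k3[N_MAX], k4[N_MAX], tmp[N_MAX];
    rhs(g, k1, n, dx, D);
    for (size_t j = 0; j < n; ++j) tmp[j] = g[j] + 0.5 * dt * k1[j];
    rhs(tmp, k2, n, dx, D);
    for (size_t j = 0; j < n; ++j) tmp[j] = g[j] + 0.5 * dt * k2[j];
    rhs(tmp, k3, n, dx, D);
    for (size_t j = 0; j < n; ++j) tmp[j] = g[j] + dt * k3[j];
    rhs(tmp, k4, n, dx, D);
    for (size_t j = 0; j < n; ++j)
        g[j] += dt / 6.0 * (k1[j] + 2.0 * k2[j] + 2.0 * k3[j] + k4[j]);
}
```

Explicit schemes like this are only stable for small time steps; for the diffusion stand-in, dt has to stay below roughly dx^2/D. Violating such a limit is one common reason an explicit finite-difference attempt blows up, and for stiff problems an implicit integrator may be the better choice.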

Efficiently implementing DXT1 texture decompression in hardware

DXT1 compression is designed to be fast to decompress in hardware, where it's used in texture samplers. The Wikipedia article says that under certain circumstances you can work out the coefficients of the interpolated colours as:
c2 = (2/3)*c0+(1/3)*c1
or rearranging that:
c2 = (1/3)*(2*c0+c1)
However you rearrange the above equation, you always end up having to multiply something by 1/3 (or divide by 3, which amounts to the same thing and is even more expensive). It seems weird to me that a texture format designed to be fast to decompress in hardware would require a multiplication or division. The FPGA I'm implementing my GPU on has only limited resources for multiplications, and I want to save those for where they're really required.
So am I missing something? Is there an efficient way of avoiding the multiplications of the colour channels by a 1/3? Or should I just eat the cost of that multiplication?

This might be a bad way of imagining it, but could you implement it with additions and subtractions of successive halves (shifts)?
As you have 16 bits, this lets you get quite accurate with successive additions and subtractions.
A third could be represented as
a(n+1) = a(n) +/- (A >> (n+1)), where a list such as [0, 0, 1, 0, 1, ...] shows whether to add or subtract each shifted term.
I believe this is called fractional maths.
However, in FPGAs it is difficult to know whether this is actually more power-efficient than the native DSP blocks provided (e.g. DSP48E1).

My best answer so far is that I can use the identity:
x/3 = sum_{n=1..infinity} x/2^(2n)
and then take the first n terms. Using 4 terms I get:
(x/4) + (x/16) + (x/64) + (x/256)
which equals
x*0.33203125
which is probably good enough.
This relies on multiplication by a fixed power of 2 being free in hardware, and then needs 3 additions, 2 of which I can run in parallel.
Any better answer is appreciated though.
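Checking the arithmetic of the identity and of its four-term truncation:

```latex
\frac{x}{3} = x\sum_{n=1}^{\infty} 4^{-n}, \qquad
\sum_{n=1}^{4} 4^{-n} = \frac{64 + 16 + 4 + 1}{256} = \frac{85}{256} = 0.33203125, \qquad
\frac{1}{3} - \frac{85}{256} = \frac{256 - 255}{768} = \frac{1}{768}
```

So the four-term version undershoots x/3 by x/768 (about 0.246 at x = 189) before any extra error from truncating the shifted values.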
EDIT: Using a combination of this and @dyslexicgruffalo's answer, I wrote a simple C++ program that iterated over the various sequences, tried them all, and recorded the average and maximum errors.
I did this for 0 <= x <= 189 (189 is the value of 2*c0.g + c1.g when g, which is 6 bits, maxes out).
The shortest good sequence (4 ops, with a max error of 2 and an average error of 0.62) was:
1 + x/4 + x/16 + x/64.
The best sequence (6 ops, with a max error of 1 and an average error of 0.32) was:
x/2 - x/4 + x/8 - x/16 + x/32 - x/64.
For the 5-bit values (red and blue) the maximum value is 31*3 = 93, and the above sequences are still good but not the best. The best ones are:
x/4 + x/8 - x/16 + x/32 [max error of 1, average 0.38]
and
1 + x/4 + x/16 [max error of 2, average of 0.68]
(And, luckily, none of the above sequences ever guesses an answer which is too big so no clamping is needed even though they're not perfect)
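The brute-force comparison described in the edit can be sketched in C along these lines. This is an assumption about how such a harness might look, not the author's C++ program: each x/2^k is a truncating right shift, and the error is measured against the exact real value x/3, so its averages need not match the figures quoted above. Function names are hypothetical.

```c
/* 1 + x/4 + x/16 + x/64, each division done as a truncating right shift. */
static int third_a(int x) { return 1 + (x >> 2) + (x >> 4) + (x >> 6); }

/* x/2 - x/4 + x/8 - x/16 + x/32 - x/64, also with truncating shifts. */
static int third_b(int x) {
    return (x >> 1) - (x >> 2) + (x >> 3) - (x >> 4) + (x >> 5) - (x >> 6);
}

/* Max and mean absolute error of approx(x) against the exact x/3
   (as a real number) over 0 <= x <= max_x. */
static void error_stats(int (*approx)(int), int max_x,
                        double *max_err, double *mean_err) {
    double worst = 0.0, total = 0.0;
    for (int x = 0; x <= max_x; ++x) {
        double d = approx(x) - x / 3.0;
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
        total += d;
    }
    *max_err = worst;
    *mean_err = total / (max_x + 1);
}
```

For example, error_stats(third_b, 189, &mx, &mn) covers the 6-bit green range and a max_x of 93 covers the 5-bit red/blue range. The low end of the range deserves a look too: with truncating shifts, third_a(0) evaluates to 1.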

backtracking line search parameter

I am reading about and practising a bit with optimization using Nocedal & Wright. When I got to the simple backtracking algorithm, where d is my line direction and a is the step size, the algorithm looks for an a such that
f(x + a d) <= f(x) + c a ∇f(x)^T d
for some 0 < c < 1. They advise using a very small c, of order 10^-4.
That seemed very odd to me, as it is a very loose demand.
I did some experimenting with c = 0.3 and it seemed to work much better than the suggested 10^-4 (for a simple quadratic problem and steepest descent).
Any intuition as to why such a low value should work, and why it didn't do well for me?
Thanks.

∇f() may have completely different scales for different problems;
one step size cannot fit all.
Consider f(x) = sin(ω · x): the right c will depend on ω,
which may be on the order of 1, or 1e-6, or ...
Thus it's a good idea to scale ∇f() to about norm 1, then play with c.
(People who recommend "c = ...", please describe your problem size and scales.)
Add some noise to your quadratic and see what happens as you increase the noise.
Try quadratic + noise in 2d and 10d.
In machine learning there seems to be quite a lot of folklore on c, a.k.a. the learning rate;
search stackexchange.com for learning-rate, gradient-descent step-size,
and adagrad (adaptive gradient).
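A minimal sketch of steepest descent with the backtracking rule from the question, useful for experimenting with c. The diagonal quadratic test problem, the halving factor 0.5, the initial step a = 1 and all function names are assumptions for illustration, not taken from Nocedal & Wright.

```c
#include <stddef.h>
#include <string.h>

#define N_MAX 16  /* assumed maximum problem dimension */

/* Hypothetical test problem: f(x) = 0.5 * sum_i h[i] * x[i]^2. */
static double quad_f(const double *h, const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += 0.5 * h[i] * x[i] * x[i];
    return s;
}

static void quad_grad(const double *h, const double *x, double *g, size_t n) {
    for (size_t i = 0; i < n; ++i) g[i] = h[i] * x[i];
}

/* Steepest descent (d = -grad). Each line search starts at a = 1 and halves a
   until f(x + a d) <= f(x) + c a grad.d holds. Returns the iteration count
   at which |grad| < tol, or max_iter if that never happens. */
static int steepest_descent(const double *h, double *x, size_t n, double c,
                            int max_iter, double tol) {
    double g[N_MAX], xnew[N_MAX];
    for (int k = 0; k < max_iter; ++k) {
        quad_grad(h, x, g, n);
        double gd = 0.0;  /* grad . d = -|grad|^2 */
        for (size_t i = 0; i < n; ++i) gd -= g[i] * g[i];
        if (-gd < tol * tol) return k;
        double fx = quad_f(h, x, n), a = 1.0;
        for (;;) {
            for (size_t i = 0; i < n; ++i) xnew[i] = x[i] - a * g[i];
            if (quad_f(h, xnew, n) <= fx + c * a * gd || a < 1e-12) break;
            a *= 0.5;  /* backtrack */
        }
        memcpy(x, xnew, n * sizeof *x);
    }
    return max_iter;
}
```

One effect of c that this makes visible: on a quadratic, the exact minimizer along the line, a* = -grad.d / (d^T H d), gives f(x + a* d) = f(x) + (1/2) a* grad.d, so it passes the test only when c <= 1/2. For f(x) = 0.5 x^2 starting at x = 1, c = 1e-4 accepts the full step and lands on 0 in one iteration, while c = 0.6 rejects it and only halves x on each iteration.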

Alternative to SWI-Prolog's clpq library for solving simplex

Excuse me if this is the wrong place to ask.
I have been using SWI Prolog's clpq library to solve simplex. I find the syntax pretty simple and expressive. It looks like this:
:- use_module(library(clpq)).
main(U, V, W) :-
    { 0 =< U, U =< 1,
      0 =< V, V =< 1,
      0 =< W, W =< 1
    },
    maximize(U + V - W).
No need to convert into any special format: you just type your constraints and the objective function. Great, but it has come to my attention that clpq has bugs and is unmaintained, so I lack confidence in it.
So I was wondering if someone knows something open source and equally simple, without bugs? The best I have found so far is the GNU Linear Programming Kit (GLPK). What are other people using for experimenting with simplex?

For the archive, the simplex implementation in maxima (http://maxima.sourceforge.net/) is very good.
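To illustrate what a solver does with the clpq model above, here is a minimal dense-tableau simplex sketch in C. It is not any of the tools mentioned; it uses Dantzig's most-negative-reduced-cost rule with no anti-cycling safeguard, so it can cycle on degenerate problems. It assumes the LP has the form maximize c.x subject to A x <= b, x >= 0 with b >= 0, so the slack variables give a feasible starting basis; the question's box constraints fit that form directly. Names and size limits are hypothetical.

```c
#define MAX_M 8  /* assumed maximum number of constraints */
#define MAX_N 8  /* assumed maximum number of variables */

/* Maximize c.x subject to A x <= b, x >= 0, assuming b >= 0.
   Returns 0 and fills x and *obj at the optimum, or -1 if unbounded. */
static int simplex_max(int m, int n, double A[][MAX_N], const double *b,
                       const double *c, double *x, double *obj) {
    double T[MAX_M + 1][MAX_N + MAX_M + 1];
    int basis[MAX_M];
    int cols = n + m + 1;  /* structural variables, slacks, right-hand side */
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < cols; ++j) T[i][j] = 0.0;
        for (int j = 0; j < n; ++j) T[i][j] = A[i][j];
        T[i][n + i] = 1.0;          /* slack variable for row i */
        T[i][cols - 1] = b[i];
        basis[i] = n + i;
    }
    /* Objective row stores -c; optimal once no entry is negative. */
    for (int j = 0; j < cols; ++j) T[m][j] = 0.0;
    for (int j = 0; j < n; ++j) T[m][j] = -c[j];
    for (;;) {
        int pc = -1;  /* entering column: most negative reduced cost */
        for (int j = 0; j < cols - 1; ++j)
            if (T[m][j] < -1e-12 && (pc < 0 || T[m][j] < T[m][pc])) pc = j;
        if (pc < 0) break;
        int pr = -1;  /* leaving row: minimum ratio test */
        for (int i = 0; i < m; ++i)
            if (T[i][pc] > 1e-12) {
                double ratio = T[i][cols - 1] / T[i][pc];
                if (pr < 0 || ratio < T[pr][cols - 1] / T[pr][pc]) pr = i;
            }
        if (pr < 0) return -1;  /* column unbounded */
        double p = T[pr][pc];
        for (int j = 0; j < cols; ++j) T[pr][j] /= p;
        for (int i = 0; i <= m; ++i)
            if (i != pr && T[i][pc] != 0.0) {
                double f = T[i][pc];
                for (int j = 0; j < cols; ++j) T[i][j] -= f * T[pr][j];
            }
        basis[pr] = pc;
    }
    for (int j = 0; j < n; ++j) x[j] = 0.0;
    for (int i = 0; i < m; ++i)
        if (basis[i] < n) x[basis[i]] = T[i][cols - 1];
    *obj = T[m][cols - 1];
    return 0;
}
```

For the question's model (A = identity, b = (1, 1, 1), c = (1, 1, -1)) this returns U = V = 1, W = 0 with objective 2.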

Mathematica exponentiation and finding a specified coefficient

I have the following code, and it does exactly what I want it to do, except that it is ridiculously slow. I would not be so bothered, except that when I process the code "manually", i.e., I break it into parts and do them individually, it's near instantaneous.
Here is my code:
Coefficient[Product[Sum[x^(j*Prime[i]), {j, 0, Floor[q/Prime[i]]}],
{i, 1, PrimePi[q]}], x, q]
I think it is trying to optimize the sum, but am not sure. Is there a way to stop that?
In addition, since all my coefficients are positive, and I only want the x^qth one, is there a way to get Mathematica to discard all exponents that are larger than that and not do all the multiplication with those?

I may be misunderstanding what you want, but as the coefficient will depend on q, I assume you want it evaluated for a specific q. Since I suspected (like you) that the time is taken optimising the product and sum, I rewrote it. You had something like:
With[{q = 80}, Coefficient[Product[Sum[x^(j*Prime[i]), {j, 0, Floor[q/Prime[i]]}],
   {i, 1, PrimePi[q]}], x, q]] // Timing
(*
-> {8.36181, 10003}
*)
which I rewrote with purely structural operations as
With[{q = 80},
 Coefficient[Times @@
   Table[Plus @@ Table[x^(j*Prime[i]), {j, 0, Floor[q/Prime[i]]}],
    {i, 1, PrimePi[q]}], x, q]] // Timing
(*
-> {8.36357, 10003}
*)
(this just builds up a list of the terms and then multiplies them, so no symbolic analysis is performed).
Just building up the polynomial is instantaneous, but it has a few thousand terms, so what is probably happening is that Coefficient spends a lot of time making sure it has the right coefficient. You can fix this by expanding the polynomial first:
With[{q = 80}, Coefficient[Expand[Product[Sum[x^(j*Prime[i]), {j, 0, Floor[q/Prime[i]]}],
   {i, 1, PrimePi[q]}]], x, q]] // Timing
(*
-> {0.240862, 10003}
*)
and it also works for my method.
So, to summarise: just apply Expand to the expression before you take the coefficient.

I think that the reason that the original code is slow is because Coefficient is made to work even with very large expressions - ones that would not fit into the memory if naively expanded.
Here's the original polynomial:
poly[q_, x_] := Product[Sum[ x^(j*Prime[i]),
{j, 0, Floor[q/Prime[i]]}], {i, 1, PrimePi[q]}]
See how, even for moderate q, expanding the polynomial takes up a lot more memory and becomes fairly slow:
In[2]:= Through[{LeafCount, ByteCount}[poly[300, x]]] // Timing
Through[{LeafCount, ByteCount}[Expand@poly[300, x]]] // Timing
Out[2]= { 0.01, { 1859, 55864}}
Out[3]= {25.27, {77368, 3175840}}
Now let's define the coefficient in 3 different ways and time them
coeff[q_] := Module[{x}, Coefficient[poly[q, x], x, q]]
exCoeff[q_] := Module[{x}, Coefficient[Expand@poly[q, x], x, q]]
serCoeff[q_] := Module[{x}, SeriesCoefficient[poly[q, x], {x, 0, q}]]
In[7]:= Table[ coeff[q],{q,1,30}]//Timing
Table[ exCoeff[q],{q,1,30}]//Timing
Table[serCoeff[q],{q,1,30}]//Timing
Out[7]= {0.37,{0,1,1,1,2,2,3,3,4,5,6,7,9,10,12,14,17,19,23,26,30,35,40,46,52,60,67,77,87,98}}
Out[8]= {0.12,{0,1,1,1,2,2,3,3,4,5,6,7,9,10,12,14,17,19,23,26,30,35,40,46,52,60,67,77,87,98}}
Out[9]= {0.06,{0,1,1,1,2,2,3,3,4,5,6,7,9,10,12,14,17,19,23,26,30,35,40,46,52,60,67,77,87,98}}
In[10]:= coeff[100]//Timing
exCoeff[100]//Timing
serCoeff[100]//Timing
Out[10]= {56.28,40899}
Out[11]= { 0.84,40899}
Out[12]= { 0.06,40899}
So SeriesCoefficient is definitely the way to go. Unless, of course, you're
a bit better at combinatorics than me and know the following formulae for prime partitions
(OEIS):
In[13]:= CoefficientList[Series[1/Product[1-x^Prime[i],{i,1,30}],{x,0,30}],x]
Out[13]= {1,0,1,1,1,2,2,3,3,4,5,6,7,9,10,12,14,17,19,23,26,30,35,40,46,52,60,67,77,87,98}
In[14]:= f[n_] := Length@IntegerPartitions[n, All, Prime@Range@PrimePi@n]; Array[f, 30]
Out[14]= {0,1,1,1,2,2,3,3,4,5,6,7,9,10,12,14,17,19,23,26,30,35,40,46,52,60,67,77,87,98}
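The question's second wish, discarding all exponents above q, amounts to multiplying polynomials truncated at degree q. A minimal C sketch of that idea (a different tool from Mathematica, shown only to illustrate the truncation): since terms above x^q never contribute to the x^q coefficient, each factor sum_j x^(j p) can be replaced by 1/(1 - x^p), and multiplying a degree-q coefficient array by that is one in-place forward sweep. The function name is hypothetical.

```c
#include <stdlib.h>

/* Coefficient of x^q in prod_{p prime <= q} (1 + x^p + x^(2p) + ...),
   i.e. the number of partitions of q into prime parts. Every intermediate
   product is truncated at degree q. Returns -1 on allocation failure. */
static long long prime_partitions(int q) {
    if (q < 0) return 0;
    long long *coef = calloc((size_t)q + 1, sizeof *coef);
    char *composite = calloc((size_t)q + 1, 1);
    if (!coef || !composite) { free(coef); free(composite); return -1; }
    coef[0] = 1;  /* start from the polynomial 1 */
    for (int p = 2; p <= q; ++p) {
        if (composite[p]) continue;
        for (int m = 2 * p; m <= q; m += p) composite[m] = 1;  /* sieve */
        /* Multiply by 1/(1 - x^p) mod x^(q+1): ascending k means coef[k - p]
           already includes this factor, which yields every multiple of p. */
        for (int k = p; k <= q; ++k) coef[k] += coef[k - p];
    }
    long long result = coef[q];
    free(coef);
    free(composite);
    return result;
}
```

This reproduces the values computed above (98 for q = 30, 40899 for q = 100) while only ever storing q + 1 coefficients.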
