Matlab: Optimize this? - optimization

I'm very new to MATLAB and have been tasked with speeding up a procedure. I'm sure there is a better way to do the following:
for i = 2:length(WallId)
    if WallId(i) ~= WallId(i-1)
        ReducedWallId = [ReducedWallId;WallId(i)];
        ReducedWallTime = [ReducedWallTime;WallTime(i)];
        GroupCount = [GroupCount;tempCount];
        tempCount = 1;
    else
        tempCount = tempCount +1;
    end
end
I can preallocate the various variables to length(WallId), but what do I do with the unused extra elements once the loop is done? Do I need to care?

idx = find([true diff(WallId) ~= 0]);
ReducedWallId = WallId(idx);
ReducedWallTime = WallTime(idx);
GroupCount = diff([idx numel(WallId)+1]);
Assuming what you want is a summary of the unique data in WallId and WallTime, then you should
make sure that WallId is sorted first. You can re-organise WallTime to match, as follows:
[WallId, ind] = sort(WallId);
WallTime = WallTime(ind);
Also, you'll only get the right answer if WallTime matches whenever WallId does.

Use vectorization.
ReducedWallId = WallId(find(diff(WallId) ~= 0) + 1);
and similarly for ReducedWallTime.
Explicit for-loops are slow in MATLAB, especially when arrays grow inside them. Vector operations speed things up considerably. This is a general theme in optimizing MATLAB code and is covered in detail in various documents on the web.
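The run-boundary idea from the first answer (mark where the id changes, then take differences of the start indices) is not specific to MATLAB. As a minimal sketch in NumPy, assuming non-empty 1-D inputs of equal length; the function name summarize_runs and the sample data are made up for illustration:

```python
import numpy as np

def summarize_runs(wall_id, wall_time):
    """Return the first id, first time, and length of each run of equal ids."""
    wall_id = np.asarray(wall_id)
    wall_time = np.asarray(wall_time)
    # True at index 0 and wherever the id differs from its predecessor.
    is_start = np.concatenate(([True], wall_id[1:] != wall_id[:-1]))
    starts = np.flatnonzero(is_start)
    # Run lengths are the gaps between consecutive starts (plus the array end).
    counts = np.diff(np.append(starts, wall_id.size))
    return wall_id[starts], wall_time[starts], counts

# Hypothetical sample data: three runs of lengths 2, 3 and 1.
ids, times, counts = summarize_runs([7, 7, 3, 3, 3, 9],
                                    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
```

Because the result arrays are produced in one step at their final size, the question of what to do with preallocated but unused elements does not arise.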

Related

Singular Value Decomposition algorithm not working

I wrote the following function to perform SVD according to page 45 of 'the deep learning book' by Ian Goodfellow and co.
import numpy as np

def SVD(A):
    #A^T
    AT = np.transpose(A)
    #AA^T
    AAT = A.dot(AT)
    #A^TA
    ATA = AT.dot(A)
    #Left singular vectors
    LSV = np.linalg.eig(AAT)[1]
    U = LSV #some values of U have the wrong sign
    #Right singular vectors
    RSV = np.linalg.eig(ATA)[1]
    V = RSV
    V[:,0] = V[:,0] #V isnt arranged properly
    values = np.sqrt(np.linalg.eig(ATA)[0])
    #descending order
    values = np.sort(values)[::-1]
    rows = A.shape[0]
    columns = A.shape[1]
    D = np.zeros((rows,columns))
    np.fill_diagonal(D,values)
    return U, D, V
However for any given matrix the results are not the same as using
np.linalg.svd(A)
and I have no idea why.
I tested my algorithm by saying
abs(UDV^T - A) < 0.0001
to check if it decomposed properly and it hasn't. The problem seems to lie with the V and U components but I can't see what's going wrong. D seems to be correct.
If anyone can see the problem it would be much appreciated.
I think the problem is the order of the eigenpairs returned by eig(ATA) and eig(AAT). The documentation of np.linalg.eig states that the eigenvalues are not necessarily ordered. Replacing eig with eigh, which returns the eigenpairs in ascending order of eigenvalue, should help. Also, don't rearrange the values.
By the way, eigh is specific for symmetric matrices, such as the ones that you are passing, and will not return complex numbers if the original matrix is real.
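A minimal sketch of the eigh-based approach follows. It goes one step beyond the answer: instead of running a second eigendecomposition on AA^T, it derives U from A and V, which is an assumption on my part made so that the signs of paired singular vectors agree. It also assumes A has full column rank (otherwise the division by the singular values fails) and returns the thin SVD. The name svd_via_eigh and the random test matrix are illustrative only.

```python
import numpy as np

def svd_via_eigh(A):
    """Thin SVD of A (assumes full column rank) from the eigenpairs of A^T A."""
    A = np.asarray(A, dtype=float)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigvals, V = np.linalg.eigh(A.T @ A)
    # Flip to descending order, keeping each eigenvector with its eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    # Clip tiny negative round-off before taking the square root.
    s = np.sqrt(np.clip(eigvals, 0.0, None))
    # Derive U column by column as A v_i / s_i so the signs match V.
    U = (A @ V) / s
    return U, s, V

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
U, s, V = svd_via_eigh(A)
reconstruction = U @ np.diag(s) @ V.T
```

The individual columns of U and V may still differ in sign from those of np.linalg.svd(A), since singular vectors are only defined up to a sign; the product U D V^T is what should match A.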

What is faster to calculate: Large looped calculations or Vlookup on multiple large data grids?

This is a conceptual question that will help me before I start coding my next project. Which approach do you think will be faster?
Loop Calculations: An example of how it would be set up:
For i = 0 To 67
    While x < 350
        y = 0
        While y < 600
            Call solved() 'solves and returns "concentration"
            If c(y, x) <> Empty = True Then
                c(y, x) = c(row, x) + concentration
            Else
                c(y, x) = concentration
            End If
            y = y + 1
        Wend
        x = x + 1
    Wend
Next i
Vlookup:
Using Matlab I can generate millions of solved data points and store them into a matrix. This can be stored in a database.
Restrictions: Only Access is available as a database, and with the amount of data that needs to be stored it will hit its memory limit. Excel is not a good place to store this much data either.
Taking these restrictions into account, I have thought of using multiple text files to store the data and using Excel to search and pull values.
Theoretically, a search should be faster. But with different files to open and a large matrix to look through, the speed will be affected. What do you think? Please also say so if there is a better approach. Thanks!

numpy.einsum for Julia? (2)

Coming from this question, I wonder if a more generalized einsum was possible. Let us assume, I had the problem
using PyCall
@pyimport numpy as np
a = rand(10,10,10)
b = rand(10,10)
c = rand(10,10,10)
Q = np.einsum("imk,ml,lkj->ij", a,b,c)
Or something similar. How would I solve this problem without looping through the sums?
with best regards
Edit/Update: This is now a registered package, so you can Pkg.add("Einsum") and you should be good to go (see the example below to get started).
Original Answer: I just created some very preliminary code to do this. It follows exactly what Matt B. described in his comment. Hope it helps, let me know if there are problems with it.
https://github.com/ahwillia/Einsum.jl
This is how you would implement your example:
using Einsum
a = rand(10,10,10)
b = rand(10,10)
c = rand(10,10,10)
Q = zeros(10,10)
@einsum Q[i,j] = a[i,m,k]*b[m,l]*c[l,k,j]
Under the hood the macro builds the following series of nested for loops and inserts them into your code before compile time. (Note that this is not the exact code inserted; the macro also checks that the dimensions of the inputs agree. Use macroexpand to see the full code.)
for j = 1:size(Q,2)
    for i = 1:size(Q,1)
        s = 0
        for l = 1:size(b,2)
            for k = 1:size(a,3)
                for m = 1:size(a,2)
                    s += a[i,m,k] * b[m,l] * c[l,k,j]
                end
            end
        end
        Q[i,j] = s
    end
end
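To check that the einsum string from the question and the nested loops above describe the same contraction, here is a small NumPy comparison. The array shapes are chosen arbitrarily (and deliberately unequal, so that a mismatched index would raise an error); they are not the 10x10x10 sizes from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((4, 5, 6))   # indices i, m, k
b = rng.random((5, 7))      # indices m, l
c = rng.random((7, 6, 3))   # indices l, k, j

# Contract over m, l and k in a single call.
Q_einsum = np.einsum("imk,ml,lkj->ij", a, b, c)

# The same sum written out as explicit loops.
Q_loops = np.zeros((a.shape[0], c.shape[2]))
for i in range(a.shape[0]):
    for j in range(c.shape[2]):
        s = 0.0
        for m in range(a.shape[1]):
            for l in range(b.shape[1]):
                for k in range(a.shape[2]):
                    s += a[i, m, k] * b[m, l] * c[l, k, j]
        Q_loops[i, j] = s
```

Every index that appears on the input side but not after the arrow (here m, l and k) is summed over, which is exactly the set of inner loops.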

Matlab: how do I run the optimization (fmincon) repeatedly?

I am trying to follow the tutorial of using the optimization tool box in MATLAB. Specifically, I have a function
f = exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1)+b
subject to the constraint:
(x(1))^2+x(2)-1=0,
-x(1)*x(2)-10<=0.
and I want to minimize this function for a range of b=[0,20]. (That is, I want to minimize this function for b=0, b=1,b=2 ... and so on).
Below are the steps taken from MATLAB's tutorial webpage (http://www.mathworks.com/help/optim/ug/nonlinear-equality-and-inequality-constraints.html). How should I change the code so that the optimization runs once for each value of b and saves the optimal values for each b?
Step 1: Write a file objfun.m.
function f = objfun(x)
f = exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1)+b;
Step 2: Write a file confuneq.m for the nonlinear constraints.
function [c, ceq] = confuneq(x)
% Nonlinear inequality constraints
c = -x(1)*x(2) - 10;
% Nonlinear equality constraints
ceq = x(1)^2 + x(2) - 1;
Step 3: Invoke constrained optimization routine.
x0 = [-1,1]; % Make a starting guess at the solution
options = optimoptions(@fmincon,'Algorithm','sqp');
[x,fval] = fmincon(@objfun,x0,[],[],[],[],[],[],...
@confuneq,options);
After 21 function evaluations, the solution produced is
x, fval
x =
-0.7529 0.4332
fval =
1.5093
Update:
I tried your answer, but I am encountering a problem with your step 2. Basically, I just pasted my step 3 into your step 2 (below the comment "optimization just like before").
%initialize list of targets
b = 0:1:20;
%preallocate/initialize result vectors using zeros (increases speed)
opt_x = zeros(length(b));
opt_fval = zeros(length(b));
>> for idx = 1, length(b)
objfun = @(x)objfun_builder(x,b)
%optimization just like before
x0 = [-1,1]; % Make a starting guess at the solution
options = optimoptions(@fmincon,'Algorithm','sqp');
[x,fval] = fmincon(@objfun,x0,[],[],[],[],[],[],...
@confuneq,options);
%end the stuff I fill in
opt_x(idx) = x
opt_fval(idx) = fval
end
However, the output is:
Error: "objfun" was previously used as a variable, conflicting
with its use here as the name of a function or command.
See "How MATLAB Recognizes Command Syntax" in the MATLAB
documentation for details.
There are two things you need to change about your code:
Creation of the objective function.
Multiple optimizations using a loop.
1st Step
For more flexibility with regard to b, you need to set up another function that returns a handle to the desired objective function, e.g.
function h = objfun_builder(x, b)
    h = @(x)(objfun(x));
    function f = objfun(x)
        f = exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1) + b;
    end
end
A shorter and more elegant approach is to use an anonymous function, e.g.
objfun_builder = @(x,b)(exp(x(1))*(4*x(1)^2+2*x(2)^2+4*x(1)*x(2)+2*x(2)+1) + b);
After all, this works out to be the same as above. It might be less intuitive for a Matlab-beginner, though.
2nd Step
Instead of placing an .m-file objfun.m in your path, you will need to call
objfun = @(x)(objfun_builder(x,myB));
to create an objective function in your workspace. In order to loop over the interval b=[0,20], use the following loop
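%initialize list of targets
b = 0:1:20;
%preallocate result arrays (one row of x per value of b)
opt_x = zeros(length(b), 2);
opt_fval = zeros(length(b), 1);
%start optimization of list of targets (`b`s)
for idx = 1:length(b)
    objfun = @(x)objfun_builder(x,b(idx));
    %optimization just like before, but pass the handle objfun itself (not @objfun)
    x0 = [-1,1];
    options = optimoptions(@fmincon,'Algorithm','sqp');
    [x,fval] = fmincon(objfun,x0,[],[],[],[],[],[],@confuneq,options);
    opt_x(idx,:) = x;
    opt_fval(idx) = fval;
end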
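The pattern itself (build one objective per value of b, optimize, store the result) can also be sketched outside MATLAB. The following uses SciPy's minimize with the SLSQP method as a stand-in for fmincon with 'sqp'; that substitution is my assumption, and the two solvers are not guaranteed to take the same steps or return identical iterates. The helper name make_objective is made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def make_objective(b):
    """Return f(x) for one fixed offset b (the closure captures b)."""
    def objective(x):
        return np.exp(x[0]) * (4*x[0]**2 + 2*x[1]**2
                               + 4*x[0]*x[1] + 2*x[1] + 1) + b
    return objective

constraints = [
    # Equality constraint: x1^2 + x2 - 1 = 0.
    {"type": "eq", "fun": lambda x: x[0]**2 + x[1] - 1},
    # SciPy expects g(x) >= 0, so -x1*x2 - 10 <= 0 becomes x1*x2 + 10 >= 0.
    {"type": "ineq", "fun": lambda x: x[0]*x[1] + 10},
]

b_values = np.arange(0, 21)
opt_x = np.zeros((b_values.size, 2))      # one row of x per value of b
opt_fval = np.zeros(b_values.size)
converged = np.zeros(b_values.size, dtype=bool)

for idx, b in enumerate(b_values):
    result = minimize(make_objective(b), x0=[-1.0, 1.0],
                      method="SLSQP", constraints=constraints)
    opt_x[idx] = result.x
    opt_fval[idx] = result.fun
    converged[idx] = result.success
```

Since b only adds a constant to the objective, the minimizer x should not depend on b; only the optimal value shifts by b.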

optimise distance calculation in matlab

I am a newbie with Matlab and I have the following scenario (which is part of a larger problem).
Matrix A is 4754x1024 and matrix B is 6800x1024.
For every row in matrix A I need to calculate the Euclidean distance to every row in matrix B. I am using the following technique to calculate the distances, but I find it very inefficient and time-consuming in Matlab.
for i=1:row_A
    A_data=A_test(i,:);
    for j=1:row_B
        B_data=B_train(j,:);
        X=[A_data;B_data];
        %calculate distance
        d=pdist(X,'euclidean');
        dist(j,i)=d;
    end
end
Any suggestions for optimising this? The final step involves performing this operation on 50 such sets of A and B.
Thanks and Regards,
Bhavya
I'm not sure what your code is actually doing.
Assuming your data has the following properties
assert(size(A,2) == size(B,2))
Try
d = zeros(size(A,1), size(B,1));
for i = 1:size(A,1)
    d(i,:) = sqrt(sum(bsxfun(@minus, B, A(i,:)).^2, 2));
end
Or possibly better organised by columns (See "Store and Access Data in Columns" in http://www.mathworks.co.uk/company/newsletters/news_notes/june07/patterns.html):
At = A.'; Bt = B.';
d = zeros(size(At,2), size(Bt,2));
for i = 1:size(At,2)
    d(i,:) = sqrt(sum(bsxfun(@minus, Bt, At(:,i)).^2, 1));
end
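The remaining loop over rows can also be removed. The sketch below is a different formulation from the bsxfun answer: it expands ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b so that all pairs come out of one matrix product. It is written in NumPy; the function name pairwise_euclidean and the small random matrices are illustrative assumptions, not the A_test and B_train data from the question.

```python
import numpy as np

def pairwise_euclidean(A, B):
    """Distance from every row of A to every row of B, shape (rows_A, rows_B)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_a = np.sum(A**2, axis=1)[:, None]   # column of squared row norms of A
    sq_b = np.sum(B**2, axis=1)[None, :]   # row of squared row norms of B
    # Squared distances via the expansion; the cross term is one matrix product.
    sq_dist = sq_a + sq_b - 2.0 * (A @ B.T)
    # Round-off can push tiny values below zero, so clamp before the root.
    return np.sqrt(np.maximum(sq_dist, 0.0))

rng = np.random.default_rng(0)
A = rng.random((6, 16))
B = rng.random((9, 16))
D = pairwise_euclidean(A, B)
```

The expansion trades a little numerical accuracy for speed: when two rows are nearly identical, the subtraction cancels most significant digits, so the direct difference-based loop is the safer choice if very small distances matter.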