What is faster to calculate: Large looped calculations or Vlookup on multiple large data grids? - vba

This is a conceptual question that will help me before I start coding my next project. Which approach do you think will be faster?
Loop Calculations: An example of how it would be set up:
For i = 0 To 67
    x = 0
    While x < 350
        y = 0
        While y < 600
            Call solved() 'solves and stores the result in "concentration"
            If Not IsEmpty(c(y, x)) Then 'c is a Variant array
                c(y, x) = c(y, x) + concentration
            Else
                c(y, x) = concentration
            End If
            y = y + 1
        Wend
        x = x + 1
    Wend
Next i
Vlookup:
Using Matlab I can generate millions of solved data points and store them in a matrix, which could then be stored in a database.
Restrictions: only Access is available as a database, and with the amount of data that needs to be stored it will hit the memory limit. Excel is not a good place to store this much data either.
Taking the restrictions into account, I have thought of using multiple text files to store the data and using Excel to search them and pull values, as sketched below.
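A minimal sketch of that text-file lookup in VBA, assuming each line of the file holds one comma-separated "key,value" pair (the layout and all names here are assumptions, not a finished design):
Function LookupConcentration(filePath As String, key As String) As Double
    ' Scan a "key,value" text file line by line and return the first match.
    Dim f As Integer, textLine As String, parts() As String
    f = FreeFile
    Open filePath For Input As #f
    Do While Not EOF(f)
        Line Input #f, textLine
        parts = Split(textLine, ",")
        If parts(0) = key Then
            LookupConcentration = CDbl(parts(1))
            Exit Do
        End If
    Loop
    Close #f
End Function
A linear scan like this costs O(n) per lookup, so for millions of rows it would pay to keep each file sorted and binary-search it, or to load a file once into a Scripting.Dictionary and look keys up from memory.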
Theoretically, a lookup should be faster than recomputing. But with several files to open and a large matrix to look through, the speed will suffer. What do you guys think? Please chime in if there is a better approach. Thanks!

Related

Are calculations involving a large matrix using arrays in VBA faster than doing the same calculation manually in Excel?

I am trying to do calculations as part of a regression model in Excel.
I need to calculate (X^T W X)^(-1) (X^T W Y), where X, W, Y are matrices and ^T and ^(-1) denote the matrix transpose and inverse.
Now when X, W, Y are of small dimensions I simply run my macro which calculates these values very very fast.
However, sometimes I am dealing with the case where, say, the dimensions of X, W, Y are 5000 x 5, 5000 x 1 and 5000 x 1 respectively, and then the macro takes a lot longer to run.
I have two questions:
Instead of using my macro, which generates the matrices on Excel sheets and then uses Excel formulas like MMULT and MINVERSE to calculate the output, would it be faster for larger matrices to do all the calculations with arrays in VBA? (I am not too sure how arrays work in VBA, so I don't actually know whether that would avoid touching Excel, and hence whether it would be any quicker or less computationally intensive.)
If the answer to the above question is no, it would be no quicker, then does anybody have an idea how to speed such calculations up? Or do I simply need to put up with it and wait?
Thanks for your time.
Considering that the algorithm of the code is the same, the speed ranking is the following:
Dll custom library with C#, C++, C, Java or anything similar
VBA
Excel
I have compared a VBA function against a C++ one here; in the long run the result is really bad for VBA.
So, the following Fibonacci with recursion in C++:
int __stdcall FibWithRecursion(int & x)
{
    int k = 0;
    int p = 0;
    if (x == 0)
        return 0;
    if (x == 1)
        return 1;
    k = x - 1;
    p = x - 2;
    return FibWithRecursion(k) + FibWithRecursion(p);
}
is exponentially better, when called in Excel, than the same complexity function in VBA:
Public Function FibWithRecursionVBA(ByRef x As Long) As Long
    Dim k As Long: k = 0
    Dim p As Long: p = 0
    If (x = 0) Then FibWithRecursionVBA = 0: Exit Function
    If (x = 1) Then FibWithRecursionVBA = 1: Exit Function
    k = x - 1
    p = x - 2
    FibWithRecursionVBA = FibWithRecursionVBA(k) + FibWithRecursionVBA(p)
End Function
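For completeness, calling the compiled C++ version from Excel takes a Declare statement along these lines (a sketch: the DLL name "FibLib.dll" is an assumption, and the DLL must be reachable on the search path):
' 64-bit VBA declaration for the exported C++ function
' (drop PtrSafe on 32-bit Office)
Public Declare PtrSafe Function FibWithRecursion Lib "FibLib.dll" (ByRef x As Long) As Long
Once declared in a standard module it can be called from VBA, or straight from a worksheet cell as =FibWithRecursion(25).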
Better late than never:
I use matrices that are bigger, 3 or 4 dimensions, sized like 16k x 26 x 5.
I run through them to find data, apply one or two formulas or make combos with other matrices.
Number one: after starting the macro, open another application like Notepad; you might get a nice speed increase ☺!
Then, I assume you have already switched off screen updating and turned off automatic calculation.
Last: don't put the data in cells; put it in arrays.
Just something like:
Dim Matrix1() As String ' put it in the declarations section if you want to use it in other macros as well. Remember you can no longer do "blabla = ActiveCell.Value2" etc.!
In the Sub's code, use ReDim Matrix1(1 To a_value, 1 To second_value, ..., 1 To last_value)
Matrix1(45, 32, 63) = "what you want to put there"
After running, just drop
Matrix1(1 To a_value, 1 To second_value, 1) onto the first sheet,
Matrix1(1 To a_value, 1 To second_value, 2) onto the second sheet, etc.
Switch screen updating back on, etc.
This way my calculation went from 45 minutes to just one, by avoiding the intermediate screen updates.
Good luck, I hope this is useful for somebody. A minimal sketch of the whole pattern follows.
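Putting the advice together (the sheet index, dimensions, and placeholder formula below are my assumptions):
Sub ArrayPatternDemo()
    ' Work in a VBA array instead of cells, then write back in one shot.
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    Dim Matrix1() As Double
    ReDim Matrix1(1 To 16000, 1 To 26)

    Dim i As Long, j As Long
    For i = 1 To 16000
        For j = 1 To 26
            Matrix1(i, j) = i * j ' placeholder for the real calculation
        Next j
    Next i

    ' One write to the sheet instead of 416,000 individual cell writes.
    Worksheets(1).Range("A1").Resize(16000, 26).Value = Matrix1

    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub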

Function call faster than on the fly calculation?

I am now seriously confused. I have a function creating a table with a random number of entries, and I tried two different methods to choose that number (which is somewhat weighted):
Method 1, separated function
local function n()
    local n = math.random()
    if n < .7 then return 0
    elseif n < .8 then return 1
    end
    return 2
end

local function final()
    for i = 1, n() do
        ...
    end
end
Method 2, direct calculation
local function final()
    local n = math.random()
    if n < .7 then n = 0
    elseif n < .8 then n = 1
    else n = 2
    end
    for i = 1, n do
        ...
    end
end
The problem is: for some reason, the first method performs 30% faster than the second. Why is this?
No, a call will never be faster than plainly inlining it. All the first method adds is the extra work of setting up the stack frame and dismantling it. The rest of the code, both source and compiled, is exactly the same, so it is only natural that "just calculation" will be faster than "just calculation + some extra work".
Your benchmark seems to be imprecise. For such a lightweight function, the for loop and the os.clock call will themselves take almost as much time as the function itself; combined with os.clock's inherently low resolution and the small number of loops, your data is not statistically significant, and you are mostly seeing random hiccups in your hardware. Use a better timer and increase the number of loops to at least 1,000,000.

Is *0.25 faster than / 4 in VB.NET

I know similar questions have been answered before but none of the answers I could find were specific to .NET.
Does the VB.NET compiler optimize expressions like:
x = y / 4
by compiling:
x = y * 0.25
And before anyone says "don't worry, the difference is small": I already know that, but this code will be executed a lot, so choosing one over the other could make a useful difference in total execution time, and will be much easier than a more major refactoring exercise.
Perhaps I should have mentioned for the benefit of those who live in an environment of total freedom: I am not at liberty to change to a different language. If I were I would probably have written this code in Fortran.
As suggested, here is a simple comparison:
Dim y = 1234.567
Dim x As Double
Dim c = 10000000000.0
Dim ds As Date
Dim df As Date

ds = Now
For i = 1 To c
    x = y / 4
Next
df = Now
Console.WriteLine("divide " & (df - ds).ToString)

ds = Now
For i = 1 To c
    x = y * 0.25
Next
df = Now
Console.WriteLine("multiply " & (df - ds).ToString)
The output is:
divide 00:00:52.7452740
multiply 00:00:47.2607256
So divide does appear to be slower by about 10%. But this difference is so small that I suspected it to be accidental. Another two runs give:
divide 00:00:45.1280000
multiply 00:00:45.9540000
divide 00:00:45.9895985
multiply 00:00:46.8426838
This suggests that either the optimization is in fact made, or the arithmetic operations are a vanishingly small part of the total time.
In either case it means that I don't need to care which is used.
In fact, ildasm shows that the IL uses div in the first loop and mul in the second, so the compiler doesn't make the substitution after all. Unless the JIT compiler does.

How to choose a range for a loop based upon the answers of a previous loop?

I'm sorry the title is so confusingly worded, but it's hard to condense this problem down to a few words.
I'm trying to find the minimum value of a specific equation. At first I'm looping through the equation, which for our purposes here can be something like y = .245x^3 - .67x^2 + 5x + 12. I want to design a loop where the "steps" through the loop get smaller and smaller.
For example, the first time it loops through, it uses a step of 1 and I get about 30 values. What I need help with is: how do I use the three smallest values I receive from this first loop?
Here's an example of the values I might get from the first loop: (I should note this isn't supposed to be actual code at all. It's just a brief description of what's happening)
loop from x = 1 to 8 with step 1
results:
x = 1 -> y = 30
x = 2 -> y = 28
x = 3 -> y = 25
x = 4 -> y = 21
x = 5 -> y = 18
x = 6 -> y = 22
x = 7 -> y = 27
x = 8 -> y = 33
I want something that can detect the lowest three values and create a new loop from them. From these results, the values of x that give the smallest three results for y are x = 4, 5, and 6.
So my "guess" at this point would be x = 5. To get a better "guess", I'd like a loop that now does:
loop from x = 4 to x = 6 with step .5
I could keep this pattern going until I get an absurdly accurate guess for the minimum value of x.
Does anybody know of a way I can do this? I know the values I'm going to get can be modeled by an upward-opening parabola, so this format will definitely work. I was thinking the values could be put into a column; it wouldn't be hard to make something that returns the smallest value for y in that column and the corresponding x-value, along the lines of the sketch below.
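A minimal sketch of that column lookup, assuming the x-values sit in A1:A8 and the y-values in B1:B8 of the active sheet (the ranges and names are assumptions):
Sub SmallestY()
    ' Find the minimum y in column B and report the x on the same row.
    Dim ys As Range
    Set ys = ActiveSheet.Range("B1:B8")
    Dim minY As Double, rowIdx As Long
    minY = Application.WorksheetFunction.Min(ys)
    rowIdx = Application.WorksheetFunction.Match(minY, ys, 0)
    MsgBox "Smallest y = " & minY & " at x = " & ActiveSheet.Cells(rowIdx, "A").Value
End Sub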
If I'm being too vague, just let me know, and I can answer any questions you might have.
Nice question. Here's at least a start on what I think you should do:
Sub findMin()
    Dim lowest As Integer
    Dim middle As Integer
    Dim highest As Integer
    lowest = 999
    middle = 999
    highest = 999
    Dim i As Integer
    i = 1
    Do While i < 9
        If (retVal(i) < retVal(lowest)) Then
            highest = middle
            middle = lowest
            lowest = i
        ElseIf (retVal(i) < retVal(middle)) Then
            highest = middle
            middle = i
        ElseIf (retVal(i) < retVal(highest)) Then
            highest = i
        End If
        i = i + 1
    Loop
End Sub

Function retVal(num As Integer) As Double
    ' y = 0.245x^3 - 0.67x^2 + 5x + 12
    retVal = 0.245 * num ^ 3 - 0.67 * num ^ 2 + 5 * num + 12
End Function
What I've done here is set up three Integers to hold the x-values of your three smallest results: lowest, middle, and highest. You loop through the values you're plugging into the formula (here, the retVal function) and compare the return value of retVal (hence the name) against retVal(lowest), retVal(middle), and retVal(highest), replacing them as necessary. I'm just beginning with VBA, so what I've done likely isn't very elegant, but it does at least identify the Integers that result in the lowest values of the function. You may have to play around with the starting values of lowest, middle, and highest a bit to make it work. I know this isn't EXACTLY what you're looking for, but it's something along the lines of what I think you should do.
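Extending that into the shrinking-step search the question describes, a minimal sketch (the bounds, the number of passes, and the function f are my assumptions; note that, as the next answer points out, the example cubic itself has no minimum, so f here is just a placeholder):
Sub refineMin()
    ' Repeatedly scan [lo, hi], re-center on the best x, and halve the step.
    Dim lo As Double, hi As Double, stepSize As Double
    Dim x As Double, bestX As Double, bestY As Double
    Dim pass As Integer
    lo = 1: hi = 8: stepSize = 1
    For pass = 1 To 6
        bestY = 1E+308 ' larger than any y we expect to see
        x = lo
        Do While x <= hi
            If f(x) < bestY Then
                bestY = f(x)
                bestX = x
            End If
            x = x + stepSize
        Loop
        ' Narrow the bracket to one step either side of the best x.
        lo = bestX - stepSize
        hi = bestX + stepSize
        stepSize = stepSize / 2
    Next pass
    MsgBox "Minimum near x = " & bestX & ", y = " & bestY
End Sub

Function f(x As Double) As Double
    f = 0.245 * x ^ 3 - 0.67 * x ^ 2 + 5 * x + 12 ' placeholder model
End Function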
There is no trivial way to approach this unless the problem domain is narrowed.
The example polynomial given in fact has no minimum, which is readily determined by observing that y' = 0.735x^2 - 1.34x + 5 is positive for all x (its discriminant, 1.34^2 - 4(0.735)(5), is negative and its leading coefficient is positive), hence y is always increasing with respect to x.
Given the wide interpretation of "[an] equation, which for our purposes here can be something like y = .245x^3 - .67x^2 + 5x + 12", many conditions need to be checked, even assuming the domain is limited to polynomials.
The order of the polynomial is significant: it determines what conditions need to be checked, how many solutions are possible, and whether any solution is possible at all.
Without taking this complexity into account, an iterative approach could yield an incorrect solution due to underflow error or an unfortunate choice of iteration steps or bounds.
I'm not trying to be hard on you here; I think your idea is neat. In practice it is just more complicated than you might think.

optimise distance calculation in matlab

I am a newbie with Matlab and I have the following scenario (which is part of a larger problem):
a matrix A of size 4754x1024 and a matrix B of size 6800x1024.
For every row in matrix A, I need to calculate the Euclidean distance to every row in matrix B. I am using the following technique to calculate the distance, but I find it very inefficient and very time-consuming in Matlab.
for i=1:row_A
    A_data = A_test(i,:);
    for j=1:row_B
        B_data = B_train(j,:);
        X = [A_data; B_data];
        % calculate the distance between the two rows
        d = pdist(X, 'euclidean');
        dist(j,i) = d;
    end
end
Any suggestions to optimise this? The final step involves performing this operation on 50 such sets of A and B.
Thanks and Regards,
Bhavya
I'm not sure what your code is actually doing.
Assuming your data has the following properties
assert(size(A,2) == size(B,2))
Try
d = zeros(size(A,1), size(B,1));
for i = 1:size(A,1)
    d(i,:) = sqrt(sum(bsxfun(@minus, B, A(i,:)).^2, 2));
end
Or possibly better organised by columns (See "Store and Access Data in Columns" in http://www.mathworks.co.uk/company/newsletters/news_notes/june07/patterns.html):
At = A.'; Bt = B.';
d = zeros(size(At,2), size(Bt,2));
for i = 1:size(At,2)
    d(i,:) = sqrt(sum(bsxfun(@minus, Bt, At(:,i)).^2, 1));
end