Are calculations involving a large matrix using arrays in VBA faster than doing the same calculation manually in Excel? - vba

I am trying to do calculations as part of a regression model in Excel.
I need to calculate ((X^T)WX)^(-1)(X^T)WY, where X, W and Y are matrices and ^T and ^(-1) denote the matrix transpose and inverse.
When X, W and Y are small, my macro calculates these values very fast.
However, sometimes the dimensions of X, W and Y are, say, 5000 x 5, 5000 x 1 and 5000 x 1 respectively, and then the macro takes a lot longer to run.
I have two questions:
My macro writes the matrices to Excel sheets and then uses worksheet formulas such as MMULT and MINVERSE to calculate the output. For larger matrices, would it be faster to do all the calculations with arrays in VBA instead? (I am not sure how arrays work in VBA, so I don't know whether they touch the worksheet at all, and hence whether they would be quicker or less computationally intensive.)
If the answer is no, does anybody have an idea how to speed such calculations up? Or do I simply need to put up with it and wait?
Thanks for your time.
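Because W is given as a 5000 x 1 column, it presumably holds the diagonal of the weight matrix (one weight per observation); a literal 5000 x 1 W would not conform in X^T W X. Under that assumption X^T W X is only 5 x 5 and X^T W Y is 5 x 1, so the real work is one pass over the 5000 rows and the 5000 x 5000 matrix never has to be built. The C++ sketch below (one of the compiled options mentioned in the answers) illustrates this; the names weighted_least_squares and Matrix are hypothetical, and it solves the small system by Gaussian elimination instead of forming an explicit inverse.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// beta = (X^T W X)^(-1) X^T W y, with the diagonal of W passed as the vector w.
std::vector<double> weighted_least_squares(const Matrix& X,
                                           const std::vector<double>& w,
                                           const std::vector<double>& y) {
    const std::size_t n = X.size();
    assert(n > 0 && w.size() == n && y.size() == n);
    const std::size_t p = X[0].size();

    // One pass over the rows builds the p x p matrix A = X^T W X and b = X^T W y.
    Matrix A(p, std::vector<double>(p, 0.0));
    std::vector<double> b(p, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < p; ++j) {
            b[j] += X[i][j] * w[i] * y[i];
            for (std::size_t k = 0; k < p; ++k)
                A[j][k] += X[i][j] * w[i] * X[i][k];
        }
    }

    // Solve A * beta = b: Gaussian elimination with partial pivoting.
    for (std::size_t c = 0; c < p; ++c) {
        std::size_t pivot = c;
        for (std::size_t r = c + 1; r < p; ++r)
            if (std::fabs(A[r][c]) > std::fabs(A[pivot][c])) pivot = r;
        if (std::fabs(A[pivot][c]) < 1e-12)
            throw std::runtime_error("X^T W X is singular");
        std::swap(A[c], A[pivot]);
        std::swap(b[c], b[pivot]);
        for (std::size_t r = c + 1; r < p; ++r) {
            const double f = A[r][c] / A[c][c];
            for (std::size_t k = c; k < p; ++k) A[r][k] -= f * A[c][k];
            b[r] -= f * b[c];
        }
    }

    // Back-substitution on the upper-triangular system.
    std::vector<double> beta(p);
    for (std::size_t c = p; c-- > 0;) {
        double s = b[c];
        for (std::size_t k = c + 1; k < p; ++k) s -= A[c][k] * beta[k];
        beta[c] = s / A[c][c];
    }
    return beta;
}
```

For the 5000 x 5 case, X would hold 5000 rows of 5 values and w the 5000 weights. A VBA version using Double arrays could follow the same two steps.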

Assuming the algorithm is the same, the speed ranking is:
A custom DLL written in C#, C++, C, Java or something similar
VBA
Excel
I compared a VBA function with an equivalent C++ function, and as the input grows the result is really bad for VBA.
For example, the following recursive Fibonacci function in C++:
int __stdcall FibWithRecursion(int & x)
{
int k = 0;
int p = 0;
if (x == 0)
return 0;
if (x == 1)
return 1;
k = x - 1;
p = x - 2;
return FibWithRecursion(k) + FibWithRecursion(p);
}
runs much faster, when called from Excel, than the VBA function with the same complexity:
Public Function FibWithRecursionVBA(ByRef x As Long) As Long
Dim k As Long: k = 0
Dim p As Long: p = 0
If (x = 0) Then FibWithRecursionVBA = 0: Exit Function
If (x = 1) Then FibWithRecursionVBA = 1: Exit Function
k = x - 1
p = x - 2
FibWithRecursionVBA = FibWithRecursionVBA(k) + FibWithRecursionVBA(p)
End Function
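Both functions above make exactly the same sequence of calls: every call for n spawns one call for n-1 and one for n-2, so the total number of calls grows like the Fibonacci numbers themselves (it equals 2*F(n+1) - 1). The small C++ sketch below (fib_call_count is a hypothetical helper, n >= 0 assumed) counts those calls. Since the call count is identical in both languages, the run-time gap comes from the per-call overhead, multiplied by millions of calls as n grows.

```cpp
#include <cassert>
#include <cstdint>

// Total number of calls made by FibWithRecursion(n), including the first one.
// calls(0) = calls(1) = 1; calls(n) = 1 + calls(n - 1) + calls(n - 2).
std::uint64_t fib_call_count(int n) {
    assert(n >= 0);
    if (n <= 1) return 1;
    std::uint64_t prev2 = 1;  // calls(i - 2)
    std::uint64_t prev1 = 1;  // calls(i - 1)
    for (int i = 2; i <= n; ++i) {
        const std::uint64_t cur = 1 + prev1 + prev2;
        prev2 = prev1;
        prev1 = cur;
    }
    return prev1;
}
```

For n = 30 this gives 2,692,537 calls in either language.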

Better late than never:
I use matrices that are bigger, 3 or 4 dimensions, sized like 16k x 26 x 5.
I run through them to find data, apply one or two formulas or make combos with other matrices.
First: after starting the macro, switch to another application such as Notepad; you might see a nice speed increase ☺!
Then, I assume you have switched off screen updating etc. and turned off automatic calculation.
Finally: don't put the data in cells; put it in arrays.
Just something like:
Dim Matrix1() As String   ' put it in the module declarations if you want to use it in other macros as well. Remember you cannot do "blabla = ActiveCell.Value2" etc. anymore!
In the Sub's code, use ReDim Matrix1(1 To a_value, 1 To b_value, ..., 1 To last_value)
Matrix1(45, 32, 63) = "what you want to put there"
After running, just drop
Matrix1(1 To a_value, 1 To b_value, 1) onto the 1st sheet,
Matrix1(1 To a_value, 1 To b_value, 2) onto the 2nd sheet, etc.
(VBA cannot assign such a slice directly: copy each page into a 2-D array and write it in one go with Range("A1").Resize(a_value, b_value).Value = page.)
Switch screen updating back on, etc.
This way my calculation went from 45 minutes to just one, by avoiding the intermediate screen updates.
Good luck; I hope it is useful for somebody.

Related

VBA, min/max ... or other mathematical functions

I googled a lot, but I want to be sure:
Do I really need to use "Application.WorksheetFunction.Max" for the max function?
If yes, can I shorten this? Is there an overhead if I wrap this long construct in a function?
Edit: I have removed the vba-access tag.
Since my question was unclear, I am answering it myself.
Some people did not know whether I meant Excel or Access. It was my fault for using the wrong tag; it was meant as a pure VBA question.
Second mistake: I gave an Excel (worksheet) way in my question, but it was meant as a pure VBA question.
I cannot delete the question, though I would like to.
So the answer is:
Public Function max(x, y As Variant) As Variant
max = IIf(x > y, x, y)
End Function
Public Function min(x, y As Variant) As Variant
min = IIf(x < y, x, y)
End Function
... is doing the job.
Sorry for wasting your time!
There are two shorter ways I've found to code that:
One
Sub MaxTest()
Dim A As Integer, B As Integer
A = Sheet1.Range("$A$1").Value
B = Sheet1.Range("$A$2").Value
Sheet1.Range("$B$1").Value = WorksheetFunction.Max(A, B)
End Sub
Two
Sub MaxTest()
Dim A As Integer, B As Integer
A = Sheet1.Range("$A$1").Value
B = Sheet1.Range("$A$2").Value
Sheet1.Range("$B$1").Value = IIf(A > B, A, B)
End Sub
In Access you don't have those worksheet functions. The only ways to get them are (A) to code them yourself or (B) to reference the Excel library from your Access project. Personally, I would go with A, since if you reference the Excel library you are stuck with the exact version of Excel you referenced.
A quick and dirty example would be
Public Function Max(ByVal A As Variant, ByVal B As Variant) As Variant
If A > B Then
Max = A
Else
Max = B
End If
End Function
This needs a little TLC so that you aren't trying to compare recordsets or other nonsense and having it throw an error.
EDIT: I suppose you could also use the Excel library if you late-bind to Excel, so that's a third option.
I have 3 ideas for you:
You can skip the "Application" and just write 'WorksheetFunction.Max' in almost all circumstances in Excel VBA.
You can set a variable to 'WorksheetFunction', i.e.
Dim foo As Object
Set foo = Application.WorksheetFunction
result = foo.Max(MyArray)
You can create a Function or Sub in the module that does only this, taking the values (e.g. Longs) as input. I am not sure this is worth the effort, and it may slow the code down a fair amount.
There is no context provided here, but if you are basically looping and finding a max for each pair, then you're Doing It Wrong™, especially since you are in an Access database. Doing the same thing as a SQL query would be much quicker and easier to design and optimize than coming up with hacky VBA workarounds that force you into RBAR* mode, which is anathema to all things database.
So I would first look carefully at what the code is trying to do; if it is basically reams of data to be aggregated, you should not be using VBA but SQL.
Heck, if you want to be lazy, Access already provides the DMax domain function (though domain functions are problematic because they invariably get called from VBA and thus get used in an RBAR manner). You really want to let SQL do all the work and return a final recordset, so that the operation becomes a straight read/export/display without any additional work.
If you have looked long and hard at the code and the inputs do not come from a data source and are not run in a loop (directly or indirectly), then sure, knock yourself out with a home-brewed VBA function or a reference to the Excel library (preferably late-bound).
* Row-By-Agonizing-Row: iteration is cool in imperative languages. It's not as cool in declarative languages like SQL.
As straight VBA you can do it in one line (a comparison evaluates to -1 when True and 0 when False):
Max = -x * (x > y) - y*(y > x)
Min = -x * (x < y) - y*(y < x)
Based on cybernetic.nomad's answer, but corrected for ties (when x = y, the lines above return 0):
Public Function Max(x, y As Variant) As Variant
'Max = -x * (x > y) - y * (y > x) - x * (x = y)
Max = -x * (x > y) - y * (y > x) - y * (x = y) 'same
End Function
Public Function Min(x, y As Variant) As Variant
'Min = -x * (x < y) - y * (y < x) - x * (x = y)
Min = -x * (x < y) - y * (y < x) - y * (x = y) 'same
End Function
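The one-liner works because a comparison yields a number, so multiplying by it keeps or zeroes each term. The C++ sketch below (function names are hypothetical) reproduces both the tie bug of the first version and the fix. The only language difference is that C++ converts true to 1 rather than -1, so the signs flip. It assumes finite inputs, since 0 times an infinity is NaN; in real C++ code std::max and std::min do this directly.

```cpp
// Mirrors "-x * (x > y) - y * (y > x)": returns 0 when x == y.
double max_naive(double x, double y) {
    return x * (x > y) + y * (y > x);
}

// The extra term selects y when the two values are equal.
double max_fixed(double x, double y) {
    return x * (x > y) + y * (y > x) + y * (x == y);
}

double min_fixed(double x, double y) {
    return x * (x < y) + y * (y < x) + y * (x == y);
}
```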

What is faster to calculate: Large looped calculations or Vlookup on multiple large data grids?

This is a conceptual question that will help me before I start coding my next project. Which approach do you think will be faster?
Loop Calculations: An example of how it would be set up:
For i = 0 To 67
    x = 0   ' restart the grid for every i
    While x < 350
        y = 0
        While y < 600
            Call solved() 'solves and returns "concentration"
            If Not IsEmpty(c(y, x)) Then
                c(y, x) = c(y, x) + concentration
            Else
                c(y, x) = concentration
            End If
            y = y + 1
        Wend
        x = x + 1
    Wend
Next i
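For scale, assuming the intent is to cover the full 350 x 600 grid for each of the 68 values of i, the loop calls solved() this many times:

```latex
68 \times 350 \times 600 = 14\,280\,000
```

So the cost of one solved() call dominates the comparison: at a purely illustrative 1 microsecond per call, that is about 14 seconds.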
Vlookup:
Using Matlab I can generate millions of solved data points and store them into a matrix. This can be stored in a database.
Restrictions: Access is the only database available, and with the amount of data that needs to be stored it will hit the memory limit. Excel is not a good place to store this much data either.
Taking these restrictions into account, I have thought of storing the data in multiple text files and using Excel to search them and pull values.
Theoretically, a search should be faster. But with several files to open and a large matrix to look through, speed will suffer. What do you think? Please suggest a better approach if there is one. Thanks!

Is *0.25 faster than / 4 in VB.NET

I know similar questions have been answered before but none of the answers I could find were specific to .NET.
Does the VB.NET compiler optimize expressions like:
x = y / 4
by compiling:
x = y * 0.25
And before anyone says "don't worry, the difference is small": I already know that, but this will be executed a lot, and choosing one over the other could make a useful difference in total execution time and will be much easier than a major refactoring exercise.
Perhaps I should have mentioned for the benefit of those who live in an environment of total freedom: I am not at liberty to change to a different language. If I were I would probably have written this code in Fortran.
As suggested, here is a simple comparison:
Dim y = 1234.567
Dim x As Double
Dim c = 10000000000.0
Dim ds As Date
Dim df As Date
ds = Now
For i = 1 To c
x = y / 4
Next
df = Now
Console.WriteLine("divide " & (df - ds).ToString)
ds = Now
For i = 1 To c
x = y * 0.25
Next
df = Now
Console.WriteLine("multiply " & (df - ds).ToString)
The output is:
divide 00:00:52.7452740
multiply 00:00:47.2607256
So divide does appear to be slower by about 10%. But this difference is so small that I suspected it to be accidental. Another two runs give:
divide 00:00:45.1280000
multiply 00:00:45.9540000
divide 00:00:45.9895985
multiply 00:00:46.8426838
Suggesting that in fact the optimization is made or that the arithmetic operations are a vanishingly small part of the total time.
In either case it means that I don't need to care which is used.
In fact, ildasm shows that the IL uses div in the first loop and mul in the second, so the compiler doesn't make the substitution after all, unless the JIT compiler does.
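A possible reason why the substitution is legal here at all: 4 is a power of two, so 0.25 is exactly representable, and under IEEE 754 double arithmetic y / 4 and y * 0.25 round the same exact value and give bit-identical results. For a divisor like 10 this fails, because 0.1 is not exact (3.0 / 10 gives 0.3 while 3.0 * 0.1 gives 0.30000000000000004), so a compiler cannot swap division for multiplication in general. The C++ sketch below (division_matches_reciprocal is a hypothetical helper) checks the power-of-two case on random inputs; it says nothing about what the VB.NET compiler or JIT actually emits.

```cpp
#include <cassert>
#include <random>

// True if y / d and y * (1.0 / d) gave identical doubles for every sampled y.
// A fixed seed keeps the check reproducible.
bool division_matches_reciprocal(double d, int samples) {
    assert(d != 0.0);
    std::mt19937_64 rng(12345);
    std::uniform_real_distribution<double> dist(-1.0e6, 1.0e6);
    const double reciprocal = 1.0 / d;
    for (int i = 0; i < samples; ++i) {
        const double y = dist(rng);
        if (y / d != y * reciprocal) return false;
    }
    return true;
}
```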

How to choose a range for a loop based upon the answers of a previous loop?

I'm sorry the title is so confusingly worded, but it's hard to condense this problem down to a few words.
I'm trying to find the minimum value of a specific equation. At first I'm looping through the equation, which for our purposes here can be something like y = .245x^3-.67x^2+5x+12. I want to design a loop where the "steps" through the loop get smaller and smaller.
For example, the first time it loops through, it uses a step of 1, and I get about 30 values. What I need help with is how to use the three smallest values I receive from this first loop.
Here's an example of the values I might get from the first loop: (I should note this isn't supposed to be actual code at all. It's just a brief description of what's happening)
loop from x = 1 to 8 with step 1
results:
x = 1 -> y = 30
x = 2 -> y = 28
x = 3 -> y = 25
x = 4 -> y = 21
x = 5 -> y = 18
x = 6 -> y = 22
x = 7 -> y = 27
x = 8 -> y = 33
I want something that can detect the lowest three values and create a loop. From these results, the values of x that give the three smallest values of y are x = 4, 5 and 6.
So my "guess" at this point would be x = 5. To get a better "guess" I'd like a loop that now does:
loop from x = 4 to x = 6 with step .5
I could keep this pattern going until I get an absurdly accurate guess for the minimum value of x.
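The shrinking-step pattern described above can be sketched as follows. This is a minimal C++ sketch under one assumption: the function has a single minimum inside the starting interval (the later answer explains why that matters). The name grid_refine_minimum is hypothetical. Each round evaluates the grid, keeps the best x, narrows to one step either side of it and halves the step, exactly like going from "1 to 8, step 1" to "4 to 6, step 0.5".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Shrinking-step grid search: scan [lo, hi] with the given step, keep the best x,
// narrow the interval to one step either side of it, halve the step, repeat.
double grid_refine_minimum(const std::function<double(double)>& f,
                           double lo, double hi, double step, int rounds) {
    assert(step > 0.0 && hi >= lo && rounds >= 1);
    double best_x = lo;
    for (int r = 0; r < rounds; ++r) {
        double best_y = f(lo);
        best_x = lo;
        // Count grid points with an integer so rounding errors do not accumulate in x.
        const std::size_t count = static_cast<std::size_t>((hi - lo) / step + 0.5);
        for (std::size_t i = 1; i <= count; ++i) {
            const double x = lo + static_cast<double>(i) * step;
            const double y = f(x);
            if (y < best_y) { best_y = y; best_x = x; }
        }
        lo = best_x - step;
        hi = best_x + step;
        step /= 2.0;
    }
    return best_x;
}
```

For a parabola such as f(x) = (x - 5.3)^2 + 18, calling grid_refine_minimum(f, 1.0, 8.0, 1.0, 40) homes in on x = 5.3; each round narrows the bracket to twice the new step.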
Does anybody know of a way I can do this? I know the values I get can be modeled by an upward-opening parabola, so this approach will definitely work. I was thinking that the values could be put into a column. It wouldn't be hard to make something that returns the smallest value of y in that column and the corresponding x value.
If I'm being too vague, just let me know, and I can answer any questions you might have.
Nice question. Here's at least a start on what I think you should do:
Sub findMin()
    Dim lowest As Integer
    Dim middle As Integer
    Dim highest As Integer
    ' 999 is a sentinel: retVal(999) is far larger than any value for x = 1..8
    lowest = 999
    middle = 999
    highest = 999
    Dim i As Integer
    i = 1
    Do While i < 9
        If retVal(i) < retVal(lowest) Then
            highest = middle
            middle = lowest
            lowest = i
        ElseIf retVal(i) < retVal(middle) Then
            highest = middle
            middle = i
        ElseIf retVal(i) < retVal(highest) Then
            highest = i
        End If
        i = i + 1
    Loop
    Debug.Print "Three smallest at x ="; lowest; middle; highest
End Sub
Function retVal(num As Integer) As Double
    ' y = 0.245x^3 - 0.67x^2 + 5x + 12 (Sqr is the square root in VBA, so powers use ^)
    retVal = 0.245 * num ^ 3 - 0.67 * num ^ 2 + 5 * num + 12
End Function
What I've done here is set three Integers as your three min values: lowest, middle, and highest. You loop through the values you plug into the formula (here, the retVal function) and compare the return value of retVal (hence the name) with retVal(lowest), retVal(middle), and retVal(highest), replacing them as necessary. I'm just beginning with VBA, so what I've done likely isn't very elegant, but it does at least identify the Integers that give the lowest values of the function. You may have to play around with the starting values of lowest, middle, and highest a bit to make it work. I know this isn't EXACTLY what you're looking for, but it's along the lines of what I think you should do.
There is no trivial way to approach this unless the problem domain is narrowed.
The example polynomial given in fact has no minimum, which is readily determined by observing y'>0 (hence, y is always increasing WRT x).
Given the wide interpretation of
[an] equation, which for our purposes here can be something like y =
.245x^3-.67x^2+5x+12
many conditions need to be checked, even assuming the domain is limited to polynomials.
The polynomial order is significant, and the order determines what conditions are necessary to check for how many solutions are possible, or whether any solution is possible at all.
Without taking this complexity into account, an iterative approach could yield an incorrect solution due to underflow error, or an unfortunate choice of iteration steps or bounds.
I'm not trying to be hard here, I think your idea is neat. In practice it is more complicated than you think.

I need some help on designing a program that will perform a minimization using VBA Excel

How do I use Excel VBA to find the minimum value of an equation?
For example, if I have the equation y = 2x^2 + 14, and I want to make a loop that will slowly increase/decrease the value of x until it can find the smallest value possible for y, and then let me know what the corresponding value of x is, how would I go about doing that?
Is there a method that would work for much more complicated equations?
Thank you for your help!
Edit: more details
I'm trying to design a program that will find a certain constant needed to graph a nuclear decay. This constant is a part of an equation that gets me a calculated decay. I'm comparing this calculated decay against a measured decay. However, the constant changes very slightly as the decay happens, which means I have to use something called a residual-square to find the best constant to use that will fit the entire decay best to make my calculated decay as accurate as possible.
It works by doing (Measured Decay - Calculated Decay) ^2
You do that for the decay at several times, and add them all up. What I need my program to do is to slowly increase and decrease this constant until I can find a minimum value for the value I get when I add up the residual-squared results for all the times using this decay. The residual-squared that has the smallest value has the value of the constant that I want.
I already drafted a program that does all the calculations and such. I'm just not sure how to find this minimum value. I'm sure if a method works for something like y = x^2 + 1, I can adapt it to work for my needs.
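What the edit describes is a one-parameter least-squares fit: choose the constant that minimises the sum of (measured - calculated)^2. If that sum has a single minimum over the range searched (worth checking by plotting it once), a bracketing method such as golden-section search, swapped in here in place of fixed-step stepping, finds it in a few dozen evaluations and works for any such one-variable function, however complicated. The C++ sketch below is illustrative only: the exponential model a0 * exp(-k * t) and the names sum_squared_residuals and golden_section_minimum are hypothetical stand-ins for your own decay equation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Sum of squared residuals for a trial constant k, assuming (hypothetically)
// that the calculated decay is a0 * exp(-k * t).
double sum_squared_residuals(double k, double a0,
                             const std::vector<double>& times,
                             const std::vector<double>& measured) {
    assert(times.size() == measured.size());
    double ssr = 0.0;
    for (std::size_t i = 0; i < times.size(); ++i) {
        const double residual = measured[i] - a0 * std::exp(-k * times[i]);
        ssr += residual * residual;
    }
    return ssr;
}

// Golden-section search for the minimum of f on [lo, hi], assuming a single
// minimum there; stops once the bracket is narrower than tol.
double golden_section_minimum(const std::function<double(double)>& f,
                              double lo, double hi, double tol) {
    const double inv_phi = (std::sqrt(5.0) - 1.0) / 2.0;  // about 0.618
    double c = hi - inv_phi * (hi - lo);
    double d = lo + inv_phi * (hi - lo);
    double fc = f(c);
    double fd = f(d);
    while (hi - lo > tol) {
        if (fc < fd) {  // minimum lies in [lo, d]
            hi = d;
            d = c;
            fd = fc;
            c = hi - inv_phi * (hi - lo);
            fc = f(c);
        } else {        // minimum lies in [c, hi]
            lo = c;
            c = d;
            fc = fd;
            d = lo + inv_phi * (hi - lo);
            fd = f(d);
        }
    }
    return (lo + hi) / 2.0;
}
```

Usage: wrap the residual sum in a lambda over k, for example auto ssr = [&](double k) { return sum_squared_residuals(k, a0, times, measured); };, then call golden_section_minimum(ssr, k_low, k_high, 1e-9) with a range you know contains the best constant.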
Test the output while looping to look for the smallest output result.
Here's an Example:
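Sub FormulaLoop()
    Dim x As Double
    Dim y As Double
    Dim xBest As Double
    Dim yBest As Double
    ' Start at x = -100 and treat that point as the best so far
    x = -100
    y = (x ^ 2) + 14
    yBest = y
    xBest = x
    ' Try every whole number up to 100, including zero and negative values
    For x = -99 To 100
        y = (x ^ 2) + 14
        If y < yBest Then
            yBest = y
            xBest = x
        End If
    Next x
    MsgBox "The smallest output of y was: " & yBest & " at x = " & xBest
End Sub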
If you want to loop through all the possibilities of two variables that make up x, then I'd recommend looping in this format:
Sub FormulaLoop_v2()
Dim MeasuredDecay As Double
Dim CalculatedDecay As Double
Dim y As Double
Dim yBest As Double
MeasuredDecay = 1
CalculatedDecay = 1
y = ((MeasuredDecay - CalculatedDecay) ^ 2) + 14
yBest = y
For MeasuredDecay = 2 To 100
For CalculatedDecay = 2 To 100
y = ((MeasuredDecay - CalculatedDecay) ^ 2) + 14
If y < yBest Then
yBest = y
End If
Next CalculatedDecay
Next MeasuredDecay
MsgBox "The smallest output of y was: " & yBest
End Sub