Working with Excel ranges and arrays - vb.net

In VBA, I can easily pull in an sheet\range into an array, manipulate, then pass back to the sheet\range. I'm having trouble doing this in VB.Net though.
Here's my code.
Rng = .Range("a4", .Cells(.UsedRange.Rows.Count, .UsedRange.Columns.Count))
Dim SheetArray(,) As Object = DirectCast(Rng.Value(Excel.XlRangeValueDataType.xlRangeValueDefault), Object(,))
For X As Integer = 0 To SheetArray.GetUpperBound(0)
If IsNothing(SheetArray(X, 0)) Then Exit For
SheetArray(X, 6) = SheetArray(X, 3)
SheetArray(X, 7) = CDbl(SheetArray(X, 3).ToString) - CDbl(SheetArray(X, 1).ToString) - _
CDbl(SheetArray(X, 7).ToString)
For Y As Integer = 0 To 3
SheetArray(X, Y * 2 + 1) = Math.Round(CDbl(SheetArray(X, Y * 2 + 1).ToString), 3)
Next
If Math.Abs(CDbl(SheetArray(X, 7).ToString)) > 0.1 Then _
.Range(.Cells(X + 1, 1), .Cells(X + 1, 8)).Font.Color = -16776961
Next
I'm getting an error on the first If IsNothing(SheetArray(X, 0)) Then Exit For
line. It is telling me index is out of bounds of the array. Any idea why? The SheetArray object contains the data, but I just am not sure how to get to it.

In the For you have to loop from 0 to Count - 1:
For X As Integer = 0 To SheetArray.GetUpperBound(0) - 1
'...
Next
That will fix your problem.

Related

Min function not working properly in VBA

I'm working on a macro right now and it's producing weird results. The part that is specifically not working is a Min function.
a1RowTemp1 = a1Row
For i = 0 To diff1
intercept = Application.WorksheetFunction.intercept(a(),c())
LinReg1 = (slope * Cells(a1RowTemp1, 1)) + intercept
difference1 = Worksheets("GF9").Cells(a1RowTemp1, 2) - LinReg1
e(i) = difference1
a1RowTemp1 = a1RowTemp1 + 1
Next i
a2RowTemp2 = a2Row
For i = 0 To diff2
intercept2 = Application.WorksheetFunction.intercept(b(), d())
LinReg2 = (slope2 * Cells(a2RowTemp2, 1)) + intercept2
difference2 = Worksheets("GF9").Cells(a2RowTemp2, 2) - LinReg2
f(i) = difference2
a2RowTemp2 = a2RowTemp2 + 1
Next i
Worksheets("Chart").Cells(currentRow, 12) = Application.Max(e())
Worksheets("Chart").Cells(currentRow, 13) = Application.Min(e())
Worksheets("Chart").Cells(currentRow, 25) = Application.Max(f())
Worksheets("Chart").Cells(currentRow, 26) = Application.Min(f())
In the bottom of the code it stores the difference1 and difference2 values in arrays e() and f(). When I use the functions max/min the macro only outputs the correct values for the max functions. I suspect this has something to do with my incorrectly using the arrays.
If e is one dimensional array you should be able to write
Application.WorksheetFunction.Min(e)
Example:
Option Explicit
Public Sub TEST()
Dim e()
e = Array(3, 4, 2, 5)
MsgBox Application.WorksheetFunction.Min(e)
End Sub
If you are still getting the wrong values you need to step though with F8 and check the values being assigned to e in the loop are the expected ones.
You've omitted the declaration and dimensioning of the e and f array. This was an important factor in your problem.
When you declared your e and f as long or double arrays, they were instantiated with zero values.
Dim v() As Double, i As Long
ReDim v(5) '<~~ all zero values
For i = LBound(v) To UBound(v) - 1 '<~~fill all but the last one
v(i) = i + 10
Next i
Debug.Print Application.Min(v) 'zero as v(5) is zero
If you want to ignore array elements that you have not assigned values to, declare the arrays as a variant type.
Dim v() As Variant, i As Long
ReDim v(5) '<~~ all empty values
For i = LBound(v) To UBound(v) - 1 '<~~fill all but the last one
v(i) = i + 10
Next i
Debug.Print Application.Min(v) '10 as v(5) is empty and not considered in Min
An unassigned variant array element is considered empty and is not used in the Min calculation.
Alternately, use one of two methods to remove unused array elements.
'...
'redimension before the loop to the known ubound
redim e(diff1)
For i = 0 To diff1
intercept = Application.WorksheetFunction.intercept(a(),c())
LinReg1 = (slope * Cells(a1RowTemp1, 1)) + intercept
difference1 = Worksheets("GF9").Cells(a1RowTemp1, 2) - LinReg1
e(i) = difference1
a1RowTemp1 = a1RowTemp1 + 1
Next i
'...
'or redimension after the loop with Preserve
For i = 0 To diff2
intercept2 = Application.WorksheetFunction.intercept(b(), d())
LinReg2 = (slope2 * Cells(a2RowTemp2, 1)) + intercept2
difference2 = Worksheets("GF9").Cells(a2RowTemp2, 2) - LinReg2
f(i) = difference2
a2RowTemp2 = a2RowTemp2 + 1
Next i
'i exits with a value 1 greater than diff2
redim preserve f(i-1)
'...

vba array element removal

j = LBound(arrayTime)
Do Until j = UBound(arrayTime)
j = j + 1
b = b + 1
cnc = b + r
MsgBox cnc
If cnc > 7 Then
b = 0
r = 0
cnc = b + r
End If
numMins = Sheet5.Cells(cnc + 3, 2) - arrayTime(j)
If numMins < 0 Then
g = g + 1
ReArrangeArray arrayTime, j
'ReDim Preserve arrayTime(numrows - 1 + g)
'arrayTime(numrows - 1 + g) = arrayTime(j)
'MsgBox (arrayTime(numrows - 1 + g))
Else
Sheet5.Cells(cnc + 3, 2) = numMins
End If
Loop
If the if statement is true I want to be able to put the array value at the end of the array and remove that value from its current spot. As the code is, it just adds it to the end and increases the size of the array from 12 to 13. How can I get the array to remain size 12 and still place the value at the end of the array and then remove it from its original position? I do not want to touch the array values in front. Just want to take that value and move it to the end.
For instance
array(1,2,3,4,5)
If statement
j on third loop.
array(j)=3
end array should be
array(1,2,4,5,3)
You could use a helper Sub like this one:
Sub ReArrangeArray(inputArray as Variant, indexToSwap as long)
Dim I As Long
Dim tempVal As Variant
If indexToSwap >= LBound(inputArray) And indexToSwap < UBound(inputArray) Then
tempVal = inputArray(indexToSwap)
For I = indexToSwap To UBound(inputArray) - 1
inputArray(i) = inputArray(i + 1)
Next I
InputArray(UBound(inputArray)) = tempVal
End If
End Sub
To be called by your main Sub as follows:
ReArrangeArray arrayTime, j

INDEX MATCH array formula for 1M rows

I have two sets of data that need to be matched based on IDs and timestamp (+/- 3 units converted from time), and below is the formula that I've been using in Excel to do the matching. Recently I've had to run this formula on up to 1 million rows in Excel, and it takes a REALLY long time, crashes too. I'm wondering if there is a faster way to do this, if not in Excel?
=INDEX(A:A,MATCH(1,--(B:B=E3)*--(ABS(C:C-F3)<=3),0),1)
Data Set 1:
Column A: States
Column B: IDs
Column C: Timestamp
Data Set 2:
Column D: Email Addresses
Column E: IDs
Column F: Timestamp
Column G: =INDEX(A:A,MATCH(1,--(B:B=E3)*--(ABS(C:C-F3)<=3),0),1)
Goal: Append "States" Column to Data Set 2 matched on IDs and Timestamp (+/- 3 time units) match.
Just don't know how to run this formula on very large data sets.
Place the following VBA routines in a standard code module.
Run the MIAB1290() routine.
This emulates the precise outcome of your INDEX/MATCH formula, but it is much more efficient. On my computer, a million records are correctly correlated and the results displayed in Column G in just 10 seconds.
Public Sub MIAB1290()
Dim lastB&, k&, e, f, z, v, w, vErr, r As Range
With [a2]
Set r = .Resize(.Item(.Parent.Rows.Count - .Row + 1, 5).End(xlUp).Row - .Row + 1, .Item(, .Parent.Columns.Count - .Column + 1).End(xlToLeft).Column - .Column + 1)
lastB = .Item(.Parent.Rows.Count - .Row + 1, 2).End(xlUp).Row - .Row + 1
End With
With r
.Worksheet.Sort.SortFields.Clear
.Sort Key1:=.Item(1, 2), Order1:=1, Key2:=.Item(1, 2), Order2:=1, Header:=xlYes
v = .Value2
End With
ReDim w(1 To UBound(v), 1 To 1)
vErr = CVErr(xlErrNA)
For k = 2 To UBound(v)
e = v(k, 5)
f = v(k, 6)
w(k, 1) = vErr
z = BSearch(v, 2, e, 1, lastB)
If z Then
Do While v(z, 2) = e
If Abs(v(z, 3) - f) <= 3 Then
w(k, 1) = v(z, 1)
Exit Do
End If
z = z + 1
If z > UBound(v) Then Exit Do
Loop
End If
Next
r(1, 8).Resize(r.Rows.Count) = w
End Sub
Private Function BSearch(vA, col&, vVal, ByVal first&, ByVal last&)
Dim k&, middle&
While last >= first
middle = (last + first) / 2
Select Case True
Case vVal < vA(middle, col)
last = middle - 1
Case vVal > vA(middle, col)
first = middle + 1
Case Else
k = middle - 1
Do While vA(k, col) = vA(middle, col)
k = k - 1
If k > last Then Exit Do
Loop
BSearch = k + 1
Exit Function
End Select
Wend
BSearch = 0
End Function
Excel isn't really made for large ammount of data, and probably no code will do it faster for you then a builtin excel formula. In this case, I would sugest you to give a try to the PowerPivot addin, and see how it handles the situation.

My MInverse function will not work in VBA

EDIT: I fixed it, the ReDim and all starts at 0 and not 1 so I had a cell that was empty which wasn't supposed to be there!
It now works, thanks for the help!
I'm trying to take a matrix and invert it, but for some reason I get this error:
Unable to get the MInverse property of the WorksheetFunction class.
My (relevant) code is as following:
Dim covar() As Variant
ReDim covar(UBound(assetNames), UBound(assetNames))
Dim corr() As Double
ReDim corr(UBound(assetNames), UBound(assetNames))
Dim covarTmp As Double
For i = 0 To UBound(assetNames) - 1
For j = 0 To UBound(assetNames) - 1
covarTmp = 0
For t = 1 To wantedT
covarTmp = covarTmp + (Log((prices(histAmount + 1 - t, i + 1)) / (prices(histAmount - t, i + 1))) - mu(i) * dt) * (Log((prices(histAmount + 1 - t, j + 1)) / (prices(histAmount - t, j + 1))) - mu(j) * dt)
Next t
covar(i, j) = covarTmp * (1 / ((wantedT - 1) * dt))
corr(i, j) = covar(i, j) / (sigma(i) * sigma(j))
Next j
Next i
Dim covarInv() As Variant
ReDim covarInv(UBound(assetNames), UBound(assetNames))
'ReDim covar(1 To UBound(assetNames), 1 To UBound(assetNames))
covarInv = Application.WorksheetFunction.MInverse(covar)
This last row is where the error occurs.
I've tried many things, having covar and covarInv dim as double, variant etc. Different ReDims on covar and covarInv.
You don't say what version of Excel you are using, but with Excel 2010 there seems to be a Minverse maximum limit of 200 * 200 (for Excel 2003 its probably around 80 * 80): How many asset names do you have?

Performance difference between two implementations of the same algorithm

I'm working on an application that will require the Levenshtein algorithm to calculate the similarity of two strings.
Along time ago I adapted a C# version (which can be easily found floating around in the internet) to VB.NET and it looks like this:
Public Function Levenshtein1(s1 As String, s2 As String) As Double
Dim n As Integer = s1.Length
Dim m As Integer = s2.Length
Dim d(n, m) As Integer
Dim cost As Integer
Dim s1c As Char
For i = 1 To n
d(i, 0) = i
Next
For j = 1 To m
d(0, j) = j
Next
For i = 1 To n
s1c = s1(i - 1)
For j = 1 To m
If s1c = s2(j - 1) Then
cost = 0
Else
cost = 1
End If
d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
Next
Next
Return (1.0 - (d(n, m) / Math.Max(n, m))) * 100
End Function
Then, trying to tweak it and improve its performance, I ended with version:
Public Function Levenshtein2(s1 As String, s2 As String) As Double
Dim n As Integer = s1.Length
Dim m As Integer = s2.Length
Dim d(n, m) As Integer
Dim s1c As Char
Dim cost As Integer
For i = 1 To n
d(i, 0) = i
s1c = s1(i - 1)
For j = 1 To m
d(0, j) = j
If s1c = s2(j - 1) Then
cost = 0
Else
cost = 1
End If
d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
Next
Next
Return (1.0 - (d(n, m) / Math.Max(n, m))) * 100
End Function
Basically, I thought that the array of distances d(,) could be initialized inside of the main for cycles, instead of requiring two initial (and additional) cycles. I really thought this would be a huge improvement... unfortunately, not only does not improve over the original, it actually runs slower!
I have already tried to analyze both versions by looking at the generated IL code but I just can't understand it.
So, I was hoping that someone could shed some light on this issue and explain why the second version (even when it has fewer for cycles) runs slower than the original?
NOTE: The time difference is about 0.15 nano seconds. This don't look like much but when you have to check thousands of millions of strings... the difference becomes quite notable.
It's because of this:
For i = 1 To n
d(i, 0) = i
s1c = s1(i - 1)
For j = 1 To m
d(0, j) = j 'THIS LINE HERE
You were just initializing this array at the beginning, but now you are initializing it n times. There is a cost involved with accessing memory in an array like this, and you are doing it an extra n times now. You could change the line to say: If i = 1 Then d(0, j) = j. However, in my tests, you still basically end up with a slightly slower version than the original. And that again makes sense. You're performing this if statement n*m times. Again there is some cost. Moving it out like it is in the original version is a lot cheaper It ends up being O(n). Since the overall algorithm is O(n*m), any step you can move out into an O(n) step is going to be a win.
You can split the following line:
d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
as follows:
tmp = Math.Min(d(i - 1, j), d(i, j - 1)) + 1
d(i, j) = Math.Min(tmp, d(i - 1, j - 1) + cost)
It this way you avoid one summation
Further more you can place the last "min" comparison inside the if part and avoid assigning cost:
tmp = Math.Min(d(i - 1, j), d(i, j - 1)) + 1
If s1c = s2(j - 1) Then
d(i, j) = Math.Min(tmp, d(i - 1, j - 1))
Else
d(i, j) = Math.Min(tmp, d(i - 1, j - 1)+1)
End If
So you save a summation when s1c = s2(j - 1)
Not the direct answer to your question, but for faster performance you should consider either using a jagged array (array of arrays) instead of a multidimensional array. What are the differences between a multidimensional array and an array of arrays in C#? and Why are multi-dimensional arrays in .NET slower than normal arrays?
You will see that the jagged array has a code size of 7 as opposed to 10 with multidimensional arrays.
The code below is uses a jagged array, single dimensional array
Public Function Levenshtein3(s1 As String, s2 As String) As Double
Dim n As Integer = s1.Length
Dim m As Integer = s2.Length
Dim d()() As Integer = New Integer(n)() {}
Dim cost As Integer
Dim s1c As Char
For i = 0 To n
d(i) = New Integer(m) {}
Next
For j = 1 To m
d(0)(j) = j
Next
For i = 1 To n
d(i)(0) = i
s1c = s1(i - 1)
For j = 1 To m
If s1c = s2(j - 1) Then
cost = 0
Else
cost = 1
End If
d(i)(j) = Math.Min(Math.Min(d(i - 1)(j) + 1, d(i)(j - 1) + 1), d(i - 1)(j - 1) + cost)
Next
Next
Return (1.0 - (d(n)(m) / Math.Max(n, m))) * 100
End Function