INDEX MATCH array formula for 1M rows - sql

I have two sets of data that need to be matched on ID and timestamp (within +/- 3 units, converted from time), and below is the formula I've been using in Excel to do the matching. Recently I've had to run this formula on up to 1 million rows in Excel, and it takes a REALLY long time and crashes too. Is there a faster way to do this, in Excel or outside it?
=INDEX(A:A,MATCH(1,--(B:B=E3)*--(ABS(C:C-F3)<=3),0),1)
Data Set 1:
Column A: States
Column B: IDs
Column C: Timestamp
Data Set 2:
Column D: Email Addresses
Column E: IDs
Column F: Timestamp
Column G: =INDEX(A:A,MATCH(1,--(B:B=E3)*--(ABS(C:C-F3)<=3),0),1)
Goal: Append the "States" column to Data Set 2, matched on ID and on timestamp (+/- 3 time units).
I just don't know how to run this formula on very large data sets.

Place the following VBA routines in a standard code module.
Run the MIAB1290() routine.
This emulates the precise outcome of your INDEX/MATCH formula, but it is much more efficient. On my computer, a million records are correctly correlated and the results displayed in Column G in just 10 seconds.
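Public Sub MIAB1290()
    Dim lastB&, k&, e, f, z, v, w, vErr, r As Range
    With [a2]
        ' r = block starting at A2: down to the last used cell in column E,
        ' across to the last used cell in row 2
        Set r = .Resize(.Item(.Parent.Rows.Count - .Row + 1, 5).End(xlUp).Row - .Row + 1, .Item(, .Parent.Columns.Count - .Column + 1).End(xlToLeft).Column - .Column + 1)
        ' lastB = number of rows in the block that have data in column B
        lastB = .Item(.Parent.Rows.Count - .Row + 1, 2).End(xlUp).Row - .Row + 1
    End With
    With r
        ' Sort the block by column B (IDs) so it can be binary-searched
        .Worksheet.Sort.SortFields.Clear
        .Sort Key1:=.Item(1, 2), Order1:=1, Key2:=.Item(1, 2), Order2:=1, Header:=xlYes
        v = .Value2
    End With
    ReDim w(1 To UBound(v), 1 To 1)
    vErr = CVErr(xlErrNA)
    For k = 2 To UBound(v)
        e = v(k, 5)
        f = v(k, 6)
        ' Default to #N/A until a match is found
        w(k, 1) = vErr
        ' z = first row whose ID equals e, or 0 if there is none
        z = BSearch(v, 2, e, 1, lastB)
        If z Then
            ' Scan the run of rows with this ID for a timestamp within 3 units
            Do While v(z, 2) = e
                If Abs(v(z, 3) - f) <= 3 Then
                    w(k, 1) = v(z, 1)
                    Exit Do
                End If
                z = z + 1
                If z > UBound(v) Then Exit Do
            Loop
        End If
    Next
    ' Write the results to the 8th column of the block
    r(1, 8).Resize(r.Rows.Count) = w
End Sub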
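Private Function BSearch(vA, col&, vVal, ByVal first&, ByVal last&)
    ' Binary search of column col in rows first..last of the sorted array vA.
    ' Returns the first row holding vVal, or 0 if it is absent.
    Dim k&, middle&
    While last >= first
        middle = (last + first) \ 2
        Select Case True
            Case vVal < vA(middle, col)
                last = middle - 1
            Case vVal > vA(middle, col)
                first = middle + 1
            Case Else
                ' Walk back to the first row of the run of equal values,
                ' never stepping below first (this avoids reading row 0)
                k = middle
                Do While k > first
                    If vA(k - 1, col) <> vA(middle, col) Then Exit Do
                    k = k - 1
                Loop
                BSearch = k
                Exit Function
        End Select
    Wend
    BSearch = 0
End Function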
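For the "if not in Excel" part of the question, the same sort-then-binary-search idea can be sketched outside Excel. The following is a minimal Python illustration, not a port of the routine above: the function name `match_states` and the tuple layouts are assumptions made for the example. Unlike INDEX/MATCH, which returns the first matching row in sheet order, this sketch returns the match with the earliest timestamp inside the window.

```python
from bisect import bisect_left

def match_states(set1, set2, tol=3):
    """Hypothetical helper.
    set1: list of (state, id, timestamp) rows  (columns A:C)
    set2: list of (email, id, timestamp) rows  (columns D:F)
    Returns one state (or None) per row of set2."""
    # Sort Data Set 1 by (id, timestamp) once, so each lookup is O(log n)
    rows = sorted(set1, key=lambda r: (r[1], r[2]))
    keys = [(r[1], r[2]) for r in rows]
    result = []
    for _email, id_, ts in set2:
        # First row with this id whose timestamp is >= ts - tol
        i = bisect_left(keys, (id_, ts - tol))
        if i < len(rows) and rows[i][1] == id_ and rows[i][2] <= ts + tol:
            result.append(rows[i][0])
        else:
            result.append(None)  # plays the role of #N/A
    return result

set1 = [("NY", 1, 10), ("CA", 1, 20), ("TX", 2, 5)]
set2 = [("a@x.com", 1, 12), ("b@x.com", 1, 16), ("c@x.com", 2, 8), ("d@x.com", 3, 5)]
print(match_states(set1, set2))  # ['NY', None, 'TX', None]
```

The sort costs O(n log n) once, and each of the m lookups is O(log n), instead of the O(n * m) scan that the array formula performs.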

Excel isn't really made for large amounts of data, and probably no code will do it faster for you than a built-in Excel formula. In this case, I would suggest you try the PowerPivot add-in and see how it handles the situation.

Related

Generate "n" random numbers between a and b to reach desired average in m rows

Suppose column Z holds my optimal averages, one per row, for 200 rows.
Now I want a macro that generates n random integers between a and b inclusive (n <= 20) so that the difference between the average of the generated numbers and the optimal average lies in (-0.15, +0.15).
Example:
Z1:optimal average1=5.5
Z2:optimal average2=5.3
Z200:optimal average200=6.3
n=8
a=1; b=10
Generated numbers:
A1:H1)5-9-4-3-7-4-9-3
A2:H2)10-7-3-2-5-4-3-9
.
.
.
A200:H200)4-8-9-6-6-6-10-2
Here is a hit-or-miss approach (which is often the only viable way to get random numbers which satisfy additional constraints in an unbiased way):
Function RandIntVect(n As Long, a As Long, b As Long, mean As Double, tol As Double, Optional maxTries As Long = 1000) As Variant
'Uses a hit-or-miss approach to generate a vector of n random ints in a,b inclusive whose mean is
'within the tolerance tol of the given target mean
'The function raises an error if maxTries misses occur without a hit
Dim sum As Long, i As Long, j As Long
Dim lowTarget As Double, highTarget As Double 'targets for *sums*
Dim vect As Variant
lowTarget = n * (mean - tol)
highTarget = n * (mean + tol)
For i = 1 To maxTries
ReDim vect(1 To n)
sum = 0
j = 0
Do While j < n And sum + a * (n - j) <= highTarget And sum + b * (n - j) >= lowTarget
j = j + 1
vect(j) = Application.WorksheetFunction.RandBetween(a, b)
sum = sum + vect(j)
Loop
If j = n And lowTarget <= sum And sum <= highTarget Then
'Debug.Print i 'uncomment this line to see how many tries required
RandIntVect = vect
Exit Function
End If
Next i
'error if we get to here
RandIntVect = CVErr(xlErrValue)
End Function
This could be used as a worksheet array formula. The target means were in column I, and in A2:H2 I entered =RandIntVect(8,1,10,I2,0.15) (with ctrl+shift+enter as an array formula) and then copied it down.
Note that the numbers are regenerated every time the formula recalculates, so they will not stay fixed. You could instead use the function from VBA to place the numbers directly in the ranges rather than using it as a worksheet formula. Something like:
Sub test()
Dim i As Long
For i = 1 To 3
Range(Cells(i + 1, 1), Cells(i + 1, 8)).Value = RandIntVect(8, 1, 10, Cells(i + 1, 9).Value, 0.15)
Next i
End Sub
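To show the hit-or-miss idea in isolation, here is a minimal Python sketch of plain rejection sampling. It is an illustration under assumptions, not a translation of RandIntVect: the name `rand_int_vect` and the `rng` parameter are made up for the example, and the early-exit test inside the VBA loop is left out for brevity.

```python
import random

def rand_int_vect(n, a, b, mean, tol, max_tries=1000, rng=None):
    """Hypothetical helper: draw n ints in [a, b] until their mean is
    within tol of the target mean; return None after max_tries misses."""
    rng = rng or random.Random()
    # Work with sums rather than means to avoid repeated division
    low, high = n * (mean - tol), n * (mean + tol)
    for _ in range(max_tries):
        vect = [rng.randint(a, b) for _ in range(n)]
        if low <= sum(vect) <= high:
            return vect  # a hit
    return None  # no hit within max_tries

row = rand_int_vect(8, 1, 10, 5.5, 0.15, rng=random.Random(0))
print(row)
```

Because every candidate vector is drawn the same way and only accepted or rejected as a whole, each acceptable vector is equally likely, which is the unbiasedness property mentioned above.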
The difference between the two means is not within the range (-0.15, +0.15).

How come VBA excel keeps running the statement inside a conditional statement even if it is false?

I am trying to automate the filling of a payroll spreadsheet I created. However, no matter what I try, the value of z is 1 all the time, even when the comparisons return FALSE (I validated this using MsgBox).
My goal in this code is to check whether there is already a record in another sheet. If there isn't it will automatically add the record with the appropriate details based on the available data.
Below is the full VBA code (Note code is incomplete so it is a bit unpolished still):
Option Explicit
Public p As Long
Sub test()
Dim Total_rows_PR As Long
Dim Total_rows_DTR As Long
Total_rows_PR = Worksheets("Payroll - Regular").Range("B" & Rows.Count).End(xlUp).Row
Total_rows_DTR = Worksheets("DTR").Range("B" & Rows.Count).End(xlUp).Row
Dim q As Long
Dim j As Long
Dim z As Long
For j = 1 To Total_rows_DTR - 1
For q = 1 To Total_rows_PR + p - 2
If Worksheets("DTR").Cells(1 + j, 33) = Worksheets("Payroll - Regular").Cells(2 + q, 1) Then
If Worksheets("DTR").Cells(1 + j, 34) = Worksheets("Payroll - Regular").Cells(2 + q, 2) Then
If Worksheets("DTR").Cells(1 + j, 2) = Worksheets("Payroll - Regular").Cells(2 + q, 3) Then
z = 1
Exit For
End If
End If
End If
Next q
' Below is where the assignment should happen but only returns a blank cell
If z = 0 Then Worksheets("Payroll - Regular").Cells(Total_rows_PR + 1 + p, 1) = Worksheets("DTR").Cells(1 + j, 33)
If z = 0 Then Worksheets("Payroll - Regular").Cells(Total_rows_PR + 1 + p, 2) = Worksheets("DTR").Cells(1 + j, 34)
If z = 0 Then Worksheets("Payroll - Regular").Cells(Total_rows_PR + 1 + p, 3) = Worksheets("DTR").Cells(1 + j, 2)
If z = 0 Then p = p + 1
z = 0
Next j
End Sub
Update: I realized that even if the conditions in the first set of If-Then blocks are not met, the value of z is set to 1 for no apparent reason. This is why the values are not assigned. However, I do not see why z keeps being set to 1.
Update#2: #ShaiRado
So the first image is where data is encoded (not shown in image because it is in the leftmost part of the spreadsheet, but basically it inputs the name of the person, date, and the daily time record (DTR) of the person). When the data is encoded, it will automatically indicate what month and year it is based on the helper column AG month and column AH for year. Somewhere in the start of the same worksheet at column B is where the name of the person is. All of these 3 will be used.
This second image is where the summaries are computed. If there is an entry for a specific person in a certain month and year and it is not yet in this worksheet, the code should automatically fill in that person's name as well as the month and year. Basically, that's what the code I'm trying to create does.
The output is a fully automated spreadsheet that only requires data entry in the DTR sheet. All computations already have their corresponding formulas.
First: You have a really strange way of writing your if-statements.
I think what you mean is
For q = 1 To Total_rows_PR + p
If Worksheets("DTR").Cells(1 + j, 33) = Worksheets("Payroll - Regular").Cells(2 + q, 1) _
And Worksheets("DTR").Cells(1 + j, 34) = Worksheets("Payroll - Regular").Cells(2 + q, 2) _
And Worksheets("DTR").Cells(1 + j, 2) = Worksheets("Payroll - Regular").Cells(2 + q, 3) Then
z = 1
Exit For ' Once found, z stays 1 so you don't have to continue the inner loop.
End If
Next q
Second: I am not sure exactly what you want to achieve, but as far as I understand, your problem is that you are looping too far. At the last iteration of the outer loop, you are accessing row 1 + j of sheet DTR, which is empty at that point, and you are accessing row 2 + q (which is the same as 2 + Total_rows_PR + p), which is also empty. Comparing the two empty rows sets z to 1.
A variable is never set for no reason. Maybe it is set and you don't understand the reason.
Debug your code step by step, watch where it behaves differently from what you expect, and find the reason why it does what it does.
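The "add the record only if it is not already there" logic can also be written without a flag variable. As an illustration only (in Python rather than VBA, with the hypothetical names `append_missing`, `existing` and `incoming`), each record is reduced to a (month, year, name) key and looked up in a set. This is a different technique from the nested loops discussed above, shown to make the intended behaviour explicit:

```python
def append_missing(existing, incoming):
    """Hypothetical helper: return existing plus every key from incoming
    that is not already present, keeping first-seen order."""
    seen = set(existing)       # keys already on the summary sheet
    result = list(existing)
    for key in incoming:       # key = (month, year, name)
        if key not in seen:    # no flag to reset between iterations
            seen.add(key)
            result.append(key)
    return result

payroll = [(1, 2017, "Ann")]
dtr = [(1, 2017, "Ann"), (2, 2017, "Ann"), (2, 2017, "Ann"), (1, 2017, "Bob")]
print(append_missing(payroll, dtr))
```

Because the membership test replaces the inner loop, there is no loop bound to get wrong and no blank row that can be compared by accident.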

Excel VBA: "Too many different cell formats" - Is there a way to remove or clear these formats in a Macro?

So, I made a fun and simple macro that randomly selects R, G, and B values until it has used every possible combination (skipping repeats), and sets the color of the cells in a 10x10 square with each new color.
The only problem is that I have run into the limit for the number of cell formats. Microsoft says that the limit should be around 64000, but I found it to be exactly 65429 on a blank workbook in Excel 2013.
I've included a clear format code, but it seems to have no effect:
Cells(X, Y).ClearFormats
Microsoft lists some resolutions, but 3 out of the 4 are essentially "Don't make too many formats", and the 4th is to use a third-party application.
Is there really nothing that can be done in VBA?
A1:J10 will print a new color
K1 will print the percentage to completion
L1 will print the number of colors used
M1 will print the number of times a color combination is repeated
Dim CA(255, 255, 255) As Integer
Dim CC As Long
Dim RC As Long
Dim R As Integer
Dim G As Integer
Dim B As Integer
Dim X As Integer
Dim Y As Integer
CC = 0
RC = 0
X = 1
Y = 1
Do While CC < 16777216
R = ((Rnd * 256) - 0.5)
G = ((Rnd * 256) - 0.5)
B = ((Rnd * 256) - 0.5)
If CA(R, G, B) <> 1 Then
CA(R, G, B) = 1
'Step down to the next row
'If at the 10th row, jump back to the first and move to the next column
If X < 10 Then
X = X + 1
Else
X = 1
If Y < 10 Then
Y = Y + 1
Else
Y = 1
End If
End If
Cells(X, Y).ClearFormats 'doesn't do what I hope :(
Cells(X, Y).Interior.Color = RGB(R, G, B)
CC = CC + 1
Cells(1, 11).Value = (CC / 16777216) * 100
Cells(1, 12).Value = CC
Else
RC = RC + 1
Cells(1, 13).Value = RC
End If
Loop
There are several ways to resolve this issue, but the cleanest and easiest is to remove all extra styles (I have seen workbooks with 9000+ styles).
With the following simple VBA code you can remove all non-built-in styles, and in the vast majority of cases this fixes the error.
Sub removeStyles()
Dim li As Long
On Error Resume Next
With ActiveWorkbook
For li = .Styles.Count To 1 Step -1
If Not .Styles(li).BuiltIn Then
.Styles(li).Delete
End If
Next
End With
End Sub

vba array element removal

j = LBound(arrayTime)
Do Until j = UBound(arrayTime)
j = j + 1
b = b + 1
cnc = b + r
MsgBox cnc
If cnc > 7 Then
b = 0
r = 0
cnc = b + r
End If
numMins = Sheet5.Cells(cnc + 3, 2) - arrayTime(j)
If numMins < 0 Then
g = g + 1
ReArrangeArray arrayTime, j
'ReDim Preserve arrayTime(numrows - 1 + g)
'arrayTime(numrows - 1 + g) = arrayTime(j)
'MsgBox (arrayTime(numrows - 1 + g))
Else
Sheet5.Cells(cnc + 3, 2) = numMins
End If
Loop
If the if statement is true I want to be able to put the array value at the end of the array and remove that value from its current spot. As the code is, it just adds it to the end and increases the size of the array from 12 to 13. How can I get the array to remain size 12 and still place the value at the end of the array and then remove it from its original position? I do not want to touch the array values in front. Just want to take that value and move it to the end.
For instance
array(1,2,3,4,5)
If statement
j on third loop.
array(j)=3
end array should be
array(1,2,4,5,3)
You could use a helper Sub like this one:
Sub ReArrangeArray(inputArray As Variant, indexToSwap As Long)
    Dim i As Long
    Dim tempVal As Variant
    If indexToSwap >= LBound(inputArray) And indexToSwap < UBound(inputArray) Then
        tempVal = inputArray(indexToSwap)
        For i = indexToSwap To UBound(inputArray) - 1
            inputArray(i) = inputArray(i + 1)
        Next i
        inputArray(UBound(inputArray)) = tempVal
    End If
End Sub
To be called by your main Sub as follows:
ReArrangeArray arrayTime, j
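The shift-and-append step is easy to check on the example from the question. Here is a minimal Python sketch of the same move (the name `move_to_end` is hypothetical, and Python lists are 0-based, so the value 3 sits at index 2):

```python
def move_to_end(items, index):
    """Hypothetical helper: move items[index] to the end in place,
    leaving the list length and the earlier elements unchanged."""
    # Nothing to do for the last element or an out-of-range index
    if 0 <= index < len(items) - 1:
        items.append(items.pop(index))
    return items

print(move_to_end([1, 2, 3, 4, 5], 2))  # [1, 2, 4, 5, 3]
```

One thing to watch in the calling loop: after the move, a new value occupies the current position, so advancing the index straight away skips that value.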

Code either overloads memory or wont compile VBA

Trying to write a macro to insert a hyphen at specific points in a text string depending on how long the string is or delete all text after said point.
i.e
- if 6 characters, insert a hyphen between char 4+5 or delete all text after char 4
- if 7 characters, insert a hyphen between char 5+6 or delete all text after char 5
Ideally I would love to truncate the string at that point rather than hyphenate it, but I couldn't get my head around how to make that work, so I decided to insert a hyphen and then run a find and replace on '-*' to remove the unwanted characters. I can get this working on small sample sets of 100-300 cells, but I need the code to handle workbooks with 70,000+ cells. I've tried tweaking the code to stop the memory issue, but now I can't get it to work.
Sub Postcodesplitter()
Dim b As Range, w As Long, c As Range, x As Long, d As Range, y As Long
For Each b In Selection
w = Len(b)
If w = 8 And InStr(b, "-") = 0 Then b = Application.WorksheetFunction.Replace(b, 15 - w, 0, "-")
For Each c In Selection
x = Len(c)
If x = 7 And InStr(c, "-") = 0 Then c = Application.WorksheetFunction.Replace(c, 13 - x, 0, "-")
For Each d In Selection
y = Len(d)
If y = 6 And InStr(d, "-") = 0 Then d = Application.WorksheetFunction.Replace(d, 11 - y, 0, "-")
Next
Next
Next
End Sub
That's the original code I put together, but it caused memory issues above 300 target cells. I'm a pretty bad coder even at the best of times, but with some advice from a friend I tried this instead.
Sub Postcodesplitter()
Dim b As Range, x As Long
If (Len(x) = 6) Then
b = Application.WorksheetFunction.Replace(b, 11 - x, 0, "-")
Else
If (Len(x) = 7) Then
b = Application.WorksheetFunction.Replace(b, 13 - x, 0, "-")
Else
If (Len(x) = 8) Then b = Application.WorksheetFunction.Replace(b, 15 - x, 0, "-")
End Sub
But this just throws errors when compiling. I feel like I'm missing something really simple.
Any tips?
It looks as though you want to truncate to two less than the existing number of characters, if that number is 6-8? If so, something like this:
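Sub Postcodesplitter()
    Dim data
    Dim x As Long
    Dim y As Long
    ' Read the whole selection into an array in one go.
    ' Assumes a multi-cell selection; a single cell returns a plain value, not an array.
    data = Selection.Value
    For x = 1 To UBound(data, 1)
        For y = 1 To UBound(data, 2)
            Select Case Len(data(x, y))
                Case 6 To 8
                    ' Drop the last two characters
                    data(x, y) = Left(data(x, y), Len(data(x, y)) - 2)
            End Select
        Next y
    Next x
    ' Write all results back in a single assignment
    Selection.Value = data
End Sub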
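The truncation rule on its own is small enough to check by hand. A minimal Python sketch of it follows; the name `truncate_postcode` and the sample strings are made up for the illustration:

```python
def truncate_postcode(text):
    """Hypothetical helper: drop the last two characters of strings
    that are 6 to 8 characters long; leave everything else alone."""
    return text[:-2] if 6 <= len(text) <= 8 else text

for sample in ["AB12CD", "AB123CD", "AB1234CD", "AB1"]:
    print(sample, "->", truncate_postcode(sample))
```

This matches the rule in the question: 6 characters keep the first 4, 7 keep the first 5, and 8 keep the first 6.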