Need Fastest Search Method in Excel VBA - vba

Consider a scenario where I have two columns (Column "A" and Column "B").
Column A has around 130000 rows/strings.
Column B has around 10000 rows/strings.
I would like to search for each string of Column "B" in Column "A".
As you can see, the volume of data is very high. I have already tried the Range.Find() method, but it takes a lot of time to complete. I am looking for a method that will give me the result with a much shorter turnaround time.
Some more clarification on my requirement:
(1) Columns A and B contain string values, NOT NUMBERS, and the strings can be very large.
(2) For each cell in column "B", there can be many occurrences in column "A".
(3) I would like to fetch all the occurrences of column "B" in column "A", with row numbers.
(4) A string present in column "B" can be found as a substring of any cell in column "A".
Download file link - wikisend.com/download/431054/StackOverFlow_Sample.xlsx
Any suggestions?
Feel free to ask in case you need any extra details to solve the above problem!

Try this.
This took 3 seconds for 130000 rows in Col A and 10000 rows in Col B. The output is generated in Col C.
NOTE: I have taken the worst case scenario where all 10000 values in Col B are present in Col A
This is how my data looks.
Sub Sample()
    Debug.Print Now    ' start time
    Dim col As New Collection
    Dim ws As Worksheet
    Dim i As Long
    Set ws = ThisWorkbook.Sheets("Sheet1")
    Application.ScreenUpdating = False
    With ws
        .Range("C1:C10000").Value = "No"
        ' Load Column A into the collection, keyed by the cell text.
        ' A duplicate key raises an error, which is ignored here.
        For i = 1 To 130000
            On Error Resume Next
            col.Add .Range("A" & i).Value, CStr(.Range("A" & i).Value)
            On Error GoTo 0
        Next i
        ' Adding a Column B value fails if its key was already added
        ' (from Column A, or from an earlier Column B row).
        On Error Resume Next
        For i = 1 To 10000
            col.Add .Range("B" & i).Value, CStr(.Range("B" & i).Value)
            If Err.Number <> 0 Then .Range("C" & i).Value = "Yes"
            Err.Clear
        Next i
        On Error GoTo 0
    End With
    Application.ScreenUpdating = True
    Debug.Print Now    ' end time
End Sub
And this was the result

NEW
Column A: 130000 100-character strings. Column B: 10000 30-character strings. Run time: 27 minutes.
Column C is populated with row locations of occurrences of Column B string.
Column D is populated with number of occurrences of Column B string.
Public Sub searchcells()
    Dim arrA(1 To 130000) As String, arrB(1 To 10000) As String
    Dim t As Date, i As Long, j As Long
    t = Now
    Me.Range("c:d") = ""
    ' Cache both columns in memory before searching
    For i = 1 To 130000
        arrA(i) = Me.Cells(i, 1)
    Next
    For i = 1 To 10000
        arrB(i) = Me.Cells(i, 2)
    Next
    For i = 1 To 130000
        For j = 1 To 10000
            ' InStr returns > 0 when arrB(j) occurs anywhere inside arrA(i)
            If InStr(1, arrA(i), arrB(j), vbBinaryCompare) > 0 Then
                Me.Cells(j, 4) = Me.Cells(j, 4) + 1
                Me.Cells(j, 3) = Me.Cells(j, 3) & i & "; "
            End If
        Next
        Me.Cells(1, 5) = i    ' progress indicator
    Next
    Debug.Print CDbl(Now - t) * 24 * 3600 & " seconds"
End Sub
The cells can be populated easily with the following, changing i and j limits for the desired number of strings and string lengths in each section.
Public Sub fillcells()
    Dim temp As String
    Dim i As Long, j As Long
    Randomize
    ' Column A: random 100-character strings
    For i = 1 To 13000
        temp = ""
        For j = 1 To 100
            temp = temp & Chr(70 + Int(10 * Rnd()))
        Next
        Me.Cells(i, 1) = temp
    Next
    ' Column B: random 30-character strings
    For i = 1 To 10000
        temp = ""
        For j = 1 To 30
            temp = temp & Chr(70 + Int(10 * Rnd()))
        Next
        Me.Cells(i, 2) = temp
    Next
End Sub
I am unable to download your spreadsheet at work, so disregard this if it misses the mark.
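For the substring requirement in point (4), the per-cell reads and writes can be avoided by pulling both columns into a Variant array once and writing the results back in a single assignment. The following is only a sketch under assumptions that are not in the question: the sheet is named "Sheet1", results go to columns C (row list) and D (count), and no cell holds an error value. The name FindSubstrings is made up for illustration.

```vba
Option Explicit

' Sketch: for each string in column B, list the rows of column A that contain it.
' Assumed (not from the question): sheet "Sheet1", output in columns C and D.
Sub FindSubstrings()
    Dim ws As Worksheet
    Dim data As Variant
    Dim hay() As String
    Dim out() As Variant
    Dim lastA As Long, lastB As Long, lastRow As Long
    Dim i As Long, j As Long, hits As Long
    Dim needle As String, rowsFound As String

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    lastA = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    lastB = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    lastRow = IIf(lastA > lastB, lastA, lastB)

    ' One read for both columns; a two-column range always yields a 2-D array.
    data = ws.Range("A1:B" & lastRow).Value

    ' Convert column A to strings once, not once per comparison.
    ReDim hay(1 To lastA)
    For i = 1 To lastA
        hay(i) = CStr(data(i, 1))
    Next i

    ReDim out(1 To lastB, 1 To 2)
    For j = 1 To lastB
        needle = CStr(data(j, 2))
        rowsFound = ""
        hits = 0
        If Len(needle) > 0 Then          ' an empty needle would match every row
            For i = 1 To lastA
                If InStr(1, hay(i), needle, vbBinaryCompare) > 0 Then
                    hits = hits + 1
                    rowsFound = rowsFound & i & "; "
                End If
            Next i
        End If
        out(j, 1) = rowsFound
        out(j, 2) = hits
    Next j

    ' One write for all results.
    ws.Range("C1").Resize(lastB, 2).Value = out
End Sub
```

This is still a brute-force scan of every B string against every A string, so it only removes the worksheet access overhead, not the number of comparisons. A row list longer than a cell's character limit would need a different output layout.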

Related

Auto scheduling

I am trying to make an auto-scheduling program in Excel.
For example, each number is a certain job assigned to the person on a given day.
1/2 1/3 1/4 1/5
Tom 1 2 2 ?
Justin 2 3 1 ?
Mary 3 3 ?
Sam 1 ?
Check O O X ? ## check is like =if(b2=c2,"O","X")
What I want to make sure of is that every person is given a different job from the day before.
My idea
while
randomly distribute jobs for 1/5
wend CheckCell = "O"
But I found that checking the cell from the VBA script doesn't work: the cell is not updated on each pass of the while loop.
Could you give me a few pointers for this kind of program? I am new to VBA, so any kind of help would be appreciated.
Using VBA (I'm sure there are better ways to do this), this will check the values from the penultimate column against the values from the last column. If they match it writes "O" under the last column, otherwise it writes "X":
Sub foo()
    'declare and set your worksheet, amend as required
    Dim ws As Worksheet: Set ws = Sheets("Sheet1")
    'get the last row with data on Column A
    LastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    'get the last column with data on row 1
    LastCol = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column
    counter = 0 'set counter
    For i = 2 To LastRow 'loop through penultimate column and add values to array
        If ws.Cells(i, LastCol - 1).Value <> "" Then
            Values = Values & ws.Cells(i, LastCol - 1) & ","
        End If
    Next i
    Values = Left(Values, Len(Values) - 1)
    Values = Split(Values, ",") 'split values into array
    For i = 2 To LastRow 'loop through last column and add values to array
        If ws.Cells(i, LastCol).Value <> "" Then
            ValuesCheck = ValuesCheck & ws.Cells(i, LastCol) & ","
        End If
    Next i
    ValuesCheck = Left(ValuesCheck, Len(ValuesCheck) - 1)
    ValuesCheck = Split(ValuesCheck, ",")
    For y = LBound(Values) To UBound(Values) 'loop through both arrays to find all values match
        For x = LBound(ValuesCheck) To UBound(ValuesCheck)
            If Values(y) = ValuesCheck(x) Then counter = counter + 1
        Next x
    Next y
    If counter = UBound(Values) + 1 Then 'if values match
        ws.Cells(LastRow + 1, LastCol).Value = "O"
    Else 'else write X
        ws.Cells(LastRow + 1, LastCol).Value = "X"
    End If
End Sub
Just to clarify: are you looking to implement the random number in VBA, or the check?
To do the check, the best way would be to set the area as a range and then check each cell using Cells(r, c), like below:
Sub checker()
    Dim rng As Range
    Dim r As Integer, c As Integer
    Set rng = Selection
    For r = 1 To rng.Rows.Count
        For c = 1 To rng.Columns.Count
            ' Compare each cell with its right-hand neighbour
            If rng.Cells(r, c) = rng.Cells(r, c + 1) Then
                rng.Cells(r, c).Interior.Color = RGB(255, 0, 0)
            End If
        Next c
    Next r
End Sub
This macro will check the cells you have selected and turn a cell red if it matches the value to its right.
To make it work for you, change Set rng = Selection to your range and change rng.Cells(r, c).Interior.Color = RGB(255, 0, 0) to the action you want.
A slightly different approach from the other answers.
Add this function:
Function PickJob(AvailableJobs As String, AvoidJob As String)
    Dim MaxTries As Integer
    Dim RandomJob As String
    Dim Jobs() As String
    Jobs = Split(AvailableJobs, ",")
    MaxTries = 100
    Do
        ' Give up after a fixed number of attempts
        MaxTries = MaxTries - 1
        If MaxTries = 0 Then
            MsgBox "Could not find a fitting job"
            End
        End If
        ' Pick a random entry from the list of jobs
        RandomJob = Jobs(Int((1 + UBound(Jobs)) * Rnd()))
    Loop Until RandomJob <> AvoidJob
    PickJob = RandomJob
End Function
And put this formula in your sheet
=PickJob("1,2,3",D2)
where D2 points to the previous job.
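If the assignment is done entirely in VBA, the check cell is not needed at all: each person's new job can be redrawn until it differs from the previous day's. A minimal sketch of that idea follows. The procedure name AssignJobs and all of its parameters are hypothetical, and jobs are assumed to be numbered 1 to jobCount.

```vba
Option Explicit

' Sketch: give every person a random job that differs from the previous day's.
' All parameters are hypothetical; adjust them to the real sheet layout.
Sub AssignJobs(ByVal ws As Worksheet, ByVal firstRow As Long, ByVal lastRow As Long, _
               ByVal prevCol As Long, ByVal newCol As Long, ByVal jobCount As Long)
    Dim r As Long
    Dim job As Long

    If jobCount < 2 Then Exit Sub    ' with a single job no different job exists
    Randomize

    For r = firstRow To lastRow
        Do
            job = Int(jobCount * Rnd()) + 1    ' random job in 1..jobCount
        Loop While CStr(ws.Cells(r, prevCol).Value) = CStr(job)
        ws.Cells(r, newCol).Value = job
    Next r
End Sub
```

With names in rows 2 to 5 and the dates 1/2 to 1/5 in columns B to E, a call such as AssignJobs Worksheets("Sheet1"), 2, 5, 4, 5, 3 would fill the 1/5 column. A person with no job on the previous day simply gets any job.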

Excel VBA - delete / Copy a record from a sheet to another

Let's say I have two sheets, Sheet 1 and Sheet 2.
I have four columns in Sheet 1 and three matching column headers in Sheet 2.
A record from sheet 1 gets deleted if it is not found in sheet2.
A record from sheet 2 is copied into sheet 1 if it is not already there in sheet 1.
In Sheet1, I have the following columns
Name Age Gender Group
I 25 M A1
A 24 M B1
M 23 M C1
E 23 M D1
In Sheet 2, I have the following columns
Name Age Gender
F 25 M
A 24 M
M 23 M
And my output needs to be in sheet1 :
Name Age Gender Group
A 24 M B1
M 23 M C1
F 25 M
Note: Each record is removed / copied according to the combination of Name, Age and Gender, not just the Name alone.
I created a concatenated column using VBA and am now lost for ideas.
For j = 2 To lastrow
strA = Sheets(TabName).Range("A" & j).Value
strB = Sheets(TabName).Range("B" & j).Value
StrC = Sheets(TabName).Range("C" & j).Value
Range(CombinedKeyColLet & j).Value = Application.WorksheetFunction.Concat(strA & strB & StrC)
Cells.Select
Selection.Columns.AutoFit
Next
'Copy or Delete code
'--------------------------------'
Here is the code that I am trying, using the On Error method:
CombinedKeyCol = WorksheetFunction.Match("CombinedKey", Sheets(TabName1).Rows(1), 0)
CombinedKeyColLet = GetColumnLetter(CombinedKeyCol)
For i = lastrow To 2 Step -1
Sheets(TabName2).Activate
CombinedKeyVal = Range(CombinedKeyColLet & i).Value
On Error GoTo Jumpdelete
Present = WorksheetFunction.Match(CombinedKeyVal, Sheets(TabName1).Columns(6), 0)
If Present <> "" Then
GoTo Jumpdontdelete
End If
Jumpdelete:
Sheets(TabName2).Activate
Rows(i & ":" & i).Delete
Present = ""
Jumpdontdelete:
Present = ""
Next
This seems to do the trick. There are two loops here, in the first loop we look at each row in tbl1 and see if it exists in tbl2. If it doesn't, then we delete it. If it does exist, we put its concatenated value in a Dictionary so we can remember it exists in both places. In the second loop, we go over tbl2 and for any concatenated value that doesn't exist in dict (Dictionary) then we know it's a "new" row, so we add this data to tbl1.
Option Explicit
Sub foo()
    Dim j As Long
    Dim rng As Range
    Dim tbl1 As Range, tbl2 As Range
    Dim dict As Object
    Dim val As String
    Dim r As Variant
    Dim nextRow
    Set dict = CreateObject("Scripting.Dictionary")
    With Sheet2
        Set tbl2 = .Range("A1:A" & .Range("A" & .Rows.Count).End(xlUp).Row).CurrentRegion
        'Helper column: concatenation of the three columns to its left
        tbl2.Columns(4).FormulaR1C1 = "=RC[-3]&RC[-2]&RC[-1]"
    End With
    With Sheet1
        Set tbl1 = .Range("A1:A" & .Range("A" & .Rows.Count).End(xlUp).Row).CurrentRegion
    End With
    For j = tbl1.Rows.Count To 2 Step -1
        'Does this row exist in Table2?
        val = tbl1.Cells(j, 1) & tbl1.Cells(j, 2) & tbl1.Cells(j, 3)
        r = Application.Match(val, tbl2.Columns(4), False)
        If IsError(r) Then
            tbl1.Rows(j).Delete Shift:=xlUp
        Else
            dict(val) = "" 'Keep track that this row exists in tbl1 AND tbl2
        End If
    Next
    tbl2.Columns(4).ClearContents
    Set tbl2 = tbl2.Resize(, 3)
    For j = 2 To tbl2.Rows.Count
        val = Join(Application.Transpose(Application.Transpose(tbl2.Rows(j).Value)), "")
        'If the value doesn't exist, then we add row to Tbl1:
        If Not dict.Exists(val) Then
            nextRow = tbl1.Cells(1, 1).End(xlDown).Row + 1
            tbl1.Rows(nextRow).Resize(, 3).Value = tbl2.Rows(j).Value
        End If
    Next
End Sub
Note: this necessarily assumes uniqueness in the concatenation of Name/Age/Gender. If there may be duplicates, then this method would need to be modified to not use a Dictionary object, could be done with array or collection etc.

Counting the number of Colourfilled cells

I am trying to find the number of colour-filled cells in column B. I want to count and display the number of colour-filled cells.
But I am getting an error:
Dim sum As Long
Dim count As Long
sum = 0
count = 0
strFileName = Application.GetOpenFilename("Excel files (*.xls*),*.xl*", Title:="Open data")
Set Target = Workbooks.Open(strFileName)
Set tabWS = Target.Worksheets("Tabelle1")
' lastrow = tabWS.Range("D" & tabWS.Rows.count).End(xlUp).Row 'Trigger Description starts from 2 row A coloumn
lastrow = tabWS.Range("B" & tabWS.Rows.count).End(xlUp).Row 'Trigger Description starts from 2 row A coloumn
For j = 2 To lastrow
If tabWS.Cells(j, 2).Interior.ColorIndex = 4 Then
sum = sum + tabWS.Cells(j, 8).value
count = count + 1
End If
Next j
MsgBox ("the value is" & sum)
End sub
I am getting the error on the line sum = sum + tabWS.Cells(j, 8).value
I can't figure out why I am getting this error. Can anyone give me a suggestion?
It looks to me like you're opening the Workbook each time you use a method on tabWS. Try setting tabWS equal to the following instead:
Set tabWS = Worksheets("Tabelle1")
Now when you're setting your lastrow and sum variables in the latter part of your code you won't be trying to open the workbook over and over again.
Edit (continued from comment below):
lastrow = Worksheets("Tabelle1").Range("B" & Worksheets("Tabelle1").Rows.count).End(xlUp).Row
For j = 2 To lastrow
    If Worksheets("Tabelle1").Cells(j, 2).Interior.ColorIndex = 4 Then
        sum = sum + Worksheets("Tabelle1").Cells(j, 8).value
        count = count + 1
    End If
Next j
MsgBox ("the value is" & sum)
End Sub
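The question does not quote the error text, so the cause stays unconfirmed. One common reason for that line to fail is a non-numeric value (text or an error value) in column H of a coloured row. A hedged sketch that counts the filled cells and skips such values is shown below. The procedure name SumColouredRows is made up, and the column numbers follow the question's code.

```vba
Option Explicit

' Sketch: count column B cells with ColorIndex 4 and sum column H for those rows.
' Assumption (not confirmed by the question): the error comes from non-numeric
' values in column H, so those values are skipped rather than added.
Sub SumColouredRows(ByVal tabWS As Worksheet)
    Dim total As Double
    Dim filled As Long
    Dim j As Long, lastRow As Long
    Dim v As Variant

    lastRow = tabWS.Cells(tabWS.Rows.Count, "B").End(xlUp).Row
    For j = 2 To lastRow
        If tabWS.Cells(j, 2).Interior.ColorIndex = 4 Then
            filled = filled + 1
            v = tabWS.Cells(j, 8).Value
            ' IsNumeric is False for plain text and for error values
            If IsNumeric(v) Then total = total + CDbl(v)
        End If
    Next j

    MsgBox "Filled cells: " & filled & ", sum: " & total
End Sub
```

It could be called as SumColouredRows Target.Worksheets("Tabelle1") after the workbook has been opened. Using Double for the running total also avoids rounding any decimals that a Long would drop.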

Using SUMIFS to add time duration always gives 00:00:00

Sub Add_sumf()
Dim i As Integer
i = 3
Dim cellDate As Integer
cellDate = 0
Dim cellDate1 As Date
cellDate1 = TimeValue("00:00:00")
Dim total As Integer
total = 0
Dim j As Integer
j = 2
Dim k As Integer
k = 2
Set aa = Workbooks("Book3").Worksheets(1)
Set bb = Workbooks("Final_result").Worksheets(1)
Do While bb.Cells(1, k).Value <> ""
For Each y In bb.Range("A:A")
On Error GoTo Label
If UCase(bb.Cells(j, "A").Value) <> "" Then
cellDate1 = WorksheetFunction.SumIfs(aa.Range("F:F"), aa.Range("B:B"), UCase(bb.Cells(1, k).Value), aa.Range("G:G"), UCase(bb.Cells(j, "A").Value))
bb.Cells(j, k).Value = TimeValue(cellDate1)
cellDate1 = TimeValue("00:00:00")
bb.Cells(j, k).NumberFormat = "[h]:mm:ss"
On Error GoTo Label
j = j + 1
Else
Exit For
End If
Next
j = 2
k = k + 1
Loop
Label:
'MsgBox Err.Description
Exit Sub
End Sub
I am using the above code to add time durations based on the values of two other columns, but I always get 00:00:00 as the result.
If I use the code below I get the answer, but it is very slow:
Sub add_it_time()
Dim i As Integer
i = 3
Dim cellDate As Integer
cellDate = 0
Dim cellDate1 As Date
cellDate1 = TimeValue("00:00:00")
Dim total As Integer
total = 0
Dim j As Integer
j = 2
Dim k As Integer
k = 2
Set aa = Workbooks("Book3").Worksheets(1)
Set bb = Workbooks("Final_result").Worksheets(1)
Do While bb.Cells(1, k).Value <> ""
'MsgBox bb.Cells(1, k).Value
For Each y In bb.Range("A:A")
On Error GoTo Label
' MsgBox UCase(bb.Cells(j, "A").Value)
If UCase(bb.Cells(j, "A").Value) <> "" Then
For Each x In aa.Range("F:F")
On Error Resume Next
If UCase(aa.Cells(i, "B").Value) = UCase(bb.Cells(j, "A").Value) Then
' MsgBox aa.Cells(i, "F").Text
' total = total + Int(get_Second(aa.Cells(i, "F").Text))
If UCase(aa.Cells(i, "G").Value) = UCase(bb.Cells(1, k).Value) Then
'MsgBox aa.Cells(i, "F").Text
cellDate1 = cellDate1 + TimeValue(aa.Cells(i, "F").Value)
End If
End If
i = i + 1
Next
i = 3
On Error GoTo Label
bb.Cells(j, k).NumberFormat = "h:mm:ss"
bb.Cells(j, k).Value = WorksheetFunction.Text(cellDate1, "[hh]:mm:ss")
total = 0
cellDate1 = 0
j = j + 1
Else
Exit For
End If
Next
j = 2
k = k + 1
Loop
Label:
'MsgBox Err.Description
Exit Sub
End Sub
The source column that contains the date is in General format.
I am new to VBA macros.
UPDATED SOLUTION:
After discussion in chat with the OP it was decided that a pure formula solution is fine. Below are the formulas / actions to apply on a separate sheet, starting at A1:
1. Row 1 will be the resulting table header: in A1 I added Agent Name / Release Code, and starting at B1 there is a list of all available Release Code values (easily obtained using Remove Duplicates).
2. I defined the following named ranges for simplicity and effectiveness (since the initial data is NOT static): AgentNames=OFFSET('Agent State'!$B$2,0,0,COUNTA('Agent State'!$B:$B)-1,1) - this returns the range of names on the initial sheet excluding the header; TimeInStateData=OFFSET(AgentNames,0,4) and ReleaseCodes=OFFSET(AgentNames,0,5) as shifted versions of the AgentNames range.
3. Column A should hold the list of unique names, so select in column A a number of cells that is NOT less than the number of unique names (for the sample I used A2:A51), type this formula: =IFERROR(INDEX(AgentNames,SMALL(IF(MATCH(AgentNames,AgentNames,0)=ROW(INDIRECT("1:"&ROWS(AgentNames))),MATCH(AgentNames,AgentNames,0),""),ROW(INDIRECT("1:"&ROWS(AgentNames))))),"") and press CTRL+SHIFT+ENTER instead of the usual ENTER. This defines a multi-cell ARRAY formula and puts curly {} brackets around it (but do NOT type them manually!).
4. B2: =IF(OR($A2="",SUMPRODUCT(--($A2=AgentNames),--(B$1=ReleaseCodes),TIMEVALUE(TimeInStateData))=0),"",SUMPRODUCT(--($A2=AgentNames),--(B$1=ReleaseCodes),TIMEVALUE(TimeInStateData))) - a normal formula, which returns an empty value for either an empty name or zero time.
5. Copy the formula from B2 to the whole table.
Remarks:
The resulting range for the sum of time values should be formatted as Time.
If the list of names needs to be expanded in the future, repeat step 3 for the new range, but do NOT drag the formula down: this results in a "You cannot change part of an array" error.
Sample file: https://www.dropbox.com/s/quudyx1v2fup6sh/AgentsTimeSUM.xls
INITIAL ANSWER:
Perhaps that's too simple and obvious, but at a glance I don't understand why you have that line of code:
cellDate1 = TimeValue("00:00:00")
right after your SUMIFS: cellDate1 = WorksheetFunction.SumIfs(aa.Range("F:F"), ...
Try to remove the first one where you assign zeros to cellDate1.
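The formulas above wrap the time column in TIMEVALUE, which suggests the durations are stored as text. SUMIFS ignores text values, so for a VBA route the text has to be converted before it is added. A rough sketch that totals the durations per name/code pair in one pass over an in-memory array follows. The function name SumDurations and the "|" key separator are made up, and the layout (names in column B, durations in column F, codes in column G, data from row 2) is taken from the named ranges above rather than confirmed by the question.

```vba
Option Explicit

' Sketch: total text durations ("h:mm:ss") per name/code pair in one pass.
' Assumed layout: names in column B, durations in F, codes in G, data from row 2.
' Returns a Scripting.Dictionary keyed by "name|code" with totals in days.
Function SumDurations(ByVal ws As Worksheet) As Object
    Dim totals As Object
    Dim data As Variant
    Dim v As Variant
    Dim lastRow As Long, r As Long
    Dim key As String

    Set totals = CreateObject("Scripting.Dictionary")
    totals.CompareMode = vbTextCompare    ' match names and codes case-insensitively

    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    If lastRow >= 2 Then
        data = ws.Range("B2:G" & lastRow).Value    ' 2-D array: B = 1, F = 5, G = 6
        For r = 1 To UBound(data, 1)
            v = data(r, 5)
            If VarType(v) = vbString Then
                If IsDate(v) Then    ' only text that TimeValue can parse
                    key = CStr(data(r, 1)) & "|" & CStr(data(r, 6))
                    ' A missing key reads as Empty, which CDbl turns into 0
                    totals(key) = CDbl(totals(key)) + CDbl(TimeValue(v))
                End If
            End If
        Next r
    End If

    Set SumDurations = totals
End Function
```

A result cell could then be filled with totals(agentName & "|" & releaseCode) and formatted as [h]:mm:ss so that sums above 24 hours display correctly. Text that TimeValue cannot parse, such as a duration of 24 hours or more, is skipped by this sketch.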

Excel Loop through list,transpose and create a matrix based on cell content

I am receiving a large file 500k+ lines but all the content is in column A. I need to run a macro that will transpose the data into matrix form but will only create a new row when it finds "KEY*" in the ActiveCell. For example:
| KEY 4759839 | asljhk | 35049 | | sklahksdjf|
| KEY 359 | skj | 487 |y| 2985789 |
The above data in my file would originally look like this in column A:
KEY 4759839
asljhk
35049
sklahksdjf
KEY 359
skj
487
y
2985789
Considerations:
Blank cells need to be transposed as well, so the macro can't stop at the first empty cell
The number of cells between KEY's is not constant so it actually needs to read the cell to know if it should create a new row
It can either stop based on say 20 empty cells in a row or prompt for a max row number
(Optional) It would be nice if there was some sort of visual indicator for the last item in a row so that its possible to tell if the last item(s) were blank cells
I searched around and found a macro that had the same general theme but it went based on every 6 lines and I did not know enough to try to modify it for my case. But in case it helps here it is:
Sub kTest()
Dim a, w(), i As Long, j As Long, c As Integer
a = Range([a1], [a500000].End(xlUp))
ReDim w(1 To UBound(a, 1), 1 To 6)
j = 1
For i = 1 To UBound(a, 1)
c = 1 + (i - 1) Mod 6: w(j, c) = a(i, 1)
If c = 6 Then j = j + 1
Next i
[c1].Resize(j, 6) = w
End Sub
I would greatly appreciate any help you can give me!
This works with the sample data you provided in your question - it outputs the result in a table starting in B1. It runs in less than one second for 500k rows on my machine.
Sub kTest()
    Dim originalData As Variant
    Dim result As Variant
    Dim i As Long
    Dim j As Long
    Dim k As Long
    Dim countKeys As Long
    Dim countColumns As Long
    Dim maxColumns As Long
    originalData = Range([a1], [a500000].End(xlUp))
    countKeys = 0
    maxColumns = 0
    'Calculate the number of lines and columns that will be required
    For i = LBound(originalData, 1) To UBound(originalData, 1)
        If Left(originalData(i, 1), 3) = "KEY" Then
            countKeys = countKeys + 1
            maxColumns = IIf(countColumns > maxColumns, countColumns, maxColumns)
            countColumns = 1
        Else
            countColumns = countColumns + 1
        End If
    Next i
    'The last group is not followed by another KEY, so check it here
    maxColumns = IIf(countColumns > maxColumns, countColumns, maxColumns)
    'Create the resulting array
    ReDim result(1 To countKeys, 1 To maxColumns) As Variant
    j = 0
    k = 1
    For i = LBound(originalData, 1) To UBound(originalData, 1)
        If Left(originalData(i, 1), 3) = "KEY" Then
            j = j + 1
            k = 1
        Else
            k = k + 1
        End If
        result(j, k) = originalData(i, 1)
    Next i
    With ActiveSheet
        .Cells(1, 2).Resize(UBound(result, 1), UBound(result, 2)) = result
    End With
End Sub
Tested and works:
Sub test()
    Row = 0
    col = 1
    'Find the last not empty cell by selecting the bottom cell and moving up
    Max = Range("A650000").End(xlUp).Row 'Or whatever the last allowed row number is
    'loop through the data
    For i = 1 To Max
        'Check if the left 3 characters of the cell are "KEY" and start a new row if they are
        If (Left(Range("A" & i).Value, 3) = "KEY") Then
            Row = Row + 1
            col = 1
        End If
        Cells(Row, col).Value = Range("A" & i).Value
        If (i > Row) Then
            Range("A" & i).Value = ""
        End If
        col = col + 1
    Next i
End Sub