Excel-VBA: Count occurrencies of a different strings and list them - vba

Today I have the following problem: I have 2 columns of x rows (doesn't matter how many) in Excel with a string in each one, like this
A B
Apple Potato
Banana Potato
Apple Potato
Orange Apple
Each string can appear in both column.
I need to obtain the following results:
Fruit Occurrencies
Apple 3
Banana 1
Potato 3
Orange 1
Now, I know for sure that there's a way much faster than what I could think of and I'd appreciate any help you can give.
My solution would consist in storing one by one the strings in an array checking each time if they are already contained in the slots before the current one and, if not, counting its occurrencies too. For example after having stored all the strings in an array (which I will now call Fruit()):
Dim Str() As Variant
Dim Flag As Boolean
For i = LBound(Fruit)+1 to Ubound(Fruit)
Flag = True
For j = i to LBound(Fruit)
If Fruit(i) = Fruit(j) Then
Flag = False
Exit For
End If
Next
If Flag = True Then
Str(k,0) = Fruit(i)
For y = LBound(Fruit) to UBound(Fruit)
if Str(k,0) = Fruit(y) Then Str(k,1) = Str(k,1)+1
Next
k = k+1
End If
Next
This is totally crazy and I know there's an easier solution... I just can't find it.

You can use the dictionary object, it looks pretty straightforward to me
Sub fruitsCount()
Dim sourceRange As Range
Dim sourceMem As Object
Dim curRow as integer
'CHANGE TO WHATEVER SHEET NAME YOUR ARE USING
With Worksheets("SOURCE_SHEET")
Set sourceRange = .Range("A1:B" & .Range("A" & .Rows.count).End(xlUp).row)
End with
Set sourceMem = CreateObject("Scripting.dictionary")
For Each cell In sourceRange
On Error GoTo ERREUR
sourceMem.Add cell.Value, 1
On Error GoTo 0
Next
curRow = 2
'CHANGE TO WHATEVER SHEET NAME YOUR ARE USING
With Worksheets("DESTINATION_SHEET")
.Range("A1").Value = "Fruit"
.Range("B1").Value = "Occurencies"
For Each k In sourceMem.Keys
.Range("A" & curRow).Value = k
.Range("B" & curRow).Value = sourceMem(k)
curRow = curRow + 1
Next k
End With
Set sourceMem = Nothing
Exit Sub
ERREUR:
sourceMem(cell.Value) = sourceMem(cell.Value) + 1
Resume Next
End Sub
Edit: the logic behind the code is actually fairly simple, and relies on the dictionary object which allows to garner (key, value) pairs. Here the keys will be the fruit names, and the values the number of occurences for each fruit. The distinctive feature of the dictionary object the code relies on is that it won't allow duplicate keys - any time you try and add a key that already exists in the dictionary, a runtime error will be issued.
So the code just scans through each and every cell of your source range and it tries to add its value as a key to the dictionary:
if the operation succeeds, then this is the first occurence for that fruit in the source range - it is added as a key to the dictionary, and its paired value is set to 1
else, the fruit already exists as a key in the dictionary - and thus an error occurs when trying to add the fruit to the dictionary. The code then jumps to the ERREUR error hanlder to increment the value paired with that existing fruit key in the dictionary, and resume normal execution from there
Hopes that helps clarifying

Checking yours as a correct answer and +1 for help, but I wanted to share with the community the effort to make this work for an array too:
Private Function FilesCount(SourceRange As Range) As Variant
Dim SourceMem As Object
Dim Occurrencies() As Variant
Dim OneCell As Range
Dim i As Integer
Set SourceMem = CreateObject("Scripting.dictionary")
For Each OneCell In SourceRange
On Error GoTo Hell
SourceMem.Add OneCell.Value, 1
On Error GoTo 0
Next
ReDim Occurrencies(SourceMem.Count - 1, 1)
For i = 0 To SourceMem.Count - 1
Occurrencies(i, 0) = SourceMem.Keys()(i)
Occurrencies(i, 1) = SourceMem.Items()(i)
Next i
Set SourceMem = Nothing
FilesCount = Occurrencies
Exit Function
Hell:
SourceMem(OneCell.Value) = SourceMem(OneCell.Value) + 1
Resume Next
End Function
It returns an (n x 2) array, in which there are n names and their occurrence in the selected range

Related

Excel VBA find all values in row and save different column values to variables

I've done quite a bit of searching and can't find any code that matches my situation or to a point I can modify except for one.
Looking at the spreadsheet below. I want to have the user enter the OrderNumber then search Column A for every value of that number. As it does I want it to copy the ItemNumber and QtyOrdered to two different variables in order to put them into textboxes later on.
I want it to "stack" the information into the variable so something like ItemNumValues = ItemNumValues + Cell.Value
I tried to modify code from someone else ("their code") but I am getting a mismatch type error. The rest of the code works. There are some trace elements in the script from previous features that aren't used and I just haven't removed them yet.
'***********************************************************
'********** Their Code Follows *****************
'***********************************************************
Dim numentries As Integer
Dim i As Integer
'***********************************************************
'Get number of entries
numentries = Worksheets(Sheet1).UsedRange.Rows.Count
'*************************************************************
'Run loop to cycle through all entries (rows) to copy
For i = 1 To numentries
If (Worksheets("Sheet1").Cells(i + 2, 1).Value = InStr(1, Cell, OrderNumber, vbTextCompare)) Then
MsgBox Test
End If
Next i
End If
'***********************************************************
'********** End Their Code *****************
'***********************************************************
I recommend using a multidimensional array. If you've never used arrays before, I strongly suggest reading up on them.
Sub GatherData()
Dim c As Range
Dim aGetData() As Variant 'This is our array
Dim i As Integer
Dim a As Integer
Dim iRowCount As Integer
Dim sRange As String
'Gather data
iRowCount = Worksheets("Sheet1").UsedRange.Rows.Count
For Each c In Range("A2:A" & iRowCount)
If c.Value = 636779 Then
ReDim Preserve aGetData(2, i) 'An array must have a set size but as we
'do not know how many order numbers will be found we have to 'resize'
'the array to account for how many we do find. Using "ReDim Preserve"
'keeps any data we have placed into the array while at the same time
'changing it's size.
For a = 0 To 2 'Our first index will hold each col of data that is why
'it is set to 2 (arrays start at a base of zero, so
'0,1,2 will be each col(A,B,C)
aGetData(a, i) = c.Offset(0, a) 'This gets each value from col A,B and C
Next a
i = i + 1 'Increment for array in case we find another order number
'Our second index "aGetData(index1,index2) is being resized
'this represents each order number found on the sheet
End If
Next c
'How to read the array
For i = 0 To UBound(aGetData())
For a = 0 To 2
Debug.Print aGetData(a, i)
Next a
Next i
End Sub
It seems that the OrderNumber (column A) is sorted. Very good news (if they're not, just sort them ;) ). This simple function will get you the ItemNumbers and QtyOrdered into a bi-dimensional array, where each row is a pair of them.
Function ArrItemQty(ByVal OrderNumber As Long)
With Worksheets("Sheet1").UsedRange.Offset(1)
.AutoFilter 1, OrderNumber
ArrItemQty= .Resize(, 2).Offset(, 1).SpecialCells(xlCellTypeVisible).value
.Parent.AutoFilterMode = False
End With
End Function
And here's a little testing:
Sub Test()
Dim i As Long, j As Long, ar
ar = ArrItemQty(636779)
For i = LBound(ar, 1) To UBound(ar, 1)
Debug.Print
For j = LBound(ar, 2) To UBound(ar, 2): Debug.Print ar(i, j),: Next
Next
End Sub
p.s. be aware that the resulting array is 1-based. Use LBound and UBound as indicated is safest.

Search Range for Value Change, Copy Whole Row if Found

I'm very new to VBA (~4 days new) and have tried to solve this issue in my usual method, through reading lots of different posts on resources like this and experimenting, but have not been able to quite get the hang of it. I hope you fine folks are willing to point out where I'm going wrong with this. I've looked at a lot (all?) of the threads with similar issues but haven't been able to cobble together a solution for myself from them. I hope you'll forgive this if it has already been answered somewhere else.
Context:
I've got a spreadsheet with items in rows 5-713 down column B (merged up to cell J) where for each date (Columns K-SP) the item is scored either a 1 or a 0. My goal is to create a list at the bottom of the worksheet that contains all items which have gone from 1 to 0. To start, I've simply been trying to get my "generate list" button to copy all rows with a 0 in them to the bottom, figuring I would tweak it later to do exactly what I wanted. I've tried several things and gotten several different errors.
Worksheet Sample for a visual of what I'm talking about.
I've gone through several different attempts and have had limited success with each, usually getting a different error every time. I've had "method 'range of object' _Worksheet failed", "object required", "type mismatch", "out of memory", and a few others. I'm sure I'm simply not grasping some of the basic syntax, which is causing some problems.
Here is the latest batch of code, giving me the error 'type mismatch'. I've also tried having 'todo' be string but that just shoots out 'object required'
Sub CommandButton1_Click()
Application.ScreenUpdating = False
Dim y As Integer, z As Integer, todo As Range
Set todo = ThisWorkbook.ActiveSheet.Range(Cells(5, 2), Cells(713, 510))
y = 5
z = 714
With todo
Do
If todo.Rows(y).Value = 0 Then
todo.Copy Range(Cells(z, 2))
y = y + 1
z = z + 1
End If
Loop Until y = 708
End With
Application.ScreenUpdating = True
End Sub
Another attempt I thought was promising was the following, but it gives me 'out of memory'.
Private Sub CommandButton1_Click()
Application.ScreenUpdating = False
Dim y As Integer, z As Integer
y = 5
z = 714
Do
If Range("By:SPy").Value = 0 Then
Range("By:SPy").Copy Range("Bz")
y = y + 1
z = z + 1
End If
Loop Until y = 708
Application.ScreenUpdating = True
End Sub
Just to reiterate, the code attempts I've posted were simply to get any row containing 0's to the bottom of the spreadsheet, however, if there's a way define the criteria to search for 1's that turn to 0's, that would be amazing! Also, I'm not sure how to differentiate a 0 in the actual data and a zero in the item name (for example, it would not be great to have 'Item 10' go into the list just because 10 is a 1 with a 0 after it).
Any help to figure out this first step, or even how to have it scan for 1's that turn to 0's would be wonderfully appreciated. I'm sure I'm missing something simple and hope you guys can forgive my ignorance.
Thanks!
This looks through the data and copies it down below the last row of the data. It is assuming there is nothing below the data. It also only looks for zeros after it finds a 1.
Sub findValueChange()
Dim lastRow As Long, copyRow As Long, lastCol As Long
Dim myCell As Range, myRange As Range, dataCell As Range, data As Range
Dim hasOne As Boolean, switchToZero As Boolean
Dim dataSht As Worksheet
Set dataSht = Sheets("Sheet1") '<---- change for whatever your sheet name is
'Get the last row and column of the sheet
lastRow = dataSht.Cells(Rows.Count, 2).End(xlUp).row
lastCol = dataSht.Cells(5, Columns.Count).End(xlToLeft).Column
'Where we are copying the rows to (2 after last row initially)
copyRow = lastRow + 2
'Set the range of the items to loop through
With dataSht
Set myRange = .Range(.Cells(5, 2), .Cells(lastRow, 2))
End With
'start looping through the items
For Each myCell In myRange
hasOne = False 'This and the one following are just flags for logic
switchToZero = False
With dataSht
'Get the range of the data (1's and/or 0's in the row we are looking at
Set data = .Range(.Cells(myCell.row, 11), .Cells(myCell.row, lastCol))
End With
'loop through (from left to right) the binary data
For Each dataCell In data
'See if we have encountered a one yet
If Not hasOne Then 'if not:
If dataCell.Value = "1" Then
hasOne = True 'Yay! we found a 1!
End If
Else 'We already have a one, see if the new cell is 0
If dataCell.Value = "0" Then 'if 0:
switchToZero = True 'Now we have a zero
Exit For 'No need to continue looking, we know we already changed
End If
End If
Next dataCell 'move over to the next peice of data
If switchToZero Then 'If we did find a switch to zero:
'Copy and paste whole row down
myCell.EntireRow.Copy
dataSht.Cells(copyRow, 2).EntireRow.PasteSpecial xlPasteAll
Application.CutCopyMode = False
copyRow = copyRow + 1 'increment copy row to not overwrite
End If
Next myCell
'housekeeping
Set dataSht = Nothing
Set myRange = Nothing
Set myCell = Nothing
Set data = Nothing
Set dataCell = Nothing
End Sub

Excel Dictionary not updating?

I'm a relative newcomer to VB so am having real problems trying to understand how things fit together. I'm currently trying to use a global dictionary to store headers/column values so they can be accessed quickly when run (as the column numbers may change depending on content). However i'm struggling to make a dictionary work, it appears to add values but later in the code shows up as empty, i have no idea what i'm doing wrong and would appreciate any help.
Public dataHeaders As Dictionary
Public Function getCases()
Set dataHeaders = CreateObject("Scripting.Dictionary")
For i = 1 To 100
If IsEmpty(Worksheets("DATA").Cells(1, i)) Then
Exit For
Else
dataHeaders.Add Worksheets("DATA").Cells(1, i), i
End If
Next
For i = 1 To 10
For j = 1 To 750
If Worksheets("Summary").Cells(1, i) = Worksheets("DATA").Cells(dataHeaders("Checker"), j) Then
Worksheets("Summary").Cells(2, i) = Worksheets("Summary").Cells(2, i) + 1
End If
Next
Next
End Function
I suspect that your problem is either a casing issue or whitespace. To get rid of this issue, use the Trim and UCase (or LCase) to normalise your text before using it in a dictionary.
I tested the below code, and it outputs what I would expect..
Sub test()
Dim headers As Dictionary
Dim valueCount As Integer
Dim ws As Worksheet
Dim headerRange As Range
Set ws = Sheet1
Set headers = New Dictionary
'get last column on the right of our header row
valueCount = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column
Set headerRange = ws.Cells(1, 1).Resize(1, valueCount)
Dim i As Long
i = 1
For Each cell In headerRange
'Trim and convert to upper when assigning to array.
headers.Add UCase(Trim(cell.Value)), i
i = i + 1
Next cell
For Each Key In headers.Keys
'Note the usage of Trim and UCase
Debug.Print "item: " & Key & " Value : " & headers(UCase(Trim(Key)))
Next Key
End Sub

List column numbers if columns contain string

I have some code that searches for the string "dog" in sheet1 of a workbook, the string can appear many times in the sheet, and it gives me a vector of the column numbers if the string was found in those columns, (dog can only appear once in each column). I have a button on the sheet which I assign this macro:
Option Explicit
Sub mymacro2()
Dim dog() As Integer
Dim coldog As Range
Set coldog = Sheets(1).UsedRange.Find("dog", , xlValues, xlWhole)
Dim i As Integer
i = 0
ReDim dog(0)
dog(i) = coldog.Column
Do
i = i + 1
ReDim Preserve dog(i)
Set coldog = Sheets(1).UsedRange.FindNext(coldog)
dog(i) = coldog.Column
Loop While dog(i) <> dog(0)
ReDim Preserve dog(i - 1)
Sheets(1).Cells(1, 1).Resize(1, UBound(Application.Transpose(dog))) = dog
'above line is displaying the vector on the sheet for testing purposes
Set coldog = Nothing
ReDim dog(0)
End Sub
The macro gives me the vector I want, i.e. it tells me in which columns I can find the string "dog".
Now, I want to modify the code or create a whole new code that does the same thing for each string in a list of strings found in column 1 on sheet2. All the vectors with the column numbers has to have the same name as the string it has column information about. Like I do manually in the code above.
The point is I have a list of about 130 animals which I need to do the same thing for. What is the best way of doing that in Excel VBA?
You have to store all the animals in another Array and call the given actions for each of them. Also your code has quite a few redundant parts. The sample code below should give you a good grasp to understand how to face this problem (as said via comment by Mehow, we are not here to write codes for you).
Dim totAnimals As Integer, i As Integer
totAnimals = 3
ReDim animals(totAnimals - 1) As String
animals(0) = "dog"
animals(1) = "cat"
animals(2) = "mouse"
'etc.
maxMatches = 100 'Maximum number of matches per animal. better don't make this value too big
ReDim matchCount(totAnimals - 1) 'This counter goes from 1 to maxMatches
ReDim matchCols(totAnimals - 1, maxMatches) As Integer
Dim targetRange As Range, tempRange As Range, tempRange2 As Range
Set targetRange = Sheets("sheet2").Columns(1)
For i = 0 To totAnimals - 1
Set tempRange = targetRange.Find(animals(i), , xlValues, xlWhole)
If (Not tempRange Is Nothing) Then
If (matchCount(i) + 1 <= maxMatches) Then
matchCount(i) = matchCount(i) + 1
matchCols(i, matchCount(i)) = tempRange.Column
Dim startAddress As String: startAddress = tempRange.Address
Set tempRange2 = tempRange
Do
Set tempRange2 = targetRange.FindNext(tempRange2)
If (Not tempRange2 Is Nothing) Then
If (tempRange2.Address = startAddress) Then Exit Do
Else
Exit Do
End If
If (matchCount(i) + 1 > maxMatches) Then Exit Do
matchCount(i) = matchCount(i) + 1
matchCols(i, matchCount(i)) = tempRange2.Column
Loop While (Not tempRange2 Is Nothing)
End If
End If
Next i

Need a better optimized code?

Need a much Optimized code.Well I Got a Project and I have Succefully made it work with the vba (Mostly helped by the stackoverflow programmers Thanks for that)
But Today I got a Feedback. Its deleting 2 more unique entries in the record But I dont know why its deleting Them.
The Algorithm I have applied
I have Used the COUNTIF function Which I found on google
="countif(A$1:A2,A3)=0" A3 is the active cell, Checks A2,A1 for dupes
It Throws False if there is a duplicate in The A column and True If it is a unique.What I have understood about Countif is that
It checks all the above columns values from that cell I mean let us take A4. SO it checks A2,A1,A3 for the duplicate. Similarly A10 checks for A1 to A9 and throws either TRue or False.Well It was working But I dont know what went wrong The code is not working for some entries.Its even showing False for the Unique entries sometimes.
And its taking more time to applye these formula as I have more amount of data. Im trying to make it cleaner and more Optimizing Way.People told me its not a c or some other Language to make it optimize but Im need of code that makes my code more optimized
I need code for these condtions can anyone help me as my countif failed.Im little helpless in doing so.
1)I have a column and I should check for duplicates in that column and delete that row if it is a duplicate
2) I have 35000 Old entries in the column and I have new entries 2000 everyweek these are appended. I need to check these 2000 entries from the total 37000 ( as we appened we get 35000+2000) and these delete operation need to be performed only on the newly appended 2000 entries but it should check the duplicates for entire column
Let me explain you clearly I have 2000 entries newly added,so Only these entries are to be checked for the duplicates from the 35000 entries and also from itself (2000 entries) and delete it if it is a duplicate and no duplicating operation should be performed on the 35000 entries old data.
I have found some codes but they are deleting even the duplicates of the 35000 entries. I have set the range but even though its not working.
Can anyone help me with the best code that takes less time?please thank you
Updating my question with the sample code I have
A B F G H I Y
PTY 39868.5 4 2 540 3 PTY39868.5425403
GTY 34446.1234 2 1 230 1 GTY34446.1234212301
PTY 3945.678 2 2 PTY3945.67822
GTY 34446.1234 2 1 230 1 GTY34446.1234212301
let us say these are old 35000 entries
Explaination to the above example.
The above are the 35000 entries. I have to check A,B,F,G,H,I columns for the dupes, if they are same I have to delete the row, I should not bother about the other columns c,d etc. so what I did is I have used one unused column Y and concatenated these 6 columns values into 1 at Y column using these
= A2 & B2 & F2 & G2 & H2 &I2 with the respective columns
Now checking the Y column for dupes and delete the entire row. as 2003 supports only for one column as far to my knowledge.
Notice that even the 35000 entries may have duplicates in it but I should not delete them. Example you can see the 2 and last row in my example code are dupes but I should not delete
as it is the old data.
A B F G H I Y
PTY 39868.5 4 2 540 3 PTY39868.5425403 'old
GTY 34446.1234 2 1 230 1 GTY34446.1234212301 'old
PTY 3945.678 2 2 PTY3945.67822 'old
GTY 34446.1234 2 1 230 1 GTY34446.1234212301 'old
PTY 3945.678 1 1 230 2 PTY3945.678112302 'new
PTY 39868.5 4 2 540 3 PTY39868.5425403 'new
PTY 3945.678 1 1 230 2 PTY3945.678112302 'new
Now note that New entry PTY (from last 2nd) is a duplicate of the original record(PTY at first) So I hava to delete it.And the last new entry is a duplicate of the new entry itself so I should delete it even that . SO in the above code I have to delete only the last 2 rows which are dupes of original record and also from it . But should not delete the GTY which is the dupe but which is in orginal record.
I think I gave a clear view now. Is concatenating them into one cell . Is it better way to approach? as conactenatin for 40000 entries taking just 2 seconds i think that doesnt matter but any more algorithms to these is much aprreciated
I heard counif treats 45.00 and 45.00000 as different is that right may be that was the problem with it? since I have decimal points in my data. I think I should do
= I2 & H2 & G2 & F2 & A2 & B2
which is better to concatenate? is this or the other i posted before?
BIG UPDATE:
It think the original questions threw me off - there may be a problem with the logic in the question. The following assumes you want to delete the cell, not entire row, for the duplicate entries.
If the 35000 old records do not include duplicates, then all you need to do is remove all duplicates from the entire column - so long as you start from row 1, you run no risk of deleting any of the 'old' rows since no duplicates exist in them.
Here is one way:
Sub UniqueList()
Application.ScreenUpdating = False
Dim vArray As Variant
Dim i As Long, j As Long, lastrow As Long
Dim dictionary As Object
Set dictionary = CreateObject("scripting.dictionary")
lastrow = Range("A" & Rows.Count).End(xlUp).Row
vArray = Range("A1:A" & lastrow).Value
On Error Resume Next
For i = 1 To UBound(vArray, 1)
For j = 1 To UBound(vArray, 2)
If Len(vArray(i, j)) <> 0 Then
dictionary(vArray(i, j)) = 1
End If
Next
Next
Columns("A:A").ClearContents
Range("A1").Resize(dictionary.Count).Value = _
Application.Transpose(dictionary.keys)
Application.ScreenUpdating = True
End Sub
If for some odd reason the 35000 old records DO include dupes and you only want to allow these 35000 records to do so, then you can use 2 dictionaries, but this would be an unusual case since you'd be treating the old records differently than new...
Sub RemoveNewDupes()
Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")
On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row
'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
oldDict.Add varray(i, 1), 1
Next
'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 1 To UBound(varray, 1)
If oldDict.exists(varray(i, 1)) = False Then
newDict.Add varray(i, 1), 1
End If
Next
'Delete and slap back on the unique list
Range("A35001", "A" & Rows.Count).ClearContents
Range("A35001").Resize(newDict.Count).Value = _
Application.Transpose(newDict.keys)
Application.ScreenUpdating = True
End Sub
Thanks to Reafidy for the advice and getting me to relook at this.
This is also a response to some of the comments and solutions made by other members so sorry if it does not straight away answer your question.
Firstly I believe that using excel in a database scenario that raw data and presentation data should be separated. This usually means a single worksheet with raw data and multiple other worksheets with presentation data. Then delete the raw data when necessary or archive.
When speed testing it is very difficult to get a level playing field in excel as there are many things that affect the results. Computer specs, available RAM etc.. Code must first be compiled before running any of the procedures. The test data is also important, when considering duplicates - how many duplicates vs how many rows. This sub loads some test data, altering the amount of rows vs the range of random numbers (duplicates) will give very different results for your code. I don't know what your data looks like so we are kind of working blind and your results may be very different.
'// This is still not very good test data, but should suffice for this situation.
Sub TestFill()
'// 300000 rows
For i = 1 To 300000
'// This populates a random number between 1 & 10000 - adjust to suit
Cells(i, "A").value = Int((100000 + 1) * Rnd + 1)
Next
End Sub
If we are talking about advanced filter vs an array & dictonary method then advanced filter will be quicker with a lower amount of rows but once you get above a certain amount of rows then the array method will be quicker. Then see what happens when you change the amount of duplicates.... :)
As a guideline or as a general rule using excels built in functions will be faster and I recommend always develop attempting to use these inbuilt functions, however there are often exceptions, like above when removing duplicates. :)
Deleting rows can be slow when looping if used incorrectly. If looping is used then it is important to keep synchronisation between code and the workbook out of the loop. This usually means read data to an array, loop through the data, then load the data from the array back to the presentation worksheet essentially deleting the unwanted data.
Sub RemoveDuplicatesA()
'// Copy raw data to presentation sheet
Range("A1", Cells(Rows.Count, "A").End(xlUp)).AdvancedFilter _
Action:=xlFilterCopy, CopyToRange:=Sheet2.Range("B1"), Unique:=True
End Sub
This will be the fastest method:
Sub RemoveDuplicatesB()
Dim vData As Variant, vArray As Variant
Dim lCnt As Long, lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
ReDim vArray(0 To UBound(vData, 1), 0)
lCnt = 0
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .Exists(vData(lRow, 1)) Then
vArray(lCnt, 0) = vData(lRow, 1): lCnt = lCnt + 1
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
'// Copy raw data to presentation sheet
Sheet2.Range("B1").Resize(lCnt).value = vArray
End Sub
Application transpose has a limitation of 65536 rows but as you are using 2003 you should be fine using it, therefore you can simplify the above code with:
Sub RemoveDuplicatesC()
Dim vData As Variant
Dim lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .exists(vData(lRow, 1)) Then
.Add vData(lRow, 1), Nothing
End If
Next lRow
'// Copy raw data to presentation sheet or replace raw data
Sheet2.Columns(2).ClearContents
Sheet2.Columns(2).Resize(.Count).value = Application.Transpose(.keys)
End With
End Sub
EDIT
Okay so #Issun has mentioned you want the entire row deleted. My suggestion was to improve your spreadsheet layout by having a raw data and presentation sheet which means you dont need to delete anything hence it would have been the fastest method. If you dont want to do that and would like to edit the raw data directly then try this:
Sub RemoveDuplicatesD()
Dim vData As Variant, vArray As Variant
Dim lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
ReDim vArray(1 To UBound(vData, 1), 0)
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .exists(vData(lRow, 1)) Then
varray(lRow, 0) = "x"
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
Application.ScreenUpdating = False
'// Modify the raw data
With ActiveSheet
.Columns(2).Insert
.Range("B1").Resize(lRow).value = vArray
.Columns(2).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
.Columns(2).Delete
End With
Application.ScreenUpdating = True
End Sub
Before starting again from scratch your whole code, here are a few things you can try:
Optimize your VBA
There are several tips on the web about optimizing vba. In particular, you can do:
'turn off some Excel functionality so your code runs faster
'these two are especially very efficient
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
'use these if you really need to
Application.DisplayStatusBar = False
Application.EnableEvents = False
'code goes here
'at the end, restore the default behavior
'calculate the formulas
Application.Calculate
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.DisplayStatusBar = True
Application.EnableEvents = True
See here for more information
Optimize your algorithm
Especially when your inserting your COUNTIF formula, you can try to fill in instead of inserting the formula in each row.
On the deleting row part, you should try the solution I gave you in your previous thread: Delete duplicate entries in a column in excel 2003 vba to filter first on the True values and then to delete the visible cells. It is probably the fastest way.
[EDIT] Seems like Doc Brown's answer would be probably the best way to handle this (hey, this is a dictionary solution that wasn't written by Issun :)). Anyway, the VBA optimization tips are still relevant because this is quite a slow language.
OK, here's the advancedfilter method. Don't know if it is faster than the dictionary method. It would be interesting to know though, so let me know after you try it. I also included the delete portion so you would have to stop that portion if you want to do a true comparison. Also, you can make this a function instead of a sub and put in your variables, however you want to change it.
Sub DeleteRepeats()
Dim d1 As Double
Dim r1 As Range, rKeepers As Range
Dim wks As Worksheet
d1 = Timer
Set wks = ActiveSheet
Application.EnableEvents = False
Application.ScreenUpdating = False
'Make sure all rows are visible
On Error Resume Next
wks.ShowAllData
wks.UsedRange.Rows.Hidden = False
wks.UsedRange.Columns.Hidden = False
On Error GoTo 0
'Get concerned range
Set r1 = wks.Range("A1:A35000")
'Filter
r1.AdvancedFilter Action:=xlFilterInPlace, Unique:=True
'Get range of cells not to be deleted
Set rKeepers = r1.SpecialCells(xlCellTypeVisible)
On Error Resume Next
wks.ShowAllData
On Error GoTo 0
rKeepers.EntireRow.Hidden = True
'Delete all undesirables
r1.SpecialCells(xlCellTypeVisible).EntireRow.Delete
'show all rows
On Error Resume Next
wks.UsedRange.Rows.Hidden = False
On Error GoTo 0
Application.EnableEvents = False
Application.ScreenUpdating = False
Debug.Print Timer() - d1
End Sub
OK, here's a take on Doc's and Issun's use of Dictionaries. Before I wasn't convinced but after looking at it and testing it and comparing to advanced filter, I am convinced, dictionaries are better for this application. I don't know why Excel isn't faster on this point since they should be using faster algorithms, it's not the hiding, unhiding of the rows since that happens very quickly. So if anyone knows, let me know. This procedure takes just over 1 second on my slow computer:
Sub FindDupesAndDelete()
Dim d1 As Double
Dim dict As Object
Dim sh As Worksheet
Dim v1 As Variant
' Dim s1() As String
Dim rDelete As Range
Dim bUnion As Boolean
d1 = Timer()
bUnion = False
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
v1 = Application.Transpose(sh.Range("A1", "A" _
& sh.Cells.SpecialCells(xlCellTypeLastCell).row))
' ReDim s1(1 To UBound(v1))
Dim row As Long, value As String ', newEntry As Boolean
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = v1(row)
If dict.Exists(value) Then
' newEntry = False
If bUnion Then
Set rDelete = Union(rDelete, sh.Range("A" & row))
Else
Set rDelete = sh.Range("A" & row)
bUnion = True
End If
Else
' newEntry = True
dict.Add value, 1
End If
' s1(row) = newEntry
Next
rDelete.EntireRow.Delete
' sh.Range("B1", "B" & UBound(v1)) = Application.Transpose(s1)
Debug.Print Timer() - d1
End Sub
Okay so now we have some more info here is a solution. It should execute almost instantly.
The code works by filling column y with your concatenate formula. It then adds all of column y to a dictionary and using the dictionary marks each row as a duplicate in column z. It then removes all the duplicates found after row 35000. Then finally it clears both column y and column z to remove the redundant data.
Sub RemoveDuplicates()
Dim vData As Variant, vArray As Variant
Dim lRow As Long
'// Get used range of column A (excluding header) and offset to get column y
With ActiveSheet.Range("A2", Cells(Rows.Count, "A").End(xlUp)).Offset(, 24)
'// Adds the concatenate formula to the sheet column (y)
.FormulaR1C1 = "=RC[-24]&RC[-23]&RC[-19]&RC[-18]&RC[-17]&RC[-16]"
'// Adds the formula results to an array
vData = .Resize(, 1).value
End With
'// Re dimension the array to the correct size
ReDim vArray(1 To UBound(vData, 1), 0)
'// Create a dictionary object using late binding
With CreateObject("Scripting.Dictionary")
'// Loop through each row in the array
For lRow = 1 To UBound(vData, 1)
'// Check if value exists in the array
If Not .exists(vData(lRow, 1)) Then
'// Value does not exist mark as non duplicate.
vArray(lRow, 0) = "x"
'// Add value to dictionary
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
'// Turn off screen updating to speed up code and prevent screen flicker
Application.ScreenUpdating = False
With ActiveSheet
'// Populate column z with the array
.Range("Z2").Resize(UBound(vArray, 1)) = vArray
'// Use error handling as speciallcells throws an error when none exist.
On Error Resume Next
'// Delete all blank cells in column z
.Range("Y35001", .Cells(Rows.Count, "Y").End(xlUp)).Offset(, 1).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
'// Remove error handling
On Error GoTo 0
'// Clear columns y and z
.Columns(25).Resize(, 2).ClearContents
End With
'// Turn screen updating back on.
Application.ScreenUpdating = True
End Sub
NOTE: you can change all references "activesheet" to your sheet codename if you want.
NOTE2: it assumes you have headers and has left row 1 alone.
I have used your columns and test data as best I can. Here is the test fill I used:
Sub TestFill()
For i = 1 To 37000
With Range("A" & i)
.value = Choose(Int(2 * Rnd + 1), "PTY", "GTY")
.Offset(, 1).value = Round((40000 * (Rnd + 1)), Choose(Int(4 * Rnd + 1), 1, 2, 3, 4))
.Offset(, 5).value = Int(4 * Rnd + 1)
.Offset(, 6).value = Int(2 * Rnd + 1)
.Offset(, 7).value = Choose(Int(2 * Rnd + 1), "230", "540")
.Offset(, 8).value = Int(3 * Rnd + 1)
End With
Next i
End Sub
Lets say you have your entries in column A, and you want the result of your formula in column B (but much faster). This VBA macro should do the trick:
Option Explicit
Sub FindDupes()
Dim dict As Object
Dim sh As Worksheet
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
Dim row As Long, value As String
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = sh.Range("A" & row).Text
If dict.Exists(value) Then
sh.Range("B" & row) = "False"
Else
sh.Range("B" & row) = "True"
dict.Add value, 1
End If
Next
End Sub
(Using a dictionary gives here almost linear running time, which should be a matter of seconds for 35.0000 rows, where your original formula had quadratic running time complexity).
Edit: due to your comment: you will have to fill the dictionary first by reading each entry at least once, that is something you cannot avoid easily. What you can avoid is to fill the rows of column B again when they are already filled:
Option Explicit
Sub FindDupes()
Dim dict As Object
Dim sh As Worksheet
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
Dim row As Long, value As String, newEntry As Boolean
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = sh.Range("A" & row).Text
If dict.Exists(value) Then
newEntry = False
Else
newEntry = True
dict.Add value, 1
End If
If Trim(sh.Range("B" & row)) = "" Then sh.Range("B" & row) = newEntry
Next
End Sub
But I suspect this won't be much faster than my first solution.
Now that you have updated that you want the entire rows deleted and that the first 35000 rows are allowed to have dupes, here is a function that will do that for you. I think I came up with a clever method and it's blazing fast, too:
Sub RemoveNewDupes()
Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")
On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row
'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
oldDict.Add varray(i, 1), 1
Next
'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 35000 + UBound(varray, 1) To 35001 Step -1
If oldDict.exists(varray(i - 35000, 1)) = True Or _
newDict.exists(varray(i - 35000, 1)) = True Then
Range("A" & i).EntireRow.Delete
Else
newDict.Add varray(i - 35000, 1), 1
End If
Next
Application.ScreenUpdating = True
'A status message at the end for finishing touch
MsgBox UBound(varray, 1) - newDict.Count & _
" duplicate row(s) found and deleted."
End Sub
How it works:
First I store the 35000 cells into a dictionary file. Then I take a variant array of every cell 35001 onward and loop through them backwards to see if it's in the 35k dictionary or not, or that we haven't come across the value yet in the loop. If it finds that it's a dupe, it deletes the row.
The cool (if I may say) way that it does the row deletion is that when you create the varray, for say A35001 - A37000, it stores them as (1, 1) (2, 1) (...). So if you set "i" to the Ubound of the array + 35000 and step back to 35001, you will loop through all the additions backwardsfrom A37000 to A35001. Then when you want to delete the row, "i" is perfectly set to the row number the value was found in, so you can delete it. And since it goes backwards, it does not screw up the loop!