What is the difference between .Value = "" and .ClearContents? - vba

If I run the following code
Sub Test_1()
Cells(1, 1).ClearContents
Cells(2, 1).Value = ""
End Sub
When I check Cells(1, 1) and Cells(2, 1) using formula ISBLANK() both results return TRUE. So I'm wondering:
What is the difference between Cells( , ).Value = "" and Cells( , ).ClearContents?
Are they essentially the same?
If I then run the following code to test the time difference between the methods:
Sub Test_2()
Dim i As Long, j As Long
Application.ScreenUpdating = False
For j = 1 To 10
T0 = Timer
Call Number_Generator
For i = 1 To 100000
If Cells(i, 1).Value / 3 = 1 Then
Cells(i, 2).ClearContents
'Cells(i, 2).Value = ""
End If
Next i
Cells(j, 5) = Round(Timer - T0, 2)
Next j
End Sub
Sub Number_Generator()
Dim k As Long
Application.ScreenUpdating = False
For k = 1 To 100000
Cells(k, 2) = WorksheetFunction.RandBetween(10, 15)
Next k
End Sub
I get the following output for runtime on my machine
.ClearContents .Value = ""
4.20 4.44
4.25 3.91
4.18 3.86
4.22 3.88
4.22 3.88
4.23 3.89
4.21 3.88
4.19 3.91
4.21 3.89
4.17 3.89
Based on these results, we see that the method .Value = "" is faster than .ClearContents on average. Is this true in general? Why so?

From what I have found, if your goal is to simple have an empty cell and you do not want to change anything about the formatting, you should use Value = vbNullString as that is the most efficient.
The 'ClearContents' is checking and changing other properties in the cell such as formatting and the formula (which is technically a separate property than Value). When using Value = "" you are only changing one property and so it is faster. Using vbNullString prompts the compiler that you are using an empty string versus the other way with double quotes, it is expecting a general string. Because vbNullString prompts it to expect an empty string, it is able to skip some steps and you get a performance gain.

when apply both in single cell I don't think there is no any sensible deferent but when you apply it in range Range("A1:Z1000").ClearContents is easier and faster than use cell(i,j).value="" in nested loop or one for loop

I did find one difference that might be of note. ClearContents returns a value, which seems to be of type Boolean in my limited testing (docs mention Variant type).
Option Explicit
Public Sub ClearA1()
Dim a As Range
Dim b As Boolean
Set a = Range("A1")
Debug.Print b 'It's False, the default value
b = a.ClearContents
Debug.Print b 'Set to True, as the action was completed
End Sub
I'd guess some of the overhead is from the fact ClearContents does return a value, where you are just setting a value property in the alternate case.
Ultimately, in terms of outcomes of setting the value both methods appear functionally the same.

Using clearcontent has different behavior on cells with formulas.
When you have just one value, the behavior is the same, but differs when you have formulas in it.

You can notice a big difference in Excel spreadsheet.
Assume that B1 is filled by equation returns blank
A1 = 5
B1 = "=if(A1=5,"","x")
In this case, you have to equations that you can write in C1
(1) C1 = <=isblank(B1)>
(2) C1 =
Solution 1 will return false, as the cell is filled with equation
Solution 2 will return True

I came across this topic a little late, but i would like to share what i have noticed with abit of code of mine, i don't think i can fully explain it but ill do my best.
For Each Cell In ws.Range("D12:D161") 'Order feed colom
Select Case Cell.Value
Case 0
Cell.Interior.Color = Cell.Offset(0, -1).Interior.Color
Case 1
Cell.Interior.Color = 10198015
Case 2
Cell.Interior.Color = 11854022
End Select
Cell.value = ""
Next Cell
This is a bit of code that i have used in order to clear some fields and give some color to the range D12:D161. Nothing special here, If the value is 0 then copy your neighbor if 1 then red if 2 then green. And clear the cell afterwards
But in order for this code to run it took roughly 5-6 seconds for me, which i thought was a fair bit for a small piece of code. Plus i used this on a Private Sub Workbook_SheetActivate(ByVal Sh As Object) which made it for the user unpleasant to wait 5-6 seconds for a screen transition. So i put a loop in to check for empty's in a row and then skip out.
It is noteworthy that this is part of a script, and yes i have my screenupdating off, calculations off, events off during this piece of code.
For Each Cell In ws.Range("D12:D161") 'Order feed colom
Select Case Cell.Value
Case 0
Cell.Interior.Color = Cell.Offset(0, -1).Interior.Color
Erow = Erow +1
Case 1
Cell.Interior.Color = 10198015
Erow = 0
Case 2
Cell.Interior.Color = 11854022
Erow = 0
End Select
Cell.value = ""
if Erow = 10 then exit for
Next Cell
Now instead of having to do 149 rows i did roughly 58 rows, depending on my data in the column. But still it it took 3-4 seconds in order to fully run. During Debug mode i noticed no lag at all. If i manually ran the code when already on the sheet, there was 0 delay. Almost instant, after testing abit more but when using a Private Sub Workbook_SheetActivate(ByVal Sh As Object) with this code it still ran 3-4 seconds.
After testing individual rows of code, i came across the .Value = "". Removing this line from the code made it run 0,5 seconds.... So now i knew where my problem was, using multiple ways of emptying my cells. I noticed that .clearcontents was the fastest for me. Apparently if you move from Sheet to sheet EVENTHOUGH ws. has been declared as my active sheet, it just ook alot of time
For Each Cell In ws.Range("D12:D161") 'Order feed colom
Select Case Cell.Value
Case 0
Cell.Interior.Color = Cell.Offset(0, -1).Interior.Color
Erow = Erow +1
Case 1
Cell.Interior.Color = 10198015
Erow = 0
Case 2
Cell.Interior.Color = 11854022
Erow = 0
End Select
Cell.ClearContents 'DONT USE .Value = "", makes the code run slow
if Erow = 10 then exit for
Next Cell
In conclusiong.
Using the above code with
.value = "" took 4-5 seconds
.value = VbNullstring took 3-4 seconds
.ClearContents took only 0,5 seconds. But only during a worksheet transition
Private Sub Workbook_SheetActivate(ByVal Sh As Object)
If anybody is able to explain why this is or what exactly is going on, i would appreciate it.

Related

Highlight cells based on row number on VBA

I want to change the cell style based on the row number. I am still new on VBA.
Here is my code:
Sub format()
FinalRow = Cells(Rows.Count, 1).End(xlUp).Row
For i = 1 To FinalRow
If Rows.Count = 2 * i + 1 Then
Selection.Style = "Good"
ElseIf Rows.Count = 2 * i Then
Selection.Style = "Bad"
End If
ActiveCell.Offset(1, 0).Select
Next i
End Sub
The loop moves to the next cell but does not highlight if a criteria is met. May you please help me.
I suggest the following:
Option Explicit
Public Sub FormatEvenOddRowNumbers()
Dim FinalRow As Long
FinalRow = Cells(Rows.Count, 1).End(xlUp).Row
Dim i As Long
For i = 1 To FinalRow
If i Mod 2 = 0 Then 'even row number
Cells(i, 1).Style = "Good"
Else 'odd row number
Cells(i, 1).Style = "Bad"
End If
Next i
End Sub
To test if a row number is "even" you can use If i Mod 2 = 0 Then also you don't need to test for "odd" because if it is not "even" it must be "odd" so you can just use Else without any criteria.
Try to avoid using .Select it makes your code slow. See How to avoid using Select in Excel VBA. Instead access the cells directly like Cells(row, column).
First, I think you missused Rows.Count.
Rows.Count returns the total number of rows of your sheet. So now your criteria is only highlighting the two rows that are in the middle of the sheet.
If I assume correctly that you want to put "Good" the Rows that are even and "bad" the ones that are odds. then you should change your code to something like this:
Sub format()
FinalRow = Cells(Rows.Count, 1).End(xlUp).Row
For i = 1 To FinalRow
If i/2 = int(i/2) Then
Selection.Style = "Good"
ElseIf (i+1)/2 = int((i+1)/2) Then
Selection.Style = "Bad"
End If
ActiveCell.Offset(1, 0).Select
Next i
End Sub

Deleting rows with values based on a column

I have a monthly base with almost 373,000 lines. Of these, part has a low value or is blank. I'd like to erase this lines.
I have part of this code to delete those that have zero. How to create a code that joins the empty row conditions (column D) in a more agile way.
Thanks
Sub DelRowsZero()
Dim i As Long
For i = Cells(Rows.Count, "D").End(xlUp).Row To 2 Step -1
If Cells(i, "D") = 0 Then Rows(i).Delete
Next i
End Sub
How about:
Sub ZeroKiller()
Dim N As Long, ToBeKilled As Range
Dim i As Long
N = Cells(Rows.Count, "A").End(xlUp).Row
For i = 1 To N
If Cells(i, "D").Value = 0 Or Cells(i, "D").Value = "" Then
If ToBeKilled Is Nothing Then
Set ToBeKilled = Cells(i, "D")
Else
Set ToBeKilled = Union(ToBeKilled, Cells(i, "D"))
End If
End If
Next i
If Not ToBeKilled Is Nothing Then
ToBeKilled.EntireRow.Delete
End If
End Sub
This assumes that A is the longest column. If this is not always the case, use:
N = Range("A1").CurrentRegion.Rows.Count
I am concerned about the 375K lines, who knows how long this will take to run.
Sub Button1_Click()
Dim i As Long
For i = Cells(Rows.Count, "D").End(xlUp).Row To 2 Step -1
If Cells(i, "D") = 0 Or Cells(i, "D") = "" Then
Rows(i).Delete
End If
Next i
End Sub
I'm curious to know if this works for others, it just uses the "replace" 0 values to blanks, then uses specialcells to delete the blank rows. My test of 38K rows takes 3 seconds.
Sub FindLoop()
Dim startTime As Single
startTime = Timer
'--------------------------
Columns("D:D").Replace What:="0", Replacement:="", LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=True, SearchFormat:=False, _
ReplaceFormat:=False
Columns("D:D").SpecialCells(xlCellTypeBlanks).EntireRow.Delete
'---------------------------------
Debug.Print Timer - startTime
End Sub
There's apparently an argument to be made, that deleting rows as you find them would be faster than deleting them all at once.
So I ran the below code with 36000 rows of =RANDBETWEEN(0, 10) in columns A and B (and then copy+paste special/values), and it completed thrice in 32 seconds and dusts.
Uncommenting the currentValue assignment and replacing the array subscript accesses with currentValue comparisons adds 2.5 seconds overhead; uncommenting the IsError check adds another 3.5 seconds overhead - but then the code won't blow up if the checked cells have the slightest chance of containing some #REF! or #VALUE! error.
Every time I ran it, ~4000 rows ended up being deleted.
Note:
No implicit ActiveSheet references. The code works against Sheet2, which is the code name for Worksheets("Sheet2") - a globally scoped Worksheet object variable that you get for free for any worksheet that exists at compile-time. If the sheet you're running this against exists at compile-time, use its code name (that's the (Name) property in the Properties toolwindow / F4).
Range is hard-coded. You already know how to get the last row with data, so I didn't bother with that. You'll want to dump your working range in a variant array nonetheless.
The commented-out code can be ignored/deleted if there's no way any of the cells involved have any chance of ever containing a worksheet error value.
Public Sub SpeedyConditionalDelete()
Dim startTime As Single
startTime = Timer
'1. dump the contents into a 2D variant array
Dim contents As Variant
contents = Sheet2.Range("A1:B36000").Value2
'2. declare your to-be-deleted range
Dim target As Range
'3. iterate the array
Dim i As Long
For i = LBound(contents, 1) To UBound(contents, 1)
'4. get the interesting current value
'Dim currentValue As Variant
'currentValue = contents(i, 1)
'5. validate that the value is usable
'If Not IsError(currentValue) Then
'6. determine if that row is up for deletion
If contents(i, 1) = 0 Or contents(i, 1) = vbNullString Then
'7. append to target range
If target Is Nothing Then
Set target = Sheet2.Cells(i, 1)
Else
Set target = Union(target, Sheet2.Cells(i, 1))
End If
End If
'End If
Next
'8. delete the target
If Not target Is Nothing Then target.EntireRow.Delete
'9. output timer
Debug.Print Timer - startTime
End Sub
Of course 375K rows will run much longer than 32-38 seconds, but I can't think of a faster solution.

ActiveCell.EntireRow.Delete

I have this VBA code to delete rows in excel
Sub deleterows()
i = 1
Do Until i = 150000
If ActiveCell.Value = False Then
ActiveCell.EntireRow.Delete
End If
ActiveCell.Offset(1, 0).Activate
i = i + 1
Loop
End Sub
However this code is not deleting all the rows that contain the "False" value, I've been trying to change it to activecell.value="" and activecell.value=vbnullstring but still it does not deletes all blank rows
You should move from the last row to the top, if you're deleting rows.
Also, it's best to avoid using ActiveCell.
Sub deleterows2()
i = 1
For i = 150000 To 1 Step -1
If Cells(i, 1).Value = False Or Cells(i, 1).Value = "False" Then
Rows(i).EntireRow.Delete
End If
Next i
End Sub
Tweak as needed. I'm assuming your column A has the cells you're checking for. If it's another column, just use that column's index number in the Cells(i,1). So if you need to check column D, use Cells(i,4)
You can fix it with a small change as follows:
If ActiveCell.Value = False Then
ActiveCell.EntireRow.Delete
Else
ActiveCell.Offset(1, 0).Activate
End If
Basically, you should only activate the next sell when the value is != False, otherwise it will skip rows.
Here's a handful of nice things baked in to what it looks like what you want to accomplish.
I'm assuming that 150000 is basically just a big number so that you are confident that all used rows are being considered.
Sub DeleteRows()
i = 0
Do While ActiveCell.Offset(i, 0).Row <= ActiveCell.End(xlDown).Row
'This only considers used rows - much better than considering 15,000 rows even if you're only using 100
If ActiveCell.Offset(i, 0).Value <> 1 Then
'If you directly have a boolean value (i.e. 'True', 'False', or '0','1', you do not need to compare to another value. If your cells contain text, compare with the string in quotes (i.e. ...= "False")
ActiveCell.Offset(i, 0).Delete
Else: i = i + 1
End If
'Don't have to activate the next cell because we're referencing off of a fixed cell
Loop
End Sub

Runtime error = '1004'

I'm trying to write to a program that will indicate an alphabet grade to each student in accord with their marks. Basically, the indicated grade will be shown in column C. From the data table, John will receive a grade "A" as his mark is higher than or equal to 85. Plus, the cell of his grade will be filled green and center aligned. However, for students whose mark is lower than 35, the students will receive a grade "F" in column C. Moreover, the whole row(Column A,B,C) of that student will be filled red. When I try to run the program, I get a
run time error = '1004'.
Does anyone know how to solve the problem?
**Also is my method of counting the row and column with data right?
Sub Green()
Dim Mark As Integer
Dim c As Integer
Dim r As Integer
Dim i As Integer
Dim j As Integer
r = Cells(Rows.Count, 1).End(xlUp).row
c = (Selection.End(xlToRight).row) / r
For i = 0 To r
For j = 0 To c
Mark = Range(i, "B").Value
If Mark >= 85 And Mark <= 100 Then
Range(i, "C").Value = "A"
Range(i, "C").Interior.Color = RGB(0, 255, 0)
Range(i, "C").HorizontalAlignment = xlCenter
ElseIf Mark >= 0 And Mark <= 35 Then
Range(i, "C").Value = "F"
Range(i, j).Interior.Color = RGB(255, 0, 0)
Range(i, "C").HorizontalAlignment = xlCenter
Else
End If
Next j
Next i
MsgBox ("Rows = " & r)
MsgBox ("Columns =" & c)
End Sub
My recommendation is that you handle conditional formatting to the application's built-in features. If you disagree, feel free to modify this code.
Function LastRow(ByVal ws As Worksheet, Optional ByVal col As Variant = 1) As Long
With ws
LastRow = .Cells(.Rows.Count, col).End(xlUp).row
End With
End Function
Sub assignLetterGrade()
Dim ws As Worksheet, i As Long, ltr As String
Set ws = ThisWorkbook.Worksheets(1)
For i = 1 To LastRow(ws, "B")
Select Case ws.Cells(i, "B")
Case 85 To 100
ltr = "A"
Case 75 To 84
ltr = "B"
Case 65 To 74
ltr = "C"
Case 55 To 64
ltr = "D"
Case 0 To 54
ltr = "F"
Case Else
MsgBox "Unknown Grade! Aborting!"
Exit Sub
End Select
ws.Cells(i, "C").Value = ltr
Next
End Sub
For conditional Formatting, you could try these steps:
Select the entire column of which you plan to format conditionally
(in my example, it's column C.
Let's create a new rule.
Now enter your Rules, one at a time. In this example, I am highlighting all cells containing "A" in green.
The question remains. Why not format within VBA?
The first thing you should understand about VBA is that it's very inefficient. While minimal code can take fractions of a second, once your loops get larger and larger the more you will notice that it takes longer to complete its task.
Also, you run into issues where your code is not pristine and may have minor errors. Minor errors can turn into major errors when you start building your project (example: using ActiveSheet comes to mind. It might work for your small project, but next thing you know you are applying formats to the wrong sheet, referencing the wrong cell's value, so on and so forth).
If you can avoid VBA, then do it. Take advantage of Microsoft's excellent conditional formatting UI. It's very efficient and much easier to manage.
Also, again this entire VBA project could have been done without using VBA at all. VBA is slow. You could have just as easily used formulas in column C to provide you with the letter grade.
A less-advanced version of a formula is to use nested If Statements.
=IF(B1>=85,"A", IF(B1>=75,"B", IF(B1>=65,"C", IF(B1>=55,"D","F"))))

Need a better optimized code?

Need a much Optimized code.Well I Got a Project and I have Succefully made it work with the vba (Mostly helped by the stackoverflow programmers Thanks for that)
But Today I got a Feedback. Its deleting 2 more unique entries in the record But I dont know why its deleting Them.
The Algorithm I have applied
I have Used the COUNTIF function Which I found on google
="countif(A$1:A2,A3)=0" A3 is the active cell, Checks A2,A1 for dupes
It Throws False if there is a duplicate in The A column and True If it is a unique.What I have understood about Countif is that
It checks all the above columns values from that cell I mean let us take A4. SO it checks A2,A1,A3 for the duplicate. Similarly A10 checks for A1 to A9 and throws either TRue or False.Well It was working But I dont know what went wrong The code is not working for some entries.Its even showing False for the Unique entries sometimes.
And its taking more time to applye these formula as I have more amount of data. Im trying to make it cleaner and more Optimizing Way.People told me its not a c or some other Language to make it optimize but Im need of code that makes my code more optimized
I need code for these condtions can anyone help me as my countif failed.Im little helpless in doing so.
1)I have a column and I should check for duplicates in that column and delete that row if it is a duplicate
2) I have 35000 Old entries in the column and I have new entries 2000 everyweek these are appended. I need to check these 2000 entries from the total 37000 ( as we appened we get 35000+2000) and these delete operation need to be performed only on the newly appended 2000 entries but it should check the duplicates for entire column
Let me explain you clearly I have 2000 entries newly added,so Only these entries are to be checked for the duplicates from the 35000 entries and also from itself (2000 entries) and delete it if it is a duplicate and no duplicating operation should be performed on the 35000 entries old data.
I have found some codes but they are deleting even the duplicates of the 35000 entries. I have set the range but even though its not working.
Can anyone help me with the best code that takes less time?please thank you
Updating my question with the sample code I have
A B F G H I Y
PTY 39868.5 4 2 540 3 PTY39868.5425403
GTY 34446.1234 2 1 230 1 GTY34446.1234212301
PTY 3945.678 2 2 PTY3945.67822
GTY 34446.1234 2 1 230 1 GTY34446.1234212301
let us say these are old 35000 entries
Explaination to the above example.
The above are the 35000 entries. I have to check A,B,F,G,H,I columns for the dupes, if they are same I have to delete the row, I should not bother about the other columns c,d etc. so what I did is I have used one unused column Y and concatenated these 6 columns values into 1 at Y column using these
= A2 & B2 & F2 & G2 & H2 &I2 with the respective columns
Now checking the Y column for dupes and delete the entire row. as 2003 supports only for one column as far to my knowledge.
Notice that even the 35000 entries may have duplicates in it but I should not delete them. Example you can see the 2 and last row in my example code are dupes but I should not delete
as it is the old data.
A B F G H I Y
PTY 39868.5 4 2 540 3 PTY39868.5425403 'old
GTY 34446.1234 2 1 230 1 GTY34446.1234212301 'old
PTY 3945.678 2 2 PTY3945.67822 'old
GTY 34446.1234 2 1 230 1 GTY34446.1234212301 'old
PTY 3945.678 1 1 230 2 PTY3945.678112302 'new
PTY 39868.5 4 2 540 3 PTY39868.5425403 'new
PTY 3945.678 1 1 230 2 PTY3945.678112302 'new
Now note that New entry PTY (from last 2nd) is a duplicate of the original record(PTY at first) So I hava to delete it.And the last new entry is a duplicate of the new entry itself so I should delete it even that . SO in the above code I have to delete only the last 2 rows which are dupes of original record and also from it . But should not delete the GTY which is the dupe but which is in orginal record.
I think I gave a clear view now. Is concatenating them into one cell . Is it better way to approach? as conactenatin for 40000 entries taking just 2 seconds i think that doesnt matter but any more algorithms to these is much aprreciated
I heard counif treats 45.00 and 45.00000 as different is that right may be that was the problem with it? since I have decimal points in my data. I think I should do
= I2 & H2 & G2 & F2 & A2 & B2
which is better to concatenate? is this or the other i posted before?
BIG UPDATE:
It think the original questions threw me off - there may be a problem with the logic in the question. The following assumes you want to delete the cell, not entire row, for the duplicate entries.
If the 35000 old records do not include duplicates, then all you need to do is remove all duplicates from the entire column - so long as you start from row 1, you run no risk of deleting any of the 'old' rows since no duplicates exist in them.
Here is one way:
Sub UniqueList()
Application.ScreenUpdating = False
Dim vArray As Variant
Dim i As Long, j As Long, lastrow As Long
Dim dictionary As Object
Set dictionary = CreateObject("scripting.dictionary")
lastrow = Range("A" & Rows.Count).End(xlUp).Row
vArray = Range("A1:A" & lastrow).Value
On Error Resume Next
For i = 1 To UBound(vArray, 1)
For j = 1 To UBound(vArray, 2)
If Len(vArray(i, j)) <> 0 Then
dictionary(vArray(i, j)) = 1
End If
Next
Next
Columns("A:A").ClearContents
Range("A1").Resize(dictionary.Count).Value = _
Application.Transpose(dictionary.keys)
Application.ScreenUpdating = True
End Sub
If for some odd reason the 35000 old records DO include dupes and you only want to allow these 35000 records to do so, then you can use 2 dictionaries, but this would be an unusual case since you'd be treating the old records differently than new...
Sub RemoveNewDupes()
Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")
On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row
'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
oldDict.Add varray(i, 1), 1
Next
'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 1 To UBound(varray, 1)
If oldDict.exists(varray(i, 1)) = False Then
newDict.Add varray(i, 1), 1
End If
Next
'Delete and slap back on the unique list
Range("A35001", "A" & Rows.Count).ClearContents
Range("A35001").Resize(newDict.Count).Value = _
Application.Transpose(newDict.keys)
Application.ScreenUpdating = True
End Sub
Thanks to Reafidy for the advice and getting me to relook at this.
This is also a response to some of the comments and solutions made by other members so sorry if it does not straight away answer your question.
Firstly I believe that using excel in a database scenario that raw data and presentation data should be separated. This usually means a single worksheet with raw data and multiple other worksheets with presentation data. Then delete the raw data when necessary or archive.
When speed testing it is very difficult to get a level playing field in excel as there are many things that affect the results. Computer specs, available RAM etc.. Code must first be compiled before running any of the procedures. The test data is also important, when considering duplicates - how many duplicates vs how many rows. This sub loads some test data, altering the amount of rows vs the range of random numbers (duplicates) will give very different results for your code. I don't know what your data looks like so we are kind of working blind and your results may be very different.
'// This is still not very good test data, but should suffice for this situation.
Sub TestFill()
'// 300000 rows
For i = 1 To 300000
'// This populates a random number between 1 & 10000 - adjust to suit
Cells(i, "A").value = Int((100000 + 1) * Rnd + 1)
Next
End Sub
If we are talking about advanced filter vs an array & dictonary method then advanced filter will be quicker with a lower amount of rows but once you get above a certain amount of rows then the array method will be quicker. Then see what happens when you change the amount of duplicates.... :)
As a guideline or as a general rule using excels built in functions will be faster and I recommend always develop attempting to use these inbuilt functions, however there are often exceptions, like above when removing duplicates. :)
Deleting rows can be slow when looping if used incorrectly. If looping is used then it is important to keep synchronisation between code and the workbook out of the loop. This usually means read data to an array, loop through the data, then load the data from the array back to the presentation worksheet essentially deleting the unwanted data.
Sub RemoveDuplicatesA()
'// Copy raw data to presentation sheet
Range("A1", Cells(Rows.Count, "A").End(xlUp)).AdvancedFilter _
Action:=xlFilterCopy, CopyToRange:=Sheet2.Range("B1"), Unique:=True
End Sub
This will be the fastest method:
Sub RemoveDuplicatesB()
Dim vData As Variant, vArray As Variant
Dim lCnt As Long, lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
ReDim vArray(0 To UBound(vData, 1), 0)
lCnt = 0
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .Exists(vData(lRow, 1)) Then
vArray(lCnt, 0) = vData(lRow, 1): lCnt = lCnt + 1
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
'// Copy raw data to presentation sheet
Sheet2.Range("B1").Resize(lCnt).value = vArray
End Sub
Application transpose has a limitation of 65536 rows but as you are using 2003 you should be fine using it, therefore you can simplify the above code with:
Sub RemoveDuplicatesC()
Dim vData As Variant
Dim lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .exists(vData(lRow, 1)) Then
.Add vData(lRow, 1), Nothing
End If
Next lRow
'// Copy raw data to presentation sheet or replace raw data
Sheet2.Columns(2).ClearContents
Sheet2.Columns(2).Resize(.Count).value = Application.Transpose(.keys)
End With
End Sub
EDIT
Okay so #Issun has mentioned you want the entire row deleted. My suggestion was to improve your spreadsheet layout by having a raw data and presentation sheet which means you dont need to delete anything hence it would have been the fastest method. If you dont want to do that and would like to edit the raw data directly then try this:
Sub RemoveDuplicatesD()
Dim vData As Variant, vArray As Variant
Dim lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
ReDim vArray(1 To UBound(vData, 1), 0)
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .exists(vData(lRow, 1)) Then
varray(lRow, 0) = "x"
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
Application.ScreenUpdating = False
'// Modify the raw data
With ActiveSheet
.Columns(2).Insert
.Range("B1").Resize(lRow).value = vArray
.Columns(2).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
.Columns(2).Delete
End With
Application.ScreenUpdating = True
End Sub
Before starting again from scratch your whole code, here are a few things you can try:
Optimize your VBA
There are several tips on the web about optimizing vba. In particular, you can do:
'turn off some Excel functionality so your code runs faster
'these two are especially very efficient
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
'use these if you really need to
Application.DisplayStatusBar = False
Application.EnableEvents = False
'code goes here
'at the end, restore the default behavior
'calculate the formulas
Application.Calculate
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.DisplayStatusBar = True
Application.EnableEvents = True
See here for more information
Optimize your algorithm
Especially when your inserting your COUNTIF formula, you can try to fill in instead of inserting the formula in each row.
On the deleting row part, you should try the solution I gave you in your previous thread: Delete duplicate entries in a column in excel 2003 vba to filter first on the True values and then to delete the visible cells. It is probably the fastest way.
[EDIT] Seems like Doc Brown's answer would be probably the best way to handle this (hey, this is a dictionary solution that wasn't written by Issun :)). Anyway, the VBA optimization tips are still relevant because this is quite a slow language.
OK, here's the advancedfilter method. Don't know if it is faster than the dictionary method. It would be interesting to know though, so let me know after you try it. I also included the delete portion so you would have to stop that portion if you want to do a true comparison. Also, you can make this a function instead of a sub and put in your variables, however you want to change it.
Sub DeleteRepeats()
Dim d1 As Double
Dim r1 As Range, rKeepers As Range
Dim wks As Worksheet
d1 = Timer
Set wks = ActiveSheet
Application.EnableEvents = False
Application.ScreenUpdating = False
'Make sure all rows are visible
On Error Resume Next
wks.ShowAllData
wks.UsedRange.Rows.Hidden = False
wks.UsedRange.Columns.Hidden = False
On Error GoTo 0
'Get concerned range
Set r1 = wks.Range("A1:A35000")
'Filter
r1.AdvancedFilter Action:=xlFilterInPlace, Unique:=True
'Get range of cells not to be deleted
Set rKeepers = r1.SpecialCells(xlCellTypeVisible)
On Error Resume Next
wks.ShowAllData
On Error GoTo 0
rKeepers.EntireRow.Hidden = True
'Delete all undesirables
r1.SpecialCells(xlCellTypeVisible).EntireRow.Delete
'show all rows
On Error Resume Next
wks.UsedRange.Rows.Hidden = False
On Error GoTo 0
Application.EnableEvents = False
Application.ScreenUpdating = False
Debug.Print Timer() - d1
End Sub
OK, here's a take on Doc's and Issun's use of Dictionaries. Before I wasn't convinced but after looking at it and testing it and comparing to advanced filter, I am convinced, dictionaries are better for this application. I don't know why Excel isn't faster on this point since they should be using faster algorithms, it's not the hiding, unhiding of the rows since that happens very quickly. So if anyone knows, let me know. This procedure takes just over 1 second on my slow computer:
Sub FindDupesAndDelete()
Dim d1 As Double
Dim dict As Object
Dim sh As Worksheet
Dim v1 As Variant
' Dim s1() As String
Dim rDelete As Range
Dim bUnion As Boolean
d1 = Timer()
bUnion = False
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
v1 = Application.Transpose(sh.Range("A1", "A" _
& sh.Cells.SpecialCells(xlCellTypeLastCell).row))
' ReDim s1(1 To UBound(v1))
Dim row As Long, value As String ', newEntry As Boolean
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = v1(row)
If dict.Exists(value) Then
' newEntry = False
If bUnion Then
Set rDelete = Union(rDelete, sh.Range("A" & row))
Else
Set rDelete = sh.Range("A" & row)
bUnion = True
End If
Else
' newEntry = True
dict.Add value, 1
End If
' s1(row) = newEntry
Next
rDelete.EntireRow.Delete
' sh.Range("B1", "B" & UBound(v1)) = Application.Transpose(s1)
Debug.Print Timer() - d1
End Sub
Okay so now we have some more info here is a solution. It should execute almost instantly.
The code works by filling column y with your concatenate formula. It then adds all of column y to a dictionary and using the dictionary marks each row as a duplicate in column z. It then removes all the duplicates found after row 35000. Then finally it clears both column y and column z to remove the redundant data.
Sub RemoveDuplicates()
Dim vData As Variant, vArray As Variant
Dim lRow As Long
'// Get used range of column A (excluding header) and offset to get column y
With ActiveSheet.Range("A2", Cells(Rows.Count, "A").End(xlUp)).Offset(, 24)
'// Adds the concatenate formula to the sheet column (y)
.FormulaR1C1 = "=RC[-24]&RC[-23]&RC[-19]&RC[-18]&RC[-17]&RC[-16]"
'// Adds the formula results to an array
vData = .Resize(, 1).value
End With
'// Re dimension the array to the correct size
ReDim vArray(1 To UBound(vData, 1), 0)
'// Create a dictionary object using late binding
With CreateObject("Scripting.Dictionary")
'// Loop through each row in the array
For lRow = 1 To UBound(vData, 1)
'// Check if value exists in the array
If Not .exists(vData(lRow, 1)) Then
'// Value does not exist mark as non duplicate.
vArray(lRow, 0) = "x"
'// Add value to dictionary
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
'// Turn off screen updating to speed up code and prevent screen flicker
Application.ScreenUpdating = False
With ActiveSheet
'// Populate column z with the array
.Range("Z2").Resize(UBound(vArray, 1)) = vArray
'// Use error handling as speciallcells throws an error when none exist.
On Error Resume Next
'// Delete all blank cells in column z
.Range("Y35001", .Cells(Rows.Count, "Y").End(xlUp)).Offset(, 1).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
'// Remove error handling
On Error GoTo 0
'// Clear columns y and z
.Columns(25).Resize(, 2).ClearContents
End With
'// Turn screen updating back on.
Application.ScreenUpdating = True
End Sub
NOTE: you can change all references "activesheet" to your sheet codename if you want.
NOTE2: it assumes you have headers and has left row 1 alone.
I have used your columns and test data as best I can. Here is the test fill I used:
Sub TestFill()
For i = 1 To 37000
With Range("A" & i)
.value = Choose(Int(2 * Rnd + 1), "PTY", "GTY")
.Offset(, 1).value = Round((40000 * (Rnd + 1)), Choose(Int(4 * Rnd + 1), 1, 2, 3, 4))
.Offset(, 5).value = Int(4 * Rnd + 1)
.Offset(, 6).value = Int(2 * Rnd + 1)
.Offset(, 7).value = Choose(Int(2 * Rnd + 1), "230", "540")
.Offset(, 8).value = Int(3 * Rnd + 1)
End With
Next i
End Sub
Lets say you have your entries in column A, and you want the result of your formula in column B (but much faster). This VBA macro should do the trick:
Option Explicit
Sub FindDupes()
Dim dict As Object
Dim sh As Worksheet
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
Dim row As Long, value As String
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = sh.Range("A" & row).Text
If dict.Exists(value) Then
sh.Range("B" & row) = "False"
Else
sh.Range("B" & row) = "True"
dict.Add value, 1
End If
Next
End Sub
(Using a dictionary gives here almost linear running time, which should be a matter of seconds for 35.0000 rows, where your original formula had quadratic running time complexity).
Edit: due to your comment: you will have to fill the dictionary first by reading each entry at least once, that is something you cannot avoid easily. What you can avoid is to fill the rows of column B again when they are already filled:
Option Explicit
Sub FindDupes()
Dim dict As Object
Dim sh As Worksheet
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
Dim row As Long, value As String, newEntry As Boolean
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = sh.Range("A" & row).Text
If dict.Exists(value) Then
newEntry = False
Else
newEntry = True
dict.Add value, 1
End If
If Trim(sh.Range("B" & row)) = "" Then sh.Range("B" & row) = newEntry
Next
End Sub
But I suspect this won't be much faster than my first solution.
Now that you have updated that you want the entire rows deleted and that the first 35000 rows are allowed to have dupes, here is a function that will do that for you. I think I came up with a clever method and it's blazing fast, too:
Sub RemoveNewDupes()
Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")
On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row
'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
oldDict.Add varray(i, 1), 1
Next
'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 35000 + UBound(varray, 1) To 35001 Step -1
If oldDict.exists(varray(i - 35000, 1)) = True Or _
newDict.exists(varray(i - 35000, 1)) = True Then
Range("A" & i).EntireRow.Delete
Else
newDict.Add varray(i - 35000, 1), 1
End If
Next
Application.ScreenUpdating = True
'A status message at the end for finishing touch
MsgBox UBound(varray, 1) - newDict.Count & _
" duplicate row(s) found and deleted."
End Sub
How it works:
First I store the 35000 cells into a dictionary file. Then I take a variant array of every cell 35001 onward and loop through them backwards to see if it's in the 35k dictionary or not, or that we haven't come across the value yet in the loop. If it finds that it's a dupe, it deletes the row.
The cool (if I may say) way that it does the row deletion is that when you create the varray, for say A35001 - A37000, it stores them as (1, 1) (2, 1) (...). So if you set "i" to the Ubound of the array + 35000 and step back to 35001, you will loop through all the additions backwardsfrom A37000 to A35001. Then when you want to delete the row, "i" is perfectly set to the row number the value was found in, so you can delete it. And since it goes backwards, it does not screw up the loop!