VBA solution for very slow Countif formula - vba

I have a solution that works in Excel with a countif formula (with the help of another Stackoverflow user).
Essentially what this countif formula does is count the first instance of an ID that exclusively exists with the classification type "DC". For example, as you can see in my snippet, 2232 is marked with as it is only exists with the classification "DC". Whilst in the case of 2240 it is marked as 0 as there are multiple classifications possible.
The formula in column D is the following:
=IF(IF(B2<>"DC",0,AND(COUNTIF(C$2:C$28,C2)=COUNTIF(A$2:A$28,A2),COUNTIF(A$2:A2,A2)=1)),1,0)
The problem that I am experiencing is that this is an extremely slow formula to process for Excel -- it takes roughly ~10-15 mins to complete. The database that I am running this on contains of roughly 150k~ lines.
I was wondering if it was possible to do this same process in VBA, but a lot faster and more efficient than the current processing time.
So I am using the following piece of VBA code to try to recreate the same results:
Sub MarkUniqueID()
Dim Ary As Variant, Nary As Variant
Dim r As Long
With ThisWorkbook.Sheets("sheet1")
Ary = .Range("A2", .Range("A" & Rows.Count).End(xlUp)).Value2
End With
ReDim Nary(1 To UBound(Ary), 1 To 1)
With CreateObject("scripting.dictionary")
For r = 1 To UBound(Ary)
If Not .Exists(Ary(r, 1)) Then
.Add Ary(r, 1), Nothing
Nary(r, 1) = 1
Else
Nary(r, 1) = 0
End If
Next r
End With
ThisWorkbook.Sheets("sheet1").Range("E2").Resize(r).Value = Nary
End Sub
Which runs the process much smoother it takes only a few ~seconds of my original time, however, I am not sure how I can add one more criteria into my array (i.e. only exclusively consider "DC"), as now the results are not what I want (see below).
Any pointers would be much appreciated!

You can use another dictionary to track which ID's should be excluded:
Sub MarkUniqueID()
Dim Ary As Variant, Nary() As Long, cls, id, k
Dim r As Long, dictIn As Object, dictOut As Object
Dim ws As Worksheet
Set dictIn = CreateObject("scripting.dictionary")
Set dictOut = CreateObject("scripting.dictionary")
Set ws = ThisWorkbook.Sheets("sheet1")
'pick up the classification and ID
Ary = ws.Range("B2:C" & ws.Cells(ws.Rows.Count, "B").End(xlUp).Row).Value
ReDim Nary(1 To UBound(Ary), 1 To 1)
For r = 1 To UBound(Ary, 1)
cls = Ary(r, 1)
id = CStr(Ary(r, 2))
If cls = "DC" Then
If Not dictIn.exists(id) Then dictIn.Add id, r
Else
If Not dictOut.exists(id) Then dictOut.Add id, True
End If
Next r
For Each k In dictIn
If Not dictOut.exists(k) Then Nary(dictIn(k), 1) = 1
Next k
ws.Range("E2").Resize(UBound(Nary, 1)).Value = Nary
End Sub

Related

Storing/Saving a dictionary even after excel is closed

I am having a dictionary, that holds userkey -> username references. (I am using it to store the user name after looking it up in the windows directory based on the current userkey, because I assume it is a very slow process and want to improve the performance)
If I got it right on my searching, my dictionary gets completely cleared when I reopen the excel file, correct?
So I want to save it to one of the sheets, where I want to recreate it from on the next session. (one column should hold the userkey, the other the name).
My code runs, but does not write any values in the fields:
'will store the values on the rule sheets in row 4 following, columns BA and BB
Sub SaveDictToRulesSheet(dict As Object)
'startrow of list on excel sheet
startrow = 4
Dim i As Integer
i = 0
ActiveSheet.Name = "Rules"
For Each key In dict.Keys
Worksheets("Rules").Cells(startrow + i, "BA").Value = key
Worksheets("Rules").Cells(startrow + i, "BB").Value = dict(key)
i = i + 1
Next key
i = 0
End Sub
Any help is greatly appreciated.
So I want to save it to one of the sheets, where I want to recreate it from on the next session. (one column should hold the userkey, the other the name).
Well that part seems fairly simple. What's a little confusing is where you read in your dict. You refer to it, but it's unclear to me where the values are being loaded in. I'm going to show you how I would do it. Hopefully that helps and I've understood the issue properly.
Write your dictionary columns to a blank / current workbook and save. Then create a new sub that operates something like this:
Sub Retrieve_Dict()
Set wbkCSV = Workbooks.Open("Template.xlsx")
Set wshCSV = wbkCSV.Worksheets("Rules")
Set dict = CreateObject("Scripting.Dictionary")
numrows = application.worksheetfunction.counta(wshCSV.Columns(27)) - 5
numcols = 2
set wshRange = wshCSV.Range("BA5").Resize(numrows,numcols)
tempArray = wshRange.value
for i = 1 to ubound(tempArray) ' Read rows, columns, send to dict.
dict.key(tempArray(i, 1)) = tempArray(i, 2)' read values.
Next i
tempArray = Process(dict) ' Func. updating dictionary values.
wshRange.value = tempArray
wbkCSV.Close (True)
End Sub
Of course, you can make the above sub a function if you instead open the workbook outside, then pass the worksheet. The function could return as an Object / Scripting.Dictionary depending on your binding.
Also, note, I may have gotten the offset / row count wrong. But the general principle should apply, I think.
The code bellow:
TestDictionaryOps() - tests writing and reading from sheet
DictionaryToRange() - writes dictionary to sheet
DictionaryFromRange() - reads dictionary from sheet
Paste it in a new standard module, and run it on a new sheet (Sheet4)
Option Explicit
Public Sub TestDictionaryOps()
Dim d As Dictionary
Set d = New Dictionary
d("1") = "a"
d("2") = "b"
d("3") = "c"
DictionaryToRange d, Sheet4
Set d = DictionaryFromRange(Sheet4)
If Not d Is Nothing Then MsgBox "Total Dictionary items: " & d.Count
End Sub
Public Sub DictionaryToRange(ByRef d As Dictionary, _
ByRef ws As Worksheet, _
Optional ByVal startCol As Long = 1)
If Not d Is Nothing And Not ws Is Nothing And startCol > 0 Then
Dim cnt As Long, rng1 As Range, rng2 As Range
cnt = d.Count
If cnt > 0 Then
Set rng1 = ws.Range(ws.Cells(1, startCol + 0), ws.Cells(cnt, startCol + 0))
Set rng2 = ws.Range(ws.Cells(1, startCol + 1), ws.Cells(cnt, startCol + 1))
rng1 = Application.Transpose(d.Keys) 'write all keys to column 1
rng2 = Application.Transpose(d.Items) 'write all items to column 2
Else
MsgBox "Empty Dictionary"
End If
Else
MsgBox "Missing Dictionary or WorkSheet"
End If
End Sub
Public Function DictionaryFromRange(ByRef ws As Worksheet, _
Optional ByVal startCol As Long = 1) As Dictionary
If Not ws Is Nothing And startCol > 0 Then
Dim d As Dictionary, cnt As Long, vArr As Variant, i As Long
Set d = New Dictionary
cnt = ws.UsedRange.Columns(startCol).Cells.Count
vArr = ws.Range(ws.Cells(1, startCol), ws.Cells(cnt, startCol + 1)).Value2
For i = 1 To cnt
d(vArr(i, startCol)) = vArr(i, startCol + 1)
Next
Set DictionaryFromRange = d
Else
MsgBox "Missing WorkSheet"
End If
End Function
Early binding (fast): VBA Editor -> Tools -> References -> Add Microsoft Scripting Runtime
Late binding (slow): CreateObject("Scripting.Dictionary")

Gather data tidy in Excel using VBA

What the case is:
So I got a "results sample" in excel format that needs filtering and reshaping to look nice. It is a result that will be not identical all the time but it follows similar rules. I have to filter it further and make it a little more tidy. I have figured out the filtering part, but I am not sure how to sort the remaining data, in a tidy way.
What the situation is:
There are six columns involved.
Notice: Real deal is not THAT simple, but what I need can be demonstrated using such a simple example and then I can manage more complex stuff myself I suppose.
For our example we use columns from B to G
The data are set as pairs of a "title" and a value.
For instance, if you look the first example picture I provide, The first detais the pair B3 and C3.
As you can see, looking at the same picture, D3 and E3 is an empty pair.
Same goes for D4 - E4 and F4 - G4 and so on until a last one at B11 - C11.
Starting data example:
[
What I want to achieve:
I would like, using Visual Basic for Applications, to sort the data, starting from let's say for our example B3 (see second picture) and fill three SETS of two columns, (BC, DE, FG) if there are no data inside those cells.
Notice: If a cell like D3 is null then SURELY E3 will be null too so there can be just only one check. I mean we can check either value columns or title columns.
Notice2: The B,D,F or C,E,G columns DON'T have to be sorted. I just want all the not-null values of B,D,F and their respective values from C,E,G gathered together neat so printing will not need 30 pages but just a few (too many spaces between is causing it and I try to automate the cleanup)
Here's something to start with. The first double loop populates a VBA Collection with Range variables that refer to the Cells that contain the titles.
The associated values are obtained by using an offset. The middle double loop performs a bubble sort on the latter (highly inefficient - you might want to replace it with something else). The next if statement creates a 2nd sheet if it doesn't exist on which to write out the results (last loop).
Option Explicit
Sub GatherData()
Dim lastRow As Integer, lastCol As Integer
Dim r As Integer, c As Integer
Dim vals As Collection
Set vals = New Collection
With Sheets(1)
lastCol = .UsedRange.Columns(.UsedRange.Columns.Count).Column
lastRow = .UsedRange.Rows(.UsedRange.Rows.Count).row
For c = 1 To lastCol Step 2
For r = 1 To lastRow
If (Trim(Cells(r, c).Value) <> "") Then
vals.Add .Cells(r, c)
End If
Next
Next
End With
' Bubble Sort
Dim i As Integer, j As Integer
Dim vTemp As Range
For i = 1 To vals.Count - 1
For j = i + 1 To vals.Count
If vals(i).Value > vals(j).Value Then
Set vTemp = vals(j)
vals.Remove j
vals.Add vTemp, vTemp, i
End If
Next j
Next i
Dim sht2 As Worksheet
If ThisWorkbook.Worksheets.Count = 1 Then
Set sht2 = ThisWorkbook.Sheets.Add(After:=ThisWorkbook.Sheets(1))
Else
Set sht2 = Worksheets(2)
End If
With sht2
r = 3
c = 2
For i = 1 To vals.Count
.Cells(r, c).Value = vals(i).Value
.Cells(r, c + 1).Value = vals(i).Offset(, 1).Value
c = c + 2
If c = 8 Then
r = r + 1
c = 2
End If
Next
End With
End Sub
Here is a method using the Dictionary object. I use early binding which requires setting a reference to Microsoft Scripting Runtime. If you are going to be distributing this, you might want to convert this to late-binding.
We assume that your data is properly formed as you show it above. In other words, all the titles are in even numbered columns; and the results are in the adjacent cell.
We create the dictionary using the Title as the Key, and the adjacent cell value for the Dictionary item.
We collect the information
Transfer the Keys to a VBA array and sort alphabetically
create a "Results Array" and populate it in order
write the results to a worksheet.
I will leave formatting and header generation to you.
By the way, there is a constant in the code for the number of Title/Value pair columns. I have set it to 3, but you can vary that.
Enjoy
Option Explicit
Option Compare Text 'If you want the sorting to be case INsensitive
'set reference to Microsoft Scripting Runtime
Sub TidyData()
'Assume Titles are in even numbered columns
'Assume want ColPairs pairs of columns for output
'Use dictionary with Title as key, and Value as the item
Dim dctTidy As Dictionary
Dim arrKeys As Variant
Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
Dim vSrc As Variant, vRes As Variant
Dim LastRow As Long, LastCol As Long
Dim I As Long, J As Long, K As Long, L As Long
Dim V As Variant
'in Results
Const ColPairs As Long = 3
'Set Source and results worksheet and range
Set wsSrc = Worksheets("sheet1")
Set wsRes = Worksheets("sheet2")
Set rRes = wsRes.Cells(1, 2)
'Read source data into variant array
With wsSrc.Cells
LastRow = .Find(what:="*", after:=.Item(1, 1), _
LookIn:=xlValues, searchorder:=xlByRows, searchdirection:=xlPrevious).Row
LastCol = .Find(what:="*", after:=.Item(1, 1), _
LookIn:=xlValues, searchorder:=xlByColumns, searchdirection:=xlPrevious).Column
vSrc = .Range(.Cells(1, 1), .Cells(LastRow, LastCol))
End With
'Collect the data into a dictionary
Set dctTidy = New Dictionary
For I = 1 To UBound(vSrc, 1)
For J = 2 To UBound(vSrc, 2) Step 2
If vSrc(I, J) <> "" Then _
dctTidy.Add Key:=vSrc(I, J), Item:=vSrc(I, J + 1)
Next J
Next I
'For this purpose, we can do a simple sort on the dictionary keys,
' and then create our results array in the sorted order.
arrKeys = dctTidy.Keys
Quick_Sort arrKeys, LBound(arrKeys), UBound(arrKeys)
'Create results array
ReDim vRes(1 To WorksheetFunction.RoundUp(dctTidy.Count / ColPairs, 0), 1 To ColPairs * 2)
I = 0
J = 0
For Each V In arrKeys
K = Int(I / ColPairs) + 1
L = (J Mod ColPairs) * 2 + 1
vRes(K, L) = V
vRes(K, L + 1) = dctTidy(V)
I = I + 1
J = J + 1
Next V
'write the results
Set rRes = rRes.Resize(UBound(vRes, 1), UBound(vRes, 2))
With rRes
.Worksheet.Cells.Clear
.Value = vRes
.HorizontalAlignment = xlCenter
End With
End Sub
Sub Quick_Sort(ByRef SortArray As Variant, ByVal first As Long, ByVal last As Long)
Dim Low As Long, High As Long
Dim Temp As Variant, List_Separator As Variant
Low = first
High = last
List_Separator = SortArray((first + last) / 2)
Do
Do While (SortArray(Low) < List_Separator)
Low = Low + 1
Loop
Do While (SortArray(High) > List_Separator)
High = High - 1
Loop
If (Low <= High) Then
Temp = SortArray(Low)
SortArray(Low) = SortArray(High)
SortArray(High) = Temp
Low = Low + 1
High = High - 1
End If
Loop While (Low <= High)
If (first < High) Then Quick_Sort SortArray, first, High
If (Low < last) Then Quick_Sort SortArray, Low, last
End Sub
Assuming we got all variables set and initialized properly, in this example:
Sheets("sheetname").Select ' because stupid things can happen...
For i = 3 To 13
Let newrangeT = "B" & i '
Let newrangeV = "C" & i '
If Sheets("sheetname").Range(newrangeV) <> "" Then
values(Position) = Sheets("sheetname").Range(newrangeV)
titles(Position) = Sheets("sheetname").Range(newrangeT)
Position = Position + 1
Else
' Don't do anything if the fields are null
End If
Next i
Sheets("sheetname").Range("B1:G13").Clear
' We then get each data from the arrays with a For loop.
' We set a columnset variable to 1.
' We set a currentrow variable to 3.
' If columnset is 1 data will enter in B and C and columnset = columnset +1
' Then if columnset is 2 we set data to DE and columnset = columnset +1
' But if columnset is 2we set data to FG and columnset = 1 and currentrow = currentrow +1
' Iterating the arrays will result in a neat setting of the data, but it will add zeros for all the nulls. Thus we need an If statement that will exclude that values checking the TITLE array (that should contain a title instead). if the value is not 0 then... we run what I describe, otherwise we do nothing.
Putting the data in the array is half of the trick.
Then we clear the area.
We set two string variables to declare ranges (actually cell reference) for every cell iterated in the loop. Here I demonstrated only for column set B,C
but we have to do the same for the rest of the columns.
The If statement here checks for null. You might have different needs, so changing the if statement changes the filtering. Here I check if the cells are not null. If the cells of column C contain data, put those data in values array and the respective B data on titles array but where? Position starts as 1 and we then iterate it +1 each time it adds something.
You can set data from an array using this command:
' current_row is set to the first row of the spreadsheet we wanna fill.
Sheets("sheetname").Select ' because stupid things can happen...
newrangeV = "C" & current_row
Sheets("sheetname").Range(newrangeV) = values(j)
The rest is just putting things together.
In any case, I wanna thank both of the people involved in this question, because I might didn't got the solution, but I got an idea of how to do other stuff, like accidentally learning something new. Cheers.

COUNTIF() in 'For' loop

I have a column with nearly 100k and am trying to determine how many times a value occurs repeatedly in that column. I can do it row by row currently, but this is menial as a programmer, through something like =COUNTIF(D:D,D2). Yet that only returns D2 matches in column D.
I need to iterate through all values of D returning countif, therefore revealing all of the values repetitions in the column. I can remove duplicates later! So I have a dev. button a basic sub, or function (man this is new to me) and something along the lines of the most basic for loop ever. Just getting caught up on how to implement the COUNTIF() to to the loop properly.
Right now I'm looking at:
Sub doloop()
Dim i As Integer
i = 1
Do While i < D.Length
Cells(i, 8).Value =CountIf(D:D,D[i])
i = i + 1
Loop
End Sub
That code is incorrect obviously but it is where I'm at and may help for anyone more familiar with other languages.
Use Application.WorksheetFunction.CountIf() in your loop.
Private Sub doloop()
Dim lastRow As Long
Dim d As Double
Dim r As Range
Dim WS As Excel.Worksheet
Dim strValue As String
Dim lRow As Long
'Build your worksheet object
Set WS = ActiveWorkbook.Sheets("sheet1")
'Get the last used row in column A
lastRow = WS.Cells(WS.Rows.count, "D").End(xlUp).Row
'Build your range object to be searched
Set r = WS.Range("D1:D" & lastRow)
lRow = 1
WS.Activate
'Loop through the rows and do the search
Do While lRow <= lastRow
'First, get the value we will search for from the current row
strValue = WS.Range("D" & lRow).Value
'Return the count from the CountIf() worksheet function
d = Application.worksheetFunction.CountIf(r, strValue)
'Write that value to the current row
WS.Range("H" & lRow).Value = d
lRow = lRow + 1
Loop
End Sub
I believe you are trying to write the value to the cell, that is what the above does. FYI, if you want to put a formula into the cell, here is how that is done. Use this in place of WS.Range("H" & lRow).Value = d
WS.Range("H" & lRow).Formula = "=CountIf(D:D, D" & lRow & ")"
Sounds like you may want to look into using tables in Excel and capitalizing on their features like filtering and equation autofill. You may also be interested in using a PivotTable to do something very similar to what you're describing.
If you really want to go about this the programmatic way, I think the solution Matt gives answers your question about how to do this using CountIf. There's a big detriment to using CountIf though, in that it's not very computationally efficient. I don't think the code Matt posted will really be practical for processing the 100K rows mentioned in the OP (Application.ScreenUpdating = false would help some). Here's an alternative method that's a lot more efficient, but less intuitive, so you'll have to decide what suites your needs and what you feel conformable with.
Sub CountOccurances()
'Define Input and Output Ranges
'The best way to do this may very from case to case,
'So it should be addressed seperately
'Right now we'll assume current sheet rows 1-100K as OP specifies
Dim RInput, ROutput As Range
Set RInput = Range("D1:D100000")
Set ROutput = Range("E1:E100000")
'Define array for housing and processing range values
Dim A() As Variant
ReDim A(1 To RInput.Rows.Count, 0)
'Use Value2 as quicker more accurate value
A = RInput.Value2
'Create dictionary object
Set d = CreateObject("Scripting.Dictionary")
'Loop through array, adding new values and counting values as you go
For i = 1 To UBound(A)
If d.Exists(A(i, 1)) Then
d(A(i, 1)) = d(A(i, 1)) + 1
Else
d.Add A(i, 1), 1
End If
Next
'Overwrite original array values with count of that value
For i = 1 To UBound(A)
A(i, 1) = d(A(i, 1))
Next
'Write resulting array to output range
ROutput = A
End Sub
You can also modify this to include the removal of replicates you mentioned.
Sub CountOccurances_PrintOnce()
'Define Input and Output Ranges
'The best way to do this may very from case to case,
'So it should be addressed seperately
'Right now we'll assume current sheet rows 1-100K as OP specifies
Dim RInput, ROutput As Range
Set RInput = Range("D1:D100000")
Set ROutput = Range("F1:F9")
'Define array for housing and processing range values
Dim A() As Variant
ReDim A(1 To RInput.Rows.Count, 0)
'Use Value2 as quicker more accurate value
A = RInput.Value2
'Create dictionary object
Set d = CreateObject("Scripting.Dictionary")
'Loop through array, adding new values and counting values as you go
For i = 1 To UBound(A)
If d.Exists(A(i, 1)) Then
d(A(i, 1)) = d(A(i, 1)) + 1
Else
d.Add A(i, 1), 1
End If
Next
'Print results to VBA's immediate window
Dim sum As Double
For Each K In d.Keys
Debug.Print K & ": " & d(K)
sum = sum + d(K)
Next
Debug.Print "Total: " & sum
End Sub

Reverse order of For Each loop

One of the most powerful things about VB is ability to loop through objects in a collection WITHOUT referring to the index - for each loop.
I find it very useful only want to remove objects from a collection.
When doing removing objects from a predefined such as rows on a spread sheet the code is simpler if I use indexing and start at the largest and work back to the first. (Step -1 with an iterator) (otherwise requires an offset as the For each moves the enumerator pointer back to the previous object once the active one is deleted)
eg.
For intA = 10 to 1 step -1
' ...
Next
What about when using a For Each | Next
eg.
For each rngCell in Selection.Cells
' ...
Next
How could I loop backwards using the for each loop syntax?
It's not possible to loop backwards using the for each loop syntax.
As an alternative you can use a For i = a To 1 Step -1 loop:
Sub reverseForEach()
Dim i As Long, rng As Range
Set rng = ActiveSheet.Range("A1:B2")
For i = rng.Cells.Count To 1 Step -1
Debug.Print rng.item(i).Address
' Or shorthand rng(i) as the Item property
' is the default property for the Range object.
' Prints: $B$2, $A$2, $B$1, $A$1
Next i
End Sub
This works with all collections that have the Item property. For instance Worksheets, Areas or Shapes.
Note: The order of the loop when using on the Range object is from right to left, then up.
For built in collections (eg a Range) the short answer is: you can't. For user defined collections the answer linked by #VBlades might be useful, although the cost might outweigh the benifit.
One work around is to seperate the identification of items to be removed from the actual removal. Eg, for a range, build up a new range variable using Union, then process that variable, eg delete all the rows in one go. For the Range example, you can also take advantage of the Variant Array method to further speed things up.
Whether or not any of this is useful will depend on your actual use case.
There are other good answers but here's another alternative method of "stepping backwards" through a Range.
Function to Invert Range into Array
This function returns a "backwards Range Array" that can be used with For..Each:
Function ReverseRange(rg As Range) As Range()
Dim arr() As Range, r As Long, c As Long, n As Long
With rg
ReDim arr(1 To .Cells.Count) 'resize Range Array
For r = .Cells(.Rows.Count, 1).Row To .Cells(1, 1).Row Step -1
For c = .Cells(1, .Columns.Count).Column To .Cells(1, 1).Column Step -1
n = n + 1
Set arr(n) = .Worksheet.Cells(r, c) 'set cell in Array
Next c
Next r
End With
ReverseRange = arr 'return Range Array as function result
End Function
Example Usage:
Sub test()
Dim oCell
For Each oCell In ReverseRange(ActiveSheet.Range("E5:A1"))
Debug.Print oCell.Address 'do something here with each cell
Next oCell
End Sub
use a second variable that is set as your wanted counter and use this one in your code
'ex: Loop from n = 19 to 16
For i = 0 To 3
n = 19 - i
'your code here using n as the counter
Next
Only for Range collections. They are more complicated if they have more than 1 Area.
Basically there are two loops, the first one keeps the index of all the cells in an array and the second one creates a union of ranges from back to front
Option Explicit
Private Sub Main()
Dim InvertedRange As Range
Set InvertedRange = InvertRange(Application.Union(ActiveSheet.Range("A1:A2"), _
ActiveSheet.Range("F6:F7"), ActiveSheet.Range("E4:F5"), ActiveSheet.Range("E1")))
Dim ActualRange As Range
For Each ActualRange In InvertedRange
Debug.Print (ActualRange.Address(False, False) & " : " & ActualRange.Value)
Next ActualRange
End Sub
Public Function InvertRange(ByVal rngRange_I As Range) As Range
Dim RangesArray() As Long
ReDim RangesArray(1 To rngRange_I.Count, 1 To rngRange_I.Count)
Dim ActualArea As Range
Dim ActualRange As Range
Dim ArrayIndex As Long
For Each ActualArea In rngRange_I.Areas
For Each ActualRange In ActualArea
ArrayIndex = ArrayIndex + 1
RangesArray(ArrayIndex, 1) = ActualRange.Row
RangesArray(ArrayIndex, 2) = ActualRange.Column
Next ActualRange
Next ActualArea
Dim ActualRow As Long
Dim ActualColumn As Long
ActualRow = RangesArray(UBound(RangesArray, 1), 1)
ActualColumn = RangesArray(UBound(RangesArray, 2), 2)
With rngRange_I.Worksheet
Dim InvertedRange As Range
Set InvertedRange = .Cells(ActualRow, ActualColumn)
For ArrayIndex = UBound(RangesArray, 1) To LBound(RangesArray, 1) Step -1
ActualRow = RangesArray(ArrayIndex, 1)
ActualColumn = RangesArray(ArrayIndex, 2)
Set InvertedRange = Application.Union(InvertedRange, _
.Cells(ActualRow, ActualColumn))
Next ArrayIndex
End With
Set InvertRange = InvertedRange
End Function
You can use a stack (LIFO data structure) for inventing your list and the code would be something like this:
Dim aStack as Object
Set aStack = CreateObject("System.Collections.Stack")
For Each arngCell in Selection.Cells
aStack.Push(arngCell)
Next
While aStack.Count > 0
rngCell = aStack.Pop
' ...
End While
Set stack = Nothing

Need a better optimized code?

Need a much Optimized code.Well I Got a Project and I have Succefully made it work with the vba (Mostly helped by the stackoverflow programmers Thanks for that)
But Today I got a Feedback. Its deleting 2 more unique entries in the record But I dont know why its deleting Them.
The Algorithm I have applied
I have Used the COUNTIF function Which I found on google
="countif(A$1:A2,A3)=0" A3 is the active cell, Checks A2,A1 for dupes
It Throws False if there is a duplicate in The A column and True If it is a unique.What I have understood about Countif is that
It checks all the above columns values from that cell I mean let us take A4. SO it checks A2,A1,A3 for the duplicate. Similarly A10 checks for A1 to A9 and throws either TRue or False.Well It was working But I dont know what went wrong The code is not working for some entries.Its even showing False for the Unique entries sometimes.
And its taking more time to applye these formula as I have more amount of data. Im trying to make it cleaner and more Optimizing Way.People told me its not a c or some other Language to make it optimize but Im need of code that makes my code more optimized
I need code for these condtions can anyone help me as my countif failed.Im little helpless in doing so.
1)I have a column and I should check for duplicates in that column and delete that row if it is a duplicate
2) I have 35000 Old entries in the column and I have new entries 2000 everyweek these are appended. I need to check these 2000 entries from the total 37000 ( as we appened we get 35000+2000) and these delete operation need to be performed only on the newly appended 2000 entries but it should check the duplicates for entire column
Let me explain you clearly I have 2000 entries newly added,so Only these entries are to be checked for the duplicates from the 35000 entries and also from itself (2000 entries) and delete it if it is a duplicate and no duplicating operation should be performed on the 35000 entries old data.
I have found some codes but they are deleting even the duplicates of the 35000 entries. I have set the range but even though its not working.
Can anyone help me with the best code that takes less time?please thank you
Updating my question with the sample code I have
A B F G H I Y
PTY 39868.5 4 2 540 3 PTY39868.5425403
GTY 34446.1234 2 1 230 1 GTY34446.1234212301
PTY 3945.678 2 2 PTY3945.67822
GTY 34446.1234 2 1 230 1 GTY34446.1234212301
let us say these are old 35000 entries
Explaination to the above example.
The above are the 35000 entries. I have to check A,B,F,G,H,I columns for the dupes, if they are same I have to delete the row, I should not bother about the other columns c,d etc. so what I did is I have used one unused column Y and concatenated these 6 columns values into 1 at Y column using these
= A2 & B2 & F2 & G2 & H2 &I2 with the respective columns
Now checking the Y column for dupes and delete the entire row. as 2003 supports only for one column as far to my knowledge.
Notice that even the 35000 entries may have duplicates in it but I should not delete them. Example you can see the 2 and last row in my example code are dupes but I should not delete
as it is the old data.
A B F G H I Y
PTY 39868.5 4 2 540 3 PTY39868.5425403 'old
GTY 34446.1234 2 1 230 1 GTY34446.1234212301 'old
PTY 3945.678 2 2 PTY3945.67822 'old
GTY 34446.1234 2 1 230 1 GTY34446.1234212301 'old
PTY 3945.678 1 1 230 2 PTY3945.678112302 'new
PTY 39868.5 4 2 540 3 PTY39868.5425403 'new
PTY 3945.678 1 1 230 2 PTY3945.678112302 'new
Now note that New entry PTY (from last 2nd) is a duplicate of the original record(PTY at first) So I hava to delete it.And the last new entry is a duplicate of the new entry itself so I should delete it even that . SO in the above code I have to delete only the last 2 rows which are dupes of original record and also from it . But should not delete the GTY which is the dupe but which is in orginal record.
I think I gave a clear view now. Is concatenating them into one cell . Is it better way to approach? as conactenatin for 40000 entries taking just 2 seconds i think that doesnt matter but any more algorithms to these is much aprreciated
I heard counif treats 45.00 and 45.00000 as different is that right may be that was the problem with it? since I have decimal points in my data. I think I should do
= I2 & H2 & G2 & F2 & A2 & B2
which is better to concatenate? is this or the other i posted before?
BIG UPDATE:
It think the original questions threw me off - there may be a problem with the logic in the question. The following assumes you want to delete the cell, not entire row, for the duplicate entries.
If the 35000 old records do not include duplicates, then all you need to do is remove all duplicates from the entire column - so long as you start from row 1, you run no risk of deleting any of the 'old' rows since no duplicates exist in them.
Here is one way:
Sub UniqueList()
Application.ScreenUpdating = False
Dim vArray As Variant
Dim i As Long, j As Long, lastrow As Long
Dim dictionary As Object
Set dictionary = CreateObject("scripting.dictionary")
lastrow = Range("A" & Rows.Count).End(xlUp).Row
vArray = Range("A1:A" & lastrow).Value
On Error Resume Next
For i = 1 To UBound(vArray, 1)
For j = 1 To UBound(vArray, 2)
If Len(vArray(i, j)) <> 0 Then
dictionary(vArray(i, j)) = 1
End If
Next
Next
Columns("A:A").ClearContents
Range("A1").Resize(dictionary.Count).Value = _
Application.Transpose(dictionary.keys)
Application.ScreenUpdating = True
End Sub
If for some odd reason the 35000 old records DO include dupes and you only want to allow these 35000 records to do so, then you can use 2 dictionaries, but this would be an unusual case since you'd be treating the old records differently than new...
Sub RemoveNewDupes()
Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")
On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row
'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
oldDict.Add varray(i, 1), 1
Next
'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 1 To UBound(varray, 1)
If oldDict.exists(varray(i, 1)) = False Then
newDict.Add varray(i, 1), 1
End If
Next
'Delete and slap back on the unique list
Range("A35001", "A" & Rows.Count).ClearContents
Range("A35001").Resize(newDict.Count).Value = _
Application.Transpose(newDict.keys)
Application.ScreenUpdating = True
End Sub
Thanks to Reafidy for the advice and getting me to relook at this.
This is also a response to some of the comments and solutions made by other members so sorry if it does not straight away answer your question.
Firstly I believe that using excel in a database scenario that raw data and presentation data should be separated. This usually means a single worksheet with raw data and multiple other worksheets with presentation data. Then delete the raw data when necessary or archive.
When speed testing it is very difficult to get a level playing field in excel as there are many things that affect the results. Computer specs, available RAM etc.. Code must first be compiled before running any of the procedures. The test data is also important, when considering duplicates - how many duplicates vs how many rows. This sub loads some test data, altering the amount of rows vs the range of random numbers (duplicates) will give very different results for your code. I don't know what your data looks like so we are kind of working blind and your results may be very different.
'// This is still not very good test data, but should suffice for this situation.
Sub TestFill()
'// 300000 rows
For i = 1 To 300000
'// This populates a random number between 1 & 10000 - adjust to suit
Cells(i, "A").value = Int((100000 + 1) * Rnd + 1)
Next
End Sub
If we are talking about advanced filter vs an array & dictonary method then advanced filter will be quicker with a lower amount of rows but once you get above a certain amount of rows then the array method will be quicker. Then see what happens when you change the amount of duplicates.... :)
As a guideline or as a general rule using excels built in functions will be faster and I recommend always develop attempting to use these inbuilt functions, however there are often exceptions, like above when removing duplicates. :)
Deleting rows can be slow when looping if used incorrectly. If looping is used then it is important to keep synchronisation between code and the workbook out of the loop. This usually means read data to an array, loop through the data, then load the data from the array back to the presentation worksheet essentially deleting the unwanted data.
Sub RemoveDuplicatesA()
'// Copy raw data to presentation sheet
Range("A1", Cells(Rows.Count, "A").End(xlUp)).AdvancedFilter _
Action:=xlFilterCopy, CopyToRange:=Sheet2.Range("B1"), Unique:=True
End Sub
This will be the fastest method:
Sub RemoveDuplicatesB()
Dim vData As Variant, vArray As Variant
Dim lCnt As Long, lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
ReDim vArray(0 To UBound(vData, 1), 0)
lCnt = 0
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .Exists(vData(lRow, 1)) Then
vArray(lCnt, 0) = vData(lRow, 1): lCnt = lCnt + 1
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
'// Copy raw data to presentation sheet
Sheet2.Range("B1").Resize(lCnt).value = vArray
End Sub
Application transpose has a limitation of 65536 rows but as you are using 2003 you should be fine using it, therefore you can simplify the above code with:
Sub RemoveDuplicatesC()
Dim vData As Variant
Dim lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .exists(vData(lRow, 1)) Then
.Add vData(lRow, 1), Nothing
End If
Next lRow
'// Copy raw data to presentation sheet or replace raw data
Sheet2.Columns(2).ClearContents
Sheet2.Columns(2).Resize(.Count).value = Application.Transpose(.keys)
End With
End Sub
EDIT
Okay so #Issun has mentioned you want the entire row deleted. My suggestion was to improve your spreadsheet layout by having a raw data and presentation sheet which means you dont need to delete anything hence it would have been the fastest method. If you dont want to do that and would like to edit the raw data directly then try this:
Sub RemoveDuplicatesD()
Dim vData As Variant, vArray As Variant
Dim lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
ReDim vArray(1 To UBound(vData, 1), 0)
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .exists(vData(lRow, 1)) Then
varray(lRow, 0) = "x"
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
Application.ScreenUpdating = False
'// Modify the raw data
With ActiveSheet
.Columns(2).Insert
.Range("B1").Resize(lRow).value = vArray
.Columns(2).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
.Columns(2).Delete
End With
Application.ScreenUpdating = True
End Sub
Before starting again from scratch your whole code, here are a few things you can try:
Optimize your VBA
There are several tips on the web about optimizing vba. In particular, you can do:
'turn off some Excel functionality so your code runs faster
'these two are especially very efficient
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
'use these if you really need to
Application.DisplayStatusBar = False
Application.EnableEvents = False
'code goes here
'at the end, restore the default behavior
'calculate the formulas
Application.Calculate
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.DisplayStatusBar = True
Application.EnableEvents = True
See here for more information
Optimize your algorithm
Especially when your inserting your COUNTIF formula, you can try to fill in instead of inserting the formula in each row.
On the deleting row part, you should try the solution I gave you in your previous thread: Delete duplicate entries in a column in excel 2003 vba to filter first on the True values and then to delete the visible cells. It is probably the fastest way.
[EDIT] Seems like Doc Brown's answer would be probably the best way to handle this (hey, this is a dictionary solution that wasn't written by Issun :)). Anyway, the VBA optimization tips are still relevant because this is quite a slow language.
OK, here's the advancedfilter method. Don't know if it is faster than the dictionary method. It would be interesting to know though, so let me know after you try it. I also included the delete portion so you would have to stop that portion if you want to do a true comparison. Also, you can make this a function instead of a sub and put in your variables, however you want to change it.
Sub DeleteRepeats()
Dim d1 As Double
Dim r1 As Range, rKeepers As Range
Dim wks As Worksheet
d1 = Timer
Set wks = ActiveSheet
Application.EnableEvents = False
Application.ScreenUpdating = False
'Make sure all rows are visible
On Error Resume Next
wks.ShowAllData
wks.UsedRange.Rows.Hidden = False
wks.UsedRange.Columns.Hidden = False
On Error GoTo 0
'Get concerned range
Set r1 = wks.Range("A1:A35000")
'Filter
r1.AdvancedFilter Action:=xlFilterInPlace, Unique:=True
'Get range of cells not to be deleted
Set rKeepers = r1.SpecialCells(xlCellTypeVisible)
On Error Resume Next
wks.ShowAllData
On Error GoTo 0
rKeepers.EntireRow.Hidden = True
'Delete all undesirables
r1.SpecialCells(xlCellTypeVisible).EntireRow.Delete
'show all rows
On Error Resume Next
wks.UsedRange.Rows.Hidden = False
On Error GoTo 0
Application.EnableEvents = False
Application.ScreenUpdating = False
Debug.Print Timer() - d1
End Sub
OK, here's a take on Doc's and Issun's use of Dictionaries. Before I wasn't convinced but after looking at it and testing it and comparing to advanced filter, I am convinced, dictionaries are better for this application. I don't know why Excel isn't faster on this point since they should be using faster algorithms, it's not the hiding, unhiding of the rows since that happens very quickly. So if anyone knows, let me know. This procedure takes just over 1 second on my slow computer:
Sub FindDupesAndDelete()
Dim d1 As Double
Dim dict As Object
Dim sh As Worksheet
Dim v1 As Variant
' Dim s1() As String
Dim rDelete As Range
Dim bUnion As Boolean
d1 = Timer()
bUnion = False
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
v1 = Application.Transpose(sh.Range("A1", "A" _
& sh.Cells.SpecialCells(xlCellTypeLastCell).row))
' ReDim s1(1 To UBound(v1))
Dim row As Long, value As String ', newEntry As Boolean
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = v1(row)
If dict.Exists(value) Then
' newEntry = False
If bUnion Then
Set rDelete = Union(rDelete, sh.Range("A" & row))
Else
Set rDelete = sh.Range("A" & row)
bUnion = True
End If
Else
' newEntry = True
dict.Add value, 1
End If
' s1(row) = newEntry
Next
rDelete.EntireRow.Delete
' sh.Range("B1", "B" & UBound(v1)) = Application.Transpose(s1)
Debug.Print Timer() - d1
End Sub
Okay so now we have some more info here is a solution. It should execute almost instantly.
The code works by filling column y with your concatenate formula. It then adds all of column y to a dictionary and using the dictionary marks each row as a duplicate in column z. It then removes all the duplicates found after row 35000. Then finally it clears both column y and column z to remove the redundant data.
Sub RemoveDuplicates()
Dim vData As Variant, vArray As Variant
Dim lRow As Long
'// Get used range of column A (excluding header) and offset to get column y
With ActiveSheet.Range("A2", Cells(Rows.Count, "A").End(xlUp)).Offset(, 24)
'// Adds the concatenate formula to the sheet column (y)
.FormulaR1C1 = "=RC[-24]&RC[-23]&RC[-19]&RC[-18]&RC[-17]&RC[-16]"
'// Adds the formula results to an array
vData = .Resize(, 1).value
End With
'// Re dimension the array to the correct size
ReDim vArray(1 To UBound(vData, 1), 0)
'// Create a dictionary object using late binding
With CreateObject("Scripting.Dictionary")
'// Loop through each row in the array
For lRow = 1 To UBound(vData, 1)
'// Check if value exists in the array
If Not .exists(vData(lRow, 1)) Then
'// Value does not exist mark as non duplicate.
vArray(lRow, 0) = "x"
'// Add value to dictionary
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
'// Turn off screen updating to speed up code and prevent screen flicker
Application.ScreenUpdating = False
With ActiveSheet
'// Populate column z with the array
.Range("Z2").Resize(UBound(vArray, 1)) = vArray
'// Use error handling as speciallcells throws an error when none exist.
On Error Resume Next
'// Delete all blank cells in column z
.Range("Y35001", .Cells(Rows.Count, "Y").End(xlUp)).Offset(, 1).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
'// Remove error handling
On Error GoTo 0
'// Clear columns y and z
.Columns(25).Resize(, 2).ClearContents
End With
'// Turn screen updating back on.
Application.ScreenUpdating = True
End Sub
NOTE: you can change all references "activesheet" to your sheet codename if you want.
NOTE2: it assumes you have headers and has left row 1 alone.
I have used your columns and test data as best I can. Here is the test fill I used:
Sub TestFill()
For i = 1 To 37000
With Range("A" & i)
.value = Choose(Int(2 * Rnd + 1), "PTY", "GTY")
.Offset(, 1).value = Round((40000 * (Rnd + 1)), Choose(Int(4 * Rnd + 1), 1, 2, 3, 4))
.Offset(, 5).value = Int(4 * Rnd + 1)
.Offset(, 6).value = Int(2 * Rnd + 1)
.Offset(, 7).value = Choose(Int(2 * Rnd + 1), "230", "540")
.Offset(, 8).value = Int(3 * Rnd + 1)
End With
Next i
End Sub
Lets say you have your entries in column A, and you want the result of your formula in column B (but much faster). This VBA macro should do the trick:
Option Explicit
Sub FindDupes()
Dim dict As Object
Dim sh As Worksheet
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
Dim row As Long, value As String
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = sh.Range("A" & row).Text
If dict.Exists(value) Then
sh.Range("B" & row) = "False"
Else
sh.Range("B" & row) = "True"
dict.Add value, 1
End If
Next
End Sub
(Using a dictionary gives here almost linear running time, which should be a matter of seconds for 35.0000 rows, where your original formula had quadratic running time complexity).
Edit: due to your comment: you will have to fill the dictionary first by reading each entry at least once, that is something you cannot avoid easily. What you can avoid is to fill the rows of column B again when they are already filled:
Option Explicit
Sub FindDupes()
Dim dict As Object
Dim sh As Worksheet
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
Dim row As Long, value As String, newEntry As Boolean
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = sh.Range("A" & row).Text
If dict.Exists(value) Then
newEntry = False
Else
newEntry = True
dict.Add value, 1
End If
If Trim(sh.Range("B" & row)) = "" Then sh.Range("B" & row) = newEntry
Next
End Sub
But I suspect this won't be much faster than my first solution.
Now that you have updated that you want the entire rows deleted and that the first 35000 rows are allowed to have dupes, here is a function that will do that for you. I think I came up with a clever method and it's blazing fast, too:
Sub RemoveNewDupes()
Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")
On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row
'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
oldDict.Add varray(i, 1), 1
Next
'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 35000 + UBound(varray, 1) To 35001 Step -1
If oldDict.exists(varray(i - 35000, 1)) = True Or _
newDict.exists(varray(i - 35000, 1)) = True Then
Range("A" & i).EntireRow.Delete
Else
newDict.Add varray(i - 35000, 1), 1
End If
Next
Application.ScreenUpdating = True
'A status message at the end for finishing touch
MsgBox UBound(varray, 1) - newDict.Count & _
" duplicate row(s) found and deleted."
End Sub
How it works:
First I store the 35000 cells into a dictionary file. Then I take a variant array of every cell 35001 onward and loop through them backwards to see if it's in the 35k dictionary or not, or that we haven't come across the value yet in the loop. If it finds that it's a dupe, it deletes the row.
The cool (if I may say) way that it does the row deletion is that when you create the varray, for say A35001 - A37000, it stores them as (1, 1) (2, 1) (...). So if you set "i" to the Ubound of the array + 35000 and step back to 35001, you will loop through all the additions backwardsfrom A37000 to A35001. Then when you want to delete the row, "i" is perfectly set to the row number the value was found in, so you can delete it. And since it goes backwards, it does not screw up the loop!