Automatic Grouping Excel VBA - vba

This Question has been answered, however I need help with one point. I am using the code provided in the answer, however I can not get the subgrouping, for the entirety of the document. Is such thing possible?
Section Index
1 1
+ 1.1 2
++ 1.1.1 3
+++1.1.1.1 4
+++1.1.1.2 4
+++1.1.1.3 4
++ 1.1.2 3
++ 1.1.3 3
+ 1.2 2
+ 1.3 2
2 1
NOTE: Plusses shows groups.
I have such table as above, where I have indexed the sections with sublevels. I am trying to group those section using excel group feature, however, I have over 3000 rows of data, so I am trying to automate the process. I have modified a Excel VBA macro I found here and got this code below.
Sub AutoGroupBOM()
'Define Variables
Dim StartCell As Range 'This defines the highest level of assembly, usually 1, and must be the top leftmost cell of concern for outlining, its our starting point for grouping'
Dim StartRow As Integer 'This defines the starting row to beging grouping, based on the row we define from StartCell'
Dim LevelCol As Integer 'This is the column that defines the assembly level we're basing our grouping on'
Dim LastRow As Integer 'This is the last row in the sheet that contains information we're grouping'
Dim CurrentLevel As Integer 'iterative counter'
Dim groupBegin, groupEnd As Integer
Dim i As Integer
Dim j As Integer
Dim n As Integer
Application.ScreenUpdating = False 'Turns off screen updating while running.
'Prompts user to select the starting row. It MUST be the highest level of assembly and also the top left cell of the range you want to group/outline"
Set StartCell = Application.InputBox("Select levels' column top cell", Type:=8)
StartRow = StartCell.Row
LevelCol = StartCell.Column
LastRow = ActiveSheet.UsedRange.End(xlDown).Row 'empty rows above aren't included in UsedRange.rows.count => UsedRange.End
'Remove any pre-existing outlining on worksheet, or you're gonna have 99 problems and an outline ain't 1
Cells.ClearOutline
'Walk down the bom lines and group items until you reach the end of populated cells in the assembly level column
groupBegin = StartRow + 1 'For the first group
For i = StartRow To LastRow
CurrentLevel = Cells(i, LevelCol)
groupBegin = i + 1
'Goes down until the entire subrange is selected according to the index
For n = i + 1 To LastRow
If Cells(i, LevelCol).Value = Cells(n, LevelCol).Value Then
If n - i = 1 Then
Exit For
Else
groupEnd = n - 1
Rows(groupBegin & ":" & groupEnd).Select
'If is here to prevent grouping level that have only one row
End If
Exit For
Else
End If
Next n
Next i
'For last group
Rows(groupBegin & ":" & LastRow).Select
Selection.Rows.Group
ActiveSheet.Outline.ShowLevels RowLevels:=1 'Minimize all the groups
ActiveSheet.Outline.SummaryRow = xlAbove 'Put "+" next to first line of each group instead of the bottom
Application.ScreenUpdating = True 'Turns on screen updating when done.
End Sub
Basically what I am trying to do in the above code is to select the top index and run down the cells until that index is the same value again. Basically for the example chart, I would like to select rows(2:4) and group them. This is not achieved by the code. Also, code skips grouping if the adjacent rows are with the same index.
Is this a viable method or should I re-think my loops and how?

The code you have arrived at seems a little convoluted to me. Change to your needs and try this:
Sub groupTest()
Dim sRng As Range, eRng As Range ' Start range, end range
Dim rng As Range
Dim currRng As Range
Set currRng = Range("B1")
Do While currRng.Value <> ""
Debug.Print currRng.Address
If sRng Is Nothing Then
' If start-range is empty, set start-range to current range
Set sRng = currRng
Else
' Start-range not empty
' If current range and start range match, we've reached the same index & need to terminate
If currRng.Value <> sRng.Value Then
Set eRng = currRng
End If
If currRng.Value = sRng.Value Or currRng.Offset(1).Value = "" Then
Set rng = Range(sRng.Offset(1), eRng)
rng.EntireRow.Group
Set sRng = currRng
Set eRng = Nothing
End If
End If
Set currRng = currRng.Offset(1)
Loop
End Sub
Note that there is no error-handling here, the code is a little verbose for readability and bonus - no select.
Edit:
As requested, the subgrouping. This actually had me stuck for a bit - I coded myself into a corner and only barely got out on my own!
A few notes:
I have tested this to some extent (with 4 sublevels and multiple parents) and it works nicely. I tried to write the code so that you can have as many sublevels or as many parents as you want. But it has not been extensively tested, so I couldn't guarantee anything.
However, for some scenarios, Excel won't properly display the +-signs, I am guessing that is due to lack of space in these particular scenarios. If you encounter this, you can contract and expand the different levels using the numbered buttons at the top of the column the +-signs are located in. This will expand/contract all groups of that particular sub-level, however, so it is not optimal. But it is what it is.
Assuming a setup like this (this is after the grouping - you can see the missing +-signs here, for example for group 1.3 and 3.1 -- but they are grouped!):
Sub subGroupTest()
Dim sRng As Range, eRng As Range
Dim groupMap() As Variant
Dim subGrp As Integer, i As Integer, j As Integer
Dim startRow As Range, lastRow As Range
Dim startGrp As Range, lastGrp As Range
ReDim groupMap(1 To 2, 1 To 1)
subGrp = 0
i = 0
Set startRow = Range("A1")
' Create a map of the groups with their cell addresses and an index of the lowest subgrouping
Do While (startRow.Offset(i).Value <> "")
groupMap(1, i + 1) = startRow.Offset(i).Address
groupMap(2, i + 1) = UBound(Split(startRow.Offset(i).Value, "."))
If subGrp < groupMap(2, i + 1) Then subGrp = groupMap(2, i + 1)
ReDim Preserve groupMap(1 To 2, 1 To (i + 2))
Set lastRow = Range(groupMap(1, i + 1))
i = i + 1
Loop
' Destroy already existing groups, otherwise we get errors
On Error Resume Next
For k = 1 To 10
Rows(startRow.Row & ":" & lastRow.Row).EntireRow.Ungroup
Next k
On Error GoTo 0
' Create the groups
' We do them by levels in descending order, ie. all groups with an index of 3 are grouped individually before we move to index 2
Do While (subGrp > 0)
For j = LBound(groupMap, 2) To UBound(groupMap, 2)
If groupMap(2, j) >= CStr(subGrp) Then
' If current value in the map matches the current group index
' Update group range references
If startGrp Is Nothing Then
Set startGrp = Range(groupMap(1, j))
End If
Set lastGrp = Range(groupMap(1, j))
Else
' If/when we reach this loop, it means we've reached the end of a subgroup
' Create the group we found in the previous loops
If Not startGrp Is Nothing And Not lastGrp Is Nothing Then Range(startGrp, lastGrp).EntireRow.Group
' Then, reset the group ranges so they're ready for the next group we encounter
If Not startGrp Is Nothing Then Set startGrp = Nothing
If Not lastGrp Is Nothing Then Set lastGrp = Nothing
End If
Next j
' Decrement the index
subGrp = subGrp - 1
Loop
End Sub

The subGroupTest() function above can be replaced by 6 lines of code:
Sub subGroupTest()
Dim cRng As range
Set cRng = range("A1")
Do While cRng.Value <> ""
cRng.EntireRow.OutlineLevel = UBound(Split(cRng.Value, ".")) + 1
Set cRng = cRng.Offset(1)
Loop
End Sub
Consecutive rows on the same OutlineLevel are automatically grouped together, so no need to jump through all the hoops in order to solve for the depths manually. OutlineLevel = 1 means the row is not grouped too.
As a bonus, there is no need to delete the outline levels beforehand.

Related

Loop not correctly deleting rows in the Worksheet

So here is the task I am required to do.
I have a worksheet in which the user can specify a Column name and an element under the column, once chosen, the macro will find and delete every element with said name.
My issue comes from the final part of the macro, the delete. My loop doesn't delete all the rows, it will only find one instance of the element and delete it, then go to the next element and delete it, leaving every other element with the same name intact.
Here is the function within the macro, I apologize in advance for the poor code quality as I am not well versed in vba.
Function LineDelete() As Variant
Dim NbLignes As Integer
Dim ctr As Integer
Dim ctr2 As Integer
Dim Table As Variant
Worksheets("parametrage_suppr_ligne").Activate
ctr = 1
ctr2 = 1
NbLignes = Cells.Find("*", Range("A1"), , , xlByRows, xlPrevious).Row - 4
ReDim Table(1 To NbLignes, 2)
While ctr <= NbLignes
Table(ctr, 1) = Cells(ctr + 4, 1).Value
Table(ctr2, 2) = Cells(ctr2 + 4, 2).Value
ctr = ctr + 1
ctr2 = ctr2 + 1
Wend
Call FileOpen
Call delInvalidChars
Call OrderRows
Dim newCtr As Integer
Dim rng As Range
Dim rngHeaders As Range
Dim newString As Variant
Dim i As Integer
NbLignes = 0
NbLignes = Cells.Find("*", Range("A1"), , , xlByRows, xlPrevious).Row
Set rngHeaders = Range("1:1")
newCtr = 1
For i = NbLignes To FirstRow Step -1
Set rng = rngHeaders.Find(Table(newCtr, 1))
If Table(newCtr, 1) = rng Then
MsgBox "All is gud!!"
newString = Cells.Find(Table(newCtr, 2))
If Table(newCtr, 2) = newString Then
MsgBox newString
Range(Cells.Find(Table(newCtr, 2)).Address).EntireRow.Delete
newCtr= newCtr + 1
End If
End If
newCtr = newCtr + 1
Next i
End Function
So now to explain a bit what I've done here.
At first I store the options in a 2 dimentional table with a simple loop, in this table I store the name of the column a well as the name of the element under the column that has to be deleted.
After that I call the functions which open a txt file which is then transformed into an excel file, it is in this new excel file that the deletes have to be done.
I then reset the NbLignes variable as well as call new variables.
Here is where the issues begin, I thought that by iterating on the number of lines the new excel file has; the program was going to look for all of the instances of the word in the column and was going to delete them, but so far it will only do it 3 times.
I am totally lost as to what modify to be able to fix this.
Here is what the config table looks like, this is what the user can modify to specify what to delete, it is also what I store inside of the 2d Table:
User can add as many columns and names as needed
EDIT: What the code does now after updating is that it deletes all the elements that have the same name as the first one in the image (fun_h_opcomp), the expected outcome would be that as soon as all those elements are deleted, the program should then pass on to the next one (fun_b_pterm) and so on.
Of course the i was just an Example for that counter and you must use your newCtr counter here, and FirstRow must be set to a value.
Const FirstRow As Long = 1
Dim newCtr As Long 'always use Long for row counting
For newCtr = NbLignes To FirstRow Step -1
Set rng = rngHeaders.Find(Table(newCtr, 1))
If Table(newCtr, 1) = rng Then
MsgBox "All is gud!!"
newString = Cells.Find(Table(newCtr, 2))
If Table(newCtr, 2) = newString Then
MsgBox newString
Range(Cells.Find(Table(newCtr, 2)).Address).EntireRow.Delete
End If
End If
Next newCtr
There is no need to increment/decrement newCtr anymore because this is automatically done by the Next statement.

VBA that copies rows into new sheet based on each row's cell contents (example included)

So I'm hoping for some help to automate a process that will otherwise involve copying and editing some 10,000 rows.
This is stuff relating to location data. Essentially, there are tons of these Master Rows but they do not have individual rows for Unit Numbers. I am hoping to get something to expand these into individual Unit Number rows based on what is in Column N. Column N is intended to follow a strict format of being a comma-seperated single cell list for each row.
Below is an example from Sheet 1 of what each row will have and needs to be expanded upon. Note that Column N is green and follows a consistent formatting and this will be the determinant for how many times these rows will each be expanded upon.
Below is Sheet 2 and what I want the VBA to create from Sheet 1. You can see that each row has been expanded based on the contents of Column N from Sheet 1.
Like I said, it is expected that this will involve some several thousand rows to create.
Option Explicit
Sub Tester()
Dim sht1, sht2, rwSrc As Range, rwDest As Range, v, arr, n
Set sht1 = ThisWorkbook.Sheets("Sheet1")
Set sht2 = ThisWorkbook.Sheets("Sheet2")
sht2.Range("A2:M2").Resize(3, 13).Value = sht1.Range("A2:M2").Value
Set rwDest = sht2.Range("A2:M2") 'destination start row
Set rwSrc = sht1.Range("A2:M2") 'source row
Do While Application.CountA(rwSrc) > 0
v = rwSrc.EntireRow.Cells(1, "N").Value 'list of values
If InStr(v, ",") > 0 Then
'list of values: split and count
arr = Split(v, ",")
n = UBound(arr) + 1
Else
'one or no value
arr = Array(v)
n = 1
End If
'duplicate source row as required
rwDest.Resize(n, 13).Value = rwSrc.Value
'copy over the unit values
rwDest.Cells(1, "G").Resize(n, 1).Value = Application.Transpose(arr)
'offset to next destination row
Set rwDest = rwDest.Offset(n, 0)
'next source row
Set rwSrc = rwSrc.Offset(1, 0)
Loop
End Sub
This does the work in same sheet... Pls copy the value to "Sheet2" before executing this. Not sure about efficiency though.
Public Sub Test()
Dim lr As Long ' To store the last row of the data range
Dim counter As Long
Dim Str As String ' To store the string in column N
lr = Range("N65536").End(xlUp).Row 'Getting the last row of the data
For i = lr To 2 Step -1
Str = Range("N" & i).Value ' Getting the value from Column N
counter = 1
For Each s In Split(Str, ",")
If counter > 1 Then
Range("A" & (i + counter - 1)).EntireRow.Insert ' Inserting rows for each value in column N
Range("G" & (i + counter - 1)).Formula = s ' Updating the value in Column G
Else
Range("G" & i).Formula = s ' No need to insert a new row for first value
End If
counter = counter + 1
Next s
Next i
lr = Range("G65536").End(xlUp).Row
' Pulling down other values from the first value row other rows
Range("A1:N" & lr).Select
Selection.SpecialCells(xlCellTypeBlanks).Select
Selection.FormulaR1C1 = "=R[-1]C"
' Pasting the data as Values to avoid future formula issues.
Range("A1:N" & lr).Copy
Range("A1:N" & lr).PasteSpecial xlPasteValues
MsgBox "Done"
End Sub

COUNTIF() in 'For' loop

I have a column with nearly 100k and am trying to determine how many times a value occurs repeatedly in that column. I can do it row by row currently, but this is menial as a programmer, through something like =COUNTIF(D:D,D2). Yet that only returns D2 matches in column D.
I need to iterate through all values of D returning countif, therefore revealing all of the values repetitions in the column. I can remove duplicates later! So I have a dev. button a basic sub, or function (man this is new to me) and something along the lines of the most basic for loop ever. Just getting caught up on how to implement the COUNTIF() to to the loop properly.
Right now I'm looking at:
Sub doloop()
Dim i As Integer
i = 1
Do While i < D.Length
Cells(i, 8).Value =CountIf(D:D,D[i])
i = i + 1
Loop
End Sub
That code is incorrect obviously but it is where I'm at and may help for anyone more familiar with other languages.
Use Application.WorksheetFunction.CountIf() in your loop.
Private Sub doloop()
Dim lastRow As Long
Dim d As Double
Dim r As Range
Dim WS As Excel.Worksheet
Dim strValue As String
Dim lRow As Long
'Build your worksheet object
Set WS = ActiveWorkbook.Sheets("sheet1")
'Get the last used row in column A
lastRow = WS.Cells(WS.Rows.count, "D").End(xlUp).Row
'Build your range object to be searched
Set r = WS.Range("D1:D" & lastRow)
lRow = 1
WS.Activate
'Loop through the rows and do the search
Do While lRow <= lastRow
'First, get the value we will search for from the current row
strValue = WS.Range("D" & lRow).Value
'Return the count from the CountIf() worksheet function
d = Application.worksheetFunction.CountIf(r, strValue)
'Write that value to the current row
WS.Range("H" & lRow).Value = d
lRow = lRow + 1
Loop
End Sub
I believe you are trying to write the value to the cell, that is what the above does. FYI, if you want to put a formula into the cell, here is how that is done. Use this in place of WS.Range("H" & lRow).Value = d
WS.Range("H" & lRow).Formula = "=CountIf(D:D, D" & lRow & ")"
Sounds like you may want to look into using tables in Excel and capitalizing on their features like filtering and equation autofill. You may also be interested in using a PivotTable to do something very similar to what you're describing.
If you really want to go about this the programmatic way, I think the solution Matt gives answers your question about how to do this using CountIf. There's a big detriment to using CountIf though, in that it's not very computationally efficient. I don't think the code Matt posted will really be practical for processing the 100K rows mentioned in the OP (Application.ScreenUpdating = false would help some). Here's an alternative method that's a lot more efficient, but less intuitive, so you'll have to decide what suites your needs and what you feel conformable with.
Sub CountOccurances()
'Define Input and Output Ranges
'The best way to do this may very from case to case,
'So it should be addressed seperately
'Right now we'll assume current sheet rows 1-100K as OP specifies
Dim RInput, ROutput As Range
Set RInput = Range("D1:D100000")
Set ROutput = Range("E1:E100000")
'Define array for housing and processing range values
Dim A() As Variant
ReDim A(1 To RInput.Rows.Count, 0)
'Use Value2 as quicker more accurate value
A = RInput.Value2
'Create dictionary object
Set d = CreateObject("Scripting.Dictionary")
'Loop through array, adding new values and counting values as you go
For i = 1 To UBound(A)
If d.Exists(A(i, 1)) Then
d(A(i, 1)) = d(A(i, 1)) + 1
Else
d.Add A(i, 1), 1
End If
Next
'Overwrite original array values with count of that value
For i = 1 To UBound(A)
A(i, 1) = d(A(i, 1))
Next
'Write resulting array to output range
ROutput = A
End Sub
You can also modify this to include the removal of replicates you mentioned.
Sub CountOccurances_PrintOnce()
'Define Input and Output Ranges
'The best way to do this may very from case to case,
'So it should be addressed seperately
'Right now we'll assume current sheet rows 1-100K as OP specifies
Dim RInput, ROutput As Range
Set RInput = Range("D1:D100000")
Set ROutput = Range("F1:F9")
'Define array for housing and processing range values
Dim A() As Variant
ReDim A(1 To RInput.Rows.Count, 0)
'Use Value2 as quicker more accurate value
A = RInput.Value2
'Create dictionary object
Set d = CreateObject("Scripting.Dictionary")
'Loop through array, adding new values and counting values as you go
For i = 1 To UBound(A)
If d.Exists(A(i, 1)) Then
d(A(i, 1)) = d(A(i, 1)) + 1
Else
d.Add A(i, 1), 1
End If
Next
'Print results to VBA's immediate window
Dim sum As Double
For Each K In d.Keys
Debug.Print K & ": " & d(K)
sum = sum + d(K)
Next
Debug.Print "Total: " & sum
End Sub

Why do my VBA code sometimes work and most of the times it doesn't?

Sub UpdateCSAH()
Dim S As String
Dim R As Long
Dim RR As Long
Dim CC As Long
Dim i As Long
Dim j As Long
Dim csah() As String 'an array that stores the CSAH sites
ReDim csah(1 To 100, 1 To 7)
Dim Ran As Range
Dim Ran1 As Range
Set Ran = Worksheets("Current Sites").Range("A1").CurrentRegion 'Ran is the region that has values
RR = 1 'row number in csah
CC = 1 'column number in csah
'check each value in Ran to see if its Route section has "CSAH"
For Each cell In Ran
R = cell.row
S = CStr(Cells(R, 4).value)
If InStr(S, "CSAH") > 0 Then 'check if "CSAH" is in the Route section
If CC > 7 Then 'reset the column number and go to the next row when reach the end of the column
CC = 1
RR = RR + 1
End If
csah(RR, CC) = cell.value
CC = CC + 1
End If
Next cell
Worksheets("CSAH Sites").Select
Range("A2:G100").Select
Selection.ClearContents
'assign each array values to cells in sheet"CSAH Sites"
i = 1
j = 1
For i = 1 To UBound(csah, 1)
For j = 1 To UBound(csah, 2)
Cells(i + 1, j) = csah(i, j)
Next j
Next i
'format the CSAH Sites values
Set Ran1 = Worksheets("CSAH Sites").Range("A1").CurrentRegion
For Each cell In Ran1
If cell.row = 1 Then
With cell.Font
.Color = -11489280
End With
ElseIf cell.row Mod 2 = 0 Then
With cell.Interior
.Color = 10092441
End With
End If
Next cell
End Sub
I have an Excel worksheet named "Current Sites" that has some data. If the 4th column has the word "CSAH", I want to store the values of that row into an array and assign those values to cells in the worksheet named "CSAH Sites". My code sometimes works (the 1st time you click), and most of times it doesn't work or doesn't work properly.
Please help me out! Thanks A Bunch!!
It looks like you want to check every row of data in the "Current Sites" sheet and if column 4 includes the "CSAH" text, then write the first 7 columns of data for that entry to the "CSAH Sites" sheet and add some colour to the even-numbered rows.
To check every row of data, you can read down just one column and use either the Offset or the Cells method to see the values of neighbouring cells. In your code you were "touching" every cell and each time you were then looking at the value in column 4 and also checking to see if the code had gone past column 7. That slows things down a lot and makes the code hard to understand.
You can also assign the values from a range of cells directly to another range of cells without using variables or an array.
See if this does what you want:
Sub UpdateCSAH()
Dim currentSitesRange As Range
Dim thisSiteRange As Range
Dim outputCell As Range
Dim numRowsOfData As Long
Const NUM_COLUMNS_OF_DATA As Integer = 7
Set currentSitesRange = Worksheets("Current Sites").Range("A1")
numRowsOfData = currentSitesRange.CurrentRegion.Rows.Count
Set currentSitesRange = currentSitesRange.Resize(RowSize:=numRowsOfData) 'currentSitesRange is the region that has values
Worksheets("CSAH Sites").Range("A2:G100").ClearContents
Set outputCell = Worksheets("CSAH Sites").Range("A2")
For Each thisSiteRange In currentSitesRange.Cells
' Look for "CSAH" in the Route section (column D)
If InStr(1, thisSiteRange.Offset(ColumnOffset:=3).Value, "CSAH", vbTextCompare) > 0 Then
' Found "CSAH" so write NUM_COLUMNS_OF_DATA columns of data to CSAH Sites sheet
outputCell.Resize(ColumnSize:=NUM_COLUMNS_OF_DATA).Value = thisSiteRange.Resize(ColumnSize:=NUM_COLUMNS_OF_DATA).Value
' Format the even-numbered rows
If outputCell.Row Mod 2 = 0 Then
With outputCell.Resize(ColumnSize:=NUM_COLUMNS_OF_DATA).Interior
.Color = 10092441
End With
End If
Set outputCell = outputCell.Offset(RowOffset:=1)
End If
Next thisSiteRange
End Sub

Need a better optimized code?

Need a much Optimized code.Well I Got a Project and I have Succefully made it work with the vba (Mostly helped by the stackoverflow programmers Thanks for that)
But Today I got a Feedback. Its deleting 2 more unique entries in the record But I dont know why its deleting Them.
The Algorithm I have applied
I have Used the COUNTIF function Which I found on google
="countif(A$1:A2,A3)=0" A3 is the active cell, Checks A2,A1 for dupes
It Throws False if there is a duplicate in The A column and True If it is a unique.What I have understood about Countif is that
It checks all the above columns values from that cell I mean let us take A4. SO it checks A2,A1,A3 for the duplicate. Similarly A10 checks for A1 to A9 and throws either TRue or False.Well It was working But I dont know what went wrong The code is not working for some entries.Its even showing False for the Unique entries sometimes.
And its taking more time to applye these formula as I have more amount of data. Im trying to make it cleaner and more Optimizing Way.People told me its not a c or some other Language to make it optimize but Im need of code that makes my code more optimized
I need code for these condtions can anyone help me as my countif failed.Im little helpless in doing so.
1)I have a column and I should check for duplicates in that column and delete that row if it is a duplicate
2) I have 35000 Old entries in the column and I have new entries 2000 everyweek these are appended. I need to check these 2000 entries from the total 37000 ( as we appened we get 35000+2000) and these delete operation need to be performed only on the newly appended 2000 entries but it should check the duplicates for entire column
Let me explain you clearly I have 2000 entries newly added,so Only these entries are to be checked for the duplicates from the 35000 entries and also from itself (2000 entries) and delete it if it is a duplicate and no duplicating operation should be performed on the 35000 entries old data.
I have found some codes but they are deleting even the duplicates of the 35000 entries. I have set the range but even though its not working.
Can anyone help me with the best code that takes less time?please thank you
Updating my question with the sample code I have
A B F G H I Y
PTY 39868.5 4 2 540 3 PTY39868.5425403
GTY 34446.1234 2 1 230 1 GTY34446.1234212301
PTY 3945.678 2 2 PTY3945.67822
GTY 34446.1234 2 1 230 1 GTY34446.1234212301
let us say these are old 35000 entries
Explaination to the above example.
The above are the 35000 entries. I have to check A,B,F,G,H,I columns for the dupes, if they are same I have to delete the row, I should not bother about the other columns c,d etc. so what I did is I have used one unused column Y and concatenated these 6 columns values into 1 at Y column using these
= A2 & B2 & F2 & G2 & H2 &I2 with the respective columns
Now checking the Y column for dupes and delete the entire row. as 2003 supports only for one column as far to my knowledge.
Notice that even the 35000 entries may have duplicates in it but I should not delete them. Example you can see the 2 and last row in my example code are dupes but I should not delete
as it is the old data.
A B F G H I Y
PTY 39868.5 4 2 540 3 PTY39868.5425403 'old
GTY 34446.1234 2 1 230 1 GTY34446.1234212301 'old
PTY 3945.678 2 2 PTY3945.67822 'old
GTY 34446.1234 2 1 230 1 GTY34446.1234212301 'old
PTY 3945.678 1 1 230 2 PTY3945.678112302 'new
PTY 39868.5 4 2 540 3 PTY39868.5425403 'new
PTY 3945.678 1 1 230 2 PTY3945.678112302 'new
Now note that New entry PTY (from last 2nd) is a duplicate of the original record(PTY at first) So I hava to delete it.And the last new entry is a duplicate of the new entry itself so I should delete it even that . SO in the above code I have to delete only the last 2 rows which are dupes of original record and also from it . But should not delete the GTY which is the dupe but which is in orginal record.
I think I gave a clear view now. Is concatenating them into one cell . Is it better way to approach? as conactenatin for 40000 entries taking just 2 seconds i think that doesnt matter but any more algorithms to these is much aprreciated
I heard counif treats 45.00 and 45.00000 as different is that right may be that was the problem with it? since I have decimal points in my data. I think I should do
= I2 & H2 & G2 & F2 & A2 & B2
which is better to concatenate? is this or the other i posted before?
BIG UPDATE:
It think the original questions threw me off - there may be a problem with the logic in the question. The following assumes you want to delete the cell, not entire row, for the duplicate entries.
If the 35000 old records do not include duplicates, then all you need to do is remove all duplicates from the entire column - so long as you start from row 1, you run no risk of deleting any of the 'old' rows since no duplicates exist in them.
Here is one way:
Sub UniqueList()
Application.ScreenUpdating = False
Dim vArray As Variant
Dim i As Long, j As Long, lastrow As Long
Dim dictionary As Object
Set dictionary = CreateObject("scripting.dictionary")
lastrow = Range("A" & Rows.Count).End(xlUp).Row
vArray = Range("A1:A" & lastrow).Value
On Error Resume Next
For i = 1 To UBound(vArray, 1)
For j = 1 To UBound(vArray, 2)
If Len(vArray(i, j)) <> 0 Then
dictionary(vArray(i, j)) = 1
End If
Next
Next
Columns("A:A").ClearContents
Range("A1").Resize(dictionary.Count).Value = _
Application.Transpose(dictionary.keys)
Application.ScreenUpdating = True
End Sub
If for some odd reason the 35000 old records DO include dupes and you only want to allow these 35000 records to do so, then you can use 2 dictionaries, but this would be an unusual case since you'd be treating the old records differently than new...
Sub RemoveNewDupes()
Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")
On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row
'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
oldDict.Add varray(i, 1), 1
Next
'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 1 To UBound(varray, 1)
If oldDict.exists(varray(i, 1)) = False Then
newDict.Add varray(i, 1), 1
End If
Next
'Delete and slap back on the unique list
Range("A35001", "A" & Rows.Count).ClearContents
Range("A35001").Resize(newDict.Count).Value = _
Application.Transpose(newDict.keys)
Application.ScreenUpdating = True
End Sub
Thanks to Reafidy for the advice and getting me to relook at this.
This is also a response to some of the comments and solutions made by other members so sorry if it does not straight away answer your question.
Firstly I believe that using excel in a database scenario that raw data and presentation data should be separated. This usually means a single worksheet with raw data and multiple other worksheets with presentation data. Then delete the raw data when necessary or archive.
When speed testing it is very difficult to get a level playing field in excel as there are many things that affect the results. Computer specs, available RAM etc.. Code must first be compiled before running any of the procedures. The test data is also important, when considering duplicates - how many duplicates vs how many rows. This sub loads some test data, altering the amount of rows vs the range of random numbers (duplicates) will give very different results for your code. I don't know what your data looks like so we are kind of working blind and your results may be very different.
'// This is still not very good test data, but should suffice for this situation.
Sub TestFill()
'// 300000 rows
For i = 1 To 300000
'// This populates a random number between 1 & 10000 - adjust to suit
Cells(i, "A").value = Int((100000 + 1) * Rnd + 1)
Next
End Sub
If we are talking about advanced filter vs an array & dictonary method then advanced filter will be quicker with a lower amount of rows but once you get above a certain amount of rows then the array method will be quicker. Then see what happens when you change the amount of duplicates.... :)
As a guideline or as a general rule using excels built in functions will be faster and I recommend always develop attempting to use these inbuilt functions, however there are often exceptions, like above when removing duplicates. :)
Deleting rows can be slow when looping if used incorrectly. If looping is used then it is important to keep synchronisation between code and the workbook out of the loop. This usually means read data to an array, loop through the data, then load the data from the array back to the presentation worksheet essentially deleting the unwanted data.
Sub RemoveDuplicatesA()
'// Copy raw data to presentation sheet
Range("A1", Cells(Rows.Count, "A").End(xlUp)).AdvancedFilter _
Action:=xlFilterCopy, CopyToRange:=Sheet2.Range("B1"), Unique:=True
End Sub
This will be the fastest method:
Sub RemoveDuplicatesB()
Dim vData As Variant, vArray As Variant
Dim lCnt As Long, lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
ReDim vArray(0 To UBound(vData, 1), 0)
lCnt = 0
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .Exists(vData(lRow, 1)) Then
vArray(lCnt, 0) = vData(lRow, 1): lCnt = lCnt + 1
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
'// Copy raw data to presentation sheet
Sheet2.Range("B1").Resize(lCnt).value = vArray
End Sub
Application transpose has a limitation of 65536 rows but as you are using 2003 you should be fine using it, therefore you can simplify the above code with:
Sub RemoveDuplicatesC()
Dim vData As Variant
Dim lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .exists(vData(lRow, 1)) Then
.Add vData(lRow, 1), Nothing
End If
Next lRow
'// Copy raw data to presentation sheet or replace raw data
Sheet2.Columns(2).ClearContents
Sheet2.Columns(2).Resize(.Count).value = Application.Transpose(.keys)
End With
End Sub
EDIT
Okay so #Issun has mentioned you want the entire row deleted. My suggestion was to improve your spreadsheet layout by having a raw data and presentation sheet which means you dont need to delete anything hence it would have been the fastest method. If you dont want to do that and would like to edit the raw data directly then try this:
Sub RemoveDuplicatesD()
Dim vData As Variant, vArray As Variant
Dim lRow As Long
vData = ActiveSheet.UsedRange.Columns(1).value
ReDim vArray(1 To UBound(vData, 1), 0)
With CreateObject("Scripting.Dictionary")
For lRow = 1 To UBound(vData, 1)
If Not .exists(vData(lRow, 1)) Then
varray(lRow, 0) = "x"
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
Application.ScreenUpdating = False
'// Modify the raw data
With ActiveSheet
.Columns(2).Insert
.Range("B1").Resize(lRow).value = vArray
.Columns(2).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
.Columns(2).Delete
End With
Application.ScreenUpdating = True
End Sub
Before starting again from scratch your whole code, here are a few things you can try:
Optimize your VBA
There are several tips on the web about optimizing vba. In particular, you can do:
'turn off some Excel functionality so your code runs faster
'these two are especially very efficient
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
'use these if you really need to
Application.DisplayStatusBar = False
Application.EnableEvents = False
'code goes here
'at the end, restore the default behavior
'calculate the formulas
Application.Calculate
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.DisplayStatusBar = True
Application.EnableEvents = True
See here for more information
Optimize your algorithm
Especially when your inserting your COUNTIF formula, you can try to fill in instead of inserting the formula in each row.
On the deleting row part, you should try the solution I gave you in your previous thread: Delete duplicate entries in a column in excel 2003 vba to filter first on the True values and then to delete the visible cells. It is probably the fastest way.
[EDIT] Seems like Doc Brown's answer would be probably the best way to handle this (hey, this is a dictionary solution that wasn't written by Issun :)). Anyway, the VBA optimization tips are still relevant because this is quite a slow language.
OK, here's the advancedfilter method. Don't know if it is faster than the dictionary method. It would be interesting to know though, so let me know after you try it. I also included the delete portion so you would have to stop that portion if you want to do a true comparison. Also, you can make this a function instead of a sub and put in your variables, however you want to change it.
Sub DeleteRepeats()
Dim d1 As Double
Dim r1 As Range, rKeepers As Range
Dim wks As Worksheet
d1 = Timer
Set wks = ActiveSheet
Application.EnableEvents = False
Application.ScreenUpdating = False
'Make sure all rows are visible
On Error Resume Next
wks.ShowAllData
wks.UsedRange.Rows.Hidden = False
wks.UsedRange.Columns.Hidden = False
On Error GoTo 0
'Get concerned range
Set r1 = wks.Range("A1:A35000")
'Filter
r1.AdvancedFilter Action:=xlFilterInPlace, Unique:=True
'Get range of cells not to be deleted
Set rKeepers = r1.SpecialCells(xlCellTypeVisible)
On Error Resume Next
wks.ShowAllData
On Error GoTo 0
rKeepers.EntireRow.Hidden = True
'Delete all undesirables
r1.SpecialCells(xlCellTypeVisible).EntireRow.Delete
'show all rows
On Error Resume Next
wks.UsedRange.Rows.Hidden = False
On Error GoTo 0
Application.EnableEvents = False
Application.ScreenUpdating = False
Debug.Print Timer() - d1
End Sub
OK, here's a take on Doc's and Issun's use of Dictionaries. Before I wasn't convinced but after looking at it and testing it and comparing to advanced filter, I am convinced, dictionaries are better for this application. I don't know why Excel isn't faster on this point since they should be using faster algorithms, it's not the hiding, unhiding of the rows since that happens very quickly. So if anyone knows, let me know. This procedure takes just over 1 second on my slow computer:
Sub FindDupesAndDelete()
Dim d1 As Double
Dim dict As Object
Dim sh As Worksheet
Dim v1 As Variant
' Dim s1() As String
Dim rDelete As Range
Dim bUnion As Boolean
d1 = Timer()
bUnion = False
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
v1 = Application.Transpose(sh.Range("A1", "A" _
& sh.Cells.SpecialCells(xlCellTypeLastCell).row))
' ReDim s1(1 To UBound(v1))
Dim row As Long, value As String ', newEntry As Boolean
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = v1(row)
If dict.Exists(value) Then
' newEntry = False
If bUnion Then
Set rDelete = Union(rDelete, sh.Range("A" & row))
Else
Set rDelete = sh.Range("A" & row)
bUnion = True
End If
Else
' newEntry = True
dict.Add value, 1
End If
' s1(row) = newEntry
Next
rDelete.EntireRow.Delete
' sh.Range("B1", "B" & UBound(v1)) = Application.Transpose(s1)
Debug.Print Timer() - d1
End Sub
Okay so now we have some more info here is a solution. It should execute almost instantly.
The code works by filling column y with your concatenate formula. It then adds all of column y to a dictionary and using the dictionary marks each row as a duplicate in column z. It then removes all the duplicates found after row 35000. Then finally it clears both column y and column z to remove the redundant data.
Sub RemoveDuplicates()
Dim vData As Variant, vArray As Variant
Dim lRow As Long
'// Get used range of column A (excluding header) and offset to get column y
With ActiveSheet.Range("A2", Cells(Rows.Count, "A").End(xlUp)).Offset(, 24)
'// Adds the concatenate formula to the sheet column (y)
.FormulaR1C1 = "=RC[-24]&RC[-23]&RC[-19]&RC[-18]&RC[-17]&RC[-16]"
'// Adds the formula results to an array
vData = .Resize(, 1).value
End With
'// Re dimension the array to the correct size
ReDim vArray(1 To UBound(vData, 1), 0)
'// Create a dictionary object using late binding
With CreateObject("Scripting.Dictionary")
'// Loop through each row in the array
For lRow = 1 To UBound(vData, 1)
'// Check if value exists in the array
If Not .exists(vData(lRow, 1)) Then
'// Value does not exist mark as non duplicate.
vArray(lRow, 0) = "x"
'// Add value to dictionary
.Add vData(lRow, 1), Nothing
End If
Next lRow
End With
'// Turn off screen updating to speed up code and prevent screen flicker
Application.ScreenUpdating = False
With ActiveSheet
'// Populate column z with the array
.Range("Z2").Resize(UBound(vArray, 1)) = vArray
'// Use error handling as speciallcells throws an error when none exist.
On Error Resume Next
'// Delete all blank cells in column z
.Range("Y35001", .Cells(Rows.Count, "Y").End(xlUp)).Offset(, 1).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
'// Remove error handling
On Error GoTo 0
'// Clear columns y and z
.Columns(25).Resize(, 2).ClearContents
End With
'// Turn screen updating back on.
Application.ScreenUpdating = True
End Sub
NOTE: you can change all references "activesheet" to your sheet codename if you want.
NOTE2: it assumes you have headers and has left row 1 alone.
I have used your columns and test data as best I can. Here is the test fill I used:
Sub TestFill()
For i = 1 To 37000
With Range("A" & i)
.value = Choose(Int(2 * Rnd + 1), "PTY", "GTY")
.Offset(, 1).value = Round((40000 * (Rnd + 1)), Choose(Int(4 * Rnd + 1), 1, 2, 3, 4))
.Offset(, 5).value = Int(4 * Rnd + 1)
.Offset(, 6).value = Int(2 * Rnd + 1)
.Offset(, 7).value = Choose(Int(2 * Rnd + 1), "230", "540")
.Offset(, 8).value = Int(3 * Rnd + 1)
End With
Next i
End Sub
Lets say you have your entries in column A, and you want the result of your formula in column B (but much faster). This VBA macro should do the trick:
Option Explicit
Sub FindDupes()
Dim dict As Object
Dim sh As Worksheet
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
Dim row As Long, value As String
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = sh.Range("A" & row).Text
If dict.Exists(value) Then
sh.Range("B" & row) = "False"
Else
sh.Range("B" & row) = "True"
dict.Add value, 1
End If
Next
End Sub
(Using a dictionary gives here almost linear running time, which should be a matter of seconds for 35.0000 rows, where your original formula had quadratic running time complexity).
Edit: due to your comment: you will have to fill the dictionary first by reading each entry at least once, that is something you cannot avoid easily. What you can avoid is to fill the rows of column B again when they are already filled:
Option Explicit
Sub FindDupes()
Dim dict As Object
Dim sh As Worksheet
Set dict = CreateObject("Scripting.Dictionary")
Set sh = ActiveSheet
Dim row As Long, value As String, newEntry As Boolean
For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
value = sh.Range("A" & row).Text
If dict.Exists(value) Then
newEntry = False
Else
newEntry = True
dict.Add value, 1
End If
If Trim(sh.Range("B" & row)) = "" Then sh.Range("B" & row) = newEntry
Next
End Sub
But I suspect this won't be much faster than my first solution.
Now that you have updated that you want the entire rows deleted and that the first 35000 rows are allowed to have dupes, here is a function that will do that for you. I think I came up with a clever method and it's blazing fast, too:
Sub RemoveNewDupes()
Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")
On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row
'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
oldDict.Add varray(i, 1), 1
Next
'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 35000 + UBound(varray, 1) To 35001 Step -1
If oldDict.exists(varray(i - 35000, 1)) = True Or _
newDict.exists(varray(i - 35000, 1)) = True Then
Range("A" & i).EntireRow.Delete
Else
newDict.Add varray(i - 35000, 1), 1
End If
Next
Application.ScreenUpdating = True
'A status message at the end for finishing touch
MsgBox UBound(varray, 1) - newDict.Count & _
" duplicate row(s) found and deleted."
End Sub
How it works:
First I store the 35000 cells into a dictionary file. Then I take a variant array of every cell 35001 onward and loop through them backwards to see if it's in the 35k dictionary or not, or that we haven't come across the value yet in the loop. If it finds that it's a dupe, it deletes the row.
The cool (if I may say) way that it does the row deletion is that when you create the varray, for say A35001 - A37000, it stores them as (1, 1) (2, 1) (...). So if you set "i" to the Ubound of the array + 35000 and step back to 35001, you will loop through all the additions backwardsfrom A37000 to A35001. Then when you want to delete the row, "i" is perfectly set to the row number the value was found in, so you can delete it. And since it goes backwards, it does not screw up the loop!