Removing duplicate data from columns in Excel - vba

I am having an issue with this code:
Sub text()
Dim iListCount As Integer
Dim x As Variant
Dim iCtr As Integer
' Turn off screen updating to speed up macro.
Application.ScreenUpdating = False
' Get count of records to search through (list that will be deleted).
iListCount = Sheets("sheet2").Cells(Rows.Count, "C").End(xlUp).Row
' Loop through the "master" list.
For Each x In Sheets("Sheet2").Range("A1:A" & Sheets("Sheet2").Cells(Rows.Count, "C").End(xlUp).Row)
' Loop through all records in the second list.
For iCtr = iListCount To 1 Step -1
' Do comparison of next record.
' To specify a different column, change 1 to the column number.
If x.Value = Sheets("Sheet2").Cells(iCtr, 3).Value Then
' If match is true then delete row.
Sheets("Sheet2").Cells(iCtr, 1).EntireRow.Delete
End If
Next iCtr
Next
Application.ScreenUpdating = True
MsgBox "Done!"
End Sub
It runs, and kind of works. It removes one duplicate but leaves all of the others. I am testing this so I'm using a small sample size, so I know that there are 5 duplicates, however I can't get this code to remove them all. Any ideas? I think its an issue with the loop but no matter what I change I can't get it to work

By deleting entire rows in the inner loop you are modifying the range that the outer loop is looping through in the middle of the loop. Such code is difficult to debug.
Your nested loop structure is essentially a series of linear searches. This makes the overall behavior quadratic in the number of rows and can slow the application to a crawl. One approach is to use a dictionary which can be used in VBA if your project includes a reference to Microsoft Scripting Runtime (Tools - References in the VBA editor)
The following sub uses a dictionary to delete all cells in column C which have a value that occurs in column A:
Sub text()
Dim MasterList As New Dictionary
Dim iListCount As Integer
Dim x As Variant
Dim iCtr As Integer
Dim v As Variant
Application.ScreenUpdating = False
' Get count of records in master list
iListCount = Sheets("sheet2").Cells(Rows.Count, "A").End(xlUp).Row
'Load Dictionary:
For iCtr = 1 To iListCount
v = Sheets("sheet2").Cells(iCtr, "A").Value
If Not MasterList.Exists(v) Then MasterList.Add v, ""
Next iCtr
'Get count of records in list to be deleted
iListCount = Sheets("sheet2").Cells(Rows.Count, "C").End(xlUp).Row
' Loop through the "delete" list.
For iCtr = iListCount To 1 Step -1
If MasterList.Exists(Sheets("Sheet2").Cells(iCtr, "C").Value) Then
Sheets("Sheet2").Cells(iCtr, "C").Delete shift:=xlUp
End If
Next iCtr
Application.ScreenUpdating = True
MsgBox "Done!"
End Sub

Another option would be to loop through the cells, use Find and FindNext to find the duplicates and add them to a range using Union(). You could then delete that range at the end of the routine. This solves the problem with deleting rows as you iterate over them, and should execute pretty quickly.
Note: This code is untested, you may need to debug it.
Sub text()
Dim cell As Range
Dim lastCell as Range
Dim masterList as Range
Dim matchCell as Range
Dim removeUnion as Range
Dim firstMatch as String
' Turn off screen updating to speed up macro.
Application.ScreenUpdating = False
With Sheets("sheet2").Range("A:A")
' Find the last cell with data in column A
Set lastCell = .Find("*", .Cells(1,1), xlFormulas, xlPart, xlByRows, xlPrevious)
' Set the master list range to the used cells within column A
Set masterList = .Range(.cells(1,1) , lastCell)
End With
' Loop through the "master" list.
For Each cell In masterList
' Look for a match anywhere within column "C"
With cell.Parent.Range("C:C")
Set matchCell = .Find(.Cells(1,1), cell.Value, xlValues, xlWhole, xlByRows)
'If we got a match, add it to the range to be deleted later and look for more matches
If Not matchCell is Nothing then
'Store the address of first match so we know when we are done looping
firstMatch = matchCell.Address
'Look for all duplicates, add them to a range to be deleted at the end
Do
If removeUnion is Nothing Then
Set removeUnion = MatchCell
Else
Set removeUnion = Application.Union(removeUnion, MatchCell)
End If
Set MatchCell = .FindNext
Loop While (Not matchCell Is Nothing) and matchCell.Address <> firstMatch
End If
'Reset the variables used in find before next loop
firstMatch = ""
Set matchCell = Nothing
End With
Next
If Not removeUnion is Nothing then removeUnion.EntireRow.Delete
Application.ScreenUpdating = True
MsgBox "Done!"
End Sub

Related

How can I delete all rows that do not include a specific value?

I have been struggling with this for a few hours and think it's probably time to ask for help.
I have hundreds of spreadsheets that I would like to manually open and then simplify using a macro. Each spreadsheet has a list of hospitals (approx 400) and I would like to limit each one to only showing data about 100 hospitals. The hospitals are identified by a three letter acronym in a column that varies in location (row/column) but is always titled "Code".
So, for example, I would like the macro to delete all rows that do not contain the values "Code", "ABC", "DEF", "GEH", etc.
I am not a regular Excel user and only need to use it to solve this one problem...!
I have tried the code attached but it has a couple of bugs:
It deletes rows that contain "ABC" as well. This problem goes away if I define Range("B1:B100") but not if the range extends across multiple columns (e.g. "A1:E100"). Frustratingly the "Code" column varies across the spreadsheets.
As I want to save 100 hospital codes, it feels as if there ought to be a better way than using the "Or" operator 100 times.
Can anyone help?
Sub Clean()
Dim c As Range
Dim MyRange As Range
LastRow = Cells(Cells.Rows.Count, "D").End(xlUp).Row
Set MyRange = Range("A1:E100")
For Each c In MyRange
If c.Value = "Code" Then
c.EntireRow.Interior.Color = xlNone
ElseIf c.Value = "ABC" Or c.Value = "DEF" Then
c.EntireRow.Interior.Color = vbYellow
Else
c.EntireRow.Delete
End If
Next
End Sub
Try this:
Option Explicit
Sub Clean()
Dim rngRow As Range
Dim rngCell As Range
Dim MyRange As Range
Dim blnDel As Boolean
Dim lngCount As Long
Set MyRange = Range("A1:E8")
For lngCount = MyRange.Rows.Count To 1 Step -1
blnDel = False
For Each rngCell In MyRange.Rows(lngCount).Cells
If rngCell = "ABC" Then
rngCell.EntireRow.Interior.Color = vbRed
blnDel = True
ElseIf rngCell = "DEF" Then
rngCell.EntireRow.Interior.Color = vbYellow
blnDel = True
End If
Next rngCell
If Not blnDel Then Rows(lngCount).Delete
Next lngCount
End Sub
In general, you need to loop through the rows, and then through each cell in every row. In order for the program to remember whether something should be deleted or not on a given row, between the two loops there is a blnDel, which deletes the row, if no DEF or ABC was found.
The problematic part in rows deletion in VBA, is that you should be careful to delete always the correct one. Thus, you should make a reversed loop, starting from the last row.
Option Explicit
Sub Clean()
Dim c As Range, MyRange As Range, DelRng As Range, Code As Range, CodeList As Range
Dim CodeCol As Long, LastRow As Long
''Uncomment the below. I'd put all of your codes into one sheet and then test if the value is in that range
'With CodeListSheet
' Set CodeList = .Range(.Cells(1, 1), .Cells(.Cells(.Rows.Count, 1).End(xlUp).Row, 1))
'End With
' Update this to point at the relevant sheet
' If you're looking at multiple sheets you can loop through the sheets starting your loop here
With Sheet1
Set Code = .Cells.Find("Code")
If Not Code Is Nothing Then
CodeCol = Code.Column
LastRow = .Cells(Cells.Rows.Count, CodeCol).End(xlUp).Row
Set MyRange = .Range(.Cells(1, CodeCol), .Cells(LastRow, CodeCol))
For Each c In MyRange
If c.Value2 = "Code" Then
c.EntireRow.Interior.Color = xlNone
'' Also uncomment this one to replace your current one
'ElseIf WorksheetFunction.CountIf(CodeList, c.Value2) > 0 Then
ElseIf UCase(c.Value2) = "ABC" Or c.Value2 = "DEF" Then
c.EntireRow.Interior.Color = vbYellow
Else
If DelRng Is Nothing Then
Set DelRng = c
Else
Set DelRng = Union(DelRng, c)
End If
End If
Next c
If Not DelRng Is Nothing Then DelRng.EntireRow.Delete
Else
MsgBox "Couldn't find correct column"
Exit Sub
End If
End With
End Sub

Deleting or keeping multiple rows by a specific word content

I'm trying to write a code that either deletes or keeps rows by a specific word input by the end-user.
I've created two button actions:
Sub Button1_Click()
Dim cell As Range
word1 = InputBox("Enter a word by which you want to keep rows", "Enter")
For Each cell In Selection
cell.EntireRow.Hidden = (InStr(1, cell, word1, 1) = 0) 'keep by a word input by the user
Next
End Sub
Sub Button2_Click()
Dim cell As Range
word2 = InputBox("Enter a word by which you want to delete rows", "Enter")
For Each cell In Selection
cell.EntireRow.Hidden = (InStr(1, cell, word2, 1) = 1) 'delete by a word input by the user
Next
End Sub
However, these buttons don't work quite the way I would like them to do.
Problems:
1) I have to specifically select the cells in the column of the text to be searched; if I select the whole block of data,everything will be deleted.
2) Actually, the program would be handier, if it did its magic from the cell J22 onwards (to the right and downwards) until the end of the data is reached, without the need to select anything. What is the best way to do this?
3) If I use these buttons several times sequentially, the rows that I've already deleted keep popping up again. How to make the delete "permanent" each time I use one of these buttons? By changing Hidden to Delete I start to get run-time errors.
When you attempt to delete permanently the macro deletes a row, shifts all of the other rows up one to accomodate and this disrupts the flow of your 'For Each...Next'.
There are a couple of ways around this either way it very much changes the shape of your code.
One of them is to add the rows you wish to delete to a union during the loop and then delete the union outside of the loop (example A below). In any case it sounds like you want to specify the range you want this code to work on so I've incorporated that into each example.
Example A
Sub Button1_Click()
Dim endR As Integer, endC As Integer 'depending on size of sheet may need to change to Long
Dim cell As Range, rng As Range, U As Range
Dim ws As Worksheet
Set ws = Sheets(2) ' change accordingly
endR = ws.UsedRange.Rows.Count
endC = ws.UsedRange.Columns.Count
Set rng = Range(ws.Cells(22, 10), ws.Cells(endR, endC)) ' from cell J22 to last used row of the last used column on the right
word1 = InputBox("Enter a word by which you want to keep rows", "Enter")
For Each cell In rng
If InStr(1, cell, word1, 1) = 0 Then
If U Is Nothing Then ' for the first time the code finds a match
Set U = cell.EntireRow ' add row to be deleted to U variable
Else
Set U = Union(U, cell.EntireRow) ' for any subsequent matches, add row to be deleted to Union
End If
End If
Next
U.Delete
End Sub
The other way to do it would be to define the exact ranges you want to work with at the start of your code and then loop backwards through that range using loop control variables instead of for each, that way when you delete a row, the shift up doesn't impact the loop.
Sub Button2_Click()
Dim r As Integer, c As Integer
Dim endR As Integer, endC As Integer
Dim cell As Range, rng As Range
Dim ws As Worksheet
Set ws = Sheets(2) ' change accordingly
endC = ws.UsedRange.Columns.Count
word2 = InputBox("Enter a word by which you want to delete rows", "Enter")
For c = 10 To endC ' start from J and move to the right
endR = ws.UsedRange.Rows.Count ' after each column has been dealt with, re-evaluate the total rows in the worksheet
For r = endR To 22 Step -1 ' start from the last row and work up
If InStr(1, ws.Cells(r, c), word2, 1) = 1 Then
ws.Cells(r, c).EntireRow.Delete
End If
Next r
Next c
End Sub
With your current code, if you select the whole block of data, it checks each cell in that selection individually and acts accordingly. If you have a range selected like A1:J1,000, it will hide every row unless each cell in every row of the selection contains the input word.
Depending on what you exactly want, you could try something Sheets("Sheet1").Cells(Sheets("Sheet1").Rows.count, 10).End(xlUp).Row this returns the rownumber of the last cell in column 10(J), more examples of this in the code below
This is caused by the for loop and the deletion of rows, say For i = 1 To 100 you check cells A1 to A100, if you then delete a row during that loop, the loop will still continue to 100 and not end at 99, the end of the loop is set before the loop starts and does not change during the loop. More information on that and it's solutions here.
General
Avoid the .Select/.Activate methods and .Selection property, it is a source of many bugs.
Declare all your variables, use Option Explicit to enforce this.
Here is the refactored code with annotations.
Option Explicit
Sub Button1_Click()
'Keep rows based on input
'Declaration of variables
Dim i As Long
Dim strFilterWord As String
Dim rngCell As Range
Dim rngToDelete As Range, rngRow As Range
Dim arrRow() As Variant, arrTmp() As Variant
'Setting the filter word
strFilterWord = InputBox("Enter a word by which you want to keep rows", "Enter")
With ThisWorkbook.Worksheets("Sheet1") 'Replace "Sheet1" with the actual name of your sheet.
'Setting up for loop, currently range to loop over is J22:J(lastrow with data)
For Each rngCell In .Range(.Cells(22, 10), .Cells(Rows.Count, 10).End(xlUp))
'All values of the current row are combined into an array
'Determining and setting the range of the current row
Set rngRow = rngCell.Resize(1, 3)
'Populate a tmp array with the row range values
arrTmp = rngRow
'To use the array, it needs to be 1D, currently it is 2D, section below accomplishes this
'resize the final array
ReDim arrRow(LBound(arrTmp, 2) To UBound(arrTmp, 2))
'Copy values to final array
For i = LBound(arrTmp, 2) To UBound(arrTmp, 2)
arrRow(i) = arrTmp(1, i)
Next i
'the final array is combined to a single string value with " "(spaces) between each array element
'if the filterword is not found in the string Instr returns a 0
'If the filterword is found in the string InStr returns a number corresponding to the start position.
If InStr(1, Join(arrRow, " "), strFilterWord, vbTextCompare) = 0 Then
'Test to see if the range to delete is empty or not
If rngToDelete Is Nothing Then
'If the range is empty, it is set to the first row to delete.
Set rngToDelete = rngCell.EntireRow
Else
'if the range is not empty, the row to delete is added to the range.
Set rngToDelete = Union(rngToDelete, rngCell.EntireRow)
End If
End If
Next rngCell
'After all cells are looped over, the rows to delete are deleted in one go
If Not rngToDelete Is Nothing Then rngToDelete.Delete
End With
End Sub
Sub Button2_Click()
'Keep rows based on input
'Declaration of variables
Dim i As Long
Dim strFilterWord As String
Dim rngCell As Range
Dim rngToDelete As Range, rngRow As Range
Dim arrRow() As Variant, arrTmp() As Variant
'Setting the filter word
strFilterWord = InputBox("Enter a word by which you want to delete rows", "Enter")
With ThisWorkbook.Worksheets("Sheet1") 'Replace "Sheet1" with the actual name of your sheet.
'Setting up for loop, currently range to loop over is J22:J(lastrow with data)
For Each rngCell In .Range(.Cells(22, 10), .Cells(Rows.Count, 10).End(xlUp))
'All values of the current row are combined into an array
'Determining and setting the range of the current row
Set rngRow = rngCell.Resize(1, 3)
'Populate a tmp array with the row range values
arrTmp = rngRow
'To use the array, it needs to be 1D, currently it is 2D, section below accomplishes this
'resize the final array
ReDim arrRow(LBound(arrTmp, 2) To UBound(arrTmp, 2))
'Copy values to final array
For i = LBound(arrTmp, 2) To UBound(arrTmp, 2)
arrRow(i) = arrTmp(1, i)
Next i
'the final array is combined to a single string value with " "(spaces) between each array element
'if the filterword is not found in the string Instr returns a 0
'If the filterword is found in the string InStr returns a number corresponding to the start position.
If InStr(1, Join(arrRow, " "), strFilterWord, vbTextCompare) > 0 Then
'Test to see if the range to delete is empty or not
If rngToDelete Is Nothing Then
'If the range is empty, it is set to the first row to delete.
Set rngToDelete = rngCell.EntireRow
Else
'if the range is not empty, the row to delete is added to the range.
Set rngToDelete = Union(rngToDelete, rngCell.EntireRow)
End If
End If
Next rngCell
'After all cells are looped over, the rows to delete are deleted in one go
If Not rngToDelete Is Nothing Then rngToDelete.Delete
End With
End Sub
This should do the trick
Option Explicit
Sub DeletingRowContainingSpecificText()
Dim DataWorkSheet As Worksheet
'Change "ThisWorkBook" an "Sheet1" as you require
Set DataWorkSheet = ThisWorkbook.Worksheets("Sheet1")
Dim LastRow As Long
Dim LastColumn As Long
With DataWorkSheet.UsedRange
LastRow = .Rows(.Rows.Count).Row
LastColumn = Columns(.Columns.Count).Column
End With
Dim word1 As String
word1 = InputBox("Enter a word by which you want to keep rows", "Enter")
Dim RowRange As Range
Dim RowReference As Long
Dim RowContent As String
Dim WordFound As Variant
'When ever you are deleting you need to start at the end and work your way back
'Otherwise the row after the row you deleted becomes the current row
For RowReference = LastRow To 22 Step -1
'Setting the Row Range from Column J to the end for a specific row
Set RowRange = ThisWorkbook.Worksheets("Sheet1").Range(Cells(RowReference, "J"), Cells(RowReference, LastColumn))
Set WordFound = RowRange.Find(What:=word1, LookIn:=xlValues)
If Not WordFound Is Nothing Then
'Choose if you want to delete or hidden
RowRange.EntireRow.Hidden = True
RowRange.EntireRow.Delete
End If
Next RowReference
End Sub
Just paste the Sub Content into your Button1_Click Sub. Otherwise paste this into your WorkBook Module and then test if it is working first.
I did test it and it worked for me.
NB when working with Deleting Rows or Columns always start at the end of the list and work your way to the beginning, this way the reference doesn't get messed up.
the problem resides in using Selection. You should avoid it at all costs!
If the data always is in the same region, this becomes quite simple. Try something like:
Sub Button1_Click()
Dim cell As Range
Dim rData as Range
'Assigns the range for J22 and adjacent rows and columns
Set rData = ActiveSheet.Range("J22").CurrentRegion
word1 = InputBox("Enter a word by which you want to keep rows", "Enter")
For Each cell In rData
If (InStr(1, cell, word1, 1) = 0) then cell.EntireRow.Delete
Next cell
End Sub
As you are not using Selection anymore, your 3 points get solved

Making VBA-Excel code more Efficient

I am running this vba code in Excel, it copies a columns from sheet 1, pastes it into sheet two. It then compares it to a column in sheet two before deleting any duplicates.
Private Sub CommandButton1_Click()
Dim MasterList As New Dictionary
Dim iListCount As Integer
Dim x As Variant
Dim iCtr As Integer
Dim v As Variant
Dim counter As Integer, i As Integer
counter = 0
Sheets("Sheet2").Select
Sheets("Sheet2").Range("M:M").Select
Selection.ClearContents
Sheets("Sheet1").Select
Sheets("Sheet1").Range("C:C").Select
Selection.Copy
Sheets("Sheet2").Select
Sheets("Sheet2").Range("M1").Select
ActiveSheet.Paste
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
' Get count of records in master list
iListCount = Sheets("sheet2").Cells(Rows.Count, "A").End(xlUp).Row
'Load Dictionary:
For iCtr = 1 To iListCount
v = Sheets("sheet2").Cells(iCtr, "A").value
If Not MasterList.Exists(v) Then MasterList.Add v, ""
Next iCtr
'Get count of records in list to be deleted
iListCount = Sheets("sheet2").Cells(Rows.Count, "M").End(xlUp).Row
'Loop through the "delete" list.
For iCtr = iListCount To 1 Step -1
If MasterList.Exists(Sheets("Sheet2").Cells(iCtr, "M").value) Then
Sheets("Sheet2").Cells(iCtr, "M").Delete shift:=xlUp
End If
Next iCtr
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
MsgBox "Done!"
End Sub
There is just under 30,000 rows that it has to compare so I know that it is always going to take some time, but I was wondering if there was any way to speed it up or even just make my code more streamline and efficient.
Don't copy and paste from sheet 1 to sheet 2. Store the values from both sheets in arrays:
Dim v1 as variant, v2 as variant
v1 = Sheet1.Range("C:C").Value
v2 = Sheet2.Range("A1").Resize(iListCount,1).Value
Then read the values in v1 into a dictionary, loop through the values in v2 and check if each of them exists in the dictionary or not. If they exist, remove the item from the dictionary.
This will make it a bit more efficient
Dim MasterList As New Dictionary
Dim iListCount As Integer
Dim x As Variant
Dim iCtr As Integer
Dim v As Variant
Dim counter As Integer, i As Integer
counter = 0
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
With Sheets("Sheet2")
.Range("M:M").ClearContents
Sheets("Sheet1").Range("C:C").Copy
.Range("M1").Paste
' Get count of records in master list
iListCount = .Cells(Rows.Count, "A").End(xlUp).Row
'Load Dictionary:
For iCtr = 1 To iListCount
v = .Cells(iCtr, "A").Value
If Not MasterList.Exists(v) Then MasterList.Add v, ""
Next iCtr
'Get count of records in list to be deleted
iListCount = .Cells(Rows.Count, "M").End(xlUp).Row
' Loop through the "delete" list.
For iCtr = iListCount To 1 Step -1
If MasterList.Exists(.Cells(iCtr, "M").Value) Then
.Cells(iCtr, "M").Delete shift:=xlUp
End If
Next iCtr
End With
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
MsgBox "Done!"
If you really wanted to make it more effceint I would change below
' Loop through the "delete" list.
For iCtr = iListCount To 1 Step -1
If MasterList.Exists(.Cells(iCtr, "M").Value) Then
.Cells(iCtr, "M").Delete shift:=xlUp
End If
Next iCtr
So that you miss the sheet. e.g. delete them out of the dictionary and then clear the list and then output the dictionary in one line of code. Accessing the sheet is the costly part in terms of CPU use, limit how many times you access the sheet for much much faster code. you could also try to remove the loop for reading entries in and try and do that in one line of code too
Slow parts to consider
.Cells(iCtr, "A").Value
and probably causing most of the time below
.Cells(iCtr, "M").Delete shift:=xlUp
Here is my version of optimized code.
Comments about the concepts used are put in the code.
Private Sub CommandButton1_Click()
Dim MasterList As New Dictionary
Dim data As Variant
Dim dataSize As Long
Dim lastRow As Long
Dim row As Long
Dim value As Variant
Dim comparisonData As Variant
Dim finalResult() As Variant
Dim itemsAdded As Long
'-----------------------------------------------------------------
'First load data from column C of [Sheet1] into array (processing
'data from array is much more faster than processing data
'directly from worksheets).
'Also, there is no point to paste the data to column M of Sheet2 right now
'and then remove some of them. We will first remove unnecessary items
'and then paste the final set of data into column M of [Sheet2].
'It will reduce time because we can skip deleting rows and this operation
'was the most time consuming in your original code.
With Sheets("Sheet1")
lastRow = .Range("C" & .Rows.Count).End(xlUp).row
data = .Range("C1:C" & lastRow)
End With
'We can leave this but we don't gain much with it right now,
'since all the operations will be calculated in VBA memory.
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
'We make the same operation to load data from column A of Sheet2
'into another array - [comparisonData].
'It can seem as wasting time - first load into array instead
'of directly iterating through data, but in fact it will allow us
'to save a lot of time - since iterating through array is much more
'faster than through Excel range.
With Sheets("Sheet2")
lastRow = .Range("A" & .Rows.Count).End(xlUp).row
comparisonData = .Range("A1:A" & lastRow)
End With
'Iterate through all the items in array [comparisonData] and load them
'into dictionary.
For row = LBound(comparisonData, 1) To UBound(comparisonData, 1)
value = comparisonData(row, 1)
If Not MasterList.Exists(value) Then
Call MasterList.Add(value, "")
End If
Next row
'Change the size of [finalResult] array to make the place for all items
'assuming no data will be removed. It will save some time because we
'won't need to redim array with each iteration.
'Some items of this array will remain empty, but it doesn't matter
'since we only want to paste it into worksheet.
'We create 2-dimensional array to avoid transposing later and save
'even some more time.
dataSize = UBound(data, 1) - LBound(data, 1)
ReDim finalResult(1 To dataSize, 1 To 1)
'Now iterate through all the items in array [data] and compare them
'to dictionary [MasterList]. All the items that are found in
'[MasterDict] are added to finalResult array.
For row = LBound(data, 1) To UBound(data, 1)
value = data(row, 1)
If MasterList.Exists(value) Then
itemsAdded = itemsAdded + 1
finalResult(itemsAdded, 1) = value
End If
Next row
'Now the finalResult array is ready and we can print it into worksheet:
Dim rng As Range
With Sheets("Sheet2")
Call .Range("M:M").ClearContents
.Range("M1").Resize(dataSize, 1) = finalResult
End With
'Restore previous settings.
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
MsgBox "Done!"
End Sub

Excel Moving duplicate values to new sheet

I have compiled this code from bit and pieces I have found - I am by no means an expert - more of an eager student - This code works for me but now I need to keep the first occurrence of the duplicate row to stay on the original worksheet and move only the subsequent occurrence(s) to the newly created sheet.
I am willing to redo all the code if needed but would prefer to modify the existing vba for the sake of time
Sub moveduplicates
'***************************************************************
'** This proc expects you to select all the cells in a single **
'** column that you want to check for duplicates in. If dup- **
'** licates are found, the entire row will be copied to the **
'** predetermined sheet. **
'***************************************************************
Set Rng = ActiveCell
'Sticky_Selection()
Dim s As Range
Set s = Selection
Cells.EntireColumn.Hidden = False
Worksheets.Add(After:=Worksheets(Worksheets.Count)).Name = "Duplicate Values"
Sheets("Data").Select
Range("A2").Select
Range(Selection, Selection.End(xlToRight)).Select
Selection.Copy
Sheets("Duplicate Values").Select
Range("A1").Select
ActiveSheet.Paste
s.Parent.Activate
s.Select 'NOT Activate - possibly more than one cell!
Dim ShO As Worksheet
Dim Rng1 As Range
Dim pRow As Integer
Dim c As Range, cTmp As Range
Dim found
Dim Addresses() As String
Dim a() As String
Dim p2 As Integer
Dim tfFlag As Boolean, sTmp As Variant
Set ShO = Worksheets("Duplicate Values") 'You can change this to whatever worksheet name you want the duplicates in Set Rng1 = Application.InputBox("Select a range", "Obtain Range Object", Type:=8)
MsgBox "The cells selected were " & Rng.Address 'Rng1 is all the currently selected cells
pRow = 2 'This is the first row in our output sheet that will be used to record duplicates
ReDim a(0) 'Initialize our array that holds found values
For Each c In Rng1.Cells 'Cycle through each cell in our selected range
ReDim Addresses(0) 'This array holds the cell address for our duplicates.
'We will reset the array each time we move to the next cell
Now check the array of already found duplicates.
If the current value is already there skip to next value
tfFlag = False
For Each sTmp In a
If CStr(c.Value) = sTmp Or CStr(c.Value) = "xXDeleteXx" Then 'We've already done this value, move on
tfFlag = True
Exit For
End If
Next
If Not tfFlag Then 'Remember the flag is true when we have already located the
'duplicates for this value, so skip to next value
With Rng1
Set found = .Find(c.Value, LookIn:=xlValues) 'Search entire selected range for value
If Not found Is Nothing Then 'Found it
Addresses(0) = found.Address 'Record the address we found it
Do 'Now keep finding occurances of it
Set found = .FindNext(found)
If found.Address <> Addresses(0) Then
ReDim Preserve Addresses(UBound(Addresses) + 1)
Addresses(UBound(Addresses)) = found.Address
End If
Loop While Not found Is Nothing And found.Address <> Addresses(0) 'Until we get back to the original address
If UBound(Addresses) > 0 Then 'We Found Duplicates
a(UBound(a)) = c.Value 'Record the value we found a duplicate for in an array
'ReDim Preserve a(UBound(a) + 1) 'add an empty spot to the array for next value
'ShO.Range("A" & pRow).Value = "Duplicate Rows for Value " & c.Value & _
" in Column " & c.Column & " on original sheet" 'Add a label row
'pRow = pRow + 1 'Increment to the next row
For p2 = UBound(Addresses) To 0 Step -1 'Cycle through the duplicate addresses
Set cTmp = Rng1.Worksheet.Range(Addresses(p2)) 'we just want to easily get the correct row to copy
Rng1.Worksheet.Rows(cTmp.Row).Copy ShO.Rows(pRow) 'Copy form orig to duplicates sheet
cTmp.Value = "xXDeleteXx" 'Mark for Delete the original row
pRow = pRow + 1 'Increment row counter
Next p2
'Row = pRow + 1 'This increment will give us a blank row between sets of duplicates
End If
End If
End With
End If
Next
'Now go delete all the marked rows
Do
tfFlag = False
For Each c In Rng1
If c.Value = "xXDeleteXx" Then
Rng1.Worksheet.Rows(c.Row).Delete (xlShiftUp)
tfFlag = True
End If
Next
Loop Until tfFlag = False
'AutoFit Every Worksheet Column in a Workbook
For Each sht In ThisWorkbook.Worksheets
sht.Cells.EntireColumn.AutoFit
Next sht
Application.Goto Rng
End
End Sub
Thank you very much for your time and consideration
You can use a scripting Dictionary object to keep track of duplicates:
Sub RemoveDups()
Dim c As Range, dict, rngDel As Range, rw As Long
Dim wb As Workbook
Dim shtDups As Worksheet
Dim rng1 As Range
Set rng1 = Selection 'assuming you've selected a single column of values
' from which you want to remove dups
Set wb = ActiveWorkbook
Set shtDups = wb.Worksheets.Add( _
after:=wb.Worksheets(wb.Worksheets.Count))
shtDups.Name = "Duplicate Values"
With rng1.Parent
.Range(.Range("A2"), .Range("A2").End(xlToRight)).Copy _
shtDups.Range("A1")
End With
rw = 2
Set dict = CreateObject("scripting.dictionary")
For Each c In rng1.Cells
'already seen this value?
If dict.exists(c.Value) Then
c.EntireRow.Copy shtDups.Cells(rw, 1)
rw = rw + 1
'add row to "delete" range
If rngDel Is Nothing Then
Set rngDel = c
Else
Set rngDel = Application.Union(c, rngDel)
End If
Else
'first time for this value - add to dictionary
dict.Add c.Value, 1
End If
Next c
'delete all duplicate rows (if found)
If Not rngDel Is Nothing Then
rngDel.EntireRow.Delete
End If
End Sub
Another enthusiastic amateur here!
Not really answering your question, but here is a little function I use for removing duplicate rows:
Sub RemoveDupes(TempWB As Workbook, TargetSheet As String, ConcatCols As String, DeleteTF As Boolean)
Dim Counter As Integer
Dim Formula As String
Dim RowCount As Integer
Dim StartingCol As String
Dim CurrentRow As Integer
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Remove duplicate rows on a worksheet '
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Prerequisites:
' - Data needs to start # A1
' - Data has headings in row 1
' determine number of rows to be processed
RowCount = TempWB.Sheets(TargetSheet).Cells(TempWB.Sheets(TargetSheet).Rows.Count, "A").End(xlUp).Row
' insert a column to hold the calculate unique key
TempWB.Sheets(TargetSheet).Columns("A:A").Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
' add a heading
TempWB.Sheets(TargetSheet).Cells(1, 1).Value = "Duplication Check"
' insert the unique key formula
For CurrentRow = 2 To RowCount
' start the formula string
Formula = "="
' construct the formula
For Counter = 1 To Len(ConcatCols)
' if we are on the last element, dont add another '&'
If Counter = Len(ConcatCols) Then
Formula = Formula & AddLetter(Mid(ConcatCols, Counter, 1)) & CurrentRow
Else
Formula = Formula & AddLetter(Mid(ConcatCols, Counter, 1)) & CurrentRow & "&"
End If
' Debug.Print Mid(ConcatCols, Counter, 1)'Next
' next element!
Next
' insert the newly constructed formula
TempWB.Sheets(TargetSheet).Cells(CurrentRow, "A").Formula = Formula
' next row
Next
' unfortunately we need to use explicit selection here *sigh*
TempWB.Sheets(TargetSheet).Activate
' to select the range we are going to test
TempWB.Sheets(TargetSheet).Range("A2:A" & TempWB.Sheets(TargetSheet).Cells(Rows.Count, "A").End(xlUp).Row).Select
' clock down the list flagging each dupe by changing the text color
Dim d As Object, e
Set d = CreateObject("scripting.dictionary")
For Each e In Intersect(Columns(ActiveCell.Column), ActiveSheet.UsedRange)
If e.Value <> vbNullString Then
If Not d.exists(e.Value) Then d(e.Value) = 1 Else _
e.Font.ColorIndex = 4
End If
Next
' if the delete flag is set...
If DeleteTF Then
' then go down the list deleting rows...
For CurrentRow = RowCount To 2 Step -1
' if the row has been highlighted, its time to go...
If TempWB.Sheets(TargetSheet).Cells(CurrentRow, "A").Font.ColorIndex = 4 Then
TempWB.Sheets(TargetSheet).Cells(CurrentRow, "A").EntireRow.Delete
End If
Next
' If we are deleting rows, remove the column just like we were never here
TempWB.Sheets(TargetSheet).Cells(1, "A").EntireColumn.Delete
End If
End Sub
Function AddLetter(Letter As String)
' gives you the next letter
AddLetter = Split(Cells(, Range(Letter & 1).Column + 1).Address, "$")(1)
End Function
When I get a sec I will have a go adapting this to your requirements...
This will search a specified column for duplicates, copying subsequent duplicates entries to Sheet2 and then remove them from Sheet1.
I've used the Scripting Dictionary too but you will need to add a reference to "Microsoft Scripting Runtime" for the code to work as-is. (Adding the reference will help if you want to learn about dictionaries since it adds the Dictionary to Intellitype code completion stuff)
Sub Main()
Dim SearchColumn As Integer: SearchColumn = 2 ' column to search for duplicates
Dim Source As Worksheet: Set Source = ThisWorkbook.Worksheets("Sheet1")
Dim Duplicates As Worksheet: Set Duplicates = ThisWorkbook.Worksheets("Sheet2")
Dim List As Dictionary: Set List = New Dictionary ' used to hold the first instance of unique items
Dim Data As Variant ' holds a copy of the column you want to search
Dim Count As Integer ' hold the size of said column
Dim Index As Integer ' iterator for data
Dim Item As String ' holds the current item
Count = Source.Cells(Source.Rows.Count, SearchColumn).End(xlUp).Row
Set Data = Source.Range(Source.Cells(1, SearchColumn).Address, Source.Cells(Count, SearchColumn).Address)
Application.ScreenUpdating = False
' first loop, find unique items and copy duplicates
For Index = 1 To Count
Item = Data(Index, 1)
If List.Exists(Item) = False Then
' add the item to our dictionary of items
List.Add Item, Index
Else
' add item to duplicates sheet as its a duplicate
Source.Rows(Index).Copy
Duplicates.Rows(1).Insert xlShiftDown
End If
Next Index
' second loop, remove duplicates from original sheet
For Index = Count To 1 Step -1
Item = Data(Index, 1)
If List.Exists(Item) Then
If Not List(Item) = Index Then
' the item is a duplicate and needs to be removed
Source.Rows(Index).Delete
End If
End If
Next Index
Application.ScreenUpdating = True
End Sub

Excel Macro, read a worksheet, remove lines with no data based off value in a column

I'm trying to read a column, which has a numerical value, to indicate whether or not to search that row to see if there is any data contained within the specified range of that row. If there is no data contained within the range, select that row to be deleted. There will be many rows to be deleted once it has looped through the worksheet.
For example, in column "C" when the value "0" is found, search that row to see if there is any data contained in the cells, the cell range to search for empty cells in that row is D:AM. If the cells in the range are empty, then select that row and delete it. The entire row can be deleted. I need to do this for the entire worksheet, which can contain up to 20,000 rows. The problem I'm having is getting the macro to read the row, once the value 0 is found, to determine if the range of cells(D:AM) are empty. Here is the code I have thus far:
Option Explicit
Sub DeleteBlankRows()
'declare variables
Dim x, curVal, BlankCount As Integer
Dim found, completed As Boolean
Dim rowCount, rangesCount As Long
Dim allRanges(10000) As Range
'set variables
BlankCount = 0
x = 0
rowCount = 2
rangesCount = -1
notFirst = False
'Select the starting Cell
Range("C2").Select
'Loop to go down Row C and search for value
Do Until completed
rowCount = rowCount + 1
curVal = Range("C" & CStr(rowCount)).Value
'If 0 is found then start the range counter
If curVal = x Then
found = True
rangesCount = rangesCount + 1
'reset the blanks counter
BlankCount = 0
'Populate the array with the correct range to be selected
Set allRanges(rangesCount) = Range("D" & CStr(rowCount) & ":AM" & CStr(rowCount))
ElseIf (found) Then
'if the cell is blank, increment the counter
If (IsEmpty(Range("I" & CStr(rowCount)).Value)) Then BlankCount = BlankCount + 1
'if counter is greater then 20, reached end of document, stop selection
If BlankCount > 20 Then Exit Do
End If
'In the safest-side condition to avoid an infinite loop in case of not of finding what is intended.
If (rowCount >= 25000) Then Exit Do
Loop
If (rangesCount > 0) Then
'Declare variables
Dim curRange As Variant
Dim allTogether As Range
'Set variables
Set allTogether = allRanges(0)
For Each curRange In allRanges
If (Not curRange Is Nothing) Then Set allTogether = Union(curRange, allTogether)
Next curRange
'Select the array of data
allTogether.Select
'delete the selection of data
'allTogether.Delete
End If
End Sub
The end of the document is being determined by Column C when it encounters 20 or more blank cells the worksheet has reached its end. Thanks in advance for your input!
This should work for you. I have commented the code to help give it clarity:
Sub DeleteBlankRows()
Dim rngDel As Range
Dim rngFound As Range
Dim strFirst As String
'Searching column C
With Columns("C")
'Find "0" in column C
Set rngFound = .Find(0, .Cells(.Cells.Count), xlValues, xlWhole)
If Not rngFound Is Nothing Then
'Remember first one found
strFirst = rngFound.Address
Do
'Check if there is anything within D:AM on the row of this found cell
If WorksheetFunction.CountA(Intersect(rngFound.EntireRow, .Parent.Range("D:AM"))) = 0 Then
'There is nothing, add this row to rngDel
Select Case (rngDel Is Nothing)
Case True: Set rngDel = rngFound
Case Else: Set rngDel = Union(rngDel, rngFound)
End Select
End If
'Find next "0"
Set rngFound = .Find(0, rngFound, xlValues, xlWhole)
'Advance loop; exit when back to the first one
Loop While rngFound.Address <> strFirst
End If
End With
'Delete all rows added to rngDel (if any)
If Not rngDel Is Nothing Then rngDel.EntireRow.Delete
End Sub