Remove duplicate values and cells from one column - vba

I have tried many methods, from RemoveDuplicates to selections and Scripting dictionaries, and I cannot get this to work. I understand there are multiple ways to do this, but if any of you can help, that would be great.
I have one list of values that I am pulling through from another sheet (up to approximately 80k rows) into cell B13 downwards. I am then trying to remove the duplicate values and cells so that I am left with unique values, which I can then use to perform lookups on other sheets.
Sub Address_Sage()
    Dim dataBook As Workbook
    Dim dict As Object
    Dim Sage_Data As Worksheet, Address As Worksheet
    Dim dataSource As Range, dataDest As Range
    Dim sourceDataRowCount As Integer, index As Integer
    Dim rowCount As Long
    Dim strVal As String
    Set dataBook = Application.ThisWorkbook
    Set sheetSource = dataBook.Sheets("Sage_Data")
    Set sheetDest = dataBook.Sheets("Address")
    Set dict = CreateObject("Scripting.Dictionary")
    Set dataSource = sheetSource.Range("A3", _
        sheetSource.Range("A90000").End(xlUp))
    sourceDataRowCount = dataSource.Rows.Count
    Set dataDest = sheetDest.Range("B13", "B" & _
        sourceDataRowCount)
    For index = 1 To sourceDataRowCount
        dataDest(index, 1).Value = dataSource(index, 1).Value
    Next index
    Sheets("Address").Select
    rowCount = ActiveSheet.Range("B13").CurrentRegion.Rows.Count
    Do While rowCount > 0
        strVal = Address.Cells(rowCount, 1).Value2
        If dict.exists(strVal) Then
            ActiveSheet.Rows(rowCount).EntireRow.Delete
        Else
            dict.Add strVal, 0
        End If
        rowCount = rowCount - 1
    Loop
    'Set dict = Nothing
End Sub
It always gets stuck on the strVal line. I have tried changing Value2 to Value1, as I only have one column, but no luck.
Thank you.

I'm not very experienced in VBA, so I can't speak to exactly what your code is doing, but I thought I'd share this with you. Last week I had to create a macro workbook that returned the unique entries of electrical defects that different crews observed on the job. I made a dictionary that read all of the entries in the spreadsheet and then printed the unique ones. I'll post the code and try to walk you through it.
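' Runs inside a loop over i, within a With block for the sheet being read
If .Range("A" & i) <> "" Then
    data = .Range("A" & i).Value
    If dict.Exists(data) Then
        dict(data) = dict(data) + 1
    Else
        dict.Add Key:=data, Item:=1
    End If
End If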
So the code basically says: if the cell in column A (i is simply a row counter) is not empty, read its value. data is simply a variable holding the value you want to store in the dictionary. Dictionary keys are unique and cannot repeat, so the code asks whether the key already exists in the dictionary. If so, we add one to its count (the item); if not, we add that key to the dictionary with a count of 1. At the end of your loop, your dictionary will have stored all unique entries and the number of times they appeared.
Now we can reference them or print them.
Dim keys As Variant, r As Long
keys = dict.Keys 'zero-based array of the unique entries
For r = 0 To dict.Count - 1
    Sheets("Results").Range("B" & iResults) = keys(r)
    Sheets("Results").Range("C" & iResults) = dict(keys(r))
    iResults = iResults + 1 'move to the next output row
Next r
This second piece of code loops from 0 to dict.Count - 1. It starts at zero because dict.Keys returns a zero-based array, so the last index is one less than the number of entries. The first statement writes each unique key to column B, and the second writes the item associated with it to column C: the number of times that entry showed up in your data. iResults is the output row, incremented on each pass so that every key gets its own row.
You can use this same method for other purposes as well: not just printing the data, but referencing it and using it somewhere else. I am sure you will find that a For loop over dict.Keys is the easiest way to run through your dictionary entries. It took me a few days to figure out, and it revolutionized my program. Hope this helps you out.
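For the original question, a likely cause of the error on the strVal line is that Address is declared but never Set (the code sets sheetDest instead), so Address.Cells raises "Object variable or With block variable not set". Two further problems: Integer variables overflow above 32,767, which 80k rows will exceed, and Cells(rowCount, 1) reads column A although the values were written to column B. Below is a minimal sketch of the dictionary approach, assuming the sheet names and ranges from the question (source data in Sage_Data!A3 downwards, unique values written to Address!B13 downwards). DistinctColumn and Address_Sage_Sketch are hypothetical names. Reading the column into an array and writing the result back once avoids deleting rows one at a time, which is typically slow on tens of thousands of rows.

```vba
Option Explicit

' Returns the distinct non-empty values from the first column of a 2-D array
' (such as Range.Value) as a 1-based, n x 1 array in first-seen order.
' Returns Empty when there are no values.
Public Function DistinctColumn(ByVal values As Variant) As Variant
    Dim dict As Object, i As Long, n As Long, k As Variant
    Dim out() As Variant
    Set dict = CreateObject("Scripting.Dictionary")   ' late bound: no reference needed
    For i = LBound(values, 1) To UBound(values, 1)
        If Not IsEmpty(values(i, 1)) Then
            If Not dict.Exists(values(i, 1)) Then dict.Add values(i, 1), Empty
        End If
    Next i
    If dict.Count = 0 Then Exit Function
    ReDim out(1 To dict.Count, 1 To 1)
    For Each k In dict.Keys                          ' keys come back in insertion order
        n = n + 1
        out(n, 1) = k
    Next k
    DistinctColumn = out
End Function

' Hypothetical replacement for Address_Sage, using the ranges from the question.
Public Sub Address_Sage_Sketch()
    Dim src As Worksheet, dest As Worksheet
    Dim lastRow As Long, data As Variant, uniques As Variant
    Set src = ThisWorkbook.Worksheets("Sage_Data")
    Set dest = ThisWorkbook.Worksheets("Address")
    lastRow = src.Cells(src.Rows.Count, "A").End(xlUp).Row   ' Long, so no overflow at 80k rows
    If lastRow < 3 Then Exit Sub                              ' nothing below A3
    If lastRow = 3 Then
        ReDim data(1 To 1, 1 To 1)                            ' a single cell's .Value is not an array
        data(1, 1) = src.Range("A3").Value
    Else
        data = src.Range("A3:A" & lastRow).Value
    End If
    uniques = DistinctColumn(data)
    dest.Range("B13:B" & dest.Rows.Count).ClearContents      ' remove any previous output
    If Not IsEmpty(uniques) Then
        dest.Range("B13").Resize(UBound(uniques, 1), 1).Value = uniques
    End If
End Sub
```

Note that Scripting.Dictionary compares keys case-sensitively by default, unlike RemoveDuplicates; set dict.CompareMode = vbTextCompare before adding any keys if "abc" and "ABC" should count as the same value.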

Related

Manipulating Excel spreadsheet, removing rows based on values in a column and then removing more rows based on values in another column

I have a rather complicated problem.
I have a log file; when it is put into Excel, column I contains event IDs and column J contains a custom key that keeps a particular event grouped.
All I want to do is remove any rows that do not contain the value of, say, 102 in the event ID column.
And THEN I need to check the custom key (column J) and remove rows that are duplicates, since any duplicates will falsely skew other statistics I want.
I have gotten as far as being able to retrieve the values from the columns using COM objects and .EntireColumn cell values etc., but I am completely stumped as to how I can piece together a solid way to remove rows. I could not figure out how to get the row for each value.
To give a bit more clarity, this is my thought process on what I need to do:
If cell value in Column I does not = 102 Then delete the row that cell contains.
Repeat for all rows in spreadsheet.
And THEN-
Read every cell in column J and remove all rows containing duplicates based on the values in column J.
Save spreadsheet.
Can any kind persons help me?
Additional Info:
Column I holds a string that is an event ID number, e.g. 1029
Column J holds a string that is a mix of numbers and letters, e.g. 1ASER0X3NEX0S
Ellz, I do agree with Macro Man in that your tags are misleading and, more importantly, I did indeed need to know the details of Column J.
However, I got so sick of rude posts today and yours was polite and respectful so I've pasted some code below that will do the trick ... provided Column J can be a string (the details of which you haven't given us ... see what Macro Man's getting at?).
There are many ways to test for duplicates. One is to try and add a unique key to a collection and see if it throws an error. Many wouldn't like that philosophy but it seemed to be okay for you because it also gives you a collection of all the unique (ie remaining) keys in Column J.
Sub Delete102sAndDuplicates()
    Dim ws As Worksheet
    Dim uniques As Collection
    Dim rng As Range
    Dim rowPair As Range
    Dim iCell As Range
    Dim jCell As Range
    Dim delRows As Range
    Dim deleteRow As Boolean
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set rng = Intersect(ws.UsedRange, ws.Range("I:J"))
    Set uniques = New Collection
    For Each rowPair In rng.Rows
        Set iCell = rowPair.Cells(1, 1)   ' column I: event ID
        Set jCell = rowPair.Cells(1, 2)   ' column J: custom key
        If CStr(iCell.Value2) <> "102" Then
            deleteRow = True              ' keep only event 102
        Else
            ' Adding an existing key to a Collection raises error 457,
            ' so a failed Add means this J value has been seen before
            On Error Resume Next
            uniques.Add jCell.Value2, CStr(jCell.Value2)
            deleteRow = (Err.Number = 457)
            On Error GoTo 0
        End If
        If deleteRow Then
            If delRows Is Nothing Then
                Set delRows = rowPair.EntireRow
            Else
                Set delRows = Union(delRows, rowPair.EntireRow)
            End If
        End If
    Next
    If Not delRows Is Nothing Then
        MsgBox delRows.Address(False, False) & " deleted."
        delRows.Delete
    End If
End Sub
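The error-457 trick can be isolated into a small helper, which may make the intent of the loop easier to see. This is a sketch; TryAddKey is a hypothetical name. Note that Collection keys are compared case-insensitively, so "abc" and "ABC" count as duplicates.

```vba
Option Explicit

' Returns True if key was new (it has now been added to col), or False if the
' key is already present (or could not be added). Collection.Add raises
' error 457 ("This key is already associated with an element of this
' collection") for a duplicate key.
Public Function TryAddKey(ByVal col As Collection, ByVal key As String) As Boolean
    On Error Resume Next
    col.Add key, key
    ' Read Err before On Error GoTo 0, which clears it
    TryAddKey = (Err.Number = 0)
    On Error GoTo 0
End Function
```

Inside a row loop, a test such as If Not TryAddKey(uniques, CStr(jCell.Value2)) Then can then mark the row for deletion.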
There are a number of ways in which this can be done, and which is best will depend on how frequently you perform this task and whether you want to have it fully automated. Since you've tagged your question with VBA I assume you'll be happy with a VBA-based answer:
Sub removeValues()
    Range("I1").Select 'Start at the top of the I column
    'We are going to go down the column until we hit an empty row
    Do Until IsEmpty(ActiveCell.Value) = True
        If ActiveCell.Value <> 102 Then
            ActiveCell.EntireRow.Delete 'Then delete the row
        Else
            ActiveCell.Offset(1).Select 'Select the cell below
        End If
    Loop
    'Now we have removed all non-102 values from the column, let's remove the duplicates from the J column
    Range("A:J").RemoveDuplicates Columns:=10, Header:=xlNo
End Sub
The key line there is Range("A:J").RemoveDuplicates. It removes rows from the range you specify according to duplicates it finds in the column you specify. Here, it removes rows from columns A to J based on duplicates in column 10 (which is J). If your data extends beyond column J, you'll need to replace "A:J" with the appropriate range, because RemoveDuplicates only deletes cells inside the range you pass it; columns outside that range are left in place and would fall out of step with the rest of the row. Note that the Columns value is relative to the first column of the range, so while column J is 10 when the range starts at A (1), it would be 2 if the range were only I:J. Does that make sense?
(Note: Using ActiveCell is not really best practice, but it's the method that most obviously translates to what you were trying to do and as it seems you're new to VBA I thought it would be the easiest to understand).

Count number of unique values containing text

I have the following code that counts the number of cells in a column that contains the string, "ABC-QR":
Ctr = Application.WorksheetFunction.CountIf(Sheet1.Range("D4:D1500"), "*ABC-QR*")
EU.Cells(16, 3) = Ctr
I used "ABC-QR" because that's the part of the data that doesn't change. The true data that's in those cells is, for example, "ABC-QR00012345", or whatever number it may have. I would like to modify my code to not include duplicates when it's counting.
Firstly, you must enable 'Microsoft Scripting Runtime' from within Tools --> References within the Visual Basic Editor.
You assign the data from the worksheet into an array; then import everything which fits the string criteria, and isn't a duplicate, into a dictionary. You can check for duplicates in the dictionary using the .Exists method.
EDIT: As noted by Zev in the comments, you don't even need to use the .Exists method. You can just use the array element as the dictionary key and assign it an item value of 1. Assigning to a key that already exists simply overwrites its item, so duplicates are dealt with automatically.
Once everything which isn't a duplicate has been imported into the dictionary, you can then use the .Count property on the dictionary. This will tell you how many records fit your string criteria, and are not duplicates, within the range passed into the array.
Option Explicit
Sub countNonDuplicates()
    Dim wb As Workbook, ws As Worksheet
    Dim dict As Scripting.Dictionary
    Dim myValues() As Variant
    Dim lRow As Long, i As Long
    Set wb = ThisWorkbook
    Set ws = wb.Sheets(1)
    Set dict = New Scripting.Dictionary
    ' Qualify Cells/Range with ws so this reads Sheets(1), not whichever sheet is active
    lRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    myValues = ws.Range(ws.Cells(1, 1), ws.Cells(lRow, 1)).Value
    For i = 1 To UBound(myValues, 1)
        If InStr(myValues(i, 1), "ABC-QR") Then dict(myValues(i, 1)) = 1 'arbitrary value
    Next i
    MsgBox dict.Count
End Sub
The above gets the last row of column A and assigns that range to the array. If you wish to use a different column, update the following statements with the column number required (the example below uses column D):
lRow = ws.Cells(ws.Rows.Count, 4).End(xlUp).Row
myValues = ws.Range(ws.Cells(1, 4), ws.Cells(lRow, 4)).Value
It currently works on Sheets(1); change the worksheet index to the one you require.
On 100,000 records this took 0.2 seconds to produce the count.
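The same overwrite-the-key idea can be packaged as a worksheet-independent function, so it can be pointed straight at the range from the question. A minimal sketch; CountDistinctContaining is a hypothetical name, the dictionary is late-bound (no reference needed), and the InStr test is case-sensitive under the default Option Compare Binary.

```vba
Option Explicit

' Counts the distinct values in the first column of a 2-D array (such as
' Range.Value) that contain fragment. Assigning dict(key) = 1 for a key that
' already exists just overwrites its item, so each value is counted once.
Public Function CountDistinctContaining(ByVal values As Variant, ByVal fragment As String) As Long
    Dim dict As Object, i As Long
    Set dict = CreateObject("Scripting.Dictionary")
    For i = LBound(values, 1) To UBound(values, 1)
        If Not IsError(values(i, 1)) Then            ' skip #N/A and other error cells
            If InStr(CStr(values(i, 1)), fragment) > 0 Then
                dict(CStr(values(i, 1))) = 1
            End If
        End If
    Next i
    CountDistinctContaining = dict.Count
End Function
```

With the names from the question, the call would be EU.Cells(16, 3).Value = CountDistinctContaining(Sheet1.Range("D4:D1500").Value, "ABC-QR").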
This array formula should do the trick:
EU.Cells(16,3).FormulaArray = "=SUM(IF(ISERROR(FIND(""ABC-QR"",D4:D1500)),0,1/(COUNTIF(D4:D1500,D4:D1500))))"
Since it's an array formula, it will operate on each cell in your range in turn and look for your text (FIND("ABC-QR",D4:D1500)). If it's not found, it returns 0 to the running SUM(). If it is found, it uses the value 1/count, where count is the number of times the cell value being tested exists in your range.
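Why the 1/count terms add up to the number of distinct values can be written out. Assuming no two values differ only in letter case (COUNTIF ignores case, FIND does not), each distinct matching value v that appears n_v times in D4:D1500 contributes n_v terms of 1/n_v, i.e. exactly 1:

```latex
\sum_{i:\ \texttt{"ABC-QR"} \subseteq d_i} \frac{1}{\mathrm{COUNTIF}(D,\, d_i)}
  = \sum_{v\ \text{distinct, matching}} n_v \cdot \frac{1}{n_v}
  = \#\{\text{distinct matching values}\}
```

For example, ABC-QR01, ABC-QR01, ABC-QR02 and XYZ-5 give 1/2 + 1/2 + 1/1 + 0 = 2.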

Finding Max & Min for varying lines in a list VBA excel

I'm having trouble turning my thought process into tangible code, and honestly I'm not sure where to start. I have a data set with two applicable columns; for the sake of simplicity we'll say A and B. A contains a list of three initials followed by a number, e.g. JFD3, JFD2, JFD6, EUW1, YMG2, YMG3. Column B has a value. I need to find the range from the highest to the lowest value for each set of initials, which makes me think of a max - min solution. The list of initials isn't necessarily in order, and there could be one set of initials (with a net variance of 0, which is OK) or up to 8 sets of initials, with the numbers not necessarily consecutive. I was thinking of some sort of Match(Left(3)), but I don't think that would cover everything.
Any ideas on where to start would be much appreciated. I'll be happy to clarify if there are any questions.
You can use dictionaries from the Scripting Runtime to do this easily. Use two of them with the initials as the keys, one holding the minimum values found and the other holding the maximum values found.
Add a reference to the Microsoft Scripting Runtime (in the VBA editor, Tools > References, then check the box next to "Microsoft Scripting Runtime") or use late binding (see instructions below). Something like this should do the trick; it assumes initials in column 1, values in column 2, and no headers:
Private Sub MinMax()
    Dim mins As Dictionary
    Dim maxes As Dictionary
    Dim sheet As Worksheet
    Set sheet = ActiveSheet
    Set mins = New Dictionary
    Set maxes = New Dictionary
    Dim row As Long
    Dim key As Variant
    Dim val As Double   ' Double, so decimal or large values are not truncated
    For row = 1 To sheet.UsedRange.Rows.Count
        key = sheet.Cells(row, 1).Value2
        If Len(key) >= 3 Then
            key = Left$(sheet.Cells(row, 1).Value2, 3)
            val = sheet.Cells(row, 2).Value2
            If Not mins.Exists(key) Then
                mins.Add key, val
            Else
                If mins(key) > val Then mins(key) = val
            End If
            If Not maxes.Exists(key) Then
                maxes.Add key, val
            Else
                If maxes(key) < val Then maxes(key) = val
            End If
        End If
    Next row
    For Each key In mins.Keys
        Debug.Print key & ": Min = "; mins(key) & " Max = "; maxes(key)
    Next key
End Sub
To use late binding, the code is exactly the same with these exceptions. Instead of declaring mins and maxes as Dictionary, declare them as Object:
Dim mins As Object
Dim maxes As Object
And instead of setting them as New Dictionary, use CreateObject:
Set sheet = ActiveSheet
Set mins = CreateObject("Scripting.Dictionary")
Set maxes = CreateObject("Scripting.Dictionary")
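Since the question asks for the range (highest minus lowest) per set of initials, the two-dictionary idea can return that directly. A sketch assuming, as above, that the first three characters are the initials and that column B is numeric; SpreadByInitials is a hypothetical name, and the dictionaries are late-bound, so no reference is needed.

```vba
Option Explicit

' Returns a dictionary mapping each 3-character prefix (e.g. "JFD" for JFD3,
' JFD6) to Max - Min of its values. labels and values are 2-D arrays of the
' same height, e.g. from Range("A1:A100").Value and Range("B1:B100").Value.
Public Function SpreadByInitials(ByVal labels As Variant, ByVal values As Variant) As Object
    Dim mins As Object, maxes As Object, result As Object
    Dim i As Long, key As String, v As Double, k As Variant
    Set mins = CreateObject("Scripting.Dictionary")
    Set maxes = CreateObject("Scripting.Dictionary")
    For i = LBound(labels, 1) To UBound(labels, 1)
        If Len(CStr(labels(i, 1))) >= 3 Then
            key = Left$(CStr(labels(i, 1)), 3)
            v = CDbl(values(i, 1))                 ' assumes a numeric value in column B
            If Not mins.Exists(key) Then
                mins.Add key, v                    ' first value seeds both min and max
                maxes.Add key, v
            Else
                If v < mins(key) Then mins(key) = v
                If v > maxes(key) Then maxes(key) = v
            End If
        End If
    Next i
    Set result = CreateObject("Scripting.Dictionary")
    For Each k In mins.Keys
        result.Add k, maxes(k) - mins(k)
    Next k
    Set SpreadByInitials = result
End Function
```

A set of initials that appears only once gets a spread of 0, which matches the "net variance of 0" case in the question.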
Use a Pivot Table. Put your Column A field* in the Row Labels, then put column B in the Values twice. Change one from Sum to Min, and the other from Sum to Max.
* Not sure if you need to group by JFD for all JFDx or by each JFDx. If you need them grouped by the 3 initials, make a column C with =LEFT(A1,3), then use that in your Row Labels.
An approach to this could be:
Sort the data in the range A:B by column A in alphabetical order. To do this, you can record a macro while performing the sort and then edit the code so that it works dynamically every time. Sorting is required for the solution below to work, and it also makes many other similar approaches perform better.
Use While blocks to run the solution. I'll leave it to you to build and test working code, but this is the idea:
startSubset = 2 '<-- we start getting the key from row 2
'build the key to define the subset
keyStart = 1
currentKey = ""
Do While Not IsNumeric(Right(Left(Range("A" & startSubset), keyStart), 1))
    'while the last char of the key is not numeric, let's add it to the key
    currentKey = currentKey & Right(Left(Range("A" & startSubset), keyStart), 1)
    keyStart = keyStart + 1
Loop
After the above, the key is stored in the variable currentKey. It will be JFD if the first cell is JFD213, etc. Hence, you loop until the end of this subset storing max and min in two variables:
min = Range("B" & startSubset) '<-- seed both with the first value of the subset
max = min
Do While Left(Range("A" & startSubset), Len(currentKey)) = currentKey
    If Range("B" & startSubset) < min Then min = Range("B" & startSubset)
    If Range("B" & startSubset) > max Then max = Range("B" & startSubset)
    startSubset = startSubset + 1
Loop
Once this is done, you just need to store the values in a collection, for example:
Set myObs = New Collection
myObs.Add currentKey
myObs.Add min
myObs.Add max '<-- you will get something like myObs = ("DJF", 0, 100)
Then add this collection to a bigger collection:
allValues.Add myObs '<-- at the end you will have something like this:
'allValues = [("DJF",0,100), ("ABC", 1, 75), ...]
and reset the variables before moving on to the next subset:
currentKey = ""
keyStart = 1
All of the above should run inside an outer While loop that stops when the data run out.
Please note that the above code cannot work standalone; it is rather a possible approach to the problem that you will need to adapt to your data to make it work in practice.
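A runnable sketch of this sorted-subset approach, assuming the data in A:B has already been sorted by column A and that the key is everything before the first digit, as in the loop above. LeadingKey and MinMaxBySortedBlocks are hypothetical names. Each block's min and max are seeded from its first value, so blocks whose values are all positive or all negative are handled correctly, and labels without any digit do not cause an endless loop.

```vba
Option Explicit

' Returns the part of a label before its first digit, e.g. "JFD" for "JFD213".
Private Function LeadingKey(ByVal label As String) As String
    Dim p As Long
    p = 1
    Do While p <= Len(label)
        If Mid$(label, p, 1) Like "#" Then Exit Do   ' "#" matches a single digit
        p = p + 1
    Loop
    LeadingKey = Left$(label, p - 1)
End Function

' labels and values: 2-D arrays such as Range("A2:A100").Value, already sorted
' by label so that each key forms one contiguous block.
' Returns a Collection of Array(key, min, max), one entry per block.
Public Function MinMaxBySortedBlocks(ByVal labels As Variant, ByVal values As Variant) As Collection
    Dim result As Collection
    Dim i As Long, key As String, curKey As String
    Dim lo As Double, hi As Double, v As Double, started As Boolean
    Set result = New Collection
    For i = LBound(labels, 1) To UBound(labels, 1)
        key = LeadingKey(CStr(labels(i, 1)))
        v = CDbl(values(i, 1))
        If started And key = curKey Then
            If v < lo Then lo = v
            If v > hi Then hi = v
        Else
            ' A new block starts: store the finished one, seed min/max from this row
            If started Then result.Add Array(curKey, lo, hi)
            curKey = key: lo = v: hi = v: started = True
        End If
    Next i
    If started Then result.Add Array(curKey, lo, hi)
    Set MinMaxBySortedBlocks = result
End Function
```

Each entry is a zero-based array (assuming no Option Base 1), so c(1)(0) is the key, c(1)(1) the minimum and c(1)(2) the maximum of the first block.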

Creating an Excel Macro to delete rows if a column value repeats consecutively less than 3 times

The data I have can be simplified to this:
http://i.imgur.com/mn5GgrQ.png
In this example, I would like to delete the data associated with track 2, since it has only 3 frames associated with it. All data with more than 3 associated frames can stay.
The frame number does not always start from 1, as I've tried to demonstrate. The track number will always be the same number consecutively for as many frames as are tracked. I was thinking of using a function to add 1 to a variable for every consecutive value in column A, then testing whether this value is >= 3. If so, go on to the next integer in A; if not, delete all rows marked with that integer (2, in this case).
Is this possible with Visual Basic in an Excel Macro, and can anyone give me some starting tips on what functions I might be able to use? Complete novice here. I haven't found anything similar for VBA, only for R.
I assume you understand the code by reading it.
Option Explicit

' Returns a dictionary keyed by track number; each item is a comma-separated
' list of the addresses of the cells holding that track number
Public Function GetCountOfRowsForEachTrack(ByVal sourceColumn As Range) As _
        Scripting.Dictionary
    Dim cell As Range
    Dim trackValue As String
    Dim groupedData As Scripting.Dictionary
    Set groupedData = New Scripting.Dictionary
    For Each cell In sourceColumn
        trackValue = cell.Value
        If groupedData.Exists(trackValue) Then
            groupedData(trackValue) = cell.Address(False, False) & "," & groupedData(trackValue)
        Else
            groupedData(trackValue) = cell.Address(False, False)
        End If
    Next
    Set GetCountOfRowsForEachTrack = groupedData
End Function

Public Sub DeleteRowsWhereTrackLTE3()
    Dim groupedData As Scripting.Dictionary
    Set groupedData = GetCountOfRowsForEachTrack(Range("A2:A15"))
    Dim cellsToBeDeleted As String
    Dim item As Variant
    For Each item In groupedData.Items
        ' Split returns a 0-based array, so UBound <= 2 means 3 or fewer cells
        If UBound(Split(item, ",")) <= 2 Then
            cellsToBeDeleted = item & IIf(cellsToBeDeleted <> "", "," & cellsToBeDeleted, "")
        End If
    Next
    ' Range("") raises an error, so only delete when something was found
    If cellsToBeDeleted <> "" Then Range(cellsToBeDeleted).EntireRow.Delete
End Sub
GetCountOfRowsForEachTrack is a function that returns a dictionary, which stores each track number as a key and the comma-separated addresses of the cells associated with that track as its item.
DeleteRowsWhereTrackLTE3 is the procedure that uses GetCountOfRowsForEachTrack to get the aggregated track numbers and the cells associated with them. It loops through the dictionary and checks whether the number of cells associated with each track is 3 or fewer (the test is UBound <= 2 because splitting the string returns an array that starts at 0). It builds a single address string of all such cells and deletes them all at once at the end.
Note:
Add the code to a standard (.bas) module, or to the module of the specific sheet that holds the data.
Add a reference to the "Microsoft Scripting Runtime" library: inside the VBA editor, click Tools > References, tick "Microsoft Scripting Runtime" and click OK.
I have used A2:A15 as an example. Please change it to match your cell range.
The assumption is that you don't have many cells to delete: the address string passed to Range is limited to 255 characters, so with many cells the method will fail.
Call DeleteRowsWhereTrackLTE3 to remove such rows.
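The title asks about values that repeat consecutively, and the question notes that each track number appears in one consecutive block. Under that assumption, an alternative sketch counts run lengths directly and collects the rows to delete with Union instead of an address string, which avoids building a long address. PositionsInShortRuns and DeleteShortTracks are hypothetical names; the range A2:A15 and the threshold of 3 come from the answer above.

```vba
Option Explicit

' Returns the 1-based positions (within the first column of values) of every
' element that belongs to a run of identical consecutive values whose length
' is maxLen or less.
Public Function PositionsInShortRuns(ByVal values As Variant, ByVal maxLen As Long) As Collection
    Dim result As Collection
    Dim first As Long, last As Long, runStart As Long, i As Long, j As Long
    Dim endOfRun As Boolean
    Set result = New Collection
    first = LBound(values, 1)
    last = UBound(values, 1)
    runStart = first
    For i = first To last
        ' VBA does not short-circuit Or, so test the last element separately
        endOfRun = (i = last)
        If Not endOfRun Then endOfRun = (values(i + 1, 1) <> values(i, 1))
        If endOfRun Then
            If i - runStart + 1 <= maxLen Then
                For j = runStart To i
                    result.Add j - first + 1
                Next j
            End If
            runStart = i + 1
        End If
    Next i
    Set PositionsInShortRuns = result
End Function

' Deletes every row of A2:A15 whose track number occurs 3 or fewer times in a row.
Public Sub DeleteShortTracks()
    Dim rng As Range, toDelete As Range, p As Variant
    Set rng = ActiveSheet.Range("A2:A15")   ' example range from the answer above
    For Each p In PositionsInShortRuns(rng.Value, 3)
        If toDelete Is Nothing Then
            Set toDelete = rng.Cells(p, 1)
        Else
            Set toDelete = Union(toDelete, rng.Cells(p, 1))
        End If
    Next p
    ' Delete once at the end so earlier deletions don't shift later rows
    If Not toDelete Is Nothing Then toDelete.EntireRow.Delete
End Sub
```

Unlike the dictionary version, this treats two separate blocks with the same track number as two different runs, which matches the "consecutively" wording of the title.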

Add new row to excel Table (VBA)

I have an Excel workbook that records the food you eat for a specific day and meal. It has a grid in which each line represents a food you ate, how much sugar it has, etc.
Then I added a Save button to save all the data to a table on another sheet.
This is what I have tried:
Public Sub addDataToTable(ByVal strTableName As String, ByRef arrData As Variant)
    Dim lLastRow As Long
    Dim iHeader As Integer
    Dim iCount As Integer
    With Worksheets(4).ListObjects(strTableName)
        'find the last row of the list
        lLastRow = Worksheets(4).ListObjects(strTableName).ListRows.Count
        'shift from an extra row if list has header
        If .Sort.Header = xlYes Then
            iHeader = 1
        Else
            iHeader = 0
        End If
    End With
    'Cycle the array to add each value
    For iCount = LBound(arrData) To UBound(arrData)
        Worksheets(4).Cells(lLastRow + 1, iCount).Value = arrData(iCount) ' <-- error raised here
    Next iCount
End Sub
But I keep getting the same error on the marked line:
Application-defined or object-defined error
What am I doing wrong?
Thanks in advance!
You don't say which version of Excel you are using. This is written for 2007/2010 (a different approach is required for Excel 2003).
You also don't say how you are calling addDataToTable and what you are passing into arrData.
I'm guessing you are passing a 0-based array. If this is the case (and the table starts in column A), then iCount will count from 0 and .Cells(lLastRow + 1, iCount) will try to reference column 0, which is invalid.
You are also not taking advantage of the ListObject. Your code assumes the ListObject is located starting at row 1. If this is not the case, your code will place the data in the wrong row.
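If you'd rather keep your own procedure, the column index has to be shifted by LBound(arrData) so that element 0 lands in the table's first column. A small hypothetical helper (ToRowArray is not part of Excel) that does this shift and produces a 1 x N array, which can then be written to a new ListRow in one assignment, might look like this. It assumes a non-empty array with exactly as many elements as the table has columns; a shorter array would leave #N/A in the extra cells.

```vba
Option Explicit

' Converts a 1-D array with any lower bound (Array(...) is 0-based by default)
' into a 1 x N, 1-based 2-D array suitable for a single-row range's .Value.
Public Function ToRowArray(ByVal arrData As Variant) As Variant
    Dim out() As Variant, i As Long, n As Long
    n = UBound(arrData) - LBound(arrData) + 1
    ReDim out(1 To 1, 1 To n)
    For i = LBound(arrData) To UBound(arrData)
        ' Shift the index so the first element always lands in column 1
        out(1, i - LBound(arrData) + 1) = arrData(i)
    Next i
    ToRowArray = out
End Function
```

Usage, with the names from the question: Worksheets(4).ListObjects(strTableName).ListRows.Add.Range.Value = ToRowArray(arrData). Writing through the new ListRow also means the row position no longer depends on where the table starts on the sheet.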
Here's an alternative that uses the ListObject:
Sub MyAdd(ByVal strTableName As String, ByRef arrData As Variant)
    Dim Tbl As ListObject
    Dim NewRow As ListRow
    ' Based on OP
    ' Set Tbl = Worksheets(4).ListObjects(strTableName)
    ' Or better, get list on any sheet in workbook
    Set Tbl = Range(strTableName).ListObject
    Set NewRow = Tbl.ListRows.Add(AlwaysInsert:=True)
    ' Handle Arrays and Ranges
    If TypeName(arrData) = "Range" Then
        NewRow.Range = arrData.Value
    Else
        NewRow.Range = arrData
    End If
End Sub
Can be called in a variety of ways:
Sub zx()
    ' Pass a variant array copied from a range
    MyAdd "MyTable", [G1:J1].Value
    ' Pass a range
    MyAdd "MyTable", [G1:J1]
    ' Pass an array
    MyAdd "MyTable", Array(1, 2, 3, 4)
End Sub
Tbl.ListRows.Add doesn't work for me, and I believe a lot of others are facing the same problem. I use the following workaround:
'First check if the last row is empty; if not, add a row
If table.ListRows.count > 0 Then
    Set lastRow = table.ListRows(table.ListRows.count).Range
    For col = 1 To lastRow.Columns.count
        If Trim(CStr(lastRow.Cells(1, col).Value)) <> "" Then
            lastRow.Cells(1, col).EntireRow.Insert
            'Cut last row and paste to second last
            lastRow.Cut Destination:=table.ListRows(table.ListRows.count - 1).Range
            Exit For
        End If
    Next col
End If
'Populate last row with the form data
Set lastRow = table.ListRows(table.ListRows.count).Range
Range("E7:E10").Copy
lastRow.PasteSpecial Transpose:=True
Range("E7").Select
Application.CutCopyMode = False
Hope it helps someone out there.
I had the same error message and after lots of trial and error found out that it was caused by an advanced filter which was set on the ListObject.
After clearing the advanced filter .listrows.add worked fine again.
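To clear the filter I use the following (checking FilterMode first, since ShowAllData raises an error when no filter is applied). I have no idea how to clear the filter only for the specific ListObject instead of the whole worksheet.
If Worksheets("mysheet").FilterMode Then Worksheets("mysheet").ShowAllData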
I actually just found that if you want to add multiple rows to the table containing the selection, Selection.ListObject.ListRows.Add AlwaysInsert:=True works really well. Without a Position argument, each call adds a new row at the bottom of the table. I just ran that line five times to add five rows to my table.
I had the same problem before, and I fixed it by re-creating the same table on a new sheet and deleting all the named ranges associated with the table. I believe that when you're using ListObjects you're not allowed to have named ranges contained within your table. Hope that helps, thanks.
I ran into this issue today (Excel crashes when adding rows using .ListRows.Add).
After reading this post and checking my table, I realized that the formulas in some of the cells in the row depend on values in other cells.
In my case, they depended on cells in later columns, and even on cells that themselves contain formulas!
The solution was to fill the newly added row from back to front, so the calculations would not go wrong.
Excel can normally deal with formulas that refer to other cells, but it seems that adding a row to a table kicks off a recalculation in column order (A, B, C, etc.).
Hope this helps clear up issues with .ListRows.Add.
As ListRows.Add can be a huge bottleneck, we should only use it if it can't be avoided.
If performance is important to you, use the function below to resize the table instead, which is considerably faster than adding rows the recommended way.
Be aware that this will overwrite any data below your table!
This function is based on the accepted answer by Chris Neilsen.
Public Sub AddRowToTable(ByRef tableName As String, ByRef data As Variant)
    Dim tableLO As ListObject
    Dim tableRange As Range
    Dim newRow As Range
    Set tableLO = Range(tableName).ListObject
    ' Only touch the filter when the table actually shows filter buttons
    If tableLO.ShowAutoFilter Then tableLO.AutoFilter.ShowAllData
    If (tableLO.ListRows.Count = 0) Then
        Set newRow = tableLO.ListRows.Add(AlwaysInsert:=True).Range
    Else
        ' Grow the table by one row instead of inserting a row
        Set tableRange = tableLO.Range
        tableLO.Resize tableRange.Resize(tableRange.Rows.Count + 1, tableRange.Columns.Count)
        Set newRow = tableLO.ListRows(tableLO.ListRows.Count).Range
    End If
    If TypeName(data) = "Range" Then
        newRow = data.Value
    Else
        newRow = data
    End If
End Sub
Just delete the table and create a new table with a different name. Also, don't delete entire worksheet rows that pass through the table: it seems that deleting an entire row containing a table row can damage the table's DataBodyRange.