VBA Script to Combine Duplicate Data into One Row - vba

I'm attempting to write a script to, given a list of parts and their used-on numbers, combine the used-on values for all duplicate parts and sort the resulting values. That sounded confusing so here is an image of what I'm shooting for:
[Image: Desired Result]
It should be able to handle duplicates and blanks.
Through a combination of research and tinkering, I've gotten a sortWithinCell function made, but my headaches are coming from writing the script to iterate down the list of parts, combining used-on values as you go, and then upon reaching a different part number, delete all but the first row that you started from and then continue. Assume that the input parts list is already sorted by part number.
Thanks all!
-Edit 1-
I did manage to find a way to combine the data values:
Function ConcatinateAllCellValuesInRange(sourceRange As Excel.Range) As String
Dim finalValue As String
Dim cell As Excel.Range
For Each cell In sourceRange.Cells
finalValue = finalValue + "," + CStr(cell.Value)
Next cell
ConcatinateAllCellValuesInRange = Right(finalValue, Len(finalValue) - 1)
End Function
However, now the trick is still identifying the input range for this function, i.e. when the part number changes.

A possible solution would be to create a collection that uses the Part as key and the Used-On values as value. At the first occurrence of a Part, the key/value pair is added. On later iterations, if the key already exists in the collection, take the stored value and the value from the current row, merge them, delete the existing pair, and add a new one (you can't modify an item of a Collection in place, so you delete the old pair and add a new one).
Once you have finished looping through your list, you can loop through the elements in the collection and save them somewhere else on the sheet. Or, if you want, delete the current range and insert everything where the old values were.
Let me know if you need help with this
EDIT
Here is an example of the method I described, though I used a Dictionary instead of a Collection because it has more functionality (REQUIRES A REFERENCE TO "Microsoft Scripting Runtime"):
Option Explicit
Sub combineStuff()
'Create Dictionary
Dim tmpDic As New Dictionary
'tempstring to preserve the value of a key
Dim tmpValue As String
Dim i As Integer
'Loop through rows - change this to your needs
For i = 1 To 15 Step 1
'Simple check if "Part" and "Used-On" are both filled
If Cells(i, 1).Value <> "" And Cells(i, 2).Value <> "" Then
'Check if the Key already exists
If tmpDic.Exists(CStr(Cells(i, 1).Value)) Then
'If it does, save its value
tmpValue = tmpDic.Item(CStr(Cells(i, 1).Value))
'Delete the Key Value pair
tmpDic.Remove CStr(Cells(i, 1).Value)
'Add Key Value pair with updated Value
tmpDic.Add CStr(Cells(i, 1).Value), tmpValue & "," & CStr(Cells(i, 2).Value)
Else
'Key does not exist yet, add it
tmpDic.Add CStr(Cells(i, 1).Value), Cells(i, 2).Value
End If
End If
Next i
Dim tmpKey As Variant
i = 1
'Loop through all items in the Dictionary
For Each tmpKey In tmpDic
'Insert Key
Cells(i, 4).Value = tmpKey
'Insert Value
Cells(i, 5).Value = tmpDic(tmpKey)
i = i + 1
Next tmpKey
End Sub
Note that the Dictionary will not be sorted! The key that was modified last is the last one in the list. If you want the outcome to be sorted, it might be better to use a multidimensional array to store the key and value pairs. Then you can easily sort those.
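As a rough sketch of the sorting step mentioned above, the keys can be copied out of the Dictionary and sorted in a plain array before writing them back. The helper names MergeByKey and SortedKeys are hypothetical (they are not part of the original answer), the dictionary is late-bound via CreateObject so no reference is needed, and the sketch assumes keys and values arrive as parallel one-dimensional arrays with the same bounds. It also relies on the fact that a Dictionary item, unlike a Collection item, can be reassigned in place.

```vba
Option Explicit

' Hypothetical helper: merges values that share a key into one
' comma-separated string. Rows with a blank key or value are skipped.
Function MergeByKey(keys As Variant, values As Variant) As Object
    Dim dict As Object
    Dim i As Long
    Dim k As String, v As String

    Set dict = CreateObject("Scripting.Dictionary")
    For i = LBound(keys) To UBound(keys)
        k = CStr(keys(i))
        v = CStr(values(i))
        If Len(k) > 0 And Len(v) > 0 Then
            If dict.Exists(k) Then
                ' A Dictionary item can be updated in place
                dict(k) = dict(k) & "," & v
            Else
                dict.Add k, v
            End If
        End If
    Next i
    Set MergeByKey = dict
End Function

' Hypothetical helper: returns the dictionary's keys as a sorted
' array (insertion sort, keys compared as text).
Function SortedKeys(ByVal dict As Object) As Variant
    Dim keys As Variant, tmp As Variant
    Dim i As Long, j As Long

    keys = dict.Keys
    For i = LBound(keys) + 1 To UBound(keys)
        tmp = keys(i)
        j = i - 1
        Do While j >= LBound(keys)
            ' VBA does not short-circuit And, so test the element separately
            If keys(j) <= tmp Then Exit Do
            keys(j + 1) = keys(j)
            j = j - 1
        Loop
        keys(j + 1) = tmp
    Next i
    SortedKeys = keys
End Function
```

For example, MergeByKey(Array("P2", "P1", "P2"), Array("B", "A", "C")) stores "B,C" under P2 and "A" under P1, and SortedKeys then returns P1 before P2. Because keys are compared as text, "P10" would sort before "P2".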

Related

Remove duplicate values and cells from one column

I have tried so many methods from the removeduplicates, selections and scripting dictionaries and I cannot get this to work. I do understand there are multiple ways to do this but if any of you can help, that would be great.
I have one list of values that I am pulling through from another sheet (up to approx 80k rows) into cell B13 downwards. I am then trying to remove the duplicate values and cells so I am left with unique values which I can then use to perform lookups on other sheets.
Sub Address_Sage()
Dim dataBook As Workbook
Dim dict As Object
Dim Sage_Data As Worksheet, Address As Worksheet
Dim dataSource As Range, dataDest As Range
Dim sourceDataRowCount As Integer, index As Integer
Dim rowCount As Long
Dim strVal As String
Set dataBook = Application.ThisWorkbook
Set sheetSource = dataBook.Sheets("Sage_Data")
Set sheetDest = dataBook.Sheets("Address")
Set dict = CreateObject("Scripting.Dictionary")
Set dataSource = sheetSource.Range("A3", _
sheetSource.Range("A90000").End(xlUp))
sourceDataRowCount = dataSource.Rows.Count
Set dataDest = sheetDest.Range("B13", "B" & _
sourceDataRowCount)
For index = 1 To sourceDataRowCount
dataDest(index, 1).Value = dataSource(index, 1).Value
Next index
Sheets("Address").Select
rowCount = ActiveSheet.Range("B13").CurrentRegion.Rows.Count
Do While rowCount > 0
strVal = Address.Cells(rowCount, 1).Value2
If dict.exists(strVal) Then
ActiveSheet.Rows(rowCount).EntireRow.Delete
Else
dict.Add strVal, 0
End If
rowCount = rowCount - 1
Loop
'Set dict = Nothing
End Sub
It always gets stuck on the strVal line. I have tried changing Value2 to Value1, as I only have one column, but no luck.
thank you
Not super experienced in VBA so I can't speak to exactly what you're doing and what your code is saying but I thought I'd share this with you. Last week I had to create a macrobook that returned the unique entries of electrical defects that different crews observed while on the job. I made a dictionary that read all of the entries in the spreadsheet and then later printed all of the unique entries. I'll post the code and try to walk you through it.
If .Range("A" & i) <> "" Then
If dict.Exists(data) Then
dict(data) = dict(data) + 1
Else
dict.Add Key:=data, Item:=1
End If
End If
So the code basically says: if column A (i is simply an incrementer) is not empty, then we read the entry in column A. data is simply a variable that you set to the value you'd like recorded in the dictionary, presumably the cell in column A for the current row. Dictionary keys are unique and cannot repeat, so the code asks whether the key already exists in the dictionary. If so, we add one to its count (its item value). If not, we add that key to the dictionary with a count of 1. At the end of your loop, your dictionary will have stored all unique entries and the number of times they appeared.
Now we can reference them or print them.
'Assumes iResults already holds the first output row
For r = 0 To dict.Count - 1
'With a late-bound dictionary (CreateObject) you may need dict.Keys()(r) instead
Sheets("Results").Range("B" & iResults) = dict.Keys(r)
Sheets("Results").Range("C" & iResults) = dict(dict.Keys(r))
iResults = iResults + 1
Next
This second piece of code loops from 0 to one less than the number of entries in your dictionary. It starts at zero because dict.Keys returns a zero-based array, so the last valid index is dict.Count - 1. The first statement prints the unique key for every r until there are no more entries in the dictionary. The second statement prints the item associated with that key, which is a number equal to how many times that unique entry showed up in your data. iResults is advanced on each pass so that every entry lands on its own row.
You can use this same method for other purposes as well, not just printing the data but referencing it and using it somewhere else. But I am sure you will find that the For-loop with dict.Keys(r) is the easiest way to run through your dictionary entries. Took me a few days to figure it out and it revolutionized my program. Hope this helps you out.
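For the original goal of reducing the column to unique values, the same dictionary idea can be sketched as a small function that works on an array rather than on worksheet rows. UniqueValues is a hypothetical name, and the sketch assumes a one-dimensional input array and skips blank entries. Separately, one likely reason the question's code stops on the strVal line is that the Address variable is declared but never assigned with Set (the sheet is assigned to sheetDest instead).

```vba
Option Explicit

' Hypothetical helper: returns the distinct values of a one-dimensional
' array as text, in order of first appearance. Blank entries are skipped.
Function UniqueValues(values As Variant) As Variant
    Dim dict As Object
    Dim i As Long
    Dim key As String

    Set dict = CreateObject("Scripting.Dictionary")
    For i = LBound(values) To UBound(values)
        key = CStr(values(i))
        If Len(key) > 0 Then
            ' Only the first occurrence is recorded
            If Not dict.Exists(key) Then dict.Add key, 0
        End If
    Next i
    ' Keys is a zero-based Variant array
    UniqueValues = dict.Keys
End Function
```

The returned array could then be written back to the sheet in one assignment instead of deleting rows one at a time, which tends to matter with around 80k rows.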

Creating new cell values for each new value in a range

I'm having trouble with a specific process.
In column H, there are a bunch of different numbers. I'm looking for a loop that, for each new value found in the column, counts how many times the value occurs in that column and puts the count in the next worksheet, "Statistik".
I do have a solution where I write code for each individual number in the column, but I'm looking for a loop, since there are 28 different values in that column.
[Image: my workbook]
Does anyone have a bright solution for my problem?
Thanks in advance.
You need to use COUNTIF. Either as a formula or in your VBA code (Application.CountIf()).
In your case =COUNTIF(H:H, [UNIQUE_VALUE]) where unique value is the value you want to extract. To get unique values you have two options. One is to copy unique values from H:H to your Statistik sheet as follows:
Click the Data Ribbon Menu
Select the Advanced Button in the Sort & Filter section
Fill in the dialog box: choose Copy to another location, set List range to H:H, set Copy to to a column in the Statistik sheet, and make sure you tick Unique records only
Other option to get unique values is detailed here https://exceljet.net/formula/extract-unique-items-from-a-list
For more information about COUNTIF
https://support.office.com/en-us/article/countif-function-e0de10c6-f885-4e71-abb4-1f464816df34
You could use a dictionary to output each unique value once, together with its count:
Option Explicit
Public Sub GetCount()
Dim dict As Object
Set dict = CreateObject("Scripting.Dictionary")
With Worksheets("Sheet1") 'change as appropriate
Dim arr(), i As Long
arr = Intersect(.Columns("H"), .UsedRange).Value '<=== Specify exact range in H to loop over if you want including header
For i = LBound(arr, 1) + 1 To UBound(arr, 1) 'assuming header to ignore otherwise start at 1
If Not dict.exists(arr(i, 1)) Then
dict.Add arr(i, 1), 1 '<==== if value not seen before add to dictionary with value of 1
Else
dict(arr(i, 1)) = dict(arr(i, 1)) + 1 ' <====== if seen before add 1 to the existing count
End If
Next i
End With
With Worksheets("Statistik")
.Range("A1") = "StudyBoard_ID"
.Range("B1") = "Count"
.Range("A2").Resize(dict.Count, 1) = Application.Transpose(dict.Keys)
.Range("B2").Resize(dict.Count, 1) = Application.Transpose(dict.Items)
End With
End Sub

Manipulating Excel spreadsheet, removing rows based on values in a column and then removing more rows based on values in another column

I have a rather complicated problem.
I have a log file that, when put into Excel, has event IDs in column I and a custom key in column J that keeps a particular event grouped.
All I want to do is remove any rows that do not contain the value of, say, 102 in the event ID column.
And THEN I need to check the custom key (column J) and remove rows that are duplicates, since any duplicates will falsely skew other statistics I want.
I have gotten as far as being able to retrieve the values from the columns using COM objects and .EntireColumn cell values etc., but I am completely stumped as to how I can piece together a solid way to remove rows. I could not figure out how to get the row for each value.
To give a bit more clarity this is my thought process on what i need to do:
If cell value in Column I does not = 102 Then delete the row that cell contains.
Repeat for all rows in spreadsheet.
And THEN-
Read every cell in column J and remove all rows containing duplicates based on the values in column J.
Save spreadsheet.
Can any kind persons help me?
Additional Info:
Column I holds a string that is an event ID number, e.g. 1029
Column J holds a string that is a mix of numbers and letters, e.g. 1ASER0X3NEX0S
Ellz, I do agree with Macro Man in that your tags are misleading and, more importantly, I did indeed need to know the details of Column J.
However, I got so sick of rude posts today and yours was polite and respectful so I've pasted some code below that will do the trick ... provided Column J can be a string (the details of which you haven't given us ... see what Macro Man's getting at?).
There are many ways to test for duplicates. One is to try and add a unique key to a collection and see if it throws an error. Many wouldn't like that philosophy but it seemed to be okay for you because it also gives you a collection of all the unique (ie remaining) keys in Column J.
Sub Delete102sAndDuplicates()
Dim ws As Worksheet
Dim uniques As Collection
Dim rng As Range
Dim rowPair As Range
Dim iCell As Range
Dim jCell As Range
Dim delRows As Range
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set rng = Intersect(ws.UsedRange, ws.Range("I:J"))
Set uniques = New Collection
For Each rowPair In rng.Rows
Set iCell = rowPair.Cells(, 1)
Set jCell = rowPair.Cells(, 2)
On Error Resume Next
uniques.Add jCell.Value2, jCell.Text
'Delete the row if its key is a duplicate (error 457) or its event ID is not 102
If Err = 457 Or iCell.Value2 <> 102 Then
On Error GoTo 0
If delRows Is Nothing Then
Set delRows = rowPair.EntireRow
Else
Set delRows = Union(delRows, rowPair.EntireRow)
End If
End If
Next
If Not delRows Is Nothing Then
MsgBox delRows.Address(False, False) & " deleted."
delRows.Delete
End If
End Sub
There are a number of ways in which this can be done, and which is best will depend on how frequently you perform this task and whether you want to have it fully automated. Since you've tagged your question with VBA I assume you'll be happy with a VBA-based answer:
Sub removeValues()
Range("I1").Select 'Start at the top of the I column
'We are going to go down the column until we hit an empty row
Do Until IsEmpty(ActiveCell.Value) = True
If ActiveCell.Value <> 102 Then
ActiveCell.EntireRow.Delete 'Then delete the row
Else
ActiveCell.Offset(1).Select 'Select the cell below
End If
Loop
'Now we have removed all non-102 values from the column, let`s remove the duplicates from the J column
Range("A:J").RemoveDuplicates Columns:=10, Header:=xlNo
End Sub
The key line there is Range("A:J").RemoveDuplicates. It will remove rows from the range you specify according to duplicates it finds in the column you specify. In that case, it will remove items from the A-J columns based on duplicates in column 10 (which is J). If your data extends beyond the J column, then you'll need to replace "A:J" with the appropriate range. Note that the Columns value is relative to the index of the first column, so while the J column is 10 when that range starts at A (1), it would be 2 for example if the range were only I:J. Does that make sense?
(Note: Using ActiveCell is not really best practice, but it's the method that most obviously translates to what you were trying to do and as it seems you're new to VBA I thought it would be the easiest to understand).
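The task combines two rules: keep only rows whose event ID is 102, then keep only the first row per key. A minimal sketch of that decision logic, separated from the worksheet, is below. RowsToKeep is a hypothetical helper, it assumes the two columns have been read into one-dimensional arrays with the same bounds, and it compares IDs as text because the question says column I holds strings. Applying the ID filter before recording a key means a discarded row cannot cause a later valid row to be treated as a duplicate.

```vba
Option Explicit

' Hypothetical helper: flags which rows survive both rules.
' eventIds and keys are one-dimensional arrays with the same bounds.
Function RowsToKeep(eventIds As Variant, keys As Variant, _
                    ByVal wantedId As Long) As Boolean()
    Dim seen As Object
    Dim keep() As Boolean
    Dim i As Long

    Set seen = CreateObject("Scripting.Dictionary")
    ReDim keep(LBound(eventIds) To UBound(eventIds))

    For i = LBound(eventIds) To UBound(eventIds)
        ' Rule 1: the event ID must match (compared as text)
        If CStr(eventIds(i)) = CStr(wantedId) Then
            ' Rule 2: only the first row with a given key is kept
            If Not seen.Exists(CStr(keys(i))) Then
                seen.Add CStr(keys(i)), True
                keep(i) = True
            End If
        End If
    Next i
    RowsToKeep = keep
End Function
```

Rows flagged False could then be deleted from the bottom of the sheet upwards, so that deleting one row does not shift the rows still to be processed.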

Put entire column (each value in column) in an array?

So I'm making a macro to do a bunch of things. One of them is finding duplicates between Sheet1 and Sheet2: given column A in Sheet1, do any values in column B on Sheet2 match any of the values in column A on Sheet1?
I know there's a Remove Duplicates feature, but I just want to mark them, not remove them.
I was thinking of something with filtering. I know that when you filter you can select multiple criteria, so if you have a column with 20 different values in it, you can select 5 values in the filter and it will show the rows with those 5 values for that column. So I recorded a macro of that, checked out the code, and saw that it uses a string array, where each value to search for is an element. Is there any way to just specify an entire column and add every value to the string array?
thanks in advance
Here are three different ways to load items into an array. The first method is much faster but simply stores everything in the column. You have to be careful with this though because it creates a multidimensional array which isn't something that can be passed to AutoFilter.
Method 1:
Sub LoadArray()
Dim strArray As Variant
Dim TotalRows As Long
TotalRows = Cells(Rows.Count, 1).End(xlUp).Row 'last used row in column A
strArray = Range(Cells(1, 1), Cells(TotalRows, 1)).Value
MsgBox "Loaded " & UBound(strArray) & " items!"
End Sub
Method 2:
Sub LoadArray2()
Dim strArray() As String
Dim TotalRows As Long
Dim i As Long
TotalRows = Cells(Rows.Count, 1).End(xlUp).Row 'last used row in column A
ReDim strArray(1 To TotalRows)
For i = 1 To TotalRows
strArray(i) = Cells(i, 1).Value
Next
MsgBox "Loaded " & UBound(strArray) & " items!"
End Sub
Method 3:
If you know the values ahead of time and just want to list them in a variable, you can assign a Variant using Array():
Sub LoadArray3()
Dim strArray As Variant
strArray = Array("Value1", "Value2", "Value3", "Value4")
MsgBox "Loaded " & UBound(strArray) + 1 & " items!"
End Sub
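If the fast range read from Method 1 is preferred, its two-dimensional result can be flattened afterwards. The sketch below assumes a single-column 2D array such as the one Range.Value returns for a multi-cell range, and that no cell holds an error value (CStr would fail on those). ColumnTo1D is a hypothetical name.

```vba
Option Explicit

' Hypothetical helper: copies the first column of a 2D array
' into a one-dimensional String array with the same row bounds.
Function ColumnTo1D(values As Variant) As String()
    Dim result() As String
    Dim firstCol As Long
    Dim i As Long

    firstCol = LBound(values, 2)
    ReDim result(LBound(values, 1) To UBound(values, 1))
    For i = LBound(values, 1) To UBound(values, 1)
        ' CStr turns numbers into text and empty cells into ""
        result(i) = CStr(values(i, firstCol))
    Next i
    ColumnTo1D = result
End Function
```

A one-dimensional string array like this is the shape AutoFilter accepts as Criteria1 together with Operator:=xlFilterValues.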
Not sure if anyone else will have this problem, so I figured I'd post the answer I found. I like the array solution posted by @Ripster (and thanks for that, it almost worked), but it won't really work in this case. What I'm working with is a large sheet of data with one ID column, and I want to check other sheets to see if they contain duplicates of those IDs. I don't want to delete them, just mark them so I can check them out. With potentially upwards of 50K rows, looping through each row would take a LONG time.
So, what I figured out I can do is copy the ID column from the other sheet into the main sheet, and use the conditional formatting option to mark duplicates in some colour. (It'll mark the rows in both columns) and then I can filter the column by colour to show me only the colour I used to mark the duplicates. If I programmatically add a column to the sheet I'm checking with the row numbers, I can even include that column in the main sheet so when I filter for colour I can see which rows they were in their sheet.
After doing that I can record and adapt a macro to do this automatically for my less programming inclined co-workers
Thanks much all!
Edit - Added Code
After selecting the columns to compare, here is the code to mark the duplicates with red text and no fill:
Selection.FormatConditions.AddUniqueValues
Selection.FormatConditions(Selection.FormatConditions.Count).SetFirstPriority
Selection.FormatConditions(1).DupeUnique = xlDuplicate
With Selection.FormatConditions(1).Font
.Color = -16383844
.TintAndShade = 0
End With
Selection.FormatConditions(1).StopIfTrue = False
Then, since both columns have the duplicates marked, select the one that you actually want to examine. Here's the code to filter:
Selection.AutoFilter
ActiveSheet.Range("$C$1:$C$12").AutoFilter Field:=1, _
Criteria1:=RGB(156, 0, 6), Operator:=xlFilterFontColor
(In my test I used column C as the one to filter; that can be set programmatically with a Cells() reference or a Range(Cells(), Cells()) sort of reference.)
I wish everyone the best of luck in their future endeavors! Thanks again to @Ripster.

Validate for unique composite key in an excel using macro

I am creating an Excel macro. As part of that, I need to validate a unique composite key in an Excel sheet, i.e. the combination Column1 + Column2 + Column3 must not repeat.
If it does, the row should be highlighted in red.
What is the best way to do it?
Thanks in Advance
There are several ways to accomplish this: Two depend on sorting your data while the third does not. I'll supply them in different answers so readers can indicate which they prefer.
Identify duplicates using a Dictionary in VBA
Pro: Fast, does not require sorting
Con: Requires code, does not automatically update
In this case I treat the problem of actually identifying the duplicate rows separately from the trivial step of highlighting them. This function returns a Dictionary where the keys are the compound keys that have more than one row and the values are Collections containing the row numbers of all the rows which matched the key. It's the equivalent of a Dictionary<string,List<int>> in .NET. Conceptually it looks like this:
"some..key..1" : [1, 42, 401]
"some..key..2" : [134, 135]
The keys are the concatenated contents of each of the key columns, delimited by a null character. I use the unprintable null character so that the key set ("A", "Dog", "2") does not become equal to ("AD", "o", "g2").
As written, the key comparison is case-sensitive. If you want a case-insensitive match, set the CompareMode property of dctValues and dctDuplicates to TextCompare before any keys are added (the property cannot be changed once the dictionary contains data).
Note: You'll need to add a reference to Microsoft Scripting Runtime
Public Function FindDuplicates(ByVal DataRange As Range, ParamArray KeyColumns()) As Dictionary
Dim ws As Worksheet
Dim vKeyRange, rngCol As Range
Dim dctKeys As New Dictionary
Dim colKeys
Dim keyParts() As String
Dim strKey As String
Dim dctValues As New Dictionary
Dim dctDuplicates As New Dictionary
Dim i As Long, ub As Long
Dim lngFirstRow As Long, lngLastRow As Long, lngRow As Long
Set ws = DataRange.Worksheet
' Identify unique key column numbers
For Each vKeyRange In KeyColumns
For Each rngCol In vKeyRange.Columns
dctKeys(rngCol.Column) = True
Next
Next
colKeys = dctKeys.Keys
ub = UBound(colKeys)
ReDim keyParts(ub)
' Find first and last row of data range
lngFirstRow = DataRange.Cells(1, 1).Row
lngLastRow = DataRange.Cells(DataRange.Rows.Count, 1).Row
' Loop through rows
For lngRow = lngFirstRow To lngLastRow
' Get the parts for the key
For i = 0 To ub
keyParts(i) = ws.Cells(lngRow, colKeys(i)).Value
Next
' Concatenate the parts with an unprintable character as
' the delimiter, so that "A" + "Dog" != "AD" + "og"
strKey = Join(keyParts, Chr(0))
' If the key hasn't been found yet, create a new collection
If Not dctValues.Exists(strKey) Then
dctValues.Add strKey, New Collection
End If
' Push the row number to the list of rows with this key
dctValues(strKey).Add lngRow
' If this is the second row with this key, add the
' list to the dictionary of keys with multiple rows
If dctValues(strKey).Count = 2 Then
dctDuplicates.Add strKey, dctValues(strKey)
End If
Next
Set FindDuplicates = dctDuplicates
End Function
Usage: Find all duplicate rows in A2:I5000, using columns A, B, and E as key columns
Dim ws As Worksheet, dctDups As Dictionary, vKey, vRow
Set ws = ThisWorkbook.Worksheets(1)
Set dctDups = FindDuplicates(ws.Range("A2:I5000"), ws.Range("A:B"), ws.Range("E:E"))
For Each vKey In dctDups
For Each vRow In dctDups(vKey)
ws.Range("A" & vRow & ":I" & vRow).Interior.Color = vbRed
Next
Next
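The delimiter choice described above can be checked in isolation. The sketch below uses a hypothetical MakeKey helper (not part of FindDuplicates) that joins its parts with a given delimiter, and assumes at least one part is passed.

```vba
Option Explicit

' Hypothetical helper: joins key parts into one string using the
' given delimiter. Assumes at least one part is supplied.
Function MakeKey(ByVal delimiter As String, ParamArray parts() As Variant) As String
    Dim pieces() As String
    Dim i As Long

    ReDim pieces(LBound(parts) To UBound(parts))
    For i = LBound(parts) To UBound(parts)
        pieces(i) = CStr(parts(i))
    Next i
    MakeKey = Join(pieces, delimiter)
End Function
```

With an empty delimiter, MakeKey("", "A", "Dog", "2") and MakeKey("", "AD", "o", "g2") both produce "ADog2", so two different key sets collide. With Chr(0) as the delimiter the two results differ, which is why the function above joins on a character that is unlikely to appear in cell text.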
There are several ways to accomplish this: Two depend on sorting your data while the third does not. I'll supply them in different answers so readers can indicate which they prefer.
Sort and apply conditional formatting
Pro: Dynamic (adjusts to changes in data), does not require any code
Con: Requires sorting, can become messy
Manually sort by the key columns
Create a conditional formatting rule and apply it to all rows of data.
Highlight all the data, but starting with the first row of data
Select Conditional Formatting -> New Rule
Set the format to a red fill
Select "Use a formula to determine which cells to format"
Here's the formula you need, assuming your selection starts on row 2 (there's a header in row 1), and your key columns are A, B, and C. Note carefully where the $ signs appear and where they do not:
=OR((CONCATENATE($A2,$B2,$C2)=CONCATENATE($A1,$B1,$C1)),
(CONCATENATE($A2,$B2,$C2)=CONCATENATE($A3,$B3,$C3)))
This will highlight both rows that have duplicate keys, or all rows if there are more than two.