Validate a unique composite key in Excel using a macro - VBA

I am creating an Excel macro. As part of it, I need to validate a unique composite key in an Excel sheet, i.e. the combination of Column1 + Column2 + Column3 shouldn't repeat.
If it does, the row should be highlighted in red.
What is the best way to do it?
Thanks in Advance

There are several ways to accomplish this: Two depend on sorting your data while the third does not. I'll supply them in different answers so readers can indicate which they prefer.
Identify duplicates using a Dictionary in VBA
Pro: Fast, does not require sorting
Con: Requires code, does not automatically update
In this case I treat the problem of actually identifying the duplicate rows separately from the trivial step of highlighting them. This function returns a Dictionary where the keys are the compound keys that have more than one row and the values are Collections containing the row numbers of all the rows which matched the key. It's the equivalent of a Dictionary<string,List<int>> in .NET. Conceptually it looks like this:
"some..key..1" : [1, 42, 401]
"some..key..2" : [134, 135]
The keys are the concatenated contents of each of the key columns, delimited by a null character. I use the unprintable null character so that the key set ("A", "Dog", "2") does not become equal to ("AD", "o", "g2").
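To make the delimiter point concrete, here is a minimal sketch. MakeCompositeKey and DemoCompositeKey are hypothetical names used only for illustration; they are not part of the function below.

```vba
Option Explicit

' Hypothetical helper: joins the key parts with a null-character delimiter.
Public Function MakeCompositeKey(ParamArray Parts() As Variant) As String
    Dim buf() As String
    Dim i As Long
    If UBound(Parts) < 0 Then Exit Function    ' no parts supplied: return ""
    ReDim buf(0 To UBound(Parts))
    For i = 0 To UBound(Parts)
        buf(i) = CStr(Parts(i))
    Next i
    MakeCompositeKey = Join(buf, Chr(0))
End Function

Public Sub DemoCompositeKey()
    ' Plain concatenation cannot tell these two key sets apart...
    Debug.Print ("A" & "Dog" & "2") = ("AD" & "o" & "g2")                               ' True
    ' ...but the delimited keys differ.
    Debug.Print MakeCompositeKey("A", "Dog", "2") = MakeCompositeKey("AD", "o", "g2")   ' False
End Sub
```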
As written the key comparison is case-sensitive. If you desire a case-insensitive match, set the CompareMode property of dctValues and dctDuplicates to TextCompare.
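A small sketch of what that setting does, assuming a late-bound dictionary (so the VBA constant vbTextCompare is used, which has the same value as the Scripting library's TextCompare). The sample key values are made up.

```vba
Option Explicit

Public Sub DemoCaseInsensitiveKeys()
    Dim d As Object
    Set d = CreateObject("Scripting.Dictionary")
    ' CompareMode can only be changed while the dictionary is still empty
    d.CompareMode = vbTextCompare
    d("Smith" & Chr(0) & "NY") = 1
    d("SMITH" & Chr(0) & "ny") = 2    ' treated as the same key under text comparison
    Debug.Print d.Count               ' 1
End Sub
```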
Note: You'll need to add a reference to Microsoft Scripting Runtime
Public Function FindDuplicates(ByVal DataRange As Range, ParamArray KeyColumns()) As Dictionary
    Dim ws As Worksheet
    Dim vKeyRange, rngCol As Range
    Dim dctKeys As New Dictionary
    Dim colKeys
    Dim keyParts() As String
    Dim strKey As String
    Dim dctValues As New Dictionary
    Dim dctDuplicates As New Dictionary
    Dim i As Long, ub As Long
    Dim lngFirstRow As Long, lngLastRow As Long, lngRow As Long
    Set ws = DataRange.Worksheet
    ' Identify unique key column numbers
    For Each vKeyRange In KeyColumns
        For Each rngCol In vKeyRange.Columns
            dctKeys(rngCol.Column) = True
        Next
    Next
    colKeys = dctKeys.Keys
    ub = UBound(colKeys)
    ReDim keyParts(ub)
    ' Find first and last row of data range
    lngFirstRow = DataRange.Cells(1, 1).Row
    lngLastRow = DataRange.Cells(DataRange.Rows.Count, 1).Row
    ' Loop through rows
    For lngRow = lngFirstRow To lngLastRow
        ' Get the parts for the key
        For i = 0 To ub
            keyParts(i) = ws.Cells(lngRow, colKeys(i)).Value
        Next
        ' Concatenate the parts with an unprintable character as
        ' the delimiter, so that "A" + "Dog" != "AD" + "og"
        strKey = Join(keyParts, Chr(0))
        ' If the key hasn't been found yet, create a new collection
        If Not dctValues.Exists(strKey) Then
            dctValues.Add strKey, New Collection
        End If
        ' Push the row number to the list of rows with this key
        dctValues(strKey).Add lngRow
        ' If this is the second row with this key, add the
        ' list to the dictionary of keys with multiple rows
        If dctValues(strKey).Count = 2 Then
            dctDuplicates.Add strKey, dctValues(strKey)
        End If
    Next
    Set FindDuplicates = dctDuplicates
End Function
Usage: Find all duplicate rows in A2:I5000, using columns A, B, and E as key columns
Dim ws As Worksheet, dctDups As Dictionary, vKey, vRow
Set ws = ThisWorkbook.Worksheets(1)
Set dctDups = FindDuplicates(ws.Range("A2:I5000"), ws.Range("A:B"), ws.Range("E:E"))
For Each vKey In dctDups
    For Each vRow In dctDups(vKey)
        ws.Range("A" & vRow & ":I" & vRow).Interior.Color = vbRed
    Next
Next

Sort and apply conditional formatting
Pro: Dynamic (adjusts to changes in data), does not require any code
Con: Requires sorting, can become messy
Manually sort by the key columns
Create a conditional formatting rule and apply it to all rows of data.
Highlight all the data, but starting with the first row of data
Select Conditional Formatting -> New Rule
Set the format to a red fill
Select "Use a formula to determine which cells to format"
Here's the formula you need, assuming your selection starts on row 2 (there's a header in row 1), and your key columns are A, B, and C. Note carefully where the $ signs appear and where they do not:
=OR((CONCATENATE($A2,$B2,$C2)=CONCATENATE($A1,$B1,$C1)),
(CONCATENATE($A2,$B2,$C2)=CONCATENATE($A3,$B3,$C3)))
This will highlight both rows that have duplicate keys, or all rows if there are more than two.

Related

Remove duplicate values and cells from one column

I have tried so many methods, from RemoveDuplicates and selections to scripting dictionaries, and I cannot get this to work. I understand there are multiple ways to do this, but if any of you can help, that would be great.
I have one list of values that I am pulling through from another sheet (up to approx 80k rows) into cell B13 downwards. I am then trying to remove the duplicate values and cells so I am left with unique values which I can then use to perform lookups on other sheets.
Sub Address_Sage()
    Dim dataBook As Workbook
    Dim dict As Object
    Dim Sage_Data As Worksheet, Address As Worksheet
    Dim dataSource As Range, dataDest As Range
    Dim sourceDataRowCount As Integer, index As Integer
    Dim rowCount As Long
    Dim strVal As String
    Set dataBook = Application.ThisWorkbook
    Set sheetSource = dataBook.Sheets("Sage_Data")
    Set sheetDest = dataBook.Sheets("Address")
    Set dict = CreateObject("Scripting.Dictionary")
    Set dataSource = sheetSource.Range("A3", _
        sheetSource.Range("A90000").End(xlUp))
    sourceDataRowCount = dataSource.Rows.Count
    Set dataDest = sheetDest.Range("B13", "B" & _
        sourceDataRowCount)
    For index = 1 To sourceDataRowCount
        dataDest(index, 1).Value = dataSource(index, 1).Value
    Next index
    Sheets("Address").Select
    rowCount = ActiveSheet.Range("B13").CurrentRegion.Rows.Count
    Do While rowCount > 0
        strVal = Address.Cells(rowCount, 1).Value2
        If dict.exists(strVal) Then
            ActiveSheet.Rows(rowCount).EntireRow.Delete
        Else
            dict.Add strVal, 0
        End If
        rowCount = rowCount - 1
    Loop
    'Set dict = Nothing
End Sub
It always gets stuck on the strVal line. I have tried changing Value2 to Value1, as I only have one column, but no luck.
thank you
Not super experienced in VBA so I can't speak to exactly what you're doing and what your code is saying but I thought I'd share this with you. Last week I had to create a macrobook that returned the unique entries of electrical defects that different crews observed while on the job. I made a dictionary that read all of the entries in the spreadsheet and then later printed all of the unique entries. I'll post the code and try to walk you through it.
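'This sits inside a With block for the data sheet and a loop over the row counter i
If .Range("A" & i) <> "" Then
    data = .Range("A" & i).Value    'the entry to count
    If dict.Exists(data) Then
        dict(data) = dict(data) + 1
    Else
        dict.Add Key:=data, Item:=1    'numeric 1, so the count stays a number
    End If
End If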
So the code basically says: if column A on row i (i is simply a row counter) is not empty, read that entry. data is simply a variable that you set to the cell value you'd like to store in the dictionary. Dictionary keys are unique and cannot repeat, so the code asks whether the key already exists in the dictionary. If so, we add one to its count. If not, we add the key to the dictionary with a count of 1. At the end of your loop, your dictionary will hold every unique entry and the number of times it appeared.
Now we can reference them or print them.
For r = 0 To dict.Count - 1
    Sheets("Results").Range("B" & iResults) = dict.Keys(r)
    Sheets("Results").Range("C" & iResults) = dict(dict.Keys(r))
    iResults = iResults + 1    'move to the next output row
Next
This second piece of code loops from 0 to dict.Count - 1, i.e. once per entry in your dictionary. It starts at zero because dict.Keys returns a zero-based array, so the last valid index is one less than the count. iResults is the output row and is advanced on each pass. The first statement prints the unique key for each r. The second statement prints the item associated with it, a number equal to the number of times that unique entry showed up in your data. Note that dict.Keys(r) relies on a reference to Microsoft Scripting Runtime (early binding); if dict was created with CreateObject, write dict.Keys()(r) instead.
You can use this same method for other purposes as well, not just printing the data but referencing it and using it somewhere else. But I am sure you will find that the For-loop with dict.Keys(r) is the easiest way to run through your dictionary entries. Took me a few days to figure it out and it revolutionized my program. Hope this helps you out.
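Putting the two pieces together, here is a minimal self-contained sketch of the same counting idea. It uses a late-bound dictionary and a plain array instead of a worksheet; CountOccurrences, DemoCountOccurrences, and the sample values are hypothetical and only for illustration.

```vba
Option Explicit

' Hypothetical helper: counts how often each non-empty value occurs.
Public Function CountOccurrences(values As Variant) As Object
    Dim dict As Object, v As Variant
    Set dict = CreateObject("Scripting.Dictionary")
    For Each v In values
        If Len(CStr(v)) > 0 Then
            ' Reading a missing key creates it as Empty, and Empty + 1 = 1
            dict(CStr(v)) = dict(CStr(v)) + 1
        End If
    Next v
    Set CountOccurrences = dict
End Function

Public Sub DemoCountOccurrences()
    Dim dict As Object, keys As Variant, r As Long
    Set dict = CountOccurrences(Array("loose wire", "bad fuse", "loose wire", ""))
    keys = dict.Keys                      ' zero-based array of the unique entries
    For r = 0 To dict.Count - 1
        Debug.Print keys(r), dict(keys(r))    ' loose wire 2, then bad fuse 1
    Next r
End Sub
```

Copying dict.Keys into a local array once avoids rebuilding the key array on every pass of the loop.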

Creating column immune references in VBA?

I have a project that I am working on where multiple conditions are checked across all rows and many columns. The issue is that columns are added/removed from the sheet, and, at present, that results in all of my cell(row,column) references being off + outputting incorrect information. I'm wondering if there's a way to make my column references more robust so that they automatically find the correct headers and use them when checking? Would a solution to this problem be able to account for multiple columns containing the exact same header text?
Basically:
No blank columns
Column headers have repeats (e.g., Column 1 header: "Financials"; Column 15 header: "Financials")
Columns are shifting right and left based on adding/removing columns from sheet
Please find a short sample of my current code below with notes:
Dim i As Integer
Dim lastRow As Long
Dim lastCol As Long
lastRow = Range("A1").End(xlDown).Row
lastCol = Cells(1, Columns.Count).End(xlToLeft).Column
For i = 2 To lastRow
    Select Case Cells(i, 14).Value
        Case Is = "Yes"
            Select Case True
                Case Cells(i, 63).Value = 6 And _
                    (IsEmpty(Cells(i, 77)) Or IsEmpty(Cells(i, 93)) Or IsEmpty(Cells(i, 109)) Or _
                    IsEmpty(Cells(i, 125)) Or IsEmpty(Cells(i, 141)) Or IsEmpty(Cells(i, 157)))
                    Cells(i, 174).Value = "True" '^THESE CELL VALUES ALL HAVE THE SAME COLUMN HEADER TITLE
If the table is consistent - starting at A1 and occupying a contiguous block - then Range("A1").CurrentRegion will reference the table.
You can then use .CreateNames to name the columns (that is, using Named Ranges) according to their headings.
Dim rngTable As Range
Dim rng As Range
Set rngTable = Range("A1").CurrentRegion
rngTable.CreateNames True, False, False, False
' that is, based on the first row headings
Range("Salary").Select 'prove it works
'if necessary, iterate the cells of the column,
For Each rng In Range("Salary")
    rng.Value = rng.Value + 10
Next 'rng
If a column heading is duplicated ("Financials"), though, then you'll be asked to confirm and the second occurrence will overrule the first. (Or you could say "No" and the first occurrence will be named.) In that case, it is preferable that you first correct these duplicate headings.
Correcting the duplicate headings is not necessarily straightforward, but it is something you should resolve anyway. If a specific word such as "Financials" (or a known set of words) is what gets duplicated, the task is easier: you could count how many occurrences there are and rename the second, third, etc. to "Financials2", "Financials3", and so on.
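If renaming the headings is not an option, one possible alternative is to look up the Nth occurrence of a heading at run time. This is only a sketch under assumptions: HeaderColumn, its parameters, and DemoHeaderColumn are hypothetical names, the headings are assumed to sit in the first row of the table starting at A1, and matching is case-insensitive.

```vba
Option Explicit

' Hypothetical helper: returns the worksheet column number of the Nth header
' cell whose text equals headerText, or 0 if there are fewer than N matches.
Public Function HeaderColumn(ByVal headerRow As Range, ByVal headerText As String, _
                             Optional ByVal occurrence As Long = 1) As Long
    Dim cell As Range, seen As Long
    For Each cell In headerRow.Cells
        If Not IsError(cell.Value) Then
            If StrComp(CStr(cell.Value), headerText, vbTextCompare) = 0 Then
                seen = seen + 1
                If seen = occurrence Then
                    HeaderColumn = cell.Column
                    Exit Function
                End If
            End If
        End If
    Next cell
End Function

Public Sub DemoHeaderColumn()
    Dim hdr As Range
    Set hdr = ActiveSheet.Range("A1").CurrentRegion.Rows(1)
    ' Column of the second "Financials" heading, wherever it currently sits
    Debug.Print HeaderColumn(hdr, "Financials", 2)
End Sub
```

The returned number can then replace a hard-coded index in calls such as Cells(i, 63), provided the relative order of the repeated headings stays the same.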
One easy way is to assign a Name to the column. Say column N has the header "Payments". First assign the Name "Payments" to that column (for example, select column N and type Payments into the Name Box):
Then in VBA we can code like:
Sub dural()
    Dim rng As Range, colly As Long, i As Long
    Set rng = Range("Payments")
    colly = rng.Column
    For i = 2 To 100
        If Cells(i, colly) = "whatever" Then
            MsgBox "Help"
        End If
    Next i
End Sub
The code will continue to work even if you add/remove columns before column N.

Best way to populate an excel string column for fastest subsequent vba search (can I use metadata, etc?)

In a column with hundreds or even 1-2 thousand strings of approximately 40 characters, with one string per cell and many repeating entries, what is the best way to populate the column to conduct the fastest possible search later? The search should return a row number so that the corresponding row can be deleted.
Is there some way to append metadata or label to a cell/row for faster search? Is there some other mechanism that can identify cells that will make searching easier?
I'm new to VBA, and I want to set out on the best path before I get too far into the project and have to search through thousands of strings.
edit: Someone requested an example cell: The cells will have email addresses in them. I can control the email addresses on the server, so they will roughly be 40 characters long each. They will contain alphanumeric characters only.
Example of a fast way to implement a dictionary lookup
Data is on Sheet1, and starts in column A
The strings are in column B
Option Explicit
Public Sub SearchStrings()
    Dim ur As Variant, r As Long, d As Object
    Const COL_ID = 2
    Set d = CreateObject("Scripting.Dictionary")    'or add a reference to Microsoft Scripting Runtime
    d.CompareMode = vbTextCompare                   'case insensitive; use vbBinaryCompare otherwise
    ur = Sheet1.UsedRange.Columns(COL_ID)           'read strings from column COL_ID into array
    For r = LBound(ur) To UBound(ur)                'populate dictionary; Key = string (unique)
        If Not IsError(ur(r, 1)) Then d(CStr(ur(r, 1))) = r    'Item = row id
    Next
    Debug.Print d.Keys()(3)     'prints the 4th unique string (the Keys array is zero-based)
    Debug.Print d.Items()(3)    'prints the row index (within UsedRange) stored for that string
End Sub
If you want to store string duplicates use this:
If Not IsError(ur(r, 1)) Then d(COL_ID & "-" & r) = CStr(ur(r, 1))
which is Key = Column ID & "-" & row ID (2-5), and Item = String itself
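Another way to keep duplicates, sketched here under assumptions, is to keep the string as the key and store every row index for it in a Collection, so that a lookup by string returns all matching rows. BuildRowIndex, DemoBuildRowIndex, and the sample addresses are hypothetical.

```vba
Option Explicit

' Hypothetical helper: maps each string to a Collection of the row indexes
' where it occurs. "values" is a 2-D, single-column array such as the one
' returned by Range.Value.
Public Function BuildRowIndex(values As Variant) As Object
    Dim d As Object, r As Long, k As String
    Set d = CreateObject("Scripting.Dictionary")
    d.CompareMode = vbTextCompare              ' case-insensitive keys
    For r = LBound(values, 1) To UBound(values, 1)
        If Not IsError(values(r, 1)) Then
            k = CStr(values(r, 1))
            If Not d.Exists(k) Then d.Add k, New Collection
            d(k).Add r
        End If
    Next r
    Set BuildRowIndex = d
End Function

Public Sub DemoBuildRowIndex()
    Dim data(1 To 4, 1 To 1) As Variant
    Dim idx As Object, r As Variant
    data(1, 1) = "a@example.com"
    data(2, 1) = "b@example.com"
    data(3, 1) = "A@example.com"
    data(4, 1) = "c@example.com"
    Set idx = BuildRowIndex(data)
    For Each r In idx("a@example.com")
        Debug.Print r                          ' 1, then 3
    Next r
End Sub
```

The indexes are positions within the array, so add the range's starting row offset before using them on the sheet, and delete from the highest row to the lowest so the remaining indexes stay valid.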

VBA Script to Combine Duplicate Data into One Row

I'm attempting to write a script to, given a list of parts and their used-on numbers, combine the used-on values for all duplicate parts and sort the resulting values. That sounded confusing so here is an image of what I'm shooting for:
Desired Result
It should be able to handle duplicates and blanks.
Through a combination of research and tinkering, I've gotten a sortWithinCell function made, but my headaches are coming from writing the script to iterate down the list of parts, combining used-on values as you go, and then upon reaching a different part number, delete all but the first row that you started from and then continue. Assume that the input parts list is already sorted by part number.
Thanks all!
-Edit 1-
I did manage to find a way to combine the data values:
Function ConcatinateAllCellValuesInRange(sourceRange As Excel.Range) As String
    Dim finalValue As String
    Dim cell As Excel.Range
    For Each cell In sourceRange.Cells
        finalValue = finalValue + "," + CStr(cell.Value)
    Next cell
    ConcatinateAllCellValuesInRange = Right(finalValue, Len(finalValue) - 1)
End Function
However, now the trick is still identifying the input range for this function, i.e. when the part number changes.
A possible solution would be to create a collection which uses the Part as the key and the Used-On values as the value. At the first occurrence of a Part, the key-value pair is added. Then, as the loop continues, if the key already exists in the collection, take the stored value and the current row's value, merge them, delete the existing pair and add a new one (you can't modify a key-value pair of a Collection, so you delete the old one and add a new one).
If you are finished looping through your list, you can loop through the elements in the collection and save them somewhere else on the sheet. Or if you want, delete the current range and insert everything where the old values were.
Let me know if you need help with this
EDIT
Here is an example of the method I described, though I used a Dictionary instead of a Collection because it has more functionality (REQUIRES A REFERENCE TO "Microsoft Scripting Runtime"):
Option Explicit
Sub combineStuff()
    'Create Dictionary
    Dim tmpDic As New Dictionary
    'tempstring to preserve the value of a key
    Dim tmpValue As String
    Dim i As Integer
    'Loop through rows - change this to your needs
    For i = 1 To 15 Step 1
        'Simple check if "Part" and "Used-On" are both filled
        If Cells(i, 1).Value <> "" And Cells(i, 2).Value <> "" Then
            'Check if the Key already exists
            If tmpDic.Exists(CStr(Cells(i, 1).Value)) Then
                'If it does, save its value
                tmpValue = tmpDic.Item(CStr(Cells(i, 1).Value))
                'Delete the Key Value pair
                tmpDic.Remove CStr(Cells(i, 1).Value)
                'Add Key Value pair with updated Value
                tmpDic.Add CStr(Cells(i, 1).Value), tmpValue & "," & CStr(Cells(i, 2).Value)
            Else
                'Key does not exist yet, add it
                tmpDic.Add CStr(Cells(i, 1).Value), Cells(i, 2).Value
            End If
        End If
    Next i
    Dim tmpKey As Variant
    i = 1
    'Loop through all items in the Dictionary
    For Each tmpKey In tmpDic
        'Insert Key
        Cells(i, 4).Value = tmpKey
        'Insert Value
        Cells(i, 5).Value = tmpDic(tmpKey)
        i = i + 1
    Next tmpKey
End Sub
Note that the Dictionary will not be sorted! The key that was modified last is the last one in the list. If you want the outcome to be sorted, it might be better to use a multidimensional array to store the key and value pairs. Then you can easily sort those.
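If sorted output is all that is needed, one alternative to switching to an array is to sort a copy of the keys and read the dictionary in that order. This is only a sketch: SortedKeys and DemoSortedKeys are hypothetical names, the part numbers are made up, and a simple insertion sort with case-insensitive text comparison is assumed to be good enough for a short parts list.

```vba
Option Explicit

' Hypothetical helper: returns the dictionary's keys as a sorted array
' (insertion sort, case-insensitive text comparison).
Public Function SortedKeys(ByVal dict As Object) As Variant
    Dim keys As Variant, i As Long, j As Long, tmp As Variant
    keys = dict.Keys
    For i = LBound(keys) + 1 To UBound(keys)
        tmp = keys(i)
        j = i - 1
        Do While j >= LBound(keys)
            ' VBA's And does not short-circuit, so the comparison is tested separately
            If StrComp(CStr(keys(j)), CStr(tmp), vbTextCompare) <= 0 Then Exit Do
            keys(j + 1) = keys(j)
            j = j - 1
        Loop
        keys(j + 1) = tmp
    Next i
    SortedKeys = keys
End Function

Public Sub DemoSortedKeys()
    Dim d As Object, k As Variant
    Set d = CreateObject("Scripting.Dictionary")
    d("P-300") = "A,B"
    d("P-100") = "C"
    d("P-200") = "D,E"
    For Each k In SortedKeys(d)
        Debug.Print k, d(k)    ' P-100, then P-200, then P-300
    Next k
End Sub
```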

Count number of unique values containing text

I have the following code that counts the number of cells in a column that contains the string, "ABC-QR":
Ctr = Application.WorksheetFunction.CountIf(Sheet1.Range("D4:D1500"), "*ABC-QR*")
EU.Cells(16, 3) = Ctr
I used "ABC-QR" because that's the part of the data that doesn't change. The true data that's in those cells is, for example, "ABC-QR00012345", or whatever number it may have. I would like to modify my code to not include duplicates when it's counting.
Firstly, you must enable 'Microsoft Scripting Runtime' from within Tools --> References within the Visual Basic Editor.
You assign the data from the worksheet into an array; then import everything which fits the string criteria, and isn't a duplicate, into a dictionary. You can check for duplicates in the dictionary using the .Exists method.
EDIT: As noted by @Zev in the comments, you don't even need to use the .Exists method. You can just use the array element as the key of the dictionary and assign 1 as the item value. A duplicate value from the array simply overwrites the item stored under the existing key, so duplicates are dealt with automatically.
Once everything which isn't a duplicate has been imported into the dictionary, you can then use the .Count property on the dictionary. This will tell you how many records fit your string criteria, and are not duplicates, within the range passed into the array.
Option Explicit
Sub countNonDuplicates()
    Dim wb As Workbook, ws As Worksheet
    Dim dict As Scripting.Dictionary
    Dim myValues() As Variant
    Dim lRow As Long, i As Long
    Set wb = ThisWorkbook
    Set ws = wb.Sheets(1)
    Set dict = New Scripting.Dictionary
    'Cells/Range are qualified with ws so the code reads Sheets(1), not the active sheet
    lRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    myValues = ws.Range(ws.Cells(1, 1), ws.Cells(lRow, 1)).Value
    For i = 1 To UBound(myValues, 1)
        If InStr(myValues(i, 1), "ABC-QR") > 0 Then dict(myValues(i, 1)) = 1 'arbitrary value
    Next i
    MsgBox dict.Count
End Sub
The above currently gets the last row of Column A and then takes the range and assigns it to the array. If you wish to use a different column, then update the following statements with the column number required (example below now uses Column D)
lRow = ws.Cells(ws.Rows.Count, 4).End(xlUp).Row
myValues = ws.Range(ws.Cells(1, 4), ws.Cells(lRow, 4)).Value
Also it's currently performing the above on Sheets(1). Change the worksheet number to what you require.
On 100,000 records this took 0.2 seconds to produce the count.
This array formula should do the trick:
EU.Cells(16,3).FormulaArray = "=SUM(IF(ISERROR(FIND(""ABC-QR"",D4:D1500)),0,1/(COUNTIF(D4:D1500,D4:D1500))))"
Since it's an array formula, it will operate on each cell in your range in turn and look for your text (FIND("ABC-QR",D4:D1500)). If it's not found, it returns 0 to the running SUM(). If it is found, it uses the value 1/count, where count is the number of times the cell value being tested exists in your range.
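To see why summing 1/count gives the number of unique matches, here is a small sketch that mirrors the formula's arithmetic in plain VBA. CountUniqueContaining, DemoCountUniqueContaining, and the sample values are hypothetical; like the formula, the text search is case-sensitive (FIND) and the counting is case-insensitive (COUNTIF).

```vba
Option Explicit

' Hypothetical helper: for every value containing "needle", adds 1/count,
' where count is how many times that value occurs in the whole list.
Public Function CountUniqueContaining(values As Variant, ByVal needle As String) As Double
    Dim v As Variant, w As Variant, n As Long, total As Double
    For Each v In values
        If InStr(1, CStr(v), needle, vbBinaryCompare) > 0 Then
            n = 0
            For Each w In values
                If StrComp(CStr(w), CStr(v), vbTextCompare) = 0 Then n = n + 1
            Next w
            total = total + 1 / n
        End If
    Next v
    CountUniqueContaining = total
End Function

Public Sub DemoCountUniqueContaining()
    ' "ABC-QR001" appears twice (1/2 + 1/2) and "ABC-QR002" once (1): total 2
    Debug.Print CountUniqueContaining( _
        Array("ABC-QR001", "ABC-QR001", "ABC-QR002", "XYZ-001"), "ABC-QR")    ' 2
End Sub
```

Each group of identical values contributes exactly 1 to the total, however many times it repeats. With counts such as 3, floating-point rounding can leave the sum a hair away from a whole number.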