Count number of unique values containing text - vba

I have the following code that counts the number of cells in a column that contain the string "ABC-QR":
Ctr = Application.WorksheetFunction.CountIf(Sheet1.Range("D4:D1500"), "*ABC-QR*")
EU.Cells(16, 3) = Ctr
I used "ABC-QR" because that's the part of the data that doesn't change. The true data that's in those cells is, for example, "ABC-QR00012345", or whatever number it may have. I would like to modify my code to not include duplicates when it's counting.

Firstly, you must enable 'Microsoft Scripting Runtime' from within Tools --> References within the Visual Basic Editor.
You assign the data from the worksheet into an array; then import everything which fits the string criteria, and isn't a duplicate, into a dictionary. You can check for duplicates in the dictionary using the .Exists method.
EDIT: As noted by @Zev in the comments, you don't even need to use the .Exists method. You can just assign the array element to the key of the dictionary, and assign the item value as 1. Any duplicate values from the array will overwrite the previous key, so duplicates will automatically be dealt with.
Once everything which isn't a duplicate has been imported into the dictionary, you can then use the .Count property on the dictionary. This will tell you how many records fit your string criteria, and are not duplicates, within the range passed into the array.
Option Explicit

Sub countNonDuplicates()
    Dim wb As Workbook, ws As Worksheet
    Dim dict As Scripting.Dictionary
    Dim myValues() As Variant
    Dim lRow As Long, i As Long

    Set wb = ThisWorkbook
    Set ws = wb.Sheets(1)
    Set dict = New Scripting.Dictionary

    ' Qualify Cells/Range with ws so the code does not silently read the active sheet
    lRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    myValues = ws.Range(ws.Cells(1, 1), ws.Cells(lRow, 1)).Value

    For i = 1 To UBound(myValues, 1)
        ' Assigning to a key adds it if it is new and overwrites it if it already exists
        If InStr(myValues(i, 1), "ABC-QR") > 0 Then dict(myValues(i, 1)) = 1 'arbitrary value
    Next i

    MsgBox dict.Count
End Sub
The above gets the last used row of column A, then reads that range into the array. To use a different column, update the following statements with the required column number (the example below uses column D):
lRow = ws.Cells(ws.Rows.Count, 4).End(xlUp).Row
myValues = ws.Range(ws.Cells(1, 4), ws.Cells(lRow, 4)).Value
It currently runs on Sheets(1). Change the worksheet index to whatever you require.
On 100,000 records this took 0.2 seconds to produce the count.

This array formula should do the trick:
EU.Cells(16,3).FormulaArray = "=SUM(IF(ISERROR(FIND(""ABC-QR"",D4:D1500)),0,1/(COUNTIF(D4:D1500,D4:D1500))))"
Since it's an array formula, it will operate on each cell in your range in turn and look for your text (FIND("ABC-QR",D4:D1500)). If it's not found, it returns 0 to the running SUM(). If it is found, it uses the value 1/count, where count is the number of times the cell value being tested exists in your range. A value that appears three times therefore contributes 1/3 three times, which adds up to exactly 1 per unique value.
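To make the 1/count idea concrete, here is a minimal VBA sketch of the same arithmetic on an in-memory array. The function name CountUniqueByReciprocal and the sample values are made up for illustration, and the string comparison is case-sensitive here, whereas COUNTIF ignores case, so treat it as an approximation of the formula rather than an exact equivalent.

```vba
Option Explicit

' Sums 1/count over every matching entry; each distinct value contributes 1 in total.
Public Function CountUniqueByReciprocal(values As Variant, needle As String) As Long
    Dim i As Long, j As Long, occurrences As Long
    Dim total As Double

    For i = LBound(values) To UBound(values)
        If InStr(1, CStr(values(i)), needle, vbBinaryCompare) > 0 Then
            ' Equivalent of COUNTIF(range, this value)
            occurrences = 0
            For j = LBound(values) To UBound(values)
                If values(j) = values(i) Then occurrences = occurrences + 1
            Next j
            total = total + 1 / occurrences
        End If
    Next i

    ' Round away floating-point error such as 1/3 + 1/3 + 1/3
    CountUniqueByReciprocal = CLng(Round(total, 0))
End Function
```

Calling CountUniqueByReciprocal(Array("ABC-QR001", "ABC-QR001", "ABC-QR002", "XYZ"), "ABC-QR") returns 2: the repeated value contributes 1/2 twice and the other match contributes 1.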

Related

Remove duplicate values and cells from one column

I have tried so many methods from the removeduplicates, selections and scripting dictionaries and I cannot get this to work. I do understand there are multiple ways to do this but if any of you can help, that would be great.
I have one list of values that I am pulling through from another sheet (up to approx 80k rows) into cell B13 downwards. I am then trying to remove the duplicate values and cells so I am left with unique values which I can then use to perform lookups on other sheets.
Sub Address_Sage()
Dim dataBook As Workbook
Dim dict As Object
Dim Sage_Data As Worksheet, Address As Worksheet
Dim dataSource As Range, dataDest As Range
Dim sourceDataRowCount As Integer, index As Integer
Dim rowCount As Long
Dim strVal As String
Set dataBook = Application.ThisWorkbook
Set sheetSource = dataBook.Sheets("Sage_Data")
Set sheetDest = dataBook.Sheets("Address")
Set dict = CreateObject("Scripting.Dictionary")
Set dataSource = sheetSource.Range("A3", _
sheetSource.Range("A90000").End(xlUp))
sourceDataRowCount = dataSource.Rows.Count
Set dataDest = sheetDest.Range("B13", "B" & _
sourceDataRowCount)
For index = 1 To sourceDataRowCount
dataDest(index, 1).Value = dataSource(index, 1).Value
Next index
Sheets("Address").Select
rowCount = ActiveSheet.Range("B13").CurrentRegion.Rows.Count
Do While rowCount > 0
strVal = Address.Cells(rowCount, 1).Value2
If dict.exists(strVal) Then
ActiveSheet.Rows(rowCount).EntireRow.Delete
Else
dict.Add strVal, 0
End If
rowCount = rowCount - 1
Loop
'Set dict = Nothing
End Sub
It always gets stuck on the strVal line. I have tried changing Value2 to Value1 as I only have one column, but no luck.
thank you
Not super experienced in VBA so I can't speak to exactly what you're doing and what your code is saying but I thought I'd share this with you. Last week I had to create a macrobook that returned the unique entries of electrical defects that different crews observed while on the job. I made a dictionary that read all of the entries in the spreadsheet and then later printed all of the unique entries. I'll post the code and try to walk you through it.
If .Range("A" & i) <> "" Then
    If dict.Exists(data) Then
        dict(data) = dict(data) + 1
    Else
        dict.Add Key:=data, Item:=1 ' store the count as a number, not the string "1"
    End If
End If
So the code basically says: if the cell in column A (i is simply an incrementer) is not empty, then we're going to read that entry. data is simply a variable that you would set to the cell value you want recorded in the dictionary. Dictionary keys are unique and cannot repeat, so the code asks whether the key already exists in the dictionary. If so, we add one to its count. If not, we add that key to the dictionary. At the end of your loop, your dictionary will have stored all unique entries and the number of times they appeared.
Now we can reference them or print them.
For r = 0 To dict.Count - 1
    Sheets("Results").Range("B" & iResults) = dict.Keys()(r)
    Sheets("Results").Range("C" & iResults) = dict(dict.Keys()(r))
    iResults = iResults + 1 ' move to the next output row (set iResults to the first output row before the loop)
Next
This second piece of code loops from 0 to dict.Count - 1, one pass per dictionary entry. It starts at zero because the array returned by dict.Keys is zero-based, so the last valid index is one less than the count. The first statement prints the unique key for every r until there are no more entries in the dictionary. The second statement prints the item associated with that key: a number equal to how many times that unique entry showed up in your data.
You can use this same method for other purposes as well, not just printing the data but referencing it and using it somewhere else. But I am sure you will find that the For-loop with dict.Keys(r) is the easiest way to run through your dictionary entries. Took me a few days to figure it out and it revolutionized my program. Hope this helps you out.
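As a self-contained illustration of the counting pattern described in this answer, here is a rough sketch that works on an in-memory array instead of a worksheet. The names CountOccurrences and PrintOccurrences and the sample defect strings are hypothetical. It uses For Each over the keys, which sidesteps index bounds altogether.

```vba
Option Explicit

' Returns a late-bound Scripting.Dictionary mapping each non-empty value to its count.
Public Function CountOccurrences(values As Variant) As Object
    Dim dict As Object
    Dim i As Long
    Dim key As String

    Set dict = CreateObject("Scripting.Dictionary")
    For i = LBound(values) To UBound(values)
        key = CStr(values(i))
        If key <> "" Then
            ' Reading a missing key yields Empty, which counts as 0 in the addition
            dict(key) = dict(key) + 1
        End If
    Next i
    Set CountOccurrences = dict
End Function

Public Sub PrintOccurrences()
    Dim dict As Object, k As Variant

    ' Sample data is made up for illustration
    Set dict = CountOccurrences(Array("loose wire", "cracked pole", "loose wire"))
    For Each k In dict.Keys
        Debug.Print k, dict(k)
    Next k
End Sub
```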

Creating new cell values for each new value in a range

I'm having trouble with a specific process.
In column H, there are a bunch of different numbers. I'm looking for a loop that, for each new value found in the column, counts how many times the value occurs in that column and puts the count in the next worksheet, "Statistik".
I do have a solution where I write code for each individual number in the column, but I'm looking for a loop, since there are 28 different values in that column.
Image of my workbook
Does anyone have a bright solution for my problem?
Thanks in advance.
You need to use COUNTIF. Either as a formula or in your VBA code (Application.CountIf()).
In your case =COUNTIF(H:H, [UNIQUE_VALUE]) where unique value is the value you want to extract. To get unique values you have two options. One is to copy unique values from H:H to your Statistik sheet as follows:
Click the Data Ribbon Menu
Select the Advanced Button in the Sort & Filter section
Fill in the dialog box: choose "Copy to another location", set "List range" to H:H, set "Copy to" to some column in the Statistik sheet, and make sure you tick "Unique records only"
Other option to get unique values is detailed here https://exceljet.net/formula/extract-unique-items-from-a-list
For more information about COUNTIF
https://support.office.com/en-us/article/countif-function-e0de10c6-f885-4e71-abb4-1f464816df34
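The two steps above (extract the unique values, then COUNTIF each one) could also be scripted. The following is only a sketch under some assumptions: the procedure name ListUniqueWithCounts is made up, column H is assumed to have a header in row 1, and cell A1 of the destination sheet is assumed to be empty before the filter runs.

```vba
Option Explicit

' Copies the unique values of column H to column A of dst, then writes a COUNTIF next to each.
Public Sub ListUniqueWithCounts(src As Worksheet, dst As Worksheet)
    Dim rng As Range
    Dim lastRow As Long, r As Long

    ' Header in H1 plus the data below it (assumption: H1 holds a header)
    Set rng = src.Range("H1", src.Cells(src.Rows.Count, "H").End(xlUp))

    ' Scripted equivalent of Data > Advanced > "Unique records only"
    rng.AdvancedFilter Action:=xlFilterCopy, CopyToRange:=dst.Range("A1"), Unique:=True

    dst.Range("B1").Value = "Count"
    lastRow = dst.Cells(dst.Rows.Count, "A").End(xlUp).Row
    For r = 2 To lastRow
        dst.Cells(r, "B").Value = Application.WorksheetFunction.CountIf(rng, dst.Cells(r, "A").Value)
    Next r
End Sub
```

It would be called as ListUniqueWithCounts Worksheets("Sheet1"), Worksheets("Statistik"), where "Sheet1" stands in for whatever the data sheet is actually called.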
You could use a dictionary so that each unique value is output only once, together with its count:
Option Explicit

Public Sub GetCount()
    Dim dict As Object
    Set dict = CreateObject("Scripting.Dictionary")

    With Worksheets("Sheet1") 'change as appropriate
        Dim arr(), i As Long
        arr = Intersect(.Columns("H"), .UsedRange).Value '<=== Specify exact range in H to loop over if you want including header
        For i = LBound(arr, 1) + 1 To UBound(arr, 1) 'assuming header to ignore otherwise start at 1
            If Not dict.exists(arr(i, 1)) Then
                dict.Add arr(i, 1), 1 '<==== if value not seen before add to dictionary with value of 1
            Else
                dict(arr(i, 1)) = dict(arr(i, 1)) + 1 ' <====== if seen before add 1 to the existing count
            End If
        Next i
    End With

    With Worksheets("Statistik")
        .Range("A1") = "StudyBoard_ID"
        .Range("B1") = "Count"
        .Range("A2").Resize(dict.Count, 1) = Application.Transpose(dict.Keys)
        .Range("B2").Resize(dict.Count, 1) = Application.Transpose(dict.Items)
    End With
End Sub

I have a list of values in a column. I want to use VBA to pick x values from that list

I have a list of values in a column. I want to use VBA to pick x values from that list. The number x is given in another cell. Can you help me?
I have tried various formulas but found no solution.
Assuming myColumnRng is the range of cells in a column containing your list of values, you can get its first nElements entries in a Variant array as follows:
myElements = Application.Transpose(myColumnRng.Resize(nElements).Value)
To give a little more context:
Option Explicit

Sub main()
    Dim myColumnRng As Range
    Dim myElements As Variant
    Dim nElements As Long
    Dim iEl As Long

    nElements = Range("B1").Value '<--| assumption: the number x is stored in cell B1; point this at your own cell
    Set myColumnRng = Range("A1", Cells(Rows.Count, "A").End(xlUp)) '<--| set your range as the column "A" one from row 1 down to last not empty row
    myElements = Application.Transpose(myColumnRng.Resize(nElements).Value)
    For iEl = 1 To UBound(myElements)
        Debug.Print myElements(iEl)
    Next
End Sub
Do you want to pick x values at random, or the first x values in the list?
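If the answer is "at random", one possible approach is a partial Fisher-Yates shuffle over a copy of the list. This is only a sketch: the function name PickRandom is hypothetical, and it expects a one-dimensional array such as the one Application.Transpose returns for a column with at least two cells.

```vba
Option Explicit

' Returns x entries chosen at random, without repeating a position, from a one-dimensional array.
Public Function PickRandom(values As Variant, ByVal x As Long) As Variant
    Dim pool As Variant, result() As Variant
    Dim lo As Long, n As Long, i As Long, j As Long
    Dim tmp As Variant

    pool = values                       ' work on a copy so the caller's array is untouched
    lo = LBound(pool)
    n = UBound(pool) - lo + 1
    If x > n Then x = n                 ' cannot pick more entries than exist
    If x < 1 Then
        PickRandom = Array()
        Exit Function
    End If

    ReDim result(1 To x)
    Randomize
    For i = 0 To x - 1
        ' Swap a randomly chosen remaining entry into position i, then take it
        j = i + Int(Rnd * (n - i))
        tmp = pool(lo + i)
        pool(lo + i) = pool(lo + j)
        pool(lo + j) = tmp
        result(i + 1) = pool(lo + i)
    Next i
    PickRandom = result
End Function
```

With the names used in the answer above, that would be something like PickRandom(Application.Transpose(myColumnRng.Value), nElements).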

Copy/Paste rows to matching named sheet

I have a worksheet "List" which has rows of data that I need to copy to other worksheets. In column "J" of "List", there is a name (Matthew, Mark, Linda, etc.) that designates whose data that row is.
Each of those names (22 in all) has a matching spreadsheet with the same name. I want all rows that say "Linda" in column "J" to paste to worksheet "Linda", all rows with "Matthew" to paste to worksheet "Matthew", etc.
I have some code below, which mostly works, but I'd have to rewrite it for all 22 names/sheets.
Is there a way to loop through all the sheets, pasting the rows with matching names? Also, the code below runs really slowly, and I'm using data sets with anywhere from 200 to 60,000 rows that need to be sorted and pasted. If it's slow on a small data set like the one I'm currently working on, and only for one sheet, it's going to be glacially slow for the big data sets.
Sub CopyMatch()
Dim c As Range
Dim j As Integer
Dim Source As Worksheet
Dim Target As Worksheet
Set Source = Worksheets("List")
Set Target = Worksheets("Linda")
j = 4 ' Start copying to row 1 in target sheet
For Each c In Source.Range("J4:J1000") ' Do 1000 rows
If c = "Linda" Then
Source.Rows(c.Row).Copy Target.Rows(j)
j = j + 1
End If
Next c
End Sub
Unless you've turned calculation off somewhere we can't see here, then every time you copy a row, Excel is recalculating - even if your sheets contain no formulas.
If you're not doing so already, simply putting:
Application.Calculation = xlCalculationManual
before you start your loop and:
Application.Calculation = xlCalculationAutomatic
after exiting the loop will massively speed up your loop. For extra swank, you can use a variable to store the calculation setting before you turn it off and restore that setting at the end, e.g.
Dim lCalc As Long
lCalc = Application.Calculation
Application.Calculation = xlCalculationManual
' For ... Next loop goes here
Application.Calculation = lCalc
Also consider other settings, e.g. Application.ScreenUpdating = False before the loop and True after it.
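Putting those settings together, a wrapper along the following lines would restore them even if the loop raises an error. This is a sketch rather than a drop-in routine: the procedure name RunWithFastSettings is made up, and the copy loop itself is left as a placeholder comment.

```vba
Option Explicit

Public Sub RunWithFastSettings()
    Dim lCalc As XlCalculation
    Dim bScreen As Boolean

    ' Remember the current settings so they can be restored afterwards
    lCalc = Application.Calculation
    bScreen = Application.ScreenUpdating

    On Error GoTo CleanUp
    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False

    ' ... the copy loop goes here ...

CleanUp:
    ' Reached on success and on error, so the settings are always put back
    Application.Calculation = lCalc
    Application.ScreenUpdating = bScreen
    If Err.Number <> 0 Then MsgBox "Error " & Err.Number & ": " & Err.Description
End Sub
```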
Sort the data by the name you're selecting on, then by any other sorts you want. That way you can skip through any size sheet in 22 steps (since you say you have 22 names).
How you copy the data depends on preference and how much data there is. Copying one row at a time is economical on memory and pretty much guaranteed to work, but is slower. Or you can identify the top and bottom rows of each person's data and copy the whole block as a single range, at the risk of exceeding the memory available on large blocks in large sheets.
Assuming the value in your name column, for the range you're checking, is always one of the 22 names, then if you've sorted first by that column you can use the value in that column to determine the destination, e.g.:
Dim sTarget As String
Dim rng As Range
sTarget = ""
For Each c In Source.Range("J4:J1000") ' Do 1000 rows
    If c.Value <> "" Then ' skip empty rows
        If c.Value <> sTarget Then ' new name block
            sTarget = c.Value
            Set Target = Worksheets(sTarget) ' look the sheet up by its name string, not by the Range object
            Set rng = Target.Cells(Target.Rows.Count, 10).End(xlUp) ' 10 = column "J"
            j = rng.Row + 1 ' first row below last name pasted
        End If
        Source.Rows(c.Row).Copy Target.Rows(j)
        j = j + 1
    End If
Next
This is economical of memory because you're going row by row, but still reasonably fast because you're only recalculating Target and resetting j when the name changes.
You could use:
- a Dictionary object to quickly build the list of unique names out of the column J names
- the AutoFilter() method of the Range object to filter on each name
as follows:
Option Explicit
Sub CopyMatch()
Dim c As Range, namesRng As Range
Dim name As Variant
With Worksheets("List") '<--| reference "List" worskheet
Set namesRng = .Range("J4", .Cells(.Rows.count, "J").End(xlUp)) '<--| set the range of "names" in column "J" starting from row 4 down to last not empty row
End With
With CreateObject("Scripting.Dictionary") '<--| instance a 'Dictionary' object
For Each c In namesRng.SpecialCells(xlCellTypeConstants, xlTextValues) '<--| loop through "names" range cells with text content only
.item(c.Value) = c.Value '<--| build the unique list of names using dictionary key
Next
Set namesRng = namesRng.Resize(namesRng.Rows.count + 1).Offset(-1) '<--| resize the range of "names" to have a "header" cell (not a name to filter on) in the first row
For Each name In .Keys '<--| loop through dictionary keys, i.e. the unique names list
FilterNameAndCopyToWorksheet namesRng, name '<--| filter on current name and copy to corresponding worksheet
Next
End With '<--| release the 'Dictionary' object
End Sub
Sub FilterNameAndCopyToWorksheet(rangeToFilter As Range, nameToFilter As Variant)
Dim destsht As Worksheet
Set destsht = Worksheets(nameToFilter) '<--| set the worksheet object corresponding to passed name
With rangeToFilter
.AutoFilter Field:=1, Criteria1:=nameToFilter
Intersect(.Parent.UsedRange, .Resize(.Rows.count - 1).Offset(1).SpecialCells(xlCellTypeVisible).EntireRow).Copy destsht.Cells(destsht.Rows.count, "J").End(xlUp)
.Parent.AutoFilterMode = False
End With
End Sub

Validate for unique composite key in an excel using macro

I am creating an Excel macro. As part of that, I need to validate a unique composite key in a worksheet, i.e. the combination Column1 + Column2 + Column3 shouldn't repeat.
If so the row should be highlighted in red color.
What is the best way to do it?
Thanks in Advance
There are several ways to accomplish this: Two depend on sorting your data while the third does not. I'll supply them in different answers so readers can indicate which they prefer.
Identify duplicates using a Dictionary in VBA
Pro: Fast, does not require sorting
Con: Requires code, does not automatically update
In this case I treat the problem of actually identifying the duplicate rows separately from the trivial step of highlighting them. This function returns a Dictionary where the keys are the compound keys that have more than one row and the values are Collections containing the row numbers of all the rows which matched the key. It's the equivalent of a Dictionary<string,List<int>> in .NET. Conceptually it looks like this:
"some..key..1" : [1, 42, 401]
"some..key..2" : [134, 135]
The keys are the concatenated contents of each of the key columns, delimited by a null character. I use the unprintable null character so that the key set ("A", "Dog", "2") does not become equal to ("AD", "o", "g2").
As written the key comparison is case-sensitive. If you desire a case-insensitive match, set the CompareMode property of dctValues and dctDuplicates to TextCompare.
Note: You'll need to add a reference to Microsoft Scripting Runtime
Public Function FindDuplicates(ByVal DataRange As Range, ParamArray KeyColumns()) As Dictionary
Dim ws As Worksheet
Dim vKeyRange, rngCol As Range
Dim dctKeys As New Dictionary
Dim colKeys
Dim keyParts() As String
Dim strKey As String
Dim dctValues As New Dictionary
Dim dctDuplicates As New Dictionary
Dim i As Long, ub As Long
Dim lngFirstRow As Long, lngLastRow As Long, lngRow As Long
Set ws = DataRange.Worksheet
' Identify unique key column numbers
For Each vKeyRange In KeyColumns
For Each rngCol In vKeyRange.Columns
dctKeys(rngCol.Column) = True
Next
Next
colKeys = dctKeys.Keys
ub = UBound(colKeys)
ReDim keyParts(ub)
' Find first and last row of data range
lngFirstRow = DataRange.Cells(1, 1).Row
lngLastRow = DataRange.Cells(DataRange.Rows.Count, 1).Row
' Loop through rows
For lngRow = lngFirstRow To lngLastRow
' Get the parts for the key
For i = 0 To ub
keyParts(i) = ws.Cells(lngRow, colKeys(i)).Value
Next
' Concatenate the parts with an unprintable character as
' the delimiter, so that "A" + "Dog" != "AD" + "og"
strKey = Join(keyParts, Chr(0))
' If the key hasn't been found yet, create a new collection
If Not dctValues.Exists(strKey) Then
dctValues.Add strKey, New Collection
End If
' Push the row number to the list of rows with this key
dctValues(strKey).Add lngRow
' If this is the second row with this key, add the
' list to the dictionary of keys with multiple rows
If dctValues(strKey).Count = 2 Then
dctDuplicates.Add strKey, dctValues(strKey)
End If
Next
Set FindDuplicates = dctDuplicates
End Function
Usage: Find all duplicate rows in A2:I5000, using columns A, B, and E as key columns
Dim ws As Worksheet, dctDups As Dictionary, vKey, vRow
Set ws = ThisWorkbook.Worksheets(1)
Set dctDups = FindDuplicates(ws.Range("A2:I5000"), ws.Range("A:B"), ws.Range("E:E"))
For Each vKey In dctDups
For Each vRow In dctDups(vKey)
ws.Range("A" & vRow & ":I" & vRow).Interior.Color = vbRed
Next
Next
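The point about the null-character delimiter can be checked in isolation. Here is a small sketch, where CompositeKey and DemoCompositeKey are hypothetical helper names and not part of the function above:

```vba
Option Explicit

' Joins key parts with a null character so that the boundaries between parts are preserved.
Public Function CompositeKey(ParamArray parts() As Variant) As String
    Dim i As Long
    Dim pieces() As String

    If UBound(parts) < 0 Then Exit Function   ' no parts supplied: return an empty key
    ReDim pieces(LBound(parts) To UBound(parts))
    For i = LBound(parts) To UBound(parts)
        pieces(i) = CStr(parts(i))
    Next i
    CompositeKey = Join(pieces, Chr(0))
End Function

Public Sub DemoCompositeKey()
    ' Plain concatenation cannot tell these two key sets apart
    Debug.Print ("A" & "Dog" & "2") = ("AD" & "o" & "g2")                      ' True
    ' With the delimiter they stay distinct
    Debug.Print CompositeKey("A", "Dog", "2") = CompositeKey("AD", "o", "g2")  ' False
End Sub
```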
Sort and apply conditional formatting
Pro: Dynamic (adjusts to changes in data), does not require any code
Con: Requires sorting, can become messy
Manually sort by the key columns
Create a conditional formatting rule and apply it to all rows of data.
Highlight all the data, but starting with the first row of data
Select Conditional Formatting -> New Rule
Set the format to a red fill
Select "Use a formula to determine which cells to format"
Here's the formula you need, assuming your selection starts on row 2 (there's a header in row 1), and your key columns are A, B, and C. Note carefully where the $ signs appear and where they do not:
=OR((CONCATENATE($A2,$B2,$C2)=CONCATENATE($A1,$B1,$C1)),
(CONCATENATE($A2,$B2,$C2)=CONCATENATE($A3,$B3,$C3)))
This will highlight both rows that have duplicate keys, or all rows if there are more than two.