Is there a way to find duplicate words? - vba

I'm trying to find or make a program that will find all my duplicate words in Excel. For example, A1 contains "someone", A2 contains "person", and so on, but "someone" (or another word) appears multiple times and I need to condense that information together. I need to do it without searching manually to concatenate the duplicates. So is there a way to find the duplicate words and concatenate them?
I have also been looking into using "FIND" to look for them, but it has yielded no luck yet. I have also been using "FILTER", but I don't know a way to condense the duplicates without doing it manually. I have also been wondering where you can find the code for functions like "FIND", "REPLACE", etc. If I could find that, I could change the code for "REMOVE DUPLICATES" to make it work for words. But I don't really know whether that would work or not. Anything would help.
For example:
column1 column2 column3
-----------------------------
y A (nothing)
z B (nothing)
z (nothing) I
x (nothing) k
y (nothing) j
x C (nothing)
to this
column1 column2 column3
-----------------------------
y A j
z B I
x C k
except the letters are words.

I don't know if you could do this with formulas in Excel unless you know what word you are looking for within the cell. You could try either a UDF, or a Regular Expression.
my question and answer with links might get you started:
StackOverflow: formula to see if a surname is repeated within a cell
and maybe:
VBA Express
Once you've posted your Excel worksheet with data, we'll see if I've got it wrong!

You could use advanced filter to copy unique values from column 1 to a new column. Then you would use a vlookup formula to get the rest.
Assumptions:
Row 1 is a header row so actual data starts in row 2
Column1 is column "A"
Column2 is column "B"
Column3 is column "C"
The new column with the unique values is column "E".
In cell F2 and copied over to G2 and then down as needed:
=INDEX(INDEX($B$2:$C$7,0,COLUMNS($E2:E2)),MATCH(1,INDEX(($A$2:$A$7=$E2)*(INDEX($B$2:$C$7,0,COLUMNS($E2:E2))<>""),),0))

Sheet1 Before:
Code:
Sub Macro1()
    With Sheet1
        .Columns("A:A").AdvancedFilter Action:=xlFilterCopy, _
            CriteriaRange:=.Range("F1:F2"), CopyToRange:=.Range("K1"), Unique:=True
        .Columns("B:B").AdvancedFilter Action:=xlFilterCopy, _
            CriteriaRange:=.Range("G1:G2"), CopyToRange:=.Range("L1"), Unique:=True
        .Columns("C:C").AdvancedFilter Action:=xlFilterCopy, _
            CriteriaRange:=.Range("H1:H2"), CopyToRange:=.Range("M1"), Unique:=True
    End With
End Sub
Sheet1 After:
Make sure field names (column headers) are used.

This will give you a function that will find the first non-blank cell against a specific string
Option Explicit
Function NonBlankLookup(SearchTxt As String, LookIn As Range, OffSetRows As Long) As Variant
    ' Note: despite its name, OffSetRows is a column offset from the matched cell
    Dim loc As Range
    Dim FirstFound As Range
    Set loc = LookIn.Find(what:=SearchTxt)
    While Not (loc Is Nothing)
        If Not IsEmpty(loc.Offset(0, OffSetRows)) Then
            NonBlankLookup = loc.Offset(0, OffSetRows).Value
            Exit Function
        End If
        If FirstFound Is Nothing Then
            Set FirstFound = loc
        ElseIf loc.Address = FirstFound.Address Then
            ' Find has wrapped around to the first match, so stop searching.
            ' (Comparing addresses, not values: every match holds the same text.)
            NonBlankLookup = CVErr(2000)
            Exit Function
        End If
        Set loc = LookIn.Find(what:=SearchTxt, after:=loc)
    Wend
    NonBlankLookup = CVErr(2000)
End Function
To use it, insert this code into a module. Then, in your Excel spreadsheet, you can use a formula like =NonBlankLookup(E1,$A$1:$A$6,1), which will search for your text in A1:A6 and check 1 column to the right. If no text is found that matches the search string, or if the text is found but no data exists in the specified column, #NULL! is returned.
This also has a slight advantage over VLOOKUP, as it allows a negative offset: you could have the search text in column 2 and, by using -1 for the offset, return data from column 1.
Just so you are aware, because of the way that .find works, when you specify a range, it will start at the 2nd cell, and go down, and search the first cell you give it last.
e.g. with my example of A1:A6, it will search A2,A3,A4,A5,A6 and finally A1
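For the consolidation shown in the question's example, a macro can also do the merge directly. The following is only a minimal sketch under stated assumptions, not a tested solution: it assumes the data sits in columns A:C of the active sheet with a header in row 1, that no cell holds an error value, and that the merged result may be written to columns E:G. The name ConsolidateRows is made up for illustration, and the late-bound Scripting.Dictionary is only available on Windows.

```vba
Option Explicit

' Sketch: merge rows that share the same word in column A, keeping the
' first non-blank value found in column B and in column C for each word.
' Assumed layout: header in row 1, data from row 2 down, output at E1.
Sub ConsolidateRows()
    Dim ws As Worksheet
    Dim dict As Object              ' word -> output row number
    Dim lastRow As Long, r As Long, c As Long
    Dim outRow As Long, nextOut As Long
    Dim word As String

    Set ws = ActiveSheet
    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare    ' treat "Someone" and "someone" as equal

    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    ws.Range("A1:C1").Copy Destination:=ws.Range("E1")   ' copy the headers
    nextOut = 2

    For r = 2 To lastRow
        word = CStr(ws.Cells(r, "A").Value)
        If Len(word) > 0 Then
            If dict.Exists(word) Then
                outRow = dict(word)
            Else
                ' First time this word is seen: give it the next output row
                outRow = nextOut
                dict.Add word, outRow
                ws.Cells(outRow, "E").Value = word
                nextOut = nextOut + 1
            End If
            ' Source columns B and C (2, 3) map to output columns F and G (6, 7)
            For c = 2 To 3
                If Len(CStr(ws.Cells(r, c).Value)) > 0 And _
                   Len(CStr(ws.Cells(outRow, c + 4).Value)) = 0 Then
                    ws.Cells(outRow, c + 4).Value = ws.Cells(r, c).Value
                End If
            Next c
        End If
    Next r
End Sub
```

On the six-row example from the question this should leave one row per word (y, z, x) in E:G, each with its column2 and column3 values filled in. If a word has more than one non-blank value in the same column, only the first is kept.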

Related

Counting cells using Excel VBA which are a result of a vlookup statement

I have a spreadsheet on one sheet with the values in column C being generated using the results of a vlookup statement, from a value which I enter in column A.
I need to be able to count the number of cells in column C up to a maximum of 51 rows (from row 1 to row 51) which have a value in them, not including errors, after I have entered all my values in column A.
Oh - by the way, each time I do the count there will be a different number of rows used.
I've tried using:
ccc = Range("C:C").Cells.SpecialCells(xlCellTypeConstants).Count
but this only counts the first line which is my header row.
Sorry if there is already an answer out there, but I've been looking for quite a while and can't find anything.
Thanks.
You can easily do this without VBA, but you could try:
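Sub testy()
    Dim myRange As Range
    Dim errCells As Range
    Dim numRows As Long
    Set myRange = Range("C:C")
    ' SpecialCells raises run-time error 1004 when no cell matches, so trap it
    On Error Resume Next
    Set errCells = myRange.SpecialCells(xlCellTypeFormulas, xlErrors)
    On Error GoTo 0
    numRows = Application.WorksheetFunction.CountA(myRange)
    If Not errCells Is Nothing Then numRows = numRows - errCells.Count
End Sub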
Your code is not working because xlCellTypeConstants specifically tells it to count only constant values, ignoring values calculated by formulas.
The worksheet function COUNTA counts every non-empty cell, including cells whose value comes from a formula:
=CountA(C1:C51)
We can call any worksheet function from VBA through the WorksheetFunction object:
dim c as integer
c = WorksheetFunction.CountA([C1:C51])
CountIf can be used to skip errors, as long as the values you want to count are positive numbers:
=COUNTIF(D5:D9,">0")
You are looking to count cells that have no errors.
Replace your vlookup by the below formula. So all errors will be replaced by "NOT FOUND" Text
=IFERROR(VLOOKUP(C1,A1:B3,2,FALSE), "NOT FOUND")
Then add this to find the number of cells that are non blank and non erroneous
=COUNTA(D:D) - COUNTIF(D:D,"NOT FOUND")
Assumptions:-
A:B Source Range
C Lookup Column
D the vlookup function is in this column
For VBA
cnt = Application.WorksheetFunction.CountA(Range("D:D")) - Application.WorksheetFunction.CountIf(Range("D:D"), "NOT FOUND")
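If changing the lookup formulas is not an option, the count can also be done with a plain loop. This is a rough sketch rather than a definitive answer: the range C1:C51 is taken from the question, the active sheet is assumed to hold the data, and the function name CountValidCells is hypothetical.

```vba
' Sketch: count the cells in C1:C51 of the active sheet that hold a value
' and are not error values such as #N/A.
Function CountValidCells() As Long
    Dim cell As Range
    Dim n As Long
    For Each cell In ActiveSheet.Range("C1:C51")
        ' Nested Ifs because VBA evaluates both sides of And,
        ' and CStr would fail on an error value
        If Not IsError(cell.Value) Then
            If Len(CStr(cell.Value)) > 0 Then n = n + 1
        End If
    Next cell
    CountValidCells = n
End Function
```

Note that a header in C1 is counted as well, so subtract 1 if only the data rows matter. A formula that returns an empty string is not counted.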

Manipulating Excel spreadsheet, removing rows based on values in a column and then removing more rows based on values in another column

I have a rather complicated problem.
I have a log file that, when put into Excel, has event IDs in column I and, in column J, a custom key that keeps a particular event grouped.
All I want to do is remove any rows that do not contain the value of, say, 102 in the event ID column.
And THEN I need to check the custom key (column J) and remove rows that are duplicates, since any duplicates will distort the other statistics I want.
I have gotten as far as being able to retrieve the values from the columns using COM objects and .EntireColumn, cell values, etc., but I am completely stumped as to how I can piece together a solid way to remove rows. I could not figure out how to get the row for each value.
To give a bit more clarity this is my thought process on what i need to do:
If cell value in Column I does not = 102 Then delete the row that cell contains.
Repeat for all rows in spreadsheet.
And THEN-
Read every cell in column J and remove all rows containing duplicates based on the values in column J.
Save spreadsheet.
Can any kind persons help me?
Additional Info:
Column I holds a string that is an event id number e.g = 1029
Column J holds a string that is a mix of numbers and letters = 1ASER0X3NEX0S
Ellz, I do agree with Macro Man in that your tags are misleading and, more importantly, I did indeed need to know the details of Column J.
However, I got so sick of rude posts today and yours was polite and respectful so I've pasted some code below that will do the trick ... provided Column J can be a string (the details of which you haven't given us ... see what Macro Man's getting at?).
There are many ways to test for duplicates. One is to try and add a unique key to a collection and see if it throws an error. Many wouldn't like that philosophy but it seemed to be okay for you because it also gives you a collection of all the unique (ie remaining) keys in Column J.
Sub Delete102sAndDuplicates()
    Dim ws As Worksheet
    Dim uniques As Collection
    Dim rng As Range
    Dim rowPair As Range
    Dim iCell As Range
    Dim jCell As Range
    Dim delRows As Range
    Dim deleteRow As Boolean
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set rng = Intersect(ws.UsedRange, ws.Range("I:J"))
    Set uniques = New Collection
    For Each rowPair In rng.Rows
        Set iCell = rowPair.Cells(1, 1)
        Set jCell = rowPair.Cells(1, 2)
        ' Delete the row if the event ID is not 102 ...
        deleteRow = (CStr(iCell.Value2) <> "102")
        If Not deleteRow Then
            ' ... or if its key is already in the collection (error 457).
            ' Only rows that are kept add their key to the collection.
            On Error Resume Next
            uniques.Add jCell.Value2, jCell.Text
            deleteRow = (Err.Number = 457)
            Err.Clear
            On Error GoTo 0
        End If
        If deleteRow Then
            If delRows Is Nothing Then
                Set delRows = rowPair.EntireRow
            Else
                Set delRows = Union(delRows, rowPair.EntireRow)
            End If
        End If
    Next
    If Not delRows Is Nothing Then
        MsgBox delRows.Address(False, False) & " deleted."
        delRows.Delete
    End If
End Sub
There are a number of ways in which this can be done, and which is best will depend on how frequently you perform this task and whether you want to have it fully automated. Since you've tagged your question with VBA I assume you'll be happy with a VBA-based answer:
Sub removeValues()
    Range("I1").Select 'Start at the top of the I column
    'We are going to go down the column until we hit an empty row
    Do Until IsEmpty(ActiveCell.Value) = True
        If ActiveCell.Value <> 102 Then
            ActiveCell.EntireRow.Delete 'Then delete the row
        Else
            ActiveCell.Offset(1).Select 'Select the cell below
        End If
    Loop
    'Now we have removed all non-102 values from the column, let's remove the duplicates from the J column
    Range("A:J").RemoveDuplicates Columns:=10, Header:=xlNo
End Sub
The key line there is Range("A:J").RemoveDuplicates. It will remove rows from the range you specify according to duplicates it finds in the column you specify. In that case, it will remove items from the A-J columns based on duplicates in column 10 (which is J). If your data extends beyond the J column, then you'll need to replace "A:J" with the appropriate range. Note that the Columns value is relative to the index of the first column, so while the J column is 10 when that range starts at A (1), it would be 2 for example if the range were only I:J. Does that make sense?
(Note: Using ActiveCell is not really best practice, but it's the method that most obviously translates to what you were trying to do and as it seems you're new to VBA I thought it would be the easiest to understand).
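For comparison, here is a rough sketch of the same two steps without ActiveCell or Select. It assumes the data is on the active sheet, has no header row, and contains no error values in column I. The name RemoveNon102Rows is made up for illustration.

```vba
' Sketch: delete every row whose event ID in column I is not 102,
' then remove duplicate keys in column J, without selecting any cells.
Sub RemoveNon102Rows()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "I").End(xlUp).Row

    ' Loop from the bottom up so that deleting a row does not shift
    ' the rows that are still waiting to be checked.
    For r = lastRow To 1 Step -1
        ' Compare as text so that both the number 102 and the string "102" match
        If CStr(ws.Cells(r, "I").Value) <> "102" Then
            ws.Rows(r).Delete
        End If
    Next r

    ws.Range("A:J").RemoveDuplicates Columns:=10, Header:=xlNo
End Sub
```

Unlike the Do Until loop, this version does not stop at the first empty cell in column I. It treats an empty cell as "not 102" and deletes that row too.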

Advanced AutoFilter to exclude certain values

I want to filter a large list of names in a Sheet in excel. In another sheet I have contained a list of names that I want to filter out and exclude from the larger list. How would I use the advanced filter to do this? I have tried this below but it is not seeming to work. My big list is in K2:K5000 and my criteria is in H2:H3 (The criteria will grow but I kept the list small for testing). Any help would be greatly appreciated!
Sub Filter()
Sheet5.Range("K2:K5000").AdvancedFilter Action:=xlFilterInPlace, _
CriteriaRange:=Sheets("Sheet3").Range("H2:H3"), Unique:=False
End Sub
To exclude the values in H2:H3 from K2:K5000 using advanced filter you can use following approach:
Make sure cell K1 is not empty (enter any header)
Find 2 unused cells (e.g. I1:I2)
Leave I1 blank
Enter the following formula in I2
=ISNA(MATCH(K2,$H$2:$H$3,0))
Use the following code to exclude rows
Sheet5.Range("K1:K5000").AdvancedFilter Action:=xlFilterInPlace, _
    CriteriaRange:=Sheets("Sheet3").Range("I1:I2"), Unique:=False
I am not sure off the top of my head how you would use advanced filter to exclude, but you can use formulas in your advanced filter (near the bottom). You can, however, just use a dictionary to store values you want to exclude, then exclude (hide rows, or autofilter on the ones not found in your exclusion list)
Sub Filter()
    Dim i As Integer
    Dim str As String
    Dim dict As Object
    Set dict = CreateObject("Scripting.Dictionary")
    With Worksheets("Sheet3")
        For i = 2 To 3
            str = CStr(.Range("H" & i).Value)
            If Not dict.exists(str) Then
                dict.Add str, vbNullString
            End If
        Next i
    End With
    With Sheet5
        For i = 2 To 5000
            str = CStr(.Range("K" & i).Value)
            If Len(str) > 0 And dict.exists(str) Then
                .Range("K" & i).EntireRow.Hidden = True
            Else
                'alternatively, you can add those that aren't found
                'to an array for autofilter
            End If
        Next i
    End With
    'If building autofilter array, apply filter here.
End Sub
Using AutoFilter:
Use an array of strings as criteria to filter on with the "Operator:=xlFilterValues" argument of AutoFilter. Build your array however you want, I chose to do it by building a string with a for loop and splitting (quick to write and test, but not ideal for a number of reasons).
Note: AutoFilter is applied to the headers, not data.
With Sheet5
.AutoFilterMode = False
.Range("K1").AutoFilter _
Field:=1, _
Criteria1:=arr, _
Operator:=xlFilterValues
End With
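The snippet above leaves the construction of arr open. One possible way to build it, sketched here under assumptions rather than taken from the answer, is to collect every name in the big list that is not on the exclusion list and pass those names to AutoFilter. Sheet5, Sheet3 and the ranges K2:K5000 and H2:H3 come from the question, the name FilterByKeepList is hypothetical, and the cells are assumed to contain no error values.

```vba
' Sketch: build the array of names to keep, then filter on it.
Sub FilterByKeepList()
    Dim exclude As Object, keep As Object
    Dim cell As Range
    Dim s As String
    Dim arr As Variant

    Set exclude = CreateObject("Scripting.Dictionary")
    Set keep = CreateObject("Scripting.Dictionary")

    ' Names to filter out
    For Each cell In Worksheets("Sheet3").Range("H2:H3")
        exclude(CStr(cell.Value)) = vbNullString
    Next cell

    ' Every distinct name in the big list that is not excluded
    For Each cell In Sheet5.Range("K2:K5000")
        s = CStr(cell.Value)
        If Len(s) > 0 And Not exclude.Exists(s) Then
            keep(s) = vbNullString
        End If
    Next cell

    If keep.Count = 0 Then Exit Sub
    arr = keep.Keys     ' zero-based array of the names to show

    With Sheet5
        .AutoFilterMode = False
        .Range("K1:K5000").AutoFilter Field:=1, Criteria1:=arr, _
            Operator:=xlFilterValues
    End With
End Sub
```

Using a second dictionary avoids the string-and-Split step and removes duplicate names from the criteria. The trade-off is that the filter shows a fixed list of names, so it has to be re-run whenever either list changes.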
I think you first need to understand how to use the Advanced Filter.
There is a good tutorial you can find HERE.
Now based on that, let us make an example. Suppose you have below data:
Now, let us say you want to filter out Data1 and Data2.
According to the link, you can use a formula as criteria, but:
Note: always place a formula in a new column. Do not use a column label or use a column label that is not in your data set. Create a relative reference to the first cell in the column (B6). The formula must evaluate to TRUE or FALSE.
So in our case, our relative reference is A11 (the first cell or item in the field you want filtered). Now we make a formula in B2, since we cannot use A2: it sits under a column label. Enter the formula: =A11<>"Data1".
Above took care of Data1 but we need to filter out Data2 as well.
So we make another formula in C2 which is: =A11<>"Data2"
Once properly set up, you can now apply the Advanced Filter manually or programmatically. Code similar to yours is found below:
With Sheets("Sheet1")
.Range("A10:A20").AdvancedFilter xlFilterInPlace, .Range("A1:C2")
End With
And voilà! We have successfully filtered out Data1 and Data2.
Result:
It took me a while to get the hang of it as well, but thanks to that link above, I managed to pull it off. I have learned something new today as well :-). HTH.
Additional:
I see that you have your criteria on another Sheet so you have to just use that in your formula. So if in our example you have Data1 and Data2 in H2:H3 in Sheet2, your formula in B2 and C2 is: =A11<>Sheet2!H2 and =A11<>Sheet2!H3 respectively.
You don't really even need VBA for this... to achieve the same result:
Put the values into a separate spreadsheet, in the first column.
Create 2 new columns next to the data you want to filter in your original spreadsheet
In the first column next to your data to be filtered, use
=VLOOKUP(A2, [nameOfOtherSpreadSheet.xlsx/xlsm/xls/etc]sheetName!$A:$A,1, FALSE)
Where A2 is the value you're searching for, field 2 is the reference of the range in which you want to search for this value, 1 is the index of the column in which you're searching, and FALSE tells VLOOKUP to only return exact matches.
In the second column next to the data you want to filter, use
=IFERROR(G2, FALSE)
Where G2 is the reference of the function that might return an error, and FALSE is the value you want to return if that function throws an error.
Filter the second column next to the data you want to filter for FALSEs
This should return the original data set without the values you wanted to exclude.
Record a macro of these steps so that it is one step instead of five in future.

Using VBA to populate a column based on matching values in two columns with a third columns value

The scenario is the following, I have a sheet which has two columns, one I want to match against, the other contains the values I want to copy in case of a match. I have a second sheet which contains the values to search for in the match column and the column to which to copy the value column if we have a match.
This looks like a prime candidate for VLOOKUP but I want to avoid having to hard code the column number as the data sheets contents can vary. So I Find the column based on the header's contents. If there is a way to VLOOKUP the results with this flexibility, then that also works. I can't use a formula, this needs to be in VBA.
Below there are 4 columns defined:
toFindCol: this contains the master list of values I am going to try and find in the toMatch column
toMatchAgainstCol: this contains the list of values i want to match the toFindCol values against
valueCol: this contains the value I want to copy, in case there is a match, the value has to come from the row on which the match occurred
resultsCol: this is where i want to copy the value to, the value needs to be copied to the row of the toFind value
For some reason the code below gives a "Type Mismatch" error.
Eventually I want to wrap this into a function/subroutine so I can pass in the sheets and column headers and get it to work its magic. Brownie points for whoever can do that :)
Dim toFindCol As Range
Dim toMatchAgainstCol As Range
Dim valueCol As Range
Dim resultsCol As Range
Dim match As Variant
Set toFindCol = cohortDataSetSht.Columns(1).EntireColumn
Set toMatchAgainstCol = userSht.Cells.Find("id", , xlValues, xlWhole).EntireColumn
Set valueCol = userSht.Cells.Find("cdate", , xlValues, xlWhole).EntireColumn
Set resultsCol = cohortDataSetSht.Columns(4)
For Each findMe In toFindCol
Set match = toMatchAgainstCol.Find(What:=findMe, LookIn:=xlValues, _
LookAt:=xlWhole, SearchOrder:=xlByRows)
If Not match Is Nothing Then
resultsCol.Cells(findMe.Row, 0).Value = valueCol.Cells(match.Row, 0).Value
End If
Next findMe
There is a way to do this in a VLOOKUP. The basic format of the vlookup is vlookup(1,2,3,4).
I pass A2 as the top cell with the value I want to lookup. This formula can then be copied down to fill the other cells. Substitute the appropriate cell reference.
Use the Match function to find the columns you want to set your range in. Since we only want the column letters (e.g. "C:G"), I strip out the row number (which will always be 1 character) using the Len function, and then use the Indirect function to convert that to a usable range, and look up against that.
INDIRECT(LEFT(ADDRESS(1,MATCH("ToMatch",$1:$1,0),4),LEN(ADDRESS(1,MATCH("ToMatch",$1:$1,0),4))-1)&":"&LEFT(ADDRESS(1,MATCH("ValueCol",$1:$1,0),4),LEN(ADDRESS(1,MATCH("ValueCol",$1:$1,0),4))-1))
I then use MATCH("ValueCol",$1:$1,0)-MATCH("ToMatch",$1:$1,0)+1 to calculate the number of the column in that range that contains the value we want to look up.
I use 0 to indicate an exact match rather than closest value
The whole thing looks like this:
=VLOOKUP(A2,INDIRECT(LEFT(ADDRESS(1,MATCH("ToMatch",$1:$1,0),4),LEN(ADDRESS(1,MATCH("ToMatch",$1:$1,0),4))-1)&":"&LEFT(ADDRESS(1,MATCH("ValueCol",$1:$1,0),4),LEN(ADDRESS(1,MATCH("ValueCol",$1:$1,0),4))-1)),MATCH("ValueCol",$1:$1,0)-MATCH("ToMatch",$1:$1,0)+1,0)
This formula assumes the column headers are in row 1. If not, replace the $1:$1 above with the absolute reference to the row your headers are in (e.g. row 5 would be $5:$5).
The other caveat is that we have to assume that the value you want to lookup will always be to the right of the lookup column.
Ok solved. I ended up using Rows for the toFind column:
Set toFindCol = cohortDataSetSht.Columns(1).Rows("2:" & cohortDataSetSht.Columns(1).End(xlDown).Row)
And then on the match I used the value:
Set match = toMatchAgainstCol.Cells.Find(What:=findMe.Value2, LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows)
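As for wrapping this into a reusable routine, the following is one possible sketch and not the asker's final code. The Sub name and all parameter names are hypothetical. It assumes that the values to find are in column 1 of the target sheet starting at row 2, and that each header text occurs only once on the source sheet.

```vba
' Sketch: for each value in column 1 of targetSht, look it up in the column of
' sourceSht headed matchHeader and, on a match, copy the value from the column
' headed valueHeader into column resultCol of targetSht (same row as the value).
Sub CopyMatchingValues(targetSht As Worksheet, sourceSht As Worksheet, _
                       matchHeader As String, valueHeader As String, _
                       resultCol As Long)
    Dim matchHdr As Range, valueHdr As Range
    Dim findMe As Range, hit As Range
    Dim lastRow As Long

    ' Locate the two columns by their header text
    Set matchHdr = sourceSht.Cells.Find(What:=matchHeader, _
        LookIn:=xlValues, LookAt:=xlWhole)
    Set valueHdr = sourceSht.Cells.Find(What:=valueHeader, _
        LookIn:=xlValues, LookAt:=xlWhole)
    If matchHdr Is Nothing Or valueHdr Is Nothing Then Exit Sub

    lastRow = targetSht.Cells(targetSht.Rows.Count, 1).End(xlUp).Row
    If lastRow < 2 Then Exit Sub

    For Each findMe In targetSht.Range(targetSht.Cells(2, 1), _
                                       targetSht.Cells(lastRow, 1))
        If Len(CStr(findMe.Value2)) > 0 Then
            ' Search on the cell's value, not on the Range object itself
            Set hit = matchHdr.EntireColumn.Find(What:=findMe.Value2, _
                LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows)
            If Not hit Is Nothing Then
                targetSht.Cells(findMe.Row, resultCol).Value = _
                    sourceSht.Cells(hit.Row, valueHdr.Column).Value
            End If
        End If
    Next findMe
End Sub
```

With the variables from the question, a call could look like: CopyMatchingValues cohortDataSetSht, userSht, "id", "cdate", 4. Cells are addressed with 1-based row and column numbers, which avoids the Cells(row, 0) indexing used in the original loop.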

VBA Count cells in column containing specified value

I need to write a macro that searches a specified column and counts all the cells that contain a specified string, such as "19/12/11" or "Green", then associates this number with a variable.
Does anyone have any ideas?
Do you mean you want to use a formula in VBA? Something like:
Dim iVal As Integer
iVal = Application.WorksheetFunction.COUNTIF(Range("A1:A10"),"Green")
should work.
This isn't exactly what you are looking for but here is how I've approached this problem in the past;
You can enter a formula like;
=COUNTIF(A1:A10,"Green")
...into a cell. This will count the Number of cells between A1 and A10 that contain the text "Green". You can then select this cell value in a VBA Macro and assign it to a variable as normal.
One way:
var = count("find me", Range("A1:A100"))
Function count(find As String, lookin As Range) As Long
    Dim cell As Range
    For Each cell In lookin
        If (cell.Value = find) Then count = count + 1 ' case-sensitive
    Next
End Function
If you're looking to match non-blank values or empty cells and having difficulty with wildcard character, I found the solution below from here.
Dim n as Integer
n = Worksheets("Sheet1").Range("A:A").Cells.SpecialCells(xlCellTypeConstants).Count
Not what you asked but may be useful nevertheless.
Of course you can do the same thing with matrix (array) formulas.
Just read the result of the cell that contains:
Cell A1="Text to search"
Cells A2:C20=Range to search for
=COUNT(SEARCH(A1;A2:C20;1))
Remember that entering matrix formulas needs CTRL+SHIFT+ENTER, not just ENTER.
After, it should look like :
{=COUNT(SEARCH(A1;A2:C20;1))}
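The same "contains" count can be sketched in VBA so the result lands directly in a variable. This is an illustrative sketch only: the function name CountContaining and its parameter names are made up, and it ignores case in the same way the worksheet SEARCH function does.

```vba
' Sketch: count the cells in lookIn whose text contains findText, ignoring case.
Function CountContaining(findText As String, lookIn As Range) As Long
    Dim cell As Range
    Dim n As Long
    For Each cell In lookIn.Cells
        ' Skip error values, which cannot be converted to text
        If Not IsError(cell.Value) Then
            If InStr(1, CStr(cell.Value), findText, vbTextCompare) > 0 Then
                n = n + 1
            End If
        End If
    Next cell
    CountContaining = n
End Function
```

A call such as n = CountContaining("Green", Range("A2:C20")) would then assign the count to n. Passing an empty string counts every non-error cell in the range, because InStr returns 1 when the text being searched for is empty. Looping over a whole column such as A:A works but is slow, so a bounded range is preferable.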