VBA: Multiple Keyword vlookup - vba

I have a number of narrative descriptions that I need to categorize automatically in Excel:
Description Category
I updated the o.s.
I installed the o.s.
I cleaned valve a
I cleaned valve b
I installed valve a
Today the o.s. was updated
I have another worksheet with keywords and the category the keywords are associated with:
Keyword 1 Keyword 2 Keyword 3 Category
cleaned valve a A
installed valve a B
updated os C
installed os D
My code so far can only search one keyword at a time and therefore will report incorrect answers because some keywords are used in multiple narratives:
Public Function Test21(nar As Range, ky As Range) As String
Dim sTmp As String, vWrd As Variant, vWrds As Variant
'Splits Fsr Narrative into individual words so it can be searched for keywords'
vWrds = Split(nar)
For Each vWrd In vWrds
If Not IsError(Application.VLookup(vWrd, ky, 3, False)) Then
sTmp = Application.VLookup(vWrd, ky, 3, False)
Exit For
End If
Next vWrd
Test21 = sTmp
End Function
I've seen algorithms like this but I feel that my goal could be simpler to accomplish as all narratives are relatively simple.
Thanks for reading!

You can match multiple columns with a VLOOKUP by creating a "match column" that concatenates the multiple values together, then searching that column for a match.
So if you use this formula in column A:
=B1 & "|" & C1 & "|" & D1
You can then VLOOKUP against that match column:
=VLOOKUP("blah|bleh|ugh", Sheet2!A1:E100, 5, FALSE)
Which will match the one row that has "blah" in column B, "bleh" in column C, and "ugh" in column D, and return the value in column E.
For your data though, I think you might also want to have a step to clean up your input before trying to match a set of keywords. The method I described above works best if the keywords are in a particular order, and where you won't have any non-keywords cluttering up things. (It also works excellently for vlookups where you want to match multiple pieces of data, ie. first name, middle name, and last name in different columns)
Otherwise you could end up needing an incredibly huge number of rows in your category table to cover every possible combination and permutation of your keywords and the other random words they might be accompanied by.

This is what I was looking for:
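Public Function Test22(nar As Range, key As Range, cat As Range) As String
    Dim r As Long
    ' key.Rows.Count is the number of keyword rows (key.Height is a size in points)
    For r = 1 To key.Rows.Count
        ' Compare each InStr result with 0: "InStr(...) And InStr(...)" is a bitwise And of two positions
        If InStr(nar.Value, key(r, 1)) > 0 And InStr(nar.Value, key(r, 2)) > 0 Then
            Test22 = cat(r)
            Exit For
        End If
    Next r
End Function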
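A possible generalisation of the function above, offered as a sketch rather than the poster's method: it accepts any number of keyword columns, ignores empty keyword cells, and compares case-insensitively. The names ContainsAllKeywords and CategoryFor are hypothetical, and the sketch assumes the keyword range and the category range have the same number of rows.

```vba
Option Explicit

' True when every non-empty keyword occurs somewhere in the narrative.
' A row with no keywords at all never matches.
Public Function ContainsAllKeywords(ByVal narrative As String, keywords As Variant) As Boolean
    Dim kw As Variant
    Dim seen As Boolean
    For Each kw In keywords
        If Len(kw) > 0 Then
            seen = True
            ' vbTextCompare makes the search case-insensitive
            If InStr(1, narrative, kw, vbTextCompare) = 0 Then Exit Function
        End If
    Next kw
    ContainsAllKeywords = seen
End Function

' Worksheet-facing wrapper (hypothetical name): keys is the keyword block,
' one row per category; cats is the matching single-column category range.
Public Function CategoryFor(nar As Range, keys As Range, cats As Range) As String
    Dim r As Long, c As Long
    Dim rowKeys() As String
    ReDim rowKeys(1 To keys.Columns.Count)
    For r = 1 To keys.Rows.Count
        For c = 1 To keys.Columns.Count
            rowKeys(c) = CStr(keys.Cells(r, c).Value)
        Next c
        If ContainsAllKeywords(CStr(nar.Value), rowKeys) Then
            CategoryFor = CStr(cats.Cells(r, 1).Value)   ' first matching row wins
            Exit Function
        End If
    Next r
End Function
```

Note that a keyword only matches text that actually occurs in the narrative, so a keyword entered as "os" will not match a narrative that says "o.s."; the keyword table would need to use the same spelling.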

Related

Find Text Duplicates in Excel

I'm an Excel/VBA newbie and I have a question.
Is it possible to tag partial string matches between two columns in Excel?
Let's say I have two columns, A and B, that have text values in them. I want to identify rows where the A cell and B cell has a partial match.
Here are some hypothetical cases of the 'partial matches' that I'm looking for.
Case 1: exact phrase match (Fictional Company Ltd) but one column has extra text
Cell A2: 123456789 Fictional Company Ltd
Cell B2: Fictional Company Ltd
Case 2: exact phrase match (Fictional Company Ltd) but both columns have extra text
Cell A3: 123456789 Fictional Company Ltd
Cell B3: Fictional Company Ltd, 1 Main Street, City, State 12345
Case 3: partial match
Cell A4: Fictional Ltd
Cell B4: Fictional Company Ltd
Case 4: word match
Cell A5: Fictional Company Ltd
Cell B5: Fictional
I would like to identify all of those cases above. However, I don't mind running >1 set of codes to cover them all.
Thanks a lot in advance for your help!
Update: when I first created the cases, I didn't realize that I put the first word in column B as the matching word with column A. It is not the case - sometimes it is the 3rd word in column B and the 5th word in column A that matches.. the data is all over the place!
Update 2: I also want to clarify that the cases are reversible. For example, there are some rows where it's Case 1 but cell B has the extra info instead of cell A.
This function returns the number of words in text2 that are contained anywhere (not just as whole words) in text1:
Function CountMatches(text1 As String, text2 As String) As Long
Dim arr, x As Long
arr = Split(text2)
For x = 0 To UBound(arr)
If text1 Like "*" & arr(x) & "*" Then CountMatches = CountMatches + 1
Next x
End Function
...and this one does the same, but also counts each word of text1 that occurs anywhere within text2:
Function CountMatches2(text1 As String, text2 As String) As Long
Dim arr, x As Long
arr = Split(text1)
For x = 0 To UBound(arr)
If text2 Like "*" & arr(x) & "*" Then CountMatches2 = CountMatches2 + 1
Next x
arr = Split(text2)
For x = 0 To UBound(arr)
If text1 Like "*" & arr(x) & "*" Then CountMatches2 = CountMatches2 + 1
Next x
End Function
Both are susceptible to counting the same match twice, especially (obviously) the CountMatches2.
I'm curious if this suits your needs (as it's obviously not a true "fuzzy match")...
It can be easily modified to return a TRUE/FALSE (ie., TRUE = One or more matches) or to look only for entire word matches as opposed to "anywhere".
Let me know if you have any questions!
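The TRUE/FALSE, whole-word variant mentioned above could look roughly like this. It is a sketch under stated assumptions: SharesWholeWord is a hypothetical name, words are taken to be separated by single spaces, and punctuation is not stripped, so "Ltd," and "Ltd" count as different words.

```vba
Option Explicit

' True when at least one whole word of text2 also appears as a whole word in text1.
' The comparison is case-insensitive because both strings are lower-cased first.
Public Function SharesWholeWord(ByVal text1 As String, ByVal text2 As String) As Boolean
    Dim words As Variant, w As Variant
    Dim padded As String
    ' Pad with spaces so the first and last words are delimited like the others
    padded = " " & LCase$(text1) & " "
    words = Split(LCase$(text2), " ")
    For Each w In words
        If Len(w) > 0 Then
            If InStr(1, padded, " " & w & " ", vbBinaryCompare) > 0 Then
                SharesWholeWord = True
                Exit Function
            End If
        End If
    Next w
End Function
```

Used as a worksheet formula, =SharesWholeWord(A2,B2) would flag a row as soon as one word is shared; requiring two or more shared words would cut down on matches caused by common terms.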
Case 1 is possible, simply by truncating the length of the longer so that it matches the length of the shorter, and then seeing if they are the same. Use the LEFT function to trim the longer word to the length of the shorter one. (Use the LEN function on the shorter word to work out how long it is).
Case 2 is tricky but possible, because you effectively need to search the longer string for every possible combination of ordered words from the shorter. It's kind of a 'slightly simpler' version of Case 3.
Case 3 is damn tricky: it's pretty much a Fuzzy Match which is computationally expensive, and requires something called tokenisation to do efficiently. Microsoft has a free Fuzzy Match addin but it's kinda sucky...it returns many false positives to the point that you need to eyeball each and every result to make sure it is a valid one. Which completely defeats the purpose. I'm working on putting together a commercial offering in that space myself that returns far fewer false positives, but can't share code. Suffice to say that this is a very difficult thing to do efficiently.
Case 4 is trivial: you just use the SEARCH formula.
Add a whole 'nother layer of trickiness if you have multiple words in each list.
The above answer is enough to point you in the right direction for a Google search. Note that you can simplify things by substituting out things like "Ltd" and "Limited" and other sundry terms using the SUBSTITUTE formula, but you've still got a heck of a challenge on your hands.

Comparing two lists in excel and extracting values missing from 2nd list - cannot be duplicated (also over two sheets)

I'm working on a project report for work and I'm trying to find a way to compare two lists of project codes, e.g. "123456", and see whether the 2nd list is missing any new values that would've been entered into the first list. The lists are thousands of records long and so far people have been doing it manually (it hurts me knowing this), so I'm trying to make it automatic.
What I have tried is using an array with an INDEX(MATCH(COUNTIF())) formula, but I just can't seem to get it working.
My problem is that when I get the array to fill with what I want, I then can't get it to not duplicate values (I need it to check the master list so it doesn't output something more than once into the output list).
I've also tried to give it a go with other formulas, but the lists can be thousands of records long, so I can't do a cell-for-cell match as the list would be huge (that, or my Excel knowledge isn't good enough to know the easy solution).
Any help would be hugely appreciated.
Array might not be the best solution
I've checked quite a few other solutions but they don't quite deal with my issue and I don't have the skill to adapt them.
Here is one approach using VBA and arrays, which is quicker than doing it on the sheet. It checks each item in H to see whether it is present in J (and not the other way round). I assume that's what you want.
Sub x()
Dim v1, v2, v3(), i As Long, j As Long
v1 = Range("H2", Range("H" & Rows.Count).End(xlUp)).Value
v2 = Range("J2", Range("J" & Rows.Count).End(xlUp)).Value
ReDim v3(1 To UBound(v1, 1))
For i = LBound(v1) To UBound(v1)
If IsError(Application.Match(v1(i, 1), v2, 0)) Then
j = j + 1
v3(j) = v1(i, 1)
End If
Next i
If j > 0 Then Range("K2").Resize(j) = Application.Transpose(v3) ' nothing to write when no item is missing
End Sub
Using an input box
Sub x()
Dim v1, v2, v3(), i As Long, j As Long
v1 = Application.InputBox("First list", Type:=8)
v2 = Application.InputBox("Second list", Type:=8)
ReDim v3(1 To UBound(v1, 1))
For i = LBound(v1) To UBound(v1)
If IsError(Application.Match(v1(i, 1), v2, 0)) Then
j = j + 1
v3(j) = v1(i, 1)
End If
Next i
If j > 0 Then Range("K2").Resize(j) = Application.Transpose(v3) ' nothing to write when no item is missing
End Sub
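The question also asks that no value appear more than once in the output, which the loops above do not enforce. One way to add that, sketched here with a Scripting.Dictionary (a swapped-in technique, not part of the answer above): MissingFrom is a hypothetical helper that takes two single-column arrays as returned by Range.Value and assumes the lists contain no error values.

```vba
Option Explicit

' Returns a zero-based array of the values in list1 that do not occur in list2,
' each reported once. list1 and list2 are 2-D arrays (rows x 1 column).
Public Function MissingFrom(list1 As Variant, list2 As Variant) As Variant
    Dim known As Object, missing As Object
    Dim i As Long, k As String

    Set known = CreateObject("Scripting.Dictionary")
    Set missing = CreateObject("Scripting.Dictionary")

    ' Index the second list once so each later lookup is a hash lookup
    For i = LBound(list2, 1) To UBound(list2, 1)
        known(CStr(list2(i, 1))) = True
    Next i

    ' Collect items of the first list that are absent from the second, skipping repeats
    For i = LBound(list1, 1) To UBound(list1, 1)
        k = CStr(list1(i, 1))
        If Len(k) > 0 And Not known.Exists(k) Then
            If Not missing.Exists(k) Then missing.Add k, list1(i, 1)
        End If
    Next i

    MissingFrom = missing.Items   ' empty array (UBound = -1) when nothing is missing
End Function
```

Keys are compared as text, so a code stored as the number 123456 in one list and as the text "123456" in the other is treated as the same code. The result can be written back to the sheet the same way as in the macros above.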
A formula solution.
Note that I turned the first two ranges into Tables and changed the names. The formula is using structured references. This enables the formula to auto update if you add rows in the future.
=IFERROR(INDEX(ProjList1[#Data],AGGREGATE(15,6,1/ISNA(MATCH(ProjList1[#Data],ProjList2[#Data],0))*ROW(ProjList1[#Data]),ROWS($1:1))-ROW(ProjList1[#Headers])),"")
How does it work? Briefly:
MATCH generates an array of #N/A errors or position numbers.
ISNA turns that into an array of TRUE/FALSE, where TRUE indicates an entry in table 1 that is NOT in table 2.
Dividing 1 by that array gives 1 for TRUE and a #DIV/0! error for FALSE; multiplying by the array of project list rows then returns an array of row numbers and errors.
AGGREGATE with function 15 (SMALL) and option 6 ignores the errors and returns the row numbers in ascending order.
INDEX then returns the appropriate entry from Table 1
ROW(ProjList1[#Headers]) is a correction so that the table may be located anyplace on the worksheet, and still return the correct row.
Not sure if you're trying to set this up so it will autoupdate in future, but as a stopgap:
Countif column next to list 1 that checks whether they appear in list 2...
... Feeding into a pivot that only shows those where the countif value is 0, in the "row" field to remove duplication?

Find cell value, match, cut, move, ...vba

I am a beginner in VBA.
I have components that always consist of 2 parts (a rotor and a stator, each with its own number). Working with them can damage some of these parts, and a list of damaged parts has to be kept, so the inventory ends up as e.g. 200 rotors and 150 stators with different numbers. Before I can scrap them, I need to complete them as proper sets, i.e. rotor "a" with stator "a", "b" with "b", etc. Comparing and copying that many numbers by hand to find the number of sets is crazy.
It should be possible to solve this with a macro, which is what I am trying to do, but I am stuck.
The task: in column A I have a list of all damaged parts (a mix of rotors and stators with different numbers). Column C holds, via VLOOKUP, the number of the expected counterpart (for information only).
What I need to solve: in row 5, column A, I have a component number, and I know that its counterpart is somewhere in the same column, from row 6 to xx. Using the counterpart number in column C of the same row (5), I need to find the counterpart in column A and, when found, cut it out and put it into cell B5. That gives me a complete set. Then the same action for the next row (6): the macro reads the number in C, searches for it in A and, when found, cuts it and pastes it into B. Then rows 7, 8, 9, ... The result should be a certain quantity of pairs plus some single numbers where no second part was found.
The problem is that the loop works only as long as a related counterpart is always found. If the counterpart is not available in column A (no match between C and A), the code stops on that row.
What I need help with: if the code does not find the counterpart based on the info from C, it should just skip this row, make it red, and continue with the next row until the end, meaning it stops at the first empty cell in C. Thanks a lot to everybody who helps me.
Dim pn As Range,
Dim a
Dim x
x = 5
Dim i As Long, Dim radek As Long
a = Cells(x, 3)
For i = 1 To 500
Range("A:A").Select
Set pn = Selection.Find(What:=a)
If Not pn Is Nothing Then
pn.Select
End If
Selection.Cut
Cells(x, 2).Select
ActiveSheet.Paste
x = x + 1
Next
End Sub
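One way the behaviour described in the question could be sketched is shown below. It is not a tested solution for the poster's workbook: the macro name PairParts is hypothetical, the data is assumed to start in row 5 of the active sheet, and only the column A cell (rather than the whole row) is coloured red when no counterpart exists. It uses Application.Match instead of Select/Find so that a missing counterpart comes back as an error value that can be tested.

```vba
Option Explicit

' For each row from startRow down to the first empty cell in column C, look for
' the counterpart number (column C) further down column A. If found, move it to
' column B of the current row; if not, colour the column A cell red and carry on.
Sub PairParts()
    Const startRow As Long = 5          ' assumption: first data row, as in the question
    Dim ws As Worksheet
    Dim x As Long, lastRow As Long
    Dim target As Variant, hit As Variant

    Set ws = ActiveSheet
    x = startRow
    Do While Not IsEmpty(ws.Cells(x, 3).Value)
        ' Rows whose part was already moved into a pair have an empty column A: skip them
        If Not IsEmpty(ws.Cells(x, 1).Value) Then
            target = ws.Cells(x, 3).Value
            lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
            hit = CVErr(xlErrNA)        ' default: counterpart not found
            If Not IsError(target) And lastRow > x Then
                hit = Application.Match(target, _
                      ws.Range(ws.Cells(x + 1, 1), ws.Cells(lastRow, 1)), 0)
            End If
            If IsError(hit) Then
                ws.Cells(x, 1).Interior.Color = vbRed     ' no counterpart: flag and continue
            Else
                ' hit is the position below row x, so the counterpart sits in row x + hit
                ws.Cells(x, 2).Value = ws.Cells(x + hit, 1).Value
                ws.Cells(x + hit, 1).ClearContents
            End If
        End If
        x = x + 1
    Loop
End Sub
```

Because the counterpart cell is cleared rather than deleted, the rows keep their positions and the emptied rows are simply skipped when the loop reaches them.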

Search through column in excel for specific strings where the string is random in each cell

I am working in excel with a datasheet that is 1000 rows and 15 columns. Currently, in one of the columns, I have a lot of data mixed in with people names (see below for an example). I want to see how many times each person's name appears in the datasheet, so I can use it in a pivot table. There is no particular format or order to the way names appear. It is random. Is there a way to code in excel to search through that whole column and give me a count of the amount of times each person's name appears?
Column D
21421Adam14234
2323xxx Bob 66
23 asjdxx Jacob 665
43 Tim 5935539
2394Bob 88
After some trial and error, I can generate a list of names, one per row and place them in a different column for comparison sake, if that makes it easier.
I know you have got your answer but why not use COUNTIF with Wild Cards? You don't need VBA for this :)
See this example
=COUNTIF($A$1:$A$5,"*"&C1&"*")
You don't have VBA tagged, but I don't know if there is a way to do this without it. I've built a custom function below. To implement it, take the following steps.
1) List desired names starting at cell E1.
2) Insert this function into VBA Editor
A) Press Alt + F11
B) Click Insert > Module from menu bar
C) Copy this code into Module
Option Explicit
Function findString(rngString As Range, rngSearch As Range) As Long
    Dim cel As Range
    Dim i As Long
    i = 0
    For Each cel In rngSearch
        ' Count the cell when the name occurs anywhere in its text.
        ' (A function called from a worksheet formula cannot write to other cells,
        ' so the count is returned instead of tagging the cell next to the match.)
        If InStr(1, cel.Text, rngString.Value) > 0 Then
            i = i + 1
        End If
    Next
    findString = i
End Function
3) In F1 type the following formula
=findstring(E1,$D$1:$D$5)
4) Fill the formula down column F to get the count of each desired name.

Is there a way to check for duplicate values in Excel WITHOUT using the CountIf function?

A lot of the solutions here on SO involve using CountIf to find duplicates. When I have a list of 100,000+ values however, it will often take minutes for CountIf to search for duplicates.
Is there a quicker way to search for duplicates within an Excel column WITHOUT using CountIf?
Thanks!
EDIT #1:
After reading the comments and replies I realize I need to go into greater detail. Let's pretend I'm a birdwatcher, and after I return from a birdwatching trip I input anywhere from 1 to 25 or 50 new birds that I saw on my trip into my "Master List of Birds Seen". This is really a dynamically growing list, and with each addition I want to make sure I'm not duplicating something that already exists in my list.
So, in column A of my file are the names of the birds. Column B-M might contain other attributes of the birds. I want to know if a bird that I just added in column A after my latest birdwatching trip ALREADY exists somewhere ELSE in my list. And, if it does, I would manually merge the data of the 2 entries and throw away some and keep some after careful review. I clearly don't want to have duplicate entries of the same bird in my database.
So, ultimately I want some indication that there is or isn't a duplicate somewhere else, and if there is duplicate please tell me what row to look in (or highlight or color both of the duplicates).
The fastest way that I know of (in case you are using Excel 2007/2010/2011) is to use Data (In Ribbon) | Remove Duplicates to find the total number of duplicates OR to remove duplicates. You might want to move data to a temp sheet before you test this.
The 2nd fastest way is to use Countif. Now Countif can be used in many ways to find duplicates. Here are two main ways.
1) Inserting a New Column next to the data and putting the formula and simply copying it down.
2) Using Countif in Conditional formatting to highlight cells which are duplicates. For more details, please see this link.
suggestions for a macro to find duplicates in a SINGLE column
EDIT:
My Apologies :)
Countif is the 3rd fastest way!
The 2nd fastest way is to use Pivot Tables ;)
What exactly is your main purpose of finding duplicates? Do you want to delete them? Or Do you want to highlight them? Or something else?
FOLLOWUP
Seems like I made a typo in the formula. Yes for large number of rows, CountIf does take minutes as you suggested.
Let me see if I can come up with a VBA code to suit your exact needs.
Sid
You can use VBA - the following function returns a list of unique entries within a list of 100,000 in less than a second. Usage: select a range, type the formula (=getUniqueListFromRange(YourRange)) and validate with CTRL+SHIFT+ENTER.
Public Function getUniqueListFromRange(parRange As Range) As Variant
' Returns a (1 to n,1 to 1) array with all the values without duplicates
Dim i As Long
Dim j As Long
Dim locKey As Variant
Dim locData As Variant
Dim locUniqueDict As Variant
Dim locUniqueList As Variant
On Error GoTo error_handler
locData = Intersect(parRange.Parent.UsedRange, parRange)
Set locUniqueDict = CreateObject("Scripting.Dictionary")
On Error Resume Next
For i = 1 To UBound(locData, 1)
For j = 1 To UBound(locData, 2)
locKey = UCase(locData(i, j))
If locKey <> "" Then locUniqueDict.Add locKey, locData(i, j)
Next j
Next i
If locUniqueDict.Count > 0 Then
ReDim locUniqueList(1 To locUniqueDict.Count, 1 To 1) As Variant
i = 1
For Each locKey In locUniqueDict
locUniqueList(i, 1) = locUniqueDict(locKey)
i = i + 1
Next
getUniqueListFromRange = locUniqueList
End If
error_handler: 'Empty range
End Function
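The edited question asks which row holds the earlier copy of a newly added entry. The same dictionary idea can be adapted for that, sketched below under stated assumptions: FirstSeenIndex is a hypothetical name, the input is a single-column array as returned by Range.Value with no error values, and entries are compared case-insensitively after trimming.

```vba
Option Explicit

' Returns an array indexed like the input rows: element i holds the index of the
' first occurrence of value i when value i is a repeat, and 0 otherwise.
Public Function FirstSeenIndex(values As Variant) As Variant
    Dim seen As Object
    Dim result() As Long
    Dim i As Long, k As String

    Set seen = CreateObject("Scripting.Dictionary")
    seen.CompareMode = vbTextCompare      ' "Robin" and "robin" count as the same entry

    ReDim result(LBound(values, 1) To UBound(values, 1))
    For i = LBound(values, 1) To UBound(values, 1)
        k = Trim$(CStr(values(i, 1)))
        If Len(k) > 0 Then
            If seen.Exists(k) Then
                result(i) = seen(k)       ' repeat: point back at the first occurrence
            Else
                seen.Add k, i             ' first time this value is seen
            End If
        End If
    Next i
    FirstSeenIndex = result
End Function
```

The indexes are positions within the array, so for a list that starts in row 2 the worksheet row of the earlier copy is the returned index plus 1.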
If using Excel 2007 or later (which is likely from the 100,000+ values) you can choose:
Home Tab | Conditional Formatting > Highlight Cell Rules > Duplicate Values...
Right-click a highlighted cell and filter by selected cell color to show just the duplicates (be aware however this can be slow with conditional formatting).
Alternatively run this code and filter for colored cells which takes only a second on 100,000 cells:
Sub HighlightDupes()
Dim i As Long, dic As Variant, v As Variant
Application.ScreenUpdating = False
Set dic = CreateObject("Scripting.Dictionary")
i = 1
For Each v In Selection.Value2
If dic.exists(v) Then dic(v) = "" Else dic.Add v, i
i = i + 1
Next v
Selection.Font.Color = 255
For Each v In dic
If dic(v) <> "" Then Selection(dic(v)).Font.Color = 0
Next v
End Sub
Addendum:
To select only duplicate values without code or formulas, i have found this method useful:
Data Tab | Advanced Filter... Filter in Place, Unique Records Only, OK.
Now select the range of unique values and press Alt+; (Goto Special... Visible cells only). With this selection clear the filter and you will see that all unselected cells are duplicates, you can then press Ctrl+9 (Hide Rows) to show just the duplicates. These rows can be copied to another sheet if needed or marked with an "X".
You do not mention what you want to do when you find them. If you merely want to see where they are...
Sub HighLightCells()
ActiveSheet.UsedRange.Cells.FormatConditions.Delete
ActiveSheet.UsedRange.Cells.FormatConditions.Add Type:=xlCellValue, Operator:=xlEqual, Formula1:=ActiveCell
ActiveSheet.UsedRange.Cells.FormatConditions(1).Interior.ColorIndex = 4
End Sub
Preventing Duplicates with Data Validation
You can use Data Validation to prevent yourself from entering duplicate bird names. See Debra Dalgleish's site here
Handling existing duplicates
My free Duplicate Master addin will let you select, colour, list, or delete duplicates.
But more importantly it will let you run more complex matching than exact strings, ie
Case Insensitive / Case Sensitive searches (sample below)
Trim/Clean data
Remove all blank spaces (including CHAR(160)) see the " mapgie" and "magpie" example below
Run regular expression matches (for example the sample below replaces s$ with "" to remove plurals)
Match on any combination of columns (ie Column A, all columns, Column A&B etc)
I'm surprised that no one has mentioned the RemoveDuplicates method.
ActiveSheet.Range("A:A").RemoveDuplicates Columns:=1
This will simply remove any duplicate entries on the active worksheet in column A. It takes milliseconds to run (tested with 200k rows). Mind you, this will strictly delete all the duplicate entries. Although that isn't how the original question was worded, I do believe that this still serves your purpose.
One simple way of finding unique values is to use the Advanced Filter: filter for unique records only, then copy and paste them into another sheet, because when the filter is removed you will get the whole data set back, duplicates included.
Sort the range
and in the next column put =if(a2=a1;1;if(a2=a3;1;0))
"1" will be displayed for duplicates.