Count unique string variants - vba

There could be quite a simple solution to this, but I am trying to find the number of times a unique variant (i.e. non-duplicates) of a string appears in a column. However this string is only part of the text contained in a cell, and not the entire cell. To illustrate:
EuropeSpainMadrid
EuropeSpainBarcelona
AsiaChinaShanghai
AsiaJapanTokyo
EuropeEnglandLondon
EuropeSpainMadrid
I would like to find how many unique instances there are of a string that contains "EuropeSpain". So using this example, I would find that a variant of "EuropeSpain" appears only twice (given that the second instance of "EuropeSpainMadrid" is a duplicate).
A solution to this is to use pivots to summarise the data and remove duplicated; however given that my underlying dataset changes often this would require manual adjustments and corrections. I would therefore like to avoid adding any intermediate steps (i.e. PivotTables, other data sets etc) between my data and the counts.
UPDATE: I now understand to use wildcards to solve the first part of my question (counting the occurrences of "EuropeSpain"), however I am not yet clear on the second part of my question (how to find the number of unique occurrences).
Is there a formula or VBA code that could do this?

Using wildcards:
=COUNTIF(A1:A6,"="&"*"&C1&"*")

For without VBA but with some versatility, I suggest with Text in ColumnA (labelled), ColumnB labelled Flag and EuropeSpain in C1:
=FIND(C$1,A2)
in B2 copied down.
Then pivot A:B with Flag for FILTERS (and 1 selected), Text for ROWS and Count of Text for Sigma VALUES.
Apply Distinct Values if required (and available!), alternatively a formula of the kind:
=MATCH("Grand Total",E:E)-4
would count uniques.

Related

How do you select irregular duplicates with Google Sheets queries?

I have a Google sheet with 186,000 rows. I have included a dummy spreadsheet to give you an idea of the data. I need to select ALL duplicates, that includes rows where the first names might not match (i.e. Cathy vs Catherine), but they still refer to the same individual. There are also instances where the addresses might be slightly different (like omitting "Ave" in one row but including it in another).
I need to write a query to account for all of these instances, including just regular duplicates. Or I could do multiple queries and just copy the results into one spreadsheet. In any case, I'm at a loss.
Dummy spreadsheet. I have included one example of each case I am trying to account for (3 total).
I have something that may be useful. See my example sheet here:
https://docs.google.com/spreadsheets/d/19h28go-nzunW6zexcMD61QjySUKJA3Q2Ci2Hu3OMuAg/edit?usp=sharing
Basically I build a key value for each record, along the lines you asked for.
All of the last name, part of the first name, part of the address, and the ZIP code. Other variations are easily added.
The formula is just a string concatenation of parts of these fields, as follows:
=ArrayFormula(
IF(ROW(A2:A)=2,"DupeKey",
IF(A2:A<>"",A2:A &LEFT(B2:B,$N$1) &LEFT(G2:G,$N$1) &K2:K,"")))
A valuable option is to allow varying the length of the required matching sub-string, from the first name and address. This is controlled for the formula by selecting a substring length of 1 to 6 in cell N1, and seeing how this changes the duplicate records that are found. The shorter the substring length, the more duplicate (or possibly duplicate) records will be found.
Conditional formating is used to highlight the duplicate records.
And you can use the column filters to sort by different data columns - to put all of the duplicates at the top, sort by column N, in Z-A order, and exclude blanks.
Note that this isn't perfect. If someone accidentally types a space, or anything else, at the start of a data field, it will not be considered a duplicate. Better logic would be required to catch those.
Let me know if this helps.
You can use these formulas:
If cell B3 match "John" write "match", if doesn't match write "no"
=IF(REGEXMATCH(B3,"John"), "match", "no")
If cell F2 contains content of cell B3, write "match", if doesn't match write "no"
=IF(SEARCH(B3, F2)>0,"match","no")
References:
REGEXMATCH
SEARCH

Excel formula not working as expected

I have a sheet that shows max values spent anywhere. So I need to find most expensive place and return it's name. Like this:
Whole sheet.
Function.
Function in text:
=IFS((A6=MAX(D2:D31)),(INDEX(C2:C31,MATCH(A6,D2:D31,0))),(A6=MAX(H2:H31)),(INDEX(G2:G31,MATCH(A6,H2:H31,0))),(A6=MAX(K2:K31)),(INDEX(K2:K31,MATCH(A6,L2:L31,0))))
Basically I need to find a word left to value, matching A6 cell.
Thanks in advance.
Ok.. Overcomplicated!
Firstly, why the three rows? it's a lot easier if you just have one long row with all the data (tell me if you actually need 3 I'll change my solution)
=LOOKUP(MAX(D2:D31);D2:D31;C2:C31)
The MAX formula will lookup the biggest value in the list, the Lookup formula will then match it to the name.
Please note: If more than one object has the maximum price, it will only return the first one. The only way I can think of to bypass that would be to build a macro.
EDIT:
Alright.. Multi Column solution is ugly and requires extra columns that you can just hide.
As you can see you'll need 2 new columns that will find the highest for each row, 2 new columns that will find the value for each of these "highest" (in this case tree and blueberries) and then your visible answer will simply be an if statement finding out which one is bigger and giving the final verdict. This can be expanded with an infinite number of columns but increases complexity.
Here are the formulas:
MAX(H2:H31)
LOOKUP(A5;H2:H31;G2:G31)
MAX(L2:L31)
LOOKUP(C5;L2:L6;K2:K6)
IF(A5>C5;B5;D5)

Text Manipulation nestled within a Query (Or ArrayFormula) (Google Sheets)

I'm trying to Query some data in my spreadsheet, returning a manufacturer based on product code. We code our products with a three digit suffix that corresponds to different customers. I know the codes but people viewing the sheet may not.
Right now, I'm trying to split the suffix from the product and perform the wuery in the same formula.
I can do this in two steps, splitting the suffix from the code and querying just the suffix, but I want to know if I can do this all in one code. My current formula returns the data I want but it does not fill the entire range of the sheet. I would rather have this happen automatically as the workbook will be dynamic.
My current formula is:
=QUERY(CxSeries,"select B where C CONTAINS '"&right(Code,3)&"' ")
https://docs.google.com/spreadsheets/d/190kom4q0XOJP4UdLTJpZf5tuJCQTflcuokRp_FJ4pBc/edit?usp=sharing
I'm not sure if query is the right way to go about this, but I'd prefer to stick to that (just because i honestly can't wrap my head around ArrayForumlas).
Thank you,
Clear all formulas you have in column C and enter in C7
=ArrayFormula(vlookup(regexextract(D7:D16,"-(\d+)$")+0, {Sheet5!C6:C,Sheet5!B6:B}, 2, 0))

Need a simple search function to display most common value in a column. (with ambiguous choices)

I have a very large array of data with many columns that display different outputs for the values presented. I would like to add a row above the data that will display the most common occurring value or word below.
Generally I would like to have each top of the column (right under the column label in row 1) have the most common value below. I will then use this value for various data analysis functions!
Is this possible, and if so, how? Preferably this will not require VBA, but simply a short code in the cell.
One caveat: The exact values may vary, so there is no set list where I can say "it will be one of these."
Any ideas appreciated!
Try a series of =COUNTIF(A:A,"VALUE TO SEARCH") functions if you want to stay away from VBA.
Otherwise, the best method would be to iterate through each column via VBA. With this method, you can even count the "varying" values and return the count and/or the value itself.
http://www.excel-easy.com/examples/most-frequently-occurring-word.html
This is a single formula you would write at the top of each column. Does not require VBA. You can replace the set range to an entire column, such as (A:A) instead of (A1:A7).
If you mean an array as in a data type, it could work differently but it depends what you're trying to do.
With data from A3 through A16, in A2 enter:
=INDEX($A$3:$A$16,MODE(MATCH($A$3:$A$16,$A$3:$A$16,0)))
This will work for text as well as numbers. Adjust this to match the column size.

Excel - How do I find all relevant rows by typing unique invoice# listed Col A

I have a Worksheet with 10 columns and data range from A1:J55. Col A has the invoice # and rest of the columns have other demographic data. Goal is to type the invoice number on a cell and display all the rows matching the invoice number from col A.
Besides auto filter function, the only thing comes to my mind is VBA. Please advice what is the best way to get the data. Thanks for your help in advance.
Alright, I'm pretty proud of this one. Again avoiding VBA, this one uses the volatile formula OFFSET to keep moving its VLOOKUP search down the table until it's found all matches. Just make sure you paste enough rows of the formula that if there are many matches, there's room for all of them to appear. If you put a border around your match area then it would be clear if you ever ran out of room and needed to copy down the formula some more.
Again, in the main section, it's just a single formula (using index):
=IFERROR(INDEX($A$1:$J$200,$M3,MATCH(N$2,$A$1:$J$1,0)),"")
This gets to be so simple because the hard work of the lookup is done by an initial column which looks up the next row that matches the invoice number. It has the formula:
=IFERROR(MATCH($L$2,OFFSET($A$1:$A$200,M2,0),0)+M2," ")
Here is the working example that goes with those formulas:
Let me know if you need any further description of how it works, but it mostly uses the same rules as above so that it's robust in copying and moving around.
I've uploaded the Excel file so you can play with it, but everything you need to reproduce this feature should be in this solution.
Google Docs - Click link and hit Ctrl+S to download and open in Excel.
A popular solution to this problem is a simple VLookup. Lookup the invoice the user types in on the table A1:J55, and then return an adjascent column's data.
Here's an example of it working:
The formula in the highlighted cell is:
=VLOOKUP($L3,$A:$J,MATCH(N$2,$1:$1,0),FALSE)
What's nice about this formula is you only need to type it once and then you can copy it across and it'll automatically pick out the correct column of the table (that's the match part). The rest is very simple:
The first part says lookup value $L3 (the invoice number typed in),
The second part says look it up in range $A:$J (which is where your table is located). I've shown how you can select the entire columns $A:$J so that you can add and remove data without worrying about adjustin the range in your lookups. (Excel takes care of optimizing the formula so that unused cells aren't checked)
The third part picks the column from which the resulting data will be drawn once a matching row is found.
The FALSE part is an indication that the invoice number must match exactly (no approximate matching allowed)
The $ signs ensure that fixed ranges like the location of your source table ($A:$J) and your lookup value ($L3) don't get automatically changed as you copy the formula across for multiple columns.
The formula is pretty easy to adapt if you want to move around your table and the area where you do your lookup. Here's an example:
Bonus
If you want to add a little spiff, you can add a dropdown to the Invoice # field so that the user gets auto-completion and the option to browse existing values like so: