Count Unique Text Based on 2 Criteria - vba

I was playing around with a couple ideas to count unique text based on two criteria. I was thinking Sumproduct would do it for me, but it doesn't seem to work. I may need some kind of VBA script. Basically, I want to do the following.
1) Look at everything in ColumnB and check whether it matches a criterion
2) Look at everything in ColumnW and check whether it matches a criterion
3) Count unique text in ColumnF.
It's almost like this:
=COUNTIFS(W:W,A1,B:B,B1)
THEN, based on the result of that, count uniques in ColumnF
I was thinking this should be pretty easy, but it's turning out to be really hard!

You can use a standard SUMPRODUCT-based pseudo-COUNTUNIQUE, but you need to modify it by adding the criteria in the numerator and the inverse criteria in the denominator; the latter avoids #DIV/0! errors. This produces a pseudo-COUNTIFSUNIQUE.
=SUMPRODUCT(SIGN((B2:B20="bee")*(W2:W20="double-you"))/
(COUNTIFS(B2:B20, "bee", W2:W20, "double-you", F2:F20, F2:F20)+(B2:B20<>"bee")+(W2:W20<>"double-you")))
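To see why dividing by the count works, here is a minimal Python sketch of the same idea, outside Excel. The function name and the sample rows are hypothetical; each tuple stands in for one row of columns B, W and F. Every matching row contributes 1/count for its F value, so each distinct value adds up to exactly 1.

```python
from collections import Counter

def count_unique_if(rows, crit_b, crit_w):
    """rows: iterable of (b, w, f) tuples standing in for columns B, W, F."""
    # Rows that satisfy both criteria (the numerator of the formula).
    matching = [f for b, w, f in rows if b == crit_b and w == crit_w]
    # How often each F value occurs among the matching rows (the COUNTIFS part).
    counts = Counter(matching)
    # Each matching row adds 1/count, so every distinct value sums to 1.
    return round(sum(1 / counts[f] for f in matching))

# Hypothetical sample data, not taken from the question.
rows = [
    ("bee", "double-you", "apple"),
    ("bee", "double-you", "apple"),
    ("bee", "double-you", "pear"),
    ("bee", "other", "plum"),
    ("wasp", "double-you", "fig"),
]
print(count_unique_if(rows, "bee", "double-you"))  # 2
```

Rows that fail either criterion are simply left out here; in the formula they get a zero numerator and a non-zero denominator, which has the same effect.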

Another solution is to adapt the formula from here: Count Unique Text Values in a Range.
Instead of using F:F, you modify it to match F:F if both Crit1 and Crit2 hold and "No Match" otherwise. That is,
IF((B:B="Crit1")*(W:W="Crit2"),F:F,"No Match")
Then you do a unique count on that array.
=SUMPRODUCT(--(FREQUENCY(MATCH(IF((B:B="Crit1")*(W:W="Crit2"),F:F,"No Match"),
IF((B:B="Crit1")*(W:W="Crit2"),F:F,"No Match"),0),
ROW(F:F)-ROW($F$1)+1)>0))
-NOT(PRODUCT((B:B="Crit1")*(W:W="Crit2")))
The -NOT(PRODUCT(...)) at the end is to subtract the unique count for the "No Match" entry if it exists (this can be replaced by just -1 if you know there will always be things that don't match both criteria).
Note that this is an array formula and must be entered using Ctrl+Shift+Enter.
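The steps above can be sketched in Python to check the logic. This is only an illustration under assumptions: the function name and sample rows are made up, and each tuple stands in for one row of columns B, W and F.

```python
def count_unique_with_placeholder(rows, crit1, crit2):
    """rows: list of (b, w, f) tuples standing in for columns B, W, F."""
    # Step 1: keep F where both criteria hold, otherwise use the placeholder.
    values = [f if (b == crit1 and w == crit2) else "No Match" for b, w, f in rows]
    # Step 2: MATCH(values, values, 0) gives the first position of each value;
    # counting the distinct first positions mirrors FREQUENCY(...) > 0.
    first_positions = {values.index(v) for v in values}
    # Step 3: subtract 1 for the placeholder if any row failed the criteria.
    has_placeholder = any(not (b == crit1 and w == crit2) for b, w, f in rows)
    return len(first_positions) - (1 if has_placeholder else 0)

# Hypothetical sample data, not taken from the question.
rows = [
    ("Crit1", "Crit2", "apple"),
    ("Crit1", "Crit2", "apple"),
    ("Crit1", "Crit2", "pear"),
    ("Crit1", "other", "plum"),
]
print(count_unique_with_placeholder(rows, "Crit1", "Crit2"))  # 2
```

The sketch assumes that "No Match" never appears as a genuine value in column F; if it could, a different placeholder would be needed.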

Related

How do you select irregular duplicates with Google Sheets queries?

I have a Google sheet with 186,000 rows. I have included a dummy spreadsheet to give you an idea of the data. I need to select ALL duplicates, that includes rows where the first names might not match (i.e. Cathy vs Catherine), but they still refer to the same individual. There are also instances where the addresses might be slightly different (like omitting "Ave" in one row but including it in another).
I need to write a query to account for all of these instances, including just regular duplicates. Or I could do multiple queries and just copy the results into one spreadsheet. In any case, I'm at a loss.
Dummy spreadsheet. I have included one example of each case I am trying to account for (3 total).
I have something that may be useful. See my example sheet here:
https://docs.google.com/spreadsheets/d/19h28go-nzunW6zexcMD61QjySUKJA3Q2Ci2Hu3OMuAg/edit?usp=sharing
Basically I build a key value for each record, along the lines you asked for.
All of the last name, part of the first name, part of the address, and the ZIP code. Other variations are easily added.
The formula is just a string concatenation of parts of these fields, as follows:
=ArrayFormula(
IF(ROW(A2:A)=2,"DupeKey",
IF(A2:A<>"",A2:A &LEFT(B2:B,$N$1) &LEFT(G2:G,$N$1) &K2:K,"")))
A valuable option is to allow varying the length of the required matching sub-string, from the first name and address. This is controlled for the formula by selecting a substring length of 1 to 6 in cell N1, and seeing how this changes the duplicate records that are found. The shorter the substring length, the more duplicate (or possibly duplicate) records will be found.
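As a rough illustration of how the key behaves, here is a Python sketch of the same concatenation. The field order (last name, first name, address, ZIP) follows the description above; the function names and the sample records are hypothetical.

```python
from collections import defaultdict

def dupe_key(last, first, address, zip_code, n):
    # Whole last name, first n characters of first name and address, whole ZIP.
    return last + first[:n] + address[:n] + zip_code

def find_duplicates(records, n):
    """Group records by key and return only the groups with more than one member."""
    groups = defaultdict(list)
    for rec in records:
        groups[dupe_key(*rec, n)].append(rec)
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical sample records: (last name, first name, address, ZIP).
records = [
    ("Smith", "Cathy", "12 Oak Ave", "90210"),
    ("Smith", "Catherine", "12 Oak", "90210"),
    ("Jones", "Bob", "5 Elm St", "10001"),
]
print(len(find_duplicates(records, 3)))  # 1 group: Cathy / Catherine
print(len(find_duplicates(records, 6)))  # 0 groups: "Cathy" vs "Cather" differ
```

A shorter substring length catches looser matches such as Cathy and Catherine, at the cost of more false positives.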
Conditional formatting is used to highlight the duplicate records.
And you can use the column filters to sort by different data columns - to put all of the duplicates at the top, sort by column N, in Z-A order, and exclude blanks.
Note that this isn't perfect. If someone accidentally types a space, or anything else, at the start of a data field, it will not be considered a duplicate. Better logic would be required to catch those.
Let me know if this helps.
You can use these formulas:
If cell B3 matches "John", write "match"; otherwise write "no":
=IF(REGEXMATCH(B3,"John"), "match", "no")
If cell F2 contains the content of cell B3, write "match"; otherwise write "no". SEARCH returns an error rather than 0 when nothing is found, so wrap it in ISNUMBER:
=IF(ISNUMBER(SEARCH(B3, F2)),"match","no")
References:
REGEXMATCH
SEARCH

Using Excel to Return unique values for Identical lookup values

I am attempting to use VLookup to match a "Purchase Number" to a specific "Invoice Number". To accomplish this, I have several identifiers about the purchase that I put together to come up with a special "Concat ID". I then have a list of Invoice Numbers that also has the same list of identifiers to create the same "Concat ID".
The problem I am running into is that the set of identifiers is not unique (aka a purchase of 10 Computers might happen multiple times a year, therefore it is in my list multiple times). Because of this, when I use Vlookup to match the 2 IDs, it always is giving me the same Purchase Number for each time the Concat ID is found (which is just the first occurrence of that Concat ID).
Since there is no other data that would allow for matching (because Invoice date and purchase date are not always the same date or even close to one another), I am just wanting to ensure that each Invoice Number has a unique purchase number.
I'm not sure if it's possible, but I was hoping I would be able to perform the vlookup then just skip to the next time the Concat ID is found, allowing for no duplicates, but that hasn't been feasible for me. Because this is a file of 16000 rows, any insight is very appreciated.
I'm sure that's not the clearest explanation, so I've attached a screenshot of the 2 examples in case anyone has any insight. I've been using a simple VLookup, but I'm open to trying VBA or any other suggestions everyone has. As always, thank you Stack community in advance for any help/insight!
Purchase Info
Attempted Matchup with Invoice Info
I'm still not sure what you expect to do with the ConcatID purchase number, but to return the purchase numbers that match your specific ConcatID, generated in the manner you describe in your question, and "skipping to the next" in the case of identical ConcatIDs, you can do something like the following:
Note that I made a Table out of your original data, and am using structured references. This allows a much smaller amount of data to be processed compared with referencing the entire column, and will also autoadjust the range as you add/remove rows
Also note that if your Table starts in other than Row 1, you will need to make an adjustment in the formula to account for that.
G2: =INDEX(PurchaseTbl[#All],AGGREGATE(15,6,1/1/(PurchaseTbl[Concat ID]=F2)*ROW(PurchaseTbl),COUNTIF($F$1:F2,F2)),7)
and fill down as far as needed
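The idea behind the formula is "take the k-th matching row, where k is how many times this ConcatID has appeared so far in the invoice list". A minimal Python sketch of that logic, with hypothetical function names and sample data:

```python
def match_purchases(purchases, invoice_ids):
    """purchases: list of (concat_id, purchase_number) in sheet order.
    invoice_ids: concat IDs of the invoice rows, in sheet order."""
    seen = {}
    result = []
    for cid in invoice_ids:
        # Zero-based running count of this ID, like COUNTIF($F$1:F2,F2) - 1.
        k = seen.get(cid, 0)
        seen[cid] = k + 1
        # All purchase numbers with the same ID, in order (the AGGREGATE part).
        candidates = [num for pid, num in purchases if pid == cid]
        # No k-th match left: Excel would show an error, this sketch uses None.
        result.append(candidates[k] if k < len(candidates) else None)
    return result

# Hypothetical sample data.
purchases = [("PC-10", "P001"), ("DESK-2", "P002"), ("PC-10", "P003")]
invoices = ["PC-10", "PC-10", "DESK-2", "PC-10"]
print(match_purchases(purchases, invoices))  # ['P001', 'P003', 'P002', None]
```

Each invoice row therefore gets a different purchase number as long as there are at least as many purchases as invoices for that ID.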
I've got a really dorky solution, but maybe it will help.
Use this formula to create a unique ID for each row. It will count how many times the specific Concat ID has been used previously in the table, then append it to the end. You can use the Concat ID Unique in your VLookup to get the correct Purchase Number.
=D2 & "_" & (COUNTIF(D$1:D1, "=" & D2))
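For illustration, the same running-count suffix can be sketched in Python. The function name and sample IDs are hypothetical; the first occurrence gets the suffix 0, as it does with the COUNTIF over the rows above.

```python
def add_occurrence_suffix(concat_ids):
    """Append the number of earlier occurrences to each ID, e.g. PC-10_0, PC-10_1."""
    seen = {}
    out = []
    for cid in concat_ids:
        n = seen.get(cid, 0)       # how many times cid appeared in earlier rows
        out.append(f"{cid}_{n}")
        seen[cid] = n + 1
    return out

# Hypothetical sample data: build the suffixed key on both sheets, then look up.
purchase_ids = ["PC-10", "DESK-2", "PC-10"]
purchase_numbers = ["P001", "P002", "P003"]
lookup = dict(zip(add_occurrence_suffix(purchase_ids), purchase_numbers))

invoice_keys = add_occurrence_suffix(["PC-10", "PC-10", "DESK-2"])
print([lookup.get(key) for key in invoice_keys])  # ['P001', 'P003', 'P002']
```

Because the suffixed keys are unique, an ordinary exact-match lookup is enough once both lists carry them.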

Excel formula not working as expected

I have a sheet that shows the maximum amounts spent at various places. I need to find the most expensive place and return its name. Like this:
Whole sheet.
Function.
Function in text:
=IFS((A6=MAX(D2:D31)),(INDEX(C2:C31,MATCH(A6,D2:D31,0))),(A6=MAX(H2:H31)),(INDEX(G2:G31,MATCH(A6,H2:H31,0))),(A6=MAX(K2:K31)),(INDEX(K2:K31,MATCH(A6,L2:L31,0))))
Basically I need to find a word left to value, matching A6 cell.
Thanks in advance.
Ok.. Overcomplicated!
Firstly, why the three rows? it's a lot easier if you just have one long row with all the data (tell me if you actually need 3 I'll change my solution)
=LOOKUP(MAX(D2:D31);D2:D31;C2:C31)
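The MAX formula finds the biggest value in the list, and the LOOKUP formula then matches it to the name.
Please note: LOOKUP expects the lookup range (D2:D31) to be sorted in ascending order; on unsorted data, =INDEX(C2:C31;MATCH(MAX(D2:D31);D2:D31;0)) is the safer exact-match form. If more than one object has the maximum price, only one of them is returned. The only way I can think of to bypass that would be to build a macro.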
EDIT:
Alright.. Multi Column solution is ugly and requires extra columns that you can just hide.
As you can see you'll need 2 new columns that will find the highest for each row, 2 new columns that will find the value for each of these "highest" (in this case tree and blueberries) and then your visible answer will simply be an if statement finding out which one is bigger and giving the final verdict. This can be expanded with an infinite number of columns but increases complexity.
Here are the formulas:
A5: =MAX(H2:H31)
B5: =LOOKUP(A5;H2:H31;G2:G31)
C5: =MAX(L2:L31)
D5: =LOOKUP(C5;L2:L31;K2:K31)
Final answer: =IF(A5>C5;B5;D5)
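The helper-cell approach boils down to "find the largest value across all name/value column pairs and return the name next to it". A small Python sketch of that logic, with a hypothetical function name and made-up sample data:

```python
def most_expensive(*tables):
    """Each table is a (names, values) pair standing in for two adjacent columns."""
    best_name, best_value = None, float("-inf")
    for names, values in tables:
        for name, value in zip(names, values):
            # Strictly greater, so the first maximum wins on ties.
            if value > best_value:
                best_name, best_value = name, value
    return best_name

# Hypothetical sample data for two name/value column pairs.
first_pair = (["tree", "lamp"], [120, 40])
second_pair = (["blueberries", "bread"], [15, 3])
print(most_expensive(first_pair, second_pair))  # tree
```

Adding a third column pair means passing one more argument here, whereas the spreadsheet version needs two more helper cells and a longer IF.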

Getting an Index/Match with a single criteria to sum its results

I am using an INDEX/MATCH to locate a facility number, then go to the appropriate column and return the first non-blank answer it receives. This works well. However, I can't figure out how to add a SUM function to this, so that it adds all of the INDEX/MATCH results and doesn't just stop at the first one it finds. I'd like to use only a formula, not VB. This is what my array formula currently looks like.
=INDEX(INDIRECT("$H"&(MATCH(M45,$B:$B,0))&":$H$10000"),MATCH(FALSE, ISBLANK(INDIRECT("$H"&(MATCH(M45,$B:$B,0))&":$H$10000")), 0))

Count unique string variants

There could be quite a simple solution to this, but I am trying to find the number of times a unique variant (i.e. non-duplicates) of a string appears in a column. However this string is only part of the text contained in a cell, and not the entire cell. To illustrate:
EuropeSpainMadrid
EuropeSpainBarcelona
AsiaChinaShanghai
AsiaJapanTokyo
EuropeEnglandLondon
EuropeSpainMadrid
I would like to find how many unique instances there are of a string that contains "EuropeSpain". So using this example, I would find that a variant of "EuropeSpain" appears only twice (given that the second instance of "EuropeSpainMadrid" is a duplicate).
A solution to this is to use pivots to summarise the data and remove duplicates; however, given that my underlying dataset changes often, this would require manual adjustments and corrections. I would therefore like to avoid adding any intermediate steps (i.e. PivotTables, other data sets etc) between my data and the counts.
UPDATE: I now understand to use wildcards to solve the first part of my question (counting the occurrences of "EuropeSpain"), however I am not yet clear on the second part of my question (how to find the number of unique occurrences).
Is there a formula or VBA code that could do this?
Using wildcards:
=COUNTIF(A1:A6,"="&"*"&C1&"*")
For a solution without VBA but with some versatility, I suggest putting the text in ColumnA (labelled Text), labelling ColumnB Flag, putting EuropeSpain in C1, and entering:
=FIND(C$1,A2)
in B2 copied down.
Then pivot A:B with Flag for FILTERS (and 1 selected), Text for ROWS and Count of Text for Sigma VALUES.
Apply Distinct Values if required (and available!), alternatively a formula of the kind:
=MATCH("Grand Total",E:E)-4
would count uniques.
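To make the difference between the two counts concrete, here is a short Python sketch using the sample data from the question. The function names are hypothetical. Note that the substring test here is case-sensitive, like FIND; COUNTIF with wildcards is not.

```python
def count_containing(cells, fragment):
    # Every cell that contains the fragment, duplicates included.
    return sum(1 for c in cells if fragment in c)

def count_unique_containing(cells, fragment):
    # A set keeps each distinct cell value once, so duplicates drop out.
    return len({c for c in cells if fragment in c})

cells = [
    "EuropeSpainMadrid",
    "EuropeSpainBarcelona",
    "AsiaChinaShanghai",
    "AsiaJapanTokyo",
    "EuropeEnglandLondon",
    "EuropeSpainMadrid",
]
print(count_containing(cells, "EuropeSpain"))         # 3
print(count_unique_containing(cells, "EuropeSpain"))  # 2
```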