Is there a way in Excel to filter out duplicates even if the items occur in a different order within cells? (VBA solutions are also welcome)

What I am trying to do is some sort of smart "remove duplicates" in Excel.
I have a list of 200+ cells, and each cell in the list potentially contains multiple items separated by a semicolon (;). So, imagine I have a cell containing the items a;f;g and another cell containing g;a;f.
Those cells are duplicates since they contain exactly the same items, just in a different order, and the order doesn't matter to me.
Is there a way to make Excel recognize such cells as duplicates?
Many thanks in advance for your suggestions :)

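If there is only one column, the solution is simple:
Use Text to Columns with ";" as the delimiter to split the items into separate columns.
Sort the items within each row. Note that you have to sort row-wise (left to right), not by column.
Concatenate the sorted items back into a single cell.
Use the highlight duplicates option (Conditional Formatting > Highlight Cells Rules > Duplicate Values) on the concatenated column.
In my opinion, this should take less time than going the VBA way.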
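The split, sort and concatenate steps amount to building an order-insensitive key for each cell. Here is a minimal Python sketch of that logic, written in Python rather than VBA only to illustrate the idea. It assumes the cell values are available as plain strings. The function names are hypothetical, and trimming spaces around each item is an extra assumption that is not part of the steps above.

```python
from collections import Counter


def normalize(cell: str, sep: str = ";") -> str:
    """Order-insensitive key: split on sep, trim items, sort, rejoin.

    Trimming whitespace and dropping empty items are assumptions added
    here, not part of the Text to Columns steps.
    """
    items = [item.strip() for item in cell.split(sep) if item.strip()]
    return sep.join(sorted(items))


def duplicate_indices(cells: list[str]) -> list[int]:
    """Indices of every cell whose key occurs more than once.

    Like a "Duplicate Values" highlight, this flags all copies,
    including the first occurrence.
    """
    keys = [normalize(cell) for cell in cells]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] > 1]
```

For example, `duplicate_indices(["a;f;g", "g;a;f", "a;b"])` flags the first two cells because both normalize to `a;f;g`.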

Related

How do you select irregular duplicates with Google Sheets queries?

I have a Google sheet with 186,000 rows. I have included a dummy spreadsheet to give you an idea of the data. I need to select ALL duplicates, that includes rows where the first names might not match (i.e. Cathy vs Catherine), but they still refer to the same individual. There are also instances where the addresses might be slightly different (like omitting "Ave" in one row but including it in another).
I need to write a query to account for all of these instances, including just regular duplicates. Or I could do multiple queries and just copy the results into one spreadsheet. In any case, I'm at a loss.
In the dummy spreadsheet, I have included one example of each case I am trying to account for (3 in total).
I have something that may be useful. See my example sheet here:
https://docs.google.com/spreadsheets/d/19h28go-nzunW6zexcMD61QjySUKJA3Q2Ci2Hu3OMuAg/edit?usp=sharing
Basically I build a key value for each record, along the lines you asked for.
The key is made of the full last name, part of the first name, part of the address, and the ZIP code. Other variations are easy to add.
The formula is just a string concatenation of parts of these fields, as follows:
=ArrayFormula(
IF(ROW(A2:A)=2,"DupeKey",
IF(A2:A<>"",A2:A &LEFT(B2:B,$N$1) &LEFT(G2:G,$N$1) &K2:K,"")))
A valuable option is to allow varying the length of the required matching sub-string, from the first name and address. This is controlled for the formula by selecting a substring length of 1 to 6 in cell N1, and seeing how this changes the duplicate records that are found. The shorter the substring length, the more duplicate (or possibly duplicate) records will be found.
Conditional formatting is used to highlight the duplicate records.
And you can use the column filters to sort by different data columns - to put all of the duplicates at the top, sort by column N, in Z-A order, and exclude blanks.
Note that this isn't perfect. If someone accidentally types a space, or anything else, at the start of a data field, it will not be considered a duplicate. Better logic would be required to catch those.
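The same key-building idea can be sketched in Python to show how the substring length in N1 changes which records collide. This is only a sketch under assumptions. Each record is assumed to be a (last name, first name, address, ZIP) tuple, standing in for columns A, B, G and K. The ZIP is assumed to be stored as text. The function names and the sample records in the tests are hypothetical.

```python
from collections import defaultdict


def dupe_key(last: str, first: str, address: str, zip_code: str, n: int) -> str:
    """Same concatenation as A2:A & LEFT(B2:B,n) & LEFT(G2:G,n) & K2:K."""
    return last + first[:n] + address[:n] + zip_code


def duplicate_groups(records: list[tuple[str, str, str, str]], n: int) -> list[list[int]]:
    """Group record indices by key and keep groups with more than one record.

    Each record is assumed to be (last name, first name, address, ZIP),
    i.e. columns A, B, G and K of the example sheet.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for i, (last, first, address, zip_code) in enumerate(records):
        if not last:  # the formula returns "" when column A is empty
            continue
        groups[dupe_key(last, first, address, zip_code, n)].append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]
```

With n = 4, "Catherine" and "Cathy" both contribute "Cath" and can collide. With n = 5 they no longer do. Applying `.strip()` to each field before building the key would be one way to handle the leading-space problem mentioned above, but that is not part of the original formula.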
Let me know if this helps.
You can use these formulas:
If cell B3 matches "John", write "match"; otherwise write "no":
=IF(REGEXMATCH(B3,"John"), "match", "no")
If cell F2 contains the content of cell B3, write "match"; otherwise write "no". SEARCH returns an error rather than 0 when the text is not found, so wrap it in ISNUMBER:
=IF(ISNUMBER(SEARCH(B3, F2)),"match","no")
References:
REGEXMATCH
SEARCH
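To make the behaviour of the two formulas concrete, here is a rough Python analogue. It is an illustration only, and the function names are hypothetical. REGEXMATCH is case-sensitive and matches anywhere in the text, which corresponds to `re.search`. SEARCH is treated as a plain case-insensitive substring test, and its "not found" error simply becomes "no". Python's regex syntax is not identical to the engine Sheets uses, so only simple patterns carry over directly.

```python
import re


def regexmatch_label(text: str, pattern: str) -> str:
    """Rough analogue of =IF(REGEXMATCH(text, pattern), "match", "no").

    The match is case-sensitive and can occur anywhere in the text.
    """
    return "match" if re.search(pattern, text) else "no"


def contains_label(haystack: str, needle: str) -> str:
    """Case-insensitive containment check, in the spirit of SEARCH.

    Only a plain substring test is emulated here; the "not found" case
    returns "no" instead of an error.
    """
    return "match" if needle.lower() in haystack.lower() else "no"
```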

Excel formula not working as expected

I have a sheet that shows the maximum amount spent anywhere, so I need to find the most expensive place and return its name.
The function, as text:
=IFS((A6=MAX(D2:D31)),(INDEX(C2:C31,MATCH(A6,D2:D31,0))),(A6=MAX(H2:H31)),(INDEX(G2:G31,MATCH(A6,H2:H31,0))),(A6=MAX(K2:K31)),(INDEX(K2:K31,MATCH(A6,L2:L31,0))))
Basically, I need to return the name to the left of the value that matches cell A6.
Thanks in advance.
OK, that's overcomplicated!
Firstly, why the three sets of columns? It's a lot easier if you just have one long list with all the data (tell me if you actually need three and I'll change my solution):
=INDEX(C2:C31,MATCH(MAX(D2:D31),D2:D31,0))
MAX finds the biggest value in the list, MATCH (with 0 for an exact match) finds its position, and INDEX returns the name at that position. This avoids LOOKUP, which expects the lookup range to be sorted in ascending order and can return the wrong name on unsorted data.
Please note: if more than one place has the maximum price, only the first one is returned. The only way I can think of to get around that would be to build a macro.
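Outside Excel, the same idea (find the maximum, then return the name next to its first occurrence) can be sketched in Python. This is an illustration only. It assumes the names and amounts are two parallel lists, and the function name is hypothetical.

```python
def name_of_max(names: list[str], values: list[float]) -> str:
    """Name next to the largest value, like INDEX(names, MATCH(MAX(values), values, 0)).

    list.index returns the first position, so ties resolve to the first
    entry, just like an exact-match MATCH.
    """
    if not values or len(names) != len(values):
        raise ValueError("names and values must be non-empty and the same length")
    return names[values.index(max(values))]
```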
EDIT:
Alright, the multi-column solution is ugly and requires extra helper cells, which you can simply hide.
You'll need two new cells that find the highest value in each column group, two new cells that return the name for each of these maxima (in this case, tree and blueberries), and then the visible answer is simply an IF statement that checks which maximum is bigger and gives the final verdict. This can be extended to any number of column groups, but the complexity grows with each one.
Here are the formulas:
A5: =MAX(H2:H31)
B5: =INDEX(G2:G31,MATCH(A5,H2:H31,0))
C5: =MAX(L2:L31)
D5: =INDEX(K2:K31,MATCH(C5,L2:L31,0))
Visible answer: =IF(A5>C5,B5,D5)
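The helper-cell approach generalizes to any number of column groups: take the maximum of each group, then keep whichever maximum is biggest. Here is a minimal Python sketch of that comparison. It assumes each group is given as a pair of parallel lists (names, values), and the function name is hypothetical. It uses `>=` so that a later group wins a tie, mirroring IF(A5>C5,B5,D5), which only picks the first group when it is strictly bigger.

```python
def name_of_max_across(groups: list[tuple[list[str], list[float]]]) -> str:
    """Compare the maximum of each (names, values) column group.

    Using >= means a later group wins a tie, matching IF(A5>C5, B5, D5).
    Within a group, the first occurrence of the maximum is used.
    """
    best_name, best_value = None, None
    for names, values in groups:
        top = max(values)
        if best_value is None or top >= best_value:
            best_value = top
            best_name = names[values.index(top)]
    if best_name is None:
        raise ValueError("at least one group is required")
    return best_name
```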

Extract data in an Excel sheet using a macro

You're probably going to think "what an idiot", but remember I've never done any type of coding before, so this is all new to me.
My problem is that I'm working on a HUGE Excel sheet with loads of data that isn't needed. I need to sort the data into a few columns: I only need columns A, K, AN and AQ, but in column AS I only need certain values. The column contains yes, no or blank, and I only want the yes and blank values. Like I said, I've never done any coding before, but I know you can use a macro to do it, so please help. How do I go about this?
Before trying to get into macros, try using functions with IF/ELSE logic. They are quite easy to handle, e.g. if the value is "yes", then put it into X. Later, you can select everything you need. Also, check how the dollar sign ($) is used to make cell references absolute.
One quick and dirty way of getting this job done would be to:
Delete the columns you don't need.
Select all cells in the range you're interested in, click the Insert menu, and choose "Table". If your columns have titles, check the "My table has headers" box.
This turns your data into a table, so Excel recognizes that each row is a single record (instead of treating the cells as unrelated).
Now you can use the filter arrows in the column headers to display only the rows whose value in the column you're filtering on (column AS in your case) is one you're interested in.
Note that there are some limitations to what the table feature is good for, so, as always, whether this is a good solution for you depends on what you want to do with the data.
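If you do end up scripting it, the logic a macro would need is small: convert the column letters to positions, keep rows whose AS value is "yes" or blank, and copy only the wanted columns. Here is a minimal Python sketch of that logic, not a VBA macro. It assumes the sheet has been exported as rows of strings (for example, from a CSV file). It also assumes that column AS should be kept in the output, that the yes/blank comparison ignores case and surrounding spaces, and that missing trailing cells count as blank. The names `col_index`, `extract`, `KEEP` and `ALLOWED` are hypothetical.

```python
def col_index(letters: str) -> int:
    """Convert an Excel column label such as "A" or "AS" to a 0-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


KEEP = ["A", "K", "AN", "AQ", "AS"]  # columns named in the question
FILTER_COL = "AS"
ALLOWED = {"yes", ""}  # keep "yes" and blank, drop "no"


def extract(rows: list[list[str]]) -> list[list[str]]:
    """Keep only the wanted columns of rows whose AS value is yes or blank."""
    keep_idx = [col_index(c) for c in KEEP]
    f = col_index(FILTER_COL)

    def cell(row: list[str], i: int) -> str:
        # Short rows (missing trailing cells) are treated as blank.
        return row[i] if i < len(row) else ""

    return [
        [cell(row, i) for i in keep_idx]
        for row in rows
        if cell(row, f).strip().lower() in ALLOWED
    ]
```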

VBA - How to find frequencies of "x" and "y" in a list

I have a column of names on one sheet of an Excel workbook. I need to find how often each of these names appears in that column and display the counts on another sheet. For example, the code needs to count how many times "CS" appears in this column and display that on a separate sheet, then do the same for "Grad", and so on. Any tips?
Thanks a lot
That's the primary use of Pivot tables.
Found the solution with a simple function
=COUNTIF(Individual_Stats!D6:D999,"CS")
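A pivot table, or one COUNTIF per name, gives a frequency count for every distinct value in the column. For comparison, here is the same idea as a minimal Python sketch. It assumes the column has already been read into a list of strings and that blank cells should be ignored. The function name is hypothetical.

```python
from collections import Counter


def name_frequencies(names: list[str]) -> Counter:
    """Count each distinct non-blank name, like one COUNTIF per name."""
    return Counter(name for name in names if name)
```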

How to Stack a range of values (from multiple tables in another sheet) into a single column

I'm working on a quarterly report that auto-generates all fields.
I could really use some help building a formula that pulls values from the first column ([T6-TOC]) of three separate tables (ROVH_Jan, ROVH_Feb, ROVH_MAR) in another worksheet (RVH 1825). I need the three ranges of values stacked in a single column, and I do not want to eliminate duplicate values.
I've tried using an INDEX formula and VBA, but I can't get the syntax right.
Any suggestions?
These are sources I've looked at, but they didn't solve my problem:
https://superuser.com/questions/445410/pull-row-of-data-from-one-place-in-spreadsheet-to-another
http://forum.chandoo.org/threads/merge-stack-multiple-named-ranges-across-multiple-worksheets-in-a-master-sheet.11074/
Excel - Combine multiple columns into one column
http://www.mrexcel.com/forum/excel-questions/610527-how-do-i-stack-data-multiple-columns-into-one-column.html
Something like this should work for you:
=IF(ROW(A1)<=ROWS(ROVH_Jan),INDEX(ROVH_Jan[T6-TOC],ROW(A1)),IF(ROW(A1)<=ROWS(ROVH_Jan)+ROWS(ROVH_Feb),INDEX(ROVH_Feb[T6-TOC],ROW(A1)-ROWS(ROVH_Jan)),IF(ROW(A1)<=ROWS(ROVH_Jan)+ROWS(ROVH_Feb)+ROWS(ROVH_MAR),INDEX(ROVH_MAR[T6-TOC],ROW(A1)-ROWS(ROVH_Jan)-ROWS(ROVH_Feb)),"")))
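The nested IF works by treating the output row number as a running offset: the first ROWS(ROVH_Jan) rows come from January, the next ROWS(ROVH_Feb) rows from February, then March, and after that the formula returns a blank. Here is a Python sketch of the same index arithmetic, for illustration only. It assumes each table's [T6-TOC] column is given as a list. `stacked_value` is a hypothetical name.

```python
def stacked_value(k: int, jan: list, feb: list, mar: list):
    """Value for output row k (1-based, like ROW(A1)) of the stacked column.

    Mirrors the nested IF: rows 1..len(jan) come from jan, the next
    len(feb) rows from feb, then mar, and "" once all three are used up.
    Duplicates are kept because nothing is filtered.
    """
    if k <= len(jan):
        return jan[k - 1]
    if k <= len(jan) + len(feb):
        return feb[k - len(jan) - 1]
    if k <= len(jan) + len(feb) + len(mar):
        return mar[k - len(jan) - len(feb) - 1]
    return ""
```

Filling the formula down corresponds to calling this for k = 1, 2, 3 and so on. The result is the three columns concatenated, followed by blanks.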