Comparing rows to identify matches and mismatches - vba

I have a sheet with 10,000+ rows. The data came from two sources, so duplicates exist for the same combination of key values. Columns A and B together form the unique key, and columns C to K hold data about the item identified by that key. I need to check whether a key combination occurs a second time and, if so, whether the data in columns C to K of the second occurrence is the same as in the first occurrence. If it is, the row should be copied to Sheet 2.
if A1 = A2 and B1 = B2 then check if C1:K1 equals C2:K2 -> copy to Sheet 2
I need to create separate lists of matches and mismatches.

Have you tried using VLOOKUP? If you look up in both directions, you can see the differences between the two lists. Make sure values such as spacing match exactly: apply TRIM to both lists first to remove extra spaces.
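The question's rule ("same A and B, then compare C:K") can also be written as a single pass over the rows. Below is a minimal sketch in Python rather than VBA, only to show the logic. The row layout (key columns first, then data columns) and the function names are hypothetical. Every repeat is compared with the first occurrence of its key. Text is normalized roughly the way TRIM would normalize it, following the advice above.

```python
def _norm(value):
    # Approximates Excel's TRIM: strips the ends and collapses inner whitespace runs.
    return " ".join(value.split()) if isinstance(value, str) else value


def split_duplicates(rows, key_cols=2):
    """Split rows that repeat an earlier key into matches and mismatches.

    rows: list of tuples; the first key_cols items are the key (columns A:B)
    and the rest are the data (columns C:K).
    Every repeat is compared with the FIRST occurrence of its key.
    """
    first_seen = {}
    matches, mismatches = [], []
    for row in rows:
        key = tuple(_norm(v) for v in row[:key_cols])
        data = tuple(_norm(v) for v in row[key_cols:])
        if key not in first_seen:
            first_seen[key] = data      # first occurrence: remember its data
        elif data == first_seen[key]:
            matches.append(row)         # same key, same C:K -> "copy to sheet 2"
        else:
            mismatches.append(row)      # same key, different C:K
    return matches, mismatches
```

The original (un-normalized) rows are returned, so they can be written back to Sheet 2 or to a separate mismatch sheet as they are.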

Related

How to lookup if my lookup data has duplicate values?

I am trying to lookup values from Table 1 to Table 2 based on Col1 in Table 1.
The catch is that Table 1 has duplicate values (for example, A is repeated 3 times) but I don't want to duplicate the returned value from Table 2.
How can this be done in either Excel or SQL (e.g. with a LEFT JOIN)?
Which SQL dialect are you using? Are you familiar with CTEs and PARTITION BY?
Have a look here: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/597b876e-eb00-4013-a613-97c377408668/rownumber-and-cte?forum=transactsql
and here (the answer and the second comment): Select the first instance of a record
You can use those ideas to create another field that tells you whether the row is the 1st, 2nd, 3rd, etc. occurrence of Col1. For example, you'd have something like
1 B Red 150
2 B Red 150
and you can then set Col3 to zero wherever this new field is not 1.
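Here is a minimal sketch of the CTE + ROW_NUMBER() OVER (PARTITION BY ...) idea. It uses Python's built-in sqlite3 module instead of SQL Server, because SQLite 3.25+ supports the same window-function syntax. The table names t1/t2, the id column used to order occurrences, and the sample layout are hypothetical. The sketch also assumes Col1 is unique in Table 2; otherwise the join itself would duplicate rows.

```python
import sqlite3


def first_occurrence_join(table1_rows, table2_rows):
    """table1_rows: (id, col1) pairs; table2_rows: (col1, value) pairs.

    Returns (id, col1, occurrence, value) with the looked-up value kept only
    on the first occurrence of each col1 and replaced by 0 on the repeats.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id INTEGER, col1 TEXT)")
    con.execute("CREATE TABLE t2 (col1 TEXT, value INTEGER)")
    con.executemany("INSERT INTO t1 VALUES (?, ?)", table1_rows)
    con.executemany("INSERT INTO t2 VALUES (?, ?)", table2_rows)
    query = """
        WITH numbered AS (
            SELECT id, col1,
                   ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY id) AS occurrence
            FROM t1
        )
        SELECT n.id, n.col1, n.occurrence,
               CASE WHEN n.occurrence = 1 THEN t2.value ELSE 0 END AS value
        FROM numbered AS n
        LEFT JOIN t2 ON t2.col1 = n.col1
        ORDER BY n.id
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows
```

Keys with no match in Table 2 come back with a NULL (None) value, as a plain LEFT JOIN would return.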
EDIT: since you asked about Excel: in Excel, sort by whatever criteria you need (Col1 first, of course). Let's say that Col1 starts (excluding the heading) in cell C2. Set cell B2 to 1, then enter this formula in cell B3:
=IF(C3=C2,B2+1,1)
and drag it all the way down. This counts the occurrences of Col1, i.e. it tells you whether a row is the 1st, 2nd, etc. time a given value appears in Col1. You can then use it as the basis for changing values in other columns.
Also, it is not good practice to have a column whose first cell has a different formula from the others. You can use a single formula with a nested IF that checks the row number, so that the same formula handles both the first row and all the others.
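To make the running counter concrete, here is a small Python sketch of what B2 = 1 plus =IF(C3=C2,B2+1,1) filled down computes. The function name is hypothetical. The `i > 0` test plays the role of the nested "is this the first row?" IF. As with the formula, this only works if equal values are adjacent, i.e. the data is sorted by Col1 first.

```python
def occurrence_counter(values):
    """Mimic B2 = 1 and B3 = IF(C3=C2, B2+1, 1) filled down a sorted column.

    values: the Col1 values in sheet order (already sorted so equal values
    are adjacent). Returns the running occurrence number for each row.
    """
    counts = []
    for i, value in enumerate(values):
        if i > 0 and value == values[i - 1]:
            counts.append(counts[-1] + 1)   # same as the row above: keep counting
        else:
            counts.append(1)                # new value (or first row): restart at 1
    return counts
```

For example, the sorted column A, A, A, B, C gets the counts 1, 2, 3, 1, 1.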

Find value in column, based on 2 criteria

I have a file with 3 columns. Column A contains 300,000 rows, with about 200 separate IDs, all duplicated at least 1,000 times. Column B contains the date for each of the rows. Column C contains the values that I need to extract.
Each of the 200 IDs in Col A can have multiple dates (e.g. ID 1234 might have dates 1/1/2001, 1/3/2001, 1/2/2015, etc.). Similarly, each date in Col B will have multiple IDs (e.g. 1/2/15 might have IDs 1234, 1874, 1930, 6043, etc.).
In a nutshell, for a given ID in Col A I need to find the latest (maximum) date in Col B and return the corresponding value from Col C.
I've looked at INDEX/MATCH examples, but they don't seem suitable. Are there any suggestions for a macro I could run to accomplish this?
Use this array formula:
=INDEX($C$1:$C$300000,MATCH(1,IF(($A$1:$A$300000="1234")*($B$1:$B$300000=MAX(IF($A$1:$A$300000="1234",$B$1:$B$300000))),1,0),0))
Being an array formula it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
Change the "1234" to a reference cell with the appropriate ID.
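The formula does three things in order: it collects the Col B dates for the ID, takes their MAX, and then MATCHes the first row where both the ID and that date agree. Here is a minimal Python sketch of those steps, assuming the sheet has been read into (id, date, value) tuples. The function name and row layout are hypothetical.

```python
from datetime import date


def value_at_latest_date(rows, target_id):
    """Mirror of INDEX(C, MATCH(1, (A=id)*(B=MAX(IF(A=id, B))), 0)).

    rows: list of (id, date, value) tuples (columns A, B, C).
    Returns the column-C value of the first row whose id equals target_id and
    whose date is that id's latest date, or None where Excel gives #N/A.
    """
    dates = [d for i, d, _ in rows if i == target_id]   # IF($A=id, $B)
    if not dates:
        return None                                      # no such id -> #N/A
    latest = max(dates)                                  # MAX(...)
    for i, d, v in rows:                                 # MATCH(..., 0): first hit wins
        if i == target_id and d == latest:
            return v
    return None


rows = [
    (1234, date(2001, 1, 1), 10),
    (1874, date(2015, 1, 2), 20),
    (1234, date(2015, 1, 2), 30),
    (1234, date(2001, 1, 3), 40),
]
```

With the sample rows above, ID 1234 returns 30, because 1/2/2015 is its latest date.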
You can accomplish this using array formulas. To start, you can retrieve the maximum date in column B when column A is 1234 using the below formula. Keep in mind that you have to use Ctrl-Shift-Enter when you finish typing an array formula.
{=MAX(IF($A$2:$A$24=1234,$B$2:$B$24))}
Note that you will need to change the ranges to include all of your data, rather than my test data on rows 2-24.
Now that you have a formula to retrieve the max date, you can put that inside an index/match and, again using Ctrl-Shift-Enter, use the below array formula to retrieve the value in column C for a row matching 1234 and the maximum date.
{=INDEX($C$2:$C$24,MATCH(1234&MAX(IF($A$2:$A$24=1234,$B$2:$B$24)),$A$2:$A$24&$B$2:$B$24,0))}
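With 300,000 rows and about 200 IDs, an array formula per ID rescans the whole column each time. If you do write the macro the question asks about, one option is a single pass that remembers the latest date and value seen so far for every ID. Below is a sketch of that idea in Python, not VBA. The names are hypothetical. On ties, it keeps the first row with the latest date, which is what MATCH(..., 0) would return.

```python
from datetime import date


def latest_values_by_id(rows):
    """Return {id: value on that id's latest date} from (id, date, value) rows.

    Only a strictly later date replaces the stored entry, so ties on the
    latest date keep the first row seen.
    """
    best = {}
    for i, d, v in rows:
        if i not in best or d > best[i][0]:
            best[i] = (d, v)
    return {i: v for i, (d, v) in best.items()}
```

The resulting dictionary can then be written out as a 200-row summary table instead of evaluating one array formula per ID.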

Check if one of multiple values is present in a column

I have a table in Excel 2013 that has thousands of records of food items (Beef-frozen, beef-chilled, beef-brisket, beef-ribs, chicken-fillet, chicken-whole, fish-skinned, fish-whole, yogurt, lettuce-imported, lettuce-frozen, tomato-fresh, tomato, water, milk, etc.) stored in column A. Note that a value may contain text other than the food item name.
I created column B next to column A. I want column B to hold the category of the food item in column A. For example, if A1 has in it "Beef" or "Chicken" or "Fish" then B1 should equal "Meat". If A1 has in it "Tomato" or "Lettuce" or "Onion" then B1 should equal "Vegetable".
What is the best way to achieve it?
Assuming you have column headers, enter this formula in cell B2:
=REPT("Meat",MAX(IFERROR(MATCH({"*beef*","*chicken*","*fish*"},A2,),))) & REPT("Vegetable",MAX(IFERROR(MATCH({"*tomato*","*lettuce*","*onion*"},A2,),)))
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
Now copy B2 and select B3 down as far as you need and paste.
Note: please look closely at the big gap in the middle of the formula. You'll see that this is really two separate formulas concatenated together with an ampersand. You can easily extend this formula in the same way by adding another phrase similar to the first two for a new category. In fact, you could add many more categories in this fashion.
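Each REPT(...) piece emits its category name once if any of its wildcard patterns matches the cell. MATCH with wildcards is case-insensitive. Because the pieces are concatenated, a cell that matches two categories gets both names, and a cell that matches none gets an empty string. Here is a small Python sketch of that behavior. The CATEGORIES list is a hypothetical stand-in for the constants typed into the formula.

```python
from fnmatch import fnmatchcase

# Hypothetical category -> wildcard patterns, copied from the formula above.
CATEGORIES = [
    ("Meat", ["*beef*", "*chicken*", "*fish*"]),
    ("Vegetable", ["*tomato*", "*lettuce*", "*onion*"]),
]


def categorize(text, categories=CATEGORIES):
    """Mirror of REPT(name, MAX(IFERROR(MATCH({patterns}, cell, ), ))) & ...

    A category name is emitted once if any of its patterns matches
    (case-insensitively, like MATCH); matching names are concatenated.
    """
    lowered = text.lower()
    return "".join(
        name
        for name, patterns in categories
        if any(fnmatchcase(lowered, p.lower()) for p in patterns)
    )
```

For example, "Beef-frozen" gives "Meat", "water" gives "", and a value containing both "fish" and "onion" gives "MeatVegetable".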
Set up a two-column table and name it, for example, FoodTable. Name the first column Word (for the keyword) and the second column Type (for the type of product), e.g. a row with beef under Word and Meat under Type.
Then, with your data in column A, enter the following formula in B1 and fill down:
=LOOKUP(2,1/ISNUMBER(FIND(FoodTable[Word],A1)),FoodTable[Type])
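Inside the formula, ISNUMBER(FIND(...)) is TRUE where a keyword occurs in A1. Then 1/TRUE is 1 and 1/FALSE is #DIV/0!. LOOKUP(2, ...) skips the errors and returns the Type of the last matching row. FIND is case-sensitive, so a keyword "beef" will not match "Beef-frozen"; SEARCH is the case-insensitive alternative. Here is a Python sketch of this lookup. FOOD_TABLE is a hypothetical version of the table, and the case_sensitive flag switches between FIND-like and SEARCH-like matching.

```python
# Hypothetical FoodTable rows: (Word, Type).
FOOD_TABLE = [
    ("beef", "Meat"),
    ("chicken", "Meat"),
    ("fish", "Meat"),
    ("tomato", "Vegetable"),
    ("lettuce", "Vegetable"),
    ("onion", "Vegetable"),
]


def lookup_type(text, table=FOOD_TABLE, case_sensitive=False):
    """Mirror of LOOKUP(2, 1/ISNUMBER(FIND(Word, A1)), Type).

    Returns the Type of the LAST keyword found in text, or None where Excel
    would return #N/A. case_sensitive=True behaves like FIND, False like SEARCH.
    """
    haystack = text if case_sensitive else text.lower()
    result = None
    for word, kind in table:
        needle = word if case_sensitive else word.lower()
        if needle in haystack:
            result = kind  # a later match overwrites an earlier one
    return result
```

The "last match wins" behavior is what distinguishes this from the REPT version above: it always returns a single category.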

How to compare a list of rows to another list of rows in Excel?

I am trying to figure out if there are any differences between a list of data with another. In order for a row of data to "match" with another row, the row must have the same values in their corresponding column. The rows themselves do not have to be in any particular order. In particular, I am dealing with a parts list, where there are part numbers, descriptions, etc. I am trying to figure out if any rows of data are different from rows of data from another list.
I found Compare two sheets using arrays, which may have the answer to my problem, but I am having trouble figuring out how to adapt it to my code because I am inexperienced with Visual Basic.
I was able to get it to work for a single column of data, comparing one column of data from one sheet to another, but cannot get it to compare entire rows of data.
Here is an example of how I want this to work:
          Sheet 1               Sheet 2
          Column 1  Column 2    Column 1  Column 2
Row 1     22a       33          11        11
Row 2     22a       33a         22a       33
Row 3     55        22b         55        23b
The code in the link will tell you what is not in sheet 1 but in sheet 2 and vice versa. In this example, I would like the code to tell me Sheet 1 Row 2 and Sheet 1 Row 3 are not in Sheet 2, and Sheet 2 Row 1 and Sheet 2 Row 3 are not in Sheet 1 (Sheet 1 Row 1 and Sheet 2 Row 2 match).
If that is ok by you, you can do it without VBA using the following formula:
{=IF(IFERROR(MATCH(A1&"|"&B1;Sheet7!$A$1:$A$3&"|"&Sheet7!$B$1:$B$3;0);-1)=-1;"Unique";"")}
Assuming each of your tables starts in A1 (so a table with three entries spans A1:B3), enter this formula in C1, pressing CTRL+SHIFT+ENTER to make it an array formula, and then copy it down. It shows the word "Unique" in column C if the pair in that row is not among the row pairs on the other sheet.
You can then use conditional formatting to highlight unique rows, filter on the tables to include only unique rows, or some other way of doing what you need.
NOTE 1: I have entered my numbers in Sheet6 and Sheet7 instead of 1 and 2. The formula written above goes into Sheet6.
NOTE 2: My locale uses ; instead of , as the function argument separator, so if yours uses , you need to change that.
NOTE 3: You will need to expand the ranges Sheet7!$A$1:$A$3 and Sheet7!$B$1:$B$3 if your data set grows (this happens automatically if new rows are inserted between the existing ones). The best option is probably still to create named ranges for each of the 4 columns, replace the references with those names, and maintain the named ranges instead of the formulas.
NOTE 4: If your data contains the character "|", you need to replace it in the formula with a character that you are sure does not appear in the data.
Alternatively, you could enter this in column C on each sheet (assuming the first entry is in C1):
=A1&"|"&B1
and copy this down, then run the solution from your copied example using that C column instead of on A1 and B1.
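The same comparison is easy to express outside Excel by treating each row as a tuple and testing membership in a set built from the other sheet. Comparing tuples also removes the need for the "|" separator from NOTE 4. This is a minimal Python sketch using the example data from the question. The function name is hypothetical, and the values are kept as text.

```python
def rows_missing_from(sheet_a, sheet_b):
    """Return the 1-based row numbers of sheet_a whose whole row is absent from sheet_b.

    Rows are compared as tuples, so no "|" separator (NOTE 4) is needed.
    """
    present = set(map(tuple, sheet_b))
    return [n for n, row in enumerate(sheet_a, start=1) if tuple(row) not in present]


sheet1 = [("22a", "33"), ("22a", "33a"), ("55", "22b")]
sheet2 = [("11", "11"), ("22a", "33"), ("55", "23b")]
```

Running it in both directions gives rows 2 and 3 of Sheet 1 and rows 1 and 3 of Sheet 2, which is the result the question asks for.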

Highlight cells based on some criteria related column and rows to another table

I have two worksheets: the first is the main sheet (Table 1) and the second is a report (Table 2) generated from the values in two columns of the main table. When the conditional formatting is triggered, the cell in Table 2 is highlighted.
In Table 1, the primary key is a compound key combining id-year columns.
In Table 2, the report checks whether the related column-row exists in Table 1 and if so the cell is to be highlighted.
How can I achieve this using conditional formatting?
Here are the steps I want to turn into a conditional-formatting rule for each cell:
Step 1. For the selected cell, look up its year-id pair from Table 2 in the matching pair of columns in Table 1.
Step 2. If the pair exists, highlight the cell in Table 2 (the color differs between ids); if not, leave it unhighlighted.
For Step 1, I can't find the right formula. If there is another solution, I'll consider it.
Can Step 2 be achieved with VBA, and if so, how?
[updated]
Based on pnuts's suggestion, I can solve the problems above, with some modification to get varying colors. But now some values in Table 1 have a different format, such as "2003-2004". In the second table, both related columns (2003 and 2004) must be highlighted.
How can I check for the "-" sign and then highlight both related columns?
Assuming Table1 and Table2 both start in cell B2, one way is to put =Sheet1!C5&Sheet1!D5 in A5 of your Table2 sheet and copy it down until a cell appears blank, then apply CF to =$C$6:$M$11 with this rule:
=MATCH($B6&C$5,$A:$A,0)>0
This would only apply one colour throughout (which may be less confusing than 5 or more) but I take it you know how to break this down into separate rules for different colours by restricting the range for each to one row at a time.
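The rule above builds a helper column of id&year keys and highlights a report cell when its row id plus column year appears among those keys. Here is a Python sketch of that logic, extended for the update's "2003-2004" values. It assumes such a span should mark every year from start to end inclusive, so "2003-2004" marks 2003 and 2004. It also assumes Table 1 can be read as hypothetical (id, year text) pairs. The function names are illustrative only.

```python
def expand_years(year_text):
    """'2003' -> [2003]; '2003-2004' -> [2003, 2004] (assumed inclusive span)."""
    parts = str(year_text).split("-")
    start, end = int(parts[0]), int(parts[-1])
    return list(range(start, end + 1))


def highlight_grid(table1, ids, years):
    """table1: (id, year_text) pairs; returns {(id, year): highlight?} for the report grid.

    Mirrors the helper-column + MATCH rule: a cell is highlighted when its
    id/year pair exists in Table 1, after splitting year spans on "-".
    """
    keys = {(i, y) for i, text in table1 for y in expand_years(text)}
    return {(i, y): (i, y) in keys for i in ids for y in years}


table1 = [("A", "2003"), ("B", "2003-2004"), ("A", 2005)]
grid = highlight_grid(table1, ["A", "B"], [2003, 2004, 2005])
```

In the worksheet itself, the equivalent step would be splitting such values into one helper row per year before the MATCH-based rule looks them up.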