How to lookup if my lookup data has duplicate values? - sql

I am trying to lookup values from Table 1 to Table 2 based on Col1 in Table 1.
The catch is that Table 1 has duplicate values (for example, A is repeated 3 times) but I don't want to duplicate the returned value from Table 2.
How can this be done through either excel or sql (e.g. LEFT JOIN)?

What SQL are you using? Are you familiar with CTE and partition?
Have a look here: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/597b876e-eb00-4013-a613-97c377408668/rownumber-and-cte?forum=transactsql
and here: (answer and 2nd comment): Select the first instance of a record
You can use those ideas to create another field that tells you whether the row is the first, 2nd , 3rd etc occurrence of Col1. Eg you'd have something like
1 B Red 150
2 B Red 150
and you can then update col3 to be zero where this new field is not 1.
EDIT: since you asked about Excel: in Excel, sort by whatever criteria you may need (col 1 first, of course). Let's say that Col1 starts (excluding the heading) in cell C2. Set cell B2 =1. Then write this formula in cell B3:
=IF(C3=C2,B2+1,1)
and drag it all the way down. This will count the occurrences of col 1, ie it will tell you which is the first, 2nd etc time a given value appears in col1. You can then use it as as the basis to change the value in other columns.
Also, it is not good practice to have a column where the first cell has a different formula from the others. You can use the same formula nesting another IF and referencing the row, so as to set one formula for the first row and one for the others.

Related

Extracting "hidden" data from expanding/collapsing pivot table - Excel

I'm not sure if this is possible but as you can see I have a pivot table with multiple dependent and expandable fields. I am trying to concatenate the data from columns A:D into one cell which works fine in row 2 but doesn't work with blank parent cells, as you can see in column F.
Any ideas for how to achieve this?
Pivot table
This answer assumes that you don't want to just Repeat All Item Labels in the PivotTable from the "Report Layout" drop-down on the Pivt Table Tools "Design" tab.
A formula to get the first non-blank value on or above the same row as the current cell from Column B can be constructed with a combination of AGGREGATE, SUMPRODUCT and OFFSET, like so:
=OFFSET($B2,SUMPRODUCT(AGGREGATE(14,6,ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0),1))-ROW(),0)
How does it work?
Starting with the outermost part, OFFSET($B2, VALUE, 0) - this will start in cell B2, then look up or down by VALUE rows to get the value.
Next we need to know how many rows we will need to look up-or-down. Now, if we can work out the bottom-most row with data, we can subtract the current ROW() from that, giving us OFFSET($B2, NON_BLANK-ROW(),0)
So, to finish up we need to work out which rows are not blank, AND which rows are on-or-above our current row, then take the largest of those. This is going to take an ArrayFormula, but we can use SUMPRODUCT to make that calculate properly. To find the largest number we could use MAX or LARGE - but we get less errors if we pick AGGREGATE(14,6,..,1). (The 14 means "we want the kth largest number", the 6 means "ignore error values", and the 1 is k - so "we want the largest number, ignoring errors")
But, what list of numbers are we going to look at, I don't hear you ask. Well, we want the ROW for output from our range (I'm using $B$1:$B$100, because using the whole column B would take far to long to calculate repeatedly), a comparison against the current ROW(), and check that the LENgth is > 0. Those last two are comparisons, so let's write them out first:
ROW($B$1:$B100)<=ROW()
and
LEN($B$1:$B$100)>0
We want to use -- to convert TRUE and FALSE to 1 and 0 - this means that any "bad" values become 0, and any "good" values are larger than 0:
ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0)
This gives us the Row number when the Row is on-or-before the current row AND Column B is not blank - if either of those are False, then we get 0 instead. Stick that in the AGGREGATE to find the largest number:
AGGREGATE(14, 6, ROW($B$1:$B$100)*--(ROW($B$1:$B$100)<=ROW())*--(LEN($B$1:$B$100)>0), 1)
Then put it in a SUMPRODUCT to force Excel to treat it as an ArrayFormula, and that's your NON_BLANK. This then gives you that first formula right at the top of the post

Creating new rows by combining existing rows excel

I am fairly new here, so if this go against the rules please tell me.
I have an issue that seems pretty simple but I wanted to check to make sure. I have been trying to see if I could create a new row by combining every variable from one column with another, like so:
Column 1 Column 2 Combined
A 1 A1
B 2 A2
3 A3
B1
B2
B3
But instead of typing the combinations manually, I wanted the combined column make this combination without user input and to update automatically whenever column 1 or 2 has a row added or removed. I have been trying to figure out if there is some way to use the concatenate function in excel or the & sign but neither methods seems to work. I was thinking trying to code this in visual basics.
The main question: is this possible to do in excel? If so which function(s) could I use?
This assumes your data has one header row (row 1), Column 1 is column 'A' and Column 2 is Column 'B'. Place the formula below in an empty cell and copy down as far as your data permits.
=INDEX(A:A,INT((ROW(A2)+1)/(COUNTA(B:B)-1))+1)&INDEX(B:B,MOD(ROW(A2)-2,3)+1+1)
now if you want to add a little flag to let you know you have more row than you need for your data you could add the following:
=IF(ROW(A2)-1>(COUNTA(A:A)-1)*(COUNTA(B:B)-1),"Data Exceeded",INDEX(A:A,INT((ROW(A2)+1)/(COUNTA(B:B)-1))+1)&INDEX(B:B,MOD(ROW(A2)-2,3)+1+1))
According to: https://www.extendoffice.com/documents/excel/3097-excel-list-all-possible-combinations.html
You can use this formula:
=IF(ROW()-ROW(**$D$1**)+1>COUNTA(**$A$1:$A$4**)*COUNTA(**$B$1:$B$3**),"",INDEX(**$A$1:$A$4**,INT((ROW()-ROW(**$D$1**))/COUNTA(**$B$1:$B$3**)+1))&INDEX(**$B$1:$B$3**,MOD(ROW()-ROW($D$1),COUNTA(**$B$1:$B$3**))+1))
In the above formula, $A$1:$A$4, are the first column values, and
$B$1:$B$3 are the second list values which you want to list all their
possible combinations, the $D$1 is the cell that you put the formula,
you can change the cell references to your need.
In your case, you should use:
=IF(ROW()-ROW($C$2)+1>COUNTA($A$2:$A$3)*COUNTA($B$2:$B$4),"",INDEX($A$2:$A$3,INT((ROW()-ROW($C$2))/COUNTA($B$2:$B$4)+1))&INDEX($B$2:$B$4,MOD(ROW()-ROW($C$2),COUNTA($B$2:$B$4))+1))

Find value in column, based on 2 criteria

I have a file with 3 columns. Column A contains 300,000 rows, with about 200 separate IDs, all duplicated at least 1,000 times. Column B contains the date for each of the rows. Column C contains the values that I need to extract.
Each of the 200 IDs in Col A can have multiple values (e.g. ID 1234 might have dates 1/1/2001, 1/3/2001, 1/2/2015, etc). Similarly, each date on Col B will have multiple IDs (e.g. 1/2/15 might have IDs of 1234, 1874, 1930, 6043, etc).
In a nutshell, I need to check the values in Col A and Col B to find the relevant ID in Col A and the maximum value in Col B, and return the value in the relevant cell in Col C.
I've looked at Index/Match examples, but they don't seem to be suitable. Is there any suggestions on a macro I could run, that would accomplish what is needed.
Use this array formula:
=INDEX($C$1:$C$300000,MATCH(1,IF(($A$1:$A$300000="1234")*($B$1:$B$300000=MAX(IF($A$1:$A$300000="1234",$B$1:$B$300000))),1,0),0))
Being an array formula it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
Change the "1234" to a reference cell with the appropriate ID.
You can accomplish this using array formulas. To start, you can retrieve the maximum date in column B when column A is 1234 using the below formula. Keep in mind that you have to use Ctrl-Shift-Enter when you finish typing an array formula.
{=MAX(IF($A$2:$A$24=1234,$B$2:$B$24))}
Note that you will need to change the ranges to include all of your data, rather than my test data on rows 2-24.
Now that you have a formula to retrieve the max date, you can put that inside an index/match and, again using Ctrl-Shift-Enter, use the below array formula to retrieve the value in column C for a row matching 1234 and the maximum date.
{=INDEX($C$2:$C$24,MATCH(1234&MAX(IF($A$2:$A$24=1234,$B$2:$B$24)),$A$2:$A$24&$B$2:$B$24,0))}

How to compare a list of rows to another list of rows in Excel?

I am trying to figure out if there are any differences between a list of data with another. In order for a row of data to "match" with another row, the row must have the same values in their corresponding column. The rows themselves do not have to be in any particular order. In particular, I am dealing with a parts list, where there are part numbers, descriptions, etc. I am trying to figure out if any rows of data are different from rows of data from another list.
I found Compare two sheets using arrays, which may have the answer to my problem, but I am having trouble figuring out how to adapt to my code due to inexperience in Visual Basic.
I was able to get it to work for a single column of data, comparing one column of data from one sheet to another, but cannot get it to compare entire rows of data.
Here is an example of I want this to work:
Sheet 1 Sheet 2
Column 1 Column 2 Column 1 Column 2
Row 1 22a 33 11 11
Row 2 22a 33a 22a 33
Row 3 55 22b 55 23b
The code in the link will tell you what is not in sheet 1 but in sheet 2 and vice versa. In this example, I would like the code to tell me Sheet 1 Row 2 and Sheet 1 Row 3 are not in Sheet 2, and Sheet 2 Row 1 and Sheet 2 Row 3 are not in Sheet 1 (Sheet 1 Row 1 and Sheet 2 Row 2 match).
If that is ok by you, you can do it without VBA using the following formula:
={IF(IFERROR(MATCH(A1&"|"&B1;Sheet7!$A$1:$A$3&"|"&Sheet7!$B$1:$B$3;0);-1)=-1;"Unique";"")}
Assuming that each of your tables start in A1 (so that the tables with three entries span A1:B3), and entering this formula into C1 (and copying it down), press CTRL+SHIFT+ENTER when entering the formula to create an array formula, this will show the word "Unique" in column C if the pair in that row on that sheet is not in any of the row-pairs on sheet 2.
You can then use conditional formatting to highlight unique rows, filter on the tables to include only unique rows, or some other way of doing what you need.
NOTE 1: I have entered my numbers in Sheet6 and Sheet7 instead of 1 and 2. The formula written above goes into Sheet6.
NOTE 2: My language use ; instead of , as function separator, so if yours use , you need to change that.
NOTE 3: You will need to expand the ranges Sheet7!$A$1:$A$3 and Sheet7!$B$1:$B$3 if your set grows (this will happen automatically if new rows are inserted in between the old ones). The best is still probably to create named ranges for each of the 4 columns, exchange the references with those, and manage the named ranges instead of the formulas.
NOTE 4: If your data set contains the character "|", you need to change that as well, to match some character that you for sure do not have there.
Alternatively you could in column C on each cheet enter (assuming first entry in C1)
=A1&"|"&B1"
and copy this down, then run the solution from your copied example using that C column instead of on A1 and B1.

Highlight cells based on some criteria related column and rows to another table

I have two worksheets: the first one is main (Table 1) and the second is the report (Table 2) generated from values in two columns in the main table. When the conditional formatting is triggered the cell in Table 2 is highlighted:
In Table 1, the primary key is a compound key combining id-year columns.
In Table 2, the report checks whether the related column-row exists in Table 1 and if so the cell is to be highlighted.
How I can achieve it using conditional formatting?
I've some steps that will converted to conditional formatting in cell below:
In selected cell, the paired value year-id from table 2 will be
looked up in Table 1 in the relevant column pairing.
If the related paired-value exists, the cell in Table 2 is highlighted (the color differs between ids) and if not it won't be highlighted.
For Step 1. I can't find the right formula. If there is another solution I'll considered it.
Can Step 2. be achieved with VBA and if so, how?
[updated]
Based on pnuts's suggestion, the problems above I can achieve with some modification to get vary color but recently I get different format value that appear in table 1 that look like "2003-2004". In second table, the related column (2003 & 2004) must be highlighted.
How I can check "-" sign then highlight two related columns?
Assuming Table1 and Table2 are both in Cell B2, one way is to put =Sheet1!C5&Sheet1!D5 in your Table2 sheet in A5 and copy down until a cell appears blank, then apply CF to =$C$6:$M$11 with this rule:
=MATCH($B6&C$5,$A:$A,0)>0
This would only apply one colour throughout (which may be less confusing than 5 or more) but I take it you know how to break this down into separate rules for different colours by restricting the range for each to one row at a time.