SQL version of VLOOKUP - sql

I am new to SQL and if you have a spare moment, I was wondering whether anybody could help me replicate the Excel Vlookup function in SQL please?
From some research, I am suspecting that it is one of the join functions that I require, however, I don't want to just select data that is contained in both tables - I just want to lookup the value in 1 table against another.
If the data is contained in the lookup table then return the value and if not, just return NULL.
I have given a couple of example tables below to help illustrate my question.
Please note that Products 'C' and 'D' are not in Table2 but they are still in the result table but with NULL value.
Also I have a large number of unique products, so I am not looking for an answer which includes hard-coding, for example; CASE WHEN [Product] = 'A' THEN...
TABLE1
Product Quantity
-------------------
A 10
B 41
D 2
C 5
B 16
A 19
C 17
A 21
TABLE 2
Product Cost
-----------------
A £31.45
B £97.23
RESULT TABLE
Product Quantity Cost
-----------------------------
A 10 £31.45
B 41 £97.23
D 2 NULL
C 5 NULL
B 16 £97.23
A 19 £31.45
C 17 NULL
A 21 £31.45

It looks as if you need an outer join. I'll use a left one in my example:
select t1.Product, t1.Quantity, t2.Cost
from table1 as t1
left outer join table2 as t2
on t1.Product = t2.Product
You can also leave out the outer keyword:
select t1.Product, t1.Quantity, t2.Cost
from table1 as t1
left join table2 as t2
on t1.Product = t2.Product
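To check the behaviour on the question's sample data, here is a minimal sketch that runs the same left join in an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite, the column types, and the ORDER BY t1.rowid (used only to keep the original row order) are assumptions for illustration, not part of the answer above.

```python
import sqlite3

# In-memory database holding the sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Product TEXT, Quantity INTEGER);
    CREATE TABLE table2 (Product TEXT, Cost REAL);
    INSERT INTO table1 VALUES
        ('A', 10), ('B', 41), ('D', 2), ('C', 5),
        ('B', 16), ('A', 19), ('C', 17), ('A', 21);
    INSERT INTO table2 VALUES ('A', 31.45), ('B', 97.23);
""")

# LEFT JOIN keeps every row of table1; Cost is NULL (None in Python)
# when the product has no entry in table2.
rows = conn.execute("""
    SELECT t1.Product, t1.Quantity, t2.Cost
    FROM table1 AS t1
    LEFT JOIN table2 AS t2
        ON t1.Product = t2.Product
    ORDER BY t1.rowid
""").fetchall()

for row in rows:
    print(row)
```

Products 'C' and 'D' come back with None as their cost, and the row count stays at 8, the size of table1.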

Here's an updated version of Lennart's answer which really works great.
select *
from table1 as t1
left outer join table2 as t2
on t1.Product = t2.Product
and t2.Product <> ''
left outer join table3 as t3
on t1.Product = t3.Product2
and t3.Product2 <> ''
The point is, you need to exclude rows where the join table column is blank, otherwise you will return far more rows than table1 has. A true VLOOKUP does not add any rows to the left table.
I even added a third table for effect.
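The row-multiplication effect can be reproduced with a small sketch. It assumes SQLite via Python's sqlite3 and a hypothetical lookup table that contains two blank keys; the data is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Product TEXT, Quantity INTEGER);
    CREATE TABLE table2 (Product TEXT, Cost REAL);
    INSERT INTO table1 VALUES ('A', 10), ('', 7);
    -- hypothetical lookup table with two blank keys
    INSERT INTO table2 VALUES ('A', 31.45), ('', 1.0), ('', 2.0);
""")

# Without the filter, the blank row in table1 matches both blank rows in table2.
plain = conn.execute("""
    SELECT t1.Product, t1.Quantity, t2.Cost
    FROM table1 AS t1
    LEFT JOIN table2 AS t2
        ON t1.Product = t2.Product
""").fetchall()

# With the filter, blank lookup keys never match, so table1 keeps its row count.
filtered = conn.execute("""
    SELECT t1.Product, t1.Quantity, t2.Cost
    FROM table1 AS t1
    LEFT JOIN table2 AS t2
        ON t1.Product = t2.Product
        AND t2.Product <> ''
""").fetchall()

print(len(plain))     # 3 rows: one more than table1 has
print(len(filtered))  # 2 rows: same as table1
```

Note that the filter only handles blank keys. A non-blank key that appears more than once in the lookup table would still duplicate rows.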

Related

If I left join table2 to table1, how can I call on the IDs in table1 that don't appear in table2 within a CASE or IF Statement?

I am writing an IF/CASE statement that requires me to identify all the IDs from an ID column in Table1 that don't appear in a second table, Table2, which is left joined to Table1 on the ID column. Based on that IF statement, I would like to produce a binary column called Missing with 1s and 0s.
Table1
ID Region
a US
b US
c Mexico
d Japan
Table2
ID Years
a 5
d 10
After joining this is what I have:
ID Region Years
a US 5
b US null
c Mexico null
d Japan 10
The final outcome should be:
ID Region Years Missing
a US 5 0
b US null 1
c Mexico null 1
d Japan 10 0
I don't know how to identify those specific IDs in the IF or CASE statement, but I can write the rest of the query. I tried to write
IF(Table1.ID NOT IN Table2.ID, 1, 0) As Missing
but that did not work (some sort of unnest issue)
You may try:
SELECT t1.*, t2.ID IS NULL AS Missing
FROM Table1 t1
LEFT JOIN Table2 t2
ON t2.ID = t1.ID;
Using the IF() function we can try:
SELECT t1.*, IF(t2.ID IS NULL, 1, 0) AS Missing
FROM Table1 t1
LEFT JOIN Table2 t2
ON t2.ID = t1.ID;
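For engines without an IF() function, the same flag can be written with a standard CASE expression. The sketch below assumes SQLite through Python's sqlite3 and uses the sample data from the question; it is an illustration, not the only way to write it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID TEXT, Region TEXT);
    CREATE TABLE Table2 (ID TEXT, Years INTEGER);
    INSERT INTO Table1 VALUES ('a', 'US'), ('b', 'US'), ('c', 'Mexico'), ('d', 'Japan');
    INSERT INTO Table2 VALUES ('a', 5), ('d', 10);
""")

# After a LEFT JOIN, t2.ID is NULL exactly for the Table1 rows with no match,
# so testing it for NULL yields the Missing flag.
rows = conn.execute("""
    SELECT t1.ID,
           t1.Region,
           t2.Years,
           CASE WHEN t2.ID IS NULL THEN 1 ELSE 0 END AS Missing
    FROM Table1 t1
    LEFT JOIN Table2 t2
        ON t2.ID = t1.ID
    ORDER BY t1.ID
""").fetchall()

for row in rows:
    print(row)
```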

SQL Server - Setting multiple columns from another table

I have two tables.
Table 1
ID Code1 Code2 Code3
10 1.1 1.2 1.3
Table 2
Code Group Category
1.1 a cat1
1.2 b cat1
1.3 c cat2
1.4 d cat3
Now I need to get the outputs in two different forms from these two tables
Output 1
ID Group1 Group2 Group3
10 a b c
Output 2
ID cat1 cat2 cat3
10 1 1 0
Here the cat1, cat2, cat3 columns are Boolean in nature since the table 1 did not have any code corresponding to cat3 so the value for this is 0.
I was thinking of doing this with case statements but there are about 1000 codes mapped to about 50 categories. Is there a way to do this? I am struggling to come up with a query for this.
First off, I strongly suggest you look into an alternative. This will get messy very fast, as you're essentially treating rows as columns. It doesn't help much that Table1 is already denormalized, though if it really only has 3 columns, it's not that big of a deal to normalize it again:
CREATE VIEW v_Table1 AS
SELECT Id, Code1 as Code FROM Table1
UNION SELECT Id, Code2 as Code FROM Table1
UNION SELECT Id, Code3 as Code FROM Table1
If we take your second query, it appears you want all possible combinations of ID and Category, and a boolean of whether that combination appears in Table2 (using Code to get back to ID in Table1).
Since there doesn't appear to be a canonical list of ID and Category, we'll generate it:
CREATE VIEW v_AllCategories AS
SELECT DISTINCT ID, Category FROM v_Table1 CROSS JOIN Table2
Getting the list of represented ID and Category is pretty straightforward:
CREATE VIEW v_ReportedCategories AS
SELECT DISTINCT ID, Category FROM Table2
JOIN v_Table1 ON Table2.Code = v_Table1.Code
Put those together, and we can then get the bool to tell us which exists:
CREATE VIEW v_CategoryReports AS
SELECT
T1.ID, T1.Category, CASE WHEN T2.ID IS NULL THEN 0 ELSE 1 END as Reported
FROM v_AllCategories as T1
LEFT OUTER JOIN v_ReportedCategories as T2 ON
T1.ID = T2.ID
AND T1.Category = T2.Category
That gets you your answer in a normalized form:
ID | Category | Reported
10 | cat1 | 1
10 | cat2 | 1
10 | cat3 | 0
From there, you'd need to do a PIVOT to get your Category values as columns:
SELECT
ID,
cat1,
cat2,
cat3
FROM v_CategoryReports
PIVOT (
MAX([Reported]) FOR Category IN ([cat1], [cat2], [cat3])
) p
Since you mentioned over 50 'Categories', I'll assume they're not really 'cat1' - 'cat50'. In which case, you'll need to code gen the pivot operation.
SqlFiddle with a self-contained example.
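As a rough sketch of what "code gen the pivot operation" could look like, the Python below reads the distinct categories and builds one column per category. It assumes SQLite via sqlite3, which has no PIVOT operator, so conditional aggregation (MAX over a CASE expression) is swapped in for PIVOT. The helper name quote_ident is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, Code1 TEXT, Code2 TEXT, Code3 TEXT);
    CREATE TABLE Table2 (Code TEXT, "Group" TEXT, Category TEXT);
    INSERT INTO Table1 VALUES (10, '1.1', '1.2', '1.3');
    INSERT INTO Table2 VALUES
        ('1.1', 'a', 'cat1'), ('1.2', 'b', 'cat1'),
        ('1.3', 'c', 'cat2'), ('1.4', 'd', 'cat3');
""")

# The list of output columns comes from the data, not from hand-written SQL.
categories = [row[0] for row in conn.execute(
    "SELECT DISTINCT Category FROM Table2 ORDER BY Category")]


def quote_ident(name):
    """Quote a value for use as a column alias (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'


# One generated column per category; the category value itself is bound
# as a parameter, only the alias is spliced into the SQL text.
columns = ",\n       ".join(
    f"MAX(CASE WHEN t2.Category = ? THEN 1 ELSE 0 END) AS {quote_ident(c)}"
    for c in categories
)

sql = f"""
SELECT n.ID,
       {columns}
FROM (
    -- normalize Code1..Code3 into one Code column
    SELECT ID, Code1 AS Code FROM Table1
    UNION SELECT ID, Code2 FROM Table1
    UNION SELECT ID, Code3 FROM Table1
) AS n
LEFT JOIN Table2 AS t2 ON t2.Code = n.Code
GROUP BY n.ID
"""

rows = conn.execute(sql, categories).fetchall()
print(categories)  # ['cat1', 'cat2', 'cat3']
print(rows)        # [(10, 1, 1, 0)]
```

On SQL Server the same idea would generate the IN ([cat1], [cat2], ...) list of the PIVOT clause instead and run the resulting string as dynamic SQL.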
These answers assume that all 3 codes are available in table 2. If not, then you should use OUTER joins instead of INNER.
Output 1 can be achieved like this:
select t1.ID,
cd1.[Group] as Group1, -- Group is a reserved word, so it needs brackets
cd2.[Group] as Group2,
cd3.[Group] as Group3
from table1 t1
inner join table2 cd1
on t1.Code1 = cd1.Code
inner join table2 cd2
on t1.Code2 = cd2.Code
inner join table2 cd3
on t1.Code3 = cd3.Code
Output 2 is trickier. Since you want a column for every row in Table2, you could write SQL that writes SQL.
Basically start with this base statement:
select t1.ID,
-- THE BELOW WILL BE GENERATED ONCE PER ROW
Case when cd1.Category = '' OR
cd2.Category = '' OR
cd3.Category = '' then convert(bit,1) else 0 end as '',
-- END GENERATED CODE
from table1 t1
inner join table2 cd1
on t1.Code1 = cd1.Code
inner join table2 cd2
on t1.Code2 = cd2.Code
inner join table2 cd3
on t1.Code3 = cd3.Code
then you can generate the code in the middle like this:
select distinct 'Case when cd1.Category = '''+t2.Category+''' OR
cd2.Category = '''+t2.Category+''' OR
cd3.Category = '''+t2.Category+''' then convert(bit,1) else 0 end as ['+t2.Category+'],'
from table2 t2
Paste those results into the original SQL statement (strip off the trailing comma) and you should be good to go.
We can use the PIVOT feature and build the query dynamically, somewhat like below:
Query 1
Select * from
(SELECT Id, Code, GroupCode
FROM Table2 join Table1
ON Table1.Code1 = Table2.Code
OR Table1.Code2 = Table2.Code
OR Table1.Code3 = Table2.Code
) ps
PIVOT
(
Max (GroupCode)
FOR Code IN
( [1.1], [1.2], [1.3])
) AS Result
Query 2
Select * from
(SELECT Id, GroupCode, Category
FROM Table2 join Table1
ON Table1.Code1 = Table2.Code
OR Table1.Code2 = Table2.Code
OR Table1.Code3 = Table2.Code
) ps
PIVOT
(
Count (GroupCode)
FOR Category IN
( [cat1], [cat2], [cat3])
) AS Result
Unfortunately you're stuck with a bad design for Table1. A better approach would have been to have 3 rows for ID 10.
But, given your current design, your query will look something like this:
SELECT ID, G1.[Group] AS Group1, G2.[Group] AS Group2, G3.[Group] AS Group3
FROM Table1 T1
INNER JOIN Table2 G1 ON T1.Code1 = G1.Code
INNER JOIN Table2 G2 ON T1.Code2 = G2.Code
INNER JOIN Table2 G3 ON T1.Code3 = G3.Code
and
SELECT ID, G1.Category Cat1, G2.Category Cat2, G3.Category Cat3
FROM Table1 T1
INNER JOIN Table2 G1 ON T1.Code1 = G1.Code
INNER JOIN Table2 G2 ON T1.Code2 = G2.Code
INNER JOIN Table2 G3 ON T1.Code3 = G3.Code
The PIVOT and CROSS APPLY keywords in MSSQL would help you out, though it's not exactly clear what you are trying to accomplish. Use CROSS APPLY to join to a correlated subquery and display different output for each join, and PIVOT to do a crosstab on your data.
For table 1 it might be easier if you mash it together into a more normalized style.
WITH cteTab1 (Id, Code) AS
(
SELECT Id, Code1 FROM Table1
UNION ALL
SELECT Id, Code2 FROM Table1
UNION ALL
SELECT Id, Code3 FROM Table1)
SELECT *
FROM Table2 INNER JOIN cteTab1 ON Table2.Code = cteTab1.Code

SQL joining 4 tables issue

I have four tables:
T1
ID ID1 TITLE
1 100 TITLE1
2 100 TITLE2
3 100 TITLE3
T2
ID TEXT
1 LONG1
2 LONG2
T3
ID1 ID2
100 200
T4
ID4 ID2 SUBJECT
1 200 A
2 200 B
3 200 C
4 200 D
5 200 E
I want output in this result format:
TITLE TEXT SUBJECT
TITLE1 LONG1 A
TITLE2 LONG2 B
TITLE3 null C
null null D
null null E
So I made this query, but it gives me many more results than it should. For example, titles are displayed more than once, etc.
SELECT
t1.title,
t2.text,
t4.subject
FROM t1
LEFT OUTER JOIN t2 ON t1.id=t2.id
INNER JOIN t3 ON t1.id1=t3.id1
LEFT OUTER JOIN t4 ON t4.id2=t3.id2
WHERE
t1.id1=100
Thanks for help
Disclaimer: I don't work with DB2. After some browsing through documentation I have found that DB2 supports row_number() and full outer join, but I might easily be wrong.
To get rid of the n:m relationship, one has to build an additional key. In this case a simple solution is to add a row number to each record in t1 and t4 and use it as a join condition. row_number() does just that: it numbers the rows within each group defined by partition by, in the order defined by order by.
As there is difference in number of records in t1 and t4, and it is unknown which one always has more records, I use full outer join to join them.
You can see the test (SQL Server version) at SQL Fiddle.
select t1_rn.title,
t2.[text],
t4_rn.subject
from
(
select t1.id,
t1.title,
t1.id1,
t3.id2,
row_number() over(partition by t1.id1
order by id) rn
from t1
inner join t3
on t1.id1 = t3.id1
) t1_rn
full outer join
(
select t4.subject,
t3.id1,
t4.id2,
row_number() over(partition by t4.id2
order by id4) rn
from t4
inner join t3
on t4.id2 = t3.id2
) t4_rn
on t1_rn.id1 = t4_rn.id1
and t1_rn.id2 = t4_rn.id2
and t1_rn.rn = t4_rn.rn
left join t2
on t1_rn.id = t2.id
This kind of work should definitely be done on presentation side of an application, but I believe that software you are using requires already prepared data.
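If the pairing is done on the presentation side instead, it could look roughly like the Python sketch below. It assumes the application has already fetched two ordered result sets (titles with their text, and subjects); the function name pair_rows and the hard-coded lists standing in for query results are hypothetical.

```python
from itertools import zip_longest


def pair_rows(titles, subjects):
    """Pair the n-th title row with the n-th subject, padding with None.

    titles   -- list of (title, text) tuples, e.g. from t1 LEFT JOIN t2 ordered by t1.id
    subjects -- list of subject strings, e.g. from t4 ordered by id4
    """
    paired = []
    for title_row, subject in zip_longest(titles, subjects):
        # zip_longest yields None for the shorter list once it runs out.
        title, text = title_row if title_row is not None else (None, None)
        paired.append((title, text, subject))
    return paired


# Stand-ins for the two query results from the question's sample data.
titles = [("TITLE1", "LONG1"), ("TITLE2", "LONG2"), ("TITLE3", None)]
subjects = ["A", "B", "C", "D", "E"]

for row in pair_rows(titles, subjects):
    print(row)
```

This plays the same role as the row_number() join condition: rows are matched purely by their position within each ordered list, whichever list is longer.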
Try this:
select t1.title,t2.text,t4.subject
from t4
left join t3
on t4.id2=t3.id2
left join t1
on t1.id1=t3.id1
left join t2
on t1.id=t2.id
where t1.id1=100
You should change your tables. Your last join is what does that to your output; just analyze your query: for every record from T1 you get every record from T4.
Joins replicate rows whenever a row matches more than one row on the other side, and an outer join additionally keeps the rows that have no match; neither gives you a one-to-one pairing by itself. You may want to look at this:
http://blog.sqlauthority.com/2009/04/13/sql-server-introduction-to-joins-basic-of-joins/
To understand what the join types are, and how you can use them.
You are looking for a list of subjects, with associated text and title, but this may not be unique; more than one null exist for each of the titles. You want to drive the join from table 4, and get a list of subjects, with associated titles for each.
Looking at your output, it appears you want all subjects displayed. Knowing this, you should first build everything off this table.
SELECT columns
FROM T4
Next build up your inner joins.
SELECT columns
FROM T4 subjectTable
INNER JOIN T3 mapTable
ON mapTable.ID2 = subjectTable.ID2
When happy with them, add on your optional columns with the outer join.
SELECT columns
FROM T4 subjectTable
INNER JOIN T3 mapTable
ON mapTable.ID2 = subjectTable.ID2
LEFT OUTER JOIN T2 textTable
ON textTable.ID = subjectTable.ID4
LEFT OUTER JOIN T1 titleTable
ON titleTable.ID1 = mapTable.ID1
WHERE
mapTable.ID1 = 100;

SQL join format - nested inner joins

I have the following SQL statement in a legacy system I'm refactoring. It is an abbreviated view for the purposes of this question, just returning count(*) for the time being.
SELECT COUNT(*)
FROM Table1
INNER JOIN Table2
INNER JOIN Table3 ON Table2.Key = Table3.Key AND Table2.Key2 = Table3.Key2
ON Table1.DifferentKey = Table3.DifferentKey
It is generating a very large number of records and killing the system, but could someone please explain the syntax? And can this be expressed in any other way?
Table1 contains 419 rows
Table2 contains 3374 rows
Table3 contains 28182 rows
EDIT:
Suggested reformat
SELECT COUNT(*)
FROM Table1
INNER JOIN Table3
ON Table1.DifferentKey = Table3.DifferentKey
INNER JOIN Table2
ON Table2.Key = Table3.Key AND Table2.Key2 = Table3.Key2
For readability, I restructured the query... starting with the apparent top-most level being Table1, which then ties to Table3, and then table3 ties to table2. Much easier to follow if you follow the chain of relationships.
Now, to answer your question. You are getting a large count as the result of a Cartesian product. For each record in Table1 that matches in Table3 you will have X * Y. Then, for each match between table3 and Table2 will have the same impact... Y * Z... So your result for just one possible ID in table 1 can have X * Y * Z records.
This is based on not knowing how the normalization or content is for your tables... if the key is a PRIMARY key or not..
Ex:
Table 1
DiffKey Other Val
1 X
1 Y
1 Z
Table 3
DiffKey Key Key2 Tbl3 Other
1 2 6 V
1 2 6 X
1 2 6 Y
1 2 6 Z
Table 2
Key Key2 Other Val
2 6 a
2 6 b
2 6 c
2 6 d
2 6 e
So, Table 1 joining to Table 3 will result (in this scenario) with 12 records (each in 1 joined with each in 3). Then, all that again times each matched record in table 2 (5 records)... total of 60 ( 3 tbl1 * 4 tbl3 * 5 tbl2 )count would be returned.
So, now, take that and expand based on your 1000's of records and you see how a messed-up structure could choke a cow (so-to-speak) and kill performance.
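The 3 * 4 * 5 arithmetic can be confirmed with a quick sketch. It assumes SQLite via Python's sqlite3 and loads the example rows above; the column names OtherVal and Tbl3Other are stand-ins for the "Other Val" and "Tbl3 Other" headers, and "Key" is quoted because it is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (DifferentKey INTEGER, OtherVal TEXT);
    CREATE TABLE Table3 (DifferentKey INTEGER, "Key" INTEGER, Key2 INTEGER, Tbl3Other TEXT);
    CREATE TABLE Table2 ("Key" INTEGER, Key2 INTEGER, OtherVal TEXT);
""")

# 3 rows in Table1, 4 in Table3, 5 in Table2, all sharing the same key values.
conn.executemany("INSERT INTO Table1 VALUES (1, ?)", [(v,) for v in "XYZ"])
conn.executemany("INSERT INTO Table3 VALUES (1, 2, 6, ?)", [(v,) for v in "VXYZ"])
conn.executemany("INSERT INTO Table2 VALUES (2, 6, ?)", [(v,) for v in "abcde"])

(count,) = conn.execute("""
    SELECT COUNT(*)
    FROM Table1
    INNER JOIN Table3
        ON Table1.DifferentKey = Table3.DifferentKey
    INNER JOIN Table2
        ON Table2."Key" = Table3."Key" AND Table2.Key2 = Table3.Key2
""").fetchone()

print(count)  # 60 = 3 * 4 * 5
```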
SELECT
COUNT(*)
FROM
Table1
INNER JOIN Table3
ON Table1.DifferentKey = Table3.DifferentKey
INNER JOIN Table2
ON Table3.Key =Table2.Key
AND Table3.Key2 = Table2.Key2
Since you've already received help on the query, I'll take a poke at your syntax question:
The first query employs some lesser-known ANSI SQL syntax which allows you to nest joins between the join and on clauses. This allows you to scope/tier your joins and probably opens up a host of other evil, arcane things.
Now, while a nested join cannot refer any higher in the join hierarchy than its immediate parent, joins above it or outside of its branch can refer to it... which is precisely what this ugly little guy is doing:
select
count(*)
from Table1 as t1
join Table2 as t2
join Table3 as t3
on t2.Key = t3.Key -- join #1
and t2.Key2 = t3.Key2
on t1.DifferentKey = t3.DifferentKey -- join #2
This looks a little confusing because join #2 is joining t1 to t2 without specifically referencing t2... however, it references t2 indirectly via t3, as t3 is joined to t2 in join #1. While that may work, you may find the following a bit more (visually) linear and appealing:
select
count(*)
from Table1 as t1
join Table3 as t3
join Table2 as t2
on t2.Key = t3.Key -- join #1
and t2.Key2 = t3.Key2
on t1.DifferentKey = t3.DifferentKey -- join #2
Personally, I've found that nesting in this fashion keeps my statements tidy by outlining each tier of the relationship hierarchy. As a side note, you don't need to specify inner. join is implicitly inner unless explicitly marked otherwise.

How do I Write a SQL Query With a Condition Involving a Second Table?

Table1
...
LogEntryID *PrimaryKey*
Value
ThresholdID - - - Link to the appropriate threshold being applied to this log entry.
...
Table2
...
ThresholdID *PrimaryKey*
Threshold
...
All fields are integers.
The "..." thingies are there to show that these tables hold a lot more information than just this. They are set up this way for a reason, and I can't change it at this point.
I need to write a SQL statement to select every record from Table1 where the Value field in that particular log record is less than the Threshold field in the linked record of Table2.
I'm newish to SQL, so I know this is a basic question.
If anyone can show me how this SQL statement would be structured, it would be greatly appreciated.
SELECT T1.*
FROM Table1 T1
JOIN Table2 T2 ON T2.ThresholdID = T1.ThresholdID
WHERE T2.Threshold > T1.Value
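Here is a small runnable sketch of that query, assuming SQLite via Python's sqlite3. The log entries and thresholds are made-up sample values, since the question does not give any data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (ThresholdID INTEGER PRIMARY KEY, Threshold INTEGER);
    CREATE TABLE Table1 (LogEntryID INTEGER PRIMARY KEY, Value INTEGER, ThresholdID INTEGER);
    -- made-up sample data
    INSERT INTO Table2 VALUES (1, 10), (2, 25);
    INSERT INTO Table1 VALUES (1, 5, 1), (2, 15, 1), (3, 20, 2);
""")

# Each log entry is compared against its own linked threshold.
rows = conn.execute("""
    SELECT T1.*
    FROM Table1 T1
    JOIN Table2 T2 ON T2.ThresholdID = T1.ThresholdID
    WHERE T2.Threshold > T1.Value
    ORDER BY T1.LogEntryID
""").fetchall()

print(rows)  # [(1, 5, 1), (3, 20, 2)]
```

Entry 2 is dropped because its value 15 is not below its threshold of 10, while entry 3 passes against a different threshold of 25.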
SELECT t1.*
FROM dbo.Table1 t1 INNER JOIN dbo.Table2 t2 ON t1.ThresholdID = t2.ThresholdID
WHERE t2.Threshold > t1.Value
SELECT * from table1 t1 join table2 t2 on (t1.thresholdId = t2.thresholdId)
where t1.value < t2.threshold;
SELECT t1.LogEntryID, t1.Value, t1.ThresholdID
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.ThresholdID = t2.ThresholdID
WHERE t1.Value < t2.threshold
SELECT * FROM Table1
JOIN Table2
ON table1.ThresholdID = table2.ThresholdID --(assuming table 2 holds the same value to link them together)
WHERE
Table1.Value < Table2.Threshold
A 'JOIN' connects 2 tables based on the 'ON' clause (which can be multipart, using 'AND' and 'OR')
If you have 3 entries in table 2 which share table1's primary key (a one-to-many association) you will receive 3 rows in your result set.
for the tables below, for example:
Table 1:
Key Value
1 Hi
2 Bye
Table 2:
Table1Key 2nd_word
1 You
1 fellow
1 friend
2 now
this query:
SELECT * FROM Table1
JOIN Table2
on table1.key = table2.table1key
gets this result set:
Key Value Table1Key 2nd_word
1 Hi 1 You
1 Hi 1 fellow
1 Hi 1 friend
2 Bye 2 now
Note that JOIN will only return results when there is a match in the 2nd table, it will not return a result if there is no match. You can LEFT JOIN for that (all fields from the second table will be NULL).
JOINs can also be strung together, the result from the previous JOIN is used in place of the original table.
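The difference between JOIN and LEFT JOIN described above can be seen in a short sketch, assuming SQLite via Python's sqlite3. The row with key 3 ('Later') is a hypothetical addition so that one Table1 row has no match; "Key" and "2nd_word" are quoted because they are not plain identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 ("Key" INTEGER PRIMARY KEY, Value TEXT);
    CREATE TABLE Table2 (Table1Key INTEGER, "2nd_word" TEXT);
    -- key 3 is an added, hypothetical row with no match in Table2
    INSERT INTO Table1 VALUES (1, 'Hi'), (2, 'Bye'), (3, 'Later');
    INSERT INTO Table2 VALUES (1, 'You'), (1, 'fellow'), (1, 'friend'), (2, 'now');
""")

# Plain JOIN: only rows with a match in Table2 are returned.
inner_rows = conn.execute("""
    SELECT * FROM Table1
    JOIN Table2 ON Table1."Key" = Table2.Table1Key
    ORDER BY Table1."Key", Table2.rowid
""").fetchall()

# LEFT JOIN: the unmatched Table1 row is kept, with NULLs for Table2's columns.
left_rows = conn.execute("""
    SELECT * FROM Table1
    LEFT JOIN Table2 ON Table1."Key" = Table2.Table1Key
    ORDER BY Table1."Key", Table2.rowid
""").fetchall()

print(len(inner_rows))  # 4
print(left_rows[-1])    # (3, 'Later', None, None)
```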