Outer Joining SQL Tables? - sql

I have three table in the Database -
Activity table with activity_id, activity_type
Category table with category_id, category_name
Link table with mapping between activity_id and category_id
I need to write a select statement to get the following data:
activity_id, activity_type, Category_name.
The issue is some of the activity_id have no entry in the link table.
If I write:
select a.activity_id, a.activity_type, c.category_name
from activity a, category c, link l
where a.activity_id = l.activity_id and c.category_id = l.category_id
then I do not get the data for the activity_ids that are not present in the link table.
I need to get data for all the activities with empty or null value as category_name for those not having any linking for category_id.
Please help me with it.
PS. I am using MS SQL Server DB

I believe you're looking for a LEFT OUTER JOIN for your activity table to return all rows.
SELECT
a.activity_id, a.activity_type, c.category_name
FROM activity a
LEFT OUTER JOIN link l
ON a.activity_id = l.activity_id
LEFT OUTER JOIN category c
ON c.category_id = l.category_id;

You should use proper explicit joins:
select a.activity_id, a.activity_type, c.category_name
from activity a
LEFT JOIN link l
ON a.activity_id = l.activity_id
LEFT JOIN category c
ON l.category_id = c.category_id

If writing this type of logic will be part of your ongoing responsibilities, I would strongly suggest that you do some research on joins, including the interactions between joins and where clauses. Joins and where clauses combine to form the backbone of query writing, regardless of the technology used to retrieve the data.
Most critical join information to understand:
Left Outer Join: retrieves all information from the 'left' table and any records that exist in the joined table
Inner Join: retrieves only records that exist in both tables
Where clauses: used to limit data, regardless of inner or outer join definitions.
In the example you posted, the where clause is limiting your overall data to rows that exist in all 3 tables. Replacing the where clause with appropriate join logic will do the trick:
select a.activity_id, a.activity_type, c.category_name
from activity a
left outer join link l --return all activity rows regardless of whether the link exists
on a.activity_id = l.activity_id
left outer join category c --return all activity rows regardless of whether the link exists
on c.category_id = l.category_id
Best of luck!

What about?
select a.activity_id, a.activity_type, c.category_name from category c
left join link l on c.category_id = l.category_id
left join activity a on l.activity_id = a.activity_id
Actually, the first join seems that it could be an inner join, because you didn't mention that there might be some missing elements there

Related

Inner joins vs left joins

I have been using LEFT JOIN in all my queries so far and just came across the ability to specify multiple tables in FROM which gives the same result. I didn't know you could refer to multiple tables in from and do joins in where. I used where strictly for filtering data and kept from to one table.
Where can I learn more about option 2 and when it is more useful? Or should I stick to option 1?
Option 1: LEFT JOIN
select
f.title,
count(r.rental_id)
from rental r
left join inventory i on i.inventory_id=r.inventory_id
left join film f on f.film_id=i.film_id
Group by title
order by count(r.rental_id) desc
Option 2: JOIN in FROM
select
f.title,
count(r.rental_id)
from film f, rental r, inventory i
where f.film_id=i.film_id
and i.inventory_id=r.inventory_id
group by 1
order by count(r.rental_id) desc

Joining two different columns from one table to the same column in a different table?

I am working on a query that has fields called ios_app_id, android_app_id and app_id.
The app_id from downloads table can be either ios_app_id or android_app_id in the products table.
Is it correct that because of that I cannot just run a simple join of downloads and products table on on p.ios_app_id = d.app_id and then join again on on p.android_app_id = d.app_id? Would that cause an incorrect number of records?
select p.product_id, d.date, d.downloads,
from products p
inner join download d
on p.ios_app_id = d.app_id
UNION
select p.product_id, d.date, d.downloads
from products p
inner join download d
on p.android_app_id = d.app_id
I would try:
select p.product_id, d.date, d.downloads,
from products p
inner join downloads d
on p.ios_app_id = d.app_id
inner join downloads d
on p.android_app_id = d.app_id
Basically I am trying to understand why the union here is needed instead of just joining the two fields twice? Thank you
Just join twice:
select p.product_id,
coalesce(di.date, da.date),
coalesce(di.downloads, da.downloads)
from products p left join
downloads di
on p.ios_app_id = di.app_id left join
downloads da
on p.android_app_id = da.app_id;
This should be more efficient than your method with union. Basically, it attempts joining using the two ids. The coalesce() combines the results into a single column.
Remember that the purpose of an INNER JOIN is to get the values that exists on BOTH sets of data (lets called them table A and table B), using a specific column to join them. In your example, if you try to do the INNER JOIN twice, what would happen is that the first time you execute the INNER JOIN, the complete PRODUCTS table is your table A, and you obtain all the products that have downloaded the ios_app, but now (and this is the key part) this result becomes your new dataset, so it becomes your new table A for the next inner join. And thats the issue, cause what you would want is to join the whole table again, not just the result of the first join, but thats not how it works. This is why you need to use the UNION, cause you need to obtain your results independently and then add them.
An alternative would be to use LEFT JOIN, but you could get null values and duplicates -and its not too "clean"-. So, for your particular case, I think using UNION is much clearer and easier to understand.
If you do left join in first query it will work.
create table all_products as (select p.product_id, d.date, d.downloads,
from products p
left join downloads d
on p.ios_app_id = d.app_id)
select a.product_id, d.date, d.downloads from all_products a left join downloads d
on a.android_app_id = d.app_id inner join

Large Anti JOIN in SQL

I am working with a training database and I am trying to find the best way to create a list of all the users and all the classes they HAVEN'T taken. I have two tables a courses master table of 150 classes and a registration table that is 175K. I can create an anti join for a single person:
SELECT *
FROM tblCourses C --150 rows
Left JOIN tblTraineeCourses TC -- 175K
ON C.CourseId = TC.CourseId AND TraineeId = 2283539
WHERE TC.CourseID IS NULL
And an Anti Join to see what classes have never been taken
SELECT *
FROM tblCourses C --150 rows
Left JOIN tblTraineeCourses TC
ON C.CourseId = TC.CourseId --175K
WHERE TC.CourseID IS NULL
but i am looking for a way to produce a list table of every User ID and the classes they haven't taken.
I fully realize that this will produce a massive table but I need to do it for an analysis
Use a cross join to generate all combinations. Then filter out the ones that exist:
SELECT T.TraineeId, C.CourseId
FROM tblTrainees t CROSs JOIN
tblCourses C LEFT JOIN
tblTraineeCourses TC
ON TC.CourseId = C.CourseId AND
TC.TraineeId = T.TraineeId
WHERE TC.CourseID IS NULL;
This assumes that you have a list of trainees, in tblTrainees. You could replace that with:
(SELECt DISTINCT TraineeId FROM tblTraineeCourses
) t
If you don't have such a table.

Is it true that all joins following a left join in a SQL query must also be left joins? Why or why not?

I remember this rule of thumb from back in college that if you put a left join in a SQL query, then all subsequent joins in that query must also be left joins instead of inner joins, or else you'll get unexpected results. But I don't remember what those results are, so I'm wondering if maybe I'm misremembering something. Anyone able to back me up on this or refute it? Thanks! :)
For instance:
select * from customer
left join ledger on customer.id= ledger.customerid
inner join order on ledger.orderid = order.id -- this inner join might be bad mojo
Not that they have to be. They should be (or perhaps a full join at the end). It is a safer way to write queries and express logic.
Your query is:
select *
from customer c left join
ledger l
on c.id = l.customerid inner join
order o
on l.orderid = o.id
The left join says "keep all customers, even if there is no matching record in ledger. The second says, "I have to have a matching ledger record". So, the inner join converts the first to an inner join.
Because you presumably want all customers, regardless of whether there is a match in the other two tables, you would use a left join:
select *
from customer c left join
ledger l
on c.id = l.customerid left join
order o
on l.orderid = o.id
You remember correctly some parts of it!
The thing is, when you chain join tables like this
select * from customer
left join ledger on customer.id= ledger.customerid
inner join order on ledger.orderid = order.id
The JOIN is executed sequentialy, so when customer left join ledger happens, you are making sure all joined keys from customer return (because it's a left join! and you placed customers to the left).
Next,
The results of the former JOIN are joined with order (using inner join), forcing the "the first join keys" to match (1 to 1) with the keys from order so you will end up only with records that were matched in order table as well
Bad mojo? it really depends on what you are trying to accomplish.
If you want to guarantee all records from customers return, you should keep "left joining" to it.
You can, however, make this a little more intuitive to understand (not necessarily a better way of writing SQL!) by writing:
SELECT * FROM
(
(SELECT * from customer) c
LEFT JOIN
(SELECT * from ledger) l
ON
c.id= l.customerid
) c_and_l
INNER JOIN (OR PERHAPS LEFT JOIN)
(SELECT * FROM order) as o
ON c_and_l.orderid (better use c_and_l.id as you want to refer to customerid from customers table) = o.id
So now you understand that c_and_l is created first, and then joined to order (you can imagine it as 2 tables are joining again)

T-SQL Left-Join with 1 row (limi, subselect)

I already read a lot on that topic but I´m unable to get it to work for my case.
I have the following situation:
A list of orderitems (the main datasets I want to get)
Articles which have a 1:1 relation to an order item
A n:m Jointable "Articlesupplier" which creates a relation between an article and a
partner
A Partner table with detailed information about partners.
Target:
One dataset per OrderItem and from the suppliers I only want to get the first one found in the join. No priorization required.
Tables:
Table IDX_ORDERITEM
id,article_id
Table IDX_ARTICLE
id,name
Table IDX_ARTICLESUPPLIER
article_id,partner_id
Table IDX_PARTNER
id,abbr
My actual statement (short version):
SELECT IDX_ORDERITEM.id
FROM
dbo.IDX_ORDERITEM AS IDX_ORDERITEM
-- ARTICLE --
INNER JOIN dbo.IDX_ARTICLE AS IDX_ARTICLE
ON IDX_ORDERITEM.article_id=IDX_ARTICLE.id
-- SUPPLIER VIA ARTICLE --
LEFT JOIN
(SELECT TOP(1) IDX_PARTNER.id, IDX_PARTNER.abbr
FROM IDX_PARTNER, IDX_ARTICLESUPPLIER
WHERE IDX_PARTNER.id = IDX_ARTICLESUPPLIER.partner_id
AND IDX_ARTICLESUPPLIER.article_id=IDX_ARTICLE.id) AS IDX_PARTNER_SUPPLIER
ON IDX_PARTNER_SUPPLIER.id=IDX_ARTICLE.supplier_partner_id
WHERE 1>0
ORDER BY orderitem.id DESC
But it seems I can´t access IDX_ARTICLE.id in the subquery. I get the following error message:
The multi-part identifier "IDX_ARTICLE.id" could not be bound.
Is the problem that the Article alias has the same name as the table name?
Thanks a lot in advance for possible ideas,
Mike
Well, I changed your aliases, and the subquery to which you were joining (I also modified that subquery so it doesn't use implicit joins anymore), though this changes where mostly cosmetics. The actual important change was the use of OUTER APPLY instead of LEFT JOIN:
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
OUTER APPLY
(SELECT TOP(1) P.id, P.abbr
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
WHERE SUP.article_id = A.id
AND P.id = A.supplier_partner_id) AS PS
ORDER BY OI.id DESC
The error is thrown because the below piece of query
(SELECT TOP(1) IDX_PARTNER.id, IDX_PARTNER.abbr
FROM IDX_PARTNER, IDX_ARTICLESUPPLIER
WHERE IDX_PARTNER.id = IDX_ARTICLESUPPLIER.partner_id
AND IDX_ARTICLESUPPLIER.article_id=IDX_ARTICLE.id) AS IDX_PARTNER_SUPPLIER
cannot be considered as a correlated sub-query and IDX_ARTICLE.id is referenced in it in the same manner we reference a field of outer query in a correlated sub-query.
I see two problems.
According to your DDLs there is no IDX_ARTICLE.supplier_partner_id which you refer to in the left join on clause.
Second, I'm quite sure you cannot use IDX_ARTICLE.id in your derived table. Simply add IDX_ARTICLESUPPLIER.article_id to your derived table selected fields and use it in your left join on clause against IDX_ARTICLE.id.
I prefer to avoid nested queries. If I can, I will always rewrite it using CTE.
WITH Part_Sup
AS (
SELECT TOP ( 1 ) P.id
,P.abbr
,SUP.article_id
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
)
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
LEFT OUTER JOIN Part_Sup AS PS
ON PS.article_id = A.Id
AND PS.id = A.supplier_partner_id
ORDER BY OI.id DESC;
Next I rewritten the first query to use ROW_NUMBER() function instead of using TOP (1) using ROW_NUMBER you can control which results you want and what you don't want.
WITH Part_Sup
AS (
SELECT P.id
,P.abbr
,SUP.article_id
,ROW_NUMBER() OVER ( PARTITION BY P.id, P.abbr ) AS RowNum
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
)
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
LEFT OUTER JOIN Part_Sup AS PS
ON PS.article_id = A.Id
AND PS.id = A.supplier_partner_id
AND RowNum = 1
ORDER BY OI.id DESC;
Thanks Lamak - you solved it :)
I used your input to extract the basic solution to make it a bit easier to read for others which have the same problem:
Using OUTER APPLY (without ORDER_ITEM Table here):
SELECT IDX_ARTICLE.id AS AR_ID, IDX_PARTNER_SUPPLIER.id, IDX_PARTNER_SUPPLIER.abbr
FROM
dbo.IDX_ARTICLE AS IDX_ARTICLE
OUTER APPLY
(SELECT TOP(1) _PARTNER.id, _PARTNER.abbr
FROM IDX_PARTNER AS _PARTNER
INNER JOIN IDX_ARTICLESUPPLIER AS _ARTICLESUPPLIER
ON _PARTNER.id = _ARTICLESUPPLIER.partner_id
WHERE _ARTICLESUPPLIER.article_id=IDX_ARTICLE.id
AND _ARTICLESUPPLIER.deleted IS NULL) AS IDX_PARTNER_SUPPLIER
WHERE IDX_ARTICLE.id=67