Inner joins vs left joins - sql

I have been using LEFT JOIN in all my queries so far and just came across the ability to list multiple tables in FROM, which gives the same result. I didn't know you could refer to multiple tables in FROM and do the joins in WHERE. I used WHERE strictly for filtering data and kept FROM to one table.
Where can I learn more about option 2 and when it is more useful? Or should I stick to option 1?
Option 1: LEFT JOIN
select
f.title,
count(r.rental_id)
from rental r
left join inventory i on i.inventory_id=r.inventory_id
left join film f on f.film_id=i.film_id
Group by title
order by count(r.rental_id) desc
Option 2: JOIN in FROM
select
f.title,
count(r.rental_id)
from film f, rental r, inventory i
where f.film_id=i.film_id
and i.inventory_id=r.inventory_id
group by 1
order by count(r.rental_id) desc
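For what it's worth, here is a minimal sketch (Python's built-in sqlite3, with made-up sample rows) comparing the two options. The comma-separated FROM with conditions in WHERE behaves like an INNER JOIN, so the two options only agree when every rental has a matching inventory and film row. The orphan rental 102 is a hypothetical row added purely to make the difference visible; with foreign keys in place it could not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER);
    INSERT INTO film VALUES (1, 'A'), (2, 'B');
    INSERT INTO inventory VALUES (10, 1), (11, 2);
    -- rental 102 points at a missing inventory row (hypothetical orphan)
    INSERT INTO rental VALUES (100, 10), (101, 10), (102, 99);
""")

# Option 1: LEFT JOIN keeps every rental, matched or not.
left_join = conn.execute("""
    SELECT f.title, COUNT(r.rental_id)
    FROM rental r
    LEFT JOIN inventory i ON i.inventory_id = r.inventory_id
    LEFT JOIN film f ON f.film_id = i.film_id
    GROUP BY f.title
    ORDER BY COUNT(r.rental_id) DESC
""").fetchall()

# Option 2: comma-separated FROM with the join conditions in WHERE.
comma_join = conn.execute("""
    SELECT f.title, COUNT(r.rental_id)
    FROM film f, rental r, inventory i
    WHERE f.film_id = i.film_id
      AND i.inventory_id = r.inventory_id
    GROUP BY f.title
    ORDER BY COUNT(r.rental_id) DESC
""").fetchall()

# The same query written with explicit INNER JOINs.
inner_join = conn.execute("""
    SELECT f.title, COUNT(r.rental_id)
    FROM rental r
    INNER JOIN inventory i ON i.inventory_id = r.inventory_id
    INNER JOIN film f ON f.film_id = i.film_id
    GROUP BY f.title
    ORDER BY COUNT(r.rental_id) DESC
""").fetchall()

print(left_join)   # [('A', 2), (None, 1)]
print(comma_join)  # [('A', 2)]
print(inner_join)  # [('A', 2)]
```

The unmatched rental shows up under a NULL title only in the LEFT JOIN version.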

Related

3 way right joins in SQL

I don't quite understand why my code works. How does SQL do the joining process with 3 tables?
Here's the code with the table diagram below:
select category.category_id, name, count(film_id)
from films
right join film_category using (film_id)
right join category using (category_id)
group by category.category_id, name
order by count desc, name
I selected films in the 2nd line and right joined it to film_category. Can someone confirm that this keeps the film_category information if films doesn't contain the same id? Does SQL just magically know it should join film_category with category? Does that mean I can shuffle the order of the joins around?
Thanks
Most people recommend against RIGHT [OUTER] JOIN because it is usually less readable than LEFT [OUTER] JOIN: you must read the FROM clause backwards. I, too, advise against it.
In effect, you select from category, outer join film_category, and then outer join films. You thus select all categories with their films, keeping categories that have no associated film in the result.
The query is wrong in only two regards:
Semantic: With USING (film_id), the merged film_id column is non-null whenever a film_category row exists, even if no matching film does, so COUNT(film_id) counts one instead of zero for such a row.
Syntactic: The COUNT in ORDER BY lacks its parameter.
The query would be more readable with left outer joins as mentioned. With ON instead of USING and all columns qualified with their table names and with alias names for readability:
select c.category_id, c.name, count(f.film_id)
from category c
left join film_category fc on fc.category_id = c.category_id
left join films f on f.film_id = fc.film_id
group by c.category_id, c.name
order by count(f.film_id) desc, c.name;
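To see why it matters which column is counted, here is a small sketch using Python's built-in sqlite3. The sample rows (including the 'Empty' category) are invented for illustration, and the `counts` helper is just a convenience for this demo. It runs the left join query above twice, once with COUNT(f.film_id) and once with COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE films (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
    -- hypothetical data: 'Empty' has no films at all
    INSERT INTO category VALUES (1, 'Action'), (2, 'Empty');
    INSERT INTO films VALUES (1, 'Film one'), (2, 'Film two');
    INSERT INTO film_category VALUES (1, 1), (2, 1);
""")

def counts(count_expr):
    """Run the left join query with the given COUNT expression."""
    return conn.execute(f"""
        SELECT c.category_id, c.name, {count_expr}
        FROM category c
        LEFT JOIN film_category fc ON fc.category_id = c.category_id
        LEFT JOIN films f ON f.film_id = fc.film_id
        GROUP BY c.category_id, c.name
        ORDER BY {count_expr} DESC, c.name
    """).fetchall()

per_film = counts("COUNT(f.film_id)")  # ignores the NULL from the outer join
per_row = counts("COUNT(*)")           # counts the NULL-extended row too

print(per_film)  # [(1, 'Action', 2), (2, 'Empty', 0)]
print(per_row)   # [(1, 'Action', 2), (2, 'Empty', 1)]
```

Counting a column that comes from the outer-joined table gives 0 for the empty category, while counting rows gives 1.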

Is it true that all joins following a left join in a SQL query must also be left joins? Why or why not?

I remember this rule of thumb from back in college that if you put a left join in a SQL query, then all subsequent joins in that query must also be left joins instead of inner joins, or else you'll get unexpected results. But I don't remember what those results are, so I'm wondering if maybe I'm misremembering something. Anyone able to back me up on this or refute it? Thanks! :)
For instance:
select * from customer
left join ledger on customer.id= ledger.customerid
inner join order on ledger.orderid = order.id -- this inner join might be bad mojo
They don't have to be, but they should be (or perhaps a full join at the end). It is a safer way to write queries and express logic.
Your query is:
select *
from customer c left join
ledger l
on c.id = l.customerid inner join
order o
on l.orderid = o.id
The left join says, "keep all customers, even if there is no matching record in ledger." The second says, "I have to have a matching ledger record." So, the inner join converts the first to an inner join.
Because you presumably want all customers, regardless of whether there is a match in the other two tables, you would use a left join:
select *
from customer c left join
ledger l
on c.id = l.customerid left join
order o
on l.orderid = o.id
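A quick way to convince yourself is to run both variants on toy data. This is only a sketch with Python's built-in sqlite3; the rows are invented, the `run` helper exists only for this demo, and the table is called `orders` here (an assumption) because `order` is a reserved word in most databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ledger (customerid INTEGER, orderid INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    -- hypothetical data: Bob (id 2) has no ledger row
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO ledger VALUES (1, 500);
    INSERT INTO orders VALUES (500);
""")

def run(second_join):
    """Left join ledger, then join orders with the given join type."""
    return conn.execute(f"""
        SELECT c.id, l.orderid, o.id
        FROM customer c
        LEFT JOIN ledger l ON c.id = l.customerid
        {second_join} JOIN orders o ON l.orderid = o.id
        ORDER BY c.id
    """).fetchall()

left_then_inner = run("INNER")
left_then_left = run("LEFT")

print(left_then_inner)  # [(1, 500, 500)]
print(left_then_left)   # [(1, 500, 500), (2, None, None)]
```

Bob survives the first left join with a NULL orderid, but the inner join then discards him because NULL never matches an order id.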
You remember parts of it correctly!
The thing is, when you chain joins like this
select * from customer
left join ledger on customer.id= ledger.customerid
inner join order on ledger.orderid = order.id
The joins are evaluated in order, so when customer left join ledger happens, you make sure every row from customer is returned (because it's a left join and you placed customer on the left).
Next, the result of that join is joined with order using an inner join. This forces ledger.orderid to match a key in order, so you end up only with records that were also matched in the order table.
Bad mojo? It really depends on what you are trying to accomplish.
If you want to guarantee that all records from customer are returned, keep using left joins.
You can, however, make this a little more intuitive to understand (not necessarily a better way of writing SQL!) by writing:
SELECT *
FROM (
-- customer LEFT JOIN ledger is built first;
-- naming the columns avoids an ambiguous id later on
SELECT c.id AS customer_id, l.orderid
FROM customer c
LEFT JOIN ledger l ON c.id = l.customerid
) c_and_l
INNER JOIN order o -- or perhaps LEFT JOIN
ON c_and_l.orderid = o.id
So now you can see that c_and_l is created first and then joined to order (you can picture it as two tables being joined again).

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even though my SQL skills are close to none, it looks like it's doing its job. But now I would like to show 0 whenever a customer doesn't have sales for a particular week (weeks are just integers). I wonder if I should somehow get the distinct values of the weeks in the Sales table and then loop through them (not sure how).
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM Sales s) w LEFT JOIN
Addresses a
ON c.Id = a.CustomerId LEFT JOIN
Sales s
ON s.Week = w.Week AND s.AddressId = a.Id
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
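Here is a small runnable sketch of the CROSS JOIN plus LEFT JOIN idea, using Python's built-in sqlite3 rather than SQL Server. The sample rows are invented; the column names follow the table layout given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Addresses (Id INTEGER PRIMARY KEY, Street TEXT,
                            StreetNo TEXT, CustomerId INTEGER);
    CREATE TABLE Sales (AddressId INTEGER, Week INTEGER, Total INTEGER);
    -- hypothetical data: Bob has no sales in week 1
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Addresses VALUES (10, 'Main', '1', 1), (20, 'Oak', '2', 2);
    INSERT INTO Sales VALUES (10, 1, 5), (10, 1, 2), (10, 2, 4), (20, 2, 3);
""")

rows = conn.execute("""
    SELECT c.Name, a.Street, a.StreetNo, w.Week,
           COALESCE(SUM(s.Total), 0) AS Total
    FROM Customers c
    -- every customer paired with every week seen in Sales
    CROSS JOIN (SELECT DISTINCT Week FROM Sales) w
    LEFT JOIN Addresses a ON a.CustomerId = c.Id
    -- bring in sales only where they exist for that address and week
    LEFT JOIN Sales s ON s.Week = w.Week AND s.AddressId = a.Id
    GROUP BY c.Name, a.Street, a.StreetNo, w.Week
    ORDER BY c.Name, w.Week
""").fetchall()

for row in rows:
    print(row)
```

With this data the row for Bob in week 1 comes back with a Total of 0 instead of being missing.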
You should generate the week numbers rather than using DISTINCT. This is better for performance, and more reliable because a week with no sales at all would be missing from the DISTINCT list. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeks.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE statement that tests to see if the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week. If it does then SUM() would return a NULL, so the CASE statement returns a 0 instead. If Total does not contain a NULL value, then the SUM() of all values of Total for that grouping is performed.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.

With multiple related authors, how to display only one when joining on tables

I want to return a result set listing all books with a column including the first author (alphabetically for each book).
I've got a SQL query that returns all books, joins on author and orders by book author name but this falls over when I then try to group by book ID to remove duplicates...
SELECT
ct.entry_id AS SKU,
c.channel_title,
ct.title, ct.status,
bat.title AS author_title
FROM exp_store_products sp
JOIN exp_channel_titles ct ON ct.entry_id=sp.entry_id
LEFT JOIN exp_channels c ON c.channel_id=ct.channel_id
LEFT JOIN exp_playa_relationships pl ON pl.parent_entry_id=ct.entry_id
LEFT JOIN exp_channel_titles bat ON bat.entry_id=pl.child_entry_id
WHERE ct.channel_id IN(18,33,43)
ORDER BY bat.title
Am I going to have to run a subquery? I'd rather not if I can avoid it but
I'm at a loss as to why adding:
GROUP BY ct.entry_id
messes things up... :?
You have to break ties between authors to pick a "winner"; max() or min() are two easy ways to do this:
SELECT
ct.entry_id AS SKU,
c.channel_title,
ct.title,
ct.status,
min(bat.title) AS author_title
FROM exp_store_products sp
JOIN exp_channel_titles ct ON ct.entry_id=sp.entry_id
LEFT JOIN exp_channels c ON c.channel_id=ct.channel_id
LEFT JOIN exp_playa_relationships pl ON pl.parent_entry_id=ct.entry_id
LEFT JOIN exp_channel_titles bat ON bat.entry_id=pl.child_entry_id
WHERE ct.channel_id IN(18,33,43)
GROUP BY ct.entry_id, c.channel_title, ct.title, ct.status
Using min() (as here) will give you the alphabetically first name.
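As a rough illustration, here is the same idea on a stripped-down schema, run with Python's built-in sqlite3. The `book`, `author` and `book_author` tables and their rows are hypothetical stand-ins for the exp_* tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-ins for the exp_* tables
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_author (book_id INTEGER, author_id INTEGER);
    INSERT INTO book VALUES (1, 'SQL Basics'), (2, 'Joins');
    INSERT INTO author VALUES (1, 'Zed'), (2, 'Amy');
    -- book 1 has two authors, book 2 has none
    INSERT INTO book_author VALUES (1, 1), (1, 2);
""")

rows = conn.execute("""
    SELECT b.book_id, b.title, MIN(a.name) AS author_name
    FROM book b
    LEFT JOIN book_author ba ON ba.book_id = b.book_id
    LEFT JOIN author a ON a.author_id = ba.author_id
    GROUP BY b.book_id, b.title
    ORDER BY b.book_id
""").fetchall()

print(rows)  # [(1, 'SQL Basics', 'Amy'), (2, 'Joins', None)]
```

Each book appears once; the book without authors is kept by the left joins and gets NULL as its author.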
You can avoid grouping and aggregation altogether by moving the author-title logic into a subquery in the SELECT list.
This can be more efficient than aggregation if you plan to limit your result set (pagination, lazy feed), because the author title only needs to be resolved for the rows that are actually returned.
SELECT ct.entry_id AS SKU,
c.channel_title, ct.title, ct.status,
(
SELECT bat.title
FROM exp_channel_titles bat
JOIN exp_playa_relationships pl ON bat.entry_id=pl.child_entry_id
WHERE pl.parent_entry_id=ct.entry_id
ORDER BY bat.title
LIMIT 1
) AS author_title
FROM exp_store_products sp
JOIN exp_channel_titles ct ON ct.entry_id=sp.entry_id
LEFT JOIN exp_channels c ON c.channel_id=ct.channel_id
WHERE ct.channel_id IN(18,33,43)
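A minimal sketch of the correlated subquery approach combined with an outer LIMIT, run with Python's built-in sqlite3. The `book`, `author` and `book_author` tables and their rows are hypothetical stand-ins for the exp_* tables, and the LIMIT 2 plays the role of a page size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-ins for the exp_* tables
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_author (book_id INTEGER, author_id INTEGER);
    INSERT INTO book VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO author VALUES (1, 'Zed'), (2, 'Amy'), (3, 'Bob');
    -- book 1 has two authors, book 2 none, book 3 one
    INSERT INTO book_author VALUES (1, 1), (1, 2), (3, 3);
""")

page = conn.execute("""
    SELECT b.book_id,
           (
               -- alphabetically first author of the current book
               SELECT a.name
               FROM author a
               JOIN book_author ba ON ba.author_id = a.author_id
               WHERE ba.book_id = b.book_id
               ORDER BY a.name
               LIMIT 1
           ) AS first_author
    FROM book b
    ORDER BY b.book_id
    LIMIT 2  -- hypothetical page size
""").fetchall()

print(page)  # [(1, 'Amy'), (2, None)]
```

No GROUP BY is needed, and a book without authors simply gets NULL from the subquery.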

Outer Joining SQL Tables?

I have three tables in the database:
Activity table with activity_id, activity_type
Category table with category_id, category_name
Link table with mapping between activity_id and category_id
I need to write a select statement to get the following data:
activity_id, activity_type, Category_name.
The issue is that some of the activity_ids have no entry in the link table.
If I write:
select a.activity_id, a.activity_type, c.category_name
from activity a, category c, link l
where a.activity_id = l.activity_id and c.category_id = l.category_id
then I do not get the data for the activity_ids that are not present in the link table.
I need data for all the activities, with an empty or null category_name for those that have no link to a category_id.
Please help me with it.
PS. I am using MS SQL Server DB
I believe you're looking for a LEFT OUTER JOIN for your activity table to return all rows.
SELECT
a.activity_id, a.activity_type, c.category_name
FROM activity a
LEFT OUTER JOIN link l
ON a.activity_id = l.activity_id
LEFT OUTER JOIN category c
ON c.category_id = l.category_id;
You should use proper explicit joins:
select a.activity_id, a.activity_type, c.category_name
from activity a
LEFT JOIN link l
ON a.activity_id = l.activity_id
LEFT JOIN category c
ON l.category_id = c.category_id
If writing this type of logic will be part of your ongoing responsibilities, I would strongly suggest that you do some research on joins, including the interactions between joins and where clauses. Joins and where clauses combine to form the backbone of query writing, regardless of the technology used to retrieve the data.
Most critical join information to understand:
Left Outer Join: retrieves all rows from the 'left' table plus any matching records that exist in the joined table
Inner Join: retrieves only records that exist in both tables
Where clauses: used to limit data, regardless of inner or outer join definitions.
In the example you posted, the where clause is limiting your overall data to rows that exist in all 3 tables. Replacing the where clause with appropriate join logic will do the trick:
select a.activity_id, a.activity_type, c.category_name
from activity a
left outer join link l --return all activity rows regardless of whether the link exists
on a.activity_id = l.activity_id
left outer join category c --return all activity rows regardless of whether the category exists
on c.category_id = l.category_id
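To make the join/where interaction concrete, here is a small sketch using Python's built-in sqlite3 instead of SQL Server, with invented sample rows. It runs the original comma-style query, the left join version, and the left join version with an extra WHERE filter on the outer-joined table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (activity_id INTEGER PRIMARY KEY, activity_type TEXT);
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE link (activity_id INTEGER, category_id INTEGER);
    -- hypothetical data: activity 2 has no row in link
    INSERT INTO activity VALUES (1, 'run'), (2, 'swim');
    INSERT INTO category VALUES (7, 'sport');
    INSERT INTO link VALUES (1, 7);
""")

select = "SELECT a.activity_id, a.activity_type, c.category_name "
left_joins = """
    FROM activity a
    LEFT JOIN link l ON a.activity_id = l.activity_id
    LEFT JOIN category c ON c.category_id = l.category_id
"""

# Original query: the WHERE conditions act as inner joins.
implicit = conn.execute(select + """
    FROM activity a, category c, link l
    WHERE a.activity_id = l.activity_id AND c.category_id = l.category_id
""").fetchall()

# Left joins keep the unlinked activity with a NULL category_name.
outer = conn.execute(select + left_joins + "ORDER BY a.activity_id").fetchall()

# A WHERE filter on the outer-joined table removes the NULL rows again.
outer_filtered = conn.execute(
    select + left_joins + "WHERE c.category_name = 'sport'"
).fetchall()

print(implicit)        # [(1, 'run', 'sport')]
print(outer)           # [(1, 'run', 'sport'), (2, 'swim', None)]
print(outer_filtered)  # [(1, 'run', 'sport')]
```

The last query shows why it is worth studying both together: a condition on category in WHERE quietly undoes the effect of the left joins.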
Best of luck!
What about?
select a.activity_id, a.activity_type, c.category_name from category c
left join link l on c.category_id = l.category_id
left join activity a on l.activity_id = a.activity_id
Actually, it seems the first join could be an inner join, because you didn't mention that there might be missing elements there.