How does a SQL statement containing mutiple joins work? - sql

I'm learning joins in my class, but I'm not fully grasping some of the concepts. Can somebody explain how a statement with multiple joins works?
SELECT B.TITLE, O.ORDER#, C.STATE FROM BOOKS B
LEFT OUTER JOIN ORDERITEMS OI ON B.ISBN = OI.ISBN
LEFT OUTER JOIN ORDERS O ON O.ORDER# = OI.ORDER#
LEFT OUTER JOIN CUSTOMERS C ON C.CUSTOMER# = O.CUSTOMER#;
I believe I understand that the BOOKS table is the left table in the first outer join connecting BOOKS and ORDERITEMS. All BOOKS will be shown, even if there is not an ORDERITEM for a book. After the first join, I'm not sure what is really happening.
When ORDERS is joined, which is the left table and which is the right table? The same for Customers. This is where I get lost.

First thing what executor will perform — take a first pair of tables that are eligible to be joined and perform the join. On the following steps, the result of the previous join is treated as a virtual relation, therefore you again have a construct similar to ... FROM virt_tab LEFT JOIN real_tab .... This behavior is based on the closure concept used in Relational Algebra, which means that any operation on the relation produces relation, i.e. operations can be nested. And RDBMS stands for Relational DBMS, take a look at the linked wikipedia article.
So far I find PostgreSQL's docs being most definitive in this matter, take a look at them. In the linked article a generic overview on how joins are performed by the databases is given with some PostrgeSQL-specific stuff, which is expected.

One of my favorite online resources is : http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
As to your question.
All books will be displayed and only those orderitems which match a book
all only those orders which have a related record in orderitems which relate to a book will be displayed
Only customers who have orders with items in the books table will be listed.
So customers who don't have orders would not be listed
Customers who have orders but for items that are not books will NOT be listed
Fun stuff. Hope you enjoy it.
As to your second question: Right/left only matter because of the ORDER of the tables in your from statement. You could make every join a left one if you re-arrange a table order. All right/left do is specify the table from which you want ALL records.
Consider: you could just as easily right your select statement as:
SELECT B.TITLE, O.ORDER#, C.STATE
FROM CUSTOMERS C
RIGHT OUTER JOIN ORDERS O ON C.CUSTOMER# = O.CUSTOMER#
RIGHT OUTER JOIN ORDERITEMS OI ON O.ORDER# = OI.ORDER#
RIGHT OUTER JOIN BOOKS B ON B.ISBN = OI.ISBN
In this case right is saying that I want all the records from the table on the right since books is last in the list you'll get all books and only those ordereditems related to a book, only those orders for which the ordered item was a book and only those customers with orders for ordered items which were books. Thus the left / right are the same except for order. I avoid right joins for readability. I find it easier to go top down when thinking about whats included and what will not be.
Those records which are excluded will have NULL values in these types of joins.
Hope this helps.

Related

SQL query wrong index when where on join

I have a query with joins that is not using the index that would be the best match and I am looking for help to correct this.
I have the following query:
select
equipment.name,purchaselines.description,contacts.name,vendors.accountNumber
from purchaselines
left join vendors on vendors.id = purchaselines.vendorId
left join contacts on contacts.id = vendors.contactId
left join equipment on equipment.id = purchaselines.equipmentId
where contacts.id = 12345
The table purchaselines has an index on the column vendorId, which is the proper index to use. When the query is run, I know the value of contacts.id which is joined to vendors.contactId which is joined to purchaselines.vendorId.
What is the proper way to run this query? Currently, no index is used on the table purchaselines.
If you are intending to query a specific contact, I would put THAT first since that is the primary basis. Additionally, you had left-joins to the other tables (vendors, contacts, equipment). So by having a WHERE clause to the CONTACTS table forces the equation to become an INNER JOIN, thus REQUIRING.
That said, I would try to rewrite the query as (also using aliases for simplified readability of longer table names)
select
equipment.name,
purchaselines.description,
contacts.name,
vendors.accountNumber
from
contacts c
join vendors v
on c.id = v.contactid
join purchaselines pl
on v.id = pl.vendorid
join equipment e
on pl.equipmentid = e.id
where
c.id = 12345
Also notice the indentation of the JOINs helps readability (IMO) to see how/where each table gets to the next in a more hierarchical manner. They are all regular inner JOIN context.
So, the customer ID will be the first / fastest, then to vendors by that contact ID which should optimize the join to that. Then, I would expect the purchase lines to have an index on vendorid optimizing that. And finally, the equipment table on ITs PK.
FEEDBACK Basic JOIN clarification.
JOIN is just the explicit statement of how two tables are related. By listing them left-side and right-side and the join condition showing on what relationship is between them is all.
Now, in your data example, each table is subsequently nested under the one prior. It is quite common though that one table may link to multiple other tables. For example an employee. A customer could have an ethnicity ID linking to an ethnicity lookup table, but also, a job position id also linking to a job position lookup table. That might look something like
select
e.name,
eth.ethnicity,
jp.jobPosition
from
employee e
join ethnicitiy eth
on e.ethnicityid = eth.id
join jobPosition jp
on e.jobPositionID = jp.id
Notice here that both ethnicity and jobPosition are at the same hierarchical level to the employee table scenario. If, for example, you wanted to further apply conditions that you only wanted certain types of employees, you can just add your logical additional conditions directly at the location of the join such as
join jobPosition jp
on e.jobPositionID = jp.id
AND jp.jobPosition = 'Manager'
This would get you a list of only those employees who are managers. You do not need to explictily add a WHERE condition if you already include it directly at the JOIN/ON criteria. This helps keeping the table-specific criteria at the join if you ever find yourself needing LEFT JOINs.

How the SQL query with two left join works?

I need help in understanding how left join is working in below query.
There are total three tables and two left joins.
So my question is, the second left join is between customer and books table or between result of first join and books table?
SELECT c.id, c.first_name, c.last_name, s.date AS sale,
b.name AS book, b.genre
FROM customers c
LEFT JOIN sales s
ON c.id = s.customer_id
LEFT JOIN books b
ON s.book_id = b.id;
Good question.
When it comes to outer-joined tables, it depends on the predicates in the ON clause. The engine is free to reorder the fetch and scans on indexes or tables as long as the predicates are respected.
In this particular case there are three tables:
customers (c)
sales (s)
books (b)
customers is inner joined so it becomes the driving table; there are other considerations, but for simplicity you can consider that this is the table that is read first. Now, which one is second? sales or books?
The first join predicate c.id = s.customer_id doesn't establish any relationship between the secondary tables; therefore it doesn't affect which table is joined first.
The second join predicate s.book_id = b.id makes books dependent on sales. Therefore, it decides sales is the second table, and books is the last one.
A final note: if you understand the concept of dependency there are several dirty tricks you can use to force the engine to walk the tables in the order you want. I would not recommend to do this to a novice, but if at some point you realise the engine is not doing what you want, you can tweak the queries.
The second join statement specifies to join on s.book_id = b.id where s is sales and b is books. However, a record in the books table will not be returned unless it has a corresponding record in the sales AND customers tables, which is what a left join does by definition https://www.w3schools.com/sql/sql_join_left.asp. put another way, this query will return all books that have been purchased by at least one customer (and books that have been purchased multiple times will appear in the results multiple times).

Need assistance with SQL statement having trouble with the JOINS and WHERE clause

The company is performing an analysis of their inventory. They are considering purging books that are not popular with their customers. To do this they need a list of books that have never been purchased. Write a query using a join that provides this information. Your results should include all the book details and the order number column. Sort your results by the book title.
SELECT o.order_nbr, b.*
FROM orders o JOIN books
WHERE
ORDER BY book_title
This is all I could come up with, I'm still learning Joins and struggling to figure out what the correct statement should be. Wasn't sure what to put in the WHERE clause and don't really know how to properly join these tables.
You need an ON clause to specify what you are joining on. Also, your WHERE clause is empty, and you are not specifying the type of JOIN you are using. Looking at the way the tables are set up, the expectation is you are going to join the BOOKS table on ORDER_ITEMS, which also contains ORDER_NBR.
In the question, it's asking to find books with no orders, so correct join would be a LEFT JOIN between BOOKS and ORDER_ITEMS, as that will include every book, even those without orders, which will have an ORDER_NBR of NULL
The SQL would look like
SELECT o.order_nbr, b.*
FROM books b
LEFT JOIN order_items o on b.book_id = o.book_id
WHERE o.order_nbr is null
ORDER BY book_title
This would return only the books with no orders.

Improve SQL query by replacing inner query

I'm trying to simplify this SQL query (I replaced real table names with metaphorical), primarily get rid of the inner query, but I'm brain frozen can't think of any way to do it.
My major concern (aside from aesthetics) is performance under heavy loads
The purpose of the query is to count all books grouping by genre found on any particular shelve where the book is kept (hence the inner query which is effectively telling which shelve to count books on).
SELECT g.name, count(s.book_id) occurances FROM genre g
LEFT JOIN shelve s ON g.shelve_id=s.id
WHERE s.id=(SELECT genre_id FROM book WHERE id=111)
GROUP BY s.genre_id, g.name
It seems like you want to know many books that are on a shelf are in the same genre as book 111: if you liked book "X", we have this many similar books in stock.
One thing I noticed is the WHERE clause in the original required a value for the shelve table, effectively converting it to an INNER JOIN. And speaking of JOINs, you can JOIN instead of the nested select.
SELECT g.name, count(s.book_id) occurances
FROM genre g
INNER JOIN shelve s ON s.id = b.shelve_id
INNER JOIN book b on b.genre_id = s.id
WHERE b.id=111
GROUP BY g.id, g.name
Thinking about it more, I might also start with book rather than genre. In the end, the only reason you need the genre table at all is to find the name, and therefore matching to it by id may be more effective.
SELECT g.name, count(s.book_id) occurances
FROM book b
INNER JOIN shelve s ON s.id = b.genre_id
INNER JOIN genre g on g.shelve_id = s.id
WHERE b.id=111
GROUP BY g.id, g.name
Not sure they meet your idea of "simpler" or not, but they are alternatives.
... unless matching shelve.id with book.genre_id is a typo in the question. It seems very odd the two tables would share the same id values, in which case these will both be wrong.

LEFT JOIN using a link table in PostgreSQL

There are tables people, people_has_book and books. I want all people in my list but not everybody has a book, so I use a LEFT JOIN to link people to people_has_book, but linking people_has_book to books should not be a LEFT JOIN.
How would I do this?
You can use parentheses to prioritize joins. Like:
SELECT *
FROM people p
LEFT JOIN ( people_has_book pb JOIN books b USING (book_id) ) USING (people_id);
This is subtly different from two LEFT JOINs:
SELECT *
FROM people p
LEFT JOIN people_has_book pb USING (people_id)
LEFT JOIN books b USING (book_id);
The latter would show rows from people_has_book even if there is no related entry in books. However, in a classic many-to-many implementation with FK constraints enforcing referential integrity, there is typically no effective difference for your particular query, since all people_has_book.book_id must reference an existing row in books anyway - with the exotic exception of NULL values. (If (people_id, book_id) is the PK of people_has_book, both columns are NOT NULL automatically.)
Related:
Join four tables involving LEFT JOIN without duplicates
How to implement a many-to-many relationship in PostgreSQL?