LEFT JOIN using a link table in PostgreSQL - sql

There are tables people, people_has_book and books. I want all people in my list but not everybody has a book, so I use a LEFT JOIN to link people to people_has_book, but linking people_has_book to books should not be a LEFT JOIN.
How would I do this?

You can use parentheses to prioritize joins. Like:
SELECT *
FROM people p
LEFT JOIN ( people_has_book pb JOIN books b USING (book_id) ) USING (people_id);
This is subtly different from two LEFT JOINs:
SELECT *
FROM people p
LEFT JOIN people_has_book pb USING (people_id)
LEFT JOIN books b USING (book_id);
The latter would show rows from people_has_book even if there is no related entry in books. However, in a classic many-to-many implementation with FK constraints enforcing referential integrity, there is typically no effective difference for your particular query, since all people_has_book.book_id must reference an existing row in books anyway - with the exotic exception of NULL values. (If (people_id, book_id) is the PK of people_has_book, both columns are NOT NULL automatically.)
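To see the difference concretely, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL, made-up sample rows, and a link table deliberately declared without FK constraints so that an orphaned people_has_book row can exist. It uses ON instead of USING purely to keep column references unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (people_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT);
    -- Hypothetical setup: no FK constraints, so a dangling link row is possible.
    CREATE TABLE people_has_book (people_id INTEGER, book_id INTEGER);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO books VALUES (10, 'Dune');
    INSERT INTO people_has_book VALUES (1, 10), (2, 99);  -- book 99 does not exist
""")

# Parenthesized join: link rows are inner-joined to books first,
# then the combined result is left-joined to people.
nested = conn.execute("""
    SELECT p.name, pb.book_id, b.title
    FROM people p
    LEFT JOIN (people_has_book pb JOIN books b ON b.book_id = pb.book_id)
           ON pb.people_id = p.people_id
    ORDER BY p.people_id
""").fetchall()

# Two chained LEFT JOINs: the orphaned link row survives with a NULL title.
chained = conn.execute("""
    SELECT p.name, pb.book_id, b.title
    FROM people p
    LEFT JOIN people_has_book pb ON pb.people_id = p.people_id
    LEFT JOIN books b ON b.book_id = pb.book_id
    ORDER BY p.people_id
""").fetchall()

print(nested)
print(chained)
```

With this data, only Bob's row differs: the chained version exposes book_id 99 with no title, while the parenthesized version shows him with no book at all.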
Related:
Join four tables involving LEFT JOIN without duplicates
How to implement a many-to-many relationship in PostgreSQL?

Related

SQL query wrong index when where on join

I have a query with joins that is not using the index that would be the best match and I am looking for help to correct this.
I have the following query:
select
equipment.name,purchaselines.description,contacts.name,vendors.accountNumber
from purchaselines
left join vendors on vendors.id = purchaselines.vendorId
left join contacts on contacts.id = vendors.contactId
left join equipment on equipment.id = purchaselines.equipmentId
where contacts.id = 12345
The table purchaselines has an index on the column vendorId, which is the proper index to use. When the query is run, I know the value of contacts.id which is joined to vendors.contactId which is joined to purchaselines.vendorId.
What is the proper way to run this query? Currently, no index is used on the table purchaselines.
If you intend to query a specific contact, I would put THAT table first, since it is the primary basis of the query. Additionally, you had LEFT JOINs to the other tables (vendors, contacts, equipment). Having a WHERE clause on the CONTACTS table forces the join to behave as an INNER JOIN, because it REQUIRES a matching contacts row.
That said, I would try to rewrite the query as (also using aliases for simplified readability of longer table names)
select
    e.name,
    pl.description,
    c.name,
    v.accountNumber
from
    contacts c
        join vendors v
            on c.id = v.contactid
            join purchaselines pl
                on v.id = pl.vendorid
                join equipment e
                    on pl.equipmentid = e.id
where
    c.id = 12345
Also notice the indentation of the JOINs helps readability (IMO) to see how/where each table gets to the next in a more hierarchical manner. They are all regular inner JOIN context.
So, the lookup by contact ID will be the first and fastest step, then the join to vendors by that contact ID, which an index on vendors.contactId should optimize. Then, I would expect purchaselines to have an index on vendorId optimizing that join. And finally, the equipment table is joined on its PK.
FEEDBACK Basic JOIN clarification.
JOIN is just the explicit statement of how two tables are related. You list the left-side and right-side tables, and the join condition states the relationship between them. That is all.
Now, in your data example, each table is nested under the one before it. It is quite common, though, that one table links to multiple other tables. Take an employee, for example. An employee could have an ethnicity ID linking to an ethnicity lookup table, but also a job position ID linking to a job position lookup table. That might look something like
select
    e.name,
    eth.ethnicity,
    jp.jobPosition
from
    employee e
        join ethnicity eth
            on e.ethnicityid = eth.id
        join jobPosition jp
            on e.jobPositionID = jp.id
Notice here that both ethnicity and jobPosition are at the same hierarchical level relative to the employee table. If, for example, you only wanted certain types of employees, you can add the extra condition directly at the join, such as
join jobPosition jp
on e.jobPositionID = jp.id
AND jp.jobPosition = 'Manager'
This would get you a list of only those employees who are managers. You do not need to explicitly add a WHERE condition if you already include it directly in the JOIN/ON criteria. Keeping table-specific criteria at the join also helps if you ever find yourself needing LEFT JOINs.
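The point about LEFT JOINs can be checked with a small sketch. It assumes SQLite via Python's sqlite3 module and invented sample rows; the table and column names follow the employee example above. The same condition placed in ON keeps unmatched employees, while placed in WHERE it discards them, which is the "LEFT JOIN silently becomes INNER JOIN" effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobPosition (id INTEGER PRIMARY KEY, jobPosition TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, jobPositionID INTEGER);
    -- Hypothetical sample data.
    INSERT INTO jobPosition VALUES (1, 'Manager'), (2, 'Clerk');
    INSERT INTO employee VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cy', NULL);
""")

# Condition in ON: every employee is kept; non-managers get NULL for the position.
filter_in_on = conn.execute("""
    SELECT e.name, jp.jobPosition
    FROM employee e
    LEFT JOIN jobPosition jp
           ON e.jobPositionID = jp.id
          AND jp.jobPosition = 'Manager'
    ORDER BY e.id
""").fetchall()

# Condition in WHERE: rows with a NULL jp.jobPosition fail the test,
# so the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT e.name, jp.jobPosition
    FROM employee e
    LEFT JOIN jobPosition jp ON e.jobPositionID = jp.id
    WHERE jp.jobPosition = 'Manager'
    ORDER BY e.id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```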

How the SQL query with two left join works?

I need help understanding how the left joins work in the query below.
There are three tables in total and two left joins.
So my question is: is the second left join between the customers and books tables, or between the result of the first join and the books table?
SELECT c.id, c.first_name, c.last_name, s.date AS sale,
b.name AS book, b.genre
FROM customers c
LEFT JOIN sales s
ON c.id = s.customer_id
LEFT JOIN books b
ON s.book_id = b.id;
Good question.
When it comes to outer-joined tables, it depends on the predicates in the ON clause. The engine is free to reorder the fetch and scans on indexes or tables as long as the predicates are respected.
In this particular case there are three tables:
customers (c)
sales (s)
books (b)
customers is the leftmost table and the preserved side of both LEFT JOINs, so it becomes the driving table; there are other considerations, but for simplicity you can consider that this is the table that is read first. Now, which one is second? sales or books?
The first join predicate c.id = s.customer_id doesn't establish any relationship between the secondary tables; therefore it doesn't affect which table is joined first.
The second join predicate s.book_id = b.id makes books dependent on sales. Therefore, it decides sales is the second table, and books is the last one.
A final note: if you understand the concept of dependency, there are several dirty tricks you can use to force the engine to walk the tables in the order you want. I would not recommend this to a novice, but if at some point you realise the engine is not doing what you want, you can tweak the queries.
The second join specifies s.book_id = b.id, where s is sales and b is books, so it operates on the result of the first join. A row from the books table is only returned if it has a corresponding row in sales, which in turn belongs to a customer. Because customers is the left table, every customer is returned, including those with no sales; for them the sale and book columns are NULL (see https://www.w3schools.com/sql/sql_join_left.asp). Put another way, this query returns all customers together with the books they purchased; a book that has never been purchased does not appear, and a book that has been purchased multiple times appears multiple times.
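A quick way to check this behaviour is to run the query on a tiny data set. The sketch below assumes SQLite via Python's sqlite3 module and invented rows: one customer with a sale, one customer without, and one book that was never sold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT, genre TEXT);
    CREATE TABLE sales (customer_id INTEGER, book_id INTEGER, date TEXT);
    -- Hypothetical sample data.
    INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO books VALUES (10, 'Dune', 'sci-fi'), (11, 'Emma', 'classic');
    INSERT INTO sales VALUES (1, 10, '2020-01-05');
""")

rows = conn.execute("""
    SELECT c.id, c.first_name, c.last_name, s.date AS sale,
           b.name AS book, b.genre
    FROM customers c
    LEFT JOIN sales s ON c.id = s.customer_id
    LEFT JOIN books b ON s.book_id = b.id
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

Bob has no sales, so he appears once with NULLs; 'Emma' was never sold, so it does not appear at all.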

Select name for authors that haven't written a book

We have to select authors that haven't written a book, but there are 3 different tables, which makes me confused about how to write the join expression.
We have tables:
authors: author_id
authorships: author_id, book_id
books: book_id.
Obviously I selected the names from authors and tried an inner join, but it won't work for me. Help would be appreciated!
Since this sounds like a school assignment I won't give the full answer.
Try using an outer join between authors and authorships. Make sure you retrieve the book_id from authorships.
Then try to work out what an author who has not published looks like in that result. You can use this to formulate the query for the answer you are looking for with an appropriate WHERE clause.
This is a good spot to use the LEFT JOIN anti-join pattern:
SELECT a.*
FROM authors a
LEFT JOIN authorships s ON s.author_id = a.author_id
WHERE s.author_id IS NULL
Rationale: when the LEFT JOIN comes up empty, it means that the author has no corresponding record in the authorships table. The WHERE clause keeps only those unmatched authors records (i.e. authors that have no books). This is called an anti-join because the purpose of a JOIN is usually to match records, whereas here we use it to detect unmatched records.
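As a sanity check, the sketch below runs the LEFT JOIN ... IS NULL query next to an equivalent NOT EXISTS formulation. It assumes SQLite via Python's sqlite3 module, a name column on authors (the question only lists author_id), and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The name column and all rows are assumptions for illustration.
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY);
    CREATE TABLE authorships (author_id INTEGER, book_id INTEGER);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO books VALUES (10), (11);
    INSERT INTO authorships VALUES (1, 10), (1, 11), (3, 10);
""")

# LEFT JOIN, then keep only the rows where no authorship matched.
anti_join = conn.execute("""
    SELECT a.name
    FROM authors a
    LEFT JOIN authorships s ON s.author_id = a.author_id
    WHERE s.author_id IS NULL
""").fetchall()

# Same question asked with a correlated NOT EXISTS subquery.
not_exists = conn.execute("""
    SELECT a.name
    FROM authors a
    WHERE NOT EXISTS (
        SELECT 1 FROM authorships s WHERE s.author_id = a.author_id
    )
""").fetchall()

print(anti_join, not_exists)
```

Note that the books table is not needed at all: whether an author has written anything is fully determined by authorships.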
It's really easy: check which column holds a common value across these three tables. If a column is common to at least two tables, put an inner join on those two and an outer join on the table whose data has no match.
Remember that your aliases always matter when you join different tables, and the ON and WHERE conditions must be stated properly.

How does a SQL statement containing multiple joins work?

I'm learning joins in my class, but I'm not fully grasping some of the concepts. Can somebody explain how a statement with multiple joins works?
SELECT B.TITLE, O.ORDER#, C.STATE FROM BOOKS B
LEFT OUTER JOIN ORDERITEMS OI ON B.ISBN = OI.ISBN
LEFT OUTER JOIN ORDERS O ON O.ORDER# = OI.ORDER#
LEFT OUTER JOIN CUSTOMERS C ON C.CUSTOMER# = O.CUSTOMER#;
I believe I understand that the BOOKS table is the left table in the first outer join connecting BOOKS and ORDERITEMS. All BOOKS will be shown, even if there is not an ORDERITEM for a book. After the first join, I'm not sure what is really happening.
When ORDERS is joined, which is the left table and which is the right table? The same for Customers. This is where I get lost.
The first thing the executor does is take the first pair of tables that are eligible to be joined and perform the join. In the following steps, the result of the previous join is treated as a virtual relation, so you again have a construct similar to ... FROM virt_tab LEFT JOIN real_tab .... This behavior is based on the closure concept of relational algebra: any operation on a relation produces a relation, so operations can be nested. RDBMS stands for Relational DBMS; take a look at the linked Wikipedia article.
So far I have found PostgreSQL's docs to be the most definitive on this matter; take a look at them. The linked article gives a generic overview of how databases perform joins, along with some PostgreSQL-specific details, which is to be expected.
One of my favorite online resources is : http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
As to your question.
All books will be displayed, along with only those orderitems that match a book.
Only those orders that have a related record in orderitems for a book will be displayed.
Only customers who have orders with items in the books table will be listed.
So customers who don't have orders will not be listed.
Customers who have orders, but only for items that are not books, will NOT be listed.
Fun stuff. Hope you enjoy it.
As to your second question: right/left only matter because of the ORDER of the tables in your FROM clause. You could make every join a left one if you rearrange the table order. All right/left do is specify the table from which you want ALL records.
Consider: you could just as easily write your select statement as:
SELECT B.TITLE, O.ORDER#, C.STATE
FROM CUSTOMERS C
RIGHT OUTER JOIN ORDERS O ON C.CUSTOMER# = O.CUSTOMER#
RIGHT OUTER JOIN ORDERITEMS OI ON O.ORDER# = OI.ORDER#
RIGHT OUTER JOIN BOOKS B ON B.ISBN = OI.ISBN
In this case, RIGHT says that I want all the records from the table on the right. Since BOOKS is last in the list, you'll get all books, only those order items related to a book, only those orders for which the ordered item was a book, and only those customers with orders for items which were books. Thus left and right are the same except for the order of the tables. I avoid right joins for readability; I find it easier to go top down when thinking about what is included and what is not.
Where a book has no matching order item, order, or customer, the columns from those tables will be NULL in these types of joins.
Hope this helps.
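The chain of LEFT OUTER JOINs can be tried on a toy data set. This sketch assumes SQLite via Python's sqlite3 module and invented rows. Because ORDER# and CUSTOMER# are not valid unquoted identifiers in SQLite, the columns are renamed order_no and customer_no here; that renaming is an assumption of the sketch, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- order_no / customer_no stand in for ORDER# / CUSTOMER#.
    CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY, customer_no INTEGER);
    CREATE TABLE orderitems (order_no INTEGER, isbn TEXT);
    -- Hypothetical data: 'Emma' was never ordered, customer 2 never ordered anything.
    INSERT INTO books VALUES ('111', 'Dune'), ('222', 'Emma');
    INSERT INTO customers VALUES (1, 'FL'), (2, 'TX');
    INSERT INTO orders VALUES (100, 1);
    INSERT INTO orderitems VALUES (100, '111');
""")

rows = conn.execute("""
    SELECT b.title, o.order_no, c.state
    FROM books b
    LEFT OUTER JOIN orderitems oi ON b.isbn = oi.isbn
    LEFT OUTER JOIN orders o ON o.order_no = oi.order_no
    LEFT OUTER JOIN customers c ON c.customer_no = o.customer_no
    ORDER BY b.title
""").fetchall()

for row in rows:
    print(row)
```

Every book shows up, the unordered book carries NULLs for the order and state, and the customer from 'TX' never appears because nothing on the left side leads to that row.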

How to combine two tables, one with 1 row and one with n rows?

I have a database with two tables
One with games
and one with participants
A game can have several participants, and these are stored in a different table.
Is there a way to combine these two into one query?
Thanks
You can combine them using the JOIN operator.
Something like
SELECT *
FROM games g
INNER JOIN participants p ON p.gameid = g.gameid
Explanation on JOIN operators
INNER JOIN - Matches rows between the two tables specified in the INNER JOIN statement based on one or more columns having matching data. Preferably the join is based on referential integrity enforcing the relationship between the tables to ensure data integrity.
  o Just to add a little commentary to the basic definition above: in general, the INNER JOIN option is considered to be the most common join needed in applications and/or queries. Although that is the case in some environments, it really depends on the database design, referential integrity and the data needed for the application. As such, please take the time to understand the data being requested, then select the proper join option.
  o Although most join logic is based on matching values between the two columns specified, it is also possible to include logic using greater than, less than, not equals, etc.
LEFT OUTER JOIN - Based on the two tables specified in the join clause, all data is returned from the left table. From the right table, the matching data is returned, with NULL values where a record exists in the left table but not in the right table.
  o Another item to keep in mind is that the LEFT and RIGHT OUTER JOIN logic is the opposite of one another. So you can either change the order of the tables in the specific join statement or change the JOIN from left to right (or vice versa) and get the same results.
RIGHT OUTER JOIN - Based on the two tables specified in the join clause, all data is returned from the right table. From the left table, the matching data is returned, with NULL values where a record exists in the right table but not in the left table.
Self-Join - In this circumstance, the same table is specified twice with two different aliases in order to match the data within the same table.
CROSS JOIN - Based on the two tables specified in the join clause, a Cartesian product is created if a WHERE clause does not filter the rows. The size of the Cartesian product is the number of rows in the left table multiplied by the number of rows in the right table. Please take caution when using a CROSS JOIN.
FULL JOIN - Based on the two tables specified in the join clause, all data is returned from both tables regardless of matching data, with NULL values on whichever side has no match.
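Two of the definitions above, the CROSS JOIN row count and a self-join with a non-equality condition, can be illustrated with a short sketch. It assumes SQLite via Python's sqlite3 module; the sizes and colors tables are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, used only to illustrate the join types.
    CREATE TABLE sizes (size TEXT);
    CREATE TABLE colors (color TEXT);
    INSERT INTO sizes VALUES ('S'), ('M'), ('L');
    INSERT INTO colors VALUES ('red'), ('blue');
""")

# CROSS JOIN: every size paired with every color, 3 * 2 = 6 rows.
pairs = conn.execute(
    "SELECT size, color FROM sizes CROSS JOIN colors"
).fetchall()

# Self-join with a less-than condition: the same table under two aliases,
# yielding each unordered pair of distinct sizes exactly once.
distinct_pairs = conn.execute("""
    SELECT a.size, b.size
    FROM sizes a
    JOIN sizes b ON a.size < b.size
""").fetchall()

print(len(pairs), sorted(distinct_pairs))
```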
example
table Game has columns (gameName, gameID)
table Participant has columns (participantID, participantName, gameID)
The gameID column is the "link" between the 2 tables. You need a common column that you can join the 2 tables on.
SELECT gameName, participantName
FROM Game g
JOIN Participant p ON g.gameID = p.gameID
This will return a data set of all games and the participants for those games.
The game name will be repeated for every participant of that game unless you structure the result some other way.
sample data
WOW Bob
WOW Jake
StarCraft2 Neal
Warcraft3 James
Warcraft3 Rich
Diablo Chris
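The sample output above can be reproduced with a small sketch. It assumes SQLite via Python's sqlite3 module; the gameID and participantID values are invented, since the sample data only shows names. The final loop shows one possible way to avoid the repeated game names, by grouping the joined rows in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Game (gameID INTEGER PRIMARY KEY, gameName TEXT);
    CREATE TABLE Participant (
        participantID INTEGER PRIMARY KEY,
        participantName TEXT,
        gameID INTEGER
    );
    -- ID values are assumptions; only the names come from the sample data.
    INSERT INTO Game VALUES (1, 'WOW'), (2, 'StarCraft2'), (3, 'Warcraft3'), (4, 'Diablo');
    INSERT INTO Participant VALUES
        (1, 'Bob', 1), (2, 'Jake', 1), (3, 'Neal', 2),
        (4, 'James', 3), (5, 'Rich', 3), (6, 'Chris', 4);
""")

rows = conn.execute("""
    SELECT gameName, participantName
    FROM Game g
    JOIN Participant p ON g.gameID = p.gameID
    ORDER BY g.gameID, p.participantID
""").fetchall()

# Collapse the repeated game names into one entry per game.
by_game = {}
for game_name, participant_name in rows:
    by_game.setdefault(game_name, []).append(participant_name)

print(rows)
print(by_game)
```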