Refactor SQL (workaround RIGHT OUTER JOIN) - sql

Since SQLite does not support RIGHT OUTER JOINS I pose the following challenge (read: invitation to do my work for me):
Refactor this query so it no longer utilises SQLite-unsupported constructs like RIGHT/FULL OUTER JOINs.
SELECT strings.*, translations.text
FROM translations INNER JOIN
language ON translations.language_id = language.id RIGHT OUTER JOIN
strings ON translations.string_id = strings.id
WHERE (language.handle = 'english')
I sense it can be achieved with subqueries or by pivoting the tables and performing a LEFT OUTER JOIN but my attempts have failed; my SQL's not what it used to be.
Here's a query builder outline showing the applicable schema: http://dl.getdropbox.com/u/264612/sql-refactor.PNG
First to crack it gets an e-hug from dekz

The following is untested.
select strings.*, translations.text
from strings left outer join translations
on translations.string_id = strings.id
and translations.language_id = (select id
from language
where language.handle = 'english')
I think this will give you all strings with the matching translation text where a suitable translation exists in English. Is that what you are trying to get?

Intriguing that SQLite allows LEFT OUTER JOINs but not RIGHT OUTER JOINs. Well, since it does allow LEFT OUTER JOINs, you're right, you can just rearrange the join order:
SELECT strings.*, translations.text
FROM strings LEFT OUTER JOIN (
translations INNER JOIN language ON translations.language_id = language.id
) tr ON tr.string_id = strings.id
WHERE (language.handle = 'english')
[EDIT: Applied Blorgbeard's suggestion of naming the joined table to get the query to parse -- hope it works now!]

Related

SQL: JOIN vs LEFT OUTER JOIN?

I have multiple SQL queries that look similar where one uses JOIN and another LEFT OUTER JOIN. I played around with SQL and found that it the same results are returned. The codebase uses JOIN and LEFT OUTER JOIN interchangeably. While LEFT JOIN seems to be interchangeable with LEFT OUTER JOIN, I cannot I cannot seem to find any information about only JOIN. Is this good practice?
Ex Query1 using JOIN
SQL
SELECT
id,
name
FROM
u_users customers
JOIN
t_orders orders
ON orders.status=='PAYMENT PENDING'
Ex. Query2 using LEFT OUTER JOIN
SQL
SELECT
id,
name
FROM
u_users customers
LEFT OUTER JOIN
t_orders orders
ON orders.status=='PAYMENT PENDING'
As previously noted above:
JOIN is synonym of INNER JOIN. It's definitively different from all
types of OUTER JOIN
So the question is "When should I use an outer join?"
Here's a good article, with several great diagrams:
https://www.sqlshack.com/sql-outer-join-overview-and-examples/
The short answer your your question is:
Prefer JOIN (aka "INNER JOIN") to link two related tables. In practice, you'll use INNER JOIN most of the time.
INNER JOIN is the intersection of the two tables. It's represented by the "green" section in the middle of the Venn diagram above.
Use an "Outer Join" when you want the left, right or both outer regions.
In your example, the result set happens to be the same: the two expressions happen to be equivalent.
ALSO: be sure to familiarize yourself with "Show Plan" (or equivalent) for your RDBMS: https://www.sqlshack.com/execution-plans-in-sql-server/
'Hope that helps...
First the theory:
A join is a subset of the left join (all other things equal). Under some circumstances they are identical
The difference is that the left join will include all the tuples in the left hand side relation (even if they don't match the join predicate), while the join will only include the tuples of the left hand side that match the predicate.
For instance assume we have to relations R and S.
Say we have to do R JOIN S (and R LEFT JOIN S) on some predicate p
J = R JOIN S on (p)
Now, identify the tuples of R that are not in J.
Finally, add those tuples to J (padding any attribute in J not in R with null)
This result is the left join:
R LEFT JOIN S (p)
So when all the tuples of the left hand side of the relation are in the JOIN, this result will be identical to the Left Join.
back to you problem:
Your JOIN is very likely to include all the tuples from Users. So the query is the same if you use JOIN or LEFT JOIN.
The two are exactly equivalent, because the WHERE clause turns the LEFT JOIN into an INNER JOIN.
When filtering on all but the first table in a LEFT JOIN, the condition should usually be in the ON clause. Presumably, you also have a valid join condition, connecting the two tables:
SELEC id, name
FROM u_users u LEFT JOIN
t_orders o
ON o.user_id = u.user_id AND o.status = 'PAYMENT PENDING';
This version differs from the INNER JOIN version, because this version returns all users even those with no pending payments.
Both are the same, there is no difference here.
You need to use the ON clause when using Join. It can match any data between two tables when you don't use the ON clause.
This can cause performance issue as well as map unwanted data.
If you want to see the differences you can use "execution plans".
for example, I used the Microsoft AdventureWorks database for the example.
LEFT OUTER JOIN :
LEFT JOIN :
If you use the ON clause as you wrote, there is a possibility of looping.
Example "execution plans" is below.
You can access the correct mapping and data using the ON clause complement.
select
id,
name
from
u_users customers
left outer join
t_orders orders on customers.id = orders.userid
where orders.status=='payment pending'

Odd Outer Joining Structure for a View

I am importing a view from an old database as part of a software upgrade and I saw this FROM clause as I was working.
FROM
t_store_master AS STM
RIGHT OUTER JOIN
dbo.t_shipping AS SHP
INNER JOIN
t_hu_master AS HUM
INNER JOIN
t_stored_item AS STO ON HUM.hu_id = STO.hu_id
ON SHP.shipment_id = HUM.reserved_for
AND SHP.location_id = HUM.location_id
INNER JOIN
dbo.t_carrier AS CAR ON SHP.carrier_code = CAR.carrier_code
LEFT OUTER JOIN
dbo.t_order_detail AS ORD
INNER JOIN
dbo.t_order AS ORM ON ORD.order_number = ORM.order_number
INNER JOIN
dbo.t_pick_detail PDL ON ORD.order_number = PDL.order_number
ON STO.type = PDL.pick_id
ON STM.store_id = HUM.control_number_2
It is my understanding that OUTER JOIN requires an ON clause in order to be interpreted properly by the compiler. However, there must be some edge case scenarios I am unaware of because the code above returns good data without throwing any errors.
After some goggling and reading up on MSDN standards for OUTER JOINs, I am still at a loss for what the LEFT OUTER JOIN and RIGHT OUTER JOIN are doing in this query.
I do know that the multiple ON clauses beneath the INNER JOINs are necessary. There seems to be some kind of hidden or implied mapping being done between the ON clauses and the OUTER JOINs. Past that I cannot tell what the purpose of writing a query this way would be.
Could someone shed some insight on how this works and why it would be written this way?
This is allowed. It makes slightly more sense with parentheses:
FROM t_store_master STM RIGHT OUTER JOIN
(dbo.t_shipping SHP INNER JOIN
(t_hu_master HUM INNER JOIN
t_stored_item STO
ON HUM.hu_id = STO.hu_id
)
ON SHP.shipment_id = HUM.reserved_for AND
SHP.location_id = HUM.location_id
) INNER JOIN
(dbo.t_carrier CAR
ON SHP.carrier_code = CAR.carrier_code LEFT OUTER JOIN
((dbo.t_order_detail ORD INNER JOIN
dbo.t_order ORM
ON ORD.order_number = ORM.order_number
) INNER JOIN
dbo.t_pick_detail PDL
ON ORD.order_number = PDL.order_number
)
ON STO.type = PDL.pick_id
)
ON STM.store_id = HUM.control_number_2
That said, I would recommend never writing a query like this and rewriting the query ASAP if it is in production code. At the very least, add the parentheses!
Parentheses are almost never needed in the FROM clause to express JOINs (there is one case where I do happen to use them). Non-interleaved ON clauses are never needed -- or at least, I have never had occasion to use them or think they were the best way to write a query. But, both are allowed.

Condition on main table in inner join vs in where clause

Is there any reason on an INNER JOIN to have a condition on the main table vs in the WHERE clause?
Example in INNER JOIN:
SELECT
(various columns here from each table)
FROM dbo.MainTable AS m
INNER JOIN dbo.JohnDataRecord AS jdr
ON m.ibID = jdr.ibID
AND m.MainID = #MainId -- question here
AND jdr.SentDate IS NULL
LEFT JOIN dbo.PTable AS p1
ON jdr.RecordID = p1.RecordID
LEFT JOIN dbo.DataRecipient AS dr
ON jdr.RecipientID = dr.RecipientID
(more left joins here)
WHERE
dr.lastRecordID IS NOT NULL;
Query with condition in WHERE clause:
SELECT
(various columns here from each table)
FROM dbo.MainTable AS m
INNER JOIN dbo.JohnDataRecord AS jdr
ON m.ibID = jdr.ibID
AND jdr.SentDate IS NULL
LEFT JOIN dbo.PTable AS p1
ON jdr.RecordID = p1.RecordID
LEFT JOIN dbo.DataRecipient AS dr
ON jdr.RecipientID = dr.RecipientID
(more left joins here)
WHERE
m.MainID = #MainId -- question here
AND dr.lastRecordID IS NOT NULL;
Difference in other similar questions that are more general whereas this is specific to SQL Server.
In the scope of the question,
Is there any reason on an INNER JOIN to have a condition on the main
table vs in the WHERE clause?
This is a STYLE choice for the INNER JOIN.
From a pure style reflection point of view:
While there is no hard and fast rule for STYLE, it is generally observed that this is a less often used style choice. For example that might generally lead to more challenging maintenance such as if someone where to remove the INNER JOIN and all the subsequent ON clause conditions, it would effect the primary table result set, OR make the query more difficult to debug/understand when it is a very complex set of joins.
It might also be noted that this line might be placed on many INNER JOINS further adding to the confusion.

SQL: Chaining Joins Efficiency

I have a query in my WordPress plugin like this:
SELECT users.*, U.`meta_value` AS first_name,M.`meta_value` AS last_name
FROM `nwp_users` AS users
LEFT JOIN `nwp_usermeta` U
ON users.`ID`=U.`user_id`
LEFT JOIN `nwp_usermeta` M
ON users.`ID`=M.`user_id`
LEFT JOIN `nwp_usermeta` C
ON users.`ID`=C.`user_id`
WHERE U.meta_key = 'first_name'
AND M.meta_key = 'last_name'
AND C.meta_key = 'nwp_capabilities'
ORDER BY users.`user_login` ASC
LIMIT 0,10
I'm new to using JOIN and I'm wondering how efficient it is to use so many JOIN in one query. Is it better to split it up into multiple queries?
The database schema can be found here.
JOIN usually isn't so bad if the keys are indexed. LEFT JOIN is almost always a performance hit and you should avoid it if possible. The difference is that LEFT JOIN will join all rows in the joined table even if the column you're joining is NULL. While a regular (straight) JOIN just joins the rows that match.
Post your table structure and we can give you a better query.
See this comment:
http://forums.mysql.com/read.php?24,205080,205274#msg-205274
For what it's worth, to find out what MySQL is doing and to see if you have indexed properly, always check the EXPLAIN plan. You do this by putting EXPLAIN before your query (literally add the word EXPLAIN before the query), then run it.
In your query, you have a filter AND C.meta_key = 'nwp_capabilities' which means that all the LEFT JOINs above it can be equally written as INNER JOINs. Because if the LEFT JOINS fail (LEFT OUTER is intended to preserve the results from the left side), the result will 100% be filtered out by the WHERE clause.
So a more optimal query would be
SELECT users.*, U.`meta_value` AS first_name,M.`meta_value` AS last_name
FROM `nwp_users` AS users
JOIN `nwp_usermeta` U
ON users.`ID`=U.`user_id`
JOIN `nwp_usermeta` M
ON users.`ID`=M.`user_id`
JOIN `nwp_usermeta` C
ON users.`ID`=C.`user_id`
WHERE U.meta_key = 'first_name'
AND M.meta_key = 'last_name'
AND C.meta_key = 'nwp_capabilities'
ORDER BY users.`user_login` ASC
LIMIT 0,10
(note: "JOIN" (alone) = "INNER JOIN")
Try explaining the query to see what is going on and if your select if optimized. If you haven't used explain before read some tutorials:
http://www.learn-mysql-tutorial.com/OptimizeQueries.cfm
http://www.databasejournal.com/features/mysql/article.php/1382791/Optimizing-MySQL-Queries-and-Indexes.htm

Queries that implicit SQL joins can't do?

I've never learned how joins work but just using select and the where clause has been sufficient for all the queries I've done. Are there cases where I can't get the right results using the WHERE clause and I have to use a JOIN? If so, could someone please provide examples? Thanks.
Implicit joins are more than 20 years out-of-date. Why would you even consider writing code with them?
Yes, they can create problems that explicit joins don't have. Speaking about SQL Server, the left and right join implicit syntaxes are not guaranteed to return the correct results. Sometimes, they return a cross join instead of an outer join. This is a bad thing. This was true even back to SQL Server 2000 at least, and they are being phased out, so using them is an all around poor practice.
The other problem with implicit joins is that it is easy to accidentally do a cross join by forgetting one of the where conditions, especially when you are joining too many tables. By using explicit joins, you will get a syntax error if you forget to put in a join condition and a cross join must be explicitly specified as such. Again, this results in queries that return incorrect values or are fixed by using distinct to get rid of the cross join which is inefficient at best.
Moreover, if you have a cross join, the maintenance developer who comes along in a year to make a change doesn't know if it was intended or not when you use implicit joins.
I believe some ORMs also now require explicit joins.
Further, if you are using implied joins because you don't understand how joins operate, chances are high that you are writing code that, in fact, does not return the correct result because you don't know how to evaluate what the correct result would be since you don't understand what a join is meant to do.
If you write SQL code of any flavor, there is no excuse for not thoroughly understanding joins.
Yes. When doing outer joins. You can read this simple article on joins. Joins are not hard to understand at all so you should start learning (and using them where appropriate) right away.
Are there cases where I can't get the right results using the WHERE clause and I have to use a JOIN?
Any time your query involves two or more tables, a join is being used. This link is great for showing the differences in joins with pictures as well as sample result sets.
If the join criteria is in the WHERE clause, then the ANSI-89 JOIN syntax is being used. The reason for the newer JOIN syntax in the ANSI-92 format, is that it made LEFT JOIN more consistent across various databases. For example, Oracle used (+) on the side that was optional while in SQL Server you had to use =*.
Implicit join syntax by default uses Inner joins. It is sometimes possible to modify the implicit join syntax to specify outer joins, but it is vendor dependent in my experience (i know oracle has the (-) and (+) notation, and I believe sqlserver uses *= ). So, I believe your question can be boiled down to understanding the differences between inner and outer joins.
We can look at a simple example for an inner vs outer join using a simple query..........
The implicit INNER join:
select a.*, b.*
from table a, table b
where a.id = b.id;
The above query will bring back ONLY rows where the 'a' row has a matching row in 'b' for it's 'id' field.
The explicit OUTER JOIN:
select * from
table a LEFT OUTER JOIN table b
on a.id = b.id;
The above query will bring back EVERY row in a, whether or not it has a matching row in 'b'. If no match exists for 'b', the 'b' fields will be null.
In this case, if you wanted to bring back EVERY row in 'a' regardless of whether it had a corresponding 'b' row, you would need to use the outer join.
Like I said, depending on your database vendor, you may still be able to use the implicit join syntax and specify an outer join type. However, this ties you to that vendor. Also, any developers not familiar wit that specialized syntax may have difficulty understanding your query.
Any time you want to combine the results of two tables you'll need to join them. Take for example:
Users table:
ID
FirstName
LastName
UserName
Password
and Addresses table:
ID
UserID
AddressType (residential, business, shipping, billing, etc)
Line1
Line2
City
State
Zip
where a single user could have his home AND his business address listed (or a shipping AND a billing address), or no address at all. Using a simple WHERE clause won't fetch a user with no addresses because the addresses are in a different table. In order to fetch a user's addresses now, you'll need to do a join as:
SELECT *
FROM Users
LEFT OUTER JOIN Addresses
ON Users.ID = Addresses.UserID
WHERE Users.UserName = "foo"
See http://www.w3schools.com/Sql/sql_join.asp for a little more in depth definition of the different joins and how they work.
Using Joins :
SELECT a.MainID, b.SubValue AS SubValue1, b.SubDesc AS SubDesc1, c.SubValue AS SubValue2, c.SubDesc AS SubDesc2
FROM MainTable AS a
LEFT JOIN SubValues AS b ON a.MainID = b.MainID AND b.SubTypeID = 1
LEFT JOIN SubValues AS c ON a.MainID = c.MainID AND b.SubTypeID = 2
Off-hand, I can't see a way of getting the same results as that by using a simple WHERE clause to join the tables.
Also, the syntax commonly used in WHERE clauses to do left and right joins (*= and =*) is being phased out,
Oracle supports LEFT JOIN and RIGHT JOIN using their special join operator (+) (and SQL Server used to support *= and =* on join predicates, but no longer does). But a simple FULL JOIN can't be done with implicit joins alone:
SELECT f.title, a.first_name, a.last_name
FROM film f
FULL JOIN film_actor fa ON f.film_id = fa.film_id
FULL JOIN actor a ON fa.actor_id = a.actor_id
This produces all films and their actors including all the films without actor, as well as the actors without films. To emulate this with implicit joins only, you'd need unions.
-- Inner join part
SELECT f.title, a.first_name, a.last_name
FROM film f, film_actor fa, actor a
WHERE f.film_id = fa.film_id
AND fa.actor_id = a.actor_id
-- Left join part
UNION ALL
SELECT f.title, null, null
FROM film f
WHERE NOT EXISTS (
SELECT 1
FROM film_actor fa
WHERE fa.film_id = f.film_id
)
-- Right join part
UNION ALL
SELECT null, a.first_name, a.last_name
FROM actor a
WHERE NOT EXISTS (
SELECT 1
FROM film_actor fa
WHERE fa.actor_id = a.actor_id
)
This will quickly become very inefficient both syntactically as well as from a performance perspective.