SQL Inner Join. ON condition vs WHERE clause - sql

I am busy converting a query using the old style syntax to the new join syntax. The essence of my query is as follows :
Original Query
SELECT i.*
FROM
InterestRunDailySum i,
InterestRunDetail ird,
InterestPayments p
WHERE
p.IntrPayCode = 187
AND i.IntRunCode = p.IntRunCode AND i.ClientCode = p.ClientCode
AND ird.IntRunCode = p.IntRunCode AND ird.ClientCode = p.ClientCode
New Query
SELECT i.*
FROM InterestPayments p
INNER JOIN InterestRunDailySum i
ON (i.IntRunCode = p.IntRunCode AND i.ClientCode = p.ClientCode)
INNER JOIN InterestRunDetail ird
ON (ird.IntRunCode = p.IntRunCode AND ird.IntRunCode = p.IntRunCode)
WHERE
p.IntrPayCode = 187
In this example, "Original Query" returns 46 rows, where "New Query" returns over 800
Can someone explain the difference to me? I would have assumed that these queries are identical.

The problem is with your join to InterestRunDetail. You are joining on IntRunCode twice.
The correct query should be:
SELECT i.*
FROM InterestPayments p
INNER JOIN InterestRunDailySum i
ON (i.IntRunCode = p.IntRunCode AND i.ClientCode = p.ClientCode)
INNER JOIN InterestRunDetail ird
ON (ird.IntRunCode = p.IntRunCode AND ird.ClientCode = p.ClientCode)
WHERE
p.IntrPayCode = 187

The "new query" is the one compatible with the current ANSI SQL standard for JOINs.
Also, I find query #2 much cleaner:
you are almost forced to think about and specify the join condition(s) between two tables - you will not accidentally have cartesian products in your query. If you happen to list ten tables, but only six join conditions in your WHERE clause - you'll get a lot more data back than expected!
your WHERE clause isn't cluttered with join conditions and thus it's cleaner, less messy, easier to read and understand
the type of your JOIN (whether INNER JOIN, LEFT OUTER JOIN, CROSS JOIN) is typically a lot easier to see - since you spell it out. With the "old-style" syntax, the difference between those join types is rather hard to see, buried somewhere in your lots of WHERE criteria.....
Functionally, the two are identical - #1 might be deprecated sooner or later by some query engines.
Also see Aaron Bertrand's excellent Bad Habits to Kick - using old-style JOIN syntax blog post for more info - and while you're at it - read all "bad habits to kick" posts - all very much worth it!

Related

Why does Django QuerySet produce a query with 2 inner joins with AND clauses, instead or a single inner join with OR, when chaining filters?

I'm following the Django docs on making queries, and this example showed up:
Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)
I was hoping that the corresponding SQL query to this involved only one inner join and a OR clause, since the corresponding results are entries that either satisfy one or both of the conditions.
Nevertheless, this is what an inspection to the queryset's query returned:
>>> qs = Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)
>>> print(qs.query)
SELECT "blog_blog"."id", "blog_blog"."name", "blog_blog"."tagline"
FROM "blog_blog"
INNER JOIN "blog_entry" ON ("blog_blog"."id" = "blog_entry"."blog_id")
INNER JOIN "blog_entry" T3 ON ("blog_blog"."id" = T3."blog_id")
WHERE ("blog_entry"."headline" LIKE %Lennon% ESCAPE '\'
AND T3."pub_date" BETWEEN 2008-01-01 AND 2008-12-31)
You can see that it makes TWO inner joins.
Wouldn't the result be the same as:
SELECT "blog_blog"."id", "blog_blog"."name", "blog_blog"."tagline"
FROM "blog_blog"
INNER JOIN "blog_entry" ON ("blog_blog"."id" = "blog_entry"."blog_id")
WHERE ("blog_entry"."headline" LIKE %Lennon% ESCAPE '\' OR "blog_entry"."pub_date" BETWEEN 2008-01-01 AND 2008-12-31)
?
And this query is faster.
Well, after some research I've learned a few things.
First, from this answer, the way to implement an OR query is:
Blog.objects.filter(Q(entry__headline__contains='Lennon') | Q(entry__pub_date__year=2008))
And, from this fiddling:
https://www.db-fiddle.com/f/f8SGzTLeyr7DNZUaCx9HVL/0
I realized that the two queries in the OP don't get the same results: in the 2 inner joins, there is a cartesian product of results (2 "Lennon" * 2 "2008"), while in the second one there are 3 results (and it's indeed faster).
I assume, it is because you do filter() twice.
Try
qs = Blog.objects.filter(entry__headline__contains='Hello', entry__pub_date__year=2008)

script optimization in sql

I am running simple select script, which inner join with other 3 table . all the tables are big ( lots of data ) its taking around 20 sec to run. want to optimized it.
I tried to used nolock , but not much deference
SELECT RR.ReportID,
RR.RequestFormat,
RRP.SequenceNumber,
RRP.ParameterName,
RRP.ParameterValue
CASE WHEN RP.ParameterLabelOvrrd IS NULL THEN P.ParameterLabel ELSE .ParameterLabelOvrrd END AS ParameterLabelChosen,
RRP.ParameterValueEntered
FROM ReportRequestParameters AS RRP WITH (NOLOCK)
INNER JOIN ReportRequests AS RR WITH (NOLOCK) ON RRP.RequestID = RR.RequestID
INNER JOIN ReportParameter AS RP WITH (NOLOCK) ON RP.ReportID = RR.ReportID
AND RP.SequenceNumber = RRP.SequenceNumber
INNER JOIN Parameter AS P WITH (NOLOCK) ON P.ParameterID = RP.ParameterID
WHERE RRP.RequestID = '2226765'
ORDER BY SequenceNumber;
Please advice.
This is your query:
SELECT RR.ReportID, RR.RequestFormat, RRP.SequenceNumber,
RRP.ParameterName, RRP.ParameterValue
COALESCE(RP.ParameterLabelOvrrd, P.ParameterLabel) as ParameterLabelChosen,
RRP.ParameterValueEntered
FROM ReportRequestParameters RRP JOIN
ReportRequests RR
ON RRP.RequestID = RR.RequestID JOIN
ReportParameter RP
ON RP.ReportID = RR.ReportID AND
RP.SequenceNumber = RRP.SequenceNumber JOIN
Parameter P
ON P.ParameterID = RP.ParameterID
WHERE RRP.RequestID = 2226765
ORDER BY RRP.SequenceNumber;
I have removed the single quotes on 2226765, assuming that the id is a number. Mixing types can impede the optimizer.
Then, I recommend an index on ReportRequestParameters(RequestID, SequenceNumber). I assume the other tables have indexes on the appropriate columns, but these are:
ReportRequests(RequestID, ReportID, SequenceNumber)
ReportParameter(ReportID, SequenceNumber, ParameterID)
Parameter(ParameterID)
I strongly advise you not to use nolock, unless you know what you are doing. Aaron Bertrand has a good blog post on this subject.
I would suggest running with the execution plan turned on and see if SSMS can advise you on additional indexing.
Other than that your query looks straight-forward, nothing code wise that is going to help make it faster, other than perhaps getting rid of the case statement and definitely getting rid of the NOLOCK statements.

Convert Legacy SQL Outer JOIN *=, =* to ANSI

I need to convert a legacy SQL outer Join to ANSI.
The reason for that being, we're upgrading from a legacy DB instance (2000/5 ?) to SQL 2016.
Legacy SQL query :-
SELECT
--My Data to Select--
FROM counterparty_alias ca1,
counterparty_alias ca2,
counterparty cp,
party p
WHERE cp.code *= ca1.counterparty_code AND
ca1.alias = 'Party1' AND
cp.code *= ca2.counterparty_code AND
ca2.alias = 'Party2' AND
cp.code *= p.child_code AND
cp.category in ('CAT1','CAT2')
Here, Party1 and Party2 Are the party type codes and CAT1 and CAT2 are the category codes. They're just data; I have abstracted it, because the values don't really matter.
Now, when I try to replace the *= with a LEFT OUTER JOIN, I get a huge mismatch on the Data, both in terms of the number of rows, as well as the Data itself.
The query I'm using is this :
What am I doing wrong ?
SELECT
--My Data to Select--
FROM
counterparty cp
LEFT OUTER JOIN counterparty_alias ca1 ON cp.code = ca1.counterparty_code
LEFT OUTER JOIN counterparty_alias ca2 ON cp.code = ca2.counterparty_code
LEFT OUTER JOIN party p ON cp.code = p.child_code
WHERE
ca1.alias = 'Party1' AND
ca2.alias = 'Party2' AND
cp.category in ('CAT1','CAT2')
Clearly , in all the three legacy joins , the cp (counterparty) table is on the Left hand Side of the *=. So that should translate to a LEFT OUTER JOIN WITH all the three tables. However, my solution doesn't seem to to be working
How can I fix this ? What am I doing wrong here ?
Any help would be much appreciated. Thanks in advance :)
EDIT
I also have another query like this :
SELECT
--My Data to Select--
FROM dbo.deal d,
dbo.deal_ccy_option dvco,
dbo.deal_valuation dv,
dbo.strike_modifier sm
WHERE d.deal_id = dvco.deal_id
AND d.deal_id = dv.deal_id
AND dvco.base + dvco.quoted *= sm.ccy_pair
AND d.maturity_date *= sm.expiry_date
In this case, both the dvco and d tables seem to be doing a LEFT OUTER JOIN on the same table sm. How do I proceed about this ?
Maybe join in on the same table and use an alias sm1 and sm2 ?
Or should I use sm as the central table and change the join to RIGHT OUTER JOIN on dvco and d tables ?
I think the problem with your translation is that you are using conditions on the right tables in the where clause instead of in the on clause.
When I tried to translate it, this is the translation I've got:
FROM counterparty cp
LEFT JOIN counterparty_alias ca1 ON cp.code = ca1.counterparty_code
AND ca1.alias = 'Party1'
LEFT JOIN counterparty_alias ca2 ON cp.code *= ca2.counterparty_code
AND ca2.alias = 'Party2'
LEFT JOIN party p ON cp.code = p.child_code
WHERE cp.category in ('CAT1','CAT2')
However, it's hard to know if I'm correct since you didn't provide sample data, desired results, or even a complete query.
If you're doing a conversion, it has been my experience that *= is a RIGHT OUTER JOIN and =* is a LEFT OUTER JOIN in terms of a straight conversion.
I am converting hundreds of stored procs and views now and through testing this is what matches. I run the query as the original first, then make the changes and re-run it with the ANSI compliant code.
The data returned needs to be the same for consistency in our application.
So for your second query I think it would look something like this:
FROM dbo.deal d
INNER JOIN dbo.deal_ccy_option dvco ON d.deal_id = dvco.deal_id
INNER JOIN dbo.deal_valuation dv ON d.deal_id = dv.deal_id
RIGHT OUTER JOIN dbo.strike_modifier sm ON d.maturity_date = sm.expiry_date
AND (dvco.base + dvco.quoted) = sm.ccy_pair
Thanks for the help and sorry for the late post, but I got it to work with a quick hack, using the Query Designer Tool inbuilt in SSMS. It simply refactored all my queries and put in the correct Join, Either Left or Right , and the Where condition as an AND condition on the Join itself, so I was getting the correct data result set for both pre and post, only sometimes the data sorting/ordering was a little off.
I got lost with deadlines and couldnt update with the solution earlier. Thanks again for the help. Hope this helps someone else too !!
Still a little bit unsure though why the ordering/sorting was a little off if the Join condition was the same and the filters as well, because data was a 100 % match.
To get the query Designer to Work , just select your legacy SQL, and
open the Query Designer by pressing Ctrl + Shift + Q or Goto Main Menu
ToolBar => Query => Design Query in Editor.
Thats it. This will refactor your legacy code to new ANSI standards. You wll get the converted query with the new Joins that you can copy and test. Worked 100% of the time for me, except in some cases where the sorting was not matching, which you can check by adding a simple order by clause to both pre and post to compare the data.
For reference, I cross checked with this post :
http://sqlblog.com/blogs/john_paul_cook/archive/2013/03/02/using-the-query-designer-to-convert-non-ansi-joins-to-ansi.aspx

Ignore null values in select statement

I'm trying to retrieve a list of components via my computer_system, BUT if a computer system's graphics card is set to null (I.e. It has an onboard), the row isn't returned by my select statement.
I've been trying to use COALESCE without results. I've also tried with and OR in my WHERE clause, which then just returns my computer system with all different kinds of graphic cards.
Relevant code:
SELECT
computer_system.cs_id,
computer_system.cs_name,
motherboard.name,
motherboard.price,
cpu.name,
cpu.price,
gfx.name,
gfx.price
FROM
public.computer_case ,
public.computer_system,
public.cpu,
public.gfx,
public.motherboard,
public.ram
WHERE
computer_system.cs_ram = ram.ram_id AND
computer_system.cs_cpu = cpu.cpu_id AND
computer_system.cs_mb = motherboard.mb_id AND
computer_system.cs_case = computer_case.case_id AND
computer_system.cs_gfx = gfx.gfx_id; <-- ( OR computer_system.cs_gfx IS NULL)
Returns:
1;"Computer1";"Fractal Design"; 721.00; "MSI Z87"; 982.00; "Core i7 I7-4770K "; 2147.00; "Crucial Gamer"; 1253.00; "ASUS GTX780";3328.00
Should I use Joins? Is there no easy way to say return the requested row, even if there's a bloody NULL value. Been struggling with this for at least 2 hours.
Tables will be posted if needed.
EDIT: It should return a second row:
2;"Computer2";"Fractal Design"; 721.00; "MSI Z87"; 982.00; "Core i7 I7-4770K "; 2147.00; "Crucial Gamer"; 1253.00; "null/nothing";null/nothing
You want a LEFT OUTER JOIN.
First, clean up your code so you use ANSI joins so it's readable:
SELECT
computer_system.cs_id,
computer_system.cs_name,
motherboard.name,
motherboard.price,
cpu.name,
cpu.price,
gfx.name,
gfx.price
FROM
public.computer_system
INNER JOIN public.computer_case ON computer_system.cs_case = computer_case.case_id
INNER JOIN public.cpu ON computer_system.cs_cpu = cpu.cpu_id
INNER JOIN public.gfx ON computer_system.cs_gfx = gfx.gfx_id
INNER JOIN public.motherboard ON computer_system.cs_mb = motherboard.mb_id
INNER JOIN public.ram ON computer_system.cs_ram = ram.ram_id;
Then change the INNER JOIN on public.gfx to a LEFT OUTER JOIN:
LEFT OUTER JOIN public.gfx ON computer_system.cs_gfx = gfx.gfx_id
See PostgreSQL tutorial - joins.
I very strongly recommend reading an introductory tutorial to SQL - at least the PostgreSQL tutorial, preferably some more material as well.
It looks like it's just a bracket placement issue. Pull the null check and the graphics card id comparison into a clause by itself.
...
computer_system.cs_case = computer_case.case_id AND
(computer_system.cs_gfx IS NULL OR computer_system.cs_gfx = gfx.gfx_id)
Additionally, you ask if you should use joins. You are in fact using joins, by virtue of having multiple tables in your FROM clause and specifying the join criteria in the WHERE clause. Changing this to use the JOIN ON syntax might be a little easier to read:
FROM sometable A
JOIN someothertable B
ON A.somefield = B.somefield
JOIN somethirdtable C
ON A.somefield = C.somefield
etc
Edit:
You also likely want to make the join where you expect the null value to be a left outer join:
SELECT * FROM
first_table a
LEFT OUTER JOIN second_table b
ON a.someValue = b.someValue
If there is no match in the join, the row from the left side will still be returned.

SQL joins vs nested query

This two SQL statements return equal results but first one is much slower than the second:
SELECT leading.email, kstatus.name, contacts.status
FROM clients
JOIN clients_leading ON clients.id_client = clients_leading.id_client
JOIN leading ON clients_leading.id_leading = leading.id_leading
JOIN contacts ON contacts.id_k_p = clients_leading.id_group
JOIN kstatus on contacts.status = kstatus.id_kstatus
WHERE (clients.email = 'some_email' OR clients.email1 = 'some_email')
ORDER BY contacts.date DESC;
SELECT leading.email, kstatus.name, contacts.status
FROM (
SELECT *
FROM clients
WHERE (clients.email = 'some_email' OR clients.email1 = 'some_email')
)
AS clients
JOIN clients_leading ON clients.id_client = clients_leading.id_client
JOIN leading ON clients_leading.id_leading = leading.id_leading
JOIN contacts ON contacts.id_k_p = clients_leading.id_group
JOIN kstatus on contacts.status = kstatus.id_kstatus
ORDER BY contacts.date DESC;
But I'm wondering why is it so? It looks like in the firt statement joins are done first and then WHERE clause is applied and in second is just the opposite. But will it behave the same way on all DB engines (I tested it on MySQL)?
I was expecting DB engine can optimize queries like the fors one and firs apply WHERE clause and then make joins.
There are a lot of different reasons this could be (keying etc), but you can look at the explain mysql command to see how the statements are being executed. If you can run that and if it still is a mystery post it.
You always can replace join with nested query... It's always faster but lot messy...