Request Optimization (CTE, multiple LEFT JOIN, WHERE with OR) - sql

Can anyone give me advice on how to optimize this query?
The real query has more LEFT JOINs than this example (20+), mainly to look up values through foreign keys. What optimization is possible there?
A CTE is used to build the aggregates, but the tables inside the CTE are also joined in the main query, so is the CTE useful?
The WHERE clause has a simple condition on the main table plus a second condition that ORs together fields from several tables. Would it be better to add a column holding the maximum of the three dates and use a single second condition (without OR)?
SQL Server 2015+
WITH
cte AS
(
SELECT
e_ofcte.id,
SUM(CASE WHEN f_ofcte.lib='G' THEN 1 ELSE 0 END) AS n1,
SUM(CASE WHEN f_ofcte.lib='H' THEN 1 ELSE 0 END) AS n2
FROM e_ofcte
INNER JOIN f_ofcte ON f_ofcte.id=e_ofcte.id
WHERE f_ofcte.lib IN ('G','H')
AND e_ofcte.date>=DATEFROMPARTS(YEAR(CURRENT_TIMESTAMP)-2,1,1)
GROUP BY
e_ofcte.id
)
SELECT
a.id,
b.sid,
c.sid,
cte.n1,
cte.n2
FROM a
LEFT JOIN cte ON a.id=cte.id
LEFT JOIN b ON a.id=b.id
LEFT JOIN c ON a.id=c.id
LEFT JOIN e_ofcte ON a.id=e_ofcte.id
LEFT JOIN i ON a.id=i.id
LEFT JOIN j ON a.id=j.id
LEFT JOIN f_ofcte ON a.id=f_ofcte.id
WHERE a.code='A'
AND
(
a.date>=DATEFROMPARTS(YEAR(CURRENT_TIMESTAMP)-2,1,1)
OR
b.date>=DATEFROMPARTS(YEAR(CURRENT_TIMESTAMP)-2,1,1)
OR
c.date>=DATEFROMPARTS(YEAR(CURRENT_TIMESTAMP)-2,1,1)
)

Moving the "OR" conditions into the JOIN would return different results, so my answer to that would be "no", unless you know exactly what you are doing.
There are several approaches you can try to improve performance:
- Move the CTE to a temporary table if you can. It makes the query smaller, which helps the optimizer come up with the best plan. You can also tune the two parts separately.
- Possibly build a filtered index on table A(id,date) with "WHERE code='A'". That would work if the number of filtered records is relatively small.
- Possibly build a filtered index on table f_ofcte(id) with "WHERE lib IN ('G','H')".
- Build indexes on the other tables on (id,date).
Not sure if you provided the full query, but it looks like the following part is completely unused:
LEFT JOIN e_ofcte ON a.id=e_ofcte.id
LEFT JOIN i ON a.id=i.id
LEFT JOIN j ON a.id=j.id
LEFT JOIN f_ofcte ON a.id=f_ofcte.id
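For what it's worth, here is a minimal sketch of the temporary-table and filtered-index suggestions. It runs against an in-memory SQLite database through Python's sqlite3 module purely so that it is self-contained. The sample rows and the fixed cutoff date are invented for illustration. On SQL Server the rough equivalents would be SELECT ... INTO #agg and a filtered CREATE INDEX, with the cutoff coming from DATEFROMPARTS as in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, code TEXT, date TEXT);
CREATE TABLE e_ofcte (id INTEGER, date TEXT);
CREATE TABLE f_ofcte (id INTEGER, lib TEXT);
-- invented sample rows
INSERT INTO a VALUES (1,'A','2024-03-01'), (2,'A','2010-01-01'), (3,'B','2024-05-01');
INSERT INTO e_ofcte VALUES (1,'2024-02-01'), (2,'2024-02-01');
INSERT INTO f_ofcte VALUES (1,'G'), (1,'G'), (1,'H'), (2,'H'), (2,'Z');
""")

# The aggregate from the question, with a hypothetical fixed cutoff date.
AGG = """
SELECT e.id,
       SUM(CASE WHEN f.lib='G' THEN 1 ELSE 0 END) AS n1,
       SUM(CASE WHEN f.lib='H' THEN 1 ELSE 0 END) AS n2
FROM e_ofcte e
JOIN f_ofcte f ON f.id = e.id
WHERE f.lib IN ('G','H') AND e.date >= '2023-01-01'
GROUP BY e.id
"""

# Variant 1: the aggregate stays inline as a CTE.
with_cte = con.execute(
    f"WITH cte AS ({AGG}) "
    "SELECT a.id, cte.n1, cte.n2 FROM a LEFT JOIN cte ON a.id = cte.id "
    "WHERE a.code = 'A' ORDER BY a.id").fetchall()

# Variant 2: materialize the aggregate once, then join the small table.
con.execute(f"CREATE TEMP TABLE agg AS {AGG}")
# Partial index, SQLite's counterpart of a filtered index on a(id, date).
con.execute("CREATE INDEX ix_a_code_a ON a (id, date) WHERE code = 'A'")
with_temp = con.execute(
    "SELECT a.id, agg.n1, agg.n2 FROM a LEFT JOIN agg ON a.id = agg.id "
    "WHERE a.code = 'A' ORDER BY a.id").fetchall()

print(with_cte)   # [(1, 2, 1), (2, 0, 1)]
print(with_temp)  # same rows
```

Both variants return the same rows. The point of the second one is only that the aggregate and the main query can then be inspected and tuned separately.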

Related

Difference between Where and Join on Id

I recently saw this query, which finds all the party a client can go to:
SELECT *
FROM Party
INNER JOIN Organizer on Organizer.OrganizerId = Party.OrganizerId
LEFT JOIN Client on Client.ClientID = 1
LEFT JOIN PartyRegistration on PartyRegistration.PartyId = Party.PartyId
WHERE Party.OrganizerId = 0
AND (Party.HasGuestList = 0 OR PartyRegistration.ClientId = Client.ClientId)
I had never seen a join on a specific value before. Is it normal to see SQL code like this?
I don't have much knowledge of left joins but it can apply to any join, for example, how would this:
SELECT *
FROM Party
INNER JOIN Organizer on Organizer.OrganizerId = 0
compare to that since the results are the same:
SELECT *
FROM Party
INNER JOIN Organizer on Organizer.OrganizerId = Party.OrganizerId
WHERE Organizer.OrganizerId = 0
This is very good practice -- in fact, you cannot (easily) get this logic in a WHERE clause.
A LEFT JOIN returns all rows in the first table -- even when there are no matches in the second.
So, this returns all rows in the preceding tables -- and any rows from Client where ClientId = 1. If there is no match on that ClientId, then the columns will be NULL, but the rows are not filtered.
This can only be a matter of good/bad practice if you compare to an alternative. Putting a test in a left join on vs a where does two different things--so it's not a matter of good/bad practice.
If that is the correct left join condition, meaning you want inner join rows on that condition plus unmatched left table rows, then that is the left join condition. It wouldn't go anywhere else.
Learn what left join on returns: inner join on rows plus unmatched left table rows extended by nulls. Always know what inner join you want as part of a left join.
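A small sketch of that difference, using an in-memory SQLite database through Python's sqlite3 module so it can be run as-is. The two tables are cut down from the question and the rows are invented. Note that there is deliberately no client with id 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Party (PartyId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Client (ClientId INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Party VALUES (10, 'Gala'), (11, 'Picnic');
INSERT INTO Client VALUES (2, 'Bob');  -- no client with id 1
""")

# Test in ON: every Party row survives, Client columns come back NULL.
on_rows = con.execute("""
    SELECT Party.PartyId, Client.ClientId
    FROM Party
    LEFT JOIN Client ON Client.ClientId = 1
    ORDER BY Party.PartyId
""").fetchall()

# Test in WHERE: the rows are filtered after the join, so nothing is left.
where_rows = con.execute("""
    SELECT Party.PartyId, Client.ClientId
    FROM Party
    LEFT JOIN Client ON 1 = 1
    WHERE Client.ClientId = 1
    ORDER BY Party.PartyId
""").fetchall()

print(on_rows)     # [(10, None), (11, None)]
print(where_rows)  # []
```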
That is true. A left join on a specific value is really bad practice. Sometimes, though, we need all the columns from a table that has no common column to join on, and we have to join on a specific condition such as A="some value". Even then, a LEFT JOIN on a specific condition is bad practice, and it can be written a little better. The updated code is below:
SELECT *
FROM Party
INNER JOIN Organizer on Organizer.OrganizerId = Party.OrganizerId
LEFT JOIN Client USING(CLIENTID)
LEFT JOIN PartyRegistration on PartyRegistration.PartyId = Party.PartyId
WHERE CLIENTID=1 AND Party.OrganizerId = 0
AND (Party.HasGuestList = 0 OR PartyRegistration.ClientId = Client.ClientId)

How to reduce execution time of a select query

I have a query which is given below. My doubt is regarding its record fetching time. Is there any better way to fetch records than this method?
select product_code,product_name,price,taxPercentage,discount_type,discount_amount,profit_type,profit_amount,purchase_code, qty
from (
select distinct p.product_code,p.product_name,pid.price,t.percentage as taxPercentage,p.discount_type,p.discount_amount,p.profit_type,p.profit_amount,
pu.purchase_code,pid.quantity+isnull(sum(sri.quantity),0) -isnull(sum(si.quantity),0) -isnull(sum(pri.quantity),0) as qty
from tbl_product p
left join tbl_purchase_item pid on p.product_code=pid.product_code
left join tbl_purchase pu on pu.purchase_code=pid.purchase_code
left join tbl_tax t on t.tax_code=p.tax_code
left join tbl_sale_item si on si.product_code=p.product_code
left join tbl_sale s on s.sale_code=si.sale_code
left join tbl_sale_return sr on sr.sale_code=s.sale_code
left join tbl_sale_return_item sri on sri.sale_return_code=sr.sale_return_code
left join tbl_purchase_return_item pri on pri.purchase_code=pu.purchase_code
group by p.product_code,p.product_name,pid.price,t.percentage,p.discount_type,p.discount_amount,p.profit_type,p.profit_amount,pu.purchase_code,pid.quantity
) as abc
where qty >0
I do not know what your database looks like. You have a lot of joins, and I guess that is the root of the slowness.
First, make sure you have indexed all the columns used in the joins.
If that does not help, try some denormalization. That introduces some redundancy into your database, but read time will improve.
Join the smaller table with the larger table.
Consider an index on the table.
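As a rough illustration of the indexing advice, the sketch below checks whether a join column is actually served by an index. It uses SQLite through Python's sqlite3 module only to stay self-contained. The two tables are reduced versions of the ones in the question, and the index name is made up. On SQL Server you would look at the execution plan instead of EXPLAIN QUERY PLAN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl_product (product_code TEXT PRIMARY KEY, product_name TEXT);
CREATE TABLE tbl_sale_item (sale_code TEXT, product_code TEXT, quantity INTEGER);
""")

QUERY = """
SELECT p.product_code, si.quantity
FROM tbl_product p
LEFT JOIN tbl_sale_item si ON si.product_code = p.product_code
"""

def plan(sql):
    # Each plan row has four columns; the last one is the readable detail.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)

# Hypothetical index name; the column is the one used in the join condition.
con.execute("CREATE INDEX ix_sale_item_product ON tbl_sale_item (product_code)")
after = plan(QUERY)

for line in before:
    print("before:", line)
for line in after:
    print("after: ", line)
```

The exact plan text depends on the SQLite version, so the thing to look for is simply whether the new index name shows up in the "after" lines.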

What's the difference between filtering in the WHERE clause compared to the ON clause?

I would like to know if there is any difference in using the WHERE clause or using the matching in the ON of the inner join.
The result in this case is the same.
First query:
with Catmin as
(
select categoryid, MIN(unitprice) as mn
from production.Products
group by categoryid
)
select p.productname, mn
from Catmin
inner join Production.Products p
on p.categoryid = Catmin.categoryid
and p.unitprice = Catmin.mn;
Second query:
with Catmin as
(
select categoryid, MIN(unitprice) as mn
from production.Products
group by categoryid
)
select p.productname, mn
from Catmin
inner join Production.Products p
on p.categoryid = Catmin.categoryid
where p.unitprice = Catmin.mn; -- this is changed
My answer may be a bit off-topic, but I would like to highlight a problem that may occur when you turn your INNER JOIN into an OUTER JOIN.
In this case, the most important difference between putting predicates (test conditions) on the ON or WHERE clauses is that you can turn LEFT or RIGHT OUTER JOINS into INNER JOINS without noticing it, if you put fields of the table to be left out in the WHERE clause.
For example, in a LEFT JOIN between tables A and B, if you include a condition that involves fields of B on the WHERE clause, there's a good chance that there will be no null rows returned from B in the result set. Effectively, and implicitly, you turned your LEFT JOIN into an INNER JOIN.
On the other hand, if you include the same test in the ON clause, null rows will continue to be returned.
For example, take the query below:
SELECT * FROM A
LEFT JOIN B
ON A.ID=B.ID
The query will also return rows from A that do not match any of B.
Take this second query:
SELECT * FROM A
LEFT JOIN B
ON 1=1 -- a LEFT JOIN needs an ON clause; this always-true placeholder keeps the query valid
WHERE A.ID=B.ID
This second query won't return any rows from A that don't match B, even though you think it will because you specified a LEFT JOIN. That's because the test A.ID=B.ID will leave out of the result set any rows with B.ID that are null.
That's why I favor putting predicates in the ON clause rather than in the WHERE clause.
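Here is a runnable sketch of the same effect with a predicate on a column of B. It uses SQLite through Python's sqlite3 module so it is self-contained. The Status column and the sample rows are assumptions added for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (ID INTEGER PRIMARY KEY);
CREATE TABLE B (ID INTEGER, Status TEXT);  -- Status is a made-up column
INSERT INTO A VALUES (1), (2), (3);
INSERT INTO B VALUES (1, 'open'), (2, 'closed');
""")

# Predicate on B in the ON clause: all three rows of A are kept.
in_on = con.execute("""
    SELECT A.ID, B.Status FROM A
    LEFT JOIN B ON A.ID = B.ID AND B.Status = 'open'
    ORDER BY A.ID
""").fetchall()

# Same predicate in WHERE: NULL-extended rows fail it, so the
# LEFT JOIN behaves like an INNER JOIN.
in_where = con.execute("""
    SELECT A.ID, B.Status FROM A
    LEFT JOIN B ON A.ID = B.ID
    WHERE B.Status = 'open'
    ORDER BY A.ID
""").fetchall()

print(in_on)     # [(1, 'open'), (2, None), (3, None)]
print(in_where)  # [(1, 'open')]
```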
The results are exactly the same.
Using the ON clause is generally recommended because it can improve the performance of the query.
Instead of fetching the data from the tables and then filtering, with the ON clause you filter the first data set and then join it to the other tables. So there is less data to match and the result comes back faster.
There is no difference between the outputs of the two queries above; both return the same result.
With the ON clause, the join operation joins only those rows that match the condition specified in the ON clause.
With the WHERE clause, the join operation joins all the rows and then filters them based on the specified WHERE condition.
So the ON clause is more efficient and should be preferred over the WHERE condition.

When I add a LEFT OUTER JOIN, the query returns only a few rows

The original query returns 160k rows. When I add the LEFT OUTER JOIN:
LEFT OUTER JOIN Table_Z Z WITH (NOLOCK) ON A.Id = Z.Id
the query returns only 150 rows. I'm not sure what I'm doing wrong.
All I need to do is add a column to the query, which will bring back a code from a different table. The code could be a number or a NULL. I still have to display NULL, hence the reason for the LEFT join. They should join on the "id" columns.
SELECT <lots of stuff> + the new column that I need (called "code").
FROM
dbo.Table_A A WITH (NOLOCK)
INNER JOIN
dbo.Table_B B WITH (NOLOCK) ON A.Id = B.Id AND A.version = B.version
--this is where I added the LEFT OUTER JOIN. with it, the query returns 150 rows, without it, 160k rows.
LEFT OUTER JOIN
Table_Z Z WITH (NOLOCK) ON A.Id = Z.Id
LEFT OUTER JOIN
Table_E E WITH (NOLOCK) ON A.agent = E.agent
LEFT OUTER JOIN
Table_D D WITH (NOLOCK) ON E.location = D.location
AND E.type = 'Organization'
AND D.af_type = 'agent_location'
INNER JOIN
(SELECT X , MAX(Version) AS MaxVersion
FROM LocalTable WITH (NOLOCK)
GROUP BY agemt) P ON E.agent = P.location AND E.Version = P.MaxVersion
Does anyone have any idea what could be causing the issue?
When you perform a LEFT OUTER JOIN between tables A and E, you are maintaining your original set of data from A. That is to say, there is no data, or lack of data, in table E that can reduce the number of rows in your query.
However, when you then perform an INNER JOIN between E and P at the bottom, you are indeed opening yourself up to the possibility of reducing the number of rows returned. Rows where E is NULL cannot satisfy that join condition, so the preceding LEFT OUTER JOINs are effectively treated like INNER JOINs.
Now, without your exact schema and a set of data to test against, this may or may not be the exact issue you are experiencing. Still, as a general rule, always put your INNER JOINs before your OUTER JOINs. It can make writing queries like this much, much easier. Your most restrictive joins come first, and then you won't have to worry about breaking any of your outer joins later on.
As a quick fix, try changing your last join to P to a LEFT OUTER JOIN, just to see if the Z join works.
You have to be very careful once you start with LEFT JOINs.
Let's suppose this model: you have tables Products, Orders and Customers. Not every product has necessarily been ordered, but every order must have a customer entered.
Task: Show all products, and if the product was ordered, list the ordering customers; i.e., product without orders will be shown as one row, product with 10 orders will have 10 rows in the resultset. This calls for a query designed around FROM Products LEFT JOIN Orders.
Now someone could think "OK, Customer is always entered into orders, so I can make inner join from orders to customers". Wrong. Since the table Customers is joined through left-joined table Orders, it has to be left-joined itself... otherwise the inner join will propagate into the previous level(s) and as a result, you will lose all products that have no orders.
That is, once you join any table using LEFT JOIN, any subsequent tables that are joined through this table, need to keep LEFT JOINs. But it does not mean that once you use LEFT JOIN, all joins have to be of that type... only those that are dependent on the first performed LEFT JOIN. It would be perfectly fine to INNER JOIN the table Products with another table Category for example, if you only want to see Products which have a category set.
(Answer is based on this answer: http://www.sqlservercentral.com/Forums/Topic247971-8-1.aspx -> last entry)
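The Products / Orders / Customers scenario above can be sketched like this, using SQLite through Python's sqlite3 module so it runs on its own. The column names and rows are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY,
                     ProductId INTEGER,
                     CustomerId INTEGER NOT NULL);
INSERT INTO Products VALUES (1, 'Lamp'), (2, 'Desk');
INSERT INTO Customers VALUES (100, 'Ann');
INSERT INTO Orders VALUES (1000, 1, 100);  -- only the lamp was ordered
""")

# INNER JOIN through the left-joined Orders table: the desk disappears.
inner_after_left = con.execute("""
    SELECT p.Name, c.Name
    FROM Products p
    LEFT JOIN Orders o ON o.ProductId = p.ProductId
    INNER JOIN Customers c ON c.CustomerId = o.CustomerId
    ORDER BY p.ProductId
""").fetchall()

# LEFT JOIN kept for the dependent table: the desk stays, with NULL customer.
left_all_the_way = con.execute("""
    SELECT p.Name, c.Name
    FROM Products p
    LEFT JOIN Orders o ON o.ProductId = p.ProductId
    LEFT JOIN Customers c ON c.CustomerId = o.CustomerId
    ORDER BY p.ProductId
""").fetchall()

print(inner_after_left)  # [('Lamp', 'Ann')]
print(left_all_the_way)  # [('Lamp', 'Ann'), ('Desk', None)]
```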

Adding more condition while joining or in where which is better?

SELECT C.*
FROM Content C
INNER JOIN ContentPack CP ON C.ContentPackId = CP.ContentPackId
AND CP.DomainId = #DomainId
...and:
SELECT C.*
FROM Content C
INNER JOIN ContentPack CP ON C.ContentPackId = CP.ContentPackId
WHERE CP.DomainId = #DomainId
Is there any performance difference between these two queries?
Because both queries use an INNER JOIN, there is no difference -- they're equivalent.
That wouldn't be the case if dealing with an OUTER JOIN -- criteria in the ON clause is applied before the join; criteria in the WHERE is applied after the join.
But your query would likely run better as:
SELECT c.*
FROM CONTENT c
WHERE EXISTS (SELECT NULL
FROM CONTENTPACK cp
WHERE cp.contentpackid = c.contentpackid
AND cp.domainid = #DomainId)
Using a JOIN risks duplicates if there's more than one CONTENTPACK record related to a CONTENT record. And it's pointless to JOIN if your query is not using columns from the table being JOINed to... JOINs are not always the fastest way.
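To make the duplicate risk concrete, here is a small sketch using SQLite through Python's sqlite3 module so it can be run directly. It assumes ContentPackId is not unique in ContentPack, which the question does not state, and the rows are invented. The domain id is passed as a bound parameter in place of the placeholder in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Content (ContentId INTEGER PRIMARY KEY, ContentPackId INTEGER);
CREATE TABLE ContentPack (ContentPackId INTEGER, DomainId INTEGER);
INSERT INTO Content VALUES (1, 7), (2, 8);
-- assumption: pack 7 appears twice for domain 5, so the key is not unique
INSERT INTO ContentPack VALUES (7, 5), (7, 5), (8, 6);
""")
domain_id = 5  # hypothetical value

# JOIN: one output row per matching ContentPack row, so content 1 repeats.
joined = con.execute("""
    SELECT C.ContentId FROM Content C
    INNER JOIN ContentPack CP ON C.ContentPackId = CP.ContentPackId
    WHERE CP.DomainId = ?
""", (domain_id,)).fetchall()

# EXISTS: each Content row is returned at most once.
exists = con.execute("""
    SELECT C.ContentId FROM Content C
    WHERE EXISTS (SELECT NULL FROM ContentPack CP
                  WHERE CP.ContentPackId = C.ContentPackId
                    AND CP.DomainId = ?)
""", (domain_id,)).fetchall()

print(joined)  # [(1,), (1,)]
print(exists)  # [(1,)]
```

If ContentPackId is in fact unique, both forms return the same rows and the choice is mostly about readability.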
There's no performance difference, but I would prefer the inner join because I think it makes very clear what it is that you are trying to join on in both tables.