Query vs query - why is this one quicker than the other?

I have the following two queries: the first is the original and the second is my slight "upgrade". The first takes nearly a second to run, and the second finishes before I can get my finger completely off the refresh button.
My question: why?
The only difference between them is that the first uses COALESCE to get a value to compare berp.ID_PART_SESSION with, and the second uses a UNION of two SELECT statements to accomplish the same thing.
I still think the first one should be quicker (that was my original reason for using COALESCE), since it seems like it should do less work to reach the same result. Since I'm weak at deciphering execution plans, could someone please explain why the second query is so much better than the first?
declare @animator varchar(100) -- length assumed; a bare varchar in DECLARE means varchar(1)
SELECT TOP 1 @animator = FULL_NAME
FROM T_BERP berp
    INNER JOIN dbo.T_INTERV i ON i.ID_INTERV = berp.ID_INTERV
WHERE berp.VERSION = 1
    AND berp.PRINCIPAL = 1
    AND berp.DELETED = 0
    AND berp.CANCELLED = 0
    AND berp.ID_PART_SESSION = (
        select coalesce(pss.ID_PART_SESSION, psst.ID_PART_SESSION)
        from t_bersp b
            LEFT JOIN T_PART_SESSION pss ON b.ID_PART_SESSION = pss.ID_PART_SESSION
            LEFT JOIN T_PSS_TEMP psst ON b.ID_PSS_TEMP = psst.ID_PSS_TEMP
        where ID_BERSP = 4040)
vs
declare @animator varchar(100) -- length assumed
SELECT TOP 1 @animator = FULL_NAME
FROM dbo.T_BERP berp
    INNER JOIN dbo.T_INTERV i ON i.ID_INTERV = berp.ID_INTERV
WHERE berp.VERSION = 1
    AND berp.PRINCIPAL = 1
    AND berp.DELETED = 0
    AND berp.CANCELLED = 0
    AND berp.ID_PART_SESSION IN (
        select pss.ID_PART_SESSION
        from dbo.t_bersp b
            LEFT JOIN dbo.T_PART_SESSION pss ON b.ID_PART_SESSION = pss.ID_PART_SESSION
        where ID_BERSP = 4040
        union
        select psst.ID_PART_SESSION
        from dbo.t_bersp b
            LEFT JOIN dbo.T_PSS_TEMP psst ON b.ID_PSS_TEMP = psst.ID_PSS_TEMP
        where ID_BERSP = 4040)
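The question treats the two subqueries as equivalent. The sketch below checks that on toy data. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for SQL Server; table and column names follow the question, but the T_INTERV join, the VERSION/PRINCIPAL/DELETED/CANCELLED filters and TOP 1 are dropped, and all rows are invented. On this data the two forms agree only when, for the given ID_BERSP, at most one of the two LEFT JOINs yields a non-NULL ID_PART_SESSION: COALESCE keeps just the first non-NULL value, whereas the UNION keeps both.

```python
import sqlite3

# Simplified, hypothetical schema; names mirror the question.
SCHEMA = """
CREATE TABLE t_bersp (id_bersp INTEGER, id_part_session INTEGER, id_pss_temp INTEGER);
CREATE TABLE t_part_session (id_part_session INTEGER);
CREATE TABLE t_pss_temp (id_pss_temp INTEGER, id_part_session INTEGER);
CREATE TABLE t_berp (id_part_session INTEGER, full_name TEXT);
"""

COALESCE_FORM = """
SELECT full_name FROM t_berp
WHERE id_part_session = (
    SELECT COALESCE(pss.id_part_session, psst.id_part_session)
    FROM t_bersp b
    LEFT JOIN t_part_session pss ON b.id_part_session = pss.id_part_session
    LEFT JOIN t_pss_temp psst ON b.id_pss_temp = psst.id_pss_temp
    WHERE b.id_bersp = ?)
"""

UNION_FORM = """
SELECT full_name FROM t_berp
WHERE id_part_session IN (
    SELECT pss.id_part_session
    FROM t_bersp b
    LEFT JOIN t_part_session pss ON b.id_part_session = pss.id_part_session
    WHERE b.id_bersp = ?
    UNION
    SELECT psst.id_part_session
    FROM t_bersp b
    LEFT JOIN t_pss_temp psst ON b.id_pss_temp = psst.id_pss_temp
    WHERE b.id_bersp = ?)
"""


def compare(bersp_row):
    """Load invented data plus one t_bersp row (id 4040) and run both forms.

    bersp_row is (id_part_session, id_pss_temp); None becomes NULL.
    Returns (names matched by COALESCE form, names matched by UNION form).
    """
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO t_part_session VALUES (10)")
    con.execute("INSERT INTO t_pss_temp VALUES (7, 20)")  # temp 7 points at session 20
    con.executemany("INSERT INTO t_berp VALUES (?, ?)", [(10, "Alice"), (20, "Bob")])
    con.execute("INSERT INTO t_bersp VALUES (4040, ?, ?)", bersp_row)
    via_coalesce = sorted(r[0] for r in con.execute(COALESCE_FORM, (4040,)))
    via_union = sorted(r[0] for r in con.execute(UNION_FORM, (4040, 4040)))
    con.close()
    return via_coalesce, via_union


print(compare((10, None)))  # (['Alice'], ['Alice'])
print(compare((None, 7)))   # (['Bob'], ['Bob'])
print(compare((10, 7)))     # (['Alice'], ['Alice', 'Bob'])
```

A further difference the sketch cannot show: SQL Server raises an error when a subquery compared with `=` returns more than one row, while `IN` accepts any number of rows (SQLite silently uses the first row).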

It would be difficult to provide a definitive answer without understanding the relative sizes and indices of the various tables in your queries. One possibility: if t_part_session and t_pss_temp are both large, the query optimizer might be doing something inefficient with the two LEFT JOINs in the inner SELECT of your first query.
EDIT to clarify: Yes, there are LEFT JOINs in both queries, but my speculation was that having two of them together (query 1) might hurt performance compared with the UNION (query 2). Sorry if that wasn't clear initially.
Also, I highly recommend a tool such as the Instant SQL Formatter (combined with the {} icon in StackOverflow's editor) to make the queries in your question easier to read:
DECLARE @animator VARCHAR(100) -- length assumed
SELECT TOP 1 @animator = full_name
FROM   t_berp berp
       INNER JOIN dbo.t_interv i
               ON i.id_interv = berp.id_interv
WHERE  berp.version = 1
       AND berp.principal = 1
       AND berp.deleted = 0
       AND berp.cancelled = 0
       AND berp.id_part_session = (SELECT
               Coalesce(pss.id_part_session, psst.id_part_session)
                                   FROM   t_bersp b
                                          LEFT JOIN t_part_session pss
                                                 ON b.id_part_session =
                                                    pss.id_part_session
                                          LEFT JOIN t_pss_temp psst
                                                 ON b.id_pss_temporaire =
                                                    psst.id_pss_temporaire
                                   WHERE  id_bersp = 4040)
vs
DECLARE @animator VARCHAR(100) -- length assumed
SELECT TOP 1 @animator = full_name
FROM   dbo.t_berp berp
       INNER JOIN dbo.t_interv i
               ON i.id_interv = berp.id_interv
WHERE  berp.version = 1
       AND berp.principal = 1
       AND berp.deleted = 0
       AND berp.cancelled = 0
       AND berp.id_part_session IN (SELECT pss.id_part_session
                                    FROM   dbo.t_bersp b
                                           LEFT JOIN dbo.t_part_session pss
                                                  ON b.id_part_session =
                                                     pss.id_part_session
                                    WHERE  id_bersp = 4040
                                    UNION
                                    SELECT psst.id_part_session
                                    FROM   dbo.t_bersp b
                                           LEFT JOIN dbo.t_pss_temp psst
                                                  ON b.id_pss_temporaire =
                                                     psst.id_pss_temporaire
                                    WHERE  id_bersp = 4040)

I'm betting it's the COALESCE. I believe the COALESCE ends up being applied before the WHERE clause, so the query actually goes through every combination of the two tables and only THEN filters down to those that match the WHERE clause.

You can put both queries into the same batch in SSMS and include the actual execution plan; this will not only let you see the two plans side by side, it will also show their relative cost.
I suspect that the IN (UNION) means the second can be easily parallelized.
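SSMS is specific to SQL Server. As a rough analogue of comparing plans side by side, SQLite's EXPLAIN QUERY PLAN can be run on both query shapes from Python; the schema is a hypothetical, simplified version of the question's tables. SQLite's planner is not SQL Server's, so this only illustrates the workflow, not the plans or costs the asker would see.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t_bersp (id_bersp INTEGER PRIMARY KEY, id_part_session INTEGER, id_pss_temp INTEGER);
CREATE TABLE t_part_session (id_part_session INTEGER PRIMARY KEY);
CREATE TABLE t_pss_temp (id_pss_temp INTEGER PRIMARY KEY, id_part_session INTEGER);
CREATE TABLE t_berp (id_part_session INTEGER, full_name TEXT);
""")

coalesce_form = """
SELECT full_name FROM t_berp
WHERE id_part_session = (
    SELECT COALESCE(pss.id_part_session, psst.id_part_session)
    FROM t_bersp b
    LEFT JOIN t_part_session pss ON b.id_part_session = pss.id_part_session
    LEFT JOIN t_pss_temp psst ON b.id_pss_temp = psst.id_pss_temp
    WHERE b.id_bersp = ?)
"""

union_form = """
SELECT full_name FROM t_berp
WHERE id_part_session IN (
    SELECT pss.id_part_session FROM t_bersp b
    LEFT JOIN t_part_session pss ON b.id_part_session = pss.id_part_session
    WHERE b.id_bersp = ?
    UNION
    SELECT psst.id_part_session FROM t_bersp b
    LEFT JOIN t_pss_temp psst ON b.id_pss_temp = psst.id_pss_temp
    WHERE b.id_bersp = ?)
"""


def plan(sql, params):
    """Return the human-readable 'detail' column (last column) of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql, params)]


for label, sql, params in (("COALESCE", coalesce_form, (4040,)),
                           ("IN (UNION)", union_form, (4040, 4040))):
    print(label)
    for step in plan(sql, params):
        print("   ", step)
```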

Related

Need help in optimizing sql query

I am new to SQL and have written the query below to fetch the required results. However, it takes ages to run and is quite slow. Any help with optimizing it would be great.
Below is the SQL query I am using:
SELECT
    Date_trunc('week', a.pair_date) as pair_week,
    a.used_code,
    a.used_name,
    b.line,
    b.channel,
    count(
        case when b.sku = c.sku then used_code else null end
    )
from
    a
    left join b on a.ma_number = b.ma_number
        and (a.imei = b.set_id or a.imei = b.repair_imei)
    left join c on a.used_code = c.code
group by 1, 2, 3, 4, 5

I would rewrite the query as:
select Date_trunc('week', a.pair_date) as pair_week,
       a.used_code, a.used_name, b.line, b.channel,
       count(*) filter (where b.sku = c.sku)
from a left join
     b
     on a.ma_number = b.ma_number and
        a.imei in (b.set_id, b.repair_imei) left join
     c
     on a.used_code = c.code
group by 1, 2, 3, 4, 5;
For this query, you want indexes on b(ma_number, set_id, repair_imei) and c(code, sku). However, this doesn't leave much scope for optimization.
There might be some other possibilities, depending on the tables. For instance, OR/IN in the ON clause is usually a bad sign, but it is unclear what your intention really is.
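To check that the rewrite returns the same counts as the original, here is a minimal sketch on invented data, assuming SQLite via Python's sqlite3 in place of the asker's database. date_trunc is not available in SQLite, so the sketch drops pair_week and groups only by used_code; the aggregate FILTER clause needs SQLite 3.30 or later.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (ma_number INTEGER, imei TEXT, used_code TEXT);
CREATE TABLE b (ma_number INTEGER, set_id TEXT, repair_imei TEXT, sku TEXT);
CREATE TABLE c (code TEXT, sku TEXT);
INSERT INTO a VALUES (1, 'X1', 'U1'), (1, 'X2', 'U1'), (2, 'Y1', 'U2');
INSERT INTO b VALUES (1, 'X1', NULL, 'S1'), (1, 'Z9', 'X2', 'S2'), (2, 'Q0', 'Q1', 'S1');
INSERT INTO c VALUES ('U1', 'S1'), ('U2', 'S1');
""")

# Question's shape: OR in the join condition, COUNT over a CASE expression.
ORIGINAL = """
SELECT a.used_code,
       COUNT(CASE WHEN b.sku = c.sku THEN a.used_code ELSE NULL END)
FROM a
LEFT JOIN b ON a.ma_number = b.ma_number
           AND (a.imei = b.set_id OR a.imei = b.repair_imei)
LEFT JOIN c ON a.used_code = c.code
GROUP BY a.used_code
ORDER BY a.used_code
"""

# Answer's shape: IN list in the join condition, aggregate FILTER clause.
REWRITE = """
SELECT a.used_code,
       COUNT(*) FILTER (WHERE b.sku = c.sku)
FROM a
LEFT JOIN b ON a.ma_number = b.ma_number
           AND a.imei IN (b.set_id, b.repair_imei)
LEFT JOIN c ON a.used_code = c.code
GROUP BY a.used_code
ORDER BY a.used_code
"""


def run(sql):
    return con.execute(sql).fetchall()


print(run(ORIGINAL))  # [('U1', 1), ('U2', 0)]
print(run(REWRITE))   # [('U1', 1), ('U2', 0)]
```

The IN list in the ON clause is just shorthand for the OR it replaces (including when repair_imei is NULL), so the rewrite mainly improves readability; any speed-up has to come from the suggested indexes.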

Why do multiple EXISTS break a query

I am attempting to include a new table with values that need to be checked and included in a stored procedure. Statement 1 is the existing table that needs to be checked against, while statement 2 is the new table to check against.
I currently have 2 EXISTS conditions that function independently and produce the results I am expecting. By this I mean if I comment out Statement 1, statement 2 works and vice versa. When I put them together the query doesn't complete, there is no error but it times out which is unexpected because each statement only takes a few seconds.
I understand there is likely a better way to do this but before I do, I would like to know why I cannot seem to do multiple exists statements like this? Are there not meant to be multiple EXISTS conditions in the WHERE clause?
SELECT *
FROM table1 S
WHERE
    --Statement 1
    EXISTS
    (
        SELECT 1
        FROM table2 P WITH (NOLOCK)
        INNER JOIN table3 SA ON SA.ID = P.ID
        WHERE P.DATE = @Date AND P.OTHER_ID = S.ID
            AND
            (
                SA.FILTER = ''
                OR
                (
                    SA.FILTER = 'bar'
                    AND
                    LOWER(S.OTHER) = 'foo'
                )
            )
    )
    OR
    (
        --Statement 2
        EXISTS
        (
            SELECT 1
            FROM table4 P WITH (NOLOCK)
            INNER JOIN table5 SA ON SA.ID = P.ID
            WHERE P.DATE = @Date
                AND P.OTHER_ID = S.ID
                AND LOWER(S.OTHER) = 'foo'
        )
    )
EDIT: I have included the query details. Table 1-5 represent different tables, there are no repeated tables.

Too long to comment.
Your query as written seems correct. The timeout can really only be troubleshot from the execution plan, but here are a few things that could be happening or that you could benefit from:
Parameter sniffing on @Date. Try hard-coding this value and see if you still get the same slowness.
No covering index on P.OTHER_ID, P.DATE, P.ID or SA.ID, which would cause a table scan for these predicates.
Indexes on the above columns that aren't optimal (including too many columns, etc.).
Your query running serially when it might benefit from parallelism.
Using the LOWER function on a database that doesn't have a case-sensitive collation, where it isn't needed (most databases don't have one, though this function doesn't slow things down that much).
You have a bad query plan in cache. Try adding OPTION (RECOMPILE) at the bottom so you get a new query plan. This is also done when comparing the speed of two queries, to ensure they aren't using cached plans (or that one isn't while the other is, which would skew the results).
Since your query is timing out, try capturing the estimated execution plan and post it for us at Paste The Plan.

I found putting 2 EXISTS in the WHERE condition made the whole process take significantly longer. What I found fixed it was using UNION and keeping the EXISTS in separate queries. The final result looked like the following:
SELECT *
FROM table1 S
WHERE
    --Statement 1
    EXISTS
    (
        SELECT 1
        FROM table2 P WITH (NOLOCK)
        INNER JOIN table3 SA ON SA.ID = P.ID
        WHERE P.DATE = @Date AND P.OTHER_ID = S.ID
            AND
            (
                SA.FILTER = ''
                OR
                (
                    SA.FILTER = 'bar'
                    AND
                    LOWER(S.OTHER) = 'foo'
                )
            )
    )
UNION
--Statement 2
SELECT *
FROM table1 S
WHERE
    EXISTS
    (
        SELECT 1
        FROM table4 P WITH (NOLOCK)
        INNER JOIN table5 SA ON SA.ID = P.ID
        WHERE P.DATE = @Date
            AND P.OTHER_ID = S.ID
            AND LOWER(S.OTHER) = 'foo'
    )
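The UNION rewrite should return the same rows as the OR of the two EXISTS conditions. A minimal sketch that checks this on invented data, assuming SQLite via Python's sqlite3: the WITH (NOLOCK) hints are dropped (SQLite has no equivalent), only S.id is selected, and @Date becomes the named parameter :d.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (id INTEGER, other TEXT);
CREATE TABLE table2 (id INTEGER, other_id INTEGER, date TEXT);
CREATE TABLE table3 (id INTEGER, filter TEXT);
CREATE TABLE table4 (id INTEGER, other_id INTEGER, date TEXT);
CREATE TABLE table5 (id INTEGER);
INSERT INTO table1 VALUES (1, 'Foo'), (2, 'foo'), (3, 'baz'), (4, 'FOO');
INSERT INTO table2 VALUES (100, 1, '2024-01-01'), (101, 3, '2024-01-01');
INSERT INTO table3 VALUES (100, 'bar'), (101, '');
INSERT INTO table4 VALUES (200, 2, '2024-01-01'), (201, 4, '2024-01-02');
INSERT INTO table5 VALUES (200), (201);
""")

STATEMENT_1 = """EXISTS (SELECT 1 FROM table2 P JOIN table3 SA ON SA.id = P.id
    WHERE P.date = :d AND P.other_id = S.id
      AND (SA.filter = '' OR (SA.filter = 'bar' AND LOWER(S.other) = 'foo')))"""
STATEMENT_2 = """EXISTS (SELECT 1 FROM table4 P JOIN table5 SA ON SA.id = P.id
    WHERE P.date = :d AND P.other_id = S.id AND LOWER(S.other) = 'foo')"""

# Both conditions in one WHERE clause, as in the question.
OR_FORM = f"SELECT S.id FROM table1 S WHERE {STATEMENT_1} OR {STATEMENT_2} ORDER BY 1"
# One EXISTS per branch, combined with UNION, as in the accepted fix.
UNION_FORM = (f"SELECT S.id FROM table1 S WHERE {STATEMENT_1} "
              f"UNION SELECT S.id FROM table1 S WHERE {STATEMENT_2} ORDER BY 1")


def run(sql, date):
    """Return the matching table1 ids; the named parameter :d is reused in both branches."""
    return [row[0] for row in con.execute(sql, {"d": date})]


print(run(OR_FORM, "2024-01-01"))     # [1, 2, 3]
print(run(UNION_FORM, "2024-01-01"))  # [1, 2, 3]
```

One semantic difference to keep in mind: UNION removes duplicate rows, so if table1 can contain fully identical rows, the UNION version returns fewer rows than the OR version. UNION ALL keeps such duplicates but returns a row twice when it satisfies both EXISTS conditions.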

How to work with Multiple Joins without Duplicate Data SQL

I'm having an issue working with SQL data: once I have completed multiple joins, I am getting duplicate data.
Here is the code I wrote:
SELECT RPPlannedLabor.PeriodHrs,
       RPPlannedLabor.StartDate,
       (RPAssignment.WBS1 + ' ' + PR.Name) AS 'WBS1',
       RPAssignment.WBS2,
       EM.FirstName,
       EM.LastName,
       EM.TKGroup,
       (EM.FirstName + ' ' + EM.LastName) AS 'Full Name'
FROM RPPlannedLabor
LEFT OUTER JOIN RPAssignment
    ON RPPlannedLabor.AssignmentID = RPAssignment.AssignmentID
    AND RPAssignment.WBS1 IS NOT NULL
LEFT OUTER JOIN EM
    ON RPAssignment.ResourceID = EM.Employee
    AND EM.Status = 'a'
LEFT OUTER JOIN PR
    ON ((RPAssignment.WBS1 = PR.WBS1)
    AND (ISNULL(RPAssignment.WBS2,0) = ISNULL(PR.WBS2,0))
    AND (ISNULL(RPAssignment.WBS3,0) = ISNULL(PR.WBS3,0)))
    AND PR.Sublevel = 'Y'
Any help would be greatly appreciated :)

My guess is that the ISNULL portions of your join are matching a bunch of NULL fields and effectively cross joining, but that's just a guess. Data issues like this can't really be solved on a code help forum; the best I can do is teach you to troubleshoot.
Run this and get the row count:
SELECT count(1)
FROM RPPlannedLabor
Run this:
SELECT count(1)
FROM RPPlannedLabor
LEFT OUTER JOIN RPAssignment
    ON RPPlannedLabor.AssignmentID = RPAssignment.AssignmentID
    AND RPAssignment.WBS1 IS NOT NULL
Compare it with the first count. If the count increases, your duplicates come from this first join.
Doesn't increase? Keep iterating and run this:
SELECT count(1)
FROM RPPlannedLabor
LEFT OUTER JOIN RPAssignment
    ON RPPlannedLabor.AssignmentID = RPAssignment.AssignmentID
    AND RPAssignment.WBS1 IS NOT NULL
LEFT OUTER JOIN EM
    ON RPAssignment.ResourceID = EM.Employee
    AND EM.Status = 'a'
Compare it to your count above. Are there more records, or is it the same? More records means the join we just added is causing them. If not, my guess is that this is what's causing the duplicates:
LEFT OUTER JOIN PR
    ON ((RPAssignment.WBS1 = PR.WBS1)
    AND (ISNULL(RPAssignment.WBS2,0) = ISNULL(PR.WBS2,0))
    AND (ISNULL(RPAssignment.WBS3,0) = ISNULL(PR.WBS3,0)))
    AND PR.Sublevel = 'Y'
If you are joining on fields wrapped in ISNULL, odds are some of them are NULL, and more than one PR row may then match the same assignment. But I might be off, as your data issue could be anywhere.
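The count-after-each-join procedure above can be automated. A sketch using SQLite via Python's sqlite3, with hypothetical, trimmed-down versions of the tables and invented rows; SQLite has no two-argument ISNULL, so COALESCE is used in its place. Two PR rows share WBS1 = 'P1' with NULL WBS2/WBS3, so both match the same assignment.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE RPPlannedLabor (AssignmentID INTEGER, PeriodHrs REAL);
CREATE TABLE RPAssignment (AssignmentID INTEGER, WBS1 TEXT, WBS2 TEXT, WBS3 TEXT, ResourceID TEXT);
CREATE TABLE EM (Employee TEXT, Status TEXT);
CREATE TABLE PR (WBS1 TEXT, WBS2 TEXT, WBS3 TEXT, Sublevel TEXT, Name TEXT);
INSERT INTO RPPlannedLabor VALUES (1, 8), (2, 4);
INSERT INTO RPAssignment VALUES (1, 'P1', NULL, NULL, 'E1'), (2, 'P2', 'T1', NULL, 'E2');
INSERT INTO EM VALUES ('E1', 'a'), ('E2', 'a');
-- Both P1 rows have NULL WBS2/WBS3, so the COALESCE comparison matches each of them.
INSERT INTO PR VALUES ('P1', NULL, NULL, 'Y', 'Project 1'),
                      ('P1', NULL, NULL, 'Y', 'Project 1, second row'),
                      ('P2', 'T1', NULL, 'Y', 'Project 2');
""")

# Each entry is appended to the previous query, mirroring the step-by-step counts.
JOINS = [
    "",
    " LEFT OUTER JOIN RPAssignment"
    " ON RPPlannedLabor.AssignmentID = RPAssignment.AssignmentID"
    " AND RPAssignment.WBS1 IS NOT NULL",
    " LEFT OUTER JOIN EM ON RPAssignment.ResourceID = EM.Employee AND EM.Status = 'a'",
    " LEFT OUTER JOIN PR ON RPAssignment.WBS1 = PR.WBS1"
    " AND COALESCE(RPAssignment.WBS2, '0') = COALESCE(PR.WBS2, '0')"
    " AND COALESCE(RPAssignment.WBS3, '0') = COALESCE(PR.WBS3, '0')"
    " AND PR.Sublevel = 'Y'",
]


def row_counts():
    """Row count after adding each join in turn (index 0 = base table only)."""
    sql = "SELECT COUNT(1) FROM RPPlannedLabor"
    counts = []
    for join in JOINS:
        sql += join
        counts.append(con.execute(sql).fetchone()[0])
    return counts


def first_multiplying_join(counts):
    """Index into JOINS of the first join that increases the row count, or None."""
    for i in range(1, len(counts)):
        if counts[i] > counts[i - 1]:
            return i
    return None


print(row_counts())                          # [2, 2, 2, 3]
print(first_multiplying_join(row_counts()))  # 3
```

Here first_multiplying_join points at index 3, the PR join, which is the one this answer suspects.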

Refactoring slow SQL query

I currently have this very very slow query:
SELECT generators.id AS generator_id, COUNT(*) AS cnt
FROM generator_rows
JOIN generators ON generators.id = generator_rows.generator_id
WHERE
    generators.id IN (SELECT "generators"."id"
                      FROM "generators"
                      WHERE "generators"."client_id" = 5212
                        AND ("generators"."state" IN ('enabled'))) AND
    (
        generators.single_use = 'f' OR generators.single_use IS NULL OR
        generator_rows.id NOT IN (SELECT run_generator_rows.generator_row_id FROM run_generator_rows)
    )
GROUP BY generators.id;
And I'm trying to refactor/improve it with this query:
SELECT g.id AS generator_id, COUNT(*) AS cnt
from generator_rows gr
join generators g on g.id = gr.generator_id
join lateral (
    select case when exists (select * from run_generator_rows rgr where rgr.generator_row_id = gr.id)
                then 0 else 1 end as noRows
) has on true
where g.client_id = 5212 and "g"."state" IN ('enabled') AND
      (g.single_use = 'f' OR g.single_use IS NULL OR has.norows = 1)
group by g.id
For some reason it doesn't work as expected (it returns 0 rows). I think I'm pretty close to the end result but can't get it to work.
I'm running on PostgreSQL 9.6.1.

This appears to be the query, formatted so I can read it:
SELECT gr.generator_id, COUNT(*) AS cnt
FROM generators g JOIN
     generator_rows gr
     ON g.id = gr.generator_id
WHERE gr.generator_id IN (SELECT g.id
                          FROM generators g
                          WHERE g.client_id = 5212 AND
                                g.state = 'enabled'
                         ) AND
      (g.single_use = 'f' OR
       g.single_use IS NULL OR
       gr.id NOT IN (SELECT rgr.generator_row_id FROM run_generator_rows rgr)
      )
GROUP BY gr.generator_id;
I would be inclined to do most of this work in the FROM clause:
SELECT gr.generator_id, COUNT(*) AS cnt
FROM generators g JOIN
     generator_rows gr
     ON g.id = gr.generator_id JOIN
     generators gg
     ON g.id = gg.id AND
        gg.client_id = 5212 AND gg.state = 'enabled' LEFT JOIN
     run_generator_rows rgr
     ON gr.id = rgr.generator_row_id
WHERE g.single_use = 'f' OR
      g.single_use IS NULL OR
      rgr.generator_row_id IS NULL
GROUP BY gr.generator_id;
This does make two assumptions that I think are reasonable:
generators.id is unique
run_generator_rows.generator_row_id is unique
(It is easy to avoid these assumptions, but the duplicate elimination is more work.)
Then, some indexes could help:
generators(client_id, state, id)
run_generator_rows(generator_row_id)
generator_rows(generator_id)
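A quick way to check a rewrite like this is to run both versions on the same small data set. The sketch assumes SQLite via Python's sqlite3 instead of PostgreSQL, with invented rows. It applies the client/state filter to g directly rather than through the gg self-join (equivalent when generators.id is unique), declares run_generator_rows.generator_row_id UNIQUE to match the second assumption above, and matches rgr.generator_row_id against gr.id, the generator row id that the original NOT IN compares.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE generators (id INTEGER PRIMARY KEY, client_id INTEGER, state TEXT, single_use TEXT);
CREATE TABLE generator_rows (id INTEGER PRIMARY KEY, generator_id INTEGER);
CREATE TABLE run_generator_rows (generator_row_id INTEGER UNIQUE);
INSERT INTO generators VALUES (1, 5212, 'enabled', 'f'), (2, 5212, 'enabled', 't'),
                              (3, 5212, 'disabled', 'f'), (4, 999, 'enabled', NULL);
INSERT INTO generator_rows VALUES (10, 1), (11, 1), (20, 2), (21, 2), (22, 2), (30, 3), (40, 4);
INSERT INTO run_generator_rows VALUES (20), (21);
""")

# The question's original shape: IN (SELECT ...) plus NOT IN (SELECT ...).
ORIGINAL = """
SELECT generators.id AS generator_id, COUNT(*) AS cnt
FROM generator_rows
JOIN generators ON generators.id = generator_rows.generator_id
WHERE generators.id IN (SELECT id FROM generators
                        WHERE client_id = 5212 AND state IN ('enabled'))
  AND (generators.single_use = 'f' OR generators.single_use IS NULL
       OR generator_rows.id NOT IN (SELECT generator_row_id FROM run_generator_rows))
GROUP BY generators.id
ORDER BY generators.id
"""

# Anti-join rewrite: LEFT JOIN ... IS NULL replaces NOT IN.
REWRITE = """
SELECT gr.generator_id, COUNT(*) AS cnt
FROM generators g
JOIN generator_rows gr ON g.id = gr.generator_id
LEFT JOIN run_generator_rows rgr ON gr.id = rgr.generator_row_id
WHERE g.client_id = 5212 AND g.state = 'enabled'
  AND (g.single_use = 'f' OR g.single_use IS NULL OR rgr.generator_row_id IS NULL)
GROUP BY gr.generator_id
ORDER BY gr.generator_id
"""

print(con.execute(ORIGINAL).fetchall())  # [(1, 2), (2, 1)]
print(con.execute(REWRITE).fetchall())   # [(1, 2), (2, 1)]
```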

Generally, avoid inner selects such as
WHERE ... IN (SELECT ...)
as they are usually slow.
As the other answer already shows for your problem, it is a good idea to think of SQL in terms of set theory.
You do NOT join tables just by matching each row's identity.
In fact, you take (SQL takes) the set (that is, all rows) of the first table and "multiply" it with the set of the second table, ending up with n times m rows.
Then the ON clause is used to reduce that result (often drastically) by evaluating it for each of those combinations to either true (keep) or false/NULL (drop). This way you can choose any logic you like to select the combinations you want.
Things get trickier with LEFT JOIN and RIGHT JOIN, but you can think of them as keeping every row from one side no matter what. For each such row:
output the combinations for that row IF the ON logic yields true (at least once), exactly like JOIN does;
otherwise, output exactly ONE row, with "the other side" (the right side for LEFT JOIN, and vice versa) consisting of NULL in every column.
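The "multiply, then filter" model can be checked directly. A sketch with two hypothetical tables l and r in SQLite (via Python's sqlite3): filtering a CROSS JOIN with the join condition gives the same rows as JOIN ... ON, and LEFT JOIN adds one NULL-padded row for the unmatched l row. This is the logical model only; real optimizers do not build the full n times m product.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE l (id INTEGER, name TEXT);
CREATE TABLE r (l_id INTEGER, val TEXT);
INSERT INTO l VALUES (1, 'one'), (2, 'two'), (3, 'three');
INSERT INTO r VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

# Every combination: 3 rows in l times 3 rows in r.
PRODUCT = "SELECT COUNT(*) FROM l CROSS JOIN r"
# Keep only the combinations where the condition is true...
CROSS_FILTERED = "SELECT l.id, r.val FROM l CROSS JOIN r WHERE l.id = r.l_id ORDER BY l.id, r.val"
# ...which is what JOIN ... ON returns.
INNER = "SELECT l.id, r.val FROM l JOIN r ON l.id = r.l_id ORDER BY l.id, r.val"
# LEFT JOIN also emits one NULL-padded row for l.id = 3, which matched nothing.
LEFT = "SELECT l.id, r.val FROM l LEFT JOIN r ON l.id = r.l_id ORDER BY l.id, r.val"


def run(sql):
    return con.execute(sql).fetchall()


print(run(PRODUCT))  # [(9,)]
print(run(INNER))    # [(1, 'a'), (1, 'b'), (2, 'c')]
print(run(LEFT))     # [(1, 'a'), (1, 'b'), (2, 'c'), (3, None)]
```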
COUNT(*) is fine too, but when things get complicated, don't rely on it alone: use subselects that return only the keys, and once all the hard work is done, join the rest of the data to them. For example:
SELECT sub.id, sub.valid_count, details.*
FROM (
    SELECT id, SUM(valid) AS valid_count
    FROM (
        SELECT CASE WHEN X THEN 1 ELSE 0 END AS valid, id
        FROM ...
    ) AS flags
    GROUP BY id
) AS sub
JOIN ... AS details ON sub.id = details.id
The difference is that the inner query is executed only once. The outer query usually has no indexes left to work with and will be slow, but if the inner select doesn't make the data explode, this is usually many times faster than SELECT ... WHERE ... IN (SELECT ...) constructs.
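A runnable version of the aggregate-first pattern, assuming SQLite via Python's sqlite3; the events and details tables, the ok column and the data are hypothetical stand-ins for the placeholders (X, ...) above. Whether it beats WHERE ... IN (SELECT ...) depends on the engine, since many optimizers already turn IN (SELECT ...) into a semi-join, so it is worth measuring on your own data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (id INTEGER, ok INTEGER);
CREATE TABLE details (id INTEGER PRIMARY KEY, label TEXT);
INSERT INTO events VALUES (1, 1), (1, 0), (1, 1), (2, 0), (3, 1);
INSERT INTO details VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
""")

QUERY = """
SELECT sub.id, sub.valid_count, details.label
FROM (
    -- Aggregate on the keys only: one row per id.
    SELECT id, SUM(valid) AS valid_count
    FROM (
        -- Per-row flag, the 'CASE WHEN X THEN 1 ELSE 0 END' step.
        SELECT CASE WHEN ok = 1 THEN 1 ELSE 0 END AS valid, id
        FROM events
    ) AS flags
    GROUP BY id
) AS sub
-- Join the descriptive columns only after the heavy lifting is done.
JOIN details ON sub.id = details.id
ORDER BY sub.id
"""

print(con.execute(QUERY).fetchall())  # [(1, 2, 'alpha'), (2, 0, 'beta'), (3, 1, 'gamma')]
```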

Is there a better way to write this Oracle SQL query?

I have been using Oracle SQL for around 6 months, so I am still a beginner. I need to query the database to get information on all items in a particular order (the order number comes from $_GET['id']).
I have come up with the query below. It works as expected and as I need, but I do not know whether I am overcomplicating things in a way that slows the query down. I understand there are a number of ways to do the same thing, and since I am a beginner there may be better ways to write this query.
I am using Oracle 8i (because that is the version supplied with an application we use), so I believe some JOIN syntax etc. is not available in this version, but is there a better way to write a query such as the one below?
SELECT auf_pos.auf_pos,
       (SELECT auf_stat.anz
          FROM auf_stat
         WHERE auf_stat.auf_pos = auf_pos.auf_pos
           AND auf_stat.auf_nr = ".$_GET['id']."),
       (SELECT auf_text.zl_str
          FROM auf_text
         WHERE auf_text.zl_mod = 0
           AND auf_text.auf_pos = auf_pos.auf_pos
           AND auf_text.auf_nr = ".$_GET['id']."),
       (SELECT glas_daten_basis.gl_bez
          FROM glas_daten_basis
         WHERE glas_daten_basis.idnr = auf_pos.glas1),
       (SELECT lzr_daten.lzr_breite
          FROM lzr_daten
         WHERE lzr_daten.lzr_idnr = auf_pos.lzr1),
       (SELECT glas_daten_basis.gl_bez
          FROM glas_daten_basis
         WHERE glas_daten_basis.idnr = auf_pos.glas2),
       auf_pos.breite,
       auf_pos.hoehe,
       auf_pos.spr_jn
  FROM auf_pos
 WHERE auf_pos.auf_nr = ".$_GET['id']."
Thanks in advance to any Oracle gurus that could help this beginner out!

You could rewrite it using joins. If your subselects aren't expected to return any NULL values, then you can use INNER JOINS:
SELECT auf_pos.auf_pos,
       auf_stat.anz,
       auf_text.zl_str,
       g1.gl_bez,
       lzr_daten.lzr_breite,
       g2.gl_bez,
       auf_pos.breite,
       auf_pos.hoehe,
       auf_pos.spr_jn
  FROM auf_pos
 INNER JOIN auf_stat ON auf_stat.auf_pos = auf_pos.auf_pos AND auf_stat.auf_nr = ".$_GET['id']."
 INNER JOIN auf_text ON auf_text.zl_mod = 0 AND auf_text.auf_pos = auf_pos.auf_pos AND auf_text.auf_nr = ".$_GET['id']."
 -- glas_daten_basis is joined twice, so each copy needs its own alias
 INNER JOIN glas_daten_basis g1 ON g1.idnr = auf_pos.glas1
 INNER JOIN lzr_daten ON lzr_daten.lzr_idnr = auf_pos.lzr1
 INNER JOIN glas_daten_basis g2 ON g2.idnr = auf_pos.glas2
 WHERE auf_pos.auf_nr = ".$_GET['id']."
Or if there are cases where you wouldn't have matches on all the tables, you could replace the INNER joins with LEFT OUTER joins:
SELECT auf_pos.auf_pos,
       auf_stat.anz,
       auf_text.zl_str,
       g1.gl_bez,
       lzr_daten.lzr_breite,
       g2.gl_bez,
       auf_pos.breite,
       auf_pos.hoehe,
       auf_pos.spr_jn
  FROM auf_pos
  LEFT OUTER JOIN auf_stat ON auf_stat.auf_pos = auf_pos.auf_pos AND auf_stat.auf_nr = ".$_GET['id']."
  LEFT OUTER JOIN auf_text ON auf_text.zl_mod = 0 AND auf_text.auf_pos = auf_pos.auf_pos AND auf_text.auf_nr = ".$_GET['id']."
  -- glas_daten_basis is joined twice, so each copy needs its own alias
  LEFT OUTER JOIN glas_daten_basis g1 ON g1.idnr = auf_pos.glas1
  LEFT OUTER JOIN lzr_daten ON lzr_daten.lzr_idnr = auf_pos.lzr1
  LEFT OUTER JOIN glas_daten_basis g2 ON g2.idnr = auf_pos.glas2
 WHERE auf_pos.auf_nr = ".$_GET['id']."
Whether or not you see any performance gains is debatable. As I understand it, the Oracle query optimizer should take your query and execute it with a plan similar to that of the join queries, but this depends on a number of factors, so the best thing to do is give it a try.
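A small check of the answer's claim, assuming SQLite via Python's sqlite3 instead of Oracle and keeping only the two glas_daten_basis lookups; the data is invented. The order number is passed as a bind parameter instead of concatenating $_GET['id'] into the SQL string, which is open to SQL injection. Note also that Oracle 8i does not support the ANSI JOIN syntax (it arrived in Oracle 9i); there, the same joins are written in the WHERE clause, using the (+) operator for outer joins.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE auf_pos (auf_nr INTEGER, auf_pos INTEGER, glas1 INTEGER, glas2 INTEGER, breite INTEGER);
CREATE TABLE glas_daten_basis (idnr INTEGER PRIMARY KEY, gl_bez TEXT);
INSERT INTO auf_pos VALUES (7, 1, 100, 101, 500), (7, 2, 100, NULL, 600), (8, 1, 101, 101, 700);
INSERT INTO glas_daten_basis VALUES (100, 'Float 4mm'), (101, 'Float 6mm');
""")

# Question's style: one scalar subquery per lookup.
SUBSELECTS = """
SELECT p.auf_pos,
       (SELECT g.gl_bez FROM glas_daten_basis g WHERE g.idnr = p.glas1),
       (SELECT g.gl_bez FROM glas_daten_basis g WHERE g.idnr = p.glas2),
       p.breite
FROM auf_pos p
WHERE p.auf_nr = ?
ORDER BY p.auf_pos
"""

# Answer's style: the same table joined twice under two aliases.
LEFT_JOINS = """
SELECT p.auf_pos, g1.gl_bez, g2.gl_bez, p.breite
FROM auf_pos p
LEFT OUTER JOIN glas_daten_basis g1 ON g1.idnr = p.glas1
LEFT OUTER JOIN glas_daten_basis g2 ON g2.idnr = p.glas2
WHERE p.auf_nr = ?
ORDER BY p.auf_pos
"""

INNER_JOINS = LEFT_JOINS.replace("LEFT OUTER JOIN", "INNER JOIN")

order_id = 7  # stands in for $_GET['id'], bound as a parameter
print(con.execute(SUBSELECTS, (order_id,)).fetchall())
# [(1, 'Float 4mm', 'Float 6mm', 500), (2, 'Float 4mm', None, 600)]
print(con.execute(LEFT_JOINS, (order_id,)).fetchall())
# [(1, 'Float 4mm', 'Float 6mm', 500), (2, 'Float 4mm', None, 600)]
print(con.execute(INNER_JOINS, (order_id,)).fetchall())
# [(1, 'Float 4mm', 'Float 6mm', 500)]
```

The scalar subqueries and the LEFT OUTER JOINs agree as long as each lookup matches at most one row; INNER JOIN drops position 2 because its glas2 is NULL. If a lookup could match two rows, Oracle raises an error for the scalar subquery, while the join would silently return that position twice.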

We Keep Coding
