Locally, this query runs fine (<1sec) but on the clients side it takes around 3mins (With a different Execution Plan).
I've rebuilt the indexes, stats, etc. This decreased the time to 2:15. After investigating I found out that the issue revolves around the or statement beginning on the line
AND (a.p = '123456789' or ... If I restructure the query to use a union instead, the query takes <1sec. So what is it about this 'or' that causes the clients time to jump 2+ minutes?
select *
from foo_main A with(nolock)
where a.i = a.i
and (IsNull(A.v, '0') = '1')
and (IsNull(A.d, '') = 'CODE_B')
and A.c in ('CODE_B')
AND (a.p = '123456789' or
a.p IN (SELECT DISTINCT f.ui
FROM foo_faculty f
LEFT JOIN foo_unit ff ON (f.ui = ff.ui)
LEFT JOIN unit u ON (ff.ui = u.ui AND
f.c = u.c),
foo_Personnel p, foo_System s, unit u1
WHERE s.P = p.P
AND s.s = 'G'
AND s.R = 'R4'
AND p.ui = '1q2w3e4r5t6y'
AND p.ui = u1.ui
AND p.i = u1.i
AND u.i = u1.i
AND u.dI LIKE u1.DI + '%' COLLATE
SQL_Latin1_General_CP1_CS_AS))
order by lname, fname
Thank you for any and all help!
Sorry I should have mentioned before, the select * is not part of the original code. I changed it to minimize the amount of code so everyone could find the or statement more easily.
Use JOINS instead of IN () clause.
Related
I am new to sql and have created the below sql to fetch the required results.However the query seems to take ages in running and is quite slow. It will be great if any help in optimization is provided.
Below is the sql query i am using:
SELECT
Date_trunc('week',a.pair_date) as pair_week,
a.used_code,
a.used_name,
b.line,
b.channel,
count(
case when b.sku = c.sku then used_code else null end
)
from
a
left join b on a.ma_number = b.ma_number
and (a.imei = b.set_id or a.imei = b.repair_imei
)
left join c on a.used_code = c.code
group by 1,2,3,4,5
I would rewrite the query as:
select Date_trunc('week',a.pair_date) as pair_week,
a.used_code, a.used_name, b.line, b.channel,
count(*) filter (where b.sku = c.sku)
from a left join
b
on a.ma_number = b.ma_number and
a.imei in ( b.set_id, b.repair_imei ) left join
c
on a.used_code = c.code
group by 1,2,3,4,5;
For this query, you want indexes on b(ma_number, set_id, repair_imei) and c(code, sku). However, this doesn't leave much scope for optimization.
There might be some other possibilities, depending on the tables. For instance, or/in in the on clause is usually a bad sign -- but it is unclear what your intention really is.
I currently have this very very slow query:
SELECT generators.id AS generator_id, COUNT(*) AS cnt
FROM generator_rows
JOIN generators ON generators.id = generator_rows.generator_id
WHERE
generators.id IN (SELECT "generators"."id" FROM "generators" WHERE "generators"."client_id" = 5212 AND ("generators"."state" IN ('enabled'))) AND
(
generators.single_use = 'f' OR generators.single_use IS NULL OR
generator_rows.id NOT IN (SELECT run_generator_rows.generator_row_id FROM run_generator_rows)
)
GROUP BY generators.id;
An I'm trying to refactor it/improve it with this query:
SELECT g.id AS generator_id, COUNT(*) AS cnt
from generator_rows gr
join generators g on g.id = gr.generator_id
join lateral(select case when exists(select * from run_generator_rows rgr where rgr.generator_row_id = gr.id) then 0 else 1 end as noRows) has on true
where g.client_id = 5212 and "g"."state" IN ('enabled') AND
(g.single_use = 'f' OR g.single_use IS NULL OR has.norows = 1)
group by g.id
For reason it doesn't quite work as expected(It returns 0 rows). I think I'm pretty close to the end result but can't get it to work.
I'm running on PostgreSQL 9.6.1.
This appears to be the query, formatted so I can read it:
SELECT gr.generators_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id
WHERE gr.generators_id IN (SELECT g.id
FROM generators g
WHERE g.client_id = 5212 AND
g.state = 'enabled'
) AND
(g.single_use = 'f' OR
g.single_use IS NULL OR
gr.id NOT IN (SELECT rgr.generator_row_id FROM run_generator_rows rgr)
)
GROUP BY gr.generators_id;
I would be inclined to do most of this work in the FROM clause:
SELECT gr.generators_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id JOIN
generators gg
on g.id = gg.id AND
gg.client_id = 5212 AND gg.state = 'enabled' LEFT JOIN
run_generator_rows rgr
ON g.id = rgr.generator_row_id
WHERE g.single_use = 'f' OR
g.single_use IS NULL OR
rgr.generator_row_id IS NULL
GROUP BY gr.generators_id;
This does make two assumptions that I think are reasonable:
generators.id is unique
run_generator_rows.generator_row_id is unique
(It is easy to avoid these assumptions, but the duplicate elimination is more work.)
Then, some indexes could help:
generators(client_id, state, id)
run_generator_rows(id)
generator_rows(generators_id)
Generally avoid inner selects as in
WHERE ... IN (SELECT ...)
as they are usually slow.
As it was already shown for your problem it's a good idea to think of SQL as of set- theory.
You do NOT join tables on their sole identity:
In fact you take (SQL does take) the set (- that is: all rows) of the first table and "multiply" it with the set of the second table - thus ending up with n times m rows.
Then the ON- clause is used to (often strongly) reduce the result by simply selecting each one of those many combinations by evaluating this portion to either true (take) or false (drop). This way you can chose any arbitrary logic to select those combinations in favor.
Things get trickier with LEFT JOIN and RIGHT JOIN, but one can easily think of them as to take one side for granted:
output the combinations of that row IF the logic yields true (once at least) - exactly like JOIN does
output exactly ONE row, with 'the other side' (right side on LEFT JOIN and vice versa) consisting of ALL NULL for every column.
Count(*) is great either, but if things getting complicated don't stick to it: Use Sub- Selects for the keys only, and once all the hard word is done join the Fun- Stuff to it. Like in
SELECT SUM(VALID), ID
FROM SELECT
(
(1 IF X 0 ELSE) AS VALID, ID
FROM ...
)
GROUP BY ID) AS sub
JOIN ... AS details ON sub.id = details.id
Difference is: The inner query is executed only once. The outer query does usually have no indices left to work with and will be slow, but if the inner select here doesn't make the data explode this is usually many times faster than SELECT ... WHERE ... IN (SELECT..) constructs.
We are currently upgrading to SQL Server 2014; I have a join that runs fine in SQL Server 2008 R2 but returns duplicates in SQL Server 2014. The issue appears to be with the predicate AND L2.ACCOUNTING_PERIOD = RG.PERIOD_TO for if I change it to anything but 4, I do not get the duplicates. The query is returning those values in Accounting Period 4 twice. This query gets account balances for all the previous Accounting Periods so in this case it returns values for Accounting Periods 0, 1, 2 and 3 correctly but then duplicates the values from Period 4.
SELECT
A.ACCOUNT,
SUM(A.POSTED_TRAN_AMT),
SUM(A.POSTED_BASE_AMT),
SUM(A.POSTED_TOTAL_AMT)
FROM
PS_LEDGER A
LEFT JOIN PS_GL_ACCOUNT_TBL B
ON B.SETID = 'LTSHR'
LEFT OUTER JOIN PS_LEDGER L2
ON A.BUSINESS_UNIT = L2.BUSINESS_UNIT
AND A.LEDGER = L2.LEDGER
AND A.ACCOUNT = L2.ACCOUNT
AND A.ALTACCT = L2.ALTACCT
AND A.DEPTID = L2.DEPTID
AND A.PROJECT_ID = L2.PROJECT_ID
AND A.DATE_CODE = L2.DATE_CODE
AND A.BOOK_CODE = L2.BOOK_CODE
AND A.GL_ADJUST_TYPE = L2.GL_ADJUST_TYPE
AND A.CURRENCY_CD = L2.CURRENCY_CD
AND A.STATISTICS_CODE = L2.STATISTICS_CODE
AND A.FISCAL_YEAR = L2.FISCAL_YEAR
AND A.ACCOUNTING_PERIOD = L2.ACCOUNTING_PERIOD
AND L2.ACCOUNTING_PERIOD = RG.PERIOD_TO
WHERE
A.BUSINESS_UNIT = 'UK001'
AND A.LEDGER = 'LOCAL'
AND A.FISCAL_YEAR = 2015
AND ( (A.ACCOUNTING_PERIOD BETWEEN 1 and 4
AND B.ACCOUNT_TYPE IN ('E','R') )
OR
(A.ACCOUNTING_PERIOD BETWEEN 0 and 4
AND B.ACCOUNT_TYPE IN ('A','L','Q') ) )
AND A.STATISTICS_CODE = ' '
AND A.ACCOUNT = '21101'
AND A.CURRENCY_CD <> ' '
AND A.CURRENCY_CD = 'GBP'
AND B.SETID='LTSHR'
AND B.ACCOUNT=A.ACCOUNT
AND B.SETID = SETID
AND B.EFFDT=(SELECT MAX(EFFDT) FROM PS_GL_ACCOUNT_TBL WHERE SETID='LTSHR' AND WHERE ACCOUNT=B.ACCOUNT AND EFFDT<='2015-01-31 00:00:00.000')
GROUP BY A.ACCOUNT
ORDER BY A.ACCOUNT
I'm inclined to suspect that you have simplified your original query too much to reflect the real problem, but I'm going to answer the question as posed, in light of the comments on it to this point.
Since your query does not in fact select anything derived from table L2, nor do any other predicates rely on anything from that table, the only thing accomplished by (left) joining it is to duplicate rows of the pre-aggregation results where more than one satisfies the join condition for the same L2 row. That seems unlikely to be what you want, especially with that particular join being a self join, so I don't see any reason not to remove it altogether. Dollars to doughnuts, that solves the duplication problem.
I'm also going to suggest removing the correlated subquery in the WHERE clause in favor of joining an inline view, since you already join the base table for the subquery anyway. This particular inline view uses the window function version of MAX() instead of the aggregate function version. Ideally, it would directly select only the rows with the target EFFDT values, but it cannot do so without being rather more complicated, which is exactly what I am trying to avoid. The resulting query therefore filters EFFDT externally, as the original did, but without a correlated subquery.
I furthermore removed a few redundant predicates and rewrote one of the messier ones to a somewhat nicer equivalent. And I reordered the predicates in a way that seems more logical to me.
Additionally, since you are filtering on a specific value of A.ACCOUNT, it is pointless (but not wrong) to GROUP BY or ORDER_BY that column. Accordingly, I have removed those clauses to make the query simpler and clearer.
Here's what I came up with:
SELECT
A.ACCOUNT,
SUM(A.POSTED_TRAN_AMT),
SUM(A.POSTED_BASE_AMT),
SUM(A.POSTED_TOTAL_AMT)
FROM
PS_LEDGER A
INNER JOIN (
SELECT
*,
MAX(EFFDT) OVER (PARTITION BY ACCOUNT) AS MAX_EFFDT
FROM PS_GL_ACCOUNT_TBL
WHERE
EFFDT <= '2015-01-31 00:00:00.000'
AND SETID = 'LTSHR'
) B
ON B.ACCOUNT=A.ACCOUNT
WHERE
A.ACCOUNT = '21101'
AND A.BUSINESS_UNIT = 'UK001'
AND A.LEDGER = 'LOCAL'
AND A.FISCAL_YEAR = 2015
AND A.CURRENCY_CD = 'GBP'
AND A.STATISTICS_CODE = ' '
AND B.EFFDT = B.MAX_EFFDT
AND CASE
WHEN B.ACCOUNT_TYPE IN ('E','R')
THEN A.ACCOUNTING_PERIOD BETWEEN 1 and 4
WHEN B.ACCOUNT_TYPE IN ('A','L','Q')
THEN A.ACCOUNTING_PERIOD BETWEEN 0 and 4
ELSE 0
END
I have a stored procedure that has been having some issues lately and I finally narrowed it down to 1 SELECT. The problem is I cannot figure out exactly what is happening to kill the performance of this one query. I re-wrote it, but I am not sure the re-write is the exact same data.
Original Query:
SELECT
#userId, p.job, p.charge_code, p.code
, (SELECT SUM(b.total) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, ISNULL(jm.markup, 0)
, (SELECT SUM(b.TOTAL_TAX) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, p.ponumber
, p.billable
, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = #userId
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR #BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
This query will practically never finish running. It actually times out for some larger jobs we have.
And if I change the 2 subqueries in the select to read like joins instead:
SELECT
#userid, p.job, p.charge_code, p.code
, (SELECT SUM(b.TOTAL))
, ISNULL(jm.markup, 0)
, (SELECT SUM(b.TOTAL_TAX))
, p.ponumber, p.billable, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = 11190030
INNER JOIN [BACKORDER W/TOTAL] b
ON P.PONUMBER = b.ponumber AND P.code = b.code
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR #BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
The data comes out looking very nearly identical to me (though there are thousands of lines overall so I could be wrong), and it runs very quickly.
Any ideas appreciated..
Performace
In the second query you have less logical reads because the table [BACKORDER W/TOTAL] has been scanned only once. In the first query two separate subqueries are processed indenpendent and the table is scanned twice although both subqueries have the same predicates.
Correctness
If you want to check if two queries return the same resultset you can use the EXCEPT operator:
If both statements:
First SELECT Query...
EXCEPT
Second SELECT Query...
and
Second SELECT Query..
EXCEPT
First SELECT Query...
return an empty set the resultsets are identical.
In terms of correctness, you are inner joining [BACKORDER W/TOTAL] in the second query, so if the first query has Null values in the subqueries, these rows would be missing in the second query.
For performance, the optimizer is a heuristic - it will sometimes use spectacularly bad query plans, and even minimal changes can sometimes lead to a completely different query plan. Your best chance is to compare the query plans and see what causes the difference.
I have the following two queries - the original is the first and the second is my slight "upgrade". The first takes nearly a second to run and the second finishes before I can get my finger completely off of the refresh button.
My question: Why?
The only difference between the first and second is that the first uses coalesce to get a value to compare berp.ID_PART_SESSION with and the second uses a union to put two select statements together to accomplish the same thing.
I still think the first one should be quicker (the original reason why I used coalesce) since it seems like it should be doing less work to get to the same result. Considering how weak I am deciphering execution plans, could someone please explain why the second query is so much better than the first?
declare #animator varchar
SELECT TOP 1 #animator = FULL_NAME
FROM T_BERP berp
INNER JOIN dbo.T_INTERV i ON i.ID_INTERV = berp.ID_INTERV
WHERE berp.VERSION = 1
AND berp.PRINCIPAL = 1
AND berp.DELETED = 0
AND berp.CANCELLED = 0
AND berp.ID_PART_SESSION = (
select coalesce(pss.ID_PART_SESSION, psst.ID_PART_SESSION)
from t_bersp b
LEFT JOIN T_PART_SESSION pss ON b.ID_PART_SESSION = pss.ID_PART_SESSION
LEFT JOIN T_PSS_TEMP psst ON b.ID_PSS_TEMP = psst.ID_PSS_TEMP
where ID_BERSP = 4040)
vs
declare #animator varchar
SELECT TOP 1 #animator = FULL_NAME
FROM dbo.T_BERP berp
INNER JOIN dbo.T_INTERV i ON i.ID_INTERV = berp.ID_INTERV
WHERE berp.VERSION = 1
AND berp.PRINCIPAL = 1
AND berp.DELETED = 0
AND berp.CANCELLED = 0
AND berp.ID_PART_SESSION IN (
select pss.ID_PART_SESSION
from dbo.t_bersp b
LEFT JOIN dbo.T_PART_SESSION pss ON b.ID_PART_SESSION = pss.ID_PART_SESSION
where ID_BERSP = 4040
union
select psst.ID_PART_SESSION
from dbo.t_bersp b
LEFT JOIN dbo.T_PSS_TEMP psst ON b.ID_PSS_TEMP = psst.ID_PSS_TEMP
where ID_BERSP = 4040)
It would be difficult to provide a definitive answer without understanding the relative sizes and indices of the various tables in your queries. One possibility: if t_part_session and t_pss_temp are both large, the query optimizer might be doing something inefficient with the two LEFT JOINs in the inner SELECT of your first query.
EDIT to clarify: Yes there are LEFT JOINs in both queries, but my speculation was that having two together (query 1) might adversely affect performance vs the UNION (query 2). Sorry if that wasn't clear initially.
Also, I highly recommend a tool such as the Instant SQL Formatter (combined with the {} icon in StackOverflow's editor) to make the queries in your question easier to read:
DECLARE #animator VARCHAR
SELECT TOP 1 #animator = full_name
FROM t_berp berp
INNER JOIN dbo.t_interv i
ON i.id_interv = berp.id_interv
WHERE berp.version = 1
AND berp.principal = 1
AND berp.deleted = 0
AND berp.cancelled = 0
AND berp.id_part_session = (SELECT
Coalesce(pss.id_part_session, psst.id_part_session)
FROM t_bersp b
LEFT JOIN t_part_session pss
ON b.id_part_session =
pss.id_part_session
LEFT JOIN t_pss_temp psst
ON b.id_pss_temporaire =
psst.id_pss_temporaire
WHERE id_bersp = 4040)
vs
DECLARE #animator VARCHAR
SELECT TOP 1 #animator = full_name
FROM dbo.t_berp berp
INNER JOIN dbo.t_interv i
ON i.id_interv = berp.id_interv
WHERE berp.version = 1
AND berp.principal = 1
AND berp.deleted = 0
AND berp.cancelled = 0
AND berp.id_part_session IN (SELECT pss.id_part_session
FROM dbo.t_bersp b
LEFT JOIN dbo.t_part_session pss
ON b.id_part_session =
pss.id_part_session
WHERE id_bersp = 4040
UNION
SELECT psst.id_part_session
FROM dbo.t_bersp b
LEFT JOIN dbo.t_pss_temp psst
ON b.id_pss_temporaire =
psst.id_pss_temporaire
WHERE id_bersp = 4040)
I'm betting it's the coalesce statement. I believe that coalesce will end up getting applied before the where clause. So, it's actually going through each combination of the two tables, and THEN filtering on those that match the where clause.
You can put both queries into the same batch in SSMS and show execution plan - this will not only let you see them side by side, it will show a relative cost.
I suspect that the IN (UNION) means the second can be easily parallelized.