I have a procedure which is basically inserting records to a table from selection of few table and views combinations
and SQL is like below
ALTER PROCEDURE [dbo].[aka_spring_rep_sum]
AS
INSERT INTO tbl_spring (col1,col2,....colx)
SELECT col1,col2...colx
FROM vw_tbl_spring bk
LEFT JOIN
(SELECT col1,col2,..
FROM vw_tbl_prices_spring) sp ON bk.col1 = sp.col1
AND bk.col2 = sp.col2
LEFT JOIN
(SELECT col1,col2...
FROM tbl_xx) stock ON bk.col1 = stock.col1
AND bk.col2 = stock.col2
LEFT JOIN
(SELECT DISTINCT col1,col2,....
FROM tbl_v) sf ON bk.col1 = sf.col1
AND bk.colx = sf.colx
LEFT JOIN
(SELECT DISTINCT col1, col2
FROM tbl_bb) vr ON sf.col1 = vr.col1
LEFT JOIN
(SELECT col1, is
FROM tbl_ss) sh ON bk.col1 = sh.col1
OPTION (RECOMPILE)
The stored procedure was taking less than 2-3 seconds only till today, but today all of a sudden this was taking very long time 30 minutes plus and never ending and forced to stop manually.
After breaking the different selections one by one I found that
select ..... FROM vw_tbl_spring bk
is ended up as a never ending call. Rest all select statements in the stored procedure are returning results less than 1 seconds.
ALTER VIEW [dbo].[vw_tbl_spring]
AS
SELECT col1, col2...., colx
FROM
(SELECT Icol1, col2, ....
FROM
(
SELECT DISTINCT col1,col2,... FROM tbl_pens s
INNER JOIN tbl_penh h ON s.col1 = h.col1 AND s.col2 = h.col2
WHERE s.col6 >= 21 AND h.col1 = 'X'
)
) b
LEFT JOIN
( SELECT col1,col2,...... FROM tbl_pens s
INNER JOIN tbl_penh h ON ON s.col1 = h.col1 AND s.col2 = h.col2
WHERE WHERE s.col6 >= 21 AND h.col1 = 'X'
) p ON b.col1 = p.col1 AND b.col1 = p.col1
LEFT JOIN vw_tbl_kk k ON p.col1 = k.col1 AND p.col1 = k.col2
Again filtering the different selections inside this view found out that the last left join is slowing things down
If we removed the last left join ie
LEFT JOIN vw_tbl_kk k ON p.col1 = k.col1 AND p.col1 = k.col2
Everything will be as normal ie will return results in less than 2-3 seconds
Unable to find what is the reason behind this sudden slowness
The same behaviour occurred a few months back and that time try to delete and recreate all associated views and stored procedure and then the issue was resolved. But this time this also didn't help
Any way to check what is causing this slowness in SQL Server?
Your query is a 6-way join. This means that if there are n rows that join in each joined table, there will be n6 resulting rows to insert. To highlight the exponential growth this causes:
rows in each table that join | number of resultant rows
1 | 1
2 | 64
3 | 729
4 | 4,096
10 | 1,000,000
There are probably suddenly more joining rows in the tables, not only making it slower, but also hitting the logging high water mark, which means the database must rollback the transaction, which is typically very slow.
Related
I am attempting to include a new table with values that need to be checked and included in a stored procedure. Statement 1 is the existing table that needs to be checked against, while statement 2 is the new table to check against.
I currently have 2 EXISTS conditions that function independently and produce the results I am expecting. By this I mean if I comment out Statement 1, statement 2 works and vice versa. When I put them together the query doesn't complete, there is no error but it times out which is unexpected because each statement only takes a few seconds.
I understand there is likely a better way to do this but before I do, I would like to know why I cannot seem to do multiple exists statements like this? Are there not meant to be multiple EXISTS conditions in the WHERE clause?
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = #Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
OR
(
--Statement 2
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = #Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)
)
EDIT: I have included the query details. Table 1-5 represent different tables, there are no repeated tables.
Too long to comment.
Your query as written seems correct. The timeout will only be able to be troubleshot from the execution plan, but here are a few things that could be happening or that you could benefit from.
Parameter sniffing on #Date. Try hard-coding this value and see if you still get the same slowness
No covering index on P.OTHER_ID or P.DATE or P.ID or SA.ID which would cause a table scan for these predicates
Indexes for the above columns which aren't optimal (including too many columns, etc)
Your query being serial when it may benefit from parallelism.
Using the LOWER function on a database which doesn't have a case sensitive collation (most don't, though this function doesn't slow things down that much)
You have a bad query plan in cache. Try adding OPTION (RECOMPILE) at the bottom so you get a new query plan. This is also done when comparing the speed of two queries to ensure they aren't using cached plans, or one isn't when another is which would skew the results.
Since your query is timing out, try including the estimated execution plan and post it for us at past the plan
I found putting 2 EXISTS in the WHERE condition made the whole process take significantly longer. What I found fixed it was using UNION and keeping the EXISTS in separate queries. The final result looked like the following:
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = #Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
UNION
--Statement 2
SELECT *
FROM table1 S
WHERE
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = #Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)
When I execute this query it gives a 10 result set .
select * from OA_SERVICE_REQUESTS WHERE
OA_SERVICE_REQUESTS.CUSREG_ID=4
But when I join with other table with to get more information, I use 2 inner join because this is 2 foreign key from ELVM_SMUNT_CUS table it gives me 120 results
select * from OA_SERVICE_REQUESTS
inner join ELVM_SMUNT_CUS T1 on OA_SERVICE_REQUESTS.DIVCOD = T1.DIVCOD
inner join ELVM_SMUNT_CUS T2 on OA_SERVICE_REQUESTS.UNTNUM = T2.UNTNUM
WHERE OA_SERVICE_REQUESTS.CUSREG_ID=4
Try to combine them together :
select * from OA_SERVICE_REQUESTS R
inner join ELVM_SMUNT_CUS T1 on ( R.DIVCOD = T1.DIVCOD
and R.UNTNUM = T1.UNTNUM )
where R.CUSREG_ID=4;
for your query not to produce cross-product results.
Probably, you have 12 matching records for R.DIVCOD = T1.DIVCOD, and 10 matching records for R.UNTNUM = T1.UNTNUM for R.CUSREG_ID=4, by combining the result set by an and you can have 10results at the same time, but may yield 120 occurences by 12 times 10, if conditions are taken apart by more joins.
I have a flag in a table which value ( 1 for US, or 2 for Global) indicates if the data will be in Table A or Table B.
A solution that works is to left join both tables; however this slows down significantly the scripts (from less than a second to over 15 seconds).
Is there any other clever way to do this? an equivalent of
join TableA only if TableCore.CountryFlag = "US"
join TableB only if TableCore.CountryFlag = "global"
Thanks a lot for the help.
You can try using this approach:
-- US data
SELECT
YourColumns
FROM
TableCore
INNER JOIN TableA AS T ON TableCore.JoinColumn = T.JoinColumn
WHERE
TableCore.CountryFlag = 'US'
UNION ALL
-- Non-US Data
SELECT
YourColumns -- These columns must match in number and datatype with previous SELECT
FROM
TableCore
INNER JOIN TableB AS T ON TableCore.JoinColumn = T.JoinColumn
WHERE
TableCore.CountryFlag = 'global'
However, if the result is still slow, you might want to check if the TableCore table has a index on CountryFlag and JoinColumn, and TableA and TableB an index on JoinColumn.
The basic structure is:
select . . ., coalesce(a.?, b.?) as ?
from tablecore c left join
tablea a
on c.? = a.? and c.countryflag = 'US' left join
tableb b
on c.? b.? and c.counryflag = 'global';
This version of the query can take advantage of indexes on tablea(?) and tableb(?).
If you have a complex query, this portion is probably not responsible for the performance problem.
I currently have this very very slow query:
SELECT generators.id AS generator_id, COUNT(*) AS cnt
FROM generator_rows
JOIN generators ON generators.id = generator_rows.generator_id
WHERE
generators.id IN (SELECT "generators"."id" FROM "generators" WHERE "generators"."client_id" = 5212 AND ("generators"."state" IN ('enabled'))) AND
(
generators.single_use = 'f' OR generators.single_use IS NULL OR
generator_rows.id NOT IN (SELECT run_generator_rows.generator_row_id FROM run_generator_rows)
)
GROUP BY generators.id;
An I'm trying to refactor it/improve it with this query:
SELECT g.id AS generator_id, COUNT(*) AS cnt
from generator_rows gr
join generators g on g.id = gr.generator_id
join lateral(select case when exists(select * from run_generator_rows rgr where rgr.generator_row_id = gr.id) then 0 else 1 end as noRows) has on true
where g.client_id = 5212 and "g"."state" IN ('enabled') AND
(g.single_use = 'f' OR g.single_use IS NULL OR has.norows = 1)
group by g.id
For reason it doesn't quite work as expected(It returns 0 rows). I think I'm pretty close to the end result but can't get it to work.
I'm running on PostgreSQL 9.6.1.
This appears to be the query, formatted so I can read it:
SELECT gr.generators_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id
WHERE gr.generators_id IN (SELECT g.id
FROM generators g
WHERE g.client_id = 5212 AND
g.state = 'enabled'
) AND
(g.single_use = 'f' OR
g.single_use IS NULL OR
gr.id NOT IN (SELECT rgr.generator_row_id FROM run_generator_rows rgr)
)
GROUP BY gr.generators_id;
I would be inclined to do most of this work in the FROM clause:
SELECT gr.generators_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id JOIN
generators gg
on g.id = gg.id AND
gg.client_id = 5212 AND gg.state = 'enabled' LEFT JOIN
run_generator_rows rgr
ON g.id = rgr.generator_row_id
WHERE g.single_use = 'f' OR
g.single_use IS NULL OR
rgr.generator_row_id IS NULL
GROUP BY gr.generators_id;
This does make two assumptions that I think are reasonable:
generators.id is unique
run_generator_rows.generator_row_id is unique
(It is easy to avoid these assumptions, but the duplicate elimination is more work.)
Then, some indexes could help:
generators(client_id, state, id)
run_generator_rows(id)
generator_rows(generators_id)
Generally avoid inner selects as in
WHERE ... IN (SELECT ...)
as they are usually slow.
As it was already shown for your problem it's a good idea to think of SQL as of set- theory.
You do NOT join tables on their sole identity:
In fact you take (SQL does take) the set (- that is: all rows) of the first table and "multiply" it with the set of the second table - thus ending up with n times m rows.
Then the ON- clause is used to (often strongly) reduce the result by simply selecting each one of those many combinations by evaluating this portion to either true (take) or false (drop). This way you can chose any arbitrary logic to select those combinations in favor.
Things get trickier with LEFT JOIN and RIGHT JOIN, but one can easily think of them as to take one side for granted:
output the combinations of that row IF the logic yields true (once at least) - exactly like JOIN does
output exactly ONE row, with 'the other side' (right side on LEFT JOIN and vice versa) consisting of ALL NULL for every column.
Count(*) is great either, but if things getting complicated don't stick to it: Use Sub- Selects for the keys only, and once all the hard word is done join the Fun- Stuff to it. Like in
SELECT SUM(VALID), ID
FROM SELECT
(
(1 IF X 0 ELSE) AS VALID, ID
FROM ...
)
GROUP BY ID) AS sub
JOIN ... AS details ON sub.id = details.id
Difference is: The inner query is executed only once. The outer query does usually have no indices left to work with and will be slow, but if the inner select here doesn't make the data explode this is usually many times faster than SELECT ... WHERE ... IN (SELECT..) constructs.
Tables and requested output
I'm using National Instrumets Teststand default database setup. I've tried to simplify the DB layout in the picture above.
I can manage to get what i want through some rather "complicated" sql, and it's very slow.
I think there is a better way, and then i stumbled over SELF JOIN. Basically what I want is to get data values from several different rows, from one "serial number".
My problem is to combine the self Join with the "general" join of my tables.
I'm using an Access Databdase at the moment.
This will give you the output you're aiming for with the sample data:
with x as (
select
row_number() over (partition by t1.Serial order by t1.Serial) as [RN],
t1.Serial,
case when t3.Sub_Test_Name = 'AAA' then t3.Value end as [AAA],
case when t3.Sub_Test_Name = 'BBB' then t3.Value end as [BBB],
case when t3.Sub_Test_Name = 'CCC' then t3.Value end as [CCC],
case when t3.Sub_Test_Name = 'DDD' then t3.Value end as [DDD]
from Table_1 t1
inner join Table_2 t2 on t2.Table_1_Id = t1.Id
inner join Table_3 t3 on t3.Table_2_Id = t2.Id
)
select
x.Serial,
AAA.AAA,
BBB.BBB,
CCC.CCC,
DDD.DDD
from x
left outer join x AAA on AAA.Serial = x.Serial and AAA.RN = x.rn + 0
left outer join x BBB on BBB.Serial = x.Serial and BBB.RN = x.rn + 1
left outer join x CCC on CCC.Serial = x.Serial and CCC.RN = x.rn + 2
left outer join x DDD on DDD.Serial = x.Serial and DDD.RN = x.rn + 3
where x.rn = 1
This uses self joins as you mentioned (where you see x being left joined to itself multiple times in the final select statement).
I've deliberately added extra columns CCC and DDD so it is easier to see how you would build this out for a larger data set, incrementing the row_number offset for each join.
I've tested this in SQL Fiddle and you're welcome to play around with it. If you need to apply additional filters, your where clause should be placed inside the CTE.
Note, you're effectively pivoting the data with this sort of query (except we're not aggregating anything, so we can't use the built in PIVOT option). The downside of both this method and real pivots is that you have to manually specify every column header with its own CASE statement in the CTE, and a left join in the final select statement. This can get unwieldy in medium - large data sets, so it best suited in cases where you will have a small number of known column headers in your results.