SQL Server view optimization help (repeated subqueries, case when, and so on...) - sql

I need some help optimizing a MSSQL view that is, honestly, a little too much complex for my knowledge.
The view is working good but I would like to rewrite it using less subqueries or a better structure in order to simplify it and use less server resources.
The main issue is a CASE WHEN with the same subquery repeated 4 times... I tried to understand if I can put it in a variable and use it for the CASE instead of repeating the query each time but it seems not possible to me...
Here's the query
SELECT dbo.MACCHINE.id_macchina, [...] dbo.VIEW_CANTIERI.indirizzo,
(SELECT TOP (1) data_fine
FROM dbo.MANUTENZIONI
WHERE (id_macchina = dbo.MACCHINE.id_macchina)
ORDER BY data_fine DESC)
AS ultima_manutenzione,
DATEDIFF(day,
(SELECT TOP (1) data_fine
FROM dbo.MANUTENZIONI AS MANUTENZIONI_1
WHERE (id_macchina = dbo.MACCHINE.id_macchina)
ORDER BY data_fine DESC), GETDATE())
AS data_diff,
(CASE WHEN
stato = 0
THEN 'GREY'
WHEN stato = 2
THEN 'BLACK'
WHEN stato = 1
THEN
(CASE WHEN
DATEDIFF(day,
(SELECT TOP (1) data_fine
FROM dbo.MANUTENZIONI AS MANUTENZIONI_1
WHERE (id_macchina = dbo.MACCHINE.id_macchina)
ORDER BY data_fine DESC), GETDATE()) >= 90
THEN 'RED'
WHEN
DATEDIFF(day,
(SELECT TOP (1) data_fine
FROM dbo.MANUTENZIONI AS MANUTENZIONI_1
WHERE (id_macchina = dbo.MACCHINE.id_macchina)
ORDER BY data_fine DESC), GETDATE()) >= 80
THEN 'ORANGE'
WHEN
DATEDIFF(day,
(SELECT TOP (1) data_fine
FROM dbo.MANUTENZIONI AS MANUTENZIONI_1
WHERE (id_macchina = dbo.MACCHINE.id_macchina)
ORDER BY data_fine DESC), GETDATE()) >= 60
THEN 'YELLOW'
WHEN
(SELECT TOP (1) data_fine
FROM dbo.MANUTENZIONI AS MANUTENZIONI_1
WHERE (id_macchina = dbo.MACCHINE.id_macchina)
ORDER BY data_fine DESC) IS NULL
THEN 'RED' ELSE 'GREEN'
END)
END)
AS colore FROM dbo.MACCHINE INNER JOIN
dbo.MACCHINE_MODELLI ON dbo.MACCHINE.id_modello = dbo.MACCHINE_MODELLI.id_modello INNER JOIN
dbo.MACCHINE_TIPOLOGIE ON dbo.MACCHINE_MODELLI.id_tipologia = dbo.MACCHINE_TIPOLOGIE.id_tipologia INNER JOIN
dbo.VIEW_CANTIERI ON dbo.MACCHINE.id_cantiere = dbo.VIEW_CANTIERI.id_cantiere INNER JOIN
dbo.MACCHINE_PRODUTTORI ON dbo.MACCHINE_MODELLI.id_produttore = dbo.MACCHINE_PRODUTTORI.id_produttore INNER JOIN
dbo.CLIENTI ON dbo.VIEW_CANTIERI.id_cliente = dbo.CLIENTI.id_cliente WHERE (dbo.MACCHINE._del = 'N')
Any suggestion is really appreciated, if you need more information about the db I will try to provide it...

I notice you are using TOP(1), so this must be 2005 or above. You can keep the result of the datediff in an OUTER APPLY subquery so that it is only evaluated once.
SELECT
M.id_macchina,
[...],
V.indirizzo,
MA.data_fine AS ultima_manutenzione,
MA.data_diff,
CASE
WHEN stato = 0 THEN 'GREY'
WHEN stato = 2 THEN 'BLACK'
WHEN stato = 1 THEN
CASE
WHEN MA.data_diff >= 90 THEN 'RED'
WHEN MA.data_diff >= 80 THEN 'ORANGE'
WHEN MA.data_diff >= 60 THEN 'YELLOW'
WHEN MA.data_fine IS NULL THEN 'RED'
ELSE 'GREEN'
END
END AS colore
FROM dbo.MACCHINE M
INNER JOIN dbo.MACCHINE_MODELLI I ON M.id_modello = I.id_modello
INNER JOIN dbo.MACCHINE_TIPOLOGIE T ON I.id_tipologia = T.id_tipologia
INNER JOIN dbo.VIEW_CANTIERI V ON M.id_cantiere = V.id_cantiere
INNER JOIN dbo.MACCHINE_PRODUTTORI P ON I.id_produttore = P.id_produttore
INNER JOIN dbo.CLIENTI C ON V.id_cliente = C.id_cliente
OUTER APPLY (SELECT TOP (1)
MA.data_fine,
DATEDIFF(day, MA.data_fine, GETDATE()) AS data_diff
FROM dbo.MANUTENZIONI AS MA
WHERE MA.id_macchina = M.id_macchina
ORDER BY MA.data_fine DESC) MA
WHERE M._del = 'N'

Apologies if you've done this already, but I would suggest using SHOWPLAN (ctrl+L), or performance tools if you have them. Confirm that it's really these CASE statements that are causing the issue - the bottleneck could be one of the joins for instance, a missing index, or stale statistics promoting a bad execution plan.
If the view is read from far more often than it is written to, or it doesn't need to have exactly up to date data (i.e. you could cache the results each day and read the view from last night's refresh), consider using an indexed view. This will cache the view as if it were a table, so the the operations (including both the CASE and the JOINs) are not performed every time you read from the table (having an index also speeds usage). Alternatively, if only recent data changes, you could try partitioning an indexed view on date.

Related

Optimizing SQL Query speed

I am trying to optimize my SQL query below as I am using a very old RDMS called firebird. I tried rearranging the items in my where clause and removing the order by statement but the query still seems to take forever to run. Unfortunately firebird doesn't support Explain Execution Plan Functionalities and therefore I cannot identify the code that is holding up the query.
select T.veh_reg_no,T.CON_NO, sum(T.pos_gpsunlock) as SUM_GPS_UNLOCK,
count(T.pos_gpsunlock) as SUM_REPORTS, contract.con_name
from
(
select veh_reg_no,CON_NO,
case when pos_gpsunlock = upper('T') then 1 else 0 end as pos_gpsunlock
from vehpos
where veh_reg_no in
( select regno
from fleetvehicle
where fleetno in (97)
) --DS5
and pos_timestamp > '2022-07-01'
and pos_timestamp < '2022-08-01'
) T
join contract on T.con_no = contract.con_no
group by T.veh_reg_no, T.con_no,contract.con_name
order by SUM_GPS_UNLOCK desc;
If anyone can help it would be greatly appreciated.
I'd either comment out some of the sub-queries or remove a join or aggregation and see if that improves it. Once you find the offending code maybe you can move it or re-write it. I know nothing of Firebird but I'd approach that query with the below code, wrapping the aggregation outside of the joins and removing the "Where in" clause.
If nothing works can you create an aggregation table or pre-filtered table and use that?
select
x.*
,sum(case when x.pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
,count(*) as SUM_REPORTS
FROM (
select
a.veh_reg_no
,a.pos_gpsunlock
,a.CON_NO
,c.con_name
FROM vehpos a
JOIN fleetvehicle b on a.veg_reg_no = b.reg_no and b.fleetno = 97 and b.pos_timestamp between '222-07-01' and '2022-08-01'
JOIN contract c on a.con_no = contract.con_no
) x
Group By....
This might help by converting subqueries to joins and reducing nesting. Also an = instead of IN() operation.
select vp.veh_reg_no,vp.con_no,c.con_name,
count(*) as SUM_REPORTS,
sum(case when pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
from vehpos vp
inner join fleetvehicle fv on fv.fleetno = 97 and fv.regno = vp.veh_reg_no
inner join contract c on vp.con_no = c.con_no
where vp.pos_timestamp >= '2022-07-01'
and vp.pos_timestamp < '2022-08-01'
group by vp.veh_reg_no, vp.con_no, c.con_name

How could I join these queries together?

I have 2 queries. One includes a subquery and the other is a pretty basic query. I can't figure out how to combine them to return a table with name, workdate, minutes_inactive, and hoursworked.
I have the code below for what I have tried. The simple query is lines 1,2, and the last 5 lines. I also added a join clause (join punchclock p on p.servrepID = l.repid) to it.
Both these queries ran on their own so this is solely just the problem of combining them.
select
sr.sr_name as liaison, cast(date_created as date) workdate,
(count(date_created) * 4) as minutes_inactive,
(select
sr_name, cast(punchintime as date) as workdate,
round(sum(cast(datediff(minute,punchintime, punchouttime) as real) / 60), 2) as hoursworked,
count(*) as punches
from
(select
sr_name, punchintime = punchdatetime,
punchouttime = isnull((select top 1 pc2.punchdatetime
from punchclock pc2
where pc2.punchdatetime > pc.punchdatetime
and pc.servrepid = pc2.servrepid
and pc2.inout = 0
order by pc2.punchdatetime), getdate())
from
punchclock pc
join
servicereps sr on pc.servrepid = sr.servrepid
where
punchyear >= 2017 and pc.inout = 1
group by
sr_name, cast(punchintime as date)))
from
tbl_liga_popup_log l
join
servicereps sr on sr.servrepID = l.repid
join
punchclock p on p.servrepID = l.repid collate latin1_general_bin
group by
cast(l.date_created as date), sr.sr_name
I get this error:
Msg 102, Level 15, State 1, Line 19
Incorrect syntax near ')'
I keep getting this error but there are more errors if I adjust that part.
I don't know that we'll fix everything here, but there are a few issues with your query.
You have to alias your sub-query (technically a derived table, but whatever)
You have two froms in your outer query.
You have to join to the derived table.
Here's an crude example:
select
<some stuff>
from
(select col1 from table1) t1
inner join t2
on t1.col1 = t2.col2
The large error here is that you are placing queries in the select section (before the from). You can only do this if the query returns a single value. Else, you have to put your query in a parenthesis (you have done this) in the from section, give it an alias, and join it accordingly.
You also seem to be using group bys that are not needed anywhere. I can't see aggregation functions like sum().
My best bet is that you are looking for the following query:
select
sr_name as liaison
,cast(date_created as date) workdate
,count(distinct date_created) * 4 as minutes_inactive
,cast(punchintime as date) as workdate
,round(sum(cast(datediff(minute,punchintime,isnull(pc2_query.punchouttime,getdate())) as real) / 60), 2) as hoursworked
,count(*) as punches
from
punchclock pc
inner join servicereps sr on pc.servrepid = sr.servrepid
cross apply
(
select top 1 pc2.punchdatetime as punchouttime
from punchclock pc2
where pc2.punchdatetime > pc.punchdatetime
and pc.servrepid = pc2.servrepid
and pc2.inout = 0
order by pc2.punchdatetime
)q1
inner join tbl_liga_popup_log l on sr.servrepID = l.repid
where punchyear >= 2017 and pc.inout = 1

Should a subquery on a join use tables from an outer query in the where clause?

I need to add a subquery to a join, because one payment can have more than one allotment, so I only need to account for the first match (where rownum = 1).
However, I'm not sure if adding pmt from the outer query to the subquery on the allotment join is best.
Should I be doing this differently in the event of performance hits, etc.. ?
SELECT
pmt.payment_uid,
alt.allotment_uid,
FROM
payment pmt
/* HERE: is the reference to pmt.pay_key and pmt.client_id
incorrect in the below subquery? */
INNER JOIN allotment alc ON alt.allotment_uid = (
SELECT
allotment_uid
FROM
allotment
WHERE
pay_key = pmt.pay_key
AND
pay_code = 'xyz'
AND
deleted = 'N'
AND
client_id = pmt.client_id
AND
ROWNUM = 1
)
WHERE
AND
pmt.deleted = 'N'
AND
pmt.date_paid >= TO_DATE('2017-07-01')
AND
pmt.date_paid < TO_DATE('2017-10-01') + 1;
It's difficult to identify the performance issue in your query without seeing an explain plan output. You query does seem to do an additional SELECT on the allotment for every record from the main query.
Here is a version which doesn't use correlated sub query. Obviously I haven't been able to test it. It does a simple join in and then filters all records except one of the allotments. Hope this helps.
WITH v_payment
AS
(
SELECT
pmt.payment_uid,
alt.allotment_uid,
ROW_NUMBER () OVER(PARTITION BY allotment_id) r_num
FROM
payment pmt JOIN allotment alt
ON (pmt.pay_key = alt.pay_key AND
pmt.client_id = alt.client_id)
WHERE pmt.deleted = 'N' AND
pmt.date_paid >= TO_DATE('2017-07-01') AND
pmt.date_paid < TO_DATE('2017-10-01') + 1 AND
alt.pay_code = 'xyz' AND
alt.deleted = 'N'
)
SELECT payment_uid,
allotment_uid
FROM v_payment
WHERE r_num = 1;
Let's know how this performs!
You can phrase the query that way. I would be more likely to do:
SELECT . . .
FROM payment p INNER JOIN
(SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY pay_key, client_id
ORDER BY allotment_uid
) as seqnum
FROM allotment a
WHERE pay_code = 'xyz' AND deleted = 'N'
) a
ON a.pay_key = p.pay_key AND a.client_id = p.client_id AND
seqnum = 1
WHERE p.deleted = 'N' AND
p.date_paid >= DATE '2017-07-01' AND
p.date_paid < (DATE '2017-10-01') + 1;

SQL Server / T-SQL : query optimization assistance

I have this QA logic that looks for errors into every AuditID within a RoomID to see if their AuditType were never marked Complete or if they have two complete statuses. Finally, it picks only the maximum AuditDate of the RoomIDs with errors to avoid showing multiple instances of the same RoomID, since there are many audits per room.
The issue is that the AUDIT table is very large and takes a long time to run. I was wondering if there is anyway to reach the same result faster.
Thank you in advance !
IF object_ID('tempdb..#AUDIT') is not null drop table #AUDIT
IF object_ID('tempdb..#ROOMS') is not null drop table #ROOMS
IF object_ID('tempdb..#COMPLETE') is not null drop table #COMPLETE
IF object_ID('tempdb..#FINALE') is not null drop table #FINALE
SELECT distinct
oc.HotelID, o.RoomID
INTO #ROOMS
FROM dbo.[rooms] o
LEFT OUTER JOIN dbo.[hotels] oc on o.HotelID = oc.HotelID
WHERE
o.[status] = '2'
AND o.orderType = '2'
SELECT
t.AuditID, t.RoomID, t.AuditDate, t.AuditType
INTO
#AUDIT
FROM
[dbo].[AUDIT] t
WHERE
t.RoomID IN (SELECT RoomID FROM #ROOMS)
SELECT
t1.RoomID, t3.AuditType, t3.AuditDate, t3.AuditID, t1.CompleteStatus
INTO
#COMPLETE
FROM
(SELECT
RoomID,
SUM(CASE WHEN AuditType = 'Complete' THEN 1 ELSE 0 END) AS CompleteStatus
FROM
#AUDIT
GROUP BY
RoomID) t1
INNER JOIN
#AUDIT t3 ON t1.RoomID = t3.RoomID
WHERE
t1.CompleteStatus = 0
OR t1.CompleteStatus > 1
SELECT
o.HotelID, o.RoomID,
a.AuditID, a.RoomID, a.AuditDate, a.AuditType, a.CompleteStatus,
c.ClientNum
INTO
#FINALE
FROM
#ROOMS O
LEFT OUTER JOIN
#COMPLETE a on o.RoomID = a.RoomID
LEFT OUTER JOIN
[dbo].[clients] c on o.clientNum = c.clientNum
SELECT
t.*,
Complete_Error_Status = CASE WHEN t.CompleteStatus = 0
THEN 'Not Complete'
WHEN t.CompleteStatus > 1
THEN 'Complete More Than Once'
END
FROM
#FINALE t
INNER JOIN
(SELECT
RoomID, MAX(AuditDate) AS MaxDate
FROM
#FINALE
GROUP BY
RoomID) tm ON t.RoomID = tm.RoomID AND t.AuditDate = tm.MaxDate
One section you could improve would be this one. See the inline comments.
SELECT
t1.RoomID, t3.AuditType, t3.AuditDate, t3.AuditID, t1.CompleteStatus
INTO
#COMPLETE
FROM
(SELECT
RoomID,
COUNT(1) AS CompleteStatus
-- Use the above along with the WHERE clause below
-- so that you are aggregating fewer records and
-- avoiding a CASE statement. Remove this next line.
--SUM(CASE WHEN AuditType = 'Complete' THEN 1 ELSE 0 END) AS CompleteStatus
FROM
#AUDIT
WHERE
AuditType = 'Complete'
GROUP BY
RoomID) t1
INNER JOIN
#AUDIT t3 ON t1.RoomID = t3.RoomID
WHERE
t1.CompleteStatus = 0
OR t1.CompleteStatus > 1
Just a thought. Streamline your code and your solution. you are not effectively filtering your datasets smaller so you continue to query the entire tables which is taking a lot of your resources and your temp tables are becoming full copies of those columns without the indexes (PK, FK, ++??) on the original table to take advantage of. This by no means is a perfect solution but it is an idea of how you can consolidate your logic and reduce your overall data set. Give it a try and see if it performs better for you.
Note this will return the last audit record for any room that has either not had an audit completed or completed more than once.
;WITH cte AS (
SELECT
o.RoomId
,o.clientNum
,a.AuditId
,a.AuditDate
,a.AuditType
,NumOfAuditsComplete = SUM(CASE WHEN a.AuditType = 'Complete' THEN 1 ELSE 0 END) OVER (PARTITION BY o.RoomId)
,RowNum = ROW_NUMBER() OVER (PARTITION BY o.RoomId ORDER BY a.AuditDate DESC)
FROm
dbo.Rooms o
LEFT JOIN dbo.Audit a
ON o.RoomId = a.RoomId
WHERE
o.[Status] = 2
AND o.OrderType = 2
)
SELECT
oc.HotelId
,cte.RoomId
,cte.AuditId
,cte.AuditDate
,cte.AuditType
,cte.NumOfAuditsComplete
,cte.clientNum
,Complete_Error_Status = CASE WHEN cte.NumOfAuditsComplete > 1 THEN 'Complete More Than Once' ELSE 'Not Complete' END
FROM
cte
LEFT JOIN dbo.Hotels oc
ON cte.HotelId = oc.HotelId
LEFT JOIN dbo.clients c
ON cte.clientNum = c.clientNum
WHERE
cte.RowNum = 1
AND cte.NumOfAuditsComplete != 1
Also note I changed your
WHERE
o.[status] = '2'
AND o.orderType = '2'
TO
WHERE
o.[status] = 2
AND o.orderType = 2
to be numeric without the single quotes. If the data type is truely varchar add them back but when you query a numeric column as a varchar it will do data conversion and may not take advantage of indexes that you have built on the table.

Need to speed up the results of this SQL statement. Any advice?

I've got the following SQL Statement that needs some major speed up. The problem is I need to search on two fields, where each of them is calling several sub-selects. Is there a way to join the two fields together so I call the sub-selects only once?
SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM trcdba.billspaid
WHERE date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010'
AND custno in
(select custno from cwdba.txpytaxid where taxpayerno in
(select taxpayerno from cwdba.txpyaccts where accountno in
(select accountno from rtadba.reasacct where controlno = 1234567)))
OR custno2 in
(select custno from cwdba.txpytaxid where taxpayerno in
(select taxpayerno from cwdba.txpyaccts where accountno in
(select accountno from rtadba.reasacct where controlno = 1234567)))
I would use joins instead of the embedded sub-queries.
when you use a function on the column:
date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010'
an index will NOT be used. Try something like this
datepif > someconversionhere('01/06/2009')
AND datepif <= someconversionhere('01/06/2010')
Use inner joins too. There isn't any info in the question to indicate table size or if there is an index or not, so this is a guess and should work best if there are many more rows in billspaid for the date range vs rows that match the joining tables for r.controlno = 1234567, which I suspect is the case:
SELECT
COALESCE(b1.billyr,b2.billyr) AS billyr
,COALESCE(b1.billno,b2.billno) AS billno
,COALESCE(b1.propacct,b2.propacct) AS propacct
,COALESCE(b1.vinid,b2.vinid) AS vinid
,COALESCE(b1.taxpaid,b2.taxpaid) AS taxpaid
,COALESCE(b1.duedate,b2.duedate) AS duedate
,COALESCE(b1.datepif,b2.datepif) AS datepif
,COALESCE(b1.propdesc,b2.propdesc) AS propdesc
FROM rtadba.reasacct r
INNER JOIN cwdba.txpyaccts a ON r.accountno=t.accountno
INNER JOIN cwdba.txpytaxid t ON a.taxpayerno=t.taxpayerno
LEFT OUTER JOIN trcdba.billspaid b1 ON t.custno=b1.custno AND b1.datepif > someconversionhere('01/06/2009') AND b1.datepif <= someconversionhere('01/06/2010')
LEFT OUTER JOIN trcdba.billspaid b2 ON t.custno2=b2.custno AND b2.datepif > someconversionhere('01/06/2009') AND b2.datepif <= someconversionhere('01/06/2010')
WHERE r.controlno = 1234567
AND COALESCE(b1.custno,b2.custno) IS NOT NULL
create an index for each of these:
rtadba.reasacct.controlno and cover on accountno
cwdba.txpyaccts.accountno and cover on taxpayerno
cwdba.txpytaxid.taxpayerno and cover on custno
trcdba.billspaid.custno +datepif
trcdba.billspaid.custno2 +datepif
Here's the same thing using JOIN instead of sub queries.
SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM billspaid
INNER JOIN txpytaxid
ON txpytaxid.custno = billspaid.custno OR txpytaxid.custno = billspaid.custno2
INNER JOIN txpyaccts
ON txpyaccts.taxpayerno = txpytaxid.taxpayerno
INNER JOIN reasacct
ON reasacct.accountno = txpyaccts.accountno AND reasacct.controlno = 1234567
WHERE date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010'
However, if the OR in the JOIN is giving you performance problems, you can always try using a union:
(SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM billspaid
INNER JOIN txpytaxid
ON txpytaxid.custno = billspaid.custno
INNER JOIN txpyaccts
ON txpyaccts.taxpayerno = txpytaxid.taxpayerno
INNER JOIN reasacct
ON reasacct.accountno = txpyaccts.accountno AND reasacct.controlno = 1234567
WHERE date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010')
UNION
(SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM billspaid
INNER JOIN txpytaxid
ON txpytaxid.custno = billspaid.custno2
INNER JOIN txpyaccts
ON txpyaccts.taxpayerno = txpytaxid.taxpayerno
INNER JOIN reasacct
ON reasacct.accountno = txpyaccts.accountno AND reasacct.controlno = 1234567
WHERE date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010')
Use EXISTS instead of IN ( unless the result set of the IN subquery is very small).
If you do UNION instead of OR ( which should be functionally equivalent ) use UNION ALL instead.