SQL Server "ORDER BY" optimization - massive performance decrease - sql

Using SQL Server 2000. I have a table that receives a dump from a legacy system once a day, and I am trying to write a query that processes this table with a few reference-table joins and an order by clause.
This is the SQL I have:
select d.acct_no,
d.associate_id,
d.first_name,
d.last_name,
d.acct_bal,
plr.long_name p_lvl,
tlr.long_name t_lvl,
d.category,
d.status,
tm.site_name,
d.addr1 + ' ' + isnull(d.addr2,'') address,
d.city,
d.state,
d.country,
d.post_code,
CASE WHEN d.home_phone_ok = 1 THEN d.home_phone END home_phone,
CASE WHEN d.work_phone_ok = 1 THEN d.work_phone END work_phone,
CASE WHEN d.alt_phone_ok = 1 THEN d.alt_phone END alt_phone,
CASE WHEN d.email_ok = 1 THEN d.email END email,
d.last_credit last_paid,
d.service,
d.quantity,
d.amount,
ar.area_desc area
from item_dump d
left outer join territory_map tm on tm.short_postcode = left(d.post_code,3) and d.country in ('United States','Canada')
left outer join p_level_ref plr on plr.p_level_id = d.p_lvl_id
left outer join t_level_ref tlr on tlr.t_level_id = d.t_lvl_id
left outer join (select distinct master_item_id, site_item_id from invoice_detail) as map on map.site_item_id = d.item_no
left outer join item_ref i on i.item_id = map.master_item_id
left outer join area_ref ar on ar.area_id = i.area_id
where (d.cat_id > 80 or d.cat_id < 70)
and d.standing < 4
and d.status not like 'DECEASED'
and d.paid = 1
order by d.associate_id
Most of these columns come straight from the legacy dump table item_dump. All the joins are to small reference tables with few rows. The legacy table itself has about 17,000 records, but with the where clauses the query comes out to about 3,000 rows.
I have a non-clustered index on the associate_id column.
When I run this query without the order by associate_id clause it takes about 2 seconds. With the order by clause it takes a full minute!
I've tried adding the where clause columns to the index along with associate_id but that didn't change the performance at all.
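An index along those lines would look roughly like this (the index name is illustrative; SQL Server 2000 has no INCLUDE clause, so the where-clause columns have to go into the key):
-- sort column first so the index can feed the order by, filter columns after
create nonclustered index ix_item_dump_associate
on item_dump (associate_id, cat_id, standing, status, paid)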
The end of the execution plan without the order by looks like this:
[execution plan screenshot]
With the order by, parallelism kicks in on the sort and the plan looks like this:
[execution plan screenshot]
I thought maybe it was weird SQL Server 2000 parallelism handling, but adding an OPTION (MAXDOP 1) hint made the query take 3 minutes instead!
It isn't really sensible for me to sort in the application code, because this query result is cached for about 6 hours before the query runs again, and I would have to re-sort it in the application many times a minute.
I must be missing something very basic, but after staring at the query for an hour (i.e. running it 10 times), I can't see what it is anymore.

What happens when you remove all the outer joins and, of course, the corresponding columns in the select list?
select d.acct_no,
d.associate_id,
d.first_name,
d.last_name,
d.acct_bal,
d.category,
d.status,
d.addr1 + ' ' + isnull(d.addr2,'') address,
d.city,
d.state,
d.country,
d.post_code,
CASE WHEN d.home_phone_ok = 1 THEN d.home_phone END home_phone,
CASE WHEN d.work_phone_ok = 1 THEN d.work_phone END work_phone,
CASE WHEN d.alt_phone_ok = 1 THEN d.alt_phone END alt_phone,
CASE WHEN d.email_ok = 1 THEN d.email END email,
d.last_credit last_paid,
d.service,
d.quantity,
d.amount
from item_dump d
where (d.cat_id > 80 or d.cat_id < 70)
and d.standing < 4
and d.status not like 'DECEASED'
and d.paid = 1
order by d.associate_id
If that works fast, then I would go for subselects inside the select list:
select d.acct_no,
d.associate_id,
d.first_name,
d.last_name,
d.acct_bal,
plr.long_name p_lvl,
tlr.long_name t_lvl,
d.category,
d.status,
(select tm.site_name
from territory_map tm
where tm.short_postcode = left(d.post_code,3)
and d.country in ('United States','Canada')) as site_name
etc. This can end up faster than left outer joining them all in the from clause.
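One caveat worth adding (mine, not part of the original answer): a scalar subquery in the select list must return at most one row, so if territory_map can match more than one row per postcode prefix, guard it with TOP 1, roughly:
(select top 1 tm.site_name
from territory_map tm
where tm.short_postcode = left(d.post_code,3)
and d.country in ('United States','Canada')) as site_name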

Related

Optimizing SQL Query speed

I am trying to optimize my SQL query below. I am using a very old RDBMS called Firebird. I tried rearranging the items in my where clause and removing the order by statement, but the query still seems to take forever to run. Unfortunately Firebird doesn't support explain-execution-plan functionality, and therefore I cannot identify the code that is holding up the query.
select T.veh_reg_no,T.CON_NO, sum(T.pos_gpsunlock) as SUM_GPS_UNLOCK,
count(T.pos_gpsunlock) as SUM_REPORTS, contract.con_name
from
(
select veh_reg_no,CON_NO,
case when pos_gpsunlock = upper('T') then 1 else 0 end as pos_gpsunlock
from vehpos
where veh_reg_no in
( select regno
from fleetvehicle
where fleetno in (97)
) --DS5
and pos_timestamp > '2022-07-01'
and pos_timestamp < '2022-08-01'
) T
join contract on T.con_no = contract.con_no
group by T.veh_reg_no, T.con_no,contract.con_name
order by SUM_GPS_UNLOCK desc;
If anyone can help it would be greatly appreciated.
I'd either comment out some of the sub-queries or remove a join or aggregation and see if that improves things. Once you find the offending code, maybe you can move it or rewrite it. I know nothing of Firebird, but I'd approach the query with the code below, wrapping the aggregation outside of the joins and removing the WHERE ... IN clause.
If nothing works, can you create an aggregation table or pre-filtered table and use that?
select
x.veh_reg_no
,x.con_no
,x.con_name
,sum(case when x.pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
,count(*) as SUM_REPORTS
FROM (
select
a.veh_reg_no
,a.pos_gpsunlock
,a.CON_NO
,c.con_name
FROM vehpos a
JOIN fleetvehicle b on a.veh_reg_no = b.regno and b.fleetno = 97
JOIN contract c on a.con_no = c.con_no
WHERE a.pos_timestamp between '2022-07-01' and '2022-08-01'
) x
Group By x.veh_reg_no, x.con_no, x.con_name
This might help by converting the subqueries to joins and reducing the nesting. It also uses an = instead of an IN () operation.
select vp.veh_reg_no,vp.con_no,c.con_name,
count(*) as SUM_REPORTS,
sum(case when pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
from vehpos vp
inner join fleetvehicle fv on fv.fleetno = 97 and fv.regno = vp.veh_reg_no
inner join contract c on vp.con_no = c.con_no
where vp.pos_timestamp >= '2022-07-01'
and vp.pos_timestamp < '2022-08-01'
group by vp.veh_reg_no, vp.con_no, c.con_name
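If the rewrite alone doesn't do it, the usual next step is an index that supports the join key and the date-range filter. A sketch in Firebird syntax (the index name is mine, and I'm assuming no equivalent index already exists):
-- compound index: join column first, then the range-filtered timestamp
CREATE INDEX idx_vehpos_reg_ts ON vehpos (veh_reg_no, pos_timestamp);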

SQL Where clause greatly increases query time

I have a table that I do some joins and operations on. This table has about 150,000 rows, and if I select everything and run it, it returns in about 10 seconds. If I wrap the query as a derived table and filter out all the rows where a certain field is null, the query now takes 10 minutes to run. Is it supposed to be like this, or is there any way to fix it? Here is the query.
SELECT *
FROM
(
Select
I.Date_Created
,I.Company_Code
,I.Division_Code
,I.Invoice_Number
,Sh.CUST_PO
,I.Total_Quantity
,ID.Total
,SH.Ship_City City
,CASE WHEN SH.Ship_Cntry <> 'US' THEN 'INT' ELSE SH.Ship_prov END State
,SH.Ship_Zip Zip
,SH.Ship_Cntry Country
,S.CustomerEmail
from [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices I (nolock)
LEFT JOIN (SELECT
ID.Company_Code
,ID.Division_Code
,ID.Invoice_Number
,SUM (ID.Price* ID.Quantity) Total
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices_Detail ID (nolock)
GROUP BY ID.Company_Code, ID.Division_Code, ID.Invoice_Number) ID
ON I.Company_Code = ID.Company_Code
AND I.Division_Code = ID.Division_Code
AND I.Invoice_Number = ID.Invoice_Number
LEFT JOIN
[JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].SHIPHIST SH (nolock) ON I.Pickticket_Number = SH.Packslip
LEFT JOIN
[JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].[MagentoCustomerEmailData] S on SH.CUST_PO = S.InvoiceNumber
Where I.Company_Code ='09' AND I.Division_Code = '001'
AND I.Customer_Number = 'ECOM2X'
)T
Where T.CustomerEmail IS NOT NULL -- This is the problematic line
Order By T.Date_Created desc
If you are aware of the index considerations and you are sure about the problem point, then you can use this to improve it:
USE A1WAREHOUSE;
GO
CREATE NONCLUSTERED INDEX IX_MagentoCustomerEmailData_CustomerEmail
ON [dbo].[MagentoCustomerEmailData] (CustomerEmail ASC);
GO
In general, you need indexes on the columns used in ORDER BY, WHERE, GROUP BY, and ON clauses. Before adding indexes, be sure you are aware of the consequences.
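In the same spirit, an index matching the Invoices filter in the query above might help too. A sketch (run on the server hosting AMTPLUS; the index name is illustrative, and the right key columns depend on your data):
USE AMTPLUS;
GO
CREATE NONCLUSTERED INDEX IX_Invoices_CustomerFilter
ON [dbo].[Invoices] (Company_Code, Division_Code, Customer_Number);
GO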
Read more about indexes:
https://www.mssqltips.com/sqlservertutorial/9133/sql-server-nonclustered-indexes/
https://www.itprotoday.com/sql-server/indexing-dos-and-don-ts

Select row from a MAX() GROUP BY in SQL Server

I have a table called Eventos. I have to select the corresponding OutTime for the alarm that has the greatest InTime.
And I have to do it quickly/optimized; I have about 1 million entries in the table.
This is my code:
SELECT
CadGrupoEventos.Severidade AS Nível,
CadGrupoEquipamentos.Nome AS Grupo,
CadEquipamentos.TAG AS Equipamento,
CadEventos.MensagemPT AS 'Mensagem de alarme',
MAX(Eventos.InTime) AS 'Hora do evento',
Eventos.OutTime AS 'Hora de saída'
FROM
CadGrupoEventos,
CadEquipamentos,
CadEventos,
Eventos,
CadUsuarios,
CadGrupoEquipamentos
WHERE
Eventos.Acked = 0
AND CadGrupoEventos.Codigo = CadEventos.Grupo
AND CadEquipamentos.Codigo = Eventos.TAG
AND CadEventos.Codigo = Eventos.CodMensagem
AND CadGrupoEquipamentos.Codigo = CadEquipamentos.Grupo
GROUP BY
CadGrupoEventos.Severidade,
CadEquipamentos.TAG,
CadEventos.MensagemPT,
CadGrupoEquipamentos.Nome,
Eventos.OutTime
This code, as it is, returns every single entry from the table.
I have to take Eventos.OutTime out of the GROUP BY and still get its value.
This is just an educated guess based on your description. Notice I used ANSI-92 style joins, which are much more explicit, and aliases to make this a lot more legible. Your query might look something like this:
select x.Severidade AS Nível,
x.Nome AS Grupo,
x.TAG AS Equipamento,
x.MensagemPT AS [Mensagem de alarme],
x.[Hora do evento],
x.OutTime AS [Hora de saída]
from
(
SELECT cge.Severidade,
cgequip.Nome,
ce.TAG,
cevt.MensagemPT,
MAX(e.InTime) AS [Hora do evento],
e.OutTime
, RowNum = ROW_NUMBER() over(partition by cge.Severidade, ce.TAG, cevt.MensagemPT, cgequip.Nome order by e.OutTime /*maybe desc???*/)
FROM CadGrupoEventos cge
join CadEventos cevt on cge.Codigo = cevt.Grupo
join Eventos e on cevt.Codigo = e.CodMensagem
join CadEquipamentos ce on ce.Codigo = e.TAG
join CadGrupoEquipamentos cgequip on cgequip.Codigo = ce.Grupo
cross join CadUsuarios cu --not sure if this is really what you want but your original code did not have any logic for this table
WHERE e.Acked = 0
GROUP BY cge.Severidade,
ce.TAG,
cevt.MensagemPT,
cgequip.Nome,
e.OutTime
) x
where x.RowNum = 1
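Stripped of the reference-table joins, the core pattern is just this (a minimal sketch against Eventos alone): number the rows in each group by InTime descending and keep the first, which carries the OutTime belonging to the greatest InTime.
SELECT x.TAG, x.InTime AS [Hora do evento], x.OutTime AS [Hora de saída]
FROM (
SELECT e.TAG, e.InTime, e.OutTime,
-- one row per TAG survives the outer filter: the one with the max InTime
ROW_NUMBER() OVER (PARTITION BY e.TAG ORDER BY e.InTime DESC) AS RowNum
FROM Eventos e
WHERE e.Acked = 0
) x
WHERE x.RowNum = 1;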

Joining a derived table in Postgres

I have 4 tables:
Competencies: obviously a list of competencies; static, a library
Competency levels: refers to an associated group of competencies and holds the number of competencies I am testing for
call_competency: a list of all 'calls' that have recorded the specified competency
competency_review_status: showing whether each call_competency was reviewed
Now I am trying to write a query to count a total and spit out each competency, its id, and whether a user has reached the limit. Everything works except when I add the user. I am not sure what I am doing wrong: once I limit call_competency by user in the where clause, I get back only the small subset that exists in call_competency, when I want the entire list of competencies.
The competencies not reached should be false, the ones recorded the appropriate number of times true: a FULL list from the competency table.
I added the derived table; not sure if this is right, and obviously it doesn't run properly. I'm not sure what I'm doing wrong and I'm wasting time. Any help much appreciated.
SELECT comp.id, comp.shortname, comp.description,
CASE WHEN sum(CASE WHEN crs.grade = 'Pass' THEN 1 ELSE CASE WHEN crs.grade = 'Fail' THEN -1 ELSE 0 END END) >= comp_l.competency_break_level
THEN TRUE ELSE FALSE END
FROM competencies comp
INNER JOIN competency_levels comp_l ON comp_l.competency_group = comp.competency_group
LEFT OUTER JOIN (
SELECT competency_id
FROM call_competency
WHERE call_competency.user_id IN (
SELECT users.id FROM users WHERE email= _studentemail
)
) call_c ON call_c.competency_id = comp.id
LEFT OUTER JOIN competency_review_status crs ON crs.id = call_competency.review_status_id
GROUP BY comp.id, comp.shortname, comp.description, comp_l.competency_break_level
ORDER BY comp.id;
(Shooting from the hip, no installation to test)
It looks like the below should do the trick. You apparently had some of the joins mixed up, referencing a column through a table name that was no longer in scope. Also, the CASE expression in the main query could be much cleaner.
SELECT comp.id, comp.shortname, comp.description,
(sum(CASE WHEN crs.grade = 'Pass' THEN 1 WHEN crs.grade = 'Fail' THEN -1 ELSE 0 END) >= comp_l.competency_break_level) AS reached_limit
FROM competencies comp
JOIN competency_levels comp_l USING (competency_group)
LEFT JOIN (
SELECT competency_id, review_status_id
FROM call_competency
JOIN users ON id = user_id
WHERE email = _studentemail
) call_c ON call_c.competency_id = comp.id
LEFT JOIN competency_review_status crs ON crs.id = call_c.review_status_id
GROUP BY comp.id, comp.shortname, comp.description, comp_l.competency_break_level
ORDER BY comp.id;
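As an aside, on PostgreSQL 9.4 or later the conditional sum can also be written with aggregate FILTER clauses, which some find easier to read. The reached_limit expression above would become (an equivalent sketch, same aliases):
(count(*) FILTER (WHERE crs.grade = 'Pass')
 - count(*) FILTER (WHERE crs.grade = 'Fail')
 >= comp_l.competency_break_level) AS reached_limit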

WHERE in SQL: combining two fast conditions multiplies costs many times

I have a fairly complex SQL query that returns the ids of 2158 rows from a table with ~14M rows. I'm using CTEs for simplification.
The WHERE clause consists of two conditions. If I comment out one of them, the other runs in ~2 seconds. If I leave both (separated by OR), the query runs for ~100 seconds. The first condition alone needs 1-2 seconds and returns 19 rows; the second condition alone needs 0 seconds and returns 2139 rows.
What can be the reason?
This is the complete SQL:
WITH fpcRepairs AS
(
SELECT FPC_Row = ROW_NUMBER()OVER(PARTITION BY t.SSN_Number ORDER BY t.Received_Date, t.Claim_Creation_Date, t.Repair_Completion_Date, t.Claim_Submitted_Date)
, idData, Repair_Completion_Date, Received_Date, Work_Order, SSN_number, fiMaxActionCode, idModel,ModelName
, SP=(SELECT TOP 1 Reused_Indicator FROM tabDataDetail td INNER JOIN tabSparePart sp ON td.fiSparePart=sp.idSparePart
WHERE td.fiData=t.idData
AND (td.Material_Quantity <> 0)
AND (sp.SparePartName = '1254-3751'))
FROM tabData AS t INNER JOIN
modModel AS m ON t.fiModel = m.idModel
WHERE (m.ModelName = 'LT26i')
AND EXISTS(
SELECT NULL
FROM tabDataDetail AS td
INNER JOIN tabSparePart AS sp ON td.fiSparePart = sp.idSparePart
WHERE (td.fiData = t.idData)
AND (td.Material_Quantity <> 0)
AND (sp.SparePartName = '1254-3751')
)
), needToChange AS
(
SELECT idData FROM tabData AS t INNER JOIN
modModel AS m ON t.fiModel = m.idModel
WHERE (m.ModelName = 'LT26i')
AND EXISTS(
SELECT NULL
FROM tabDataDetail AS td
INNER JOIN tabSparePart AS sp ON td.fiSparePart = sp.idSparePart
WHERE (td.fiData = t.idData)
AND (td.Material_Quantity <> 0)
AND (sp.SparePartName IN ('1257-2741','1257-2742','1248-2338','1254-7035','1248-2345','1254-7042'))
)
)
SELECT t.idData
FROM tabData AS t INNER JOIN modModel AS m ON t.fiModel = m.idModel
INNER JOIN needToChange ON t.idData = needToChange.idData -- needs to change FpcAssy
LEFT OUTER JOIN fpcRepairs rep ON t.idData = rep.idData
WHERE
rep.idData IS NOT NULL -- FpcAssy replaced, check if reused was claimed correctly
AND rep.FPC_Row > 1 -- other FpcAssy repair before
AND (
SELECT SP FROM fpcRepairs lastRep
WHERE lastRep.SSN_Number = rep.SSN_Number
AND lastRep.FPC_Row = rep.FPC_Row - 1
) = rep.SP -- same SP, must be rejected(reused+reused or new+new)
OR
rep.idData IS NOT NULL -- FpcAssy replaced, check if reused was claimed correctly
AND rep.FPC_Row = 1 -- no other FpcAssy repair before
AND rep.SP = 0 -- not reused, must be rejected
order by t.idData
Here's the execution plan:
Download: http://www.filedropper.com/exeplanfpc
Try using a UNION ALL of the 2 queries separately instead of the OR condition.
I've tried it many times and it has really helped. I read about this issue in The Art of SQL.
Read it; you can find a lot of useful information about performance issues there.
UPDATE:
Check related questions
UNION ALL vs OR condition in sql server query
http://www.sql-server-performance.com/2011/union-or-sql-server-queries/
Can UNION ALL be faster than JOINs or do my JOINs just suck?
Check Wes's answer
The usage of the OR is probably causing the query optimizer to no longer use an index in the second query.
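Applied to the query above, the UNION ALL rewrite would keep both CTEs and split the final select roughly like this (a sketch only; the left outer join collapses to an inner join because both branches test rep.idData IS NOT NULL, and the branches are mutually exclusive on FPC_Row, so UNION ALL introduces no duplicates):
SELECT t.idData
FROM tabData AS t INNER JOIN modModel AS m ON t.fiModel = m.idModel
INNER JOIN needToChange ON t.idData = needToChange.idData
INNER JOIN fpcRepairs rep ON t.idData = rep.idData
WHERE rep.FPC_Row > 1
AND (SELECT SP FROM fpcRepairs lastRep
WHERE lastRep.SSN_Number = rep.SSN_Number
AND lastRep.FPC_Row = rep.FPC_Row - 1) = rep.SP
UNION ALL
SELECT t.idData
FROM tabData AS t INNER JOIN modModel AS m ON t.fiModel = m.idModel
INNER JOIN needToChange ON t.idData = needToChange.idData
INNER JOIN fpcRepairs rep ON t.idData = rep.idData
WHERE rep.FPC_Row = 1
AND rep.SP = 0
ORDER BY idData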