Why grouping in a subquery causes problems

Why grouping in a subquery causes problems - sql

When I include the 2 commented out lines in the following subquery, seems that it takes forever until my Sybase 12.5 ASE server gets any results. Without these 2 lines the query runs ok. What is so wrong with that grouping?
select days_played.day_played, count(distinct days_played.user_id) as OLD_users
from days_played inner join days_received
on days_played.day_played = days_received.day_received
and days_played.user_id = days_received.user_id
where days_received.min_bulk_MT > days_played.min_MO
and days_played.user_id in
(select sgia.user_id
from days_played as sgia
where sgia.day_played < days_played.day_played
--group by sgia.user_id
--having sum(sgia.B_first_msg) = 0
)
group by days_played.day_played

Find out what the query does by using showplan to show the explanation.
In this case Can't you eliminate the subquery by making it part of the main query?

Could you try rewriting the query as follows?
select days_played.day_played,
count(distinct days_played.user_id) as OLD_users
from days_played
inner join days_received on days_played.day_played = days_received.day_received
and days_played.user_id = days_received.user_id
where days_received.min_bulk_MT > days_played.min_MO
and 0 = (select sum(sgia.B_first_msg)
from days_played as sgia
where sgia.user_id = days_played.user_id
and sgia.day_played < days_played.day_played
)
group by days_played.day_played
I guess this should give you better performance...

ok I found out what the problem was
I had to include user id in the subquery: "where days_played.user_id = sgia.user_id
and sgia.day_played < days_played.day_played"

Related

Optimizing SQL Query speed

I am trying to optimize my SQL query below as I am using a very old RDMS called firebird. I tried rearranging the items in my where clause and removing the order by statement but the query still seems to take forever to run. Unfortunately firebird doesn't support Explain Execution Plan Functionalities and therefore I cannot identify the code that is holding up the query.
select T.veh_reg_no,T.CON_NO, sum(T.pos_gpsunlock) as SUM_GPS_UNLOCK,
count(T.pos_gpsunlock) as SUM_REPORTS, contract.con_name
from
(
select veh_reg_no,CON_NO,
case when pos_gpsunlock = upper('T') then 1 else 0 end as pos_gpsunlock
from vehpos
where veh_reg_no in
( select regno
from fleetvehicle
where fleetno in (97)
) --DS5
and pos_timestamp > '2022-07-01'
and pos_timestamp < '2022-08-01'
) T
join contract on T.con_no = contract.con_no
group by T.veh_reg_no, T.con_no,contract.con_name
order by SUM_GPS_UNLOCK desc;
If anyone can help it would be greatly appreciated.

I'd either comment out some of the sub-queries or remove a join or aggregation and see if that improves it. Once you find the offending code maybe you can move it or re-write it. I know nothing of Firebird but I'd approach that query with the below code, wrapping the aggregation outside of the joins and removing the "Where in" clause.
If nothing works can you create an aggregation table or pre-filtered table and use that?
select
x.*
,sum(case when x.pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
,count(*) as SUM_REPORTS
FROM (
select
a.veh_reg_no
,a.pos_gpsunlock
,a.CON_NO
,c.con_name
FROM vehpos a
JOIN fleetvehicle b on a.veg_reg_no = b.reg_no and b.fleetno = 97 and b.pos_timestamp between '222-07-01' and '2022-08-01'
JOIN contract c on a.con_no = contract.con_no
) x
Group By....

This might help by converting subqueries to joins and reducing nesting. Also an = instead of IN() operation.
select vp.veh_reg_no,vp.con_no,c.con_name,
count(*) as SUM_REPORTS,
sum(case when pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
from vehpos vp
inner join fleetvehicle fv on fv.fleetno = 97 and fv.regno = vp.veh_reg_no
inner join contract c on vp.con_no = c.con_no
where vp.pos_timestamp >= '2022-07-01'
and vp.pos_timestamp < '2022-08-01'
group by vp.veh_reg_no, vp.con_no, c.con_name

Display only result > 2

I face a problem with the result on my script.
My formula for MARGIN is ((plnamt-(ibhexc/ibhand))/plnamt)*100.
I want to display only result > 2. How to do this? Please help.
This my script:
select
a.plnitm,a.plnstr,max(a.plncdt),max(a.plnndt),max(a.plnamt),max(a.plnevt),b.idept,c.ibhand,c.ibhexc,
decimal((c.ibhexc/c.ibhand),12,4) as AVG_COST,
decimal(((((max(a.plnamt)-(c.ibhexc/c.ibhand))/(max(a.plnamt))))*100),12,4) as MARGIN
from prcpln a
inner join invmst b on a.plnitm = b.inumbr
inner join invbal c on a.plnitm = c.inumbr and a.plnstr = c.istore and c.ibhand <> 0
where a.plnstr = ''14006''
group by a.plnitm,a.plnstr,b.idept,c.ibhand,c.ibhexc
order by a.plnitm

Simplistically, for situations like this, you can take your query and put it inside a cte:
WITH q AS (
select
a.plnitm,a.plnstr,max(a.plncdt),max(a.plnndt),max(a.plnamt),max(a.plnevt),b.idept,c.ibhand,c.ibhexc,
decimal((c.ibhexc/c.ibhand),12,4) as AVG_COST,
decimal(((((max(a.plnamt)-(c.ibhexc/c.ibhand))/(max(a.plnamt))))*100),12,4) as MARGIN
from prcpln a
inner join invmst b on a.plnitm = b.inumbr
inner join invbal c on a.plnitm = c.inumbr and a.plnstr = c.istore and c.ibhand <> 0
where a.plnstr = ''14006''
group by a.plnitm,a.plnstr,b.idept,c.ibhand,c.ibhexc
)
SELECT * FROM q WHERE margin > 2
ORDER BY q.plnitm
This is similar to the advice to use HAVING - a HAVING is a "where clause that is done after a GROUP BY"
A cte is a way of taking some calculated block of data and giving it an alias that can be used just like a table. I wanted to answer this way to point out to you that queries don't have to be formed purely from tables; tables are just blocks of data (with a name), and the output from a select is also "just a block of data" that can be given a name (by use of a cte or subquery) and then used just like a table is

Trying to figure out why this SQL request take 47min to execute

I'm trying to make a SQL request but that request is taking forever to finish. The request is done in Excel 2003 with VBA.
Size of the TABLE:
TABLE1 = 12600 Row
TABLE2 = 361K Row
Here's the query:
SELECT DISTINCT
y.code AS CODE,
y.name AS LIBELLE,
#[...]
#[...]
#[...]
#[...]
y.IS_BILAN,
y.INACTIVE,
(SELECT COUNT(1)
FROM TABLE1 d, TABLE2 a
WHERE a.record_date_time >= '2018/01/01'
AND a.record_date_time < '2019/01/01'
AND global_status <> 'C'
AND a.id = d.id
AND d.type_id = y.code) AS TOTAL_2018
FROM
anal_exam y
ORDER BY
code
The whole query run instantly when removing the last part "SELECT COUNT(1)"
The execution plan I see in Oracle SQL Developer:
How could I speed up this query? It takes 47 minutes to finish

Try defining your JOIN like this:
SELECT DISTINCT
y.code AS CODE,
y.name AS LIBELLE,
y.IS_BILAN,
y.INACTIVE,
COUNT(*) AS TOTAL_2018
FROM anal_exam y
JOIN TABLE1 d
ON d.type_id = y.code
JOIN TABLE2 a
ON d.ID = a.ID
WHERE a.record_date_time BETWEEN '2018/01/01' AND '2019/01/01'
AND global_status <> 'C'
order by code

I added a GROUP BY y.code, y.name, y.IS_BILAN, y.inactive at the end and it work's
runtime is 47 sec.
It's quite fast but i'm wondering if there's a way to get the line with count = 0 because 3k line are omitted in this query

With the code from T McKeown i'm getting this result :
CODE1|LIBELLE1|T|T|1530
CODE3|LIBELLE2|T|T|20
CODE5|LIBELLE3|T|T|143
The result i'm seeking include the line with count()=0
CODE1|LIBELLE1|T|T|1530
CODE2|LIBELLE2|T|F|0
CODE3|LIBELLE2|T|F|20
CODE4|LIBELLE4|T|T|0
CODE5|LIBELLE3|F|T|143
How can i achieve this ?

Teradata using Top 10 and Distinct

I am getting an error saying that I cannot use Top 10 with distinct I am wondering if there is any way I can get my query to work on that fashion. this is what the error is saying .
[Teradata Database] [6916] TOP N Syntax error: Top N option is not supported with DISTINCT option.
Query Below:
Thank you.
Select Distinct TOP 10 t1.Adjustment_ID, t1.OfficeNum, t1.InvoiceNum, t1.PatientNum,
t1.CurrentStatus, t1.AdjustmentTotal, t1.SubmittedOn, t1.UserSubmitted,
t1.Invoice_Type, t1.Pat_First_Name, t1.Pat_Last_Name,t2.Reason_Code FROM App_UnityAdj_AdjInfo_Tbl t1
Left Join RCM_WORK_PRD.App_UnityAdj_AdjRecord_Tbl t2
On t1.Adjustment_ID = t2.Adjustment_ID
Where t1.UserSubmitted = 'Name' AND (t1.CurrentStatus = 'Pending' OR t1.CurrentStatus = 'Deny')

What are you trying to do? You have columns in the SELECT that are not in the GROUP BY. You also have TOP without ORDER BY, which is suspicious.
One simple method is to move all the SELECT columns to the GROUP BY:
select TOP 10 t1.Adjustment_ID, t1.OfficeNum, t1.InvoiceNum, t1.PatientNum,
t1.CurrentStatus, t1.AdjustmentTotal, t1.SubmittedOn, t1.UserSubmitted,
t1.Invoice_Type, t1.Pat_First_Name, t1.Pat_Last_Name, t2.Reason_Code
from App_UnityAdj_AdjInfo_Tbl t1 Left Join
RCM_WORK_PRD.App_UnityAdj_AdjRecord_Tbl t2
On t1.Adjustment_ID = t2.Adjustment_ID
where t1.UserSubmitted = 'Name' AND
t1.CurrentStatus in ('Pending', 'Deny')
group by t1.Adjustment_ID, t1.OfficeNum, t1.InvoiceNum, t1.PatientNum,
t1.CurrentStatus, t1.AdjustmentTotal, t1.SubmittedOn, t1.UserSubmitted,
t1.Invoice_Type, t1.Pat_First_Name, t1.Pat_Last_Name,t2.Reason_Code;

I think I figured it out, in case anyone wants to know it is sample 10
Select Distinct t1.Adjustment_ID, t1.OfficeNum, t1.InvoiceNum, t1.PatientNum,
t1.CurrentStatus, t1.AdjustmentTotal, t1.SubmittedOn, t1.UserSubmitted,
t1.Invoice_Type, t1.Pat_First_Name, t1.Pat_Last_Name,t2.Reason_Code FROM App_UnityAdj_AdjInfo_Tbl t1
Left Join RCM_WORK_PRD.App_UnityAdj_AdjRecord_Tbl t2
On t1.Adjustment_ID = t2.Adjustment_ID
Where t1.AssignedTo IS null AND (t1.CurrentStatus = 'Pending')
sample 10

How can I refactor this query

How can I rewrite this SQL code? I would like to avoid the repetitive execution of the case for every record.
SELECT chcr.chsid,
CASE
WHEN EXISTS
(SELECT 1
FROM hdr_run chre,
clmerr ce
WHERE chre.chsid = chcr.chsid
AND chre.run_nmbr < chcr.last_run_nmbr
AND chre.clm_error_sid = ce.clm_error_sid
GROUP BY chre.chsid
HAVING COUNT(chre.clm_error_sid) > 0
)
THEN 'Appended'
ELSE 'Never Appended'
END Run_Detail
FROM
clm_res chcr,
clm_der chde
WHERE chde.chsid = chcr.chsid

Have a look at that query please:
SELECT
chcr.chsid,
CASE
WHEN ce.clm_error_sid IS NOT NULL AND COUNT(chre.clm_error_sid, 0) > 0
THEN 'Appended'
ELSE 'Never Appended'
END Run_Detail
FROM
clm_res chcr
JOIN clm_der chde ON chde.chsid = chcr.chsid
LEFT JOIN hdr_run chre ON chre.chsid = chcr.chsid AND chre.run_nmbr < chcr.last_run_nmbr
LEFT JOIN clmerr ce ON chre.clm_error_sid = ce.clm_error_sid
GROUP BY chcr.chsid
If there's more than one clm_der for each clm_res based on their chsid, we'd have to double check that the COUNT is not counting the extras from clm_der.

I have no data to test this on so all I can go on is the SQL you've provided but from a brief look it does not appear that the GROUP BY and HAVING COUNT() > 0 statements are required as the combination of the INNER JOIN condition in the sub-query and the use of EXISTS in the outer query does the same thing.
Does this have exactly the same functionality:
SELECT chcr.chsid,
CASE
WHEN EXISTS
( SELECT 1
FROM hdr_run chre
INNER JOIN
clmerr ce
ON (chre.clm_error_sid = ce.clm_error_sid)
WHERE chre.chsid = chcr.chsid
AND chre.run_nmbr < chcr.last_run_nmbr
)
THEN 'Appended'
ELSE 'Never Appended'
END Run_Detail
FROM clm_res chcr
INNER JOIN
clm_der chde
ON ( chde.chsid = chcr.chsid );
SQLFIDDLE
Also, repeated execution of the CASE statement is not a bad thing as the optimizer should not duplicate executions of the sub-query.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Why grouping in a subquery causes problems - sql

Find out what the query does by using showplan to show the explanation. In this case Can't you eliminate the subquery by making it part of the main query?

ok I found out what the problem was I had to include user id in the subquery: "where days_played.user_id = sgia.user_id and sgia.day_played < days_played.day_played"

Related

Optimizing SQL Query speed

Display only result > 2

Trying to figure out why this SQL request take 47min to execute

Teradata using Top 10 and Distinct

How can I refactor this query

Categories

Resources