simplifying query that has multiple WITH and multiple subqueries

simplifying query that has multiple WITH and multiple subqueries - sql

It is bothering me that for a simple query, I have to write out so many sub-selects and WITH statements.
The question is: are there basic guidelines on how to simplify queries that have subqueries?
Here's my query:
WITH cte_min
AS (SELECT a.client_id,
a.specimen_source,
a.received_date
FROM f_accession_daily a
JOIN (SELECT DISTINCT f.client_id,
f.received_date,
f.accession_daily_key
FROM F_ACCESSION_DAILY f
JOIN (SELECT CLIENT_ID,
Min(received_date) MinRecDate
FROM F_ACCESSION_DAILY
GROUP BY CLIENT_ID) i
ON f.CLIENT_ID = i.CLIENT_ID
AND f.RECEIVED_DATE = i.MinRecDate) b
ON a.ACCESSION_DAILY_KEY = b.ACCESSION_DAILY_KEY),
cte_max
AS (SELECT a.client_id,
a.specimen_source,
a.received_date
FROM f_accession_daily a
JOIN (SELECT DISTINCT f.client_id,
f.received_date,
f.accession_daily_key
FROM F_ACCESSION_DAILY f
JOIN (SELECT CLIENT_ID,
Max(received_date) MaxRecDate
FROM F_ACCESSION_DAILY
GROUP BY CLIENT_ID) i
ON f.CLIENT_ID = i.CLIENT_ID
AND f.RECEIVED_DATE = i.MaxRecDate) b
ON a.ACCESSION_DAILY_KEY = b.ACCESSION_DAILY_KEY),
cte_est
AS (SELECT DISTINCT client_id,
MLIS_DATE_ESTABLISHED
FROM D_CLIENT
WHERE REC_ACTIVE_FLG = 1
AND MLIS_DATE_ESTABLISHED IS NOT NULL)
SELECT DISTINCT f.client_id,
cmin.specimen_source,
cmin.received_date,
cmax.specimen_source,
cmax.received_date,
cest.MLIS_DATE_ESTABLISHED
FROM F_ACCESSION_DAILY f
LEFT JOIN cte_max cmax
ON cmax.CLIENT_ID = f.CLIENT_ID
LEFT JOIN cte_min cmin
ON cmin.CLIENT_ID = f.CLIENT_ID
LEFT JOIN cte_est cest
ON cest.CLIENT_ID = f.CLIENT_ID
I am not asking necessarily for you to do the simplification yourself (although I would be very grateful for this), rather I am asking for general guidelines/directions on re-writing this query to be more elegant.

Does this look any better?
;WITH minmax AS (
SELECT client_id, specimen_source, received_date,
RMin = row_number() over (partition by Client_id
order by received_date, accession_daily_key),
RMax = row_number() over (partition by Client_id
order by received_date desc, accession_daily_key desc)
FROM F_ACCESSION_DAILY
)
SELECT f.client_id,
max(case when rmin=1 then f.specimen_source end),
max(case when rmin=1 then f.received_date end),
max(case when rmax=1 then f.specimen_source end),
max(case when rmax=1 then f.received_date end),
D.MLIS_DATE_ESTABLISHED
FROM minmax f
LEFT JOIN D_CLIENT D ON D.REC_ACTIVE_FLG = 1 AND D.MLIS_DATE_ESTABLISHED IS NOT NULL
WHERE 1 in (f.rmin, f.rmax)
GROUP BY f.client_id, D.MLIS_DATE_ESTABLISHED

50 rows reporting 5 values and in all of that only two tables are referenced.
In the first CTE you have 4 joins (or virtual joins) to the same table and no other table involved reporting 3 columns. Don't know the key so cannot conclude it can be reduced.
If a cte is not reference more than once then it does not result in less lines of code.
For one this cte can be replaced with less code.
cte_est
AS (SELECT DISTINCT client_id,
MLIS_DATE_ESTABLISHED
FROM D_CLIENT
WHERE REC_ACTIVE_FLG = 1
AND MLIS_DATE_ESTABLISHED IS NOT NULL)
...
cest.MLIS_DATE_ESTABLISHED
...
LEFT JOIN cte_est cest
ON cest.CLIENT_ID = f.CLIENT_ID
reduces to
D_CLIENT.MLIS_DATE_ESTABLISHED
...
LEFT JOIN D_CLIENT
ON D_CLIENT.CLIENT_ID = f.CLIENT_ID
AND D_CLIENT.REC_ACTIVE_FLG = 1
AND D_CLIENT.MLIS_DATE_ESTABLISHED IS NOT NULL

While I am not sure if everyone would consider this simpler and/or easier to read, this is how I would do it:
WITH
cte_MaxMinRecvd As
(
SELECT CLIENT_ID,
Min(received_date) MinRecDate,
Max(received_date) MaxRecDate
FROM F_ACCESSION_DAILY
GROUP BY CLIENT_ID
)
, cte_MaxMinDaily As
(
SELECT *
FROM F_ACCESSION_DAILY f
JOIN cte_MaxMinRecvd i ON f.CLIENT_ID = i.CLIENT_ID
)
, cte_min AS
(
SELECT a.client_id,
a.specimen_source,
a.received_date
FROM F_ACCESSION_DAILY a
WHERE EXISTS(
SELECT *
FROM cte_MaxMinDaily f
WHERE f.RECEIVED_DATE = f.MinRecDate
AND a.ACCESSION_DAILY_KEY = f.ACCESSION_DAILY_KEY
)
)
, cte_max AS
(
SELECT a.client_id,
a.specimen_source,
a.received_date
FROM f_accession_daily a
WHERE EXISTS(
SELECT *
FROM cte_MaxMinDaily f
WHERE f.RECEIVED_DATE = f.MinRecDate
AND a.ACCESSION_DAILY_KEY = f.ACCESSION_DAILY_KEY
)
)
SELECT DISTINCT
f.client_id,
cmin.specimen_source,
cmin.received_date,
cmax.specimen_source,
cmax.received_date,
cest.MLIS_DATE_ESTABLISHED
FROM F_ACCESSION_DAILY f
LEFT JOIN cte_max cmax ON cmax.CLIENT_ID = f.CLIENT_ID
LEFT JOIN cte_min cmin ON cmin.CLIENT_ID = f.CLIENT_ID
LEFT JOIN D_CLIENT cest ON cest.CLIENT_ID = f.CLIENT_ID
AND cest.REC_ACTIVE_FLG = 1
AND cest.MLIS_DATE_ESTABLISHED IS NOT NULL
Mainly what I did was to
Turn most of the subqueries into CTEs, where applicable,
Merge the Min and Max subqueries together, and
Change the DISTINCT subqueries into EXISTS subqueries, which can be simpler (and usually perform better)
Ooops, I also got rid of the cte_est CTE as Blam suggested..

Related

Performing a JOIN with the results of a WITH clause

I have one query that uses one join already, call it query1. I would like to join query1 to the results of a WITH clause. I don't know how to merge the two data sets.
Query 1:
SELECT
p.NetObjectID
, n.Caption, n.ObjectSubType
FROM Pollers p
LEFT JOIN NodesData n ON n.NodeID = p.NetObjectID
The next query is a WITH clause. I don't know how to obtain the results without a WITH because I need to do what is outlined in this post.
Query 2:
WITH ranked_DateStamp AS
(
SELECT c.NodeID, c.DateTime
, ROW_NUMBER() OVER (PARTITION BY NodeID ORDER BY DateTime DESC) AS rn
FROM CPULoad AS c
)
SELECT *
FROM ranked_DateStamp
WHERE rn = 1;
I thought I could just JOIN ranked_DateStamp ON ranked_DateStamp.NodeID = p.NodeID but it won't allow it.

You don't really need a with clause here, you can use a subquery. I would just phrase this as:
SELECT
p.NetObjectID,
n.Caption,
n.ObjectSubType,
c.DateTime
FROM Pollers p
LEFT JOIN NodesData n ON n.NodeID = p.NetObjectID
LEFT JOIN (
SELECT
NodeID,
DateTime,
ROW_NUMBER() OVER (PARTITION BY NodeID ORDER BY DateTime DESC) AS rn
FROM CPULoad AS c
) c ON c.rn = 1 and c.NodeID = p.NetObjectID
However, if you were to use a common table expression, that would look like:
WITH ranked_DateStamp AS (
SELECT
NodeID,
DateTime,
ROW_NUMBER() OVER (PARTITION BY NodeID ORDER BY DateTime DESC) AS rn
FROM CPULoad AS c
)
SELECT
p.NetObjectID,
n.Caption,
n.ObjectSubType,
c.DateTime
FROM Pollers p
LEFT JOIN NodesData n ON n.NodeID = p.NetObjectID
LEFT JOIN ranked_DateStamp c ON c.rn = 1 and c.NodeID = p.NetObjectID
Actually, a lateral join might perform equally well, or better:
SELECT
p.NetObjectID,
n.Caption,
n.ObjectSubType,
c.DateTime
FROM Pollers p
LEFT JOIN NodesData n ON n.NodeID = p.NetObjectID
OUTER APPLY (SELECT TOP (1) * FROM CPULOad c WHERE c.NodeID = p.NetObjectID ORDER BY DateTime DESC) c

Actually, I would suggest a different approach -- a lateral join:
SELECT p.NetObjectID, n.Caption, n.ObjectSubType,c.DateTime
FROM Pollers p LEFT JOIN
NodesData n
ON n.NodeID = p.NetObjectID OUTER APPLY
(SELECT TOP (1) NodeID, DateTime,
ROW_NUMBER() OVER (PARTITION BY NodeID ORDER BY DateTime DESC) AS rn
FROM CPULoad c
WHERE c.NodeId = p.NetObjectID
ORDER BY DateTime DESC
) c;
Lateral joins are very powerful and worth learning about. They are also often a bit faster than the ROW_NUMBER() approach.

Use of Analytical Funtions and Keep clause

I have a lenghtly query that can be shorten with the correct functionally (I beleive). Can we use existing functions such as Max Min Keep to make this query more efficient? My entire query is posted below.
For example: Can we remove the CTEs and use analytical functions such as max and min This would also elimate ranks and several joins
SQL:
WITH LAST_VALUE_BEFORE_START_DT AS (
SELECT DISTINCT * FROM(
SELECT
P.CL_ID,
HISTORYID,
H.MENT_DT,
H.ROLE AS MAX_ROLE,
H.PM_ID AS MAX_P_ID,
DENSE_RANK() OVER (PARTITION BY P.CL_ID ORDER BY H.MENT_DT DESC )AS RNK
FROM MANAGER_HISTORY H
INNER JOIN CP CCP ON H.CLIID = CCP.CLIID
INNER JOIN PROGRAM CP ON PROGRAMID = CP.PROGRAMID
WHERE 1=1
AND CP.TYPEID IN (13,200,11001)
AND H.ROLE = 'RED'
AND H.MENT_DT < START_DT
--AND P.CL_ID = 920917
)LAST_VALUE_BEFORE_START_DT_RNK
WHERE 1=1
AND RNK =1
)
,MIN_VALUE_BETWEEN_PROGRAM AS (
SELECT * FROM(
SELECT DISTINCT
P.CL_ID,
HISTORYID,
TRUNC(H.MENT_DT) AS MENT_DT,
H.ROLE AS MIN_ROLE,
H.PM_ID AS MIN_PM_ID,
DENSE_RANK() OVER (PARTITION BY P.CL_ID ORDER BY H.MENT_DT)AS RNK
FROM MANAGER_HISTORY H
INNER JOIN CP CCP ON H.CLIID = CCP.CLIID
INNER JOIN PROGRAM CP ON PROGRAMID = CP.PROGRAMID
WHERE 1=1
AND CP.TYPEID IN (13,200,11001)
AND H.ROLE = 'RED'
AND H.PM_ID IS NOT NULL
AND TRUNC(H.MENT_DT) BETWEEN TRUNC(START_DT) AND NVL(END_DT,SYSDATE)
--AND P.CL_ID = 920917
) MIN_VALUE_BETWEEN_PROGRAM_RNK
WHERE 1=1
AND RNK =1
)
SELECT * FROM (
SELECT
X.*,
DENSE_RANK() OVER (PARTITION BY CL_ID ORDER BY FIRST_ASSGN_DT,MENT_DT ) AS RNK
FROM(
SELECT DISTINCT
C.CL_ID,
P.CL_ID,
CP.PROGRAM,
START_DT,
END_DT,
H.ROLE,
H.MENT_DT,
H.PM_ID,
LVBS.MAX_ROLE,
LVBS.MAX_P_ID,
MVBP.MIN_ROLE,
MVBP.MIN_PM_ID
,CASE
WHEN H.MENT_DT < START_DT AND LVBS.MAX_ROLE = 'RED' AND LVBS.MAX_P_ID IS NOT NULL THEN TRUNC(START_DT)
WHEN H.MENT_DT BETWEEN START_DT AND NVL(END_DT,SYSDATE) AND H.ROLE = 'RED' AND H.PM_ID IS NOT NULL
THEN MVBP.MENT_DT
ELSE NULL --TESTING PURPOSES
END FIRST_ASSGN_DT
FROM MANAGER_HISTORY H
INNER JOIN CP CCP ON H.CLIID = CCP.CLIID
INNER JOIN CLIENT C ON CCP.CLIID = C.CLIID
INNER JOIN PROGRAM CP ON PROGRAMID = CP.PROGRAMID
LEFT JOIN LAST_VALUE_BEFORE_START_DT LVBS ON P.CL_ID = LVBS.CL_ID
LEFT JOIN MIN_VALUE_BETWEEN_PROGRAM MVBP ON P.CL_ID = MVBP.CL_ID
WHERE 1=1
AND CP.TYPEID IN (13,200,11001)
)X)Z
WHERE 1=1
AND Z.RNK = 1

Oracle SQL Correlated subquery - Returning count(*) in some columns

I have my initial statement which is :
SELECT TEAM.ID PKEY_SRC_OBJECT,
TEAM.MODF_DAT UPDATE_DATE,
TEAM.MODF_USR UPDATED_BY,
PERSO.FIRST_NAM FISRT_NAME
FROM TEAM
LEFT OUTER JOIN PERSO ON (TEAM.ID=PERSO.TEAM_ID)
I want to calculate some "flags" and return them in my initial statement.
There are 3 flags which can be calculated like this :
1) Flag ISMASTER:
SELECT Count(*)
FROM TEAM_TEAM_REL A, TEAM B
WHERE B.PARTY_PTY_ID = A.RLTD_TEAM_ID
AND CODE = 'Double';
2) Flag ISAGENT:
SELECT Count(*)
FROM TEAM_ROL_REL A, TEAM B
WHERE B.PARTY_PTY_ID = A.TEAM_ID;
3) Flag NUMPACTS:
SELECT Count(*)
FROM TEAM_ROL_REL A,
TEAM_ROL_POL_REL B,
PERSO_POL_STA_REL C,
TEAM D
WHERE A.ROL_CD IN ('1','2')
AND A.T_ROL_REL_ID = B.P_ROL_REL_ID
AND B.P_POL_ID = C.P_POL_ID
AND C.STA_CD = 'A'
AND D.PARTY_PTY_ID = A.TEAM_ID;
To try to achieve this, I've updated my initial statement like this :
WITH ABC AS (
SELECT TEAM.ID PKEY_SRC_OBJECT,
TEAM.MODF_DAT UPDATE_DATE,
TEAM.MODF_USR UPDATED_BY,
PERSO.FIRST_NAM FISRT_NAME
FROM TEAM
LEFT OUTER JOIN PERSO ON (TEAM.ID=PERSO.TEAM_ID)
)
SELECT ABC.*, MAST.ISMASTER, AGENT.ISAGENT, PACTS.NUMPACTS FROM ABC
LEFT OUTER JOIN (
select
RLTD_TEAM_ID,
Count(RLTD_TEAM_ID) OVER (PARTITION BY RLTD_TEAM_ID) as ISMASTER
FROM TEAM_TEAM_REL
WHERE CODE = 'Double'
) MAST
ON ABC.PKEY_SRC_OBJECT = MAST.RLTD_TEAM_ID
LEFT OUTER JOIN (
select
TEAM_ID,
Count(TEAM_ID) OVER (PARTITION BY TEAM_ID) as ISAGENT
FROM TEAM_ROL_REL
) AGENT
ON ABC.PKEY_SRC_OBJECT = AGENT.TEAM_ID
LEFT OUTER JOIN (
select
TEAM_ID,
Count(TEAM_ID) OVER (PARTITION BY TEAM_ID) as NUMPACTS
FROM TEAM_ROL_REL, TEAM_ROL_POL_REL, PERSO_POL_STA_REL
WHERE TEAM_ROL_REL.ROL_CD IN ('1','2')
AND TEAM_ROL_REL.T_ROL_REL_ID = TEAM_ROL_POL_REL.P_ROL_REL_ID
AND TEAM_ROL_POL_REL.P_POL_ID = PERSO_POL_STA_REL.P_POL_ID
AND PERSO_POL_STA_REL.STA_CD = 'A'
) PACTS
ON ABC.PKEY_SRC_OBJECT = PACTS.TEAM_ID;
For the two first flags (ISMASTER and ISAGENT) I get the result in less than 1min, but for the last flag (NUMPACTS) it runs few minutes without provide any result.
I think my statement is too heavy, maybe I should do it in a totally different way.

I think you have perhaps over complicated things.
You could do this (assuming I have understood your requirements correctly) like so:
WITH ttr AS (SELECT rltd_team_id,
COUNT(*) is_master
FROM team_team_rel
AND CODE = 'Double'
GROUP BY rltd_team_id),
trr AS (SELECT team_id,
COUNT(*) is_agent
FROM team_rol_rel
GROUP BY team_id)
pacts AS (SELECT trr1.team_id,
COUNT(*) num_pacts
FROM team_rol_rel trr1
INNER JOIN team_rol_pol_rel trpr ON (trr1.t_rol_rel_id = trpr.p_rol_rel_id)
INNER JOIN perso_pol_sta_rel ppsr ON (trpr.p_pol_id = ppsr.p_pol_id
WHERE trr1.rol_cd IN ('1', '2')
AND ppsr.st_cd = 'A'
GROUP BY trr1.team_id)
SELECT t.id pkey_src_object,
t.modf_dat update_date,
t.modf_usr updated_by,
p.first_nam first_name,
ttr.is_master,
trr.is_agent,
pacts.num_pacts
FROM team t
LEFT OUTER JOIN perso p ON t.id = p.team_id
LEFT OUTER JOIN ttr ON t.party_pty_id = ttr.rltd_team_id
LEFT OUTER JOIN trr ON t.party_pty_id = trr.team_id
LEFT OUTER JOIN pacts ON t.pkey_src_object = pacts.team_id;
N.B. untested, since you didn't provide any test data.

how to join a table on a subquery that uses order by and limit

For each row from table tClass matching a given where clause,
join on the first row in tEv, sorted by time, where tEv.class_id = tClass.class_id
The following code throws the error
ORA-01799: a column may not be outer-joined to a subquery
select
c.class_id,
c.class_name,
e.start_time,
e.ev_id
from
tClass c
left join tEv e on (
e.ev_id = (
select
ss1.ev_id
from (
select
ed.ev_id
from
tEvDisp ed,
tEv e
where
ed.class_id = c.class_id
and ed.viewable = 'Y'
and ed.display_until > localtimestamp
and e.ev_id = ed.ev_id
order by
e.start_time
) ss1
where
rownum = 1
)
)
where
c.is_matching = 'Y';
How can this be rewritten to do what is described?
The above is for oracle, but needs to work in sqlite (substituting where necessary)

No idea about SQLite - that would need to be a separate question if this doesn't work - but for Oracle you could do something like this:
select c.class_id,
c.class_name,
e.start_time,
e.ev_id
from tClass c
left join (
select class_id, ev_id, start_time
from (
select ed.class_id,
ed.ev_id,
e.start_time,
row_number() over (partition by ed.class_id order by e.start_time) as rn
from tEvDisp ed
join tEv e on e.ev_id = ed.ev_id
where ed.viewable = 'Y'
and ed.display_until > localtimestamp
)
where rn = 1
) e on e.class_id = c.class_id
where c.is_matching = 'Y';
This uses a subquery which finds the most tEv data, using an analytic row_number() to identify the latest data for each class_id, which is restricted by the rn = 1 filter.
That subquery, consisting of at most one row per class_id, is then used the left outer join against tClass.

This sort of construct should get you what you need. You can fix the details.
select c.classid
, c.classname
, temp.maxstarttime
from tClass c left join (
select c.classid id
max(e.start_time) maxstarttime
from tClass join tEv on tEv.classId = tClass.ClassId
where whatever
group by c.classid) temp on c.classid = temp.id

Using nested queries vs CTEs and performance impact

I've been using sql for a while, and would like to know whether and how it would make sense to convert a script containing a lot of CTE's into a regular nested script. I'm using:
WITH cte_account_pricelevelid
AS (SELECT a.accountid,
a.pricelevelid
FROM companypricelist a
JOIN(SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL) b
ON a.accountid = b.accountid),
totals
AS (SELECT a.accountid,
a.pricelevelid
FROM companypricelist a
JOIN(SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL) b
ON a.accountid = b.accountid),
totalsgrouped
AS (SELECT pricelevelid,
Count(*) counts
FROM totals
GROUP BY pricelevelid),
final
AS (SELECT cte.accountid,
cte.pricelevelid,
frequency.counts
FROM cte_account_pricelevelid cte
CROSS JOIN totalsgrouped frequency
WHERE cte.pricelevelid = frequency.pricelevelid),
mycolumns
AS (SELECT b.accountid,
b.pricelevelid,
b.counts
FROM (SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL) a
JOIN final b
ON a.accountid = b.accountid),
e
AS (SELECT *,
Row_number()
OVER (
partition BY accountid
ORDER BY counts, pricelevelid ) AS Recency
FROM mycolumns),
cte_result
AS (SELECT accountid,
pricelevelid
FROM e
WHERE recency = 1)
SELECT a.accountid,
a.defaultpricelevelid,
b.pricelevelid
FROM crm_accountbase a
JOIN cte_result b
ON a.accountid = b.accountid
I feel that it is silly to keep running the same query within my CTE's:
SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL
But I don't know how to get around it. I suppose that I can convert this into, but don't know if there would be any performance gain.
select * from (select * from(select * from(...
Is there an opportunity to tremendously improve performance by converting this into a nested SQL or just simplifying the CTE? If so, would you kindly get me started?

If accountid is indexed this join will not use that index
The derived table has not index
SELECT a.accountid, a.pricelevelid
FROM companypricelist a
JOIN (SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL) b
ON a.accountid = b.accountid
This will use an index on accountid
Even if there is not an index on accountid it will be faster
SELECT a.accountid, a.pricelevelid
FROM companypricelist a
JOIN crm_accountbase b
ON a.accountid = b.accountid
AND b.defaultpricelevelid IS NULL

I tried to simplify it a bit. Mind running it and letting me know the results?
WITH
totals
AS (SELECT a.accountid,
a.pricelevelid
FROM companypricelist a
JOIN(SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL) b
ON a.accountid = b.accountid),
totalsgrouped
AS (SELECT pricelevelid,
Count(*) counts
FROM totals
GROUP BY pricelevelid),
final
AS (SELECT cte.accountid,
cte.pricelevelid,
frequency.counts
FROM totals cte
CROSS JOIN totalsgrouped frequency
WHERE cte.pricelevelid = frequency.pricelevelid),
mycolumns
AS (SELECT b.accountid,
b.pricelevelid,
b.counts
FROM final b
JOIN totals a
ON a.accountid = b.accountid),
e
AS (SELECT *,
Row_number()
OVER (
partition BY accountid
ORDER BY counts, pricelevelid ) AS Recency
FROM mycolumns)
SELECT a.accountid,
a.defaultpricelevelid,
b.pricelevelid
FROM crm_accountbase a
JOIN e b
ON a.accountid = b.accountid
WHERE b.recency = 1

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

simplifying query that has multiple WITH and multiple subqueries - sql

Related

Performing a JOIN with the results of a WITH clause

Use of Analytical Funtions and Keep clause

Oracle SQL Correlated subquery - Returning count(*) in some columns

how to join a table on a subquery that uses order by and limit

Using nested queries vs CTEs and performance impact

Categories

Resources