Getting COUNT() in CTE results in a table scan

I have the following query:
;WITH CTE
AS (
    SELECT ROW_NUMBER() OVER (
               ORDER BY RTT.DAYSTOWAIT DESC
           ) AS ROW
         ,COUNT(*) OVER () AS ROWCNT
         --...
         -- ADDITIONAL COLUMNS
         --...
    FROM TABLE1 T1
    INNER JOIN TABLE2 T2 ON T1.OID = T2.OID
    INNER JOIN ORG ON T1.ORGOID = ORG.OID
    INNER JOIN EPI E ON E.OID = T1.EOID
        AND E.PKEY = T1.PKEY
        AND E.STATUS = 'A'
    INNER JOIN PATHWAY ON PATHWAY.OID = E.PATHWAYOID
        AND PATHWAY.PARTKEY = T1.PARTKEY
        AND PATHWAY.STATUS = 'A'
)
SELECT CTE.ROW
      ,CTE.ROWCNT
FROM CTE
When I select CTE.ROWCNT, the plan uses a table scan,
but when I remove the ROWCNT column from the select list, it works fine (index seeks).
Is there a more efficient way to get the count?
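A commonly tried alternative is to number the rows in the CTE and compute the total in a separate single-row aggregate that is cross joined back, instead of using COUNT(*) OVER (). The sketch below uses Python's sqlite3 module and a hypothetical toy table t1 standing in for the real joins; it only checks that both shapes return the same row numbers and totals, and says nothing about the SQL Server plan. In the real query the total CTE would have to repeat the same joins and filters, and whether it produces seeks depends on the available indexes, so compare the actual execution plans. ROW is renamed to row_num here because ROW is a keyword in SQLite.

```python
import sqlite3

# Hypothetical toy table standing in for the joined rows of the original CTE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, daystowait INTEGER)")
conn.executemany("INSERT INTO t1 (id, daystowait) VALUES (?, ?)",
                 [(1, 5), (2, 12), (3, 7)])

# Original shape: the total is a window aggregate computed over every row.
window_sql = """
WITH cte AS (
    SELECT ROW_NUMBER() OVER (ORDER BY daystowait DESC) AS row_num,
           COUNT(*) OVER () AS rowcnt
    FROM t1
)
SELECT row_num, rowcnt FROM cte ORDER BY row_num
"""

# Alternative shape: number the rows in one CTE, count them in another
# single-row CTE, and cross join the two.
split_sql = """
WITH cte AS (
    SELECT ROW_NUMBER() OVER (ORDER BY daystowait DESC) AS row_num
    FROM t1
),
total AS (
    SELECT COUNT(*) AS rowcnt FROM t1
)
SELECT cte.row_num, total.rowcnt
FROM cte CROSS JOIN total
ORDER BY cte.row_num
"""

window_rows = conn.execute(window_sql).fetchall()
split_rows = conn.execute(split_sql).fetchall()
print(window_rows)
print(split_rows)
```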

Related

How to retrieve the row count from a table that is filtered using QUALIFY?

To get the number of rows from a joined table, I can use SELECT COUNT(row-name).
But this doesn't work if I filter it using QUALIFY ROW_NUMBER() OVER (PARTITION BY rowx, rowy) = 1.
Is there a way to get the total number of rows of a QUALIFY-filtered table?
Here is a full example of the query:
query = """
SELECT
    COUNT(*)
FROM table1
JOIN table2 ON
    table1.column1 = table2.column2
JOIN table3 ON
    table1.column4 = table3.column5
QUALIFY ROW_NUMBER() OVER
(
    PARTITION BY
        table3.column6,
        table3.column7
) = 1
"""
I also tried
query = """
SELECT
    COUNT(*)
FROM (
    table1
    JOIN table2 ON
        table1.column1 = table2.column2
    JOIN table3 ON
        table1.column4 = table3.column5
    QUALIFY ROW_NUMBER() OVER
    (
        PARTITION BY
            table3.column6,
            table3.column7
    ) = 1
)
"""
But it didn't work either.
Most likely QUALIFY is applied after the COUNT(*) aggregate has already been evaluated. To remedy this, you can take the count of a subquery:
SELECT COUNT(*)
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY t3.column6, t3.column7) rn
FROM table1 t1
INNER JOIN table2 t2 ON t1.column1 = t2.column2
INNER JOIN table3 t3 ON t1.column4 = t3.column5
) t
WHERE rn = 1;
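SQLite has no QUALIFY clause, so the sketch below can only exercise the subquery form from the answer. The tables and values are hypothetical, and the subquery selects just rn instead of SELECT * so that repeated column names across the joined tables do not get in the way. It contrasts counting the joined rows directly with counting after the ROW_NUMBER() filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (column1 INTEGER, column4 INTEGER);
CREATE TABLE table2 (column2 INTEGER);
CREATE TABLE table3 (column5 INTEGER, column6 TEXT, column7 TEXT);
INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO table2 VALUES (1), (2), (3);
-- two joined rows share the partition ('a', 'x'), one row is ('b', 'y')
INSERT INTO table3 VALUES (10, 'a', 'x'), (20, 'a', 'x'), (30, 'b', 'y');
""")

joins = """
FROM table1 t1
INNER JOIN table2 t2 ON t1.column1 = t2.column2
INNER JOIN table3 t3 ON t1.column4 = t3.column5
"""

# Counting the joined rows directly ignores the one-row-per-partition filter.
all_rows = conn.execute(f"SELECT COUNT(*) {joins}").fetchone()[0]

# Apply the ROW_NUMBER() filter inside a subquery first, then count its rows.
filtered_sql = f"""
SELECT COUNT(*)
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY t3.column6, t3.column7) AS rn
    {joins}
) t
WHERE rn = 1
"""
filtered_rows = conn.execute(filtered_sql).fetchone()[0]

print(all_rows, filtered_rows)
```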

Column ambiguously defined

with t1 as (
    SELECT *
    from claim fc
    inner join drug_product d
        on d.drug_id = d.drug_id
        AND d.id = d.id
    inner join pharmacy pha
        on fc.pharmacy_id = pha.pharmacy_id
        and fc.num = pha.num
),
t2 as (
    Select d_member_id,
           fill_dt,
           num,
           d_drug_id,
           count(distinct device_type) as device_count,
           count(device_type),
           count(distinct claim_ID) as claim_count
    from t1
    group by
        d_member_id,
        fill_dt,
        num
)
Select t1.*,
       t2.device_count,
       d.*
from t1
inner join t2
    on t1.num = t2.num
    and t1.fill_dt = t2.fill_dt
    and t1.d_member_id = t2.d_member_id
inner join drug_product d
    on t1.d_drug_id = d.d_drug_id
order by claim_count desc
I get "column ambiguously defined" at line 54, column 32. I'm trying to find out whether there are duplicate drug fills on the same day.
I wonder if my joins are incorrect: for t1 I join three different tables, and t2 is built from the first CTE (t1). The outcome should be a join of t1 and t2.
d_member_hq_id is not prefixed by a table alias and could be causing the problem if a column with that name exists in more than one table in the FROM clause. Other columns are also unqualified; it is good practice to qualify every column to avoid this error.
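The error comes from referencing a column name that exists in more than one joined table without saying which table you mean. Oracle reports this as ORA-00918 "column ambiguously defined"; SQLite reports "ambiguous column name". The sketch below reproduces it in SQLite with two hypothetical, trimmed-down tables (claim and drug_product, both having drug_id) and shows that qualifying every column with its alias resolves it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claim (claim_id INTEGER, drug_id INTEGER, pharmacy_id INTEGER);
CREATE TABLE drug_product (drug_id INTEGER, drug_name TEXT);
INSERT INTO claim VALUES (1, 100, 7), (2, 100, 7), (3, 200, 8);
INSERT INTO drug_product VALUES (100, 'A'), (200, 'B');
""")

# drug_id exists in both tables and is not qualified in the SELECT list.
unqualified = """
SELECT claim_id, drug_id, drug_name
FROM claim fc
JOIN drug_product d ON fc.drug_id = d.drug_id
"""
try:
    conn.execute(unqualified).fetchall()
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)

# Every column is prefixed with its table alias, so nothing is ambiguous.
qualified = """
SELECT fc.claim_id, fc.drug_id, d.drug_name
FROM claim fc
JOIN drug_product d ON fc.drug_id = d.drug_id
ORDER BY fc.claim_id
"""
rows = conn.execute(qualified).fetchall()
print(rows)
```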

Need assistance in rewriting this query

We have this query in production, and it runs daily.
It does a lot of joins and also uses window functions in Hive.
We tried adding a few SET options, but that did not help much.
The structure is something like this:
SELECT
    C.f1, C.f2, A.f2 ...
FROM (
    SELECT * FROM (
        SELECT T1.*, B.atid, B.a_id,
               ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC) AS RANK_
        FROM T1 AS T1
        JOIN T5 ON T1.t_dt = T5.t_dt
        JOIN T2 B ON T1.wtid = B.wtid and T1.b_ts = B.b_ts
        LEFT OUTER JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
            ON T1.TYP = PV.p_cd
        WHERE T1.state not in ("INVALID")
          AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
          AND ISNULL(PV.p_cd)
    ) T
    WHERE T.rank_ = 1
) A
JOIN (SELECT *, row_number() over (partition by ac_id order by b_ts desc) rank_
      FROM T4
      WHERE event not in ('CT','UPD')
     ) AS C
    ON A.a_id = C.a_id
    AND A.atid = C.ac_id
    AND C.rank_ = 1
JOIN T6 ON C.t_dt = T6.t_dt
As I cannot drop any of the tables (or joins), my approach was to replace the window function with another join using the aggregate function MAX, but I was not able to rewrite it.
I am also not sure whether that would actually improve performance, so any guidance would help.
Analytic functions usually perform better than a join with SELECT MAX, because with an analytic function the same table is read only once, and the row_number calculation is parallelized across the PARTITION BY keys.
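For comparison, here is a sketch of the two formulations the question weighs: keeping the latest row per key with ROW_NUMBER() versus joining back to a MAX aggregate. It runs on SQLite with a hypothetical events table (wtid, b_ts, val), so it says nothing about Hive performance. It does show two things: the MAX version reads the table twice (once for the aggregate, once for the join), and the two are not equivalent when the latest timestamp is tied. ROW_NUMBER keeps exactly one row per key (val is used here as a tie-breaker), while the MAX join returns every tied row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (wtid INTEGER, b_ts INTEGER, val TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(1, 10, "a"), (1, 20, "b"),
                  (2, 5, "c"), (2, 5, "d")])  # wtid 2 has a tie on b_ts

# Analytic version: one pass over events, keep rank 1 per wtid.
row_number_sql = """
SELECT wtid, b_ts, val FROM (
    SELECT wtid, b_ts, val,
           ROW_NUMBER() OVER (PARTITION BY wtid ORDER BY b_ts DESC, val) AS rank_
    FROM events
) t
WHERE rank_ = 1
ORDER BY wtid
"""

# Aggregate version: compute MAX(b_ts) per wtid, then join back to events.
max_join_sql = """
SELECT e.wtid, e.b_ts, e.val
FROM events e
JOIN (SELECT wtid, MAX(b_ts) AS max_ts FROM events GROUP BY wtid) m
  ON e.wtid = m.wtid AND e.b_ts = m.max_ts
ORDER BY e.wtid, e.val
"""

latest_by_rank = conn.execute(row_number_sql).fetchall()
latest_by_max = conn.execute(max_join_sql).fetchall()
print(latest_by_rank)
print(latest_by_max)
```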
Try regrouping the joins and filters.
This join:
LEFT OUTER JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
ON T1.TYP = PV.p_cd
together with the WHERE condition ISNULL(PV.p_cd), acts as an anti-join: it removes the T1 rows whose TYP matches a p_cd in T3.
So do these conditions:
WHERE T1.state not in ("INVALID")
AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
Move this join and these filters into a subquery. If they filter out a lot of rows, this may reduce the T1 dataset before all the other joins and the row_number() calculation:
(select T1.* from T1
left join (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
ON T1.TYP = PV.p_cd
where T1.state not in ("INVALID")
AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
AND ISNULL(PV.p_cd)
) as T1
Also, the first row_number is calculated only over columns of the T1 and B tables:
PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC
Consider joining the T5 table after the row_number filter. If this join is heavy and the row_number filter reduces the dataset, wrap the row_number calculation and its filter in a subquery and join that filtered subquery with T5:
(--filtered by row_number
    select * from
    (
        SELECT T1.*, B.atid, B.a_id,
               ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC) AS RANK_
        from
        (select T1.* from T1
         left join (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
             ON T1.TYP = PV.p_cd
         where T1.state not in ("INVALID")
           AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
           AND ISNULL(PV.p_cd)
        ) as T1
        JOIN T2 B ON T1.wtid = B.wtid and T1.b_ts = B.b_ts
    ) T WHERE T.rank_ = 1
) T --filtered
JOIN T5 ON T.t_dt = T5.t_dt
This may help depending on your data.
Read also: https://stackoverflow.com/a/51061613/2700344
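"Depending on your data" matters here. Moving the inner JOIN to T5 after the row_number filter only gives the same result if every row's t_dt matches exactly one T5 row. If the latest row has no T5 match, joining first ranks among the matching rows only, while joining afterwards drops that key entirely. The SQLite sketch below, with hypothetical events and t5 tables, checks both cases.

```python
import sqlite3

# T5 joined before ranking, as in the original query.
JOIN_BEFORE = """
SELECT wtid, b_ts FROM (
    SELECT e.wtid, e.b_ts,
           ROW_NUMBER() OVER (PARTITION BY e.wtid ORDER BY e.b_ts DESC) AS rank_
    FROM events e
    JOIN t5 ON e.t_dt = t5.t_dt
) t
WHERE rank_ = 1
"""

# T5 joined after the rank filter, as suggested for a heavy T5 join.
JOIN_AFTER = """
SELECT t.wtid, t.b_ts FROM (
    SELECT e.wtid, e.b_ts, e.t_dt,
           ROW_NUMBER() OVER (PARTITION BY e.wtid ORDER BY e.b_ts DESC) AS rank_
    FROM events e
) t
JOIN t5 ON t.t_dt = t5.t_dt
WHERE t.rank_ = 1
"""


def compare(t5_dates):
    """Run both query shapes against fresh toy data; return (before, after)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (wtid INTEGER, b_ts INTEGER, t_dt TEXT)")
    conn.execute("CREATE TABLE t5 (t_dt TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                     [(1, 10, "d1"), (1, 20, "d2")])
    conn.executemany("INSERT INTO t5 VALUES (?)", [(d,) for d in t5_dates])
    before = conn.execute(JOIN_BEFORE).fetchall()
    after = conn.execute(JOIN_AFTER).fetchall()
    conn.close()
    return before, after


same = compare(["d1", "d2"])  # every t_dt matches exactly one T5 row
diff = compare(["d1"])        # the latest row (t_dt 'd2') has no T5 match
print(same)
print(diff)
```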

Join only if exactly one matching record is present in SQL

I have two tables, table1 and table2.
In table2, multiple records can match a table1 record on these columns:
c_type, h_level, loop, e_id
I want the record from the RIGHT table ONLY if there is EXACTLY 1 match. Otherwise element_nm should be NULL, so that the output has exactly the same records as the left table.
SELECT a.*,
b.element_nm
FROM table1 a
LEFT JOIN table2 b ON
a.c_type = b.c_type
AND a.h_level = b.h_level
AND a.loop = b.loop
AND a.e_id = b.e_id
ORDER BY a.file_name,
a.line_num asc;
As this is about one value only, you can use a subquery in the SELECT clause. Otherwise you'd use a subquery in a LEFT OUTER JOIN, or use OUTER APPLY.
SELECT
t1.*,
(
SELECT MIN(t2.element_nm)
FROM table2 t2
WHERE t2.c_type = t1.c_type
AND t2.h_level = t1.h_level
AND t2.loop = t1.loop
AND t2.e_id = t1.e_id
HAVING COUNT(*) = 1
) AS element_nm
FROM table1 t1
ORDER BY t1.file_name, t1.line_num;
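The scalar-subquery idea can be checked on SQLite with hypothetical toy rows: one key with a single match, one with two matches, and one with none. One swap is made here: CASE WHEN COUNT(*) = 1 THEN MIN(...) END replaces HAVING COUNT(*) = 1, because some older SQLite versions reject HAVING without GROUP BY. Both forms yield NULL unless exactly one table2 row matches, and the left side keeps one output row per table1 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (c_type TEXT, h_level INTEGER, loop INTEGER, e_id INTEGER,
                     file_name TEXT, line_num INTEGER);
CREATE TABLE table2 (c_type TEXT, h_level INTEGER, loop INTEGER, e_id INTEGER,
                     element_nm TEXT);
INSERT INTO table1 VALUES ('X', 1, 1, 1, 'f', 1), ('X', 1, 1, 2, 'f', 2),
                          ('X', 1, 1, 3, 'f', 3);
-- e_id 1: exactly one match; e_id 2: two matches; e_id 3: no match
INSERT INTO table2 VALUES ('X', 1, 1, 1, 'one'), ('X', 1, 1, 2, 'two_a'),
                          ('X', 1, 1, 2, 'two_b');
""")

sql = """
SELECT t1.line_num,
       (SELECT CASE WHEN COUNT(*) = 1 THEN MIN(t2.element_nm) END
        FROM table2 t2
        WHERE t2.c_type = t1.c_type
          AND t2.h_level = t1.h_level
          AND t2.loop = t1.loop
          AND t2.e_id = t1.e_id) AS element_nm
FROM table1 t1
ORDER BY t1.file_name, t1.line_num
"""
rows = conn.execute(sql).fetchall()
print(rows)
```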
Thorsten's answer works when you want only one column from the second table. But if you want multiple columns, it is a bit cumbersome.
Alternatively:
SELECT a.*, b.*
FROM table1 a
LEFT JOIN (
    SELECT b.*,
           COUNT(*) OVER (PARTITION BY b.c_type, b.h_level, b.loop, b.e_id) AS cnt
    FROM table2 b
) b
    ON a.c_type = b.c_type AND
       a.h_level = b.h_level AND
       a.loop = b.loop AND
       a.e_id = b.e_id AND
       b.cnt = 1
ORDER BY a.file_name, a.line_num asc;
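The windowed-count join can be checked the same way on SQLite, using hypothetical toy rows (one single match, one double match, one miss). Putting the cnt = 1 condition in the ON clause, rather than in WHERE, is what keeps every table1 row in the output: keys with several matches simply find no partner and get NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (c_type TEXT, h_level INTEGER, loop INTEGER, e_id INTEGER,
                     file_name TEXT, line_num INTEGER);
CREATE TABLE table2 (c_type TEXT, h_level INTEGER, loop INTEGER, e_id INTEGER,
                     element_nm TEXT);
INSERT INTO table1 VALUES ('X', 1, 1, 1, 'f', 1), ('X', 1, 1, 2, 'f', 2),
                          ('X', 1, 1, 3, 'f', 3);
INSERT INTO table2 VALUES ('X', 1, 1, 1, 'one'), ('X', 1, 1, 2, 'two_a'),
                          ('X', 1, 1, 2, 'two_b');
""")

sql = """
SELECT a.line_num, b.element_nm
FROM table1 a
LEFT JOIN (
    -- cnt is the number of table2 rows sharing the same key
    SELECT b.*,
           COUNT(*) OVER (PARTITION BY b.c_type, b.h_level, b.loop, b.e_id) AS cnt
    FROM table2 b
) b
    ON a.c_type = b.c_type AND a.h_level = b.h_level
   AND a.loop = b.loop AND a.e_id = b.e_id
   AND b.cnt = 1
ORDER BY a.file_name, a.line_num
"""
rows = conn.execute(sql).fetchall()
print(rows)
```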
You could also use ROW_NUMBER, like this:
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY b.c_type, b.h_level, b.loop, b.e_id
                              ORDER BY b.element_nm) rnum
          ,b.c_type, b.h_level, b.loop, b.e_id
    FROM table2 b
)
,cte2 AS (SELECT * FROM cte WHERE rnum = 2) -- keys with two or more matches in table2
SELECT a.*,
       b.element_nm
FROM table1 a
LEFT JOIN cte2 ON a.c_type = cte2.c_type AND a.h_level = cte2.h_level AND a.loop = cte2.loop AND a.e_id = cte2.e_id
LEFT JOIN table2 b ON a.c_type = b.c_type AND a.h_level = b.h_level AND a.loop = b.loop AND a.e_id = b.e_id
    AND cte2.c_type IS NULL -- join table2 only when there is at most one match
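The ROW_NUMBER idea rests on one observation: a row with rnum = 2 exists only for keys that have at least two table2 rows. The sketch below numbers table2 rows per key, uses rnum = 2 to flag multi-match keys, and joins table2 only for keys that are not flagged, so table1 rows are never duplicated. It runs on SQLite with hypothetical toy rows; the ORDER BY element_nm inside OVER is an assumption added because some engines require an ORDER BY for ROW_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (c_type TEXT, h_level INTEGER, loop INTEGER, e_id INTEGER,
                     file_name TEXT, line_num INTEGER);
CREATE TABLE table2 (c_type TEXT, h_level INTEGER, loop INTEGER, e_id INTEGER,
                     element_nm TEXT);
INSERT INTO table1 VALUES ('X', 1, 1, 1, 'f', 1), ('X', 1, 1, 2, 'f', 2),
                          ('X', 1, 1, 3, 'f', 3);
INSERT INTO table2 VALUES ('X', 1, 1, 1, 'one'), ('X', 1, 1, 2, 'two_a'),
                          ('X', 1, 1, 2, 'two_b');
""")

sql = """
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY b.c_type, b.h_level, b.loop, b.e_id
                              ORDER BY b.element_nm) AS rnum,
           b.c_type, b.h_level, b.loop, b.e_id
    FROM table2 b
),
cte2 AS (
    SELECT * FROM cte WHERE rnum = 2  -- keys with two or more matches
)
SELECT a.line_num, b.element_nm
FROM table1 a
LEFT JOIN cte2
       ON a.c_type = cte2.c_type AND a.h_level = cte2.h_level
      AND a.loop = cte2.loop AND a.e_id = cte2.e_id
LEFT JOIN table2 b
       ON a.c_type = b.c_type AND a.h_level = b.h_level
      AND a.loop = b.loop AND a.e_id = b.e_id
      AND cte2.c_type IS NULL  -- only keys that are not flagged
ORDER BY a.file_name, a.line_num
"""
rows = conn.execute(sql).fetchall()
print(rows)
```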

Count of matching IDs within a CTE (WITH C AS) query

I'm looking to add a column that displays the count of all records with a matching drgpackid.
Essentially I want one line per drgpackid in the example provided, plus a count of how many records have that ID and meet the conditions of the query.
with C as (
select t1.*
from DrgPack t1 join
DrgPack t2
on t1.DrgID = t2.DrgID and t1.CentralMaintFieldMask <> t2.CentralMaintFieldMask
)
select *
from rxworkflowpack
where drgpackid in (select ID from c where CentralMaintFieldMask = 0)
There are a thousand ways to do this; one is to add another CTE with the counts and join to it:
with C as (
    select t1.*
    from DrgPack t1 join
         DrgPack t2
      on t1.DrgID = t2.DrgID and t1.CentralMaintFieldMask <> t2.CentralMaintFieldMask
),
D as (
    select drgpackid, count(*) as cnt
    from rxworkflowpack
    group by drgpackid
)
select *
from rxworkflowpack
left join D on rxworkflowpack.drgpackid = D.drgpackid
where rxworkflowpack.drgpackid in (select ID from C where CentralMaintFieldMask = 0)
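Here is a runnable check of the "extra CTE with counts" approach on SQLite. The DrgPack and rxworkflowpack columns are trimmed to what the query uses, and the rows are hypothetical. drgpackid is qualified in the WHERE clause because after the join it exists in both rxworkflowpack and D.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DrgPack (ID INTEGER, DrgID INTEGER, CentralMaintFieldMask INTEGER);
CREATE TABLE rxworkflowpack (id INTEGER, drgpackid INTEGER);
-- packs 1 and 2 share DrgID 50 with different masks; pack 3 has no partner
INSERT INTO DrgPack VALUES (1, 50, 0), (2, 50, 4), (3, 60, 0);
INSERT INTO rxworkflowpack VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

sql = """
WITH C AS (
    SELECT t1.*
    FROM DrgPack t1
    JOIN DrgPack t2
      ON t1.DrgID = t2.DrgID
     AND t1.CentralMaintFieldMask <> t2.CentralMaintFieldMask
),
D AS (
    SELECT drgpackid, COUNT(*) AS cnt
    FROM rxworkflowpack
    GROUP BY drgpackid
)
SELECT rx.id, rx.drgpackid, D.cnt
FROM rxworkflowpack rx
LEFT JOIN D ON rx.drgpackid = D.drgpackid
WHERE rx.drgpackid IN (SELECT ID FROM C WHERE CentralMaintFieldMask = 0)
ORDER BY rx.id
"""
rows = conn.execute(sql).fetchall()
print(rows)
```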
You can use a window function like this:
with C as (
select t1.*
from DrgPack t1 join
DrgPack t2
on t1.DrgID = t2.DrgID and t1.CentralMaintFieldMask <> t2.CentralMaintFieldMask
)
select DISTINCT *, COUNT(*) OVER (PARTITION BY drgpackid) AS CountRecords from rxworkflowpack
where drgpackid in (select ID from c where CentralMaintFieldMask = 0)
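A sketch of the window-function variant on SQLite with the same hypothetical toy rows. It is a slight variation for illustration: it selects only drgpackid, not *, so that DISTINCT collapses the result to one line per ID as the question asks; with SELECT DISTINCT *, rows that differ in other columns would stay separate. The window count is evaluated after the WHERE filter, so it counts only the rows that meet the query's conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DrgPack (ID INTEGER, DrgID INTEGER, CentralMaintFieldMask INTEGER);
CREATE TABLE rxworkflowpack (id INTEGER, drgpackid INTEGER);
INSERT INTO DrgPack VALUES (1, 50, 0), (2, 50, 4), (3, 60, 0);
INSERT INTO rxworkflowpack VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

sql = """
WITH C AS (
    SELECT t1.*
    FROM DrgPack t1
    JOIN DrgPack t2
      ON t1.DrgID = t2.DrgID
     AND t1.CentralMaintFieldMask <> t2.CentralMaintFieldMask
)
SELECT DISTINCT drgpackid,
       COUNT(*) OVER (PARTITION BY drgpackid) AS CountRecords
FROM rxworkflowpack
WHERE drgpackid IN (SELECT ID FROM C WHERE CentralMaintFieldMask = 0)
ORDER BY drgpackid
"""
rows = conn.execute(sql).fetchall()
print(rows)
```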
You should use < instead of <> so that each pair is not counted twice:
select rx.drgpackid, count(*) as cnt
from DrgPack t1
join DrgPack t2
  on t1.DrgID = t2.DrgID
 and t1.CentralMaintFieldMask < t2.CentralMaintFieldMask
join rxworkflowpack rx
  on rx.drgpackid = t1.ID
where t1.CentralMaintFieldMask = 0
group by rx.drgpackid
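Why < avoids double counting: a self-join on <> matches each pair of rows in both orientations, (A, B) and (B, A), while < keeps only the orientation where t1 has the smaller mask. This SQLite sketch with two hypothetical DrgPack rows sharing a DrgID counts the pairs both ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DrgPack (ID INTEGER, DrgID INTEGER, CentralMaintFieldMask INTEGER)")
conn.executemany("INSERT INTO DrgPack VALUES (?, ?, ?)", [(1, 50, 0), (2, 50, 4)])

pair_sql = """
SELECT COUNT(*)
FROM DrgPack t1
JOIN DrgPack t2
  ON t1.DrgID = t2.DrgID
 AND t1.CentralMaintFieldMask {op} t2.CentralMaintFieldMask
"""

# The same single pair of rows is matched twice with <> and once with <.
pairs_not_equal = conn.execute(pair_sql.format(op="<>")).fetchone()[0]
pairs_less_than = conn.execute(pair_sql.format(op="<")).fetchone()[0]
print(pairs_not_equal, pairs_less_than)
```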