RANK() OVER ignore equal ranks


I have this query
SELECT
patientid,
practiceid,
visitcount
FROM
(
SELECT
patientid,
practiceid ,
visitcount,
RANK() OVER (PARTITION BY patientid ORDER BY visitcount DESC) as Rank
FROM
aco.patients_practices
WHERE practiceid in (select id from aco.practices where parentaco = 30982) and isprimary = 0
) AS A
WHERE
Rank = 1
Here are some results
patientid practiceid visitcount
157053 30976 6
158463 30974 2
187772 30973 15
187797 30971 1
187797 30975 1
Notice that the last two rows have the same patientid and the same visitcount, hence the same rank. How can I omit records with equal ranks from the output completely?
Thanks!

You can eliminate them by counting them and including them in the where clause. The following query counts them using logic similar to the rank -- the number of times the patient has the same visitcount:
SELECT patientid, practiceid, visitcount
FROM (SELECT patientid, practiceid, visitcount,
RANK() OVER (PARTITION BY patientid ORDER BY visitcount DESC) as Rank,
COUNT(*) over (PARTITION by patientid, visitcount) as RankCount
FROM aco.patients_practices
WHERE practiceid in (select id from aco.practices where parentaco = 30982) and isprimary = 0
) A
WHERE Rank = 1 and RankCount = 1
I do notice that the practiceid is different in the last two records. It seems that you still want to eliminate both, though.
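To see the filter in action, here is a minimal sketch that loads the five sample rows from the question into a table variable. The name @patients_practices is a hypothetical stand-in for aco.patients_practices, the alias Rnk replaces Rank, and the practiceid/isprimary filters are left out for brevity:

```sql
-- Hypothetical stand-in for aco.patients_practices, holding the sample rows from the question.
DECLARE @patients_practices TABLE (patientid int, practiceid int, visitcount int);

INSERT INTO @patients_practices (patientid, practiceid, visitcount)
VALUES (157053, 30976, 6),
       (158463, 30974, 2),
       (187772, 30973, 15),
       (187797, 30971, 1),
       (187797, 30975, 1);

SELECT patientid, practiceid, visitcount
FROM (SELECT patientid, practiceid, visitcount,
             -- rank of each row within its patient, highest visitcount first
             RANK() OVER (PARTITION BY patientid ORDER BY visitcount DESC) AS Rnk,
             -- how many rows of this patient share this visitcount
             COUNT(*) OVER (PARTITION BY patientid, visitcount) AS RankCount
      FROM @patients_practices
     ) A
WHERE Rnk = 1 AND RankCount = 1;
-- Patient 187797 is dropped: two of its rows tie on the top visitcount.
```

Only patients 157053, 158463 and 187772 should come back, because each of them has a single row at rank 1.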

Related

Gaps and Islands - How to Sum Each Group of Consecutive Rows by ID

Below is my current SQL code and output. I only need to get the sum of EFF_DAYS for consecutive (or single) rows where CD is equal to STG (highlighted in yellow).
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TMSP, EFF_DT) RN,
Z2.*
FROM (
SELECT CASE WHEN (LAG_CD IS NULL OR LAG_CD NOT IN ('STG')) AND CD IN ('STG')
THEN RANK() OVER (PARTITION BY ID ORDER BY TMSP, EFF_DT)
WHEN CD = LAG_CD AND CD IN ('STG')
THEN RANK() OVER (PARTITION BY ID ORDER BY TMSP, EFF_DT)
WHEN CD = LAG_CD AND CD != LEAD_CD
THEN RANK() OVER (PARTITION BY ID ORDER BY TMSP, EFF_DT)
END AS CASES,
Z.* FROM (
SELECT ID,
LAG(CD) OVER (PARTITION BY ID ORDER BY TMSP, EFF_DT) AS LAG_CD,
LEAD(CD) OVER (PARTITION BY ID ORDER BY TMSP, EFF_DT) AS LEAD_CD,
CD,
TMSP,
EFF_DT,
END_EFF_DT,
DATEDIFF(day, EFF_DT, END_EFF_DT) AS EFF_DAYS
FROM #POSTCHG_ROWS
WHERE ID IN ('ABC123', 'XYZ789')
) Z
) Z2 ORDER BY TMSP, EFF_DT
I've tried all kinds of row number and rank stuff, but I can't seem to get the CASES column correct. I've spent hours looking at other gap-island sql solutions, but haven't come across the exact scenario below.
Ideally my CASES column would be output like below so I can GROUP BY CASES, ID, the starting TMSP of the consecutive row block and then calculate: SUM(EFF_DAYS).
Below is my goal output:

You are only interested in series of adjacent "STG" rows. I think that the simplest approach is a window count of non-"STG" values to define the groups, then filtering and aggregation:
select
id,
min(tmsp) tmsp,
min(eff_dt) eff_dt,
sum(datediff(day, eff_dt, end_eff_dt)) sum_eff_days
from (
select
p.*,
sum(case when cd = 'STG' then 0 else 1 end)
over(partition by id order by tmsp) grp
from #postchg_rows p
) p
where cd = 'STG'
group by id, grp
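As an illustration of how the running count defines the groups, here is a small sketch with made-up rows (the table variable @postchg_rows and its data are assumptions, only the column names come from the question). A windowed SUM with ORDER BY needs SQL Server 2012 or later:

```sql
-- Hypothetical sample rows; column names follow the question.
DECLARE @postchg_rows TABLE (id varchar(10), cd varchar(3), tmsp datetime,
                             eff_dt date, end_eff_dt date);

INSERT INTO @postchg_rows (id, cd, tmsp, eff_dt, end_eff_dt)
VALUES ('ABC123', 'STG', '20200101', '20200101', '20200105'),
       ('ABC123', 'STG', '20200102', '20200105', '20200110'),
       ('ABC123', 'ACT', '20200103', '20200110', '20200112'),
       ('ABC123', 'STG', '20200104', '20200112', '20200115');

SELECT id, cd, tmsp,
       DATEDIFF(day, eff_dt, end_eff_dt) AS eff_days,
       -- running count of non-STG rows: it only increases when a non-STG row appears,
       -- so every run of adjacent STG rows shares one value
       SUM(CASE WHEN cd = 'STG' THEN 0 ELSE 1 END)
           OVER (PARTITION BY id ORDER BY tmsp) AS grp
FROM @postchg_rows
ORDER BY id, tmsp;
-- grp comes out as 0, 0, 1, 1 for the four rows above.
```

After filtering on cd = 'STG' and grouping by id and grp, the first two rows form one island (4 + 5 = 9 days) and the last row forms another (3 days).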

Find the lowest score after the first Highest Score

I am looking for a SQL query.
Table 1: Source Table
Table 2: Result Table
Attached the patient table. Here I want to find the Highest Score, Highest Score Date, Lowest Score and Lowest Score Date of each patient. The tricky part is that if a patient has same highest score (here it is 9) on two different dates (10/5/2018 and 8/4/2020), we need to take earliest high score (10/5/2018). Similarly, if the patient has same lowest score (6 here) on two different dates (3/1/2019 and 4/2/2020) we should take the latest low score (4/2/2020)
Table 1: Source table contains all scores of a single patient. Patient ID is the primary key of that table. I want a result table that look like table 2.
I have tried this
SELECT distinct
pat.PAT_NAME 'Patient Name'
, pat.PAT_ID
, CAST(pat.CONTACT_DATE AS DATE) 'Service Date'
, pat.MEAS_VALUE 'Score'
, [Row Number] = rank() OVER (PARTITION BY pat.PAT_ID ORDER BY CAST(pat.MEAS_VALUE AS int) DESC, CONTACT_DATE asc)
FROM Patient pat
WHERE pat.PAT_ID = 'A112233'
This code can show me either the highest or the lowest score, but it does not meet all my requirements.

Assuming a table t with columns PatientName, PatientID, ServiceDate and Score, something like this:
;with high_low_cte(PatientID, ServiceDate, Score, high_rn, low_rn) as(
select
PatientID,
ServiceDate,
Score,
row_number() over (partition by PatientID order by Score desc, ServiceDate asc),
row_number() over (partition by PatientID order by Score asc, ServiceDate desc)
from
t)
select * from high_low_cte where high_rn=1 or low_rn=1;
After update to question:
;with high_low_cte([Patient Name], PAT_ID, [Service Date], Score, high_rn, low_rn) as (
SELECT distinct
pat.PAT_NAME 'Patient Name'
,pat.PAT_ID
,CAST(pat.CONTACT_DATE AS DATE) 'Service Date'
,pat.MEAS_VALUE 'Score'
,high_rn=row_number() OVER (PARTITION BY pat.PAT_ID ORDER BY CAST(pat.MEAS_VALUE AS int) DESC, CONTACT_DATE asc)
,low_rn=row_number() OVER (PARTITION BY pat.PAT_ID ORDER BY CAST(pat.MEAS_VALUE AS int) asc, CONTACT_DATE desc)
FROM
Patient pat
WHERE
pat.PAT_ID='A112233')
select hld1.*, hld2.Score [Low_Score], hld2.[Service Date] [Low Service Date]
from high_low_cte hld1 join high_low_cte hld2 on hld1.PAT_ID=hld2.PAT_ID
where
hld1.high_rn=1
and hld2.low_rn=1;

You can use conditional aggregation with row_number():
select PatientName, PatientID,
min(case when seqnum_hi = 1 then ServiceDate end) as high_date,
max(score) as high_score,
min(case when seqnum_lo = 1 then ServiceDate end) as low_date,
min(score) as low_score
from (select t.*,
row_number() over (partition by PatientID order by Score desc, ServiceDate asc) as seqnum_hi,
row_number() over (partition by PatientID order by Score, ServiceDate desc) as seqnum_lo
from t
) t
group by PatientName, PatientID;
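A quick way to check the tie-breaking is to run the same pattern against the four scores described in the question. This is only a sketch: the table variable @t, the patient name and the reading of the dates as month/day/year are assumptions.

```sql
-- Hypothetical table holding the scores mentioned in the question.
DECLARE @t TABLE (PatientName varchar(50), PatientID varchar(10),
                  ServiceDate date, Score int);

INSERT INTO @t (PatientName, PatientID, ServiceDate, Score)
VALUES ('Test Patient', 'A112233', '20181005', 9),
       ('Test Patient', 'A112233', '20190301', 6),
       ('Test Patient', 'A112233', '20200402', 6),
       ('Test Patient', 'A112233', '20200804', 9);

SELECT PatientName, PatientID,
       MAX(Score) AS high_score,
       MIN(CASE WHEN seqnum_hi = 1 THEN ServiceDate END) AS high_date,
       MIN(Score) AS low_score,
       MIN(CASE WHEN seqnum_lo = 1 THEN ServiceDate END) AS low_date
FROM (SELECT t.*,
             -- highest score first, earliest date wins ties
             ROW_NUMBER() OVER (PARTITION BY PatientID
                                ORDER BY Score DESC, ServiceDate ASC) AS seqnum_hi,
             -- lowest score first, latest date wins ties
             ROW_NUMBER() OVER (PARTITION BY PatientID
                                ORDER BY Score ASC, ServiceDate DESC) AS seqnum_lo
      FROM @t t
     ) t
GROUP BY PatientName, PatientID;
-- One row: high_score 9 on 2018-10-05, low_score 6 on 2020-04-02.
```

Each CASE expression is non-NULL for exactly one row per patient, so the MIN around it simply picks that row's date.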

First get the highest and lowest scores, then find the dates on which they occurred:
;with scores as ( -- get highest/lowest score per patientID
select patientID, max(score) hScore, min(score) lScore
from table
group by patientID
), dates as (
-- get first date a patient had the highest score
select t.patientID, min(t.ServiceDate) scoreDate, t.score
from table t
inner join scores s on s.patientID = t.patientID and s.hScore = t.score
group by t.patientID, t.score
union all
-- get the last date a patient had the lowest score
select t.patientID, max(t.ServiceDate) scoreDate, t.score
from table t
inner join scores s on s.patientID = t.patientID and s.lScore = t.score
group by t.patientID, t.score
)
select distinct t.patientName, d.*
from table t
inner join dates d on d.patientID = t.patientID
If you want highest/lowest score on separate columns on the same row, just split the second cte and join them appropriately in the select

I haven't tested, but maybe something like this could do the job:
select x.PatientName, x.PatientId,
case when x.HSD=1 then 'Highest' else 'Lowest' end as ScoreType,
x.Score, x.ScoreDate
from (
select
p.PAT_NAME as PatientName,
p.PAT_ID as PatientId,
p.MEAS_VALUE as Score,
p.CONTACT_DATE as ScoreDate,
row_number() over (partition by p.PAT_ID order by cast(p.MEAS_VALUE as int) desc, p.CONTACT_DATE asc) as HSD,
row_number() over (partition by p.PAT_ID order by cast(p.MEAS_VALUE as int) asc, p.CONTACT_DATE desc) as LSD
from Patient p
) x
where
x.HSD=1 or
x.LSD=1
You should receive two rows. You would need to pivot the data in order to get your expected output.

Selecting City from Customer ID in SQL

Customers have ordered from different cities, so we have multiple cities against the same customer_id. I want to display, for each customer_id, the city that occurs the maximum number of times. If a customer has placed the same number of orders from multiple cities, the city of the last order should be selected. I have tried something like
SELECT customer_id,delivery_city,COUNT(DISTINCT delivery_city)
FROM analytics.f_order
GROUP BY customer_id,delivery_city
HAVING COUNT(DISTINCT delivery_city) > 1

WITH cte as (
SELECT customer_id,
delivery_city,
COUNT(delivery_city) as city_count,
MAX(order_date) as last_order
FROM analytics.f_order
GROUP BY customer_id, delivery_city
), ranking as (
SELECT *, row_number() over (partition by customer_id
order by city_count DESC, last_order DESC) as rn
FROM cte
)
SELECT *
FROM ranking
WHERE rn = 1
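To check the tie-break on the last order date, the query above can be run against a few made-up orders. This is a sketch in T-SQL syntax; the table variable @f_order and its rows are hypothetical stand-ins for analytics.f_order, and another database would need its own way of creating the sample table:

```sql
-- Hypothetical stand-in for analytics.f_order.
DECLARE @f_order TABLE (customer_id int, delivery_city varchar(20), order_date date);

INSERT INTO @f_order (customer_id, delivery_city, order_date)
VALUES (1, 'Delhi',  '20210105'),
       (1, 'Delhi',  '20210110'),
       (1, 'Mumbai', '20210120'),
       (1, 'Mumbai', '20210201'),
       (1, 'Pune',   '20210115'),
       (2, 'Pune',   '20210301');

WITH cte AS (
    -- one row per customer and city: order count and most recent order
    SELECT customer_id, delivery_city,
           COUNT(delivery_city) AS city_count,
           MAX(order_date) AS last_order
    FROM @f_order
    GROUP BY customer_id, delivery_city
), ranking AS (
    -- most orders first; on a tie, the city with the latest order wins
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id
                                 ORDER BY city_count DESC, last_order DESC) AS rn
    FROM cte
)
SELECT customer_id, delivery_city, city_count, last_order
FROM ranking
WHERE rn = 1;
-- Customer 1 -> Mumbai (ties with Delhi on 2 orders, but ordered later); customer 2 -> Pune.
```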

select customer_id,
delivery_city,
amount
from
(
select t.*,
-- rank() keeps every city tied for the top count
rank() over (partition by customer_id order by amount desc) as rnk
from(
SELECT customer_id,
delivery_city,
COUNT(*) as amount
FROM analytics.f_order
GROUP BY customer_id,delivery_city
) t
) r
where rnk = 1

SQL server query UNION ALL [duplicate]

I have a UNION ALL query that I'm getting incorrect results for. I'm supposed to get about 1100 records. Please see query...
select
Pop, planID, PopFull, ApptDate, intake1,FollowUP2,FollowupCode, rn, '5133'
from
(Select *, row_number() over (partition by planID order BY AddedDate asc) as rn from Vinfo) t
where
rn = 1 and ApptDate >='12/1/2014' and ApptDate <='12/31/2015'
Union All
select
Pop, planID, PopFull, ApptDate, intake1, FollowUP2, FollowupCode, rn, '5133'
from
(Select *,row_number() over (partition by PlanID order BY AddedDate DESC) as rn from Vinfo) t
where
rn = 1 and ApptDate >='12/1/2014' and ApptDate <='12/31/2015'
What I'm trying to do: in the first SELECT statement, I want the value of INTAKE where ADDEDDATE is the lowest (earliest).
I'm doing the UNION ALL because in the second SELECT statement, I want the value of FOLLOWUP where ADDEDDATE is the latest (most recent).
The values for INTAKE and FOLLOWUP might be different, but don't have to be. I'm trying to track the difference.
However, when I run this query, I get double the records. Is there a way to run this query so that I get the correct number of records (1100), with the value for INTAKE and, if it changed, the value for FOLLOWUP in the same row?
Right now everything is doubled. For example, if PLANID is 1023 and Intake for the EARLIEST date is B, the row shows B for FollowUP as well. In addition, if FollowUP for the LATEST date is C, the row below it shows C for both Intake and FollowUP.
EDIT:
select Pop,planID,PopFull,ApptDate, intake1,FollowUP2,FollowupCode, '5133'
from (select *,
row_number() over (partition by planID order BY AddedDate asc) as rn_first,
row_number() over (partition by PlanID order BY AddedDate DESC) as rn_last
from VInfo
) t
where t.rn_first = 1 or rn_last = 1
and ApptDate >='12/1/2014' and ApptDate <='12/31/2015'
Ran this but doesn't give right results

You will get duplicates for every planID that has only one row in Vinfo: both the ASC and the DESC ordering give that row rn = 1, so it is returned by both halves of the query.
UNION ALL keeps duplicates between the top and bottom row sets.
You may try UNION instead.
Also, as suggested before, you can compute both row numbers in a single select:
select *
from
(
select v.*,
row_number() over (partition by PlanID order BY AddedDate asc) as rn_first,
row_number() over (partition by PlanID order BY AddedDate DESC) as rn_last
from Vinfo v
) t
where t.rn_first = 1 or rn_last = 1
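If the goal is one row per planID with the earliest intake and the latest follow-up side by side, one possible sketch is to aggregate over the two row numbers. The table variable @Vinfo and its rows are made up for illustration (only the column names and the planID 1023 / B / C example come from the question), and the ApptDate filter is left out:

```sql
-- Hypothetical stand-in for Vinfo with only the columns needed here.
DECLARE @Vinfo TABLE (planID int, AddedDate date, intake1 char(1), FollowUP2 char(1));

INSERT INTO @Vinfo (planID, AddedDate, intake1, FollowUP2)
VALUES (1023, '20141205', 'B', 'B'),
       (1023, '20150310', 'A', 'A'),
       (1023, '20150620', 'C', 'C'),
       (1044, '20150115', 'D', 'D');

SELECT planID,
       -- value taken from the row with the earliest AddedDate
       MAX(CASE WHEN rn_first = 1 THEN intake1 END) AS intake_first,
       -- value taken from the row with the latest AddedDate
       MAX(CASE WHEN rn_last = 1 THEN FollowUP2 END) AS followup_last
FROM (SELECT v.*,
             ROW_NUMBER() OVER (PARTITION BY planID ORDER BY AddedDate ASC) AS rn_first,
             ROW_NUMBER() OVER (PARTITION BY planID ORDER BY AddedDate DESC) AS rn_last
      FROM @Vinfo v
     ) t
WHERE rn_first = 1 OR rn_last = 1
GROUP BY planID;
-- planID 1023 -> intake_first B, followup_last C; planID 1044 -> D and D.
```

A planID with a single row still produces exactly one output row, so the doubling from UNION ALL does not occur.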

Find minimum value in groups of rows

In the SQL space (specifically T-SQL, SQL Server 2008), given this list of values:
Status Date
------ -----------------------
ACT 2012-01-07 11:51:06.060
ACT 2012-01-07 11:51:07.920
ACT 2012-01-08 04:13:29.140
NOS 2012-01-09 04:29:16.873
ACT 2012-01-21 12:39:37.607 <-- THIS
ACT 2012-01-21 12:40:03.840
ACT 2012-05-02 16:27:17.370
GRAD 2012-05-19 13:30:02.503
GRAD 2013-09-03 22:58:48.750
Generated from this query:
SELECT Status, Date
FROM Account_History
WHERE AccountNumber = '1234'
ORDER BY Date
The status for this particular object started at ACT, then changed to NOS, then back to ACT, then to GRAD.
What is the best way to get the minimum date from the latest "group" of records where Status = 'ACT'?

Here is a query that does this, by identifying the groups where the student statuses are the same and then using simple aggregation:
select top 1 StudentStatus, min(WhenLastChanged) as WhenLastChanged
from (SELECT StudentStatus, WhenLastChanged,
(row_number() over (order by WhenLastChanged) -
row_number() over (partition by StudentStatus order by WhenLastChanged)
) as grp
FROM Account_History
WHERE AccountNumber = '1234'
) t
where StudentStatus = 'ACT'
group by StudentStatus, grp
order by WhenLastChanged desc;
The row_number() function assigns sequential numbers within groups of rows based on the date. For your data, the two row_number() values and their difference are:
Status Date                    Overall PerStatus Difference
------ ----------------------- ------- --------- ----------
ACT    2012-01-07 11:51:06.060 1       1         0
ACT    2012-01-07 11:51:07.920 2       2         0
ACT    2012-01-08 04:13:29.140 3       3         0
NOS    2012-01-09 04:29:16.873 4       1         3
ACT    2012-01-21 12:39:37.607 5       4         1
ACT    2012-01-21 12:40:03.840 6       5         1
ACT    2012-05-02 16:27:17.370 7       6         1
GRAD   2012-05-19 13:30:02.503 8       1         7
GRAD   2013-09-03 22:58:48.750 9       2         7
Notice that the last column is constant for consecutive rows that have the same status.
The aggregation brings these together and chooses the latest (top 1 . . . order by date desc) of the first dates (min(date)).
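The two row numbers and their difference can be reproduced with a short script. This is a sketch that loads the nine rows from the question into a hypothetical table variable @Account_History, using the StudentStatus and WhenLastChanged column names from the answer's query:

```sql
-- Hypothetical table variable holding the nine rows from the question.
DECLARE @Account_History TABLE (StudentStatus varchar(4), WhenLastChanged datetime);

INSERT INTO @Account_History (StudentStatus, WhenLastChanged)
VALUES ('ACT',  '2012-01-07T11:51:06.060'),
       ('ACT',  '2012-01-07T11:51:07.920'),
       ('ACT',  '2012-01-08T04:13:29.140'),
       ('NOS',  '2012-01-09T04:29:16.873'),
       ('ACT',  '2012-01-21T12:39:37.607'),
       ('ACT',  '2012-01-21T12:40:03.840'),
       ('ACT',  '2012-05-02T16:27:17.370'),
       ('GRAD', '2012-05-19T13:30:02.503'),
       ('GRAD', '2013-09-03T22:58:48.750');

SELECT StudentStatus, WhenLastChanged,
       -- position among all rows
       ROW_NUMBER() OVER (ORDER BY WhenLastChanged) AS rn_all,
       -- position among rows with the same status
       ROW_NUMBER() OVER (PARTITION BY StudentStatus ORDER BY WhenLastChanged) AS rn_status,
       -- the difference stays constant while the status does not change
       ROW_NUMBER() OVER (ORDER BY WhenLastChanged)
         - ROW_NUMBER() OVER (PARTITION BY StudentStatus ORDER BY WhenLastChanged) AS grp
FROM @Account_History
ORDER BY WhenLastChanged;
```

The grp value alone is not guaranteed to be unique across different statuses, which is why the aggregation groups by the status together with grp.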
EDIT:
The query is easy to tweak for multiple account numbers. I probably should have written it that way to begin with, except the final selection is trickier. The result of this query has the starting date of each group of ACT rows for each account:
select AccountNumber, StudentStatus, min(WhenLastChanged) as WhenLastChanged
from (SELECT StudentStatus, WhenLastChanged, AccountNumber,
(row_number() over (partition by AccountNumber order by WhenLastChanged) -
row_number() over (partition by AccountNumber, studentstatus order by WhenLastChanged)
) as grp
FROM Account_History
) t
where StudentStatus = 'ACT'
group by AccountNumber, StudentStatus, grp
order by WhenLastChanged desc;
But you can't get the last one per account quite so easily. Another level of subqueries:
select AccountNumber, StudentStatus, WhenLastChanged
from (select AccountNumber, StudentStatus, min(WhenLastChanged) as WhenLastChanged,
row_number() over (partition by AccountNumber, StudentStatus order by min(WhenLastChanged) desc
) as seqnum
from (SELECT AccountNumber, StudentStatus, WhenLastChanged,
(row_number() over (partition by AccountNumber order by WhenLastChanged) -
row_number() over (partition by AccountNumber, studentstatus order by WhenLastChanged)
) as grp
FROM Account_History
) t
where StudentStatus = 'ACT'
group by AccountNumber, StudentStatus, grp
) t
where seqnum = 1;
This uses aggregation along with the window function row_number(). This is assigning sequential numbers to the groups (after aggregation), with the last date for each account getting a value of 1 (order by min(WhenLastChanged) desc). The outermost select then just chooses that row for each account.

SELECT [Status], MIN([Date])
FROM Table_Name
WHERE [Status] = (SELECT [Status]
FROM Table_Name
WHERE [Date] = (SELECT MAX([Date])
FROM Table_Name)
)
GROUP BY [Status]

Hogan: basically, yes. I just want to know the date/time when the
account was last changed to ACT. The records after the point above
marked THIS are just extra.
Instead of looking only for ACT rows, we can find every row where the status changes and then pick the latest change to ACT.
So, every time the status changes:
with rownumb as
(
select *, row_number() OVER (order by date asc) as rn
from Account_History
where AccountNumber = '1234'
)
select B.status, B.date
from rownumb A
join rownumb B on A.rn = B.rn-1
where A.status != B.status
Now find the latest of the changes to ACT:
with rownumb as
(
select *, row_number() OVER (order by date asc) as rn
from Account_History
where AccountNumber = '1234'
), statuschange as
(
select B.status, B.date
from rownumb A
join rownumb B on A.rn = B.rn-1
where A.status != B.status
)
select max(date)
from statuschange
where status='ACT'
