Trying to count unique observations in SQL using Partition By - sql

I have these two datasets:
Conditions: I would like to count the number of Unique Discharge_ID as Total_Discharges in my final dataset.
ICU_ID is a little more difficult. PT_ID 001 has 4 rows with the same Discharge_DT but 4 unique ICU_IDs. Since all of these ICU_IDs occur within 30 days of the Discharge_DT, I only want to count one of them. That is why Total_Discharges for AZ is 1 and ICU_Admits is 1.
For PT_ID 002, I have 2 different Discharge_IDs but 1 ICU Admit that occurred within 30 days of both of the Discharge_IDs. I would like to count the Discharges as 2, and ICU_admits as 1.
DF1: Dataset of Discharges from hospital and admission to ICU within 30 days of Discharge_DT
City  PT_ID  Hospital_ID  Admit_Dt    Discharge_DT  Discharge_ID                   ICU_ID
AZ    001    ABC          01-01-2021  01-03-2021    001,ABC,01-01-2021,01-03-2021  001,XYZ,01-05-2021,01-06-2021
AZ    001    ABC          01-01-2021  01-03-2021    001,ABC,01-01-2021,01-03-2021  001,XYZ,01-08-2021,01-09-2021
AZ    001    ABC          01-01-2021  01-03-2021    001,ABC,01-01-2021,01-03-2021  001,XYZ,01-11-2021,01-11-2021
AZ    001    ABC          01-01-2021  01-03-2021    001,ABC,01-01-2021,01-03-2021  001,XYZ,01-15-2021,01-16-2021
CA    002    DEF          04-03-2021  04-07-2021    001,ABC,04-03-2021,04-07-2021  002,LMN,04-27-2021,04-27-2021
CA    002    DEF          04-20-2021  04-21-2021    001,ABC,04-20-2021,04-21-2021  002,LMN,04-27-2021,04-27-2021
DF desired:
City  TotalDischarges  ICU_Admit
AZ    1                1
CA    2                1
Current Code:
DROP TABLE IF EXISTS #edit1;

WITH CTE_df1 as (
    select * from df1
)
select
    City,
    PT_ID,
    Hospital_ID,
    Admit_Dt,
    Discharge_DT,
    Discharge_ID,
    count(ICU_ID) over (partition by ICU_ID) as ICU_Pts,
    count(distinct Discharge_ID) as Total_Discharges
into #edit1
from CTE_df1
group by City, Discharge_ID, ICU_ID, PT_ID
order by City;

with CTE_edit1 as (
    select * from #edit1
)
select City, sum(ICU_Pts), sum(Total_Discharges)
from CTE_edit1
group by City
order by City
Current Output: PT_ID 001 works great but PT_ID 002 shows up at 2 in ICU_Admit as it is counting both as unique ICU visits.
City  TotalDischarges  ICU_Admit
AZ    1                1
CA    2                2
Any help would be appreciated
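One possible reading of the two rules in the question (at most one ICU stay per discharge, and an ICU stay shared by several discharges counted once) can be sketched as follows. This is only a sketch under that assumption, not a confirmed solution: it keeps a single ICU_ID per Discharge_ID (the choice of MIN is arbitrary) and then counts the distinct values per City. The CTE name one_icu_per_discharge is made up for illustration, and df1 is the table from the question.

```sql
-- Assumption: one ICU stay is kept per discharge; MIN() is just an arbitrary, deterministic pick.
WITH one_icu_per_discharge AS (
    SELECT City,
           Discharge_ID,
           MIN(ICU_ID) AS ICU_ID
    FROM df1
    GROUP BY City, Discharge_ID
)
SELECT City,
       COUNT(DISTINCT Discharge_ID) AS Total_Discharges,
       -- an ICU stay picked for several discharges is counted once
       COUNT(DISTINCT ICU_ID)       AS ICU_Admits
FROM one_icu_per_discharge
GROUP BY City
ORDER BY City;
```

On the sample rows this should give AZ 1/1 and CA 2/1. It is a heuristic: if two discharges share only some of their ICU stays, MIN may pick different stays for each, so the rule would need tightening for such data.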

Related

Find most visited Hotel by month in PostgreSQL

I have a table of customers who each stayed in a hotel for one or more months. I need to find the 3 most visited hotels for each month. If a customer stayed in a hotel across three months, the stay counts once in each of those three months. To be more precise, this is my hotel table:
id  usr_id  srch_ci     srch_co     hotel_id
1   13      2021-10-01  2021-11-22  200
2   12      2021-10-11  2021-10-22  300
3   11      2021-10-28  2021-11-05  200
4   10      2021-10-28  2021-12-03  100
Result should look like below:
mnth     hotel_id  rnk  visits
2021-10  200       1    2
2021-10  100       2    1
2021-10  300       2    1
2021-11  200       1    2
2021-11  100       2    1
2021-12  100       1    1
As shown above, usr_id = 10 stayed in hotel_id = 100 across 3 different months, so the stay counts as 1 visit for that hotel in each of those months. In 2021-12 only usr_id = 10 was staying, which is why hotel_id = 100 is ranked 1st for 2021-12.
I solved the problem using the generate_series function in Postgres, which is what I was looking for. This related question helped me: "Splitting single row into multiple rows based on date".
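-- Note: ROW_NUMBER() numbers tied hotels arbitrarily (1, 2, 3); RANK() or DENSE_RANK() would give ties the same rnk
SELECT hotel_id, mnth, visits,
       ROW_NUMBER() OVER (PARTITION BY mnth ORDER BY visits DESC) AS rnk
FROM (
    -- count the stays that touch each hotel in each month
    SELECT hotel_id, to_char(live_mnth, 'YYYY-MM') AS mnth, count(*) AS visits
    FROM (
        -- expand every stay into one row per calendar month it covers
        SELECT id, usr_id, hotel_id, date_in, date_out,
               generate_series(date_in, date_out, '1 MONTH')::DATE AS live_mnth
        FROM (
            -- snap check-in and check-out to the first day of their month
            SELECT *,
                   TO_CHAR(srch_ci, 'yyyy-mm-01')::date AS date_in,
                   TO_CHAR(srch_co, 'yyyy-mm-01')::date AS date_out
            FROM hotels
        ) s
    ) s
    GROUP BY hotel_id, to_char(live_mnth, 'YYYY-MM')
) t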
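The expected result in the question gives tied hotels the same rnk (100 and 300 both get 2 in 2021-10) and asks for the top 3 per month. A variation on the same generate_series idea that uses rank() and filters on it might look like the sketch below. It assumes PostgreSQL, loads the four sample rows into a temporary table named hotel (the name used in the question text), and the aliases m and ranked are made up for illustration.

```sql
-- Sample data from the question, loaded into a temp table for a self-contained run.
CREATE TEMP TABLE hotel (id int, usr_id int, srch_ci date, srch_co date, hotel_id int);
INSERT INTO hotel VALUES
    (1, 13, '2021-10-01', '2021-11-22', 200),
    (2, 12, '2021-10-11', '2021-10-22', 300),
    (3, 11, '2021-10-28', '2021-11-05', 200),
    (4, 10, '2021-10-28', '2021-12-03', 100);

SELECT mnth, hotel_id, rnk, visits
FROM (
    SELECT to_char(m.live_mnth, 'YYYY-MM') AS mnth,
           h.hotel_id,
           count(*) AS visits,
           -- rank() gives tied hotels the same position within a month
           rank() OVER (PARTITION BY to_char(m.live_mnth, 'YYYY-MM')
                        ORDER BY count(*) DESC) AS rnk
    FROM hotel h
    -- one row per calendar month touched by the stay
    CROSS JOIN LATERAL generate_series(
        date_trunc('month', h.srch_ci::timestamp),
        date_trunc('month', h.srch_co::timestamp),
        interval '1 month'
    ) AS m(live_mnth)
    GROUP BY to_char(m.live_mnth, 'YYYY-MM'), h.hotel_id
) ranked
WHERE rnk <= 3
ORDER BY mnth, rnk, hotel_id;
```

With rank(), a tie at position 3 can return more than three hotels for a month; row_number() with an explicit tie-breaker would cap it at exactly three.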

SQL: Count distinct column combinations with multiple conditions

I have a joined table that looks like this where mytable contains date, storeid, item and units, whereas stores has state and storeid.
date        state     storeid  item  units
==============================================
2020-01-22  new york  712      a     5
2020-01-22  new york  712      b     7
2020-02-18  new york  712      c     0
2020-05-11  new york  518      b     9
2020-01-22  new york  518      b     10
2020-01-21  oregon    613      b     0
2020-02-13  oregon    613      b     9
2020-04-30  oregon    613      b     10
2020-01-22  oregon    515      c     3
And I am trying to create a column that counts the unique number of times that both storeid and item occur in a row where the date is between a given date range and units is greater than 0. Also, only need to select/group by state and the calculated column. I have something that looks like this:
select
s.state,
count(distinct case
when m.date between '2020-01-01' and '2020-03-31'
and m.units > 0
then m.storeid, m.item
end) as q1_total
from mytable as m
left join (select
state,
storeid
from stores) s
on m.storeid=s.storeid
group by s.state
I know my count function isn't written correctly, but I'm not sure how to fix it. Trying to get the end result to look like this.
state q1total
=========================
new york 3
oregon 2
With this query:
select distinct state, storeid, item
from mytable
where date between '2020-01-01' and '2020-03-31' and units > 0
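you get all the distinct combinations of state, storeid and item that satisfy your conditions (mytable here stands for the joined result shown in the question, so it already has a state column). You can then aggregate on it: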
select state, count(*) q1total
from (
select distinct state, storeid, item
from mytable
where date between '2020-01-01' and '2020-03-31' and units > 0
) t
group by state
See the demo.
Results:
> state | q1total
> :------- | ------:
> new york | 3
> oregon | 2
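If you would rather keep the conditional-aggregation shape of the query in the question, one option is to fold storeid and item into a single value inside the CASE, since COUNT(DISTINCT ...) takes one expression. The sketch below assumes a database with a variadic CONCAT function (for example MySQL, PostgreSQL or SQL Server 2012+) and assumes the '|' separator never appears in item; both are assumptions, not something stated in the question.

```sql
select
    s.state,
    count(distinct case
        when m.date between '2020-01-01' and '2020-03-31'
             and m.units > 0
        -- one string per (storeid, item) pair; '|' is an assumed separator
        then concat(m.storeid, '|', m.item)
        -- rows that fail the conditions yield NULL, which COUNT ignores
    end) as q1_total
from mytable as m
left join stores as s
    on m.storeid = s.storeid
group by s.state;
```

On the sample rows this should return 3 for new york and 2 for oregon, the same as the subquery approach.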

Display Top result by region

I have the following table:
Table Orders
OrderID Region CustomerID SalesPersonID
1 North 01 001
2 North 12 002
3 North 33 002
4 North 55 002
5 North 21 001
6 North 11 002
7 North 33 004
8 North 15 002
9 East 23 005
10 East 01 005
11 East 12 005
12 East 33 007
13 East 55 005
14 East 21 006
15 East 11 006
16 East 33 006
17 East 15 007
10 East 34 007
I am looking to display the sales person with most orders in each region. So my end result should look like:
Region SalesPerson Orders
North 002 5
East 005 4
How can I retrieve this information?
You can use window functions, if your database (which you did not specify) supports them:
select Region, SalesPersonID, Orders
from (
    select
        Region,
        SalesPersonID,
        count(*) orders,
        rank() over(partition by Region order by count(*) desc) rn
    from orders
    group by Region, SalesPersonID
) t
where rn = 1
rank() allows top ties, if any. You can use row_number() if you want just one result per region, even if there are ties.
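For the row_number() variant, a minimal sketch could look like this. It assumes the Orders table from the question; the secondary sort on SalesPersonID is an assumption added only so that the row chosen among tied salespeople is deterministic.

```sql
select Region, SalesPersonID, Orders
from (
    select
        Region,
        SalesPersonID,
        count(*) as Orders,
        -- row_number() never repeats within a partition, so exactly one row per Region gets rn = 1
        row_number() over (
            partition by Region
            order by count(*) desc, SalesPersonID  -- tie-breaker (assumption)
        ) as rn
    from Orders
    group by Region, SalesPersonID
) t
where rn = 1;
```

Without the tie-breaker the query still returns one row per region, but which of the tied salespeople is returned is up to the database.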
First compute the order totals per salesperson and region.
Then, from those totals, compute the maximum per region and join it back to the totals to find the salespeople who reach that maximum:
with Totales as
(
    -- orders per salesperson and region (the question's table is named Orders)
    select Region, SalesPersonID, count(1) As Totales
    from Orders
    group by Region, SalesPersonID
)
, MaxRegion as
(
    -- highest order count within each region
    select Region, max(Totales) As Totales
    from Totales
    group by Region
)
select MaxRegion.Region, Totales.SalesPersonID, MaxRegion.Totales
from MaxRegion
inner join Totales
    on MaxRegion.Region = Totales.Region
   and MaxRegion.Totales = Totales.Totales
Keep in mind that if two sellers share the maximum number of orders in a region, both will be included in the result.

How Should I handle this Start and End Date for each address changes in Oracle?

I have a request to generate a report with the following data in an Oracle table: Just an example of a member.
MEMBER_ID  START_DATE  END_DATE    ADDRESS1    ADDRESS2  CITY  STATE  LAST_UPDATED
12345      1/1/2019    12/31/9999  1 Test Ave  Apt 111   City  AA     3/4/2020
12345      1/1/2019    12/31/9999  2 Test Dr   Apt 222   City  AA     9/5/2019
12345      1/1/2019    12/31/9999  1 Test Ave  APT 111   City  AA     6/3/2019
12345      1/1/2019    12/31/9999  3 Test TRL            City  AA     3/3/2019
I want this as my output on the report from the data above:
MEMBER_ID  START_DATE  END_DATE    ADDRESS1    ADDRESS2  CITY  STATE  LAST_UPDATED
12345      10/1/2019   12/31/9999  1 Test Ave  Apt 111   City  AA     3/4/2020
12345      7/1/2019    9/30/2019   2 Test Dr   Apt 222   City  AA     9/5/2019
12345      4/1/2019    6/30/2019   1 Test Ave  APT 111   City  AA     6/3/2019
12345      1/1/2019    3/31/2019   3 Test TRL            City  AA     3/3/2019
Would someone be able to help with this? I tried DENSE_RANK but couldn't work out logic that behaves correctly. For example, if a member has another address change, I would need to pull the latest change into the report as well.
You seem to want each record to end on the last day of the month of its last_updated value. The next record then begins on the following day.
This is easily handled using lag():
select t.*,
       ( lag(last_day(last_updated)) over (partition by member_id order by last_updated) +
         interval '1' day
       ) as new_start_date,
       last_day(last_updated) as new_end_date
from t;
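With lag() alone, the earliest record gets a NULL start date and the latest record ends at its own month end rather than at the open-ended 12/31/9999 shown in the desired output. One way to cover both edges, sketched here under the assumptions that this is Oracle, that t is the placeholder table name used above, and that last_updated carries no time-of-day component, is to fall back to the stored start_date for the first row and to the stored end_date for the last row:

```sql
select t.member_id,
       -- first record per member has no predecessor: fall back to its stored start_date
       coalesce(
           lag(last_day(last_updated)) over (partition by member_id order by last_updated) + 1,
           t.start_date
       ) as new_start_date,
       -- latest record per member has no successor: keep its stored (open-ended) end_date
       case
           when lead(last_updated) over (partition by member_id order by last_updated) is null
               then t.end_date
           else last_day(last_updated)
       end as new_end_date,
       t.address1, t.address2, t.city, t.state, t.last_updated
from t
order by t.member_id, t.last_updated desc;
```

For the sample member this should yield 1/1/2019 to 3/31/2019, 4/1/2019 to 6/30/2019, 7/1/2019 to 9/30/2019, and 10/1/2019 to 12/31/9999.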
I think you need the start and end dates of the quarter containing last_updated as the new start and end date.
Select member_id,
       Trunc(last_updated, 'Q') as start_date,
       case
           -- Note: Trunc(..., 'Q') only yields months 1, 4, 7 or 10, so this branch never matches as written
           when extract(month from Trunc(last_updated, 'Q')) = 12
               then end_date
           else Add_months(Trunc(last_updated, 'Q'), 3) - 1
       end as end_date,
       .....
From your_table

Getting a snapshot on the first day of the each month

My dataset looks like this:
emplid  region  location  sub_dept  dept  start_dt  end_dt      days
------  ------  --------  --------  ----  --------  ----------  ----
123456  East    NY        A         1     7/1/2005  9/30/2005   91
123456  East    NY        B         1     7/1/2012  11/9/2012   131
123456  West    San Jose  C         2     7/1/2013  12/31/2013  183
123457  East    NY        B         1     7/1/2017  9/7/2017    68
123457  East    NY        B         1     7/1/2005  12/31/2005  183
123458  East    NY        B         1     7/1/2017  9/7/2017    68
123458  West    San Jose  C         2     7/1/2010  7/31/2010   30
123459  East    NY        A         1     7/1/2017  9/7/2017    68
123460  East    Boston    F         3     7/1/2007  11/30/2007  152
I need a snapshot for the 1st of each month, starting from the minimum date. In the example the minimum date is 9/30/2005, so I need to know which department/sub_dept/location/region each employee was in on 10/1/2005, 11/1/2005, 12/1/2005, and so on through the max date.
You didn't mention the name of the employee table, so I've called it employee_table. The following query (or something very close to it) should generate what you want:
With report_limits as (
    Select Trunc(min(start_dt), 'MONTH') as min_rpt_dt,
           Trunc(max(end_dt), 'MONTH') as max_rpt_dt
    From employee_table),
report_dates as (
    Select add_months(min_rpt_dt, level-1) as rpt_dt
    From report_limits
    Connect By add_months(min_rpt_dt, level-1) <= max_rpt_dt)
--
Select e.emplid, e.region, e.location, e.sub_dept, e.dept,
       e.start_dt, e.end_dt, e.days, r.rpt_dt
From report_dates r
Inner Join employee_table e on r.rpt_dt Between e.start_dt And e.end_dt
Order By r.rpt_dt, e.emplid;
The report_limits query determines the range of report dates, the report_dates query uses a Connect By clause to generate one date per month within that range, and the main query joins the list of dates to the employee data.
Try this query:
-- T-SQL variables are prefixed with @ (# denotes temp tables)
Declare @StartDate date = '2005-09-29',
        @EndDate date = '2017-04-01'
Select *,
       Dateadd(mm, Datediff(mm, 0, date), 0) AS FirstDateOfMonth
from TableName
where date >= @StartDate and date <= @EndDate