What I would like to do is find the number of consecutive weeks that someone is active on Sundays and assign them a value. They have to participate in at least 2 races on a given Sunday to be counted as active for that week.
If they are active for 2 consecutive weeks I would like to assign a value of 100, 3 consecutive weeks a value of 200, 4 consecutive weeks a value of 300, and so on up to 9 consecutive weeks.
My difficulty is not determining consecutive weeks but handling the breaks between runs of consecutive dates. Suppose the following dataset:
CustomerID RaceDate Races
1 2/2/2014 2
1 2/9/2014 5
1 2/16/2014 3
1 2/23/2014 3
1 3/2/2014 4
1 3/9/2014 3
1 3/16/2014 3
2 2/2/2014 2
2 2/9/2014 3
2 3/2/2014 2
2 3/9/2014 4
2 3/16/2014 3
CustomerID 1 would have 7 consecutive weeks for a value of 600.
The hard part for me is CustomerID 2. They would have 2 consecutive weeks AND 3 consecutive weeks. So their total value would be 100 + 200 = 300.
I would like this to work for any combination of consecutive-week runs.
Any help please?
EDIT: I am using SQL Server 2008 R2.
When looking for sequential values, there is a simple observation that helps: if you subtract an incrementing sequence (here, one week per row) from the dates, the result is constant within each run of consecutive weeks. You can use that value as a grouping key:
select CustomerId, min(RaceDate) as seqStart, max(RaceDate) as seqEnd,
       count(*) as NumDaysRaced
from (select t.*,
             dateadd(week, - row_number() over (partition by CustomerId order by RaceDate),
                     RaceDate) as grp
      from table t
      where races >= 2
     ) t
group by CustomerId, grp;
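The grouping key is easier to see outside SQL. Below is a minimal Python sketch of the same idea (an illustration, not a replacement for the query), assuming each customer's rows arrive as (race_date, races) pairs with at most one row per Sunday; the function name `weekly_runs` is hypothetical. It keeps rows with at least 2 races, subtracts one week per row number, and groups on the result.

```python
from datetime import date, timedelta
from itertools import groupby


def weekly_runs(rows, min_races=2):
    """Return (seq_start, seq_end, weeks) for each run of consecutive active weeks.

    rows: (race_date, races) pairs for ONE customer (hypothetical input shape,
    assumed to contain at most one row per Sunday).
    """
    # Same filter as "where races >= 2".
    dates = sorted(d for d, races in rows if races >= min_races)
    # Same idea as dateadd(week, -row_number(), RaceDate): constant within a run.
    keyed = [(d - timedelta(weeks=i), d) for i, d in enumerate(dates, start=1)]
    runs = []
    # With one row per Sunday, gaps are multiples of 7 days, so keys never
    # decrease and equal keys are adjacent; groupby then finds each run.
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        members = [d for _, d in group]
        runs.append((members[0], members[-1], len(members)))
    return runs


customer_2 = [(date(2014, 2, 2), 2), (date(2014, 2, 9), 3), (date(2014, 3, 2), 2),
              (date(2014, 3, 9), 4), (date(2014, 3, 16), 3)]
print(weekly_runs(customer_2))
```

For CustomerID 2 this yields one 2-week run (2/2 to 2/9) and one 3-week run (3/2 to 3/16), which are the two groups the query should return for that customer.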
You can then use this to get your final "points":
select CustomerId,
       sum(case when NumDaysRaced > 1 then (NumDaysRaced - 1) * 100 else 0 end) as Points
from (select CustomerId, min(RaceDate) as seqStart, max(RaceDate) as seqEnd,
             count(*) as NumDaysRaced
      from (select t.*,
                   dateadd(week, - row_number() over (partition by CustomerId order by RaceDate),
                           RaceDate) as grp
            from table t
            where races >= 2
           ) t
      group by CustomerId, grp
     ) c
group by CustomerId;
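As a cross-check of the points rule, here is a minimal Python sketch, assuming the active Sundays for one customer are already known (races >= 2) and that a run of n consecutive weeks is worth (n - 1) * 100 with no cap at 9 weeks, exactly as the CASE expression above computes it; `points` is a hypothetical name.

```python
from datetime import date, timedelta
from itertools import groupby


def points(active_sundays, per_extra_week=100):
    """Sum (n - 1) * per_extra_week over every run of n consecutive weeks."""
    dates = sorted(set(active_sundays))
    # Subtracting i weeks from the i-th date gives a per-run constant key.
    keys = [d - timedelta(weeks=i) for i, d in enumerate(dates)]
    run_lengths = [len(list(group)) for _, group in groupby(keys)]
    # Mirrors: case when NumDaysRaced > 1 then (NumDaysRaced - 1) * 100 else 0
    return sum((n - 1) * per_extra_week for n in run_lengths if n > 1)


customer_1 = [date(2014, 2, 2) + timedelta(weeks=k) for k in range(7)]
customer_2 = [date(2014, 2, 2), date(2014, 2, 9),
              date(2014, 3, 2), date(2014, 3, 9), date(2014, 3, 16)]
print(points(customer_1), points(customer_2))
```

This gives 600 for CustomerID 1 and 100 + 200 = 300 for CustomerID 2, the values expected in the question.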
I have data that looks like this:
ID num_of_days
1 0
2 0
2 8
2 9
2 10
2 15
3 10
3 20
I want to add another column that increments in value only if the num_of_days column is divisible by 5 or the ID number increases so my end result would look like this:
ID num_of_days row_num
1 0 1
2 0 2
2 8 2
2 9 2
2 10 3
2 15 4
3 10 5
3 20 6
Any suggestions?
Edit #1:
num_of_days is the number of days between a customer's previous doctor visit and the current one.
A customer can see a doctor once or multiple times.
On a customer's first visit, num_of_days = 0.
SQL tables represent unordered sets. Based on your question, I'll assume that the combination of id/num_of_days provides the ordering.
You can use a cumulative sum combined with lag():
select t.*,
       sum(case when prev_id = id and num_of_days % 5 <> 0
                then 0 else 1
           end) over (order by id, num_of_days) as row_num
from (select t.*,
             lag(id) over (order by id, num_of_days) as prev_id
      from t
     ) t;
If you have a different ordering column, then just use that in the order by clauses.
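To make the counter logic concrete, here is a minimal Python sketch of the same rule, assuming rows are (id, num_of_days) pairs and that sorting by (id, num_of_days) gives the intended order; `row_nums` is a hypothetical name. The running total adds 1 whenever the id changes or num_of_days is a multiple of 5, which is what the CASE expression feeds into sum() over (order by ...).

```python
def row_nums(rows):
    """Return (id, num_of_days, row_num) triples using the lag()/cumulative-sum rule."""
    result = []
    counter = 0
    prev_id = None  # plays the role of lag(id) being NULL on the first row
    for row_id, days in sorted(rows):
        # case when prev_id = id and num_of_days % 5 <> 0 then 0 else 1 end
        step = 0 if prev_id == row_id and days % 5 != 0 else 1
        counter += step
        result.append((row_id, days, counter))
        prev_id = row_id
    return result


data = [(1, 0), (2, 0), (2, 8), (2, 9), (2, 10), (2, 15), (3, 10), (3, 20)]
for row in row_nums(data):
    print(*row)
```

The row_num values come out as 1, 2, 2, 2, 3, 4, 5, 6, matching the desired output.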
I have a table that looks like this:
ID Date IsFull
1 2020-01-05 0
1 2020-02-05 0
1 2020-02-25 1
1 2020-03-01 1
1 2020-03-20 1
I want to display, for ID = 1, how many months have sum(isfull)/count(*) > .6 (that is, isfull = 1 in more than 60% of that month's rows).
So the final output should be:
ID HowManyMonths
1 1 (only month 3: 2 out of 2 cases)
If the question changes to sum(isfull)/count(*) > .4
then the final output should be
ID HowManyMonths
1 2 (months 2 and 3)
Thanks!!
You can do this with two levels of aggregation:
select id, count(*) howManyMonths
from (
select id
from mytable
group by id, year(date), month(date)
having avg(1.0 * isFull) > 0.6
) t
group by id
The subquery aggregates by id, year and month, and uses a having clause to keep only the groups that meet the target rate (avg() comes in handy for this; multiplying by 1.0 avoids integer averaging). The outer query counts how many months passed the target rate for each id.
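The same two-level aggregation can be sketched in Python, assuming rows are (id, date, is_full) tuples; `months_over` is a hypothetical name. The inner step buckets by (id, year, month) and keeps buckets whose mean is_full exceeds the threshold; the outer step counts the surviving months per id. As in the SQL, an id with no qualifying month simply does not appear.

```python
from collections import defaultdict
from datetime import date


def months_over(rows, threshold):
    """Count, per id, the months whose average is_full is above threshold."""
    buckets = defaultdict(list)
    for row_id, day, is_full in rows:
        # group by id, year(date), month(date)
        buckets[(row_id, day.year, day.month)].append(is_full)
    counts = defaultdict(int)
    for (row_id, _year, _month), flags in buckets.items():
        # having avg(1.0 * isFull) > threshold
        if sum(flags) / len(flags) > threshold:
            counts[row_id] += 1
    return dict(counts)


rows = [(1, date(2020, 1, 5), 0), (1, date(2020, 2, 5), 0), (1, date(2020, 2, 25), 1),
        (1, date(2020, 3, 1), 1), (1, date(2020, 3, 20), 1)]
print(months_over(rows, 0.6), months_over(rows, 0.4))
```

With the sample data this gives {1: 1} for the 0.6 threshold and {1: 2} for 0.4, matching both expected outputs in the question.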
As I am preparing my data for predicting no-shows at a hospital, I ran into the following problem. In the query below I tried to get the running number of shows and no-shows relative to the number of appointments (APPTS). INDICATION_NO_SHOW indicates whether a patient showed up at an appointment: 0 means show, and 1 means no-show.
with t1 as
(
select
PAT_ID
,APPT_TIME
,APPT_ID
,ROW_NUMBER () over(PARTITION BY PAT_ID order by pat_id,APPT_TIME) as [TOTAL_APPTS]
,INDICATION_NO_SHOW
from appointments
)
,
t2 as
(
select
t1.PAT_ID
,t1.APPT_TIME
,INDICATION_NO_SHOW
,sum(INDICATION_NO_SHOW) over(order by PAT_ID, APPT_TIME ) as TOTAL_NO_SHOWS
,TOTAL_APPTS
from t1
)
SELECT *
,(TOTAL_APPTS - TOTAL_NO_SHOWS) AS TOTAL_SHOWS
FROM T2
order by PAT_ID, APPT_TIME
This resulted into the following dataset:
PAT_ID APPT_TIME INDICATION_NO_SHOW TOTAL_SHOWS TOTAL_NO_SHOWS TOTAL_APPTS
1 1-1-2001 0 1 0 1
1 1-2-2001 0 2 0 2
1 1-3-2001 1 2 1 3
1 1-4-2001 0 3 1 4
2 1-1-2001 0 0 1 1
2 2-1-2001 0 1 1 2
2 2-2-2001 1 1 2 3
2 2-3-2001 0 2 2 4
As you can see, the query only works for patient 1: the running no-show count carries patient 1's no-shows over into patient 2's rows. So it works for a single patient but not across the whole dataset.
The TOTAL_APPTS column works, because it counts the number of appointments the patient had at the time of each appointment. My question is: how do I get the shows and no-shows to add up correctly for every patient (as they did for patient 1)? I understand why this query doesn't work; I just have no idea how to fix it.
I think that you can just use window functions. You seem to be looking for running sums of shows and no-shows per patient, so:
select
pat_id,
appt_time,
indication_no_show,
sum(1 - indication_no_show)
over(partition by pat_id order by appt_time) total_shows,
sum(indication_no_show)
over(partition by pat_id order by appt_time) total_no_shows
from appointments
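The key change is the partition by pat_id clause, which makes each running sum restart for every patient. A minimal Python sketch of that behavior, assuming rows are (pat_id, appt_time, indication_no_show) tuples with 0 = show and 1 = no-show; `running_counts` is a hypothetical name, and the dates below are illustrative rather than copied from the question.

```python
from datetime import date
from itertools import accumulate, groupby


def running_counts(appointments):
    """Return (pat_id, appt_time, no_show, total_shows, total_no_shows) per row.

    Totals restart for each patient, like sum(...) over (partition by pat_id
    order by appt_time).
    """
    ordered = sorted(appointments, key=lambda a: (a[0], a[1]))
    out = []
    for pat_id, group in groupby(ordered, key=lambda a: a[0]):
        rows = list(group)
        shows = accumulate(1 - r[2] for r in rows)   # sum(1 - indication_no_show)
        no_shows = accumulate(r[2] for r in rows)    # sum(indication_no_show)
        for r, s, n in zip(rows, shows, no_shows):
            out.append((pat_id, r[1], r[2], s, n))
    return out


appts = [(1, date(2001, 1, 1), 0), (1, date(2001, 1, 2), 0),
         (1, date(2001, 1, 3), 1), (1, date(2001, 1, 4), 0),
         (2, date(2001, 1, 1), 0), (2, date(2001, 2, 1), 0),
         (2, date(2001, 2, 2), 1), (2, date(2001, 2, 3), 0)]
for row in running_counts(appts):
    print(row)
```

Patient 2 now starts from zero: total shows 1, 2, 2, 3 and total no-shows 0, 0, 1, 1, the same pattern as patient 1.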
I'm trying to group consecutive dates, count the consecutive dates, and use that count as a filter.
I have a table that currently looks like:
pat_id admin_dates admin_grp daily_admin
-------------------------------------------------
1 08/20/2018 1 2 doses
1 08/21/2018 1 3 doses
1 08/22/2018 1 1 doses
1 10/05/2018 2 3 doses
1 12/10/2018 3 4 doses
2 01/05/2019 1 1 doses
2 02/10/2019 2 2 doses
2 02/11/2019 2 2 doses
where admin_grp is grouping consecutive dates per pat_id.
I want to exclude all rows that belong to a run of fewer than 3 consecutive dates for the same pat_id. In this example, only the group pat_id = 1, admin_grp = 1 has 3 consecutive dates, which is what I would like to see in the result. My desired output would be:
pat_id admin_dates admin_grp daily_admin
-------------------------------------------------
1 08/20/2018 1 2 doses
1 08/21/2018 1 3 doses
1 08/22/2018 1 1 doses
I honestly have no idea how to do this. My attempt failed to count how many rows share the same admin_grp value within the same pat_id, let alone use that count as a filter. If anyone could help or suggest how to tackle this, it would be greatly appreciated.
Assuming that each admin_grp contains only consecutive days, you just need to count the rows per (pat_id, admin_grp) and keep those with 3 or more rows.
Eg:
select x.*
from (select t.*
,count(*) over(partition by pat_id,admin_grp) as cnt
from table t
)x
where x.cnt>=3
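For illustration, the windowed count-and-filter can be sketched in Python, assuming rows are (pat_id, admin_dates, admin_grp, daily_admin) tuples and that admin_grp already marks runs of consecutive dates; `keep_long_groups` is a hypothetical name.

```python
from collections import Counter


def keep_long_groups(rows, min_size=3):
    """Keep rows whose (pat_id, admin_grp) group has at least min_size rows."""
    # count(*) over (partition by pat_id, admin_grp)
    sizes = Counter((pat_id, grp) for pat_id, _day, grp, _dose in rows)
    # where cnt >= 3
    return [r for r in rows if sizes[(r[0], r[2])] >= min_size]


rows = [(1, "08/20/2018", 1, "2 doses"), (1, "08/21/2018", 1, "3 doses"),
        (1, "08/22/2018", 1, "1 doses"), (1, "10/05/2018", 2, "3 doses"),
        (1, "12/10/2018", 3, "4 doses"), (2, "01/05/2019", 1, "1 doses"),
        (2, "02/10/2019", 2, "2 doses"), (2, "02/11/2019", 2, "2 doses")]
for r in keep_long_groups(rows):
    print(*r)
```

Only the three pat_id = 1, admin_grp = 1 rows survive. Counting in a window rather than with group by keeps the individual rows, which is why the query uses count(*) over (...).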
Short answer: join the table with itself on 'pat_id' and filter appropriately:
Select a.* from TABLE a
join (Select * from TABLE where daily_admin='3 doses') b
using (pat_id)
Where a.daily_admin in ('1 doses', '2 doses', '3 doses')
Btw: too bad the 'daily_admin' column is not an integer... a better data model would have made the Where clause slightly simpler :)
I am working in PostgreSQL and have the following question.
We have a customer ID and the date when the person visited a property. Based on this I need to calculate the number of trips, where consecutive dates count as one trip. For example, if a person visits on one date, that is trip 1; if they then visit on three consecutive days, those visits together count as trip 2.
Below is the input
ID Date
1 1-Jan
1 2-Jan
1 5-Jan
1 1-Jul
2 1-Jan
2 2-Feb
2 5-Feb
2 6-Feb
2 7-Feb
2 12-Feb
Expected output
ID Date Trip no
1 1-Jan 1
1 2-Jan 1
1 5-Jan 2
1 1-Jul 3
2 1-Jan 1
2 2-Feb 2
2 5-Feb 3
2 6-Feb 3
2 7-Feb 3
2 12-Feb 4
I have implemented this successfully using a loop, but it runs very slowly given the volume of data.
Can you suggest an approach that does not use a loop?
Subtract a sequence from the dates; the result will be constant for a particular trip. Then you can use dense_rank() for the numbering:
select t.*,
dense_rank() over (partition by id order by grp) as trip_num
from (select t.*,
(date - row_number() over (partition by id order by date) * interval '1 day'
) as grp
from t
) t;
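The same date-minus-row-number trick, sketched in Python for one customer, assuming the dates are datetime.date values with at most one visit per day (the question gives no year, so 2021 below is an arbitrary assumption); `trip_numbers` is a hypothetical name. Ranking the distinct grp values plays the role of dense_rank().

```python
from datetime import date, timedelta


def trip_numbers(visit_dates):
    """Return (date, trip_no) pairs; consecutive days share a trip number."""
    dates = sorted(set(visit_dates))  # duplicate dates are collapsed here
    # date - row_number() * interval '1 day': constant within a run of days
    grps = [d - timedelta(days=i) for i, d in enumerate(dates, start=1)]
    # dense_rank() over (order by grp): 1, 2, 3, ... with no gaps
    rank = {g: n for n, g in enumerate(sorted(set(grps)), start=1)}
    return [(d, rank[g]) for d, g in zip(dates, grps)]


id_2 = [date(2021, 1, 1), date(2021, 2, 2), date(2021, 2, 5),
        date(2021, 2, 6), date(2021, 2, 7), date(2021, 2, 12)]
print([trip for _, trip in trip_numbers(id_2)])
```

For ID 2 this gives trip numbers 1, 2, 3, 3, 3, 4, matching the expected output.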