SQL: Get date when cumulative sum reaches a mark

I have a table in the following format:
APP_iD| Date | Impressions
113 2015-01-01 10
113 2015-01-02 5
113 2015-01-03 50
113 2015-01-04 35
113 2015-01-05 30
113 2015-01-06 75
Now, I need to know the date when cumulative SUM of those impressions crossed 65/100/150 and so on.
I tried using CASE WHEN statement:
CASE WHEN SUM(impressions) >100
THEN date
but it doesn't sum the data down the column. It only checks each individual row.
My final result should look like:
APP_iD | Date_65 | Date_100 | Date_150
113 2015-01-03 2015-01-04 2015-01-06
Does anyone know how to do this?
Is this even possible?

Use sum() over() to get the running sum and check it against each threshold with a case expression. Then aggregate with min() to get the first date that reaches each threshold, one row per app_id.
select app_id,min(dt_65) as date_65,min(dt_100) as date_100,min(dt_150) as date_150
from (
select app_id
,case when sum(impressions) over(partition by app_id order by dt) >= 65 then dt end dt_65
,case when sum(impressions) over(partition by app_id order by dt) >= 100 then dt end dt_100
,case when sum(impressions) over(partition by app_id order by dt) >= 150 then dt end dt_150
from t) x
group by app_id
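A quick way to sanity-check the running-sum approach against the question's sample data is to run it in SQLite from Python. This is only a sketch under assumptions: the table name t and the column name dt follow the query above, dates are stored as ISO text, and each threshold is tested with >= and aggregated with min() so that every column holds the first date at or past the mark. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (app_id integer, dt text, impressions integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (113, "2015-01-01", 10),
        (113, "2015-01-02", 5),
        (113, "2015-01-03", 50),
        (113, "2015-01-04", 35),
        (113, "2015-01-05", 30),
        (113, "2015-01-06", 75),
    ],
)

# Inner query: running total per app. Outer query: first date per threshold.
query = """
select app_id,
       min(case when running >= 65  then dt end) as date_65,
       min(case when running >= 100 then dt end) as date_100,
       min(case when running >= 150 then dt end) as date_150
from (
    select app_id, dt,
           sum(impressions) over (partition by app_id order by dt) as running
    from t
) x
group by app_id
"""

rows = conn.execute(query).fetchall()
print(rows)
```

The running totals are 10, 15, 65, 100, 130 and 205, so the three dates come out as 2015-01-03, 2015-01-04 and 2015-01-06, matching the result the question asks for.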

with c as (
select
app_id, date,
sum(impressions) over (partition by app_id order by date) as c
from t
)
select app_id, s65.date, s100.date, s150.date
from
(
select distinct on (app_id) app_id, date
from c
where c >= 65 and c < 100
order by app_id, date
) s65
left join
(
select distinct on (app_id) app_id, date
from c
where c >= 100 and c <150
order by app_id, date
) s100 using (app_id)
left join
(
select distinct on (app_id) app_id, date
from c
where c >= 150
order by app_id, date
) s150 using (app_id)
;
app_id | date | date | date
--------+------------+------------+------------
113 | 2015-01-03 | 2015-01-04 | 2015-01-06
Without the pivot:
select distinct on (app_id, break) app_id, break, date
from (
select *,
case
when c < 100 then 65
when c < 150 then 100
else 150
end as break
from (
select
app_id, date,
sum(impressions) over (partition by app_id order by date) as c
from t
) t
where c >= 65
) t
order by app_id, break, date
;
app_id | break | date
--------+-------+------------
113 | 65 | 2015-01-03
113 | 100 | 2015-01-04
113 | 150 | 2015-01-06

You can try this for the desired result.
with t as (select app_id, date, sum(Impressions)
over (partition by app_id order by date) AS s from tbl)
select app_id,
min(date_65) AS date_65 ,
min(date_100) AS date_100,
min(date_150) AS date_150
-- more columns to observe other sum of Impressions
from
(select app_id,
CASE WHEN (s >= 65 and s < 100) THEN date END AS date_65,
CASE WHEN (s >= 100 and s < 150) THEN date END AS date_100,
CASE WHEN (s >= 150 ) THEN date END AS date_150
-- more cases to observe other sum of Impressions
from t) q
group by q.app_id
If you want to track more cumulative totals of Impressions, just add more conditions.
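To illustrate the "just add more conditions" idea outside SQL, here is a small Python sketch that takes the thresholds as a list instead of hard-coding one CASE per mark. The function name first_crossings and the (date, impressions) row shape are made up for this example, and dates are assumed to be ISO strings so that they sort chronologically.

```python
from itertools import accumulate


def first_crossings(rows, thresholds):
    """Return {threshold: first date whose running total is >= threshold}.

    rows is a list of (date, impressions) pairs for a single app.
    A threshold that is never reached maps to None.
    """
    ordered = sorted(rows)  # order by date, like ORDER BY in the window
    running = accumulate(impressions for _, impressions in ordered)
    result = {t: None for t in thresholds}
    for (day, _), total in zip(ordered, running):
        for t in thresholds:
            # Record only the first date that reaches each threshold.
            if result[t] is None and total >= t:
                result[t] = day
    return result


sample = [
    ("2015-01-01", 10),
    ("2015-01-02", 5),
    ("2015-01-03", 50),
    ("2015-01-04", 35),
    ("2015-01-05", 30),
    ("2015-01-06", 75),
]
crossings = first_crossings(sample, [65, 100, 150, 500])
print(crossings)
```

Because each threshold is checked with >= on its own, a running total that jumps straight past one mark (say from 60 to 120) still assigns a date to it, and a mark that is never reached (500 here) stays None.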

Related

How to create a start and end date with no gaps from one date column and to sum a value within the dates

I am new to SQL and am working in SQL Developer.
I have a table with 4 columns: patient ID (ptid), service date (dt), insurance payment amount (insr_amt), and out-of-pocket payment amount (op_amt) (see Table 1 below).
What I would like to do is:
(1) create two columns, "start_dt" and "end_dt", from the "dt" column. If a patient's service dates have no gaps, populate the start and end date with that patient's first and last date; if there is a gap in the service dates, create a separate start/end date row for each run of consecutive dates; and
(2) sum the two payment amounts per patient within each set of start and end dates (see Table 2 below).
How can I do this with SQL in SQL Developer?
Thank you!
Table 1:
Ptid | dt       | insr_amt | op_amt
A    | 1/1/2021 | 30       | 20
A    | 1/2/2021 | 30       | 10
A    | 1/3/2021 | 30       | 10
A    | 1/4/2021 | 30       | 30
B    | 1/6/2021 | 10       | 10
B    | 1/7/2021 | 20       | 10
C    | 2/1/2021 | 15       | 30
C    | 2/2/2021 | 15       | 30
C    | 2/6/2021 | 60       | 30
Table 2:
Ptid | start_dt | end_dt   | total_insr_amt | total_op_amt
A    | 1/1/2021 | 1/4/2021 | 120            | 70
B    | 1/6/2021 | 1/7/2021 | 30             | 20
C    | 2/1/2021 | 2/2/2021 | 30             | 60
C    | 2/6/2021 | 2/6/2021 | 60             | 30
You didn't mention the specific database so this solution works in PostgreSQL. You can do:
select
ptid,
min(dt) as start_dt,
max(dt) as end_dt,
sum(insr_amt) as total_insr_amt,
sum(op_amt) as total_op_amt
from (
select *,
sum(inc) over(partition by ptid order by dt) as grp
from (
select *,
case when dt - interval '1 day' = lag(dt) over(partition by ptid order by dt)
then 0 else 1 end as inc
from t
) x
) y
group by ptid, grp
order by ptid, grp
Result:
ptid start_dt end_dt total_insr_amt total_op_amt
----- ---------- ---------- -------------- -----------
A 2021-01-01 2021-01-04 120 70
B 2021-01-06 2021-01-07 30 20
C 2021-02-01 2021-02-02 30 60
C 2021-02-06 2021-02-06 60 30
See running example at DB Fiddle 1.
EDIT for Oracle
As requested, the modified query that works in Oracle is:
select
ptid,
min(dt) as start_dt,
max(dt) as end_dt,
sum(insr_amt) as total_insr_amt,
sum(op_amt) as total_op_amt
from (
select x.*,
sum(inc) over(partition by ptid order by dt) as grp
from (
select t.*,
case when dt - 1 = lag(dt) over(partition by ptid order by dt)
then 0 else 1 end as inc
from t
) x
) y
group by ptid, grp
order by ptid, grp
See running example at db<>fiddle 2.
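For readers who want to see the grouping logic step by step, here is a rough Python equivalent of the queries above: a row starts a new group whenever its date is not exactly one day after the previous date for the same patient. The function name merge_visits and the tuple layout are illustrative choices, not part of the original answer.

```python
from datetime import date, timedelta
from itertools import groupby


def merge_visits(rows):
    """Collapse (ptid, dt, insr_amt, op_amt) rows into runs of consecutive days."""
    merged = []
    # Sorting by (ptid, dt) plays the role of PARTITION BY ptid ORDER BY dt.
    for ptid, visits in groupby(sorted(rows), key=lambda r: r[0]):
        run = None  # [ptid, start_dt, end_dt, total_insr_amt, total_op_amt]
        for _, dt, insr, op in visits:
            if run is not None and dt - run[2] == timedelta(days=1):
                # Previous visit was exactly one day earlier: extend the run.
                run[2] = dt
                run[3] += insr
                run[4] += op
            else:
                # Gap (or first visit): close the old run and start a new one.
                if run is not None:
                    merged.append(tuple(run))
                run = [ptid, dt, dt, insr, op]
        merged.append(tuple(run))
    return merged


table_1 = [
    ("A", date(2021, 1, 1), 30, 20),
    ("A", date(2021, 1, 2), 30, 10),
    ("A", date(2021, 1, 3), 30, 10),
    ("A", date(2021, 1, 4), 30, 30),
    ("B", date(2021, 1, 6), 10, 10),
    ("B", date(2021, 1, 7), 20, 10),
    ("C", date(2021, 2, 1), 15, 30),
    ("C", date(2021, 2, 2), 15, 30),
    ("C", date(2021, 2, 6), 60, 30),
]
table_2 = merge_visits(table_1)
for row in table_2:
    print(row)
```

The else branch corresponds to inc = 1 in the SQL, and each run corresponds to one (ptid, grp) group.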

How to return the ids of users who buy in several consecutive months?

How can I get all user_id values from the data below, for all rows containing the same user_id value over consecutive months from a given start date in the date column?
For example, given the below table....
date       | user_id
2018-11-01 | 13
2018-11-01 | 13
2018-11-01 | 14
2018-11-01 | 15
2018-12-01 | 13
2019-01-01 | 13
2019-01-01 | 14
...supposing I want to get the user_id values for consecutive months prior to (but not including) 2019-01-01 then I'd have this as my output:
user_id | m_year
13      | 2018-11
13      | 2018-12
13      | 2019-01
A window function can probably be applied here.
If you want to aggregate on a user and the year-months:
select
t.user_id,
to_char(date_trunc('month',t.date),'YYYY-MM') as m_year
from yourtable t
where t.date < '2019-02-01'::date
group by t.user_id, date_trunc('month',t.date)
order by t.user_id, m_year
But if you only want those with consecutive months, then a little extra is needed.
select
user_id,
to_char(ym,'YYYY-MM') as m_year
from
(
select t.user_id
, date_trunc('month',t.date) as ym
, lag(date_trunc('month',t.date))
over (partition by t.user_id order by date_trunc('month',t.date)) as prev_ym
, lead(date_trunc('month',t.date))
over (partition by t.user_id order by date_trunc('month',t.date)) as next_ym
from yourtable t
where t.date < '2019-02-01'::date
group by t.user_id, date_trunc('month',t.date)
) q
where (ym - prev_ym <= '31 days'::interval or
next_ym - ym <= '31 days'::interval)
order by user_id, ym
user_id | m_year
------: | :------
13 | 2018-11
13 | 2018-12
13 | 2019-01
db<>fiddle here
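The lag/lead filter above keeps a user-month only if the same user also has a row in the month directly before or after it. A minimal Python sketch of that rule follows; the function name consecutive_months and the month-index trick (year * 12 + month - 1) are assumptions made for the illustration, and the cutoff mirrors the date < '2019-02-01' condition in the query.

```python
from datetime import date


def consecutive_months(rows, before):
    """Return sorted (user_id, 'YYYY-MM') pairs that have a neighbouring month.

    rows is an iterable of (date, user_id); only dates earlier than
    `before` are considered.
    """
    # One entry per user and month, with months numbered consecutively.
    months = {(uid, d.year * 12 + d.month - 1) for d, uid in rows if d < before}
    kept = sorted(
        (uid, idx)
        for uid, idx in months
        if (uid, idx - 1) in months or (uid, idx + 1) in months
    )
    return [(uid, f"{idx // 12}-{idx % 12 + 1:02d}") for uid, idx in kept]


purchases = [
    (date(2018, 11, 1), 13),
    (date(2018, 11, 1), 13),
    (date(2018, 11, 1), 14),
    (date(2018, 11, 1), 15),
    (date(2018, 12, 1), 13),
    (date(2019, 1, 1), 13),
    (date(2019, 1, 1), 14),
]
result = consecutive_months(purchases, before=date(2019, 2, 1))
print(result)
```

User 14 is dropped because 2018-11 and 2019-01 are not adjacent, and user 15 has only a single month.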
You don't need a window function for this specific query. Just try:
SELECT DISTINCT ON (user_id) user_id, date_trunc('month', date :: date) AS m_year
FROM your_table

Create date pairs from list of dates in one column table

I have a problem with a SQL query. I have a list of dates in one column, and I would like to create pairs of dates. The dates are in sequence, so I have to match the first date with the second and create a record, then the third date with the fourth and create a record, and so on, as in the following example:
ID DATA
50 10/04/2019
50 12/04/2019
50 13/04/2019
50 17/04/2019
50 18/04/2019
50 19/04/2019
ID DATA_START DATA_END
50 10/04/2019 12/04/2019
50 13/04/2019 17/04/2019
50 18/04/2019 19/04/2019
Thanks very much everyone for the help
You should mark your rows that should be grouped together (into single row) and which date will have which role (start or end).
Here's the code:
with a as (
/*Source data*/
select 50 as id, convert(date, '2019-04-10', 23) as dt union all
select 50 as id, convert(date, '2019-04-12', 23) as dt union all
select 50 as id, convert(date, '2019-04-13', 23) as dt union all
select 50 as id, convert(date, '2019-04-17', 23) as dt union all
select 50 as id, convert(date, '2019-04-18', 23) as dt union all
select 50 as id, convert(date, '2019-04-19', 23) as dt
)
select
id,
[1] as dt_start,
[0] as dt_end
from (
select
id,
dt,
/*
the first row (with modulo = 1) is date from
and the second row (with modulo = 0) is date to
*/
(row_number() over(partition by id order by dt)) % 2 as dt_role,
/*Integer division by 2 will group rows together*/
(row_number() over(partition by id order by dt) + 1) / 2 as dt_group
from a
) as s
pivot (
max(dt) for dt_role in ([0], [1])
) as p
GO
id | dt_start | dt_end
-: | :--------- | :---------
50 | 2019-04-10 | 2019-04-12
50 | 2019-04-13 | 2019-04-17
50 | 2019-04-18 | 2019-04-19
db<>fiddle here
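The row_number() arithmetic in the query boils down to: sort the dates, treat odd positions as starts and even positions as ends. A short Python sketch of the same pairing, with the function name pair_dates chosen only for this example:

```python
from datetime import date
from itertools import zip_longest


def pair_dates(dates):
    """Pair the 1st date with the 2nd, the 3rd with the 4th, and so on."""
    ordered = sorted(dates)
    # Slots 0, 2, 4, ... are starts; slots 1, 3, 5, ... are ends.
    # zip_longest leaves the end as None when the number of dates is odd.
    return list(zip_longest(ordered[0::2], ordered[1::2]))


dates = [
    date(2019, 4, 10),
    date(2019, 4, 12),
    date(2019, 4, 13),
    date(2019, 4, 17),
    date(2019, 4, 18),
    date(2019, 4, 19),
]
pairs = pair_dates(dates)
for start, end in pairs:
    print(start, end)
```

With an odd number of dates the last start is paired with None, which is comparable to the NULL dt_end that the pivot would produce for an unmatched row.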

sql date difference with multiple variables

I'm trying to get the number of days between the effdate of the most recent status 1 and the effdate of the status 0 that follows it.
The code below yields the following results:
SELECT * FROM
(SELECT FILEKEY, STATUS, EFFDATE FROM ASTATUSHIST
UNION
SELECT FILEKEY, ASTATUS, ASTATUSEFFDATE FROM USERS ) A
ORDER BY 1, 3 DESC
130 0 2019-10-25 00:00:00.000
130 0 2017-03-01 00:00:00.000
130 0 2017-01-01 00:00:00.000
130 1 2005-02-01 00:00:00.000
130 0 2001-03-03 00:00:00.000
130 0 2000-01-30 00:00:00.000
130 0 2000-01-01 00:00:00.000
this code combines 2 tables to get the complete history for a given user.
Ideally I could produce something that looks like this:
130 4352
or
125 null
where the null is filekey without a status 1 or a filekey with a status 1 but without a following status 0
Thanks
In all supported versions of SQL Server, you can use window functions:
with t as (
<your query here>
)
select t.*,
datediff(day, effdate, next_date) as days_diff
from (select t.*,
row_number() over (partition by filekey, status order by effdate desc) as seqnum,
lead(effdate) over (partition by filekey order by effdate) as next_date
from t
) t
-- keep only the most recent status 1 row per filekey
where seqnum = 1 and status = 1;
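To check the expected figure from the question (4352 days for filekey 130), the same logic can be sketched in Python: find the most recent status 1 row, take the row that follows it by date, and subtract. The function name days_to_next_status0 and the (effdate, status) tuple layout are assumptions for this illustration.

```python
from datetime import date


def days_to_next_status0(history):
    """Days from the most recent status 1 row to the row that follows it.

    history is a list of (effdate, status) pairs for one filekey.
    Returns None when there is no status 1 or nothing follows it.
    """
    ordered = sorted(history)  # oldest first
    last_one = max(
        (i for i, (_, status) in enumerate(ordered) if status == 1),
        default=None,
    )
    if last_one is None or last_one + 1 >= len(ordered):
        return None
    return (ordered[last_one + 1][0] - ordered[last_one][0]).days


filekey_130 = [
    (date(2019, 10, 25), 0),
    (date(2017, 3, 1), 0),
    (date(2017, 1, 1), 0),
    (date(2005, 2, 1), 1),
    (date(2001, 3, 3), 0),
    (date(2000, 1, 30), 0),
    (date(2000, 1, 1), 0),
]
diff = days_to_next_status0(filekey_130)
print(diff)
```

The two None cases correspond to the nulls described in the question: a filekey without a status 1, or one whose status 1 has no later row.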

SQL - Find if column dates include at least partially a date range

I need to create a report and I am struggling with the SQL script.
The table I want to query is a company_status_history table which has entries like the following (the ones that I can't figure out)
Table company_status_history
Columns:
| id | company_id | status_id | effective_date |
Data:
| 1 | 10 | 1 | 2016-12-30 00:00:00.000 |
| 2 | 10 | 5 | 2017-02-04 00:00:00.000 |
| 3 | 11 | 5 | 2017-06-05 00:00:00.000 |
| 4 | 11 | 1 | 2018-04-30 00:00:00.000 |
I want to answer to the question "Get all companies that have been at least for some point in status 1 inside the time period 01/01/2017 - 31/12/2017"
Above are the cases that I don't know how to handle, since I need to add some logic of this type:
"If this row is status 1 and its date is before the date range, check the next row to see if it has a date inside the date range."
"If this row is status 1 and its date is after the date range, check the row before to see if it has a date inside the date range."
I think this can be handled as a gaps and islands problem. Consider the following input data: (same as sample data of OP plus two additional rows)
id company_id status_id effective_date
-------------------------------------------
1 10 1 2016-12-15
2 10 1 2016-12-30
3 10 5 2017-02-04
4 10 4 2017-02-08
5 11 5 2017-06-05
6 11 1 2018-04-30
You can use the following query:
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
ORDER BY company_id, effective_date
to get:
id company_id status_id effective_date cnt
-----------------------------------------------
1 10 1 2016-12-15 0
2 10 1 2016-12-30 1
3 10 5 2017-02-04 2
4 10 4 2017-02-08 2
5 11 5 2017-06-05 0
6 11 1 2018-04-30 0
Now you can identify status = 1 islands using:
;WITH CTE AS
(
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
)
SELECT id, company_id, status_id, effective_date,
ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) -
cnt AS grp
FROM CTE
Output:
id company_id status_id effective_date grp
-----------------------------------------------
1 10 1 2016-12-15 1
2 10 1 2016-12-30 1
3 10 5 2017-02-04 1
4 10 4 2017-02-08 2
5 11 5 2017-06-05 1
6 11 1 2018-04-30 2
Calculated field grp will help us identify those islands:
;WITH CTE AS
(
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
), CTE2 AS
(
SELECT id, company_id, status_id, effective_date,
ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) -
cnt AS grp
FROM CTE
)
SELECT company_id,
MIN(effective_date) AS start_date,
CASE
WHEN COUNT(*) > 1 THEN DATEADD(DAY, -1, MAX(effective_date))
ELSE MIN(effective_date)
END AS end_date
FROM CTE2
GROUP BY company_id, grp
HAVING COUNT(CASE WHEN status_id = 1 THEN 1 END) > 0
Output:
company_id start_date end_date
-----------------------------------
10 2016-12-15 2017-02-03
11 2018-04-30 2018-04-30
All you need now are the records from above that overlap the specified interval.
Demo here with somewhat more complicated use case.
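The final overlap test is not spelled out above, so here is a minimal Python sketch of it, assuming inclusive start and end dates: two intervals overlap when each one starts no later than the other ends. The function name overlaps and the hard-coded islands (copied from the output above) are only for illustration.

```python
from datetime import date


def overlaps(start, end, range_start, range_end):
    """True when [start, end] and [range_start, range_end] share a day."""
    return start <= range_end and end >= range_start


# Status 1 islands from the output above: (company_id, start_date, end_date).
islands = [
    (10, date(2016, 12, 15), date(2017, 2, 3)),
    (11, date(2018, 4, 30), date(2018, 4, 30)),
]
range_start, range_end = date(2017, 1, 1), date(2017, 12, 31)

companies = sorted(
    {cid for cid, start, end in islands if overlaps(start, end, range_start, range_end)}
)
print(companies)
```

In SQL the same condition would be a filter of the form start_date <= range end AND end_date >= range start on the island rows.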
Maybe this is what you are looking for? For this kind of question, you need to join two instances of your table. In this case I am just joining to the next record by id, which is probably not entirely correct. To do it better, you can create a new id using a window function such as row_number, ordering the table by your required criteria.
"If this row is status 1 and its date is before the date range, check
the next row to see if it has a date inside the date range."
declare @range_st date = '2017-01-01'
declare @range_en date = '2017-12-31'
select
case
when csh1.status_id=1 and csh1.effective_date<@range_st
then
case
when csh2.effective_date between @range_st and @range_en then 1
else 0
end
else NULL
end
from company_status_history csh1
-- csh2 is the next record by id
left join company_status_history csh2
on csh2.id=csh1.id+1
Implementing the second criterion:
"If this row is status 1 and its date is after the date range, check
the row before to see if it has a date inside the date range."
declare @range_st date = '2017-01-01'
declare @range_en date = '2017-12-31'
select
case
when csh1.status_id=1 and csh1.effective_date<@range_st
then
case
when csh2.effective_date between @range_st and @range_en then 1
else 0
end
when csh1.status_id=1 and csh1.effective_date>@range_en
then
case
when csh3.effective_date between @range_st and @range_en then 1
else 0
end
else null -- ¿?
end
from company_status_history csh1
-- csh2 is the next record by id, csh3 the previous one
left join company_status_history csh2
on csh2.id=csh1.id+1
left join company_status_history csh3
on csh3.id=csh1.id-1
I would suggest using a CTE and the window function ROW_NUMBER. With these you can find the desired records. An example:
DECLARE @t TABLE(
id INT
,company_id INT
,status_id INT
,effective_date DATETIME
)
INSERT INTO @t VALUES
(1, 10, 1, '2016-12-30 00:00:00.000')
,(2, 10, 5, '2017-02-04 00:00:00.000')
,(3, 11, 5, '2017-06-05 00:00:00.000')
,(4, 11, 1, '2018-04-30 00:00:00.000')
DECLARE @StartDate DATETIME = '2017-01-01';
DECLARE @EndDate DATETIME = '2017-12-31';
WITH cte AS(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) AS rn
FROM @t
),
cteLeadLag AS(
SELECT c.*, ISNULL(c2.effective_date, c.effective_date) LagEffective, ISNULL(c3.effective_date, c.effective_date) LeadEffective
FROM cte c
LEFT JOIN cte c2 ON c2.company_id = c.company_id AND c2.rn = c.rn-1
LEFT JOIN cte c3 ON c3.company_id = c.company_id AND c3.rn = c.rn+1
)
SELECT 'Included' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date BETWEEN @StartDate AND @EndDate
UNION ALL
SELECT 'Following' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date > @EndDate
AND LagEffective BETWEEN @StartDate AND @EndDate
UNION ALL
SELECT 'Trailing' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date < @StartDate
AND LeadEffective BETWEEN @StartDate AND @EndDate
I first select all records with their leading and lagging Dates and then I perform your checks on the inclusion in the desired timespan.
Try this; it should be self-explanatory. It addresses this part of your question:
I want to answer to the question "Get all companies that have been at
least for some point in status 1 inside the time period 01/01/2017 -
31/12/2017"
Case where you want the companies that have been in status 1 at any moment and have records in the requested period:
SELECT *
FROM company_status_history
WHERE company_id IN
( SELECT company_id
FROM company_status_history
WHERE status_id=1 )
AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'
Case where you want the records in status 1 inside the period:
SELECT *
FROM company_status_history
WHERE status_id=1
AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'