Cumulative sum over missing month values in SQL

I have the input data below:
date amount
01-01-2020 10
01-02-2020 15
01-03-2020 10
01-05-2020 20
01-06-2020 30
01-08-2020 5
01-09-2020 6
01-10-2020 10
I tried this, which does not work:
select sum(date),over(partition date) from table;
After adding the missing months with an amount of 0, I need this output:
Date amount cum_sum
01-01-2020 10 10
01-02-2020 15 25
01-03-2020 10 35
01-04-2020 0 35
01-05-2020 20 55
01-06-2020 30 85
01-07-2020 0 85
01-08-2020 5 90
01-09-2020 6 96
01-10-2020 10 106

You would typically generate the dates with a recursive query, then use window functions.
You don't say which database you use. The exact syntax of recursive queries and date arithmetic varies across vendors, but here is what it would look like:
with recursive all_dates (dt, max_dt) as (
    select min(date), max(date) from mytable
    union all
    -- step one month at a time: the sample dates are the first of each month
    select dt + interval '1' month, max_dt from all_dates where dt < max_dt
)
select
    d.dt,
    coalesce(t.amount, 0) as amount,              -- 0 for the generated months
    sum(t.amount) over(order by d.dt) as cum_sum
from all_dates d
left join mytable t on t.date = d.dt
order by d.dt
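To check the idea end to end, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. It assumes the dates are the first day of each month (as the expected output suggests), stores them as ISO strings, and uses SQLite's date(dt, '+1 month') in place of vendor-specific interval arithmetic. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample data from the question, rewritten as ISO dates (yyyy-mm-dd).
rows = [
    ("2020-01-01", 10), ("2020-02-01", 15), ("2020-03-01", 10),
    ("2020-05-01", 20), ("2020-06-01", 30), ("2020-08-01", 5),
    ("2020-09-01", 6), ("2020-10-01", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (date text, amount integer)")
conn.executemany("insert into mytable values (?, ?)", rows)

query = """
with recursive all_dates (dt, max_dt) as (
    select min(date), max(date) from mytable
    union all
    -- SQLite spelling of "add one month"
    select date(dt, '+1 month'), max_dt from all_dates where dt < max_dt
)
select d.dt,
       coalesce(t.amount, 0) as amount,
       sum(coalesce(t.amount, 0)) over (order by d.dt) as cum_sum
from all_dates d
left join mytable t on t.date = d.dt
order by d.dt
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The generated months (April and July here) come back with an amount of 0 and carry the previous cumulative total forward.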

You simply want a window function:
select t.*, sum(amount) over (order by date)
from table t

Related

Group items from the first time + certain time period

I want to group orders from the same customer if they happen within 10 minutes of the first order, then find the next first order and group them and so on.
Example of the expected result:
Customer  group  orders
6         1      3
6         2      4,5
6         3      8
7         1      9,10
7         2      11,12
7         3      13
Input data:
id customer time
3 6 2021-05-12 12:14:22.000000
4 6 2021-05-12 12:24:24.000000
5 6 2021-05-12 12:29:16.000000
8 6 2021-05-12 13:01:40.000000
9 7 2021-05-14 12:13:11.000000
10 7 2021-05-14 12:20:01.000000
11 7 2021-05-14 12:45:00.000000
12 7 2021-05-14 12:48:41.000000
13 7 2021-05-14 12:58:16.000000
18 9 2021-05-18 12:22:13.000000
25 15 2021-05-18 13:44:02.000000
26 16 2021-05-17 09:39:02.000000
27 16 2021-05-18 19:38:43.000000
28 17 2021-05-18 15:40:02.000000
29 18 2021-05-19 15:32:53.000000
30 18 2021-05-19 15:45:56.000000
31 18 2021-05-19 16:29:09.000000
34 15 2021-05-24 15:45:14.000000
35 15 2021-05-24 15:45:14.000000
36 19 2021-05-24 17:14:53.000000
Here is what I have currently. I think it is not grouping by customer when it evaluates case when d.StartTime > dateadd(minute, 10, c.first_time), so it compares the StartTime of all orders across all customers.
with
data as (select Customer,StartTime,Id, row_number() over(partition by Customer order by StartTime) rn from orders t),
cte as (
select d.*, StartTime as first_time
from data d
where rn = 1
union all
select d.*,
case when d.StartTime > dateadd(minute, 10, c.first_time)
then d.StartTime
else c.first_time
end
from cte c
inner join data d on d.rn = c.rn + 1
)
select c.*, dense_rank() over(partition by Customer order by first_time) grp
from cte c;
I have two databases (MySQL & SQL Server) having similar schema so either would work for me.
Try the following on SQL Server:
SELECT customer,
ROW_NUMBER() OVER (PARTITION BY customer ORDER BY grp) AS group_no,
STRING_AGG(id, ',') AS orders
FROM
(
SELECT id,customer, [time],
(DATEDIFF(SECOND, MIN([time]) OVER (PARTITION BY CUSTOMER), [time])/60)/10 grp
FROM orders
) T
GROUP BY customer, grp
ORDER BY customer
See a demo.
As I understand your requirement, you want to divide the period between each customer's first and last order into time frames that are each 10 minutes long.
What this query does: for each order, it takes the difference in seconds between the order time and the customer's first order time, converts it to whole minutes (integer division by 60), and then integer-divides by 10 to get its time frame number. For example, a difference of 599s gives 599/60 = 9 minutes and 9/10 = 0; a difference of 620s gives 620/60 = 10 minutes and 10/10 = 1. Note that the frames are fixed relative to the customer's first order; they are not re-anchored at the first order of each new group.
Once each order has its time frame number, you can use the STRING_AGG function to get the desired output. Note that STRING_AGG is available in SQL Server 2017 (14.x) and later.
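The frame arithmetic is easy to check outside the database. The sketch below is an illustration in Python, not part of the answer: fixed_frames mirrors the (seconds / 60) / 10 calculation, and anchored_groups is a hypothetical helper that implements the other reading of the question, where a new group starts once an order is more than 10 minutes after the first order of the current group. Both function names are made up for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (id, customer, time) rows for customers 6 and 7 from the question.
ORDERS = [
    (3, 6, "2021-05-12 12:14:22"), (4, 6, "2021-05-12 12:24:24"),
    (5, 6, "2021-05-12 12:29:16"), (8, 6, "2021-05-12 13:01:40"),
    (9, 7, "2021-05-14 12:13:11"), (10, 7, "2021-05-14 12:20:01"),
    (11, 7, "2021-05-14 12:45:00"), (12, 7, "2021-05-14 12:48:41"),
    (13, 7, "2021-05-14 12:58:16"),
]

def parse(rows):
    """Return {customer: [(time, id), ...]} with each list sorted by time."""
    by_customer = defaultdict(list)
    for order_id, customer, ts in rows:
        by_customer[customer].append(
            (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), order_id))
    return {c: sorted(v) for c, v in by_customer.items()}

def fixed_frames(rows, minutes=10):
    """Bucket by (whole minutes since the customer's first order) // minutes."""
    result = {}
    for customer, orders in parse(rows).items():
        first = orders[0][0]
        frames = defaultdict(list)
        for ts, order_id in orders:
            whole_minutes = int((ts - first).total_seconds()) // 60
            frames[whole_minutes // minutes].append(order_id)
        # Renumber the non-empty frames 1, 2, 3, ... as ROW_NUMBER() would.
        result[customer] = [frames[k] for k in sorted(frames)]
    return result

def anchored_groups(rows, minutes=10):
    """Open a new group when an order is more than `minutes` after the group's first order."""
    result = {}
    for customer, orders in parse(rows).items():
        groups, anchor = [], None
        for ts, order_id in orders:
            if anchor is None or ts > anchor + timedelta(minutes=minutes):
                anchor = ts
                groups.append([])
            groups[-1].append(order_id)
        result[customer] = groups
    return result

fixed = fixed_frames(ORDERS)
anchored = anchored_groups(ORDERS)
print(fixed)
print(anchored)
```

On the sample data both functions produce the groups from the expected result. They can differ on other data: orders at 0, 15, and 21 minutes fall into three fixed frames but only two anchored groups, so it is worth confirming which behaviour is wanted.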

How to create a start and end date with no gaps from one date column and to sum a value within the dates

I am new to SQL and am working in SQL Developer.
I have a table that has 4 columns: Patient ID (ptid), service date (dt), insurance payment amount (insr_amt), out of pocket payment amount (op_amt). (see table 1 below)
What I would like to do is: (1) create two columns, "start_dt" and "end_dt", from the "dt" column. If a patient's service dates have no gaps, the start and end dates are that patient's first and last dates; if there is a gap in the service dates, each run of consecutive dates gets its own start and end date row for that patient. (2) Sum the two payment amounts per patient within each start/end date range (see Table 2 below).
What would be the way to run this using SQL code in SQL developer?
Thank you!
Table 1:
Ptid  dt        insr_amt  op_amt
A     1/1/2021  30        20
A     1/2/2021  30        10
A     1/3/2021  30        10
A     1/4/2021  30        30
B     1/6/2021  10        10
B     1/7/2021  20        10
C     2/1/2021  15        30
C     2/2/2021  15        30
C     2/6/2021  60        30
Table 2:
Ptid  start_dt  end_dt    total_insr_amt  total_op_amt
A     1/1/2021  1/4/2021  120             70
B     1/6/2021  1/7/2021  30              20
C     2/1/2021  2/2/2021  30              60
C     2/6/2021  2/6/2021  60              30
You didn't mention the specific database, so here is a solution that works in PostgreSQL. You can do:
select
    ptid,
    min(dt) as start_dt,
    max(dt) as end_dt,
    sum(insr_amt) as total_insr_amt,
    sum(op_amt) as total_op_amt
from (
    select *,
        sum(inc) over(partition by ptid order by dt) as grp
    from (
        select *,
            case when dt - interval '1 day' = lag(dt) over(partition by ptid order by dt)
                 then 0 else 1 end as inc
        from t
    ) x
) y
group by ptid, grp
order by ptid, grp
Result:
ptid start_dt end_dt total_insr_amt total_op_amt
----- ---------- ---------- -------------- -----------
A 2021-01-01 2021-01-04 120 70
B 2021-01-06 2021-01-07 30 20
C 2021-02-01 2021-02-02 30 60
C 2021-02-06 2021-02-06 60 30
See running example at DB Fiddle 1.
EDIT for Oracle
As requested, the modified query that works in Oracle is:
select
    ptid,
    min(dt) as start_dt,
    max(dt) as end_dt,
    sum(insr_amt) as total_insr_amt,
    sum(op_amt) as total_op_amt
from (
    select x.*,
        sum(inc) over(partition by ptid order by dt) as grp
    from (
        select t.*,
            case when dt - 1 = lag(dt) over(partition by ptid order by dt)
                 then 0 else 1 end as inc
        from t
    ) x
) y
group by ptid, grp
order by ptid, grp
See running example at db<>fiddle 2.
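The two queries use the same technique: flag each row that does not directly follow the previous day (inc = 1), then take a running sum of the flags so that every run of consecutive dates shares one grp value. As a rough, vendor-neutral check, the sketch below runs that pattern on the question's data in an in-memory SQLite database. It assumes ISO date strings and uses SQLite's date(dt, '-1 day') for the "previous day" test; SQLite 3.25 or later is needed for window functions.

```python
import sqlite3

# Table 1 from the question, with dates rewritten as ISO strings.
rows = [
    ("A", "2021-01-01", 30, 20), ("A", "2021-01-02", 30, 10),
    ("A", "2021-01-03", 30, 10), ("A", "2021-01-04", 30, 30),
    ("B", "2021-01-06", 10, 10), ("B", "2021-01-07", 20, 10),
    ("C", "2021-02-01", 15, 30), ("C", "2021-02-02", 15, 30),
    ("C", "2021-02-06", 60, 30),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (ptid text, dt text, insr_amt integer, op_amt integer)")
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

query = """
select ptid, min(dt), max(dt), sum(insr_amt), sum(op_amt)
from (
    -- running sum of the flags: one grp value per run of consecutive days
    select x.*, sum(inc) over (partition by ptid order by dt) as grp
    from (
        -- inc = 0 when the previous row is exactly one day earlier, else 1
        select t.*,
               case when date(dt, '-1 day')
                         = lag(dt) over (partition by ptid order by dt)
                    then 0 else 1 end as inc
        from t
    ) x
) y
group by ptid, grp
order by ptid, grp
"""

islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

Patient C ends up with two rows because 2021-02-06 does not follow 2021-02-02 directly.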

SQL : create intermediate data from date range

I have a table as shown here:
USER  ROI  DATE
1     5    2021-11-24
1     4    2021-11-26
1     6    2021-11-29
I want to fill in the ROI for the dates between the existing rows, carrying the most recent ROI forward. The expected result is below.
From 2021-11-24 to 2021-11-30:
USER  ROI  DATE
1     5    2021-11-24
1     5    2021-11-25
1     4    2021-11-26
1     4    2021-11-27
1     4    2021-11-28
1     6    2021-11-29
1     6    2021-11-30
You may use a calendar table approach here. Create a table containing all dates and then join with it. Sans an actual table, you may use an inline CTE:
WITH dates AS (
SELECT '2021-11-24' AS dt UNION ALL
SELECT '2021-11-25' UNION ALL
SELECT '2021-11-26' UNION ALL
SELECT '2021-11-27' UNION ALL
SELECT '2021-11-28' UNION ALL
SELECT '2021-11-29' UNION ALL
SELECT '2021-11-30'
),
cte AS (
SELECT USER, ROI, DATE, LEAD(DATE) OVER (ORDER BY DATE) AS NEXT_DATE
FROM yourTable
)
SELECT t.USER, t.ROI, d.dt
FROM dates d
INNER JOIN cte t
ON d.dt >= t.DATE AND (d.dt < t.NEXT_DATE OR t.NEXT_DATE IS NULL)
ORDER BY d.dt;
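For longer ranges, typing every date by hand gets tedious. The sketch below is one possible variation, run on SQLite from Python: it builds the calendar with a recursive CTE and keeps the same LEAD-based range join. The table name roi_history, the column names user_id and dt, and the parameter names are assumptions chosen to avoid reserved words; the PARTITION BY is added on the assumption that there may be more than one user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table roi_history (user_id integer, roi integer, dt text)")
conn.executemany(
    "insert into roi_history values (?, ?, ?)",
    [(1, 5, "2021-11-24"), (1, 4, "2021-11-26"), (1, 6, "2021-11-29")],
)

query = """
with recursive dates (dt) as (
    -- calendar from :first_day to :last_day, one row per day
    select :first_day
    union all
    select date(dt, '+1 day') from dates where dt < :last_day
),
cte as (
    -- each row is valid from its own date until the next row's date
    select user_id, roi, dt,
           lead(dt) over (partition by user_id order by dt) as next_dt
    from roi_history
)
select t.user_id, t.roi, d.dt
from dates d
inner join cte t
    on d.dt >= t.dt and (d.dt < t.next_dt or t.next_dt is null)
order by t.user_id, d.dt
"""

filled = conn.execute(
    query, {"first_day": "2021-11-24", "last_day": "2021-11-30"}
).fetchall()
for row in filled:
    print(row)
```

The last known ROI (6) extends to the end of the requested range because its next_dt is null.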

How to include values that count nothing on certain day (APEX)

I have this query:
SELECT
COUNT(ID) AS FREQ,
TO_CHAR(TRUNC(CREATED_AT),'DD-MON') DATES
FROM TICKETS
WHERE TRUNC(CREATED_AT) > TRUNC(SYSDATE) - 32
GROUP BY TRUNC(CREATED_AT)
ORDER BY TRUNC(CREATED_AT) ASC
This counts how many tickets were created on each day of the past month.
The result looks something like this: (first 10 rows)
FREQ DATES
3 28-DEC
4 04-JAN
8 05-JAN
1 06-JAN
4 07-JAN
5 08-JAN
2 11-JAN
6 12-JAN
3 13-JAN
8 14-JAN
The linechart that I created looks like this:
The problem is that for days on which no tickets were created (in particular the weekends), the line goes straight to the next day that has a ticket.
Is there a way, in APEX or in my query, to include the days that have no tickets?
As commented, you'd use one of the row generator techniques to create a "calendar" table and outer join it with the table that contains the data you're displaying.
Something like this (see comments within code):
with yours (amount, datum) as
  -- your sample table
  (select 100, date '2021-01-01' from dual union all
   select 200, date '2021-01-03' from dual union all
   select 300, date '2021-01-07' from dual
  ),
minimax as
  -- MIN and MAX date (so that they could be used in row generator --> CALENDAR CTE (below)
  (select min(datum) min_datum,
          max(datum) max_datum
   from yours
  ),
calendar as
  -- calendar, from MIN to MAX date in YOUR table
  (select min_datum + level - 1 datum
   from minimax
   connect by level <= max_datum - min_datum + 1
  )
-- final query uses outer join
select c.datum,
       nvl(y.amount, 0) amount
from calendar c left join yours y on y.datum = c.datum
order by c.datum;

DATUM          AMOUNT
---------- ----------
01.01.2021        100
02.01.2021          0
03.01.2021        200
04.01.2021          0
05.01.2021          0
06.01.2021          0
07.01.2021        300

7 rows selected.
Applied to your current query:
WITH
   minimax
   AS
      -- MIN and MAX date, truncated to midnight (used by the row generator --> CALENDAR CTE below)
      (SELECT MIN (TRUNC (created_at)) min_datum, MAX (TRUNC (created_at)) max_datum
         FROM tickets
        WHERE TRUNC (created_at) > TRUNC (SYSDATE) - 32),
   calendar
   AS
      -- calendar, from MIN to MAX date in YOUR table
      (    SELECT min_datum + LEVEL - 1 datum
             FROM minimax
       CONNECT BY LEVEL <= max_datum - min_datum + 1)
-- final query uses outer join; the date filter lives in MINIMAX because a
-- WHERE condition on T here would discard the calendar days without tickets
  SELECT COUNT (t.id) AS freq, TO_CHAR (c.datum, 'DD-MON') dates
    FROM calendar c LEFT JOIN tickets t ON TRUNC (t.created_at) = c.datum
GROUP BY c.datum
ORDER BY c.datum ASC
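One detail matters with this calendar-plus-outer-join pattern: a condition on the outer-joined table in the WHERE clause removes the rows where that table is NULL, which are exactly the empty days. The sketch below illustrates this on SQLite from Python with made-up ticket timestamps and a hard-coded one-week calendar (both are assumptions for the demonstration, not the asker's data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tickets (id integer primary key, created_at text)")
conn.executemany("insert into tickets (created_at) values (?)", [
    ("2021-01-01 09:15:00",), ("2021-01-01 16:40:00",),
    ("2021-01-03 11:05:00",),
    ("2021-01-07 08:00:00",), ("2021-01-07 12:30:00",), ("2021-01-07 17:45:00",),
])

# Calendar CTE shared by both queries: one row per day of the week.
calendar = """
with recursive calendar (datum) as (
    select '2021-01-01'
    union all
    select date(datum, '+1 day') from calendar where datum < '2021-01-07'
)
"""

# Join on the truncated timestamp; days without tickets get a count of 0.
zero_filled = conn.execute(calendar + """
select c.datum, count(t.id)
from calendar c
left join tickets t on date(t.created_at) = c.datum
group by c.datum
order by c.datum
""").fetchall()

# Same query plus a WHERE filter on the outer-joined table: the NULL rows
# fail the comparison, so the empty days disappear again.
filtered_in_where = conn.execute(calendar + """
select c.datum, count(t.id)
from calendar c
left join tickets t on date(t.created_at) = c.datum
where date(t.created_at) >= '2021-01-01'
group by c.datum
order by c.datum
""").fetchall()

print(zero_filled)
print(filtered_in_where)
```

The first query returns all seven days; the second returns only the three days that have tickets. Putting the date filter in the calendar (or in the join condition) avoids this.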
I added a WITH clause to generate the last 31 days, then left joined it with your base table, as shown below. The result is grouped and ordered by the calendar date rather than by the 'DD-MON' string, so the days stay in chronological order.
with last_31_days as (
    select trunc(sysdate) - 32 + level dt
    from dual
    connect by trunc(sysdate) - 32 + level < trunc(sysdate)
)
SELECT
    COUNT(t.ID) AS FREQ,          -- COUNT already returns 0 for days without tickets
    TO_CHAR(a.dt, 'DD-MON') DATES
FROM last_31_days a
LEFT JOIN TICKETS t
    ON TRUNC(t.CREATED_AT) = a.dt
GROUP BY a.dt
ORDER BY a.dt ASC
;
@Littlefoot's answer is perfect, but here is a cheeky way to get a similar table in a format that matches the OP's output, using a simple CTE.
WITH cte AS (
SELECT To_Char(Trunc(SYSDATE - ROWNUM),'DD-MON') dtcol
FROM DUAL
CONNECT BY ROWNUM < 366
)
SELECT * FROM cte
Here is a db<>fiddle.
You can then simply join this CTE to fill in the empty dates, since the date column in the original output looks like a string column.
CONNECT BY is Oracle-only, but you can get a similar result with a recursive CTE in other DBMSs that support one.

get median in overlap time range

Using Vertica, suppose I have a table called revenue:
date revenue
2016-07-12 1
2016-07-12 10
2016-07-12 5
2016-07-12 3
2016-07-13 7
2016-07-13 120
2016-07-13 22
2016-07-14 5
2016-07-14 17
The tricky thing is that I don't want the median for each date; I want the median revenue over the time range on or after each day. For example, the result would be:
daterange median_revenue
>= 2016-07-12 7
>= 2016-07-13 17
>= 2016-07-14 11
to be clear:
7 = median(1,10,5,3,7,120,22,5,17)
17 = median(7,120,22,5,17)
11 = median(5,17)
How could I write a SQL script for these date ranges? Is there an easy way to query this? I don't want to compute each date range separately and then union the results, because there are many days.
Would this help?
SELECT
date_table.[date],
MEDIAN (r.revenue) AS median_revenue
FROM
(SELECT DISTINCT [date] FROM revenue) date_table
LEFT JOIN revenue r ON r.[date] >= date_table.[date]
GROUP BY
date_table.[date]
I just figured it out:
select distinct date, median(revenue) over (partition by date) as rev_median
from (select a.date,b.revenue
from (select distinct date from revenue) a
left outer join revenue b
on a.date<=b.date order by a.date,b.date) a ;
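The self-join pairs every distinct date with all rows on or after it, and the median is then taken per date. As a quick way to sanity-check the expected numbers, here is a small Python sketch of the same logic using the standard library's statistics.median; it only verifies the arithmetic and is not a replacement for the Vertica query.

```python
from statistics import median

# Sample rows from the question: (date, revenue).
revenue = [
    ("2016-07-12", 1), ("2016-07-12", 10), ("2016-07-12", 5), ("2016-07-12", 3),
    ("2016-07-13", 7), ("2016-07-13", 120), ("2016-07-13", 22),
    ("2016-07-14", 5), ("2016-07-14", 17),
]

# For each distinct date, take the median of every revenue on or after it.
# ISO date strings compare correctly as plain strings.
dates = sorted({day for day, _ in revenue})
medians = {
    start: median(value for day, value in revenue if day >= start)
    for start in dates
}

for start, value in medians.items():
    print(f">= {start}  {value}")
```

For an even number of values, statistics.median returns the mean of the two middle values, which matches the 11 expected for (5, 17).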