bigquery - calculate monthly outstanding values - google-bigquery

I'm trying to solve the following problem:
A user took three loans with running times of 3, 4 and 5 months.
How can I calculate in BigQuery how much he owes at each point in time?
I know how to do this calculation in R or Python, but would clearly prefer a BigQuery/SQL solution.
Thank you!
I have the data:
Take Date Return Date Sum
2016-01-01 2016-03-31 10
2016-02-01 2016-05-31 20
2016-03-01 2016-07-31 50
I need the output like this:
Date Sum
2016-01-01 10
2016-02-01 30
2016-03-01 80
2016-04-01 70
2016-05-01 70
2016-06-01 50
2016-07-01 50
2016-08-01 0

Below is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, DATE '2016-01-01' take_date, DATE '2016-03-31' return_date, 10 amount
UNION ALL SELECT 1, DATE '2016-02-01', DATE '2016-05-31', 20
UNION ALL SELECT 1, DATE '2016-03-01', DATE '2016-07-31', 50
), dates AS (
SELECT id, day
FROM (
SELECT id, GENERATE_DATE_ARRAY(
MIN(take_date),
DATE_ADD(DATE_TRUNC(MAX(return_date), MONTH), INTERVAL 1 MONTH),
INTERVAL 1 MONTH
) days
FROM `project.dataset.table`
GROUP BY id
), UNNEST(days) day
)
SELECT d.id, d.day, SUM(IF(d.day BETWEEN t.take_date AND t.return_date, amount, 0)) amount
FROM dates d
LEFT JOIN `project.dataset.table` t
ON d.id = t.id
GROUP BY d.id, d.day
ORDER BY d.day
with result as
Row id day amount
1 1 2016-01-01 10
2 1 2016-02-01 30
3 1 2016-03-01 80
4 1 2016-04-01 70
5 1 2016-05-01 70
6 1 2016-06-01 50
7 1 2016-07-01 50
8 1 2016-08-01 0
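As a cross-check outside BigQuery, the same logic can be sketched in Python: build one row per month start, then sum the amounts of the loans that are open on that day. This is only an illustration using the sample data from the question; the helper names `next_month` and `outstanding_by_month` are made up here, and the sketch assumes the series starts on the first day of the earliest take month.

```python
from datetime import date

# Sample loans from the question: (take_date, return_date, amount).
loans = [
    (date(2016, 1, 1), date(2016, 3, 31), 10),
    (date(2016, 2, 1), date(2016, 5, 31), 20),
    (date(2016, 3, 1), date(2016, 7, 31), 50),
]

def next_month(d):
    """Return the first day of the month following d (hypothetical helper)."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)

def outstanding_by_month(loans):
    """One row per month start, up to the month after the last return date."""
    day = min(take for take, _, _ in loans).replace(day=1)
    last = next_month(max(ret for _, ret, _ in loans))
    rows = []
    while day <= last:
        # A loan counts on this day if the day lies in [take_date, return_date].
        owed = sum(amt for take, ret, amt in loans if take <= day <= ret)
        rows.append((day, owed))
        day = next_month(day)
    return rows

result = outstanding_by_month(loans)
for day, amount in result:
    print(day, amount)
```

The inclusive comparison mirrors the BETWEEN in the query, and the extra month after the last return date is what produces the final row with 0.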

Related

SQL statement to return the Min and Max amount of stock per article for a given Month

I have a table from which I am trying to return the quantity per day that the article was in the system.
For example, in table Bestand there are multiple pallets of different articles, each with a booking-in and booking-out date. I am trying to find the min and max amount of stock that was in the system per article and month.
My thinking is that I can return the stock quantity for each day and then read out the min and max values.
The Timespan would be set at the time of running the SQL and the articles would be fixed.
To find out the quantity for each day I have used the following SQL:
SELECT DISTINCT
a.artbez1 AS Artikelbezeichnung,
b.artikelnr AS Artikelnummer,
SUM(CASE WHEN TO_DATE('2019-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') BETWEEN b.neu_datum AND b.aender_datum THEN 1 * b.menge_ist ELSE 0 END) AS "01 Nov 2019"
FROM
artikel a, bestand b
WHERE
b.artikelnr IN ('273632002', .... (huge long list of numbers) ....)
AND b.artikelnr = a.artikelnr
GROUP BY
a.artbez1, b.artikelnr;
This returns for example:
ARTIKELBEZEICHNUNG | ARTIKELNUMMER | 01 Nov 2019
SC-4400.CW | 220450002 | 39
S-320.FK120 | 220502004 | 0
H-595.FK120 | 220800004 | 35
AC-548.FK209 | 220948032 | 0
AS-6800.CW | 221355002 | 20
I would like to return this for each day of the month and then, from that, return the min and max value for each article.
I have the following SQL to return the days of a given month and was wondering if anyone had any ideas on how the two could be combined (if at all possible):
SELECT to_date('01.11.2019','dd.mm.yyyy')+LEVEL-1
FROM dual
CONNECT BY LEVEL <= TO_CHAR(LAST_DAY(to_date('01.11.2019','dd.mm.yyyy')),'DD')
DATES
2019-11-01 00:00:00
2019-11-02 00:00:00
2019-11-03 00:00:00
2019-11-04 00:00:00
2019-11-05 00:00:00
2019-11-06 00:00:00
2019-11-07 00:00:00
The result I am trying to get would be something like:
ARTIKELBEZEICHNUNG | ARTIKELNUMMER | Nov 19 Min | Nov 19 Max
SC-4400.CW | 220450002 | 5 | 39
S-320.FK120 | 220502004 | 0 | 15
H-595.FK120 | 220800004 | 2 | 35
AC-548.FK209 | 220948032 | 0 | 0
AS-6800.CW | 221355002 | 10 | 20
Is this at all possible in SQL?
Thanks for taking the time to read my post.
JeRi
You can use a partitioned outer join:
WITH calendar ( day ) AS (
SELECT DATE '2019-11-01'
FROM DUAL
UNION ALL
SELECT day + INTERVAL '1' DAY
FROM calendar
WHERE day < LAST_DAY( DATE '2019-11-01' )
),
daily_totals ( artbez1, Artikelnr, Day, total_menge_ist ) AS (
SELECT MAX( ab.artbez1 ),
ab.artikelnr,
c.day,
COALESCE( SUM( ab.menge_ist ), 0 )
FROM calendar c
LEFT OUTER JOIN
( SELECT a.artikelnr,
a.artbez1,
b.neu_datum,
b.aender_datum,
b.menge_ist
FROM artikel a
LEFT JOIN bestand b
ON ( a.artikelnr = b.artikelnr )
-- WHERE b.artikelnr IN ('273632002', .... (huge long list of numbers) ....)
) ab
PARTITION BY ( ab.artikelnr, ab.artbez1 )
ON ( c.day BETWEEN ab.neu_datum AND ab.aender_datum )
GROUP BY ab.artikelnr, c.day
)
SELECT MAX( artbez1 ) AS Artikelbezeichnung,
artikelnr AS Artikelnummer,
TRUNC( day, 'MM' ) AS month,
MIN( total_menge_ist ) AS min_total_menge_ist,
MAX( total_menge_ist ) AS max_total_menge_ist
FROM daily_totals
GROUP BY artikelnr, TRUNC( day, 'MM' );
Which, for the sample data:
CREATE TABLE artikel ( artikelnr, artbez1 ) AS
SELECT 220450002, 'SC-4400.CW' FROM DUAL UNION ALL
SELECT 220502004, 'S-320.FK120' FROM DUAL UNION ALL
SELECT 220800004, 'H-595.FK120' FROM DUAL UNION ALL
SELECT 220948032, 'AC-548.FK209' FROM DUAL UNION ALL
SELECT 221355002, 'AS-6800.CW' FROM DUAL;
CREATE TABLE bestand ( artikelnr, neu_datum, aender_datum, menge_ist ) AS
SELECT 220450002, DATE '2019-10-30', DATE '2019-11-01', 20 FROM DUAL UNION ALL
SELECT 220450002, DATE '2019-11-01', DATE '2019-11-05', 19 FROM DUAL UNION ALL
SELECT 220502004, DATE '2019-11-05', DATE '2019-11-03', 5 FROM DUAL UNION ALL
SELECT 220800004, DATE '2019-11-01', DATE '2019-11-15', 35 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-20', DATE '2019-11-05', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-25', DATE '2019-11-10', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-28', DATE '2019-11-13', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-30', DATE '2019-11-15', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-11-05', DATE '2019-11-20', 5 FROM DUAL;
Outputs:
ARTIKELBEZEICHNUNG | ARTIKELNUMMER | MONTH | MIN_TOTAL_MENGE_IST | MAX_TOTAL_MENGE_IST
:----------------- | ------------: | :------------------ | ------------------: | ------------------:
SC-4400.CW | 220450002 | 2019-11-01 00:00:00 | 0 | 39
S-320.FK120 | 220502004 | 2019-11-01 00:00:00 | 0 | 0
AC-548.FK209 | 220948032 | 2019-11-01 00:00:00 | 0 | 0
H-595.FK120 | 220800004 | 2019-11-01 00:00:00 | 0 | 35
AS-6800.CW | 221355002 | 2019-11-01 00:00:00 | 0 | 25
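The min/max figures in that output can be reproduced with a small Python sketch of the same idea: compute a total per article for every day of the month, then take the minimum and maximum of those daily totals. It uses the answer's sample rows; the function names are hypothetical and this is a plain-Python illustration, not a translation of the partitioned outer join.

```python
from datetime import date, timedelta

# Sample data from the answer: artikelnr -> artbez1.
artikel = {
    220450002: "SC-4400.CW",
    220502004: "S-320.FK120",
    220800004: "H-595.FK120",
    220948032: "AC-548.FK209",
    221355002: "AS-6800.CW",
}

# (artikelnr, neu_datum, aender_datum, menge_ist)
bestand = [
    (220450002, date(2019, 10, 30), date(2019, 11, 1), 20),
    (220450002, date(2019, 11, 1), date(2019, 11, 5), 19),
    (220502004, date(2019, 11, 5), date(2019, 11, 3), 5),  # start after end: never matches
    (220800004, date(2019, 11, 1), date(2019, 11, 15), 35),
    (221355002, date(2019, 10, 20), date(2019, 11, 5), 5),
    (221355002, date(2019, 10, 25), date(2019, 11, 10), 5),
    (221355002, date(2019, 10, 28), date(2019, 11, 13), 5),
    (221355002, date(2019, 10, 30), date(2019, 11, 15), 5),
    (221355002, date(2019, 11, 5), date(2019, 11, 20), 5),
]

def daily_total(artikelnr, day):
    """Stock of one article on one day; 0 when no pallet covers that day."""
    return sum(menge for nr, neu, aender, menge in bestand
               if nr == artikelnr and neu <= day <= aender)

def min_max_for_month(year, month):
    """Return {artikelnr: (min_daily_total, max_daily_total)} for the month."""
    days = []
    day = date(year, month, 1)
    while day.month == month:
        days.append(day)
        day += timedelta(days=1)
    result = {}
    for nr in artikel:  # iterate over all articles so empty ones still appear
        totals = [daily_total(nr, d) for d in days]
        result[nr] = (min(totals), max(totals))
    return result

november = min_max_for_month(2019, 11)
for nr, (low, high) in november.items():
    print(artikel[nr], nr, low, high)
```

Iterating over every article for every day plays the role of the partitioned join here: articles with no stock on a given day still contribute a 0 to their minimum.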

How to fill the time gap after grouping date record for months in postgres

I have table records as -
date n_count
2020-02-19 00:00:00 4
2020-07-14 00:00:00 1
2020-07-17 00:00:00 1
2020-07-30 00:00:00 2
2020-08-03 00:00:00 1
2020-08-04 00:00:00 2
2020-08-25 00:00:00 2
2020-09-23 00:00:00 2
2020-09-30 00:00:00 3
2020-10-01 00:00:00 11
2020-10-05 00:00:00 12
2020-10-19 00:00:00 1
2020-10-20 00:00:00 1
2020-10-22 00:00:00 1
2020-11-02 00:00:00 376
2020-11-04 00:00:00 72
2020-11-11 00:00:00 1
I want to group all the records by month to get a monthly total count. That works, but months with no records are missing from the result. How can I fill these gaps?
time month_count
"2020-02-01" 4
"2020-07-01" 4
"2020-08-01" 5
"2020-09-01" 5
"2020-10-01" 26
"2020-11-01" 449
This is what I have tried.
SELECT (date_trunc('month', date))::date AS time,
sum(n_count) as month_count
FROM table1
group by time
order by time asc
You can use generate_series() to generate the start of every month between the earliest and latest date available in the table, then bring in the table with a left join:
select d.dt, coalesce(sum(t.n_count), 0) as month_count
from (
select generate_series(date_trunc('month', min(date)), date_trunc('month', max(date)), '1 month') as dt
from table1
) as d(dt)
left join table1 t on t.date >= d.dt and t.date < d.dt + interval '1 month'
group by d.dt
order by d.dt
I would simply UNION a date series, generated from MIN and MAX date:
WITH cte AS ( -- 1
SELECT
*,
date_trunc('month', date)::date AS time
FROM
t
)
SELECT
time,
SUM(n_count) as month_count --3
FROM (
SELECT
time,
n_count
FROM cte
UNION ALL -- plain UNION would drop duplicate (time, n_count) rows and undercount
SELECT -- 2
generate_series(
(SELECT MIN(time) FROM cte),
(SELECT MAX(time) FROM cte),
interval '1 month'
)::date,
0
) s
GROUP BY time
ORDER BY time
1. Use a CTE to calculate date_trunc only once. It could be left out if you prefer to query your table twice in the UNION below.
2. Generate a monthly date series from the MIN to the MAX date, with n_count = 0 for each month, and add it to the table rows.
3. Do your calculation.
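To sanity-check the expected monthly totals, here is a rough Python equivalent of the gap-filling idea, using the rows from the question. It also illustrates a detail that matters for any approach that stacks zero rows onto the data: SQL's plain UNION removes duplicate rows while UNION ALL keeps them, so de-duplicating (month, n_count) pairs before summing changes the totals. The names `month_start` and `monthly_counts` are invented for this sketch.

```python
from collections import defaultdict
from datetime import date

# Rows from the question: (date, n_count).
rows = [
    (date(2020, 2, 19), 4), (date(2020, 7, 14), 1), (date(2020, 7, 17), 1),
    (date(2020, 7, 30), 2), (date(2020, 8, 3), 1), (date(2020, 8, 4), 2),
    (date(2020, 8, 25), 2), (date(2020, 9, 23), 2), (date(2020, 9, 30), 3),
    (date(2020, 10, 1), 11), (date(2020, 10, 5), 12), (date(2020, 10, 19), 1),
    (date(2020, 10, 20), 1), (date(2020, 10, 22), 1), (date(2020, 11, 2), 376),
    (date(2020, 11, 4), 72), (date(2020, 11, 11), 1),
]

def month_start(d):
    """Truncate a date to the first day of its month."""
    return d.replace(day=1)

def monthly_counts(rows):
    """Sum n_count per month and emit 0 for months without any rows."""
    totals = defaultdict(int)
    for d, n in rows:
        totals[month_start(d)] += n
    current, last = min(totals), max(totals)
    filled = []
    while current <= last:
        filled.append((current, totals.get(current, 0)))
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
    return filled

filled = monthly_counts(rows)
for month, count in filled:
    print(month, count)

# De-duplicating (month, n_count) pairs, as a plain UNION would, undercounts:
# July has the pairs (2020-07-01, 1) twice, and one of them is lost.
pairs = [(month_start(d), n) for d, n in rows]
july = date(2020, 7, 1)
july_all = sum(n for m, n in pairs if m == july)
july_deduped = sum(n for m, n in set(pairs) if m == july)
print(july_all, july_deduped)
```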

ORACLE SQL - How to find the number of reliefs each teacher has, each day, 2 months before the teacher resigned?

I need some help in finding the number of reliefs each teacher has, every single day, 2 months before the teacher resigns.
Join_dt - teacher's join date,
Resign_dt - teacher's resign date,
Relief_ID - Relief teacher's ID,
Start_dt - Relief's start date,
End_dt - Relief's end date,
note that there may be overlapping dates between 2 or more different reliefs and so I need to find the number of distinct reliefs each teacher has for each date.
This is what I am given:
Teacher_ID Join_dt Resign_dt Relief_ID Start_dt End_dt
12 2006-08-30 2019-08-01 20 2017-02-07 2019-07-04
12 2006-08-30 2019-08-01 20 2016-11-10 2019-01-30
12 2006-08-30 2019-08-01 103 2016-08-20 2019-07-29
12 2006-08-30 2019-08-01 17 2016-01-30 2017-12-30
23 2017-10-01 2018-11-12 44 2018-10-19 2018-11-11
23 2017-10-01 2018-11-12 29 2018-04-01 2018-12-02
23 2017-10-01 2018-11-12 06 2017-11-25 2018-05-02
05 2015-02-11 2019-10-02 38 2019-01-17 2019-07-21
05 2015-02-11 2019-10-02 11 2018-11-02 2019-02-05
05 2015-02-11 2019-10-02 15 2018-09-30 2018-10-03
Expected result:
Teacher_ID Dates No_of_reliefs
12 2019-07-31 0
12 2019-07-30 0
12 2019-07-29 1
12 2019-07-28 1
12 2019-07-27 1
... ...
12 2019-07-04 2
... ...
12 2016-05-30 2
12 2016-05-29 2
12 2016-05-28 2
12 2016-05-27 2
12 2016-05-26 1
23 2018-10-31 2
... ...
For date 2019-07-29, No_of_reliefs = 1 because of Relief_ID 103.
For date 2019-07-04, No_of_reliefs = 2 because of Relief_ID 20 & 103.
Dates are supposed to start from 1 month before the teacher resigned. For Teacher_ID 23, since she resigned on 2018-11-12, dates shall start from 2018-10-31.
I have tried using connect by but the execution time is really long since it involves a large amount of data.
Any other methods will be greatly appreciated!!
Thank you kind souls!!!
You can use a
connect by level <= last_day(add_months(Resign_dt,-1)) - add_months(Resign_dt,-2) clause.
I assume you mean that the dates start 2 months before the resignation and end on the last day of the month before it.
with t1(Teacher_ID,Resign_dt,Relief_ID,start_dt,end_dt) as
(
select 12,date'2019-08-01',20 ,date'2017-02-07',date'2019-07-04' from dual union all
select 12,date'2019-08-01',20 ,date'2016-11-10',date'2019-01-30' from dual union all
select 12,date'2019-08-01',103,date'2016-08-20',date'2019-07-29' from dual
......
), t2 as
(
select distinct last_day(add_months(Resign_dt,-1)) - level + 1 as Resign_dt, Teacher_ID
from t1
connect by level <= last_day(add_months(Resign_dt,-1)) - add_months(Resign_dt,-2)
and prior Teacher_ID = Teacher_ID and prior sys_guid() is not null
)
select Teacher_ID, to_char(Resign_dt,'yyyy-mm-dd') as Dates,
(select count(distinct Relief_ID)
from t1
where t2.Resign_dt between start_dt and end_dt
and t2.Teacher_ID = Teacher_ID
)
from t2
order by Teacher_ID, Resign_dt desc;
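The counting rule itself (distinct Relief_IDs whose date range covers a given day) can be checked against the question's sample rows with a few lines of Python. This is only a sketch of that rule, not of the date generation; `reliefs_on` is a hypothetical helper, and IDs such as 05 and 06 are written as plain integers.

```python
from datetime import date

# Sample rows from the question: (teacher_id, relief_id, start_dt, end_dt).
reliefs = [
    (12, 20, date(2017, 2, 7), date(2019, 7, 4)),
    (12, 20, date(2016, 11, 10), date(2019, 1, 30)),
    (12, 103, date(2016, 8, 20), date(2019, 7, 29)),
    (12, 17, date(2016, 1, 30), date(2017, 12, 30)),
    (23, 44, date(2018, 10, 19), date(2018, 11, 11)),
    (23, 29, date(2018, 4, 1), date(2018, 12, 2)),
    (23, 6, date(2017, 11, 25), date(2018, 5, 2)),
    (5, 38, date(2019, 1, 17), date(2019, 7, 21)),
    (5, 11, date(2018, 11, 2), date(2019, 2, 5)),
    (5, 15, date(2018, 9, 30), date(2018, 10, 3)),
]

def reliefs_on(teacher_id, day):
    """Number of distinct relief teachers covering teacher_id on the given day."""
    active = {relief_id
              for tid, relief_id, start, end in reliefs
              if tid == teacher_id and start <= day <= end}
    return len(active)

print(reliefs_on(12, date(2019, 7, 30)))
print(reliefs_on(12, date(2019, 7, 29)))
print(reliefs_on(12, date(2019, 7, 4)))
print(reliefs_on(23, date(2018, 10, 31)))
```

Collecting the IDs in a set is what makes overlapping periods of the same relief teacher count once, which is the role of count(distinct Relief_ID) in the query.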
select d.dt
, tr.Teacher_ID
--, tr.Join_dt
--, tr.Resign_dt
, count(tr.Relief_ID)
--, tr.Start_dt
--, tr.End_dt
from tr
right outer join (
SELECT dt
FROM (
SELECT DATE '2006-01-01' + ROWNUM - 1 dt
FROM DUAL CONNECT BY ROWNUM < 5000
) q
WHERE EXTRACT(YEAR FROM dt) < EXTRACT(YEAR FROM sysdate) + 2
--order by 1
) d on d.dt between tr.Join_dt and tr.End_dt
and d.dt between tr.Start_dt and tr.Resign_dt
group by d.dt
, tr.Teacher_ID
order by d.dt desc

BigQuery: Computing the timestamp diff in time ordered rows in a group

Given a table like this, I would like to compute the time duration of each state before changing to a different state:
id state timestamp
1 1 2018-08-17 10:40:00
1 2 2018-08-17 12:40:00
1 1 2018-08-17 14:40:00
2 1 2018-08-17 09:00:00
2 2 2018-08-17 12:00:00
The output I want is:
id state date duration
1 1 2018-08-17 2 hours
1 2 2018-08-17 2 hours
1 1 2018-08-17 9 hours 20 minutes (until the end of the day in this case)
2 1 2018-08-17 3 hours
2 2 2018-08-17 12 hours (until the end of the day in this case)
I am not so sure whether this is doable in SQL. I feel like I have to write a UDF against aggregated state and timestamp (grouped by id and ordered by ts) which outputs an array of struct (id, state, date, and duration). This array can be flattened.
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, state,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
You can test, play with above using dummy data from your question:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 state, TIMESTAMP('2018-08-17 10:40:00') ts UNION ALL
SELECT 1, 2, '2018-08-17 12:40:00' UNION ALL
SELECT 1, 1, '2018-08-17 14:40:00' UNION ALL
SELECT 2, 1, '2018-08-17 09:00:00' UNION ALL
SELECT 2, 2, '2018-08-17 12:00:00'
)
SELECT id, state,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
-- ORDER BY id, ts
with result as below
Row id state duration_minutes
1 1 1 120
2 1 2 120
3 1 1 560
4 2 1 180
5 2 2 720
If you need your output formatted exactly the way you showed in the question, use the query below
#standardSQL
SELECT id, state, ts, duration_minutes,
FORMAT('%i hours %i minutes', DIV(duration_minutes, 60), MOD(duration_minutes, 60)) duration
FROM (
SELECT id, state, ts,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
)
In this case your output will look like below
Row id state ts duration_minutes duration
1 1 1 2018-08-17 10:40:00 UTC 120 2 hours 0 minutes
2 1 2 2018-08-17 12:40:00 UTC 120 2 hours 0 minutes
3 1 1 2018-08-17 14:40:00 UTC 560 9 hours 20 minutes
4 2 1 2018-08-17 09:00:00 UTC 180 3 hours 0 minutes
5 2 2 2018-08-17 12:00:00 UTC 720 12 hours 0 minutes
Sure, you will most likely still need to adjust above to your particular case - but you've got a good start I think
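For anyone who wants to verify the numbers outside BigQuery, the LEAD-with-fallback logic can be sketched in Python: within each id, a state lasts until the next row's timestamp, and the last state of an id lasts until the end of that day. The function names are made up for this sketch and the data is the sample from the question.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (id, state, ts).
events = [
    (1, 1, datetime(2018, 8, 17, 10, 40)),
    (1, 2, datetime(2018, 8, 17, 12, 40)),
    (1, 1, datetime(2018, 8, 17, 14, 40)),
    (2, 1, datetime(2018, 8, 17, 9, 0)),
    (2, 2, datetime(2018, 8, 17, 12, 0)),
]

def state_durations(events):
    """Return (id, state, duration_minutes) per row, ordered by id and ts."""
    ordered = sorted(events, key=lambda e: (e[0], e[2]))
    out = []
    for _, group in groupby(ordered, key=lambda e: e[0]):
        group = list(group)
        for i, (id_, state, ts) in enumerate(group):
            if i + 1 < len(group):
                end = group[i + 1][2]  # next timestamp for the same id
            else:
                # No next row: fall back to midnight at the end of the day.
                end = datetime(ts.year, ts.month, ts.day) + timedelta(days=1)
            out.append((id_, state, int((end - ts).total_seconds() // 60)))
    return out

def format_duration(minutes):
    """Render minutes in the 'H hours M minutes' form used in the answer."""
    return f"{minutes // 60} hours {minutes % 60} minutes"

durations = state_durations(events)
for id_, state, minutes in durations:
    print(id_, state, minutes, format_duration(minutes))
```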

Calculate Every n record SQL

I have the following table:
oDateTime oValue
------------------------------------
2017-09-30 23:00:00 8
2017-09-30 23:15:00 7
2017-09-30 23:30:00 7
2017-09-30 23:45:00 7
2017-10-01 00:00:00 6
2017-10-01 00:15:00 5
2017-10-01 00:30:00 8
2017-10-01 00:45:00 7
2017-10-01 01:00:00 6
2017-10-01 01:15:00 9
2017-10-01 01:30:00 5
2017-10-01 01:45:00 6
2017-10-01 02:00:00 7
The table will have one record every 15 minutes. I want to SUM or average those records every hour.
So, the result should be:
oDateTime Sum_Value Avg_Value
---------------------------------------------------
2017-10-01 00:00:00 35 7
2017-10-01 01:00:00 32 6.4
2017-10-01 02:00:00 33 6.6
The SUM for 2017-10-01 00:00:00 is taken from the 5 records up to and including it, and so on.
Does anyone know how to achieve this?
Thank you.
Here is one method in SQL Server 2008:
select t.oDateTime, tt.sum_value, tt.avg_value
from (select oDateTime
from t
where datepart(minute, oDateTime) = 0
) t outer apply
(select sum(oValue) as sum_value, avg(oValue) as avg_Value
from (select top 5 t2.*
from t t2
where t2.oDateTime <= t.oDateTime
order by t2.oDateTime desc
) tt
) tt;
In more recent versions of SQL Server, you can use window functions for this purpose.
Just join the table to itself and group by the master timestamp.
The query below is easily adjustable to change how many minutes back you want to look. It handles changes in frequency, i.e. it doesn't assume that 5 rows are wanted, so data arriving at 5-minute intervals is handled too.
select cast('2017-09-30 23:00:00' as datetime) t,8 o
into #a
union all
select '2017-09-30 23:15:00',7 union all
select '2017-09-30 23:30:00',7 union all
select '2017-09-30 23:45:00',7 union all
select '2017-10-01 00:00:00',6 union all
select '2017-10-01 00:15:00',5 union all
select '2017-10-01 00:30:00',8 union all
select '2017-10-01 00:45:00',7 union all
select '2017-10-01 01:00:00',6 union all
select '2017-10-01 01:15:00',9 union all
select '2017-10-01 01:30:00',5 union all
select '2017-10-01 01:45:00',6 union all
select '2017-10-01 02:00:00',7
select x.t,sum(x2.o),avg(cast(x2.o as float))
from #a x, #a x2
where x2.t between dateadd(mi,-60,x.t) and x.t
group by x.t
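The time-window idea from this self-join can be checked in Python against the question's expected figures. The sketch below keeps only the rows that fall on a full hour (an assumption taken from the question's expected output, since the join above returns a row for every timestamp) and sums everything within the preceding 60 minutes, both ends included. The name `hourly_windows` and its `minutes_back` parameter are hypothetical.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (oDateTime, oValue).
readings = [
    (datetime(2017, 9, 30, 23, 0), 8), (datetime(2017, 9, 30, 23, 15), 7),
    (datetime(2017, 9, 30, 23, 30), 7), (datetime(2017, 9, 30, 23, 45), 7),
    (datetime(2017, 10, 1, 0, 0), 6), (datetime(2017, 10, 1, 0, 15), 5),
    (datetime(2017, 10, 1, 0, 30), 8), (datetime(2017, 10, 1, 0, 45), 7),
    (datetime(2017, 10, 1, 1, 0), 6), (datetime(2017, 10, 1, 1, 15), 9),
    (datetime(2017, 10, 1, 1, 30), 5), (datetime(2017, 10, 1, 1, 45), 6),
    (datetime(2017, 10, 1, 2, 0), 7),
]

def hourly_windows(readings, minutes_back=60):
    """For each full-hour row, sum and average all rows in [t - minutes_back, t]."""
    out = {}
    for t, _ in readings:
        if t.minute != 0:
            continue  # only report on the hour, as in the expected output
        lower = t - timedelta(minutes=minutes_back)
        window = [value for ts, value in readings if lower <= ts <= t]
        out[t] = (sum(window), sum(window) / len(window))
    return out

windows = hourly_windows(readings)
for t, (total, average) in windows.items():
    print(t, total, average)
```

Because the window is defined by time rather than by row count, the first hour (2017-09-30 23:00:00) simply gets a window containing only itself.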