Oracle SQL counter restarts when time difference is > x

I want to create a new column in my query that takes into account the difference between the current row's datetime and the previous row's datetime. This column would be a counter: while the difference is < -100 it stays 1, but once the difference is > -100 the column becomes 0.
Ideally I would then want to pull in only the rows that come after the last 0 record.
My query:
with products as (
    select * from (
        select distinct
            ID,
            UnixDateTime,
            OrderNumber,
            to_date('1970-01-01','YYYY-MM-DD') + numtodsinterval(UnixDateTime,'SECOND') + 1/24 as "Date_Time"
        from DB
        where (date '1970-01-01' + UnixDateTime * interval '1' second) + interval '1' hour
              > sysdate - interval '2' day
    )
),
prod_prev AS (
    SELECT p.*,
        lag("Date_Time") over (order by "Date_Time" ASC) as Previous_Time,
        lag(UnixDateTime) over (order by "Date_Time" ASC) as UnixDateTime_Previous_Time
    FROM products p
),
run_sum AS (
    SELECT p.*,
        "Date_Time" - Previous_Time as "Diff",
        UnixDateTime_Previous_Time - UnixDateTime AS "UnixDateTime_Diff"
    FROM prod_prev p
)
SELECT * FROM run_sum
ORDER BY UnixDateTime, "Date_Time" DESC
The result from the above query:
ID | UnixDateTime | OrderNumber | Date_Time | Previous_Time | diff | UnixDateTime_Diff
-: | -----------: | ----------: | :------------------- | :------------------- | -------: | ----------------:
1 | 1662615688 | 100 | 08-SEP-2022 06:41:28 | (null) | (null) | (null)
2 | 1662615752 | 100 | 08-SEP-2022 06:42:32 | 08-SEP-2022 06:41:28 | 0.00074 | -64
3 | 1662615765 | 100 | 08-SEP-2022 06:42:45 | 08-SEP-2022 06:42:32 | 0.000150 | -13
4 | 1662615859 | 100 | 08-SEP-2022 06:44:19 | 08-SEP-2022 06:42:45 | 0.001088 | -128
5 | 1662615987 | 100 | 08-SEP-2022 06:46:27 | 08-SEP-2022 06:44:19 | 0.00148 | -44
6 | 1662616031 | 100 | 08-SEP-2022 06:47:11 | 08-SEP-2022 06:46:27 | 0.00051 | -36
The counter in the example above should be 1 if UnixDateTime_Diff is < -100 and 0 if it's > -100.
Then, if possible, I want to pull in only the records AFTER the most recent 0 record.

If you use:
lag("Date_Time") over (order by "Date_Time" DESC)
then you get the previous value when the values are ordered in DESCending order; this will get the previous higher value. If you want the previous lower value then either use:
lag("Date_Time") over (order by "Date_Time" ASC)
or
lead("Date_Time") over (order by "Date_Time" DESC)
If you want to perform row-by-row processing then, from Oracle 12, you can use MATCH_RECOGNIZE:
SELECT id,
       unixdatetime,
       ordernumber,
       date_time,
       next_unixdatetime,
       next_unixdatetime - unixdatetime AS diff,
       CASE cls
         WHEN 'WITHIN_100' THEN 1
         ELSE 0
       END AS within_100
FROM (
    SELECT DISTINCT
           ID,
           UnixDateTime,
           OrderNumber,
           TIMESTAMP '1970-01-01 00:00:00 UTC' + UnixDateTime * INTERVAL '1' SECOND
             AS Date_Time
    FROM   DB
    WHERE  TIMESTAMP '1970-01-01 00:00:00 UTC' + UnixDateTime * INTERVAL '1' SECOND
           > SYSTIMESTAMP - INTERVAL '2' DAY
)
MATCH_RECOGNIZE(
    ORDER BY unixdatetime
    MEASURES
      NEXT(unixdatetime) AS next_unixdatetime,
      classifier() AS cls
    ALL ROWS PER MATCH
    PATTERN (within_100* any_row)
    DEFINE
      within_100 AS NEXT(unixdatetime) < unixdatetime + 100
) m
Which, for the sample data:
CREATE TABLE db (ID, UnixDateTime, OrderNumber) AS
SELECT 1, 1662615688, 100 FROM DUAL UNION ALL
SELECT 2, 1662615752, 100 FROM DUAL UNION ALL
SELECT 3, 1662615765, 100 FROM DUAL UNION ALL
SELECT 4, 1662615859, 100 FROM DUAL UNION ALL
SELECT 5, 1662615987, 100 FROM DUAL UNION ALL
SELECT 6, 1662616031, 100 FROM DUAL;
Outputs:
ID | UNIXDATETIME | ORDERNUMBER | DATE_TIME | NEXT_UNIXDATETIME | DIFF | WITHIN_100
-: | -----------: | ----------: | :-------------------------------- | ----------------: | ---: | ---------:
1 | 1662615688 | 100 | 2022-09-08 05:41:28.000000000 UTC | 1662615752 | 64 | 1
2 | 1662615752 | 100 | 2022-09-08 05:42:32.000000000 UTC | 1662615765 | 13 | 1
3 | 1662615765 | 100 | 2022-09-08 05:42:45.000000000 UTC | 1662615859 | 94 | 1
4 | 1662615859 | 100 | 2022-09-08 05:44:19.000000000 UTC | 1662615987 | 128 | 0
5 | 1662615987 | 100 | 2022-09-08 05:46:27.000000000 UTC | 1662616031 | 44 | 1
6 | 1662616031 | 100 | 2022-09-08 05:47:11.000000000 UTC | null | null | 0
fiddle
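For the second part of the question (pulling in only the rows after the most recent break), one hedged reading is: find the latest row whose gap to the previous row exceeds 100 seconds and keep everything from it onwards. A sketch reusing the run_sum CTE from the question (an untested illustration of one interpretation, not the definitive answer):
SELECT *
FROM (
    SELECT r.*,
           -- latest row whose gap to the previous row exceeds 100 seconds
           MAX(CASE WHEN "UnixDateTime_Diff" < -100 THEN UnixDateTime END) OVER () AS last_break
    FROM run_sum r
)
WHERE last_break IS NULL -- no break in the window: keep everything
   OR UnixDateTime >= last_break
ORDER BY UnixDateTime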

Related

Calculating compound interest with deposits/withdrawals

I'm trying to calculate the total on an interest-bearing account, accounting for deposits/withdrawals, with BigQuery.
Example scenario:
Daily interest rate = 10%
Value added/removed on every day: [100, 0, 29, 0, -100] (negative means amount removed)
The totals for each day are:
Day 1: 0*1.1 + 100 = 100
Day 2: 100*1.1 + 0 = 110
Day 3: 110*1.1 + 29 = 150
Day 4: 150*1.1 + 0 = 165
Day 5: 165*1.1 - 100 = 81.5
This would be trivial to implement in a language like Python:
daily_changes = [100, 0, 29, 0, -100]
interest_rate = 0.1
result = []
for day, change in enumerate(daily_changes):
    if day == 0:
        result.append(change)
    else:
        result.append(result[day - 1] * (1 + interest_rate) + change)
print(result)
# Result: [100, 110.00000000000001, 150.00000000000003, 165.00000000000006, 81.50000000000009]
My difficulty lies in calculating values for row N when they depend on row N-1 (the usual SUM(...) OVER (ORDER BY...) solution does not suffice here).
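To see why, a plain running sum over the changes ignores compounding entirely; against the raw_data CTE given below it would return 100, 100, 129, 129, 29 rather than the targets above (a hypothetical illustration):
select day, sum(change) over (order by day) as running_total
from raw_data -- no interest applied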
Here's a CTE to test with the mock data in this example.
with raw_data as (
    select 1 as day, numeric '100' as change union all
    select 2 as day, numeric '0' as change union all
    select 3 as day, numeric '29' as change union all
    select 4 as day, numeric '0' as change union all
    select 5 as day, numeric '-100' as change
)
select * from raw_data
You may try the query below:
SELECT day,
       ROUND((SELECT SUM(c * POW(1.1, day - o - 1))
              FROM t.changes c WITH OFFSET o), 2) AS totals
FROM (
    SELECT *, ARRAY_AGG(change) OVER (ORDER BY day) AS changes
    FROM raw_data
) t;
+-----+--------+
| day | totals |
+-----+--------+
| 1 | 100.0 |
| 2 | 110.0 |
| 3 | 150.0 |
| 4 | 165.0 |
| 5 | 81.5 |
+-----+--------+
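This works because the recurrence unrolls into a closed form: each day's change compounds independently from the day it was added, so for day $n$ with daily rate $r$:
$$\text{total}_n = \sum_{i=1}^{n} \text{change}_i \,(1+r)^{\,n-i}$$
which is exactly what POW(1.1, day - o - 1) evaluates, since the array element at 0-based offset o was added on day o + 1.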
Another option is to use a recursive CTE:
with recursive raw_data as (
    select 1 as day, numeric '100' as change union all
    select 2 as day, numeric '0' as change union all
    select 3 as day, numeric '29' as change union all
    select 4 as day, numeric '0' as change union all
    select 5 as day, numeric '-100' as change
), iterations as (
    select *, change as total
    from raw_data
    where day = 1
    union all
    select r.day, r.change, 1.1 * i.total + r.change
    from iterations i
    join raw_data r on r.day = i.day + 1
)
select *
from iterations
with the same output as above.

Project data and cumulative sum forward

I am trying to push the last value of a cumulative dataset forward to present time.
Initialise test data:
drop table if exists test_table;
create table test_table as
select data_date::date, floor(random() * 10) as data_value
from generate_series('2021-08-25'::date, '2021-08-31'::date, '1 day') data_date;
The above test data produces something like this:
data_date data_value cumulative_value
2021-08-25 1 1
2021-08-26 7 8
2021-08-27 8 16
2021-08-28 7 23
2021-08-29 2 25
2021-08-30 2 27
2021-08-31 7 34
What I wish to do is push the last data value (2021-08-31 7) forward to present time. For example, say today's date was 2021-09-03, I would want the result to be something like:
data_date data_value cumulative_value
2021-08-25 1 1
2021-08-26 7 8
2021-08-27 8 16
2021-08-28 7 23
2021-08-29 2 25
2021-08-30 2 27
2021-08-31 7 34
2021-09-01 7 41
2021-09-02 7 48
2021-09-03 7 55
You need to get the value of the last date in the table. A common table expression is a good way to do that:
with cte as (
    select data_value as last_val
    from test_table
    order by data_date desc
    limit 1
)
select
    gen_date::date as data_date,
    coalesce(data_value, last_val) as data_value,
    sum(coalesce(data_value, last_val)) over (order by gen_date) as cumulative_sum
from generate_series('2021-08-25'::date, '2021-09-03', '1 day') as gen_date
left join test_table on gen_date = data_date
cross join cte
Test it in db<>fiddle.
You may use union and a scalar subquery to find the latest value of data_value for the new rows. cumulative_value is re-evaluated.
select *,
       sum(data_value) over (order by data_date rows between unbounded preceding and current row) as cumulative_value
from
(
    select data_date, data_value from test_table
    union all
    select rd, (select data_value from test_table where data_date = '2021-08-31')
    from generate_series('2021-09-01'::date, '2021-09-03', '1 day') rd
) t
order by data_date;
And here is a slightly smarter version without fixed date literals.
with cte(latest_date) as (select max(data_date) from test_table)
select *,
       sum(data_value) over (order by data_date rows between unbounded preceding and current row) as cumulative_value
from
(
    select data_date, data_value from test_table
    union all
    select rd::date, (select data_value from test_table, cte where data_date = latest_date)
    from generate_series((select latest_date from cte) + 1, CURRENT_DATE, '1 day') rd
) t
order by data_date;
SQL Fiddle here.

Create date pairs from list of dates in one column table

I have a problem with a SQL query. I have a list of dates in one column and I would like to create pairs of dates. The dates are sequenced, so I have to match the first date with the second and create a record, then the third date with the fourth and create a record, etc., as in the following example:
ID DATA
50 10/04/2019
50 12/04/2019
50 13/04/2019
50 17/04/2019
50 18/04/2019
50 19/04/2019
The desired result:
ID DATA_START DATA_END
50 10/04/2019 12/04/2019
50 13/04/2019 17/04/2019
50 18/04/2019 19/04/2019
Thanks very much everyone for the help
You should mark the rows that should be grouped together (into a single row) and which date takes which role (start or end).
Here's the code:
with a as (
    /*Source data*/
    select 50 as id, convert(date, '2019-04-10', 23) as dt union all
    select 50 as id, convert(date, '2019-04-12', 23) as dt union all
    select 50 as id, convert(date, '2019-04-13', 23) as dt union all
    select 50 as id, convert(date, '2019-04-17', 23) as dt union all
    select 50 as id, convert(date, '2019-04-18', 23) as dt union all
    select 50 as id, convert(date, '2019-04-19', 23) as dt
)
select
    id,
    [1] as dt_start,
    [0] as dt_end
from (
    select
        id,
        dt,
        /*
        The first row (with modulo = 1) is the start date
        and the second row (with modulo = 0) is the end date.
        */
        (row_number() over(partition by id order by dt)) % 2 as dt_role,
        /*Integer division by 2 will group rows together*/
        (row_number() over(partition by id order by dt) + 1) / 2 as dt_group
    from a
) as s
pivot (
    max(dt) for dt_role in ([0], [1])
) as p
GO
id | dt_start | dt_end
-: | :--------- | :---------
50 | 2019-04-10 | 2019-04-12
50 | 2019-04-13 | 2019-04-17
50 | 2019-04-18 | 2019-04-19
db<>fiddle here
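As an aside, the same pairing can be written without PIVOT using conditional aggregation, which some find easier to read. A sketch reusing the CTE a from above (same dt_role / dt_group idea, untested):
select
    id,
    max(case when dt_role = 1 then dt end) as dt_start,
    max(case when dt_role = 0 then dt end) as dt_end
from (
    select
        id,
        dt,
        (row_number() over(partition by id order by dt)) % 2 as dt_role,
        (row_number() over(partition by id order by dt) + 1) / 2 as dt_group
    from a
) as s
group by id, dt_group;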

HIVE - compute statistics over partitions with window based on date

I've seen solutions for problems similar to mine, but none quite work for me. Also I'm confident that there should be a way to make it work.
Given a table with
ID | Date | target
-: | :--------- | -----:
1 | 2020-01-01 | 1
1 | 2020-01-02 | 1
1 | 2020-01-03 | 0
1 | 2020-01-04 | 1
1 | 2020-01-04 | 0
1 | 2020-06-01 | 1
1 | 2020-06-02 | 1
1 | 2020-06-03 | 0
1 | 2020-06-04 | 1
1 | 2020-06-04 | 0
2 | 2020-01-01 | 1
ID is BIGINT, target is INT, and Date is DATE.
I want to compute, for each ID/Date, the sum and the number of rows for the same ID in the 3 months and 12 months before the Date (inclusive). Example of output:
ID | Date | Sum_3 | Count_3 | Sum_12 | Count_12
-: | :--------- | ----: | ------: | -----: | -------:
1 | 2020-01-01 | 1 | 1 | 1 | 1
1 | 2020-01-02 | 2 | 2 | 2 | 2
1 | 2020-01-03 | 2 | 3 | 2 | 3
1 | 2020-01-04 | 3 | 5 | 3 | 5
1 | 2020-06-01 | 1 | 1 | 4 | 6
1 | 2020-06-02 | 2 | 2 | 5 | 7
1 | 2020-06-03 | 2 | 3 | 6 | 8
1 | 2020-06-04 | 3 | 5 | 7 | 10
2 | 2020-01-01 | 1 | 1 | 1 | 1
How can I get this type of result in HIVE?
I'm not sure if I should use analytic functions (and how), group by, etc.
If you can live with an approximation of months as a number of days, then you can use window functions in Hive:
select id, date,
    count(*) over(
        partition by id
        order by unix_timestamp(date)
        range 60 * 60 * 24 * 90 preceding -- 90 days
    ) as count_3,
    sum(target) over(
        partition by id
        order by unix_timestamp(date)
        range 60 * 60 * 24 * 90 preceding
    ) as sum_3,
    count(*) over(
        partition by id
        order by unix_timestamp(date)
        range 60 * 60 * 24 * 360 preceding -- 360 days
    ) as count_12,
    sum(target) over(
        partition by id
        order by unix_timestamp(date)
        range 60 * 60 * 24 * 360 preceding
    ) as sum_12
from mytable
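The frame bound is plain seconds arithmetic: 60 * 60 * 24 * 90 = 7,776,000 seconds, i.e. 90 days, so each RANGE frame keeps every row of the same id whose unix_timestamp(date) falls at most that many seconds before the current row's, inclusive of the current row.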
You can aggregate in the same query, so duplicated (id, date) rows collapse into a single output row; note the sum(count(*)) pattern, which layers a window function over an aggregate:
select id, date,
    sum(count(*)) over(
        partition by id
        order by unix_timestamp(date)
        range 60 * 60 * 24 * 90 preceding -- 90 days
    ) as count_3,
    sum(sum(target)) over(
        partition by id
        order by unix_timestamp(date)
        range 60 * 60 * 24 * 90 preceding
    ) as sum_3,
    sum(count(*)) over(
        partition by id
        order by unix_timestamp(date)
        range 60 * 60 * 24 * 360 preceding -- 360 days
    ) as count_12,
    sum(sum(target)) over(
        partition by id
        order by unix_timestamp(date)
        range 60 * 60 * 24 * 360 preceding
    ) as sum_12
from mytable
group by id, date, unix_timestamp(date)
If you can accept an estimation of the interval (1 month = 30 days), here is an improvement of GMB's answer that aggregates per (ID, Date) first:
with t as (
    select ID, Date,
        sum(target) target,
        count(target) c_target
    from table
    group by ID, Date
)
select ID, Date,
    sum(target) over(
        partition by ID
        order by unix_timestamp(Date, 'yyyy-MM-dd')
        range 60 * 60 * 24 * 90 preceding
    ) sum_3,
    sum(c_target) over(
        partition by ID
        order by unix_timestamp(Date, 'yyyy-MM-dd')
        range 60 * 60 * 24 * 90 preceding
    ) count_3,
    sum(target) over(
        partition by ID
        order by unix_timestamp(Date, 'yyyy-MM-dd')
        range 60 * 60 * 24 * 360 preceding
    ) sum_12,
    sum(c_target) over(
        partition by ID
        order by unix_timestamp(Date, 'yyyy-MM-dd')
        range 60 * 60 * 24 * 360 preceding
    ) count_12
from t
Or, if you want exact intervals, you can use self joins (more expensive):
with t as (
    select ID, Date,
        sum(target) target,
        count(target) c_target
    from table
    group by ID, Date
)
select
    t_3month.ID,
    t_3month.Date,
    t_3month.sum_3,
    t_3month.count_3,
    sum(t3.target) sum_12,
    sum(t3.c_target) count_12
from (
    select
        t1.ID,
        t1.Date,
        sum(t2.target) sum_3,
        sum(t2.c_target) count_3
    from t t1
    left join t t2
        on t2.Date > t1.Date - interval 3 month and
           t2.Date <= t1.Date and
           t1.ID = t2.ID
    group by t1.ID, t1.Date
) t_3month
left join t t3
    on t3.Date > t_3month.Date - interval 12 month and
       t3.Date <= t_3month.Date and
       t_3month.ID = t3.ID
group by t_3month.ID, t_3month.Date, t_3month.sum_3, t_3month.count_3
order by ID, Date;

Calculate standard deviation over time

I have information about sales per day. For example:
Date - Product - Amount
01-07-2020 - A - 10
01-03-2020 - A - 20
01-02-2020 - B - 10
Now I would like to know the average sales per day and the standard deviation for the last year. For the average I can just count the number of entries per item, then take 365 minus that count as the number of 0's; but I wonder what the best way is to calculate the standard deviation while incorporating the 0's for the days where there are no sales.
Use a hierarchical (or recursive) query to generate the daily dates for the year, then use a partitioned outer join to join it to your product data. You can then find the average and standard deviation with the AVG and STDDEV aggregation functions, using COALESCE to fill in NULL values with zeroes:
WITH start_date ( dt ) AS (
    SELECT DATE '2020-01-01' FROM DUAL
),
calendar ( dt ) AS (
    SELECT dt + LEVEL - 1
    FROM   start_date
    CONNECT BY dt + LEVEL - 1 < ADD_MONTHS( dt, 12 )
)
SELECT product,
       AVG( COALESCE( amount, 0 ) ) AS average_sales_per_day,
       STDDEV( COALESCE( amount, 0 ) ) AS stddev_sales_per_day
FROM   calendar c
       LEFT OUTER JOIN (
           SELECT t.*
           FROM   test_data t
                  INNER JOIN start_date s
                  ON ( s.dt <= t."DATE" AND t."DATE" < ADD_MONTHS( s.dt, 12 ) )
       ) t
       PARTITION BY ( t.product )
       ON ( c.dt = t."DATE" )
GROUP BY product
So, for your sample data:
CREATE TABLE test_data ( "DATE", Product, Amount ) AS
SELECT DATE '2020-07-01', 'A', 10 FROM DUAL UNION ALL
SELECT DATE '2020-03-01', 'A', 20 FROM DUAL UNION ALL
SELECT DATE '2020-02-01', 'B', 10 FROM DUAL;
This outputs:
PRODUCT | AVERAGE_SALES_PER_DAY | STDDEV_SALES_PER_DAY
:------ | ----------------------------------------: | ----------------------------------------:
A | .0819672131147540983606557377049180327869 | 1.16752986363678031669548047505759328696
B | .027322404371584699453551912568306010929 | .5227083734893166933219264686616717636897
db<>fiddle here
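For what it's worth, the PARTITION BY ( t.product ) clause is what densifies the data: the calendar is outer-joined once per product, so every product gets a row for every day even when nothing was sold. A minimal sketch of just that join, reusing the calendar CTE and test_data table from above (illustrative only):
WITH start_date ( dt ) AS (
    SELECT DATE '2020-01-01' FROM DUAL
),
calendar ( dt ) AS (
    SELECT dt + LEVEL - 1
    FROM   start_date
    CONNECT BY dt + LEVEL - 1 < ADD_MONTHS( dt, 12 )
)
SELECT t.product, c.dt, t.amount -- amount is NULL on days with no sales
FROM   calendar c
       LEFT OUTER JOIN test_data t
       PARTITION BY ( t.product )
       ON ( c.dt = t."DATE" )
ORDER BY t.product, c.dt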