Rolling min/max in SQL Query? - sql

The database engine is SQLite3. It's a simple table:
CREATE TABLE T (ID INTEGER, DATE STRING, VALUE NUMERIC);
-- rows of T:
id date value
1 2020-01-01 11
2 2020-01-01 23
3 2020-01-01 32
4 2020-01-01 41
5 2020-01-01 57
6 2020-01-01 62
How can I create a rolling min/max? Say of period 3:
id date val min3 max3
1 2020-01-01 11 11 11
2 2020-01-01 23 11 11
3 2020-01-01 32 11 32
4 2020-01-01 41 23 41
5 2020-01-01 57 32 57
5 2020-01-01 62 41 62
I keep getting min 11 Max 62 for everything because I don't know how to do the rolling min/max

You can use window functions:
select t.*,
min(val) over (order by date rows between 2 preceding and current row) min3,
max(val) over (order by date rows between 2 preceding and current row) max3
from t;

Related

Why does my cumulative column not work as expected?

Here is my query , I have a column called cum_balance which is supposed to calculate the cumulative balance but after row number 10 there is an anamoly and it doesn't work as expected , all I notice is that from row number 10 onwards the hour column has same value. What's the right syntax for this?
[select
hour,
symbol,
amount_usd,
category,
sum(amount_usd) over (
order by
hour asc RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) as cum_balance
from
combined_transfers_usd_netflow
order by
hour][1]
I have tried removing RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW , adding a partition by hour and group by hour. None of them gave the expected result or errors
Row Number
Hour
SYMBOL
AMOUNT_USD
CATEGORY
CUM_BALANCE
1
2021-12-02 23:00:00
WETH
227.2795
in
227.2795
2
2021-12-03 00:00:00
WETH
-226.4801153
out
0.7993847087
3
2022-01-05 21:00:00
WETH
5123.716203
in
5124.515587
4
2022-01-18 14:00:00
WETH
-4466.2366
out
658.2789873
5
2022-01-19 00:00:00
WETH
2442.618599
in
3100.897586
6
2022-01-21 14:00:00
USDC
99928.68644
in
103029.584
7
2022-03-01 16:00:00
UNI
8545.36098
in
111574.945
8
2022-03-04 22:00:00
USDC
-2999.343
out
108575.602
9
2022-03-09 22:00:00
USDC
-5042.947675
out
103532.6543
10
2022-03-16 21:00:00
USDC
-4110.6579
out
98594.35101
11
2022-03-16 21:00:00
UNI
-3.209306045
out
98594.35101
12
2022-03-16 21:00:00
UNI
-16.04653022
out
98594.35101
13
2022-03-16 21:00:00
UNI
-16.04653022
out
98594.35101
14
2022-03-16 21:00:00
UNI
-16.04653022
out
98594.35101
15
2022-03-16 21:00:00
UNI
-6.418612089
out
98594.35101
The "problem" with your data in all the ORDER BY values after row 10 are the same.
So if we shrink the data down a little, and use for groups to repeat the experiment:
with data(grp, date, val) as (
select * from values
(1,'2021-01-01'::date, 10),
(1,'2021-01-02'::date, 11),
(1,'2021-01-03'::date, 12),
(2,'2021-01-01'::date, 20),
(2,'2021-01-02'::date, 21),
(2,'2021-01-02'::date, 22),
(2,'2021-01-04'::date, 23)
)
select d.*
,sum(val) over ( partition by grp order by date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as cum_val_1
,sum(val) over ( partition by grp order by date ) as cum_val_2
from data as d
order by 1,2;
we get:
GRP
DATE
VAL
CUM_VAL_1
CUM_VAL_2
1
2021-01-01
10
10
10
1
2021-01-02
11
21
21
1
2021-01-03
12
33
33
2
2021-01-01
20
20
20
2
2021-01-02
21
63
63
2
2021-01-02
22
63
63
2
2021-01-04
23
86
86
we see with group 1 that values accumulate as we expect. So for group 2 we put duplicate values as see those rows get the same value, but rows after "work as expected again".
This tells us how this function work across unstable data (values that sort the same) is that they are all stepped in one leap.
Thus if you want each row to be different they will need better ORDER distinctness. This could be forced by add random values of literal random nature, or feeling non random ROW_NUMBER, but really they would be random, albeit not explicit, AND if you use random, you might get duplicates, thus really should use ROW_NUMBER or SEQx to have unique values.
Also the second formula shows they are equal, and it's the ORDER BY problem not the framing of "which rows" are used.
with data(grp, date, val) as (
select * from values
(1,'2021-01-01'::date, 10),
(1,'2021-01-02'::date, 11),
(1,'2021-01-03'::date, 12),
(2,'2021-01-01'::date, 20),
(2,'2021-01-02'::date, 21),
(2,'2021-01-02'::date, 22),
(2,'2021-01-04'::date, 23)
)
select d.*
,seq8() as s
,sum(val) over ( partition by grp order by date ) as cum_val_1
,sum(val) over ( partition by grp order by date, s ) as cum_val_2
,sum(val) over ( partition by grp order by date, seq8() ) as cum_val_3
from data as d
order by 1,2;
gives:
GRP
DATE
VAL S
CUM_VAL_1
CUM_VAL_2
CUM_VAL_2_2
1
2021-01-01
10
0
10
10
1
2021-01-02
11
1
21
21
1
2021-01-03
12
2
33
33
2
2021-01-01
20
3
20
20
2
2021-01-02
21
4
63
41
2
2021-01-02
22
5
63
63
2
2021-01-04
23
6
86
86

sum every 7 rows from column sales while ints representing n days away from installation of promotion-material (before and after the installation)

2 Stores, each with its sales data per day. Both get equipped with promotion material but not at the same day. After the pr_day the promotion material will stay there. Meaning, there should be a sales boost from the day of the installation of the promotion material.
Installation Date:
Store A - 05/15/2019
Store B - 05/17/2019
To see if the promotion was a success we measure the sales before the pr-date and after by returning number of sales (not revenue but pieces sold) next to the int, indicating how far away it was from the pr-day: (sum of sales from both stores)
pr_date| sales
-28 | 35
-27 | 40
-26 | 21
-25 | 36
-24 | 29
-23 | 36
-22 | 43
-21 | 31
-20 | 32
-19 | 21
-18 | 17
-17 | 34
-16 | 34
-15 | 37
-14 | 32
-13 | 29
-12 | 25
-11 | 45
-10 | 43
-9 | 26
-8 | 27
-7 | 33
-6 | 36
-5 | 17
-4 | 34
-3 | 33
-2 | 21
-1 | 28
1 | 16
2 | 6
3 | 16
4 | 29
5 | 32
6 | 30
7 | 30
8 | 30
9 | 17
10 | 12
11 | 35
12 | 30
13 | 15
14 | 28
15 | 14
16 | 16
17 | 13
18 | 27
19 | 22
20 | 34
21 | 33
22 | 22
23 | 13
24 | 35
25 | 28
26 | 19
27 | 17
28 | 29
you may noticed, that i already removed the day from the installation of the promotion material.
The issue starts with the different installation date of the pr-material. If I group by weekday it will combine the sales from different days away from the installation. It will just start at whatever weekday i define:
Select DATEDIFF(wk, change_date, sales_date), sum(sales)
from tbl_sales
group by DATEDIFF(wk, change_date, sales_date)
result:
week | sales
-4 | 75
-3 | 228
-2 | 204
-1 | 235
0 | 149
1 | 173
2 | 151
3 | 167
4 | 141
the numbers are not from the right days and there is one week to many. Guess this is comming from sql grouping the sales starting from Sunday and because the pr_dates are different it generates more than just the 8 weeks (4 before, 4 after)
trying to find a sustainable solution i couldn't find the right fit and decided to post it here. Very thankfull for every thoughts of the community about this topics. Quite sure there is a smart solution for this problem cause it doesn't look like a rare request to me
I tried it with over as well but i don't see how to sum the 7 days together as they are not date days anymore but delta to the pr-date
Desired Result:
week | sales
-4 | 240
-3 | 206
-2 | 227
-1 | 202
1 | 159
2 | 167
3 | 159
4 | 163
Attachment from my analysis by hand what the Results should be:
Why do i need the weekly summary -> the Stores are performing differently depending on the weekday. With summing 7 days together I make sure we don't compare mondays to sundays and so on. Furthermore, the result will be represented in a Line- or Barchart where you could see the weekday variation in a ugly way. Meaning it will be hard for your eyes to see the trend/devolopment of the salesnumbers. Whereas the weekly comparison will absorb this variations.
If anything is unclear please feel free to let me know so i could provide you with futher details
Thank you very much
Additional the different Installation date overview:
Shop A:
store A
delta date sales
-28 17.04.2019 20
-27 18.04.2019 20
-26 19.04.2019 13
-25 20.04.2019 25
-24 21.04.2019 16
-23 22.04.2019 20
-22 23.04.2019 26
-21 24.04.2019 15
-20 25.04.2019 20
-19 26.04.2019 13
-18 27.04.2019 13
-17 28.04.2019 20
-16 29.04.2019 21
-15 30.04.2019 20
-14 01.05.2019 17
-13 02.05.2019 13
-12 03.05.2019 9
-11 04.05.2019 34
-10 05.05.2019 28
-9 06.05.2019 19
-8 07.05.2019 14
-7 08.05.2019 23
-6 09.05.2019 18
-5 10.05.2019 9
-4 11.05.2019 22
-3 12.05.2019 17
-2 13.05.2019 14
-1 14.05.2019 19
0 15.05.2019 11
1 16.05.2019 0
2 17.05.2019 0
3 18.05.2019 1
4 19.05.2019 19
5 20.05.2019 18
6 21.05.2019 14
7 22.05.2019 11
8 23.05.2019 12
9 24.05.2019 8
10 25.05.2019 7
11 26.05.2019 19
12 27.05.2019 15
13 28.05.2019 15
14 29.05.2019 11
15 30.05.2019 5
16 31.05.2019 8
17 01.06.2019 10
18 02.06.2019 19
19 03.06.2019 14
20 04.06.2019 21
21 05.06.2019 22
22 06.06.2019 7
23 07.06.2019 6
24 08.06.2019 23
25 09.06.2019 17
26 10.06.2019 9
27 11.06.2019 8
28 12.06.2019 23
Shop B:
store B
delta date sales
-28 19.04.2019 15
-27 20.04.2019 20
-26 21.04.2019 8
-25 22.04.2019 11
-24 23.04.2019 13
-23 24.04.2019 16
-22 25.04.2019 17
-21 26.04.2019 16
-20 27.04.2019 12
-19 28.04.2019 8
-18 29.04.2019 4
-17 30.04.2019 14
-16 01.05.2019 13
-15 02.05.2019 17
-14 03.05.2019 15
-13 04.05.2019 16
-12 05.05.2019 16
-11 06.05.2019 11
-10 07.05.2019 15
-9 08.05.2019 7
-8 09.05.2019 13
-7 10.05.2019 10
-6 11.05.2019 18
-5 12.05.2019 8
-4 13.05.2019 12
-3 14.05.2019 16
-2 15.05.2019 7
-1 16.05.2019 9
0 17.05.2019 9
1 18.05.2019 16
2 19.05.2019 6
3 20.05.2019 15
4 21.05.2019 10
5 22.05.2019 14
6 23.05.2019 16
7 24.05.2019 19
8 25.05.2019 18
9 26.05.2019 9
10 27.05.2019 5
11 28.05.2019 16
12 29.05.2019 15
13 30.05.2019 17
14 31.05.2019 9
15 01.06.2019 8
16 02.06.2019 3
17 03.06.2019 8
18 04.06.2019 8
19 05.06.2019 13
20 06.06.2019 11
21 07.06.2019 15
22 08.06.2019 7
23 09.06.2019 12
24 10.06.2019 11
25 11.06.2019 10
26 12.06.2019 9
27 13.06.2019 6
28 14.06.2019 9
Try
select wk, sum(sales)
from (
select
isnull(sa.sales,0) + isnull(sb.sales,0) sales
, isnull(sa.delta , sb.delta) delta
, case when isnull(sa.delta , sb.delta) = 0 then 0
else case when isnull(sa.delta , sb.delta) > 0 then (isnull(sa.delta , sb.delta) -1) /7 +1
else (isnull(sa.delta , sb.delta) +1) /7 -1
end
end wk
from shopA sa
full join shopB sb on sa.delta=sb.delta
) t
group by wk;
sql fiddle
A more readable version, it doesn't run faster, just using CROSS APLLY this way allows to indroduce sort of intermediate variables for cleaner code.
select wk, sum(sales)
from (
select
isnull(sa.sales,0) + isnull(sb.sales,0) sales
, dlt delta
, case when dlt = 0 then 0
else case when dlt > 0 then (dlt - 1) / 7 + 1
else (dlt + 1) / 7 - 1
end
end wk
from shopA sa
full join shopB sb on sa.delta=sb.delta
cross apply (
select dlt = isnull(sa.delta, sb.delta)
) tmp
) t
group by wk;
Finally, if you already have a query which produces a dataset with the (pr_date, sales) columns
select wk, sum(sales)
from (
select sales
, case when pr_date = 0 then 0
else case when pr_date > 0 then (pr_date - 1) / 7 + 1
else (pr_date + 1) / 7 - 1
end
end wk
from (
-- ... you query here ...
)pr_date_sales
) t
group by wk;
I think you just need to take the day difference and use arithmetic. Using datediff() with week counts week-boundaries -- which is not what you want. That is, it normalizes the weeks to calendar weeks.
You want to leave out the day of the promotion, which makes this a wee bit more complicated.
I think this is the logic:
Select v.week_diff, sum(sales)
from tbl_sales s cross join
(values (case when change_date < sales_date
then (datediff(day, change_date, sales_date) + 1) / 7
else (datediff(day, change_date, sales_date) - 1) / 7
end)
) v(week_diff)
where change_date <> sales_date
group by v.week_diff;
There might be an off-by-one problem, depending on what you really want to do when the dates are the same.

Aggregate result from query by quarter SQL

Lets say I have a table which holds all exports for some time back in Microsoft SQL database:
Name:
ExportTable
Columns:
id - numeric(18)
exportdate - datetime
In order to get the number of exports per week I can run the following query:
SELECT DATEPART(ISO_WEEK,[exportdate]) as 'exportdate', count(exportdate) as 'totalExports'
FROM [ExportTable]
Group By DATEPART(ISO_WEEK,[exportdate])
order by exportdate;
Returns:
exportdate totalExports
---------- ------------
27 13
28 12
29 15
30 8
31 17
32 10
33 7
34 15
35 4
36 18
37 10
38 14
39 14
40 21
41 19
Would it be possible to aggregate the week results by quarter so the output becomes something like the bellow?
UPDATE
Sorry for not being crystal clear, I would like the current result to add upp with previous result up to a new quarter.
Note week 41 contains 21+19 = 40
Week 39 contains 157 (13+12+15+8+17+10+7+15+4+18+10+14+14)
exportdate totalExports Quarter
---------- ------------ -------
27 13 3
28 25 3
29 40 3
30 48 3
31 65 3
32 75 3
33 82 3
34 97 3
35 101 3
36 119 3
37 129 3
38 143 3
39 157 3 -- Sum of 3 Quarter values.
40 21 4 -- New Quarter show current week value
41 40 4 -- (21+19)
You can use this.
SELECT
DATEPART(ISO_WEEK,[exportdate]) as 'exportdate'
, SUM( count(exportdate) ) OVER ( PARTITION BY DATEPART(QUARTER,MIN([exportdate])) ORDER BY DATEPART(ISO_WEEK,[exportdate]) ROWS UNBOUNDED PRECEDING ) as 'totalExports'
, DATEPART(QUARTER,MIN([exportdate])) [Quarter]
FROM [ExportTable]
Group By DATEPART(ISO_WEEK,[exportdate])
order by exportdate;
You could use a case statement to separate the dates into quarters.
e.g.
CASE
WHEN EXPORT_DATE BETWEEN '1' AND '4' THEN 1
WHEN Export_Date BETWEEN '5' and '9' THEN 2
ELSE 0 AS [Quarter]
END
Its just an example but you get the idea.
You could then use the alias from the case
SELECT DATEPART(ISO_WEEK,[exportdate]) as 'exportdate', count(exportdate) as 'totalExports', DATEPART(quarter,[exportdate]) as quarter FROM [ExportTable] Group By DATEPART(ISO_WEEK,[exportdate]), DATEPART(quarter,[exportdate]) order by exportdate;

How to calculate a running total that is a distinct sum of values

Consider this dataset:
id site_id type_id value date
------- ------- ------- ------- -------------------
1 1 1 50 2017-08-09 06:49:47
2 1 2 48 2017-08-10 08:19:49
3 1 1 52 2017-08-11 06:15:00
4 1 1 45 2017-08-12 10:39:47
5 1 2 40 2017-08-14 10:33:00
6 2 1 30 2017-08-09 07:25:32
7 2 2 32 2017-08-12 04:11:05
8 3 1 80 2017-08-09 19:55:12
9 3 2 75 2017-08-13 02:54:47
10 2 1 25 2017-08-15 10:00:05
I would like to construct a query that returns a running total for each date by type. I can get close with a window function, but I only want the latest value for each site to be summed for the running total (a simple window function will not work because it sums all values up to a date--not just the last values for each site). So I guess it could be better described as a running distinct total?
The result I'm looking for would be like this:
type_id date sum
------- ------------------- -------
1 2017-08-09 06:49:47 50
1 2017-08-09 07:25:32 80
1 2017-08-09 19:55:12 160
1 2017-08-11 06:15:00 162
1 2017-08-12 10:39:47 155
1 2017-08-15 10:00:05 150
2 2017-08-10 08:19:49 48
2 2017-08-12 04:11:05 80
2 2017-08-13 02:54:47 155
2 2017-08-14 10:33:00 147
The key here is that the sum is not a running sum. It should only be the sum of the most recent values for each site, by type, at each date. I think I can help explain it by walking through the result set I've provided above. For my explanation, I'll walk through the original data chronologically and try to explain the expected result.
The first row of the result starts us off, at 2017-08-09 06:49:47, where chronologically, there is only one record of type 1 and it is 50, so that is our sum for 2017-08-09 06:49:47.
The second row of the result is at 2017-08-09 07:25:32, at this point in time we have 2 unique sites with values for type_id = 1. They have values of 50 and 30, so the sum is 80.
The third row of the result occurs at 2017-08-09 19:55:12, where now we have 3 sites with values for type_id = 1. 50 + 30 + 80 = 160.
The fourth row is where it gets interesting. At 2017-08-11 06:15:00 there are 4 records with a type_id = 1, but 2 of them are for the same site. I'm only interested in the most recent value for each site so the values I'd like to sum are: 30 + 80 + 52 resulting in 162.
The 5th row is similar to the 4th since the value for site_id:1, type_id:1 has changed again and is now 45. This results in the latest values for type_id:1 at 2017-08-12 10:39:47 are now: 30 + 80 + 45 = 155.
Reviewing the 6th row is also interesting when we consider that at 2017-08-15 10:00:05, site 2 has a new value for type_id 1, which gives us: 80 + 45 + 25 = 150 for 2017-08-15 10:00:05.
You can get a cumulative total (running total) by including an ORDER BY clause in your window frame.
select
type_id,
date,
sum(value) over (partition by type_id order by date) as sum
from your_table;
The ORDER BY works because
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
SELECT type_id,
date,
SUM(value) OVER (PARTITION BY type_id ORDER BY type_id, date) - (SUM(value) OVER (PARTITION BY type_id, site_id ORDER BY type_id, date) - value) AS sum
FROM your_table
ORDER BY type_id,
date

Difference between rows in sql based on rows number (SQL Server)

I have a problem. I have table with following columns and sample data:
RN Date Time
---------------------
1 2015-02-02 12
2 2015-02-02 25
3 2015-02-02 27
1 2015-02-08 42
2 2015-02-08 45
1 2015-03-01 60
2 2015-03-01 62
3 2015-03-01 63
4 2015-03-01 63
I need get a difference between time start and time end of every day.
For example:
27-12
45-42
63-60
Any suggestions? :)
select
Date, max(Time) as mx, min(Time) as mn, max(Time) - min(Time) as diff
from table_name
group by Date