yearly average from monthly daterange data - sql

I have the following table in PostgreSQL:
Value period
1 [2017-01-01,2017-02-01)
2 [2017-02-01,2017-03-01)
3 [2017-03-01,2017-04-01)
4 [2017-04-01,2017-05-01)
5 [2017-05-01,2017-06-01)
6 [2017-06-01,2017-07-01)
7 [2017-07-01,2017-08-01)
8 [2017-08-01,2017-09-01)
9 [2017-09-01,2017-10-01)
10 [2017-10-01,2017-11-01)
11 [2017-11-01,2017-12-01)
12 [2017-12-01,2018-01-01)
13 [2018-01-01,2018-02-01)
14 [2018-02-01,2018-03-01)
15 [2018-03-01,2018-04-01)
16 [2018-04-01,2018-05-01)
17 [2018-05-01,2018-06-01)
18 [2018-06-01,2018-07-01)
19 [2018-07-01,2018-08-01)
20 [2018-08-01,2018-09-01)
21 [2018-09-01,2018-10-01)
22 [2018-10-01,2018-11-01)
23 [2018-11-01,2018-12-01)
24 [2018-12-01,2019-01-01)
25 [2019-01-01,2019-02-01)
26 [2019-02-01,2019-03-01)
27 [2019-03-01,2019-04-01)
28 [2019-04-01,2019-05-01)
29 [2019-05-01,2019-06-01)
30 [2019-06-01,2019-07-01)
31 [2019-07-01,2019-08-01)
32 [2019-08-01,2019-09-01)
33 [2019-09-01,2019-10-01)
34 [2019-10-01,2019-11-01)
35 [2019-11-01,2019-12-01)
36 [2019-12-01,2020-01-01)
37 [2020-01-01,2020-02-01)
38 [2020-02-01,2020-03-01)
39 [2020-03-01,2020-04-01)
40 [2020-04-01,2020-05-01)
41 [2020-05-01,2020-06-01)
42 [2020-06-01,2020-07-01)
How can I get the yearly average from this monthly data in PostgreSQL?
Note: column Value is of type integer and column period is of type daterange.
The expected result is:
6.5 2017
18.5 2018
30.5 2019
39.5 2020

If each period always covers exactly one month (lower bound inclusive, upper bound exclusive), you can group by the year of the lower bound:
select
avg(value * 1.0) as average,
extract(year from lower(period)) as year
from your_table -- placeholder name; "table" itself is a reserved word
group by year
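The grouping step of that query can also be sketched outside the database. The following Python snippet is only an illustration: it assumes the sample rows are rebuilt as (value, lower bound) pairs, and the names `rows` and `yearly_average` are made up for this example.

```python
from collections import defaultdict
from datetime import date

# Rebuild the sample data: value n belongs to the n-th month, starting at 2017-01.
rows = [(n, date(2017 + (n - 1) // 12, (n - 1) % 12 + 1, 1)) for n in range(1, 43)]

def yearly_average(rows):
    """Group values by the year of the period's lower bound and average them."""
    buckets = defaultdict(list)
    for value, lower in rows:
        buckets[lower.year].append(value)
    return {year: sum(vals) / len(vals) for year, vals in sorted(buckets.items())}

for year, avg in yearly_average(rows).items():
    print(avg, year)
```

With the 42 sample rows this gives the averages listed in the question (6.5, 18.5, 30.5 and 39.5).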

Related

T-SQL creating a hierarchy out of orderly numbers

I have such table:
Id code
1 10
2 11
3 20
4 21
5 30
6 31
7 32
8 40
9 10
10 11
11 20
12 21
13 30
14 31
15 32
16 40
17 20
18 21
19 30
20 31
21 32
22 40
23 20
24 21
25 30
26 31
27 32
28 40
29 20
30 21
31 30
32 31
33 32
34 40
35 20
36 21
37 30
38 31
39 32
40 40
41 41
42 90
The column Id simply represents the order of the records.
The column code represents the type of record.
The problem is that the records are part of a hierarchy, as shown here:
What I need to obtain is the parent of every record:
Id code Parent
1 10 1
2 11 1
3 20 1
4 21 3
5 30 3
6 31 3
7 32 3
8 40 3
9 10 9
10 11 9
11 20 9
12 21 11
13 30 11
14 31 11
15 32 11
16 40 11
17 20 9
18 21 17
19 30 17
20 31 17
21 32 17
22 40 17
23 20 9
24 21 23
25 30 23
26 31 23
27 32 23
28 40 23
29 20 9
30 21 29
31 30 29
32 31 29
33 32 29
34 40 29
35 20 9
36 21 35
37 30 35
38 31 35
39 32 35
40 40 35
41 41 40
42 90 42
The parent of every record should be expressed as its Id.
The rules are:
10s are their own parents, since they are the roots
90s are their own parents, since they mark the end of the data
the parent of a 20 is the previous 10
the parent of a 21, 30, 31, 32 or 33 is the previous 20
the parent of a 40 or 50 is the previous 20
the parent of a 41 is the previous 40
As you can see, the order of the records is very important.
I tried to solve this declaratively (with lag() etc.) and imperatively with loops, but I could not find a solution.
Please help
This should work. Performance is probably not optimal, but it's pretty clear what it's doing, so it should be easy to modify if (when!) your hierarchy changes.
It can obviously produce NULLs if your hierarchy or ordering is not as you have described.
CREATE TABLE #data(id INT, code INT);
INSERT INTO #data values
(1 , 10),(2 , 11),(3 , 20),(4 , 21),(5 , 30),(6 , 31),(7 , 32),(8 , 40),(9 , 10),(10 , 11),
(11 , 20),(12 , 21),(13 , 30),(14 , 31),(15 , 32),(16 , 40),(17 , 20),(18 , 21),(19 , 30),(20 , 31),
(21 , 32),(22 , 40),(23 , 20),(24 , 21),(25 , 30),(26 , 31),(27 , 32),(28 , 40),(29 , 20),(30 , 21),
(31 , 30),(32 , 31),(33 , 32),(34 , 40),(35 , 20),(36 , 21),(37 , 30),(38 , 31),(39 , 32),(40 , 40),
(41 , 41),(42 , 90);
WITH
tens AS (SELECT id FROM #data WHERE code = 10),
twenties AS (SELECT id FROM #data WHERE code = 20),
forties AS (SELECT id FROM #data WHERE code = 40)
SELECT #data.id,
#data.code,
CASE WHEN code IN (10,90) THEN #data.id
WHEN code IN (11,20) THEN prev_ten.id
WHEN code IN (21,30,31,32,33,40,50) THEN prev_twenty.id
WHEN code = 41 THEN prev_forty.id
ELSE NULL
END AS Parent
FROM #data
OUTER APPLY (SELECT TOP (1) id FROM tens WHERE tens.id < #data.id ORDER BY tens.id DESC) AS prev_ten
OUTER APPLY (SELECT TOP (1) id FROM twenties WHERE twenties.id < #data.id ORDER BY twenties.id DESC) AS prev_twenty
OUTER APPLY (SELECT TOP (1) id FROM forties WHERE forties.id < #data.id ORDER BY forties.id DESC) AS prev_forty;
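The "most recent earlier record with code X" lookup that the OUTER APPLY clauses perform can be sketched as a single pass over the ordered rows. This Python version is only an illustration of that idea, not a replacement for the T-SQL; `assign_parents` and `last_seen` are names invented here, and the code-to-parent mapping is copied from the CASE expression above.

```python
def assign_parents(rows):
    """rows: (id, code) pairs in id order; returns (id, code, parent) triples."""
    last_seen = {}  # code -> id of the most recent earlier record with that code
    result = []
    for rec_id, code in rows:
        if code in (10, 90):
            parent = rec_id                      # roots and end markers
        elif code in (11, 20):
            parent = last_seen.get(10)           # previous 10
        elif code in (21, 30, 31, 32, 33, 40, 50):
            parent = last_seen.get(20)           # previous 20
        elif code == 41:
            parent = last_seen.get(40)           # previous 40
        else:
            parent = None                        # unknown code, like ELSE NULL
        result.append((rec_id, code, parent))
        last_seen[code] = rec_id  # stored after the lookup, so "previous" is strict
    return result

# Sample data from the question: ids 1..42 in order.
codes = [10, 11, 20, 21, 30, 31, 32, 40, 10, 11] + [20, 21, 30, 31, 32, 40] * 5 + [41, 90]
rows = list(enumerate(codes, start=1))
for row in assign_parents(rows):
    print(*row)
```

Like the SQL, it yields None (NULL) when the expected ancestor has not appeared yet.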
I think you should add a parentId column with a FOREIGN KEY referencing Id to the existing table, fill this new column with an UPDATE (or with data from an external source), and then run SELECT * FROM tableName ORDER BY parentId to get the tree structure.

Display rows where multiple columns are different

I have data that looks like this. Thousands of rows returned, but this is just a sample.
Most days have the same numbers in them, but some do not. Note that ID 1 and 5 have identical numbers every day.
ID Sunday Monday Tuesday Wednesday Thursday Friday Saturday
1 26 26 26 26 26 26 26
2 44 44 30 30 44 44 44
3 55 55 55 55 80 90 55
4 12 12 43 43 43 43 43
5 36 36 36 36 36 36 36
I'd like to only return rows where the days of the week have different numbers.
In this case, the only IDs returned should be 2, 3 & 4.
What would I want this query to look like?
Thanks!
One idea that should work in most RDBMSs (with some syntax tweaks) is the following.
This version is SQL Server compatible: unpivot the day columns into rows, count the distinct values, and filter accordingly:
select id
from t
cross apply (
select Count(distinct d) from (
values(sunday),(monday),(tuesday),(wednesday),(thursday),(friday),(saturday)
)d(d)
)d(v)
where d.v>1
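The same "count the distinct values per row" test can be sketched in a few lines of Python. This is just an illustration using the sample rows from the question; `ids_with_differences` is a made-up name.

```python
# Sample rows from the question: id -> values for Sunday..Saturday.
rows = {
    1: [26, 26, 26, 26, 26, 26, 26],
    2: [44, 44, 30, 30, 44, 44, 44],
    3: [55, 55, 55, 55, 80, 90, 55],
    4: [12, 12, 43, 43, 43, 43, 43],
    5: [36, 36, 36, 36, 36, 36, 36],
}

def ids_with_differences(rows):
    """Return the ids whose seven day values are not all identical."""
    # Note: unlike COUNT(DISTINCT ...) in SQL, a Python set would count None
    # as a value of its own, so missing values would need separate handling.
    return [rec_id for rec_id, days in rows.items() if len(set(days)) > 1]

print(ids_with_differences(rows))
```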

Hive Summing up data in the table based on the date range

I have a table with the following schema and data:
ID HITS MISS DDATE
1 10 3 20180101
1 33 21 20180122
1 84 11 20180901
1 11 2 20180405
1 54 23 20190203
1 33 43 20190102
4 54 22 20170305
4 56 88 20180115
5 87 22 20180809
5 66 48 20180617
5 91 53 20170606
DataTypes:
ID INT
HITS INT
MISS INT
DDATE STRING
The requirement is to calculate the totals of HITS and MISS per year (2017, 2018, 2019, ...).
I wrote the following query:
SELECT ID,
SUM(HITS) AS HITS,SUM(MISS) AS MISS,
CASE
WHEN DDATE BETWEEN '201701' AND '201712' THEN '2017' ELSE
'NOTHING' END AS TTL_YR17_DATA
CASE
WHEN DDATE BETWEEN '201801' AND '201812' THEN '2018' ELSE
'NOTHING' END AS TTL_YR18_DATA
CASE
WHEN DDATE BETWEEN '201901' AND '201912' THEN '2019' ELSE
'NOTHING' END AS TTL_YR19_DATA
FROM
HST_TABLE
WHERE
DDATE BETWEEN '201801' AND '201812'
GROUP BY
ID,DDATE;
But, the query is not fetching the expected result.
Actual O/P:
1 10 3 2018
1 33 21 2018
1 84 11 2018
1 11 2 2018
1 54 23 2019
1 33 43 2019
4 54 22 2017
4 56 88 2018
5 87 22 2018
5 66 48 2018
5 91 53 2017
Expected O/P:
1 138 37 2018
4 56 88 2018
5 153 70 2018
1 87 66 2019
5 91 53 2017
Another related question:
Is there a way that I can avoid passing the DDATE range in the query? As this should be given by the user and shouldn't be hardcoded.
Any help/advice to achieve the above two requirements will be really helpful.
This is easy to implement with the substring function in Hive, as below:
select
substring(dddate,0,4) as the_year,
id,
sum(hits) as hits_num,
sum(miss) as miss_num
from
hst_table
group by
substring(dddate,0,4),
id
order by
the_year,
id
The answer above by @Shawn.X has the right idea, but it misspells the column name (dddate instead of ddate). Below is the corrected version:
select
substring(ddate,0,4) as the_year,
id,
sum(hits) as hits_num,
sum(miss) as miss_num
from
hst_table
group by
substring(ddate,0,4),
id
order by
the_year,
id;
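To check the grouping against the expected output, here is a small Python sketch of the same idea: take the first four characters of DDATE as the year and sum HITS and MISS per (year, ID). The names `data` and `yearly_totals` are assumptions made for this example only.

```python
from collections import defaultdict

# Sample rows from the question: (ID, HITS, MISS, DDATE).
data = [
    (1, 10, 3, "20180101"), (1, 33, 21, "20180122"), (1, 84, 11, "20180901"),
    (1, 11, 2, "20180405"), (1, 54, 23, "20190203"), (1, 33, 43, "20190102"),
    (4, 54, 22, "20170305"), (4, 56, 88, "20180115"),
    (5, 87, 22, "20180809"), (5, 66, 48, "20180617"), (5, 91, 53, "20170606"),
]

def yearly_totals(rows):
    """Sum hits and misses per (year, id), with the year taken from DDATE[:4]."""
    totals = defaultdict(lambda: [0, 0])
    for rec_id, hits, miss, ddate in rows:
        key = (ddate[:4], rec_id)
        totals[key][0] += hits
        totals[key][1] += miss
    return {key: tuple(vals) for key, vals in sorted(totals.items())}

for (year, rec_id), (hits, miss) in yearly_totals(data).items():
    print(year, rec_id, hits, miss)
```

Because the year is derived from the data itself, no date range has to be hardcoded; a user-supplied year could simply be applied as a filter on the result.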

sum every 7 rows from column sales while ints representing n days away from installation of promotion-material (before and after the installation)

There are two stores, each with daily sales data. Both get equipped with promotion material, but not on the same day. The material stays in place after the pr_day, so there should be a sales boost from the installation day onward.
Installation Date:
Store A - 05/15/2019
Store B - 05/17/2019
To see whether the promotion was a success, we measure sales before and after the pr-date. Each row shows the number of pieces sold (not revenue), summed over both stores, next to an integer giving the number of days from the pr-day:
pr_date| sales
-28 | 35
-27 | 40
-26 | 21
-25 | 36
-24 | 29
-23 | 36
-22 | 43
-21 | 31
-20 | 32
-19 | 21
-18 | 17
-17 | 34
-16 | 34
-15 | 37
-14 | 32
-13 | 29
-12 | 25
-11 | 45
-10 | 43
-9 | 26
-8 | 27
-7 | 33
-6 | 36
-5 | 17
-4 | 34
-3 | 33
-2 | 21
-1 | 28
1 | 16
2 | 6
3 | 16
4 | 29
5 | 32
6 | 30
7 | 30
8 | 30
9 | 17
10 | 12
11 | 35
12 | 30
13 | 15
14 | 28
15 | 14
16 | 16
17 | 13
18 | 27
19 | 22
20 | 34
21 | 33
22 | 22
23 | 13
24 | 35
25 | 28
26 | 19
27 | 17
28 | 29
You may have noticed that I already removed the installation day itself.
The issue starts with the different installation dates of the pr-material. If I group by calendar week, it combines sales from different offsets relative to the installation, because the weeks simply start on whatever weekday is defined:
Select DATEDIFF(wk, change_date, sales_date), sum(sales)
from tbl_sales
group by DATEDIFF(wk, change_date, sales_date)
result:
week | sales
-4 | 75
-3 | 228
-2 | 204
-1 | 235
0 | 149
1 | 173
2 | 151
3 | 167
4 | 141
The numbers are not from the right days, and there is one week too many. I guess this comes from SQL grouping the sales into weeks starting on Sunday; because the pr_dates differ, it generates more than just the 8 weeks (4 before, 4 after).
I could not find a sustainable solution, so I decided to post here. I would be very thankful for any thoughts from the community. I am quite sure there is a smart solution, because this doesn't look like a rare request to me.
I also tried OVER, but I don't see how to sum the 7 days together, since they are no longer calendar dates but offsets from the pr-date.
Desired Result:
week | sales
-4 | 240
-3 | 206
-2 | 227
-1 | 202
1 | 159
2 | 167
3 | 159
4 | 163
Attached is my manual analysis of what the results should be:
Why do I need the weekly summary? The stores perform differently depending on the weekday. Summing 7 days together makes sure we don't compare Mondays to Sundays and so on. Furthermore, the result will be shown in a line or bar chart, where the weekday variation would look ugly and make the trend in the sales numbers hard to see, whereas the weekly comparison absorbs these variations.
If anything is unclear, please let me know so I can provide further details.
Thank you very much
Additionally, here is an overview per store with the different installation dates:
Shop A:
store A
delta date sales
-28 17.04.2019 20
-27 18.04.2019 20
-26 19.04.2019 13
-25 20.04.2019 25
-24 21.04.2019 16
-23 22.04.2019 20
-22 23.04.2019 26
-21 24.04.2019 15
-20 25.04.2019 20
-19 26.04.2019 13
-18 27.04.2019 13
-17 28.04.2019 20
-16 29.04.2019 21
-15 30.04.2019 20
-14 01.05.2019 17
-13 02.05.2019 13
-12 03.05.2019 9
-11 04.05.2019 34
-10 05.05.2019 28
-9 06.05.2019 19
-8 07.05.2019 14
-7 08.05.2019 23
-6 09.05.2019 18
-5 10.05.2019 9
-4 11.05.2019 22
-3 12.05.2019 17
-2 13.05.2019 14
-1 14.05.2019 19
0 15.05.2019 11
1 16.05.2019 0
2 17.05.2019 0
3 18.05.2019 1
4 19.05.2019 19
5 20.05.2019 18
6 21.05.2019 14
7 22.05.2019 11
8 23.05.2019 12
9 24.05.2019 8
10 25.05.2019 7
11 26.05.2019 19
12 27.05.2019 15
13 28.05.2019 15
14 29.05.2019 11
15 30.05.2019 5
16 31.05.2019 8
17 01.06.2019 10
18 02.06.2019 19
19 03.06.2019 14
20 04.06.2019 21
21 05.06.2019 22
22 06.06.2019 7
23 07.06.2019 6
24 08.06.2019 23
25 09.06.2019 17
26 10.06.2019 9
27 11.06.2019 8
28 12.06.2019 23
Shop B:
store B
delta date sales
-28 19.04.2019 15
-27 20.04.2019 20
-26 21.04.2019 8
-25 22.04.2019 11
-24 23.04.2019 13
-23 24.04.2019 16
-22 25.04.2019 17
-21 26.04.2019 16
-20 27.04.2019 12
-19 28.04.2019 8
-18 29.04.2019 4
-17 30.04.2019 14
-16 01.05.2019 13
-15 02.05.2019 17
-14 03.05.2019 15
-13 04.05.2019 16
-12 05.05.2019 16
-11 06.05.2019 11
-10 07.05.2019 15
-9 08.05.2019 7
-8 09.05.2019 13
-7 10.05.2019 10
-6 11.05.2019 18
-5 12.05.2019 8
-4 13.05.2019 12
-3 14.05.2019 16
-2 15.05.2019 7
-1 16.05.2019 9
0 17.05.2019 9
1 18.05.2019 16
2 19.05.2019 6
3 20.05.2019 15
4 21.05.2019 10
5 22.05.2019 14
6 23.05.2019 16
7 24.05.2019 19
8 25.05.2019 18
9 26.05.2019 9
10 27.05.2019 5
11 28.05.2019 16
12 29.05.2019 15
13 30.05.2019 17
14 31.05.2019 9
15 01.06.2019 8
16 02.06.2019 3
17 03.06.2019 8
18 04.06.2019 8
19 05.06.2019 13
20 06.06.2019 11
21 07.06.2019 15
22 08.06.2019 7
23 09.06.2019 12
24 10.06.2019 11
25 11.06.2019 10
26 12.06.2019 9
27 13.06.2019 6
28 14.06.2019 9
Try
select wk, sum(sales)
from (
select
isnull(sa.sales,0) + isnull(sb.sales,0) sales
, isnull(sa.delta , sb.delta) delta
, case when isnull(sa.delta , sb.delta) = 0 then 0
else case when isnull(sa.delta , sb.delta) > 0 then (isnull(sa.delta , sb.delta) -1) /7 +1
else (isnull(sa.delta , sb.delta) +1) /7 -1
end
end wk
from shopA sa
full join shopB sb on sa.delta=sb.delta
) t
group by wk;
Here is a more readable version. It doesn't run faster; using CROSS APPLY this way just lets you introduce a kind of intermediate variable for cleaner code.
select wk, sum(sales)
from (
select
isnull(sa.sales,0) + isnull(sb.sales,0) sales
, dlt delta
, case when dlt = 0 then 0
else case when dlt > 0 then (dlt - 1) / 7 + 1
else (dlt + 1) / 7 - 1
end
end wk
from shopA sa
full join shopB sb on sa.delta=sb.delta
cross apply (
select dlt = isnull(sa.delta, sb.delta)
) tmp
) t
group by wk;
Finally, if you already have a query that produces a dataset with the (pr_date, sales) columns, you can wrap it like this:
select wk, sum(sales)
from (
select sales
, case when pr_date = 0 then 0
else case when pr_date > 0 then (pr_date - 1) / 7 + 1
else (pr_date + 1) / 7 - 1
end
end wk
from (
-- ... your query here ...
)pr_date_sales
) t
group by wk;
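The week arithmetic in these queries can be checked against the desired result with a short Python sketch. It is an illustration only: `week_bucket` is a name invented here, and because Python's `//` floors while T-SQL integer division truncates toward zero, the negative branch is written on the absolute value.

```python
from collections import defaultdict

def week_bucket(delta):
    """Map a day offset to a 7-day bucket: 1..7 -> 1, 8..14 -> 2, -1..-7 -> -1, ..."""
    if delta == 0:
        return 0  # the installation day itself
    if delta > 0:
        return (delta - 1) // 7 + 1
    return -((-delta - 1) // 7 + 1)

# Combined sales of both stores from the question, for offsets -28..-1 and 1..28.
sales = [35, 40, 21, 36, 29, 36, 43, 31, 32, 21, 17, 34, 34, 37,
         32, 29, 25, 45, 43, 26, 27, 33, 36, 17, 34, 33, 21, 28,
         16, 6, 16, 29, 32, 30, 30, 30, 17, 12, 35, 30, 15, 28,
         14, 16, 13, 27, 22, 34, 33, 22, 13, 35, 28, 19, 17, 29]
deltas = list(range(-28, 0)) + list(range(1, 29))

weekly = defaultdict(int)
for delta, amount in zip(deltas, sales):
    weekly[week_bucket(delta)] += amount

for wk in sorted(weekly):
    print(wk, weekly[wk])
```

Applied to the sample data, the buckets reproduce the desired week totals from the question (240, 206, 227, 202 before and 159, 167, 159, 163 after).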
I think you just need to take the day difference and use arithmetic. Using datediff() with week counts week-boundaries -- which is not what you want. That is, it normalizes the weeks to calendar weeks.
You want to leave out the day of the promotion, which makes this a wee bit more complicated.
I think this is the logic:
Select v.week_diff, sum(sales)
from tbl_sales s cross apply -- APPLY is needed because the values clause references columns of s
(values (case when change_date < sales_date
then (datediff(day, change_date, sales_date) - 1) / 7 + 1 -- days 1..7 after -> week 1
else (datediff(day, change_date, sales_date) + 1) / 7 - 1 -- days 1..7 before -> week -1
end)
) v(week_diff)
where change_date <> sales_date
group by v.week_diff;
There might be an off-by-one problem, depending on what you really want to do when the dates are the same.
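To see why counting week boundaries differs from counting 7-day blocks, here is a small Python sketch. It assumes that DATEDIFF with the week datepart counts how many Sunday-started week boundaries lie between the two dates; `datediff_week` is a made-up helper that imitates that behaviour and is not part of any library.

```python
from datetime import date

def datediff_week(start, end):
    """Count Sunday-started week boundaries crossed going from start to end."""
    # date.toordinal() is a multiple of 7 exactly on Sundays, so integer
    # division by 7 yields a week index whose weeks begin on Sunday.
    return end.toordinal() // 7 - start.toordinal() // 7

change = date(2019, 5, 15)  # installation date of store A, a Wednesday
print(datediff_week(change, date(2019, 5, 18)))  # Saturday, 3 days later
print(datediff_week(change, date(2019, 5, 19)))  # Sunday, 4 days later
```

Under that assumption the two sales dates, only one day apart and both within the first 7 days after the installation, land in different "weeks", which is the effect described above.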

Aggregate result from query by quarter SQL

Let's say I have a table in a Microsoft SQL Server database that holds all exports going some time back:
Name:
ExportTable
Columns:
id - numeric(18)
exportdate - datetime
In order to get the number of exports per week I can run the following query:
SELECT DATEPART(ISO_WEEK,[exportdate]) as 'exportdate', count(exportdate) as 'totalExports'
FROM [ExportTable]
Group By DATEPART(ISO_WEEK,[exportdate])
order by exportdate;
Returns:
exportdate totalExports
---------- ------------
27 13
28 12
29 15
30 8
31 17
32 10
33 7
34 15
35 4
36 18
37 10
38 14
39 14
40 21
41 19
Would it be possible to aggregate the weekly results by quarter so that the output looks something like the table below?
UPDATE
Sorry for not being crystal clear: I would like each week's result to be added to the previous results, restarting at each new quarter.
Note that week 41 contains 21+19 = 40.
Week 39 contains 157 (13+12+15+8+17+10+7+15+4+18+10+14+14).
exportdate totalExports Quarter
---------- ------------ -------
27 13 3
28 25 3
29 40 3
30 48 3
31 65 3
32 75 3
33 82 3
34 97 3
35 101 3
36 119 3
37 129 3
38 143 3
39 157 3 -- Sum of 3 Quarter values.
40 21 4 -- New Quarter show current week value
41 40 4 -- (21+19)
You can use this.
SELECT
DATEPART(ISO_WEEK,[exportdate]) as 'exportdate'
, SUM( count(exportdate) ) OVER ( PARTITION BY DATEPART(QUARTER,MIN([exportdate])) ORDER BY DATEPART(ISO_WEEK,[exportdate]) ROWS UNBOUNDED PRECEDING ) as 'totalExports'
, DATEPART(QUARTER,MIN([exportdate])) [Quarter]
FROM [ExportTable]
Group By DATEPART(ISO_WEEK,[exportdate])
order by exportdate;
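What the windowed SUM does here is a running total that restarts for each quarter. A minimal Python sketch of that behaviour follows, using the weekly counts from the question. The week-to-quarter mapping in `quarter_of_week` is an assumption taken from the expected output (weeks 27 to 39 in quarter 3, later weeks in quarter 4), not a general rule.

```python
from itertools import accumulate, groupby

# Weekly counts from the question: (ISO week, number of exports).
weekly = [(27, 13), (28, 12), (29, 15), (30, 8), (31, 17), (32, 10), (33, 7),
          (34, 15), (35, 4), (36, 18), (37, 10), (38, 14), (39, 14),
          (40, 21), (41, 19)]

def quarter_of_week(week):
    # Assumption for this sample only, matching the expected output.
    return 3 if week <= 39 else 4

def running_totals(weekly):
    """Cumulative sum of the weekly counts, restarting whenever the quarter changes."""
    out = []
    for quarter, group in groupby(weekly, key=lambda row: quarter_of_week(row[0])):
        group = list(group)
        totals = accumulate(count for _, count in group)
        for (week, _), total in zip(group, totals):
            out.append((week, total, quarter))
    return out

for row in running_totals(weekly):
    print(*row)
```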
You could use a CASE expression to separate the dates into quarters.
e.g.
CASE
WHEN DATEPART(MONTH, [exportdate]) BETWEEN 1 AND 3 THEN 1
WHEN DATEPART(MONTH, [exportdate]) BETWEEN 4 AND 6 THEN 2
WHEN DATEPART(MONTH, [exportdate]) BETWEEN 7 AND 9 THEN 3
ELSE 4
END AS [Quarter]
It's just an example, but you get the idea.
Since DATEPART(quarter, ...) already returns the quarter, you could also select it and group by it directly:
SELECT DATEPART(ISO_WEEK,[exportdate]) as 'exportdate',
count(exportdate) as 'totalExports',
DATEPART(quarter,[exportdate]) as [Quarter]
FROM [ExportTable]
Group By DATEPART(ISO_WEEK,[exportdate]), DATEPART(quarter,[exportdate])
order by exportdate;