From date dimension table to hierarchy - sql

This is my table:
idDate timeformat timeformatdate idYear YearName idSemester semestername idquarter quartername idmonth month idweek week idday day
20160101 2016-01-01 01-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 53 W53 1 Friday , 1
20160102 2016-01-02 02-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 53 W53 2 Saturday , 2
20160103 2016-01-03 03-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 53 W53 3 Sunday , 3
20160104 2016-01-04 04-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 1 W1 4 Monday , 4
20160105 2016-01-05 05-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 1 W1 5 Tuesday , 5
20160106 2016-01-06 06-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 1 W1 6 Wednesday, 6
20160107 2016-01-07 07-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 1 W1 7 Thursday , 7
20160108 2016-01-08 08-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 1 W1 8 Friday , 8
20160109 2016-01-09 09-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 1 W1 9 Saturday , 9
20160110 2016-01-10 10-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 1 W1 10 Sunday , 10
20160111 2016-01-11 11-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 2 W2 11 Monday , 11
20160112 2016-01-12 12-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 2 W2 12 Tuesday , 12
20160113 2016-01-13 13-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 2 W2 13 Wednesday, 13
20160114 2016-01-14 14-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 2 W2 14 Thursday , 14
20160115 2016-01-15 15-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 2 W2 15 Friday , 15
20160116 2016-01-16 16-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 2 W2 16 Saturday , 16
20160117 2016-01-17 17-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 2 W2 17 Sunday , 17
20160118 2016-01-18 18-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 3 W3 18 Monday , 18
20160119 2016-01-19 19-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 3 W3 19 Tuesday , 19
20160120 2016-01-20 20-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 3 W3 20 Wednesday, 20
20160121 2016-01-21 21-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 3 W3 21 Thursday , 21
20160122 2016-01-22 22-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 3 W3 22 Friday , 22
20160123 2016-01-23 23-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 3 W3 23 Saturday , 23
20160124 2016-01-24 24-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 3 W3 24 Sunday , 24
20160125 2016-01-25 25-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 4 W4 25 Monday , 25
20160126 2016-01-26 26-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 4 W4 26 Tuesday , 26
20160127 2016-01-27 27-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 4 W4 27 Wednesday, 27
20160128 2016-01-28 28-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 4 W4 28 Thursday , 28
20160129 2016-01-29 29-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 4 W4 29 Friday , 29
20160130 2016-01-30 30-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 4 W4 30 Saturday , 30
20160131 2016-01-31 31-Jan-16 2016 2016 1 S1 1 Q1 201601 Jan 4 W4 31 Sunday , 31
I'm trying to create a time_hierarchy via a SQL statement.
What I want to achieve is, for example:
Time
[2016] as Year
[Q1, 2016] as Quarter
[Jan, 2016] as Month
[Jan, 1, 2016] as Day
Etc...
I really have no idea how to achieve this, or whether my table even supports this kind of hierarchy.
Could you help me?
Thanks

Perhaps something like this?
SELECT
CASE WHEN quarterName IS NULL THEN 'Year'
WHEN month IS NULL THEN 'Quarter'
WHEN idDay IS NULL THEN 'Month'
ELSE 'Day'
END
AS aggregateLevel,
CASE WHEN quarterName IS NULL THEN YearName
WHEN month IS NULL THEN QuarterName || ', ' || YearName
WHEN idDay IS NULL THEN Month || ', ' || YearName
ELSE Month || ', ' || CAST(idDay AS VARCHAR(2)) || ', ' || YearName
END
AS periodTitle,
MIN(idDate) AS periodFirstDate,
MAX(idDate) AS periodFinalDate
FROM
yourTable
GROUP BY
GROUPING SETS (
(YearName),
(YearName, quarterName),
(YearName, quarterName, month),
(YearName, quarterName, month, idDay)
)
ORDER BY
MIN(idDate),
MAX(idDate) DESC
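
For comparison, the same roll-up can be sketched outside the database. This is only a minimal illustration in Python, not the answer's method: the names hierarchy_labels and build_hierarchy are hypothetical, and the quarter and month labels are derived from the date itself rather than read from the table's quartername and month columns.

```python
from datetime import date, timedelta

# Fixed abbreviations so the labels do not depend on the system locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def hierarchy_labels(d):
    """Return the label of every hierarchy level that contains date d."""
    quarter = (d.month - 1) // 3 + 1
    mon = MONTHS[d.month - 1]
    return {
        "Year": f"{d.year}",
        "Quarter": f"Q{quarter}, {d.year}",
        "Month": f"{mon}, {d.year}",
        "Day": f"{mon}, {d.day}, {d.year}",
    }


def build_hierarchy(start, end):
    """Collect (level, label, first_date, last_date) for start..end inclusive."""
    periods = {}
    d = start
    while d <= end:
        for level, label in hierarchy_labels(d).items():
            first, last = periods.get((level, label), (d, d))
            periods[(level, label)] = (min(first, d), max(last, d))
        d += timedelta(days=1)
    # Same ordering idea as the SQL: first date ascending, last date descending,
    # so each parent period is listed before its children.
    return sorted(
        ((lvl, lbl, f, l) for (lvl, lbl), (f, l) in periods.items()),
        key=lambda r: (r[2], -r[3].toordinal()),
    )


rows = build_hierarchy(date(2016, 1, 1), date(2016, 1, 3))
for level, label, first, last in rows:
    print(level, "|", label, "|", first, "|", last)
```

Each date contributes to one period per level, which mirrors what the four grouping sets do in the query above.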

Pandas Sort Two Columns with Day of Year Wrap-Around to New Year

I have data in which, around the first of each year, the day_of_year sequence wraps around, so the "year" column has to change to the new year when day_of_year == 1. I have not been able to figure this out and am not sure how to start, so any help is much appreciated. My data looks like this:
Here is my df1 =
day_of_year year var_1
364 2017 17.71666667
364 2018 5.166666667
364 2019 2
364 2020 1.595833333
364 2021 3.75
364 2022 6.8875
365 2017 14.83333333
365 2018 2.758333333
365 2019 4.108333333
365 2020 5.766666667
365 2021 5.291666667
365 2022 10.58636364
1 2017 2.0125
1 2018 14.0125
1 2019 -0.504166667
1 2020 7.666666667
1 2021 5.520833333
1 2022 1.229166667
2 2017 1.7625
2 2018 15.10416667
2 2019 -0.391666667
2 2020 9.5
2 2021 7.645833333
2 2022 0.9125
After the re-formatting, I need it to look like the sorted df below, with "n/a" for any data that is expected in a year but missing. Thank you again.
final df:
day_of_year year var_1
364 2017 17.71666667
365 2017 14.83333333
1 2018 14.0125
2 2018 15.10416667
364 2018 5.166666667
365 2018 2.758333333
1 2019 -0.504166667
2 2019 -0.391666667
364 2019 2
365 2019 4.108333333
1 2020 7.666666667
2 2020 9.5
364 2020 1.595833333
365 2020 5.766666667
1 2021 5.520833333
2 2021 7.645833333
364 2021 3.75
365 2021 5.291666667
1 2022 1.229166667
2 2022 0.9125
364 2022 6.8875
365 2022 10.58636364
n/a n/a n/a
n/a n/a n/a
Why would you change the year based on the day? Just sort by the two columns:
df.sort_values(by=['year', 'day_of_year'])
Output:
day_of_year year var_1
12 1 2017 2.012500
18 2 2017 1.762500
0 364 2017 17.716667
6 365 2017 14.833333
13 1 2018 14.012500
19 2 2018 15.104167
1 364 2018 5.166667
7 365 2018 2.758333
14 1 2019 -0.504167
20 2 2019 -0.391667
2 364 2019 2.000000
8 365 2019 4.108333
15 1 2020 7.666667
21 2 2020 9.500000
3 364 2020 1.595833
9 365 2020 5.766667
16 1 2021 5.520833
22 2 2021 7.645833
4 364 2021 3.750000
10 365 2021 5.291667
17 1 2022 1.229167
23 2 2022 0.912500
5 364 2022 6.887500
11 365 2022 10.586364
If for some reason you really need to fix the year, use a conditional with mask:
(df.assign(year=df['year'].mask(df['day_of_year'].le(2), df['year'].add(1)))
.sort_values(by=['year', 'day_of_year'])
)
Or, if you want to update the years after a change from 365 to a lower day:
(df.assign(year=df['year'].add(df['day_of_year'].diff().lt(0).cumsum()))
.sort_values(by=['year', 'day_of_year'])
)
Output:
day_of_year year var_1
0 364 2017 17.716667
6 365 2017 14.833333
12 1 2018 2.012500
18 2 2018 1.762500
1 364 2018 5.166667
7 365 2018 2.758333
13 1 2019 14.012500
19 2 2019 15.104167
2 364 2019 2.000000
8 365 2019 4.108333
14 1 2020 -0.504167
20 2 2020 -0.391667
3 364 2020 1.595833
9 365 2020 5.766667
15 1 2021 7.666667
21 2 2021 9.500000
4 364 2021 3.750000
10 365 2021 5.291667
16 1 2022 5.520833
22 2 2022 7.645833
5 364 2022 6.887500
11 365 2022 10.586364
17 1 2023 1.229167
23 2 2023 0.912500
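
To see why the diff/cumsum variant works, here is a small sketch on made-up data. It assumes the rows are already in chronological order and that every row starts with the same base year, which is a simplification of the frame in the question; the var_1 values are omitted because they play no role.

```python
import pandas as pd

# Hypothetical sample: chronological rows that all carry the base year 2017.
df = pd.DataFrame({
    "day_of_year": [364, 365, 1, 2, 364, 365, 1],
    "year": [2017] * 7,
})

# A negative difference means the day counter wrapped from 365 back to 1.
wrapped = df["day_of_year"].diff().lt(0)

# The running count of wraps is the number of years to add to each row.
df["year"] = df["year"] + wrapped.cumsum()

print(df)
```

The first row has no predecessor, so its diff is NaN and the comparison is False, which leaves the starting year untouched.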
I would convert everything to date time first. Just run:
pd.to_datetime(df['day_of_year'].astype(str) + '-' + df['year'].astype(str),
format='%j-%Y')
I assign it to column ymd and sort, yielding the following:
>>> df.sort_values('ymd')
day_of_year year var_1 ymd
12 1 2017 2.012500 2017-01-01
18 2 2017 1.762500 2017-01-02
0 364 2017 17.716667 2017-12-30
6 365 2017 14.833333 2017-12-31
13 1 2018 14.012500 2018-01-01
19 2 2018 15.104167 2018-01-02
1 364 2018 5.166667 2018-12-30
7 365 2018 2.758333 2018-12-31
14 1 2019 -0.504167 2019-01-01
20 2 2019 -0.391667 2019-01-02
2 364 2019 2.000000 2019-12-30
8 365 2019 4.108333 2019-12-31
15 1 2020 7.666667 2020-01-01
21 2 2020 9.500000 2020-01-02
3 364 2020 1.595833 2020-12-29
9 365 2020 5.766667 2020-12-30
16 1 2021 5.520833 2021-01-01
22 2 2021 7.645833 2021-01-02
4 364 2021 3.750000 2021-12-30
10 365 2021 5.291667 2021-12-31
17 1 2022 1.229167 2022-01-01
23 2 2022 0.912500 2022-01-02
5 364 2022 6.887500 2022-12-30
11 365 2022 10.586364 2022-12-31
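
The same '%j-%Y' format also works with the standard library, which makes it easy to check single values. A minimal sketch (the helper name from_day_of_year is hypothetical):

```python
from datetime import date, datetime


def from_day_of_year(year, day_of_year):
    """Parse a (year, day-of-year) pair using the '%j-%Y' format."""
    return datetime.strptime(f"{day_of_year}-{year}", "%j-%Y").date()


print(from_day_of_year(2021, 365))  # last day of a non-leap year
print(from_day_of_year(2020, 365))  # one day before year end in a leap year
```

This explains the 2020 rows in the table above: 2020 is a leap year, so days 364 and 365 fall on 29 and 30 December rather than 30 and 31 December.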

Get the last 4 weeks prior to the current week and the same 4 weeks of last year

I have a list of date, fiscal week, and fiscal year:
DATE_VALUE FISCAL_WEEK FISCAL_YEAR_VALUE
14-Dec-20 51 2020
15-Dec-20 51 2020
16-Dec-20 51 2020
17-Dec-20 51 2020
18-Dec-20 51 2020
19-Dec-20 51 2020
20-Dec-20 51 2020
21-Dec-20 52 2020
22-Dec-20 52 2020
23-Dec-20 52 2020
24-Dec-20 52 2020
25-Dec-20 52 2020
26-Dec-20 52 2020
27-Dec-20 52 2020
28-Dec-20 1 2021
29-Dec-20 1 2021
30-Dec-20 1 2021
31-Dec-20 1 2021
1-Jan-21 1 2021
2-Jan-21 1 2021
3-Jan-21 1 2021
4-Jan-21 2 2021
5-Jan-21 2 2021
6-Jan-21 2 2021
7-Jan-21 2 2021
8-Jan-21 2 2021
9-Jan-21 2 2021
10-Jan-21 2 2021
11-Jan-21 3 2021
12-Jan-21 3 2021
13-Jan-21 3 2021
14-Jan-21 3 2021
15-Jan-21 3 2021
16-Jan-21 3 2021
17-Jan-21 3 2021
18-Jan-21 4 2021
19-Jan-21 4 2021
20-Jan-21 4 2021
21-Jan-21 4 2021
22-Jan-21 4 2021
23-Jan-21 4 2021
24-Jan-21 4 2021
20-Dec-21 52 2021
21-Dec-21 52 2021
22-Dec-21 52 2021
23-Dec-21 52 2021
24-Dec-21 52 2021
25-Dec-21 52 2021
26-Dec-21 52 2021
27-Dec-21 53 2021
28-Dec-21 53 2021
29-Dec-21 53 2021
30-Dec-21 53 2021
31-Dec-21 53 2021
1-Jan-22 53 2021
2-Jan-22 53 2021
3-Jan-22 1 2022
4-Jan-22 1 2022
5-Jan-22 1 2022
6-Jan-22 1 2022
7-Jan-22 1 2022
8-Jan-22 1 2022
9-Jan-22 1 2022
10-Jan-22 2 2022
11-Jan-22 2 2022
12-Jan-22 2 2022
13-Jan-22 2 2022
14-Jan-22 2 2022
15-Jan-22 2 2022
16-Jan-22 2 2022
17-Jan-22 3 2022
18-Jan-22 3 2022
19-Jan-22 3 2022
20-Jan-22 3 2022
21-Jan-22 3 2022
22-Jan-22 3 2022
23-Jan-22 3 2022
24-Jan-22 4 2022
25-Jan-22 4 2022
26-Jan-22 4 2022
27-Jan-22 4 2022
28-Jan-22 4 2022
29-Jan-22 4 2022
30-Jan-22 4 2022
I want to pull the last 4 weeks prior to the current week AND the same 4 weeks of the year before. Please see example 1. This works fine when all 4 weeks are within the same year. But at the beginning of a year, when 1 or more weeks are in the current year and the others are in the previous year, I am not able to get the desired output below:
FISCAL_YEAR_VALUE FISCAL_WEEK
2020 51
2020 52
2021 2
2021 1
2021 52
2021 53
2022 1
2022 2
The code I have is below. I am using the date of 21-JAN-22 as an example:
SELECT
FISCAL_YEAR_VALUE,
FISCAL_WEEK
FROM TABLE_NAME
WHERE FISCAL_YEAR_VALUE IN (SELECT *
FROM (WITH T AS (
SELECT DISTINCT FISCAL_YEAR_VALUE
FROM TABLE_NAME
WHERE TRUNC(DATE_VALUE) <= TRUNC(TO_DATE('21-JAN-22'))--TEST DATE
ORDER BY FISCAL_YEAR_VALUE DESC
FETCH NEXT 2 ROWS ONLY
)
SELECT FISCAL_YEAR_VALUE
FROM T ORDER BY FISCAL_YEAR_VALUE
)
)
AND FISCAL_WEEK IN (SELECT *
FROM (WITH T AS (
SELECT DISTINCT FISCAL_WEEK, FISCAL_YEAR_VALUE
FROM TABLE_NAME
WHERE TRUNC(DATE_VALUE) <= TRUNC(TO_DATE('21-JAN-22'))--TEST DATE
ORDER BY FISCAL_YEAR_VALUE DESC, FISCAL_WEEK DESC
OFFSET 1 ROWS
FETCH NEXT 4 ROWS ONLY
)
SELECT FISCAL_WEEK
FROM T ORDER BY FISCAL_YEAR_VALUE, FISCAL_WEEK
)
)
GROUP BY FISCAL_YEAR_VALUE, FISCAL_WEEK
ORDER BY FISCAL_YEAR_VALUE, FISCAL_WEEK
Output of the code is:
FISCAL_YEAR_VALUE FISCAL_WEEK
2021 2
2021 1
2021 52
2021 53
2022 1
2022 2
As you can see, the last 2 weeks of year 2020 are not included. Please see example 2. How can I also include this exception in the code to make it dynamic? Any help would be greatly appreciated!
To find the values this year, you can use:
SELECT DISTINCT fiscal_year_value, fiscal_week
FROM table_name
WHERE date_value < TRUNC(SYSDATE, 'IW')
AND date_value >= TRUNC(SYSDATE, 'IW') - INTERVAL '28' DAY
To find the values from the previous year, take the latest fiscal week from this year's result and subtract 1 from its fiscal year. The last date of that week gives the upper bound of the date_value range for the previous fiscal year, and the same 28-day window can then be applied to it:
WITH this_year (fiscal_year_value, fiscal_week) AS (
SELECT fiscal_year_value, fiscal_week
FROM table_name
WHERE date_value < TRUNC(SYSDATE, 'IW')
AND date_value >= TRUNC(SYSDATE, 'IW') - INTERVAL '28' DAY
),
max_last_year (max_date_value) AS (
SELECT MAX(date_value) + INTERVAL '1' DAY
FROM table_name
WHERE (fiscal_year_value, fiscal_week) IN (
SELECT fiscal_year_value - 1, fiscal_week
FROM this_year
ORDER BY fiscal_year_value DESC, fiscal_week DESC
FETCH FIRST ROW ONLY
)
)
SELECT fiscal_year_value, fiscal_week
FROM this_year
UNION
SELECT t.fiscal_year_value, t.fiscal_week
FROM table_name t
INNER JOIN max_last_year m
ON ( t.date_value < m.max_date_value
AND t.date_value >= m.max_date_value - INTERVAL '28' DAY);
Which, for the sample data:
Create Table table_name(DATE_VALUE DATE, FISCAL_WEEK INT, FISCAL_YEAR_VALUE INT);
INSERT INTO table_name (date_value, fiscal_week, fiscal_year_value)
SELECT DATE '2019-12-30' + LEVEL - 1, CEIL(LEVEL/7), 2020
FROM DUAL
CONNECT BY LEVEL <= 7 * 52
UNION ALL
SELECT DATE '2020-12-28' + LEVEL - 1, CEIL(LEVEL/7), 2021
FROM DUAL
CONNECT BY LEVEL <= 7 * 53
UNION ALL
SELECT DATE '2022-01-03' + LEVEL - 1, CEIL(LEVEL/7), 2022
FROM DUAL
CONNECT BY LEVEL <= 7 * 52;
Outputs:
FISCAL_YEAR_VALUE FISCAL_WEEK
2022 38
2022 39
2022 40
2022 41
2021 38
2021 39
2021 40
2021 41
And if today's date were 2022-01-21 (the test date from the question), it would output:
FISCAL_YEAR_VALUE FISCAL_WEEK
2021 52
2021 53
2022 1
2022 2
2020 51
2020 52
2021 1
2021 2
There may be a simpler method but without any knowledge of how you calculate a fiscal year that is not immediately possible.
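
The two-step logic can also be traced in a few lines of Python. This is only a sketch under assumptions: the calendar mirrors the generated sample data above (fiscal years 2020 to 2022 with 52, 53 and 52 weeks), the function names are hypothetical, and the date 21-JAN-22 from the question stands in for SYSDATE.

```python
from datetime import date, timedelta


def build_calendar():
    """Map each date to (fiscal_year, fiscal_week), like the sample INSERT."""
    cal = {}
    for fy, start, weeks in [(2020, date(2019, 12, 30), 52),
                             (2021, date(2020, 12, 28), 53),
                             (2022, date(2022, 1, 3), 52)]:
        for i in range(7 * weeks):
            cal[start + timedelta(days=i)] = (fy, i // 7 + 1)
    return cal


def weeks_in_window(cal, end_exclusive):
    """Distinct (year, week) pairs of the 28 days before end_exclusive."""
    seen = []
    for i in range(28, 0, -1):
        key = cal[end_exclusive - timedelta(days=i)]
        if key not in seen:
            seen.append(key)
    return seen


def last_four_weeks_and_prior_year(cal, today):
    # Monday of the current week, comparable to TRUNC(date, 'IW').
    monday = today - timedelta(days=today.weekday())
    this_year = weeks_in_window(cal, monday)
    fy, week = this_year[-1]
    # Last date of the same week number in the previous fiscal year.
    # Raises ValueError if that week does not exist (e.g. no week 53).
    last_day = max(d for d, k in cal.items() if k == (fy - 1, week))
    prior = weeks_in_window(cal, last_day + timedelta(days=1))
    return prior + this_year


result = last_four_weeks_and_prior_year(build_calendar(), date(2022, 1, 21))
print(result)
```

Working with date windows instead of week numbers is what lets the result cross a fiscal year boundary without special cases.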

Getting This Year sales and Last Year Sales (for the same week number) in one row BigQuery

In BigQuery I am trying to create a view with Sales TY (This Year Financial) and Sales LY in one row. The Sales LY measure should be for the same corresponding week number TY as per the example below.
I am using the following code:
SUM(table_1.sales) OVER (PARTITION BY table_1.client ORDER BY table_1.fw_end_date DESC ROWS BETWEEN 53 FOLLOWING AND 53 FOLLOWING ) as SALES_LY
The problem is that weeks are sometimes missing, and the window function does not account for this because it counts rows, not actual weeks.
In the example below, weeks 19 and 20 are missing in this financial year, hence the corresponding results for the sales_LY measure are wrong.
Could this be fixed by adding/joining additional rows with NULL values for the missing dates?
row_number client fy wofy fw_end_date year sales sales_LY
1 1111 2020 34 2020-02-23 TY 74.97971177 75.63215281
2 1111 2020 33 2020-02-16 TY 42.42109122 68.14894689
3 1111 2020 32 2020-02-09 TY 19.85037174 87.12654065
4 1111 2020 31 2020-02-02 TY 16.56226835 3.122137476
5 1111 2020 30 2020-01-26 TY 1.800185225 10.74736963
6 1111 2020 29 2020-01-19 TY 24.75012318 38.63631908
7 1111 2020 28 2020-01-12 TY 25.25663409 1.387588554
8 1111 2020 27 2020-01-05 TY 72.26309414 0.873563634
9 1111 2020 26 2019-12-29 TY 48.42015566 81.57363107
10 1111 2020 25 2019-12-22 TY 29.94857681 41.80248501
11 1111 2020 24 2019-12-15 TY 60.84110104 26.66495665
12 1111 2020 23 2019-12-08 TY 6.810985936 48.21324987
13 1111 2020 22 2019-12-01 TY 55.25000762 8.681731452
14 1111 2020 21 2019-11-24 TY 7.314435129 61.50365765
15 1111 2020 18 2019-11-03 TY 28.86674749 84.68936948 --wrong
16 1111 2020 17 2019-10-27 TY 93.06304298
17 1111 2020 16 2019-10-20 TY 31.18234699
18 1111 2020 15 2019-10-13 TY 56.40700057
19 1111 2020 14 2019-10-06 TY 70.7995385
20 1111 2020 13 2019-09-29 TY 88.80009525
21 1111 2020 12 2019-09-22 TY 75.12011037
22 1111 2020 11 2019-09-15 TY 28.54977137
23 1111 2020 10 2019-09-08 TY 38.05238915
24 1111 2020 9 2019-09-01 TY 7.256419393
25 1111 2020 8 2019-08-25 TY 32.81145188
26 1111 2020 7 2019-08-18 TY 32.04938194
27 1111 2020 6 2019-08-11 TY 53.890913
28 1111 2020 5 2019-08-04 TY 17.4527262
29 1111 2020 4 2019-07-28 TY 66.08187866
30 1111 2020 3 2019-07-21 TY 9.331124689
31 1111 2020 2 2019-07-14 TY 61.35972079
32 1111 2020 1 2019-07-07 TY 68.02729471
33 1111 2019 52 2019-06-23 LY 65.09319706
34 1111 2019 51 2019-06-16 LY 46.3647103
35 1111 2019 50 2019-06-09 LY 45.04742519
36 1111 2019 49 2019-06-02 LY 10.72003618
37 1111 2019 48 2019-05-26 LY 69.73143446
38 1111 2019 47 2019-05-19 LY 8.3106988
39 1111 2019 46 2019-05-12 LY 18.53940931
40 1111 2019 45 2019-05-05 LY 78.27473501
41 1111 2019 44 2019-04-28 LY 4.799286544
42 1111 2019 43 2019-04-21 LY 94.58933139
43 1111 2019 42 2019-04-14 LY 27.06059414
44 1111 2019 41 2019-04-07 LY 53.42375151
45 1111 2019 40 2019-03-31 LY 22.92189479
46 1111 2019 39 2019-03-24 LY 95.3449253
47 1111 2019 38 2019-03-17 LY 62.34159147
48 1111 2019 37 2019-03-10 LY 73.99278829
49 1111 2019 36 2019-03-03 LY 39.517017
50 1111 2019 35 2019-02-24 LY 50.46146309
51 1111 2019 34 2019-02-17 LY 75.63215281
52 1111 2019 33 2019-02-10 LY 68.14894689
53 1111 2019 32 2019-02-03 LY 87.12654065
54 1111 2019 31 2019-01-27 LY 3.122137476
55 1111 2019 30 2019-01-20 LY 10.74736963
56 1111 2019 29 2019-01-13 LY 38.63631908
57 1111 2019 28 2019-01-06 LY 1.387588554
58 1111 2019 27 2018-12-30 LY 0.873563634
59 1111 2019 26 2018-12-23 LY 81.57363107
60 1111 2019 25 2018-12-16 LY 41.80248501
61 1111 2019 24 2018-12-09 LY 26.66495665
62 1111 2019 23 2018-12-02 LY 48.21324987
63 1111 2019 22 2018-11-25 LY 8.681731452
64 1111 2019 21 2018-11-18 LY 61.50365765
65 1111 2019 20 2018-11-11 LY 84.68936948
Instead of jumping a fixed 53 rows, look up the row dated exactly one year earlier:
WITH data AS (
SELECT *
FROM `fh-bigquery.weather_gsod.all`
WHERE date BETWEEN '2018-12-01' AND '2020-02-24'
AND name LIKE 'SAN FRANCISCO INTERNATIONAL A'
), main_query AS (
SELECT name, date, temp
, ARRAY_AGG(STRUCT(date, temp)) OVER(PARTITION BY name ORDER BY date ROWS BETWEEN 366 PRECEDING AND 310 PRECEDING ) over_array
FROM data a
)
SELECT * EXCEPT(over_array)
, (SELECT temp FROM UNNEST(over_array) WHERE date=DATE_SUB(a.date, INTERVAL 1 year)) prev_year
FROM main_query a
ORDER BY name, date DESC
I did this with days - you can do the same with weeks instead.
Solved for weeks:
WITH data AS (
SELECT ROUND(AVG(temp),1) temp, DATE_TRUNC(date, week) week, name
FROM `fh-bigquery.weather_gsod.all`
WHERE date BETWEEN '2018-12-01' AND '2020-02-24'
AND name LIKE 'SAN FRANCISCO INTERNATIONAL A'
GROUP BY name, week
), main_query AS (
SELECT name, week, temp
, ARRAY_AGG(STRUCT(week, temp))
OVER(PARTITION BY name ORDER BY week
ROWS BETWEEN 53 PRECEDING AND 40 PRECEDING ) over_array
FROM data a
)
SELECT * EXCEPT(over_array)
, (SELECT temp FROM UNNEST(over_array)
WHERE EXTRACT(YEAR FROM week)+1=EXTRACT(YEAR FROM a.week)
AND EXTRACT(WEEK FROM week)=EXTRACT(WEEK FROM a.week)) prev_year
FROM main_query a
ORDER BY name, week DESC
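
The underlying idea, matching on the week key rather than on a row offset, can be sketched in plain Python. This is an illustration only, not BigQuery code: the helper add_sales_ly is hypothetical, and the four rows are taken from the question's table (weeks 21 and 18 of 2020, weeks 21 and 20 of 2019).

```python
def add_sales_ly(rows):
    """Attach last year's sales for the same client and week number.

    A missing week yields None instead of a value shifted from another week.
    """
    lookup = {(r["client"], r["fy"], r["wofy"]): r["sales"] for r in rows}
    return [
        {**r, "sales_ly": lookup.get((r["client"], r["fy"] - 1, r["wofy"]))}
        for r in rows
    ]


rows = [
    {"client": 1111, "fy": 2020, "wofy": 21, "sales": 7.314435129},
    {"client": 1111, "fy": 2020, "wofy": 18, "sales": 28.86674749},
    {"client": 1111, "fy": 2019, "wofy": 21, "sales": 61.50365765},
    {"client": 1111, "fy": 2019, "wofy": 20, "sales": 84.68936948},
]

out = add_sales_ly(rows)
for r in out:
    print(r["fy"], r["wofy"], r["sales"], r["sales_ly"])
```

Week 18 of 2020 now gets no last-year value, because week 18 of 2019 is not in these rows, instead of wrongly picking up the week 20 figure.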

Segregating data based on last 3 months and this time last year

I need to filter my data into two different subsets:
(1) last three months, includes December as current month minus three
(2) current month (December 2019) and current month values from the year before
pDate Name Date Year Month
11/17/2019 12:18 A 2019/11 2019 11
12/23/2018 11:52 B 2018/12 2018 12
12/1/2019 11:42 C 2019/12 2019 12
12/10/2018 14:31 D 2018/12 2018 12
12/14/2018 12:42 E 2018/12 2018 12
10/15/2019 15:19 F 2019/10 2019 10
10/23/2019 10:50 G 2019/10 2019 10
12/2/2018 15:14 H 2018/12 2018 12
I was able to select the rows from the last 3 months relatively quickly with:
df1 = df.sort_values(by="pDate",ascending=True).set_index("pDate").last("3M")
How do I get a dataframe which contains December 2019 (the current month) and December 2018 only?
The idea is to create month periods with Series.dt.to_period, subtract from the current period to get the past periods, and then filter with Series.between and boolean indexing:
#changed sample datetimes
df['pDate'] = pd.to_datetime(df['pDate'])
df = df.sort_values(by="pDate")
print (df)
pDate Name Date Year Month
7 2018-12-02 15:14:00 H 2018/12 2018 12
4 2018-12-14 12:42:00 E 2018/12 2018 12
3 2019-10-10 14:31:00 D 2018/12 2018 12
5 2019-10-15 15:19:00 F 2019/10 2019 10
6 2019-10-23 10:50:00 G 2019/10 2019 10
2 2019-11-01 11:42:00 C 2019/12 2019 12
1 2019-12-23 11:52:00 B 2018/12 2018 12
0 2020-01-17 12:18:00 A 2019/11 2019 11
nowp = pd.to_datetime('now').to_period('m')
print (nowp)
2020-01
df['per'] = df['pDate'].dt.to_period('m')
df = df[df['per'].between(nowp-4, nowp-1) | df['per'].eq(nowp-13)]
print (df)
pDate Name Date Year Month per
7 2018-12-02 15:14:00 H 2018/12 2018 12 2018-12
4 2018-12-14 12:42:00 E 2018/12 2018 12 2018-12
3 2019-10-10 14:31:00 D 2018/12 2018 12 2019-10
5 2019-10-15 15:19:00 F 2019/10 2019 10 2019-10
6 2019-10-23 10:50:00 G 2019/10 2019 10 2019-10
2 2019-11-01 11:42:00 C 2019/12 2019 12 2019-11
1 2019-12-23 11:52:00 B 2018/12 2018 12 2019-12
Detail:
print (nowp)
2020-01
print (nowp-1)
2019-12
print (nowp-13)
2018-12
print (nowp-4)
2019-09
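
For the second subset alone (the current month plus the same month one year earlier), a minimal sketch with a fixed reference period instead of 'now' could look like this. The pDate and Name values are the original ones from the question; treating 2019-12 as the current month is an assumption taken from the question text.

```python
import pandas as pd

df = pd.DataFrame({
    "pDate": ["11/17/2019 12:18", "12/23/2018 11:52", "12/1/2019 11:42",
              "12/10/2018 14:31", "12/14/2018 12:42", "10/15/2019 15:19",
              "10/23/2019 10:50", "12/2/2018 15:14"],
    "Name": list("ABCDEFGH"),
})
df["pDate"] = pd.to_datetime(df["pDate"], format="%m/%d/%Y %H:%M")

# Assumed reference month; replace with the real current period as needed.
current = pd.Period("2019-12", freq="M")
per = df["pDate"].dt.to_period("M")

# Subtracting 12 from a monthly period moves it back exactly one year.
result = df[per.eq(current) | per.eq(current - 12)]
print(result)
```

Fixing the reference period keeps the result reproducible, whereas a period derived from 'now' changes every month.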

Get all dates for all date ranges in table using SQL Server

I have a table dbo.WorkSchedules(Id, From, To) where I store date ranges for work schedules. I want to create a view that contains all dates for all rows of WorkSchedules, so that a single view holds every date of every schedule.
On the web I only found solutions for a single range, taking 2 parameters (start and end). My issue is different: I have multiple rows, each with its own start and end.
Example:
WorkSchedules
Id | From | To
---+------------+-----------
1 | 2018-01-01 | 2018-01-05
2 | 2018-01-08 | 2018-01-12
Desired result
1 | 2018-01-01
2 | 2018-01-02
3 | 2018-01-03
4 | 2018-01-04
5 | 2018-01-05
6 | 2018-01-08
7 | 2018-01-09
8 | 2018-01-10
9 | 2018-01-11
10| 2018-01-12
If you are regularly dealing with "jobs" and "schedules" then I propose that you need a permanent calendar table (a table where each row is a unique date). You can create rows for dates dynamically but why do this many times when you can do it once and just re-use?
A calendar table, even of several decades, isn't "big" and when indexed they can be very fast as well. You can also store information about holidays and/or fiscal periods etc.
There are many scripts available to produce these tables, here's an answer with 2 scripts on this site: https://stackoverflow.com/a/5635628/2067753
Assuming you use the second (more comprehensive) script, then you can exclude weekends, or other conditions such as holidays, from query results.
Once you have a permanent Calendar table this style of query may be used:
CREATE TABLE WorkSchedules(
Id INTEGER NOT NULL PRIMARY KEY
,[From] DATE NOT NULL
,[To] DATE NOT NULL
);
INSERT INTO WorkSchedules(Id,[From],[To]) VALUES (1,'2018-01-01','2018-01-05');
INSERT INTO WorkSchedules(Id,[From],[To]) VALUES (2,'2018-01-12','2018-01-12');
with range as (
select min(ws.[From]) as dt_from, max(ws.[To]) dt_to
from WorkSchedules as ws
)
select c.*
from calendar as c
inner join range on c.date between range.dt_from and range.dt_to
where c.KindOfDay = 'BANKDAY'
order by c.date
and the result looks like this (note: "New Year's Day" has been excluded)
Date Year Quarter Month Week Day DayOfYear Weekday Fiscal_Year Fiscal_Quarter Fiscal_Month KindOfDay Description
---- --------------------- ------ --------- ------- ------ ----- ----------- --------- ------------- ---------------- -------------- ----------- -------------
1 02.01.2018 00:00:00 2018 1 1 1 2 2 2 2018 1 1 BANKDAY NULL
2 03.01.2018 00:00:00 2018 1 1 1 3 3 3 2018 1 1 BANKDAY NULL
3 04.01.2018 00:00:00 2018 1 1 1 4 4 4 2018 1 1 BANKDAY NULL
4 05.01.2018 00:00:00 2018 1 1 1 5 5 5 2018 1 1 BANKDAY NULL
5 08.01.2018 00:00:00 2018 1 1 2 8 8 1 2018 1 1 BANKDAY NULL
6 09.01.2018 00:00:00 2018 1 1 2 9 9 2 2018 1 1 BANKDAY NULL
7 10.01.2018 00:00:00 2018 1 1 2 10 10 3 2018 1 1 BANKDAY NULL
8 11.01.2018 00:00:00 2018 1 1 2 11 11 4 2018 1 1 BANKDAY NULL
9 12.01.2018 00:00:00 2018 1 1 2 12 12 5 2018 1 1 BANKDAY NULL
Without the where clause the full range is:
Date Year Quarter Month Week Day DayOfYear Weekday Fiscal_Year Fiscal_Quarter Fiscal_Month KindOfDay Description
---- --------------------- ------ --------- ------- ------ ----- ----------- --------- ------------- ---------------- -------------- ----------- ----------------
1 01.01.2018 00:00:00 2018 1 1 1 1 1 1 2018 1 1 HOLIDAY New Year's Day
2 02.01.2018 00:00:00 2018 1 1 1 2 2 2 2018 1 1 BANKDAY NULL
3 03.01.2018 00:00:00 2018 1 1 1 3 3 3 2018 1 1 BANKDAY NULL
4 04.01.2018 00:00:00 2018 1 1 1 4 4 4 2018 1 1 BANKDAY NULL
5 05.01.2018 00:00:00 2018 1 1 1 5 5 5 2018 1 1 BANKDAY NULL
6 06.01.2018 00:00:00 2018 1 1 1 6 6 6 2018 1 1 SATURDAY NULL
7 07.01.2018 00:00:00 2018 1 1 1 7 7 7 2018 1 1 SUNDAY NULL
8 08.01.2018 00:00:00 2018 1 1 2 8 8 1 2018 1 1 BANKDAY NULL
9 09.01.2018 00:00:00 2018 1 1 2 9 9 2 2018 1 1 BANKDAY NULL
10 10.01.2018 00:00:00 2018 1 1 2 10 10 3 2018 1 1 BANKDAY NULL
11 11.01.2018 00:00:00 2018 1 1 2 11 11 4 2018 1 1 BANKDAY NULL
12 12.01.2018 00:00:00 2018 1 1 2 12 12 5 2018 1 1 BANKDAY NULL
and weekends and holidays may be excluded using the column KindOfDay
See this as a demonstration (with build of calendar table) here: http://rextester.com/CTSW63441
OK, I worked this out for you, assuming you meant 2018-01-08 as the From date in the second row.
/*WorkSchedules
Id| From | To
1 | 2018-01-01 | 2018-01-05
2 | 2018-01-08 | 2018-01-12
*/
--DROP TABLE #WorkSchedules;
CREATE TABLE #WorkSchedules (
ID int,
[DateFrom] DATE,
[DateTo] DATE
)
INSERT INTO #WorkSchedules
SELECT 1, '2018-01-01', '2018-01-05'
UNION
SELECT 2, '2018-01-08', '2018-01-12'
;WITH CTEDATELIMITS AS (
SELECT [DateFrom], [DateTo]
FROM #WorkSchedules
)
,CTEDATES AS
(
SELECT [DateFrom] as [DateResult] FROM CTEDATELIMITS
UNION ALL
SELECT DATEADD(Day, 1, [DateResult]) FROM CTEDATES
JOIN CTEDATELIMITS ON CTEDATES.[DateResult] >= CTEDATELIMITS.[DateFrom]
AND CTEDATES.dateResult < CTEDATELIMITS.[DateTo]
)
SELECT [DateResult] FROM CTEDATES
ORDER BY [DateResult]
You would use a recursive CTE:
with dates as (
select from, to, from as date
from WorkSchedules
union all
select from, to, dateadd(day, 1, date)
from dates
where date < to
)
select row_number() over (order by date), date
from dates;
Note that from and to are reserved words in SQL. They are lousy names for identifiers. I have not escaped them because I assume they are not the actual names of the columns.
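
The recursive approach is easy to try out locally. The sketch below runs it against an in-memory SQLite database from Python, so it is an illustration of the technique rather than SQL Server code: SQLite's date(d, '+1 day') stands in for DATEADD, the reserved column names are quoted with double quotes, and the row numbers are added in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE WorkSchedules ('
    'Id INTEGER PRIMARY KEY, "From" TEXT NOT NULL, "To" TEXT NOT NULL)'
)
conn.executemany(
    "INSERT INTO WorkSchedules VALUES (?, ?, ?)",
    [(1, "2018-01-01", "2018-01-05"), (2, "2018-01-08", "2018-01-12")],
)

# Anchor: the first day of every range. Recursive step: add one day
# until the last day of that same range is reached.
query = """
WITH RECURSIVE dates(d, last_day) AS (
    SELECT "From", "To" FROM WorkSchedules
    UNION ALL
    SELECT date(d, '+1 day'), last_day FROM dates WHERE d < last_day
)
SELECT d FROM dates ORDER BY d
"""

rows = [(n, d) for n, (d,) in enumerate(conn.execute(query), start=1)]
for row in rows:
    print(row)
conn.close()
```

Because every recursive row carries its own end date, the gap between the two ranges (6 and 7 January) is never generated.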