Get a date from iso week and year in SQL - sql

From iso week and year, I would like to get a date.
The date should be first day of the week.
First day of the week is Monday.
For example, iso week 10 and iso year 2019 should convert to 2019-03-04.
I am using Snowflake

The date expression to do this is a little complex, but not impossible:
SELECT
DATEADD( /* Calculate start of ISOWeek as offset from Jan 1st */
DAY,
WEEK * 7 - CASE WHEN DAYOFWEEKISO(DATE_FROM_PARTS(YEAR, 1, 1)) < 5 THEN 7 ELSE 0 END
+ 1 - DAYOFWEEKISO(DATE_FROM_PARTS(YEAR, 1, 1)),
DATE_FROM_PARTS(YEAR, 1, 1)
)
FROM (VALUES (2000, 1), (2000, 2), (2001, 1), (2002, 1), (2003, 1)) v(YEAR, WEEK);
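To sanity-check the offset arithmetic in the query above, here is a minimal Python sketch of the same formula. The function name iso_week_start is hypothetical, and the check uses date.fromisocalendar, which needs Python 3.8 or later.

```python
from datetime import date, timedelta

def iso_week_start(year: int, week: int) -> date:
    """Monday of the given ISO week, computed as an offset from January 1st."""
    jan1 = date(year, 1, 1)
    dow = jan1.isoweekday()  # Monday=1 ... Sunday=7, like DAYOFWEEKISO
    # If Jan 1 falls on Mon..Thu it already belongs to ISO week 1, so step back one week.
    offset = week * 7 - (7 if dow < 5 else 0) + 1 - dow
    return jan1 + timedelta(days=offset)

# Cross-check against the standard library for the sample rows in the query above.
for y, w in [(2000, 1), (2000, 2), (2001, 1), (2002, 1), (2003, 1), (2019, 10)]:
    assert iso_week_start(y, w) == date.fromisocalendar(y, w, 1)

print(iso_week_start(2019, 10))  # 2019-03-04
```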

Unfortunately, Snowflake doesn't support this functionality natively.
While it's possible to compute the date from ISO week and year manually, it's very complex. So, as others suggested, generating a Date Dimension table for this is much easier.
Example of a query that can generate it for the lookups (note - this is not a full Date Dimension table - that is typically one row per day, this is one row per week).
create or replace table iso_week_lookup as
select
date_part(yearofweek_iso, d) year_iso,
date_part(week_iso, d) week_iso,
min(d) first_day
from (
select dateadd(day, row_number() over (order by 1) - 1, '2000-01-03'::date) AS d
from table(generator(rowCount=>10000))
)
group by 1, 2 order by 1,2;
select * from iso_week_lookup limit 2;
----------+----------+------------+
YEAR_ISO | WEEK_ISO | FIRST_DAY |
----------+----------+------------+
2000 | 1 | 2000-01-03 |
2000 | 2 | 2000-01-10 |
----------+----------+------------+
select min(first_day), max(first_day) from iso_week_lookup;
----------------+----------------+
MIN(FIRST_DAY) | MAX(FIRST_DAY) |
----------------+----------------+
2000-01-03 | 2027-05-17 |
----------------+----------------+
select * from iso_week_lookup where year_iso = 2019 and week_iso = 10;
----------+----------+------------+
YEAR_ISO | WEEK_ISO | FIRST_DAY |
----------+----------+------------+
2019 | 10 | 2019-03-04 |
----------+----------+------------+
Note, you can play with the constants in create table to create a table of the range you want. Just remember to use Monday as the starting day, otherwise you'll get a wrong value for the first week in the table :)
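The same lookup idea can be sketched outside the database. This is only an illustration in Python, assuming the same start date and row count as the query above; build_iso_week_lookup is a made-up name.

```python
from datetime import date, timedelta

def build_iso_week_lookup(start: date, days: int) -> dict:
    """Map (iso_year, iso_week) to the earliest date seen in that week."""
    lookup = {}
    for i in range(days):
        d = start + timedelta(days=i)
        iso_year, iso_week, _ = d.isocalendar()
        # Dates are visited in ascending order, so the first one seen is the minimum.
        lookup.setdefault((iso_year, iso_week), d)
    return lookup

lookup = build_iso_week_lookup(date(2000, 1, 3), 10000)  # start on a Monday
print(lookup[(2019, 10)])  # 2019-03-04
```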

If you do not have a Date Dimension table and/or utilities, as mentioned in the comments, you should parse it from a textual form. But this is DBMS implementation dependent:
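In MySQL: STR_TO_DATE(CONCAT(year, ' ', week, ' Monday'), '%x %v %W') (MySQL needs the weekday as well, because a year and week alone do not identify a single date)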
In PostgreSQL: TO_DATE(year || ' ' || week, 'IYYY IW')
(also Oracle DB would be something similar)
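For comparison, the "parse it from text" approach looks like this in Python, which has ISO format codes in strptime (Python 3.6 or later). This is a small sketch, not something from the answers above; parse_iso_week is a hypothetical helper name.

```python
from datetime import datetime

def parse_iso_week(year: int, week: int):
    # %G = ISO year, %V = ISO week, %u = ISO weekday (1 = Monday)
    return datetime.strptime(f"{year} {week} 1", "%G %V %u").date()

print(parse_iso_week(2019, 10))  # 2019-03-04
```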

Related

How to increment weeks by adding number

I have a table that contains week number in string and number. I want to sum number with week and get the next week.
for example
tableA
week num
2022-1 1
2022-3 3
output
week num new_week
2022-1 1 2022-2
2022-3 3 2022-6
2022-52 2 2023-2
As a result, I converted the week into a date, added the weeks to the date, and finally converted the date back to a week. However, I have issues when I try to convert the date back to a week. The SQL below is what I'm using
CONCAT(YEAR(DATEADD('week', num, date)), WEEK(DATEADD('week', num, date)))
I am not using the calendar year. Due to the fact that my week begins on the first Friday of every year, the calculation is incorrect. Would it be possible to avoid the need to convert week into date and date into week?
I wrote a small JS UDF to do your "week" math. It seems if December 31 is Thursday, then that year has 53 weeks. Good thing is, you don't need to convert your "year-week" to dates.
create or replace function addweeks( spcweek VARCHAR, num VARCHAR ) returns VARCHAR
LANGUAGE JAVASCRIPT
AS
$$
var year = parseInt(SPCWEEK.substring(0, 4), 10);
var week = parseInt(SPCWEEK.substring(5), 10) + parseInt(NUM, 10);
// assumption: a year has 53 weeks when December 31 is a Thursday, otherwise 52
var weekinyear = (new Date(year, 11, 31).getDay() == 4 ? 53 : 52);
while (week > weekinyear) {
week = week - weekinyear;
year++; // move to the next year before recomputing its week count
weekinyear = (new Date(year, 11, 31).getDay() == 4 ? 53 : 52);
}
return year + "-" + week;
$$
;
select myweek, num, addweeks( myweek, num) new_week
from mydata;
+---------+-----+----------+
| MYWEEK | NUM | NEW_WEEK |
+---------+-----+----------+
| 2022-1 | 1 | 2022-2 |
| 2022-3 | 3 | 2022-6 |
| 2022-52 | 2 | 2023-2 |
| 2020-52 | 2 | 2021-1 |
+---------+-----+----------+
I think you can correct my logic if there is an error in calculating the total weeks of the year.
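For anyone who wants to test the rollover logic outside Snowflake, here is a rough Python equivalent. It keeps the same assumption as the UDF (53 weeks when December 31 is a Thursday, else 52), which may not match a calendar whose weeks start on the first Friday of the year. The names weeks_in_year and add_weeks are made up for this sketch.

```python
from datetime import date

def weeks_in_year(year: int) -> int:
    # Assumption carried over from the UDF: 53 weeks when December 31 is a Thursday.
    return 53 if date(year, 12, 31).isoweekday() == 4 else 52

def add_weeks(spcweek: str, num: int) -> str:
    year_str, week_str = spcweek.split("-")
    year, week = int(year_str), int(week_str) + num
    # Roll over one year at a time, using each year's own week count.
    while week > weeks_in_year(year):
        week -= weeks_in_year(year)
        year += 1
    return f"{year}-{week}"

for w, n in [("2022-1", 1), ("2022-3", 3), ("2022-52", 2), ("2020-52", 2)]:
    print(w, n, add_weeks(w, n))
```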
With a bit of string fiddling you could do the calculation like this.
SELECT week, num, CONCAT( SUBSTRING(week FROM 1 for 5), num + SUBSTRING(week FROM INSTR(week, '-')+1))
FROM table;

Querying across months and days

My access logs database stores time as epoch and extracts year month and day as integers. Further, the partitioning of the database is based on the extracted Y/m/d and I have a 35 day retention.
If I run this query:
select *
from mydb
where year in (2017, 2018)
and month in (12, 1)
and day in (31, 1)
On the 29th of January, 2018, I will get data for 12/31/2017 and 1/1/2018.
On the 5th of January, 2018, I will get data for 12/1/2017, 12/31/2017, and 1/1/2018 (undesirable)
I also realize that I can do something like this:
select *
from mydb
where (year = 2017 and month = 12 and day = 31)
or (year = 2018 and month = 1 and day = 1)
But what I am really looking for is this: a good way to write a query where I give the year month and day number as the start and then a fourth value (number of days +) and then get all the data for 12/31/2017 + 5 days for example.
Is there a native way in SQL to accomplish this? I have an enormous data set and if I don't specify the days and have to rely on the epoch to do this, the query takes forever. I also have no influence over the partitioning configuration.
With Impala as the dbms and SQL dialect you will be able to use common table expressions but not recursion. In addition there may be problems inserting parameters as well.
Below is an untested suggestion that will require you to locate some function alternatives. First it generates a set of rows with an integer from 0 to 999 (in the example); it is quite easy to expand the number of rows if required. From those rows it is possible to add the number of days to a timestamp literal using date_add(timestamp startdate, int days/interval expression), and then use year(timestamp date), month(timestamp date) and day(timestamp date) (see Impala's Date and Time functions) to create the columns needed to match your data.
Overall, you should be able to build a common table expression with year, month and day columns covering the wanted range, which you can inner join to your source table, thereby implementing a date range filter.
The code below was produced using T-SQL (SQL Server).
-- produce a set of integers, adjust to suit needed number of these
;WITH
cteDigits AS (
SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
)
, cteTally AS (
SELECT
d1s.digit
+ d10s.digit * 10
+ d100s.digit * 100 /* add more like this as needed */
-- + d1000s.digit * 1000 /* add more like this as needed */
AS num
FROM cteDigits d1s
CROSS JOIN cteDigits d10s
CROSS JOIN cteDigits d100s /* add more like this as needed */
-- CROSS JOIN cteDigits d1000s /* add more like this as needed */
)
, DateRange AS (
select
num
, dateadd(day,num,'20181227') dt
, year(dateadd(day,num,'20181227')) yr
, month(dateadd(day,num,'20181227')) mn
, day(dateadd(day,num,'20181227')) dy
from cteTally
where num < 10
)
select
*
from DateRange
I think these are the Impala equivalents for the function calls used above:
, DateRange AS (
select
num
, date_add(to_timestamp('20181227','yyyyMMdd'),num) dt
, year( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) yr
, month( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) mn
, day( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) dy
from cteTally
where num < 10
)
Hopefully you can work out how to use these. Ultimately the purpose is to use the generated date range like so:
select * from mydb t
inner join DateRange on t.year = DateRange.yr and t.month = DateRange.mn and t.day = DateRange.dy
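If the query text is built by a client program, the same range can be expanded there instead. The following is a minimal Python sketch under that assumption; ymd_range and to_predicate are hypothetical names, and the column names year, month and day are taken from the question.

```python
from datetime import date, timedelta

def ymd_range(year: int, month: int, day: int, num_days: int):
    """(year, month, day) tuples for num_days consecutive days from the start date."""
    start = date(year, month, day)
    out = []
    for i in range(num_days):
        d = start + timedelta(days=i)
        out.append((d.year, d.month, d.day))
    return out

def to_predicate(parts) -> str:
    """OR together one partition-friendly condition per day."""
    return " or ".join(
        f"(year = {y} and month = {m} and day = {d})" for y, m, d in parts
    )

print(to_predicate(ymd_range(2017, 12, 31, 2)))
```

The printed predicate has the same shape as the second query in the question, so it still filters on the partition columns directly.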
original post
Well in the absence of knowing what database to propose solutions for, here is a suggestion using SQL Server:
This suggestion involves a recursive common table expression, which may then be used as an inner join to your source data to limit the results to a date range.
--Sql Server 2014 Express Edition
--https://rextester.com/l/sql_server_online_compiler
declare @yr as integer = 2018
declare @mn as integer = 12
declare @dy as integer = 27
declare @du as integer = 10
;with CTE as (
select
datefromparts(@yr, @mn, @dy) as dt
, @yr as yr
, @mn as mn
, @dy as dy
union all
select
dateadd(dd,1,cte.dt)
, datepart(year,dateadd(dd,1,cte.dt))
, datepart(month,dateadd(dd,1,cte.dt))
, datepart(day,dateadd(dd,1,cte.dt))
from cte
where cte.dt < dateadd(dd,@du-1,datefromparts(@yr, @mn, @dy))
)
select
*
from cte
This produces the following result:
+----+---------------------+------+----+----+
| | dt | yr | mn | dy |
+----+---------------------+------+----+----+
| 1 | 27.12.2018 00:00:00 | 2018 | 12 | 27 |
| 2 | 28.12.2018 00:00:00 | 2018 | 12 | 28 |
| 3 | 29.12.2018 00:00:00 | 2018 | 12 | 29 |
| 4 | 30.12.2018 00:00:00 | 2018 | 12 | 30 |
| 5 | 31.12.2018 00:00:00 | 2018 | 12 | 31 |
| 6 | 01.01.2019 00:00:00 | 2019 | 1 | 1 |
| 7 | 02.01.2019 00:00:00 | 2019 | 1 | 2 |
| 8 | 03.01.2019 00:00:00 | 2019 | 1 | 3 |
| 9 | 04.01.2019 00:00:00 | 2019 | 1 | 4 |
| 10 | 05.01.2019 00:00:00 | 2019 | 1 | 5 |
+----+---------------------+------+----+----+
and:
select * from mydb t
inner join cte on t.year = cte.yr and t.month = cte.mn and t.day = cte.dy
Instead of a recursive common table expression, a table of integers (often known as a tally table) may be used, or a set of unioned select queries that generates the integers. The method one chooses will depend on the DBMS type and version being used.
Again, depending on database, it may be more efficient to persist the result seen above as a temporary table and add an index to that.

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement to sum the data using group by the 24th of every two neighboring months in certain range of time such as : from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as running from the 25th of one calendar month through the 24th of the next. Shifting every date back by 24 days maps each such period onto exactly one calendar month, including across a year boundary, so we can aggregate by that shifted month to obtain the sum of data.
WITH cte AS (
SELECT
data,
-- first day of the calendar month that the 25th..24th period maps onto
DATEADD(month, DATEDIFF(month, 0, DATEADD(day, -24, datetime)), 0) AS period_month
FROM yourTable
)
SELECT
-- style 111 formats as yyyy/mm/dd
CONVERT(varchar(10), DATEADD(day, 24, period_month), 111) +
'~' +
CONVERT(varchar(10), DATEADD(day, 23, DATEADD(month, 1, period_month)), 111)
AS datetime_range,
SUM(data) AS data_sum
FROM cte
GROUP BY
period_month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period).
I would suggest dynamically building some date range rows so that you can then join your data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
declare @start_dt date;
set @start_dt = '20170424';
select
period_start_dt, period_end_dt, sum(1) as your_data_here
from (
select
dateadd(month,m.n,start_dt) period_start_dt
, dateadd(month,m.n+1,start_dt) period_end_dt
from (
select @start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use "between" when it comes to joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt, otherwise you could double count information, as between is inclusive of both lower and upper boundaries.
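To illustrate why the half-open comparison matters, here is a small Python sketch of the same bucketing. It is only an illustration with sample values taken from the question; add_months and sum_by_period are hypothetical helpers, and add_months is only safe here because the 24th exists in every month.

```python
from datetime import date

def add_months(d: date, n: int) -> date:
    # Safe only because the day of month used here (24) exists in every month.
    months = d.year * 12 + (d.month - 1) + n
    return date(months // 12, months % 12 + 1, d.day)

def sum_by_period(rows, start: date, periods: int):
    totals = []
    for i in range(periods):
        lo, hi = add_months(start, i), add_months(start, i + 1)
        # Half-open range: lo <= d < hi, so a value on the 24th is counted once.
        total = sum(v for d, v in rows if lo <= d < hi)
        totals.append((lo, hi, total))
    return totals

rows = [(date(2017, 8, 24), 6.0), (date(2017, 8, 25), 5.0),
        (date(2017, 9, 24), 6.0), (date(2017, 9, 25), 6.2)]
for lo, hi, total in sum_by_period(rows, date(2017, 7, 24), 3):
    print(lo, hi, total)
```

With an inclusive upper bound, the 2017/9/24 row would be counted in two periods.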
I think the simplest way is to subtract 25 days and aggregate by the month:
select year(dateadd(day, -25, datetime)) as yr,
month(dateadd(day, -25, datetime)) as mon,
sum(data)
from t
group by year(dateadd(day, -25, datetime)), month(dateadd(day, -25, datetime));
You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)

Calculating moving average using previous record as a variable

I want to calculate a moving average for a column, that requires an arithmetic calculation using parameters from the previous record.
I have records for a meter reading X, with dates, I want to perform a calculation
to determine the average rate, using formula
(reading x - reading y)/(reading date # x - reading date # y)
Where Y is always the meter reading from the previous record. The DATEDIFF is in days.
Meter | Reading | Date
-------+---------+------------
1 | 39,000 | 1 Jan 2016
1 | 39,200 | 1 Feb 2016
1 | 39,300 | 1 Mar 2016
I would like an additional column that inserts the calculated field,
it would have to read from the latest record, and process backwards -
since I have 2 years of readings, and not the first.
Meter | Reading | Date | Rate
------+---------+------------+--------------------
1 | 39,000 | 1 Aug 2016 | (200 / 31) = 6.45
1 | 39,200 | 1 Sep 2016 | (100 / 30) = 3.33
1 | 39,300 | 1 Oct 2016 | Z
I want to select this into a table for reporting.
-- EDIT --
I was getting Divide by 0 errors and decided to calculate the Reading X - Reading Y separately as MeterDiff.
LEAD(MeterReading, 1, 0) OVER (PARTITION BY MeterID ORDER BY MeterReading) - MeterReading AS MeterDiff
Because there is more than one MeterID in the select list, how would I prevent it from calculating a MeterDiff between the last record of MeterID 1 and the first record of MeterID 2? Can I not set the first record for each MeterID to 0?
It would be something like this:
select t.*,
( (reading - lag(reading) over (partition by meter order by date)) /
nullif(datediff(day, lag(date) over (partition by meter order by date), date), 0)
)
from t;
If reading is an integer, then be careful, because SQL Server does integer division. So, you might want:
select t.*,
( (1.0*reading - lag(reading) over (partition by meter order by date)) /
nullif(datediff(day, lag(date) over (partition by meter order by date), date), 0)
)
from t;
Note: lag() is ANSI standard functionality implemented in SQL Server since version 2012. Prior to that, you would need to use a more computationally intensive method, such as outer apply.
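The per-meter behaviour of lag() with partition by can be mimicked in a few lines of Python, which may help when checking expected values by hand. This is a sketch with made-up sample rows for a second meter; reading_rates is a hypothetical name.

```python
from datetime import date
from itertools import groupby

def reading_rates(rows):
    """rows: (meter, reading, date) tuples; adds a per-day rate vs. the previous reading."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for meter, group in groupby(ordered, key=lambda r: r[0]):
        prev = None  # reset for each meter, like PARTITION BY meter
        for _, reading, day in group:
            rate = None
            if prev is not None:
                days = (day - prev[1]).days
                # A zero-day gap yields None, like NULLIF(..., 0) in the SQL.
                rate = (reading - prev[0]) / days if days else None
            out.append((meter, reading, day, rate))
            prev = (reading, day)
    return out

rows = [(1, 39000, date(2016, 1, 1)), (1, 39200, date(2016, 2, 1)),
        (1, 39300, date(2016, 3, 1)), (2, 500, date(2016, 1, 1)),
        (2, 560, date(2016, 1, 31))]
for row in reading_rates(rows):
    print(row)
```

The first row of each meter has no previous reading, so its rate is None rather than a difference against another meter.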

Populating a table with all dates in a given range in Google BigQuery

Is there any convenient way to populate a table with all dates in a given range in Google BigQuery? What I need are all dates from 2015-06-01 till CURRENT_DATE(), so something like this:
+------------+
| date |
+------------+
| 2015-06-01 |
| 2015-06-02 |
| 2015-06-03 |
| ... |
| 2016-07-11 |
+------------+
Optimally, the next step would be to also get all weeks between the two dates, i.e.:
+---------+
| week |
+---------+
| 2015-23 |
| 2015-24 |
| 2015-25 |
| ... |
| 2016-28 |
+---------+
I've been fiddling around with the following answers I found, but I can't get them to work, mostly because core functions aren't supported and I can't find proper ways to replace them.
Easiest way to populate a temp table with dates between and including 2 date parameters
Generate Dates between date ranges
Your help is very much appreciated!
Best,
Max
Mikhail's answer works for BigQuery's legacy sql syntax perfectly. This solution is a slightly easier one if you're using the standard SQL syntax.
BigQuery standard SQL syntax actually has a built in function, GENERATE_DATE_ARRAY for creating an array from a date range. It takes a start date, end date and INTERVAL. For example:
SELECT day
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 DAY)
) AS day
If you wanted the week and year you could use
SELECT EXTRACT(YEAR FROM day), EXTRACT(WEEK FROM day)
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 WEEK)
) AS day
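To check what such a query should return, the same ranges can be generated in plain Python. This sketch uses ISO week numbers, which is an assumption (BigQuery's WEEK part numbers weeks differently from ISOWEEK), and a fixed end date from the question instead of the current date; date_range and iso_weeks are hypothetical names.

```python
from datetime import date, timedelta

def date_range(start: date, end: date):
    """All dates from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def iso_weeks(start: date, end: date):
    """Distinct ISO 'year-week' labels covered by the range, in order."""
    seen = {}
    for d in date_range(start, end):
        y, w, _ = d.isocalendar()
        seen.setdefault(f"{y}-{w}", None)  # dict preserves insertion order
    return list(seen)

days = date_range(date(2015, 6, 1), date(2016, 7, 11))
weeks = iso_weeks(date(2015, 6, 1), date(2016, 7, 11))
print(days[0], days[-1], weeks[0], weeks[-1])
```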
all dates from 2015-06-01 till CURRENT_DATE()
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS DAY
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
all weeks between the two dates
SELECT YEAR(DAY) AS y, WEEK(DAY) AS w
FROM (
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS DAY
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
)
GROUP BY y, w