Convert arbitrary string values to timestamps (SQL)

I am wondering if there is a way to convert arbitrary string values (such as the examples below) to something that can be interpreted as a timestamp, perhaps in days.
Dropdown_values      | Desired output (days)
---------------------+----------------------
12 weeks             | 84
1 Week 4 Days        | 11
1 Year               | 365
1 Year 1 Week 2 Days | 374
The idea I had was to split out the values, since they are all separated by spaces, and then do the addition in a separate column. Are there other (better) ways to do this? Thank you.

To expand on my comment as an answer:
select extract(epoch from '12 Week'::interval)/86400;
-- 84
select extract(epoch from '1 Week 4 Days'::interval)/86400;
-- 11
select extract(epoch from '1 Year 1 Week 2 Days'::interval)/86400;
-- 374.25
The above is how I usually deal with this sort of thing: extract the epoch value from the interval and then divide by the number of seconds in a day (86400). It would be a good idea to read the Interval input and Interval output sections of the docs to understand how an interval is constructed and returned, and the assumptions used; for instance, a year is counted as 365.25 days, which is where the 374.25 above comes from. Note: the queries return a float value, not a timestamp. A value like 84 cannot be a timestamp, though you could turn it into an interval, e.g. 84 * '1 day'::interval yields 84 days. If at all possible, it is a good idea to store the data as actual timestamps (start and end) and derive intervals from those.
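As a sketch of how this could look against the dropdown values themselves, assuming they live in a table (the dropdowns table and duration_label column are hypothetical names, not from the question):
-- hypothetical table dropdowns(duration_label text)
select duration_label,
       extract(epoch from duration_label::interval)/86400 as days
from dropdowns;
This works because the dropdown strings shown above are all valid PostgreSQL interval input.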

Related

Google Big Query to look at data of 2 specific dates

I am new to Big Query. I am trying to do a where condition to only select yesterday's data and that of same day last year (in this case, 10/25/2021 data and 10/25/2020 data). I know how to select a range of data, but I couldn't figure out a way to only select those 2 days of data. Any help is appreciated.
I recommend using BigQuery date functions to define the dates. You can read about them in the BigQuery documentation.
WHERE DATE(your_date_field) IN (DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY),
                                DATE_SUB(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY), INTERVAL 1 YEAR))
This is dynamic for any day that you run the query. It takes the current date and subtracts 1 day. For the other date, it takes the current date, subtracts 1 day and then 1 year, giving yesterday's date 1 year prior.
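Put together as a full query, it could look like the sketch below (my_table is a placeholder name; your_date_field is the field from the snippet above):
SELECT *
FROM my_table  -- placeholder table name
WHERE DATE(your_date_field) IN (DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY),
                                DATE_SUB(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY), INTERVAL 1 YEAR))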
WHERE date_my_field IN (DATE('2021-10-25'), DATE('2020-10-25'))
Use IN, which is a shortcut for the OR operator.
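For comparison, the IN version above is the same filter as spelling the OR out by hand:
WHERE date_my_field = DATE('2021-10-25') OR date_my_field = DATE('2020-10-25')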
Consider the below, less verbose approach (especially if you remove the time zone):
select
current_date('America/Los_Angeles') - 1 as yesterday,
date(current_date('America/Los_Angeles') - 1 - interval 1 year) same_day_last_year
with output: two date columns, yesterday and same_day_last_year.
So now you can use it in your WHERE clause, as in the below example (with dummy data via a CTE):
with data as (
select your_date_field
from unnest(generate_date_array(current_date() - 1000, current_date())) your_date_field
)
select *
from data
where your_date_field in (
current_date('America/Los_Angeles') - 1,
date(current_date('America/Los_Angeles') - 1 - interval 1 year)
)

Get last 30 days from table in SQL with YYYY-MM-DDtHH-MM-SS (BigQuery)

In BigQuery, I have a table of values with one column containing dates in YYYY-MM-DDtHH-MM-SS format, for example, 2020-07-24T20:13:35.
I want to pull only the rows from the past 30 days and exclude any rows that are more than 30 days old.
I believe I found out how to do it for date formatted as YYYY-MM-DD:
(Column name is "dates")
SELECT DATE_SUB(dates, INTERVAL 30 DAY)
This does not work when it is formatted as YYYY-MM-DDtHH-MM-SS though.
You would simply use:
where col > datetime_add(current_datetime, interval -30 day)
or
where col > timestamp_add(current_timestamp, interval -30 day)
depending on whether the column is a datetime or timestamp.
You can also use
Select * from table where date(date_column) >= date_sub(current_date, interval 30 day)
This way you will get all the records from 30 days ago onward, starting from 12 AM of that day. Using current_datetime or current_timestamp will only give results newer than 30 days before the exact time you run your query.
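If the column is actually stored as a STRING in that YYYY-MM-DDTHH:MM:SS shape rather than as a DATETIME or TIMESTAMP, one option is to parse it first. A sketch (my_table is a hypothetical name; dates is the column from the question):
select *
from my_table  -- hypothetical table name
where parse_datetime('%Y-%m-%dT%H:%M:%S', dates)
      > datetime_sub(current_datetime(), interval 30 day)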

How to extract year from interval in postgres

I found out that the following line:
select extract(year from '2021-01-01'::timestamp - '2020-01-01');
returns 0.
Even if we go a bit further:
select extract(year from '2021-01-01'::timestamp - '2010-01-01');
The result is still 0.
I understand the rationale behind this. If we run a query to check the interval between consecutive New Years:
select '2021-01-01'::timestamp - '2020-01-01';
We're getting the following result:
0 years 0 mons 366 days 0 hours 0 mins 0.00 secs
1 year wouldn't be precise enough - it can mean 365 or 366 days.
Question: Is there an elegant method to retrieve year count from interval being the difference between two timestamps? Something like the first query, where I would expect result as 1.
You should use AGE instead of the difference:
SELECT EXTRACT(YEAR FROM AGE('2021-01-01'::TIMESTAMP, '2010-01-01'::TIMESTAMP));
date_part
-----------
11
See: https://www.postgresql.org/docs/current/functions-datetime.html
Subtract arguments, producing a “symbolic” result that uses years and months, rather than just days
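Applied to the dates from the question's first query, AGE gives the symbolic result the asker expected:
SELECT EXTRACT(YEAR FROM AGE('2021-01-01'::TIMESTAMP, '2020-01-01'::TIMESTAMP));
-- 1
since AGE('2021-01-01', '2020-01-01') produces the symbolic interval '1 year' rather than '366 days'.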

How to select and group fortnightly in postgreql

I am trying to group the rows in a table fortnightly, but can't seem to work out how to do it, especially as the date_part function does not have a 'fortnight' keyword argument.
This is what I have so far:
CREATE TABLE foo(
dt DATE NOT NULL,
f1 REAL NOT NULL,
f2 REAL NOT NULL,
f3 REAL NOT NULL,
f4 REAL NOT NULL
);
SELECT AVG((f1+f2+f3+f4)/4) as fld_avg FROM
(
SELECT date_part('year', dt) AS year_part,
date_part('fortnight', dt) AS fortnight_part,
f1, f2, f3, f4
FROM foo
WHERE dt >= date_trunc('day', NOW() - interval '3 month')
) foo
GROUP BY year_part, fortnight_part
How may I rewrite (or modify) the query above so as to group data fortnightly?
Basic idea
What we need to do is take intervals of 14 consecutive days and map them to unique buckets, then group by those buckets. These buckets can be of any type (int, char, timestamp), as long as we get a unique value per fortnight.
Division
A simple way to accomplish this is division: divide by 14 days and truncate the result to an integer.
For example, we can extract the number of seconds since 1970-01-01, the UNIX epoch, and divide by the number of seconds in a fortnight: 14 * 24 * 60 * 60 = 14 * 86400 = 1209600. (I'll use Vao Tsun's example data)
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT (EXTRACT(EPOCH FROM d)::int/86400)/14 fortnight FROM c
which yields fortnights since 1970-01-01 (a Thursday):
fortnight
-----------
1251
1252
1254
1254
(4 rows)
The integer values we get represent the number of fortnights since 1970-01-01, but we don't have to care about that. The important thing is that it uniquely identifies a fortnight.
Due to 1970-01-01 being a Thursday, all fortnights will start at a Thursday. We might want to vary the starting point of our fortnight to a different day of the week (e.g. Monday) by adding:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT (EXTRACT(EPOCH FROM d)::int/86400 + 4)/14 fortnight FROM c
By adding four days to Thursday we end up at Monday.
If you rather want fortnights with respect to the beginning of the year, instead of some arbitrary absolute date, such as 1970-01-01, we can use the day of the year instead:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT EXTRACT(year FROM d) * 26 + EXTRACT(doy FROM d)::int/14 AS fortnight FROM c;
which yields
fortnight
-----------
52467
52468
52469
52470
(4 rows)
We need to multiply the extracted year by 26, because there are 26.1… fortnights in a year.
Truncation
Instead of division, another approach is truncation: we map each day of a specific fortnight to the first timestamp of that fortnight.
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT d - make_interval(secs => EXTRACT(EPOCH FROM d)::int % (86400 * 14)) AS fortnight FROM c;
which yields
fortnight
---------------------
2017-12-14 00:00:00
2017-12-28 00:00:00
2018-01-25 00:00:00
2018-01-25 00:00:00
(4 rows)
This might seem a bit more complicated, but it has some benefits. The result is still a date/time type, and other code does not need to worry about the fact that we used fortnights.
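As a usage sketch (not from the original answer), this ties the truncation approach back to the asker's foo table, grouping on absolute fortnights:
-- sketch: average of the four fields per absolute fortnight
SELECT dt - make_interval(secs => EXTRACT(EPOCH FROM dt)::int % (86400 * 14)) AS fortnight,
       AVG((f1 + f2 + f3 + f4) / 4) AS fld_avg
FROM foo
GROUP BY 1;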
Again, instead of absolute fortnights, we can calculate this with respect to the beginning of the year:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT d - make_interval(days => (EXTRACT(doy FROM d)::int - 1) % 14) AS fortnight FROM c;
which yields
fortnight
---------------------
2017-12-17 00:00:00
2017-12-31 00:00:00
2018-01-15 00:00:00
2018-01-29 00:00:00
(4 rows)
The result is of type timestamp; you might want a date instead. This can be addressed by casting:
(d - make_interval(days => (EXTRACT(doy FROM d)::int - 1) % 14))::date
or by subtracting an int instead of an interval from the date:
d - ((EXTRACT(doy FROM d)::int - 1) % 14)
There are many more possibilities. With this scheme, we can calculate the fortnight or any other interval with respect to the beginning of the month, some arbitrary date, etc.
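For instance, fortnights anchored to the beginning of each month could look like the sketch below (buckets start on the 1st, the 15th, and a short one on the 29th):
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT d - ((EXTRACT(day FROM d)::int - 1) % 14) AS fortnight FROM c;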
update
A fortnight is a two-week period, one week even and the other odd, e.g. weeks 1 and 2, 3 and 4, 5 and 6.
Closer: 2 is even, mod(2,2)=0, and 1 is odd, mod(1,2)=1;
4 is even, mod(4,2)=0, and 3 is odd, mod(3,2)=1;
6 is even, mod(6,2)=0, and 5 is odd, mod(5,2)=1.
Thus you can assume that each odd week number in the year leaves remainder 1 when divided by two, and the next (even) week number leaves remainder 0.
The general idea is to use the sequential number of the week in a year. To avoid Jan 1st always being week 1 and Dec 31 possibly being week 53 (and thus two odd weeks in a row), I use IW:
week number of ISO 8601 week-numbering year (01-53; the first Thursday
of the year is in week 1)
Then I assume that if one week's number is odd, the next will be even, so we divide all time into two-week parts: even+odd.
SQL Example:
o=# with c(d) as (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
select d,to_char(d,'IW'),right(to_char(d,'IW'),1)::int,mod(right(to_char(d,'IW'),1)::int, 2) from c;
d | to_char | right | mod
------------+---------+-------+-----
2017-12-21 | 51 | 1 | 1
2017-12-31 | 52 | 2 | 0
2018-01-26 | 04 | 4 | 0
2018-02-01 | 05 | 5 | 1
(4 rows)
mod is either 0 or 1 - group by this column
https://www.postgresql.org/docs/current/static/functions-math.html
https://www.postgresql.org/docs/current/static/functions-formatting.html
Of course, you would need to add an outer join on generate_series if you want data without gaps...
I post another answer to explain how I was wrong and why my "smart-n-neat" way failed...
the schema build and queries are at:
https://www.db-fiddle.com/f/j5i2Td8CvxCVXQQYePKzCe/0
the first (and correct) query:
select distinct w2, avg(c) over (partition by w2)
from d
join generate_series('2016.11.28'::date,'2017.02.23'::date,'2 weeks'::interval) w2
on gs >= w2 and gs < w2 + '2 weeks'::interval
order by w2;
This is a long, simple and correct approach: the idea is to join on a two-week interval. It's working, reliable and all good.
Now the second query:
select distinct div(to_char(gs,'IW')::int,2), min(gs) over w, avg(c) over w
from d
window w as (partition by div(to_char(gs,'IW')::int,2))
order by min;
This is much shorter, neater and smarter, yet it has a huge limitation and is unusable. Here's why:
My approach splits the next-to-last two-week interval into two parts: the last week of 2016 and the first week of 2017, thus cutting that interval's result in half. If you multiply the sum of averages for those two weeks by a half, the results of both queries will match. Alas, introducing CASE WHEN logic for the edge weeks of the year makes the neat solution heavy and full of overhead, and thus the very point of it is lost.
TL;DR: the neat and lightweight solution works only within a single year, farther than two weeks from the end or start of the year, and only if our fortnightly interval starts from Monday.
Now the idea behind the lightweight solution: div(2,2)=1 and div(3,2)=1, so you can divide the year into two-week intervals and use them for grouping.
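To see which week numbers div() groups together, here is a quick sketch:
select wk, div(wk, 2) as bucket
from generate_series(1, 6) wk;
-- wk 1 -> 0; wk 2,3 -> 1; wk 4,5 -> 2; wk 6 -> 3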
Also, I deliberately did not take the latest New Year switch, because 2018 Jan 1 is a Monday, so IW is the same as WW, which usually is not the case.
Lastly, my first answer with odd and even weeks is not viable at all. It divides the year not into two-week intervals, but rather into two parts: even weeks and odd weeks... I deceived myself with the "something close" idea and worked with the remainder, while I should have done the opposite and used the whole value of the division...

How does the INTERVAL datatype work?

I'm struggling to understand how INTERVAL works.
SELECT INTERVAL '300' MONTH,
INTERVAL '54-2' YEAR TO MONTH,
INTERVAL '11:12:10.1234567' HOUR TO SECOND
FROM dual;
The output is shown as follows :
+25-00, +54-02, +00 11:12:10.1234567
What I do not get is: why does the first column show as +25, but the 2nd and 3rd columns are simply exactly the same as the input?
An interval of 300 months is exactly the same thing as an interval of 25 years, and they are of the same data type too: year intervals, year-month intervals and month intervals are just three ways of expressing the same type.
You're shown +25-00 because one of the representations had to be picked. It could have been any of them, really.
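To convince yourself of that, a quick check along these lines should do (a sketch; the MONTH(3) precision is spelled out so the three-digit literal is accepted):
SELECT CASE WHEN INTERVAL '300' MONTH(3) = INTERVAL '25' YEAR
            THEN 'equal' ELSE 'different' END AS cmp
FROM dual;
-- equal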
There are two types of INTERVAL values: year-to-month and day-to-second (which includes sub-second precision). If your interval contains a year and/or month component, it will be represented as years and months. Thus, 300 months is shown as 25 years, 0 months, and 54 years, 2 months is shown just as entered.
If you notice, your day-to-second interval is not displayed the same. There is a leading 00, which represents the number of days, in this case zero. If you had instead used
SELECT INTERVAL '36:12:10.1234567' HOUR TO SECOND FROM dual
You would have seen the following output:
+01 12:12:10.1234567
So, your interval value will always be displayed as either year-to-month or day-to-second, regardless of whether you are actually using all of the precision. This final example should make everything clear and obvious:
SELECT INTERVAL '75' MINUTE FROM dual
+00 01:15:00.000000