I have a dataset where certain operations occur during the overnight hours which I'd like to attribute to the day before.
For example, anything happening between 2/23 8pm and 2/24 6am should be included in 2/23's metrics rather than 2/24. Anything from 6:01 am to 7:59pm should be counted in 2/24's metrics.
I've seen a few posts about decrementing time by 6 hours but that doesn't work in this case.
Is there a way to use an If function to specify that midnight-6am should be counted as date-1 rather than date without affecting the metrics for the 6am - 7:59pm hours?
Thanks in advance! Also, a SQL newbie here so apologies if I have lots of followup questions.
You can use date_add with -6 hours and then optionally cast the timestamp as a date.
create table t (dcol datetime);
insert into t values
('2022-02-25 06:01:00'),
('2022-02-25 06:00:00'),
('2022-02-25 05:59:00');
SELECT CAST(DATE_ADD(dcol, INTERVAL -6 HOUR) AS DATE) FROM t;
| CAST(DATE_ADD(dcol, INTERVAL -6 HOUR) AS DATE) |
| :-------------------------------------------- |
| 2022-02-25 |
| 2022-02-25 |
| 2022-02-24 |
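To fold this straight into per-day metrics, you can group on the shifted date. A minimal sketch using the same table, with COUNT(*) standing in for whatever metric you actually aggregate:
-- bucket each row into its 6 AM-to-6 AM business day
SELECT CAST(DATE_ADD(dcol, INTERVAL -6 HOUR) AS DATE) AS business_date,
       COUNT(*) AS operations
FROM t
GROUP BY business_date;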
As said in the comments, your requirement is to count occurrences in a 6 AM-to-6 AM day instead of a midnight-to-midnight day. You can achieve this by decreasing the time by 6 hours as shown in @Kendle's answer. Another way to do it is to use an IF condition as shown below. Here, the date is decremented if the time is at or before 6 AM, and the new date is put in a new column.
Query:
SELECT
IF
(TIME(eventTime) <= "06:00:00",
DATE_ADD(DATE(eventTime), INTERVAL -1 DAY),
DATE(eventTime)) AS newEventTime
FROM
`project.dataset.table`
ORDER BY
eventTime;
In the output from the sample data, timestamps at or before 6 AM are attributed to the previous day, while later ones stay on the current day.
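If the goal is the daily metrics rather than the remapped dates themselves, you can aggregate on the computed date directly. A minimal sketch along the same lines, again with COUNT(*) standing in for the real metric:
SELECT
  -- map times at or before 6 AM back to the previous day, then group
  IF
  (TIME(eventTime) <= "06:00:00",
    DATE_ADD(DATE(eventTime), INTERVAL -1 DAY),
    DATE(eventTime)) AS eventDate,
  COUNT(*) AS operations
FROM
  `project.dataset.table`
GROUP BY
  eventDate;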
I am trying to group the rows in a table fortnightly, but can't seem to work out how to do it - especially, as the date_part function does not have a 'fortnight' keyword argument.
This is what I have so far:
CREATE TABLE foo(
dt DATE NOT NULL,
f1 REAL NOT NULL,
f2 REAL NOT NULL,
f3 REAL NOT NULL,
f4 REAL NOT NULL
);
SELECT AVG((f1+f2+f3+f4)/4) as fld_avg FROM
(
SELECT date_part('year', dt) AS year_part,
date_part('fortnight', dt) AS fortnight_part,
f1, f2, f3, f4
FROM foo
WHERE dt >= date_trunc('day', NOW() - interval '3 months')
) foo
GROUP BY year_part, fortnight_part
How may I rewrite (or modify) the query above so as to group data fortnightly?
Basic idea
What we need to do is take intervals of 14 consecutive days, map them to unique buckets, and then group by those buckets. These buckets can be of any type (int, char, timestamp) as long as each fortnight maps to a unique value.
Division
A simple way to accomplish this is division. Divide by 14 days and truncate the result to date precision.
For example, we can extract the number of seconds since 1970-01-01, the UNIX epoch, and divide by the number of seconds in a fortnight: 14 * 24 * 60 * 60 = 14 * 86400 = 1209600. (I'll use Vao Tsun's example data)
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT (EXTRACT(EPOCH FROM d)::int/86400)/14 fortnight FROM c
which yields fortnights since 1970-01-01 (a Thursday):
fortnight
-----------
1251
1252
1254
1254
(4 rows)
The integer values we get represent the number of fortnights since 1970-01-01, but we don't have to care about that. The important thing is that each value uniquely identifies a fortnight.
Due to 1970-01-01 being a Thursday, all fortnights will start on a Thursday. We might want to move the starting point of our fortnight to a different day of the week (e.g. Monday) by adding an offset:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT (EXTRACT(EPOCH FROM d)::int/86400 + 3)/14 fortnight FROM c
Adding three days is equivalent to counting from three days before the epoch, that is from Monday, 1969-12-29, so each fortnight now starts on a Monday.
If you'd rather have fortnights with respect to the beginning of the year, instead of some arbitrary absolute date such as 1970-01-01, we can use the day of the year instead:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT EXTRACT(year FROM d) * 26 + EXTRACT(doy FROM d)::int/14 AS fortnight FROM c;
which yields
fortnight
-----------
52467
52468
52469
52470
(4 rows)
We need to multiply the extracted year by 26 because there are 365/14 ≈ 26.07 fortnights in a year. Beware of one edge case: day-of-year 365 or 366 yields bucket 26, which collides with bucket 0 of the following year (2017-12-31 and 2018-01-01 would both give 52468), so multiply by 27 instead if you need strict uniqueness.
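A sketch of that collision-free variant, with 27 buckets per year:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
-- 27 buckets per year, so the year-end sliver cannot collide with the next year's bucket 0
SELECT EXTRACT(year FROM d) * 27 + EXTRACT(doy FROM d)::int/14 AS fortnight FROM c;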
Truncation
Instead of division another approach is truncation. We map each day of a specific fortnight to the first timestamp of that fortnight.
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT d - make_interval(secs => EXTRACT(EPOCH FROM d)::int % (86400 * 14)) AS fortnight FROM c;
which yields
fortnight
---------------------
2017-12-14 00:00:00
2017-12-28 00:00:00
2018-01-25 00:00:00
2018-01-25 00:00:00
(4 rows)
This might seem a bit more complicated, but it has some benefits. The result is still a date/time type, and other code does not need to worry about the fact that we used fortnights.
Again, instead of absolute fortnights, we can calculate this with respect to the beginning of the year, using the day of the year (doy counts from 1, hence the - 1):
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT d - make_interval(days => (EXTRACT(doy FROM d)::int - 1) % 14) AS fortnight FROM c;
which yields
fortnight
---------------------
2017-12-17 00:00:00
2017-12-31 00:00:00
2018-01-15 00:00:00
2018-01-29 00:00:00
(4 rows)
The result is of type timestamp; you might want date instead. This can be addressed by casting:
(d - make_interval(days => (EXTRACT(doy FROM d)::int - 1) % 14))::date
or by subtracting an int instead of an interval from the date:
d - ((EXTRACT(doy FROM d)::int - 1) % 14)
There are many more possibilities. With this scheme, we can calculate the fortnight or any other interval with respect to the beginning of the month, some arbitrary date, etc.
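For example, here is a sketch of the truncation approach relative to an arbitrary anchor date; the anchor 2017-01-02 (a Monday) is purely illustrative. Subtracting two dates in PostgreSQL yields an integer number of days, which keeps the expression short:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
-- subtract the days elapsed within the current fortnight, counted from the anchor
SELECT d - ((d - '2017-01-02'::date) % 14) AS fortnight FROM c;
For any date at or after the anchor, this returns the Monday that starts its fortnight, and the result is already of type date.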
update
A fortnight is a two-week period, one even week and one odd week: e.g. weeks 1 and 2, 3 and 4, 5 and 6.
Looking closer: 2 is even, mod(2,2)=0, and 1 is odd, mod(1,2)=1;
4 is even, mod(4,2)=0, and 3 is odd, mod(3,2)=1;
6 is even, mod(6,2)=0, and 5 is odd, mod(5,2)=1.
Thus you can assume that each odd week number in the year leaves remainder 1 when divided by two, and the following even week number leaves remainder 0.
The general idea is to use the sequential number of the week in the year. To avoid Jan 1st always being week 1 while Dec 31st is possibly week 53 (which would give two odd weeks in a row), I use IW:
week number of ISO 8601 week-numbering year (01-53; the first Thursday
of the year is in week 1)
Then I assume that if one week's number is odd, the next will be even, so we can divide all time into parts of two weeks: even + odd.
SQL Example:
o=# with c(d) as (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
select d,to_char(d,'IW'),right(to_char(d,'IW'),1)::int,mod(right(to_char(d,'IW'),1)::int, 2) from c;
d | to_char | right | mod
------------+---------+-------+-----
2017-12-21 | 51 | 1 | 1
2017-12-31 | 52 | 2 | 0
2018-01-26 | 04 | 4 | 0
2018-02-01 | 05 | 5 | 1
(4 rows)
mod is either 0 or 1 - group by this column
https://www.postgresql.org/docs/current/static/functions-math.html
https://www.postgresql.org/docs/current/static/functions-formatting.html
Of course you would need to add an outer join on generate_series if you want data without gaps...
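For instance, a sketch of such a gap-fill, assuming a hypothetical table d(gs date, c numeric) and an arbitrary date range:
-- every fortnight in the range appears, with a NULL average when it has no rows
SELECT w2::date AS fortnight_start, avg(d.c) AS avg_c
FROM generate_series('2016.11.28'::date, '2017.02.23'::date, '2 weeks'::interval) w2
LEFT JOIN d ON d.gs >= w2 AND d.gs < w2 + '2 weeks'::interval
GROUP BY w2
ORDER BY w2;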
I'm posting another answer to explain how I was wrong and why my "smart-n-neat" way failed...
the schema build and queries are at:
https://www.db-fiddle.com/f/j5i2Td8CvxCVXQQYePKzCe/0
the first (and correct) query:
select distinct w2, avg(c) over (partition by w2)
from d
join generate_series('2016.11.28'::date,'2017.02.23'::date,'2 weeks'::interval) w2
on gs >= w2 and gs < w2 + '2 weeks'::interval
order by w2;
It is a long, simple, and correct approach: the idea is to join on a two-week interval. It works, it's reliable, and all is good.
Now the second query:
select distinct div(to_char(gs,'IW')::int,2), min(gs) over w, avg(c) over w
from d
window w as (partition by div(to_char(gs,'IW')::int,2))
order by min;
It is much shorter, neater, and smarter, yet it has a huge limitation and is unusable. Here's why:
My approach splits the next-to-last two-week interval into two parts: the last week of 2016 and the first week of 2017, thus splitting that result in half. If you multiply the sum of the averages for those two weeks by one half, the results of both queries match. Alas, introducing CASE WHEN logic for the edge weeks of the year makes the neat solution heavy and full of overhead, and thus the very point of it is lost.
TL;DR: the neat and lightweight solution works only within a single year, no closer than two weeks to the start or end of the year, and only if our fortnightly interval starts on a Monday.
Now the idea behind the lightweight solution: with integer division, 2/2 = 1 and 3/2 = 1, so consecutive odd and even week numbers fall into the same bucket, and you can divide the year into intervals of two weeks to group by.
Also, I deliberately did not pick this particular New Year transition, because 2018 Jan 1 is a Monday, so IW is the same as WW, which is usually not the case.
Lastly, my first answer with odd and even weeks is not viable at all. It divides the year not into two-week intervals, but into two halves: the even weeks and the odd weeks... I deceived myself with the "something close" idea and worked with the remainder, when I should have done the opposite and used the whole value of the division...
In my Spiceworks database there is a table, tickets, with two columns I am concerned with, first_response_secs and created_at.
I have been tasked with finding the average response time of tickets for every week.
So if I run the following query:
select AVG(first_response_secs) from (
select first_response_secs,created_at
from tickets
where created_at BETWEEN '2017-03-19' and '2017-03-25'
)
I will get back the average first response seconds for that week. But that's as far as my limited SQL gets me. I need 6 months worth of data and I don't want to manually edit the date range and rerun the query 24 times.
I would like to write a query that will return output similar to the following:
WEEK AVERAGE RESPONSE TIME(secs)
-----------------------------------------------------------
2017-02-26 - 2017-03-04 21447
2017-03-05 - 2017-03-11 20564
2017-03-12 - 2017-03-18 25883
2017-03-19 - 2017-03-25 12244
Or something like that, back 6 months.
Weeks are tricky. How about counting whole weeks back from a fixed end date:
select min(created_at) as weekstart, avg(first_response_secs) as avg_response_secs
from tickets
group by cast((julianday('2017-03-25') - julianday(created_at)) / 7 as int)
order by weekstart
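And to cover the six months in one go, a sketch adding a cutoff via SQLite date arithmetic (the end date 2017-03-25 is taken from the question):
select min(created_at) as weekstart, avg(first_response_secs) as avg_response_secs
from tickets
-- keep only the last six months before the chosen end date
where created_at >= date('2017-03-25', '-6 months')
group by cast((julianday('2017-03-25') - julianday(created_at)) / 7 as int)
order by weekstart;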
One dirty way is to use case to define week boundaries:
select week, avg(first_response_secs)
from (
select case
when created_at between '2017-02-26' and '2017-03-04' then '2017-02-26 - 2017-03-04'
when created_at between '2017-03-05' and '2017-03-11' then '2017-03-05 - 2017-03-11'
when created_at between '2017-03-12' and '2017-03-18' then '2017-03-12 - 2017-03-18'
when created_at between '2017-03-19' and '2017-03-25' then '2017-03-19 - 2017-03-25'
end as week,
first_response_secs
from tickets
) t
group by week;
Note that this method is a general-purpose one and can be modified to change the boundaries as you wish.
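If hand-written boundaries get tedious, you can also bucket rows by year and week number. A sketch, assuming SQLite (Spiceworks' usual back end) and created_at stored as 'YYYY-MM-DD ...' text:
-- %W numbers weeks 00-53 starting on Monday
select strftime('%Y-%W', created_at) as week, avg(first_response_secs) as avg_response_secs
from tickets
group by week;
Note that %W weeks start on Monday, so these buckets will not line up exactly with the Sunday-based ranges above; it trades exact boundaries for brevity.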
I am trying to add a filter condition in the DB2 database. I am new to it and come from an Oracle background. I am trying to get records with dates between today at 4 AM and today at 5 PM only. I currently have the below query, which returns zero results:
db2 => select datetimeColumn from datetimeExample WHERE datetimeColumn BETWEEN timestamp(current date) - 1 day + 4 hour AND timestamp(current date) - 1 day + 13 hour
DATETIMECOLUMN
--------------------------
0 record(s) selected.
And here is the data in the table, some of which I believe should show; there is something wrong with the condition statement. Any help is appreciated:
db2 => select * from datetimeExample
DATETIMECOLUMN
--------------------------
2016-06-16-09.38.53.759000
1988-12-25-17.12.30.000000
2016-12-25-17.10.30.000000
2016-06-16-04.10.30.000000
2016-06-16-05.10.30.000000
1988-12-25-15.12.30.000000
1988-12-25-14.12.30.000000
2016-06-16-12.10.30.000000
2016-06-16-07.10.30.000000
2016-06-16-08.10.30.000000
10 record(s) selected.
The query should work when you leave out the - 1 day. The reason is that timestamp(current date) returns the timestamp for today at zero hours. Then you add 4 hours and are at the required start time. Similar maths for the end time (and 5 PM should be + 17 hours, not + 13).
select datetimeColumn from datetimeExample
WHERE datetimeColumn
BETWEEN timestamp(current date) + 4 hours AND timestamp(current date) + 17 hours
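Conversely, if you ever do want yesterday's 4 AM to 5 PM window, keep the - 1 day on both sides; a sketch of that variant:
-- shift both endpoints back one full day
select datetimeColumn from datetimeExample
WHERE datetimeColumn
BETWEEN timestamp(current date) - 1 day + 4 hours AND timestamp(current date) - 1 day + 17 hours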
I have a date field in a hive table following this format:
'YYYY-MM-DD'
I'm looking for a function (let's call this yet-to-be-discovered-by-me function dayofweek()) that will return 'friday' when run on today's date. So, to be explicitly clear, this query:
SELECT DAYOFWEEK(DT.ds), DT.ds
FROM dateTable DT
WHERE DT.ds = '2014-11-14'
LIMIT 1
would return this value:
'friday' '2014-11-14'
Any help would be greatly appreciated :) Google searching has so far been unfruitful.
Clark
P.S. The response to this question did not work for me; the error returned was: 'Invalid function 'dayofweek''.
Should you care for an equation, the following is C code, hopefully simple enough to translate into SQL. It is important to use integer math.
#define MARCH 3
int dow_Sunday0(int year, int month, int day) {
if (month < MARCH) {
month += 12;
year--;
}
// Add days for each year and leap years
day += year + year/4 - year/100 + year/400;
// add days for the month
day += month*30 + ((month-MARCH)*39 + 25)/64;
// modulo 7
return (day+3)%7;
}
This works for valid Gregorian calendar dates.
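As a sketch of that translation into SQL, here in Hive syntax since the question involves a Hive table (year(), month(), day() and pmod() all exist in Hive; the floor() calls replace C's integer division, because / in Hive is floating-point):
SELECT DT.ds,
       -- same arithmetic as dow_Sunday0 above
       pmod(dy + y + floor(y/4) - floor(y/100) + floor(y/400)
              + m*30 + floor(((m-3)*39 + 25)/64) + 3, 7) AS dow_sunday0
FROM (
  -- shift Jan/Feb to months 13/14 of the previous year, as in the C code
  SELECT ds,
         CASE WHEN month(ds) < 3 THEN month(ds) + 12 ELSE month(ds) END AS m,
         CASE WHEN month(ds) < 3 THEN year(ds) - 1 ELSE year(ds) END AS y,
         day(ds) AS dy
  FROM dateTable
) DT;
As in the C version, the result is 0 for Sunday through 6 for Saturday (5 for the Friday 2014-11-14).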
Use the DAYNAME() function, like this:
mysql> select dayname('2014-11-14');
+-----------------------+
| dayname('2014-11-14') |
+-----------------------+
| Friday |
+-----------------------+
1 row in set (0.00 sec)
So, your query will become:
SELECT DAYNAME(DT.ds), DT.ds
FROM dateTable DT
WHERE DT.ds = '2014-11-14'
LIMIT 1
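Note that DAYNAME() returns 'Friday' with a capital F; if you need exactly the lowercase 'friday' from the question, wrap it in LOWER():
SELECT LOWER(DAYNAME(DT.ds)), DT.ds
FROM dateTable DT
WHERE DT.ds = '2014-11-14'
LIMIT 1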