I want to calculate the number of orders in each time interval for each day.
The date column is a timestamp without time zone, and I can't seem to extract only the time. I run this query for each day, but is there a way to get the time intervals for every day of the month in one table?
SELECT
CASE WHEN date_created_utc >= timestamp '2020-09-01 08:00:00' AND date_created_utc <= timestamp '2020-09-01 11:00:00' THEN 'Q1'
WHEN date_created_utc >= timestamp '2020-09-01 11:00:01' AND date_created_utc <= timestamp '2020-09-01 14:00:00' THEN 'Q2'
WHEN date_created_utc >= timestamp '2020-09-01 14:00:01' AND date_created_utc <= timestamp '2020-09-01 16:00:00' THEN 'Q3'
WHEN date_created_utc >= timestamp '2020-09-01 16:00:01' AND date_created_utc <= timestamp '2020-09-01 20:00:00' THEN 'Q4'
WHEN date_created_utc >= timestamp '2020-09-01 20:00:01' AND date_created_utc <= timestamp '2020-09-01 23:59:00' THEN 'Q5'
END AS interval,
COUNT(id) as cnt
FROM order_processing
GROUP BY 1;
The desired output table:
Day Q1 Q2 Q3 Q4 Q5
1 28 57 50 65 27
2 23 50 60 90 66
3 58 60 80 70 67
You can convert to a time and then use comparisons. For aggregation:
COUNT(*) FILTER (WHERE date_created_utc::time >= '08:00:00' and date_created_utc::time < '11:00:00') as cnt_1
You just need the hour part to implement the logic: you can use extract():
select
date_created_utc::date day,
count(*) filter(where extract(hour from date_created_utc) between 8 and 10) q1,
count(*) filter(where extract(hour from date_created_utc) between 11 and 13) q2,
...
from order_processing
group by date_created_utc::date
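A minimal sketch of the same conditional-aggregation idea, run from Python against an in-memory SQLite database so it can be tried without a PostgreSQL server. This is an illustration under assumptions, not the original answers' code: SQLite's date() and time() stand in for the ::date and ::time casts, SUM(CASE ...) stands in for COUNT(*) FILTER, the buckets are treated as half-open ranges, and the function name and sample rows are made up.

```python
import sqlite3

# Half-open [start, end) buckets based on the question; Q5 runs to midnight.
BUCKETS = [
    ("q1", "08:00:00", "11:00:00"),
    ("q2", "11:00:00", "14:00:00"),
    ("q3", "14:00:00", "16:00:00"),
    ("q4", "16:00:00", "20:00:00"),
    ("q5", "20:00:00", "24:00:00"),
]


def orders_per_bucket(rows):
    """Return one row per day: (day, q1, q2, q3, q4, q5)."""
    # One conditional SUM per bucket; 'HH:MM:SS' strings compare correctly as text.
    columns = ",\n".join(
        f"SUM(CASE WHEN time(date_created_utc) >= '{lo}' "
        f"AND time(date_created_utc) < '{hi}' THEN 1 ELSE 0 END) AS {name}"
        for name, lo, hi in BUCKETS
    )
    query = (
        "SELECT date(date_created_utc) AS day,\n" + columns + "\n"
        "FROM order_processing\n"
        "GROUP BY date(date_created_utc)\n"
        "ORDER BY day"
    )
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE order_processing (id INTEGER, date_created_utc TEXT)"
        )
        conn.executemany("INSERT INTO order_processing VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Made-up sample rows for illustration only.
sample_orders = [
    (1, "2020-09-01 08:00:00"),
    (2, "2020-09-01 10:59:59"),
    (3, "2020-09-01 11:00:00"),
    (4, "2020-09-01 15:30:00"),
    (5, "2020-09-01 23:59:30"),
    (6, "2020-09-02 16:00:00"),
    (7, "2020-09-02 07:00:00"),  # before 08:00, so it falls in no bucket
]

for row in orders_per_bucket(sample_orders):
    print(row)
```

Grouping by the date part while the bucket tests look only at the time part is what removes the need to write one query per day.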
Edit 1: the issue is that '<=' behaves like '<' in my BigQuery query, which is
strange, while '>=' behaves normally. Any idea why this is happening?
Goal: to get data for May 2019.
Info about database here: https://packaging.python.org/en/latest/guides/analyzing-pypi-package-downloads/
Query 1 uses timestamp > '2019-04-30' AND timestamp < '2019-06-01'
SELECT file.project AS package, COUNT(file.project) AS installs, FORMAT_DATETIME('%Y-%m', timestamp) AS month
FROM `bigquery-public-data.pypi.file_downloads`
WHERE timestamp > '2019-04-30' AND timestamp < '2019-06-01'
GROUP BY month, package;
Query 2 uses timestamp >= '2019-05-01' AND timestamp <= '2019-05-31'
SELECT file.project AS package, COUNT(file.project) AS installs, FORMAT_DATETIME('%Y-%m', timestamp) AS month
FROM `bigquery-public-data.pypi.file_downloads`
WHERE timestamp >= '2019-05-01' AND timestamp <= '2019-05-31'
GROUP BY month, package;
Both queries should scan the same amount of data (May 2019), but they return different results and scan different amounts of data, as you can see in the attached images.
Which one is correct, and why don't they match?
You're comparing timestamp with a date literal. When a date literal is implicitly cast as timestamp, it will have '00:00:00' time.
Query 1 uses timestamp > '2019-04-30' AND timestamp < '2019-06-01'
This is same as
timestamp > '2019-04-30 00:00:00 UTC' AND timestamp < '2019-06-01 00:00:00 UTC'
which also includes data between 2019-04-30 00:00:01 UTC and 2019-04-30 23:59:59 UTC, a day that is not part of May.
Query 2 uses timestamp >= '2019-05-01' AND timestamp <= '2019-05-31'
same as
timestamp >= '2019-05-01 00:00:00 UTC' AND timestamp <= '2019-05-31 00:00:00 UTC'
In this case, you're missing the data between 2019-05-31 00:00:01 UTC and 2019-05-31 23:59:59 UTC, so the result is incorrect.
Correct Condition
You might want to use:
timestamp >= '2019-05-01' AND timestamp < '2019-06-01'
Note that since the BETWEEN condition is inclusive on both ends, the following conditions will not give you what you want either.
WHERE timestamp BETWEEN '2019-05-01' AND '2019-05-31' --> this will ignore data on last day of May except '2019-05-31 00:00:00 UTC'.
or
WHERE timestamp BETWEEN '2019-05-01' AND '2019-06-01' --> this will include '2019-06-01 00:00:00 UTC' data, as the query below shows.
SELECT EXTRACT(MONTH FROM timestamp) month, COUNT(1) cnt
FROM `bigquery-public-data.pypi.file_downloads`
WHERE timestamp BETWEEN '2019-05-01' AND '2019-06-01' -- scan 22.57 GB
GROUP BY 1
(update)
SELECT EXTRACT(DAY FROM timestamp) day, COUNT(1) cnt
FROM `bigquery-public-data.pypi.file_downloads`
WHERE timestamp BETWEEN '2019-05-29' AND '2019-05-31'
GROUP BY 1
;
output:
+-----+-----+-----------+
| Row | day | cnt |
+-----+-----+-----------+
| 1 | 30 | 116744449 |
| 2 | 29 | 120865824 |
| 3 | 31 | 1027 | -- should be 112116613
+-----+-----+-----------+
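The boundary behaviour described above can be checked outside BigQuery. The following is a small Python sketch, not BigQuery code: it assumes that a date literal compared with a timestamp is treated as midnight at the start of that day, as the answer states, and the three function names are made up for illustration.

```python
from datetime import datetime

MAY_1 = datetime(2019, 5, 1)     # '2019-05-01' read as 2019-05-01 00:00:00
MAY_31 = datetime(2019, 5, 31)   # '2019-05-31' read as 2019-05-31 00:00:00
JUNE_1 = datetime(2019, 6, 1)    # '2019-06-01' read as 2019-06-01 00:00:00
APRIL_30 = datetime(2019, 4, 30)


def in_query1_range(ts):
    """timestamp > '2019-04-30' AND timestamp < '2019-06-01'"""
    return APRIL_30 < ts < JUNE_1


def in_query2_range(ts):
    """timestamp >= '2019-05-01' AND timestamp <= '2019-05-31'"""
    return MAY_1 <= ts <= MAY_31


def in_half_open_range(ts):
    """timestamp >= '2019-05-01' AND timestamp < '2019-06-01'"""
    return MAY_1 <= ts < JUNE_1


checks = {
    "noon on April 30": datetime(2019, 4, 30, 12, 0, 0),
    "noon on May 31": datetime(2019, 5, 31, 12, 0, 0),
    "midnight on June 1": datetime(2019, 6, 1, 0, 0, 0),
}
for label, ts in checks.items():
    print(label, in_query1_range(ts), in_query2_range(ts), in_half_open_range(ts))
```

Only the half-open range rejects the April 30 row while keeping the row from the afternoon of May 31.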
The two filters are different. You can check the difference between the results with the script below.
Differences
SELECT timestamp, FORMAT_DATETIME('%Y-%m', timestamp) AS month
FROM `bigquery-public-data.pypi.file_downloads`
WHERE
timestamp > '2019-04-30' AND timestamp < '2019-06-01'
AND NOT (timestamp >= '2019-05-01' AND timestamp <= '2019-05-31')
;
Personal Preference
SELECT file.project AS package, COUNT(file.project) AS installs, FORMAT_DATETIME('%Y-%m', timestamp) AS month
FROM `bigquery-public-data.pypi.file_downloads`
WHERE DATE(timestamp) BETWEEN '2019-05-01' AND '2019-05-31'
GROUP BY month, package;
p.s. As the documentation describes, standard SQL evaluates the clauses in the order below. The WHERE filter is applied before the SELECT, so you might want to store the result of the SELECT first and then filter it by date rather than by datetime.
FROM -> WHERE -> GROUP BY -> HAVING -> ...
I am trying to calculate the worked hours for specific days in Google BigQuery (SQL).
The wage is $10 per hour when you work during the day but $15 per hour when you work at night.
Day time is defined as 6am to 10pm whereas night time is defined as 10pm to 6am.
Employees can work flexibly as they are limousine drivers.
The following is an example of my table:
id | start_at | end_at | date
abc123 | 04:00:00 | 07:00:00 | 2020-01-05
abc123 | 09:00:00 | 15:32:00 | 2020-01-05
abc123 | 23:00:00 | 23:35:00 | 2020-01-05
abc123 | 23:40:00 | 23:59:00 | 2020-01-05
abc123 | 23:59:00 | 01:35:00 | 2020-01-05
abc123 | 02:02:00 | 04:35:00 | 2020-01-06
abc123 | 05:40:00 | 06:59:00 | 2020-01-06
The actual work hours are calculated by taking the difference between start_at and end_at, but the day time and night time conditions are becoming a hassle in my query.
*the date column is based on start_at. Even when you start at 11:59pm and end at the next day 12:05am, the date follows the date of the start_at instead of end_at.
Any ideas? Thanks in advance!
Consider below solution
create temp function night_day_split(start_at time, end_at time, date date) as (array(
select as struct
extract(date from time_point) day,
-- day time is 06:00 up to (not including) 22:00, i.e. hours 6 through 21
if(extract(hour from time_point) between 6 and 21, 'day', 'night') day_night,
count(1) minutes
from unnest(generate_timestamp_array(
timestamp(datetime(date, start_at)),
-- the array bounds are inclusive, so stop one minute before end_at to avoid counting an extra minute
timestamp_sub(timestamp(datetime(if(start_at < end_at, date, date + 1), end_at)), interval 1 minute),
interval 1 minute
)) time_point
group by 1, 2
));
select id, day,
sum(if(day_night = 'day', minutes, null)) day_minutes,
sum(if(day_night = 'night', minutes, null)) night_minutes
from yourtable,
unnest(night_day_split(start_at, end_at, date)) v
group by id, day
You can try the following code:
with mytable as (
select 'abc123' id, cast( '04:00:00' as time) start_dt, cast( '07:00:00' as time) end_dt, date('2020-01-05' ) date union all
select 'abc123', cast( '09:00:00' as time), cast( '15:32:00' as time), date('2020-01-05') union all
select 'abc123', cast( '23:00:00' as time), cast( '23:35:00' as time), date('2020-01-05' ) union all
select 'abc123', cast('23:40:00' as time), cast( '23:59:00' as time), date('2020-01-05') union all
select 'abc123', cast ('23:59:00' as time), cast( '01:35:00' as time), date('2020-01-05') union all
select 'abc123', cast('02:02:00' as time), cast( '04:35:00' as time), date('2020-01-06') union all
select 'abc123', cast('05:40:00' as time), cast( '06:59:00' as time), date('2020-01-06')
)
select id, date, sum (value) as sal from(
select id, date,
-- boundaries are inclusive so that shifts starting or ending exactly at 06:00 or 22:00 are not dropped into the else branch
case when start_dt >= cast( '06:00:00' as time) and end_dt <= cast( '22:00:00' as time) and start_dt < end_dt then (time_diff(end_dt, start_dt, Minute)/60) * 10
when start_dt < cast( '06:00:00' as time) and end_dt <= cast( '06:00:00' as time) then (time_diff(end_dt, start_dt, Minute)/60) * 15
when start_dt < cast( '06:00:00' as time) and end_dt <= cast( '22:00:00' as time) then (time_diff(cast( '06:00:00' as time), start_dt, Minute)/60) * 15 + (time_diff( end_dt,cast( '06:00:00' as time), Minute)/60) * 10
-- shift crosses midnight: the + 1 adds back the minute between 23:59 and 00:00
when start_dt >= cast( '22:00:00' as time) and end_dt <= cast( '06:00:00' as time) then ((time_diff(cast( '23:59:00' as time), start_dt, Minute) + 1)/60) * 15 + (time_diff( end_dt,cast( '00:00:00' as time), Minute)/60) * 15
when start_dt >= cast( '22:00:00' as time) and end_dt >= cast( '22:00:00' as time) then (time_diff(end_dt, start_dt, Minute)/60) * 15
else 0
end as value
from mytable) group by id, date
You can further group by on month for monthly salary.
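The day/night split discussed in this thread can also be sketched in plain Python, which makes the boundary rules easy to test. This is an illustration under assumptions, not either answer's code: times are assumed to be aligned to whole minutes, day time is taken as 06:00 up to but not including 22:00, the rates are taken as per hour, an end time that is not later than the start is assumed to fall on the next day, and the function and constant names are made up.

```python
from datetime import date, datetime, time, timedelta

DAY_START_HOUR = 6   # day rate applies from 06:00 ...
DAY_END_HOUR = 22    # ... up to, but not including, 22:00
DAY_RATE = 10        # assumed to be dollars per hour
NIGHT_RATE = 15      # assumed to be dollars per hour


def split_minutes(work_date, start_at, end_at):
    """Return (day_minutes, night_minutes) for one shift."""
    start = datetime.combine(work_date, start_at)
    end = datetime.combine(work_date, end_at)
    if end <= start:
        # The shift crosses midnight; the date column follows start_at.
        end += timedelta(days=1)
    day_minutes = night_minutes = 0
    current = start
    # Classify each worked minute by the hour in which it starts.
    while current < end:
        if DAY_START_HOUR <= current.hour < DAY_END_HOUR:
            day_minutes += 1
        else:
            night_minutes += 1
        current += timedelta(minutes=1)
    return day_minutes, night_minutes


def pay(work_date, start_at, end_at):
    """Pay for one shift at the assumed hourly rates."""
    day_minutes, night_minutes = split_minutes(work_date, start_at, end_at)
    return day_minutes / 60 * DAY_RATE + night_minutes / 60 * NIGHT_RATE


# First and fifth rows of the sample table in the question.
print(split_minutes(date(2020, 1, 5), time(4, 0), time(7, 0)))
print(split_minutes(date(2020, 1, 5), time(23, 59), time(1, 35)))
print(pay(date(2020, 1, 5), time(4, 0), time(7, 0)))
```

Walking minute by minute is slow for long shifts but keeps the rule in one place; an interval-overlap calculation would be the faster alternative.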
How to convert dates to ISO week date in Impala SQL?
For example 2019-12-30 in the ISO week date calendar would be written as 2020-W01-1 or 2020W011
ANSWER:
I marked Gordon Linoff's answer as correct, as it solves the essential part of the question: deducing the year part of the ISO week date.
For the week part of the ISO week date there is a ready-made function, and the day part can easily be converted from a Sunday-starting week to a Monday-starting week.
The query below contains all week dates from Monday to Sunday:
select datecol,
concat(cast(iso_year as string),'-W',lpad(cast(iso_week as string),2,'0'),'-',cast(iso_day as string)) as iso_Year_week_date_long,
concat(cast(iso_year as string),'W',lpad(cast(iso_week as string),2,'0'),cast(iso_day as string)) as iso_Year_week_date_short
from (
SELECT datecol,
(case when weekofyear(datecol) = 1 and
date_part('year',datecol) <> date_part('year',adddate(datecol,+7))
then date_part('year',datecol) + 1
when weekofyear(datecol) in (52, 53) and
date_part('year',datecol) <> date_part('year',adddate(datecol,-7))
then date_part('year',datecol) - 1
else date_part('year',datecol)
end) as iso_year,
weekofyear(datecol) as iso_week,
1+mod(dayofweek(datecol)+5,7) as iso_day
from (
select '2021-12-31' as datecol union
select '2020-12-31' as datecol union
select '2019-12-31' as datecol union
select '2018-12-31' as datecol union
select '2017-12-31' as datecol union
select '2016-12-31' as datecol union
select '2015-12-31' as datecol union
select '2014-12-31' as datecol union
select '2013-12-31' as datecol union
select '2012-12-31' as datecol union
select '2022-01-01' as datecol union
select '2021-01-01' as datecol union
select '2020-01-01' as datecol union
select '2019-01-01' as datecol union
select '2018-01-01' as datecol union
select '2017-01-01' as datecol union
select '2016-01-01' as datecol union
select '2015-01-01' as datecol union
select '2014-01-01' as datecol union
select '2013-01-01' as datecol
) as t1
) as t2
order by datecol;
The output shows which ISO year January 1st belongs to:
the new year, if January 1st is the 1st, 2nd, 3rd or 4th day of the week, i.e., if at least 4 days of the week containing January 1st fall in the new year;
the old year, if January 1st is the 5th, 6th or 7th day of the week, i.e., if 3 or fewer days of the week containing January 1st fall in the new year.
datecol |iso_year_week_date_long|iso_year_week_date_short|
----------|-----------------------|------------------------|
2014-12-31|2015-W01-3 |2015W013 |
2015-01-01|2015-W01-4 |2015W014 |
2015-12-31|2015-W53-4 |2015W534 |
2016-01-01|2015-W53-5 |2015W535 |
2016-12-31|2016-W52-6 |2016W526 |
2017-01-01|2016-W52-7 |2016W527 |
2017-12-31|2017-W52-7 |2017W527 |
2018-01-01|2018-W01-1 |2018W011 |
2018-12-31|2019-W01-1 |2019W011 |
2019-01-01|2019-W01-2 |2019W012 |
2019-12-31|2020-W01-2 |2020W012 |
2020-01-01|2020-W01-3 |2020W013 |
2020-12-31|2020-W53-4 |2020W534 |
2021-01-01|2020-W53-5 |2020W535 |
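The rows above can be cross-checked outside Impala. The sketch below uses Python's standard date.isocalendar(), which returns the ISO year, ISO week and ISO weekday (Monday is 1); the helper name iso_week_date is made up for illustration and is not part of any library.

```python
from datetime import date


def iso_week_date(d, short=False):
    """Format a date as an ISO week date, e.g. 2020-W01-1 or 2020W011."""
    iso_year, iso_week, iso_day = d.isocalendar()
    if short:
        return f"{iso_year}W{iso_week:02d}{iso_day}"
    return f"{iso_year}-W{iso_week:02d}-{iso_day}"


# A few of the dates from the table above, plus the example from the question.
for d in (date(2019, 12, 30), date(2016, 1, 1), date(2018, 12, 31), date(2021, 1, 1)):
    print(d.isoformat(), iso_week_date(d), iso_week_date(d, short=True))
```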
I think Impala returns the iso week for date_part() and extract() -- based on your previous question. There is no documentation to this effect.
If so, you can use conditional logic:
select (case when date_part(week, datecol) = 1 and
date_part(year, datecol) <> date_part(year, datecol + interval 1 week)
then date_part(year, datecol) + 1
when date_part(week, datecol) in (52, 53) and
date_part(year, datecol) <> date_part(year, datecol - interval 1 week)
then date_part(year, datecol) - 1
else date_part(year, datecol)
end) as iso_year,
date_part(week, datecol) as iso_week
Otherwise, you can get the first day of the iso year using:
select (case when to_char(date_trunc(year, datecol), 'DY') in ('FRI', 'SAT', 'SUN')
then next_day(date_trunc(year, datecol), 'Monday')
else next_day(date_trunc(year, datecol), 'Monday') - interval 7 day
end) as iso_year_start
You can then calculate the iso week from the start of the iso year using arithmetic.
For example 2019-12-30 in the ISO week date calendar would be written as 2020-W01-1 or 2020W011.
We could make use of string format:
select cast(cast('2019-12-30' as date format 'YYYY-MM-DD') as string format 'iyyy-iw-id')
Returns:
"2020-01-01"
I have two tables.
The first table contains one record per ticket, with a start date and an end date:
start_date | End_Date
21-02-2017 07:52:32 | 22-02-2017 09:56:32
21-02-2017 09:52:32 | 23-02-2017 17:52:32
the second table contains the details of the weekly shift:
shift_day | Start_Time | End_Time
MON | 9:00 | 18:00
TUE | 10:00 | 19:00
WED | 9:00 | 18:00
THU | 10:00 | 19:00
FRI | 9:00 | 18:00
I want the time difference between start_date and End_Date for each ticket in the first table, counting only the time that falls within the shift hours in the second table.
Use a recursive sub-query factoring clause to generate each day within your time ranges and then correlate that with your shifts to restrict the time for each day to be within the shift hours and then aggregate to get the total:
Oracle 18 Setup:
CREATE TABLE times ( start_date, End_Date ) AS
SELECT DATE '2017-02-21' + INTERVAL '07:52:32' HOUR TO SECOND,
DATE '2017-02-22' + INTERVAL '09:56:32' HOUR TO SECOND
FROM DUAL
UNION ALL
SELECT DATE '2017-02-21' + INTERVAL '09:52:32' HOUR TO SECOND,
DATE '2017-02-23' + INTERVAL '17:52:32' HOUR TO SECOND
FROM DUAL;
CREATE TABLE weekly_shifts ( shift_day, Start_Time, End_Time ) AS
SELECT 'MON', INTERVAL '09:00' HOUR TO MINUTE, INTERVAL '18:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 'TUE', INTERVAL '10:00' HOUR TO MINUTE, INTERVAL '19:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 'WED', INTERVAL '09:00' HOUR TO MINUTE, INTERVAL '18:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 'THU', INTERVAL '10:00' HOUR TO MINUTE, INTERVAL '19:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 'FRI', INTERVAL '09:00' HOUR TO MINUTE, INTERVAL '18:00' HOUR TO MINUTE FROM DUAL;
Query 1:
WITH days ( id, start_date, day_start, day_end, end_date ) AS (
SELECT ROWNUM,
start_date,
start_date,
LEAST( TRUNC( start_date ) + INTERVAL '1' DAY, end_date ),
end_date
FROM times
UNION ALL
SELECT id,
start_date,
day_end,
LEAST( day_end + INTERVAL '1' DAY, end_date ),
end_date
FROM days
WHERE day_end < end_date
)
SELECT start_date,
end_date,
SUM( shift_end - shift_start ) AS days_worked_on_shift
FROM (
SELECT ID,
start_date,
end_date,
GREATEST( day_start, TRUNC( day_start ) + start_time ) AS shift_start,
LEAST( day_end, TRUNC( day_start ) + end_time ) AS shift_end
FROM days d
INNER JOIN
weekly_shifts w
ON ( TO_CHAR( d.day_start, 'DY' ) = w.shift_day )
)
GROUP BY id, start_date, end_date;
Result:
START_DATE END_DATE DAYS_WORKED_ON_SHIFT
------------------- ------------------- --------------------
2017-02-21 07:52:32 2017-02-22 09:56:32 0.414259259259259259
2017-02-21 09:52:32 2017-02-23 17:52:32 1.078148148148148148
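The same per-day clipping can be sketched in Python to check the figures above. This is an illustration, not a translation of the Oracle query: the SHIFTS mapping restates the weekly_shifts table keyed by datetime.weekday() (Monday is 0), days without a shift contribute nothing, and the function name time_on_shift is made up.

```python
from datetime import datetime, time, timedelta

# Weekly shifts from the question, keyed by weekday() where Monday is 0.
SHIFTS = {
    0: (time(9, 0), time(18, 0)),   # MON
    1: (time(10, 0), time(19, 0)),  # TUE
    2: (time(9, 0), time(18, 0)),   # WED
    3: (time(10, 0), time(19, 0)),  # THU
    4: (time(9, 0), time(18, 0)),   # FRI
}


def time_on_shift(start, end):
    """Total time between start and end that falls inside shift hours."""
    total = timedelta(0)
    day = start.date()
    while day <= end.date():
        shift = SHIFTS.get(day.weekday())
        if shift is not None:
            shift_start = datetime.combine(day, shift[0])
            shift_end = datetime.combine(day, shift[1])
            # Clip the ticket interval to this day's shift window.
            overlap = min(end, shift_end) - max(start, shift_start)
            if overlap > timedelta(0):
                total += overlap
        day += timedelta(days=1)
    return total


tickets = [
    (datetime(2017, 2, 21, 7, 52, 32), datetime(2017, 2, 22, 9, 56, 32)),
    (datetime(2017, 2, 21, 9, 52, 32), datetime(2017, 2, 23, 17, 52, 32)),
]
for start, end in tickets:
    worked = time_on_shift(start, end)
    # Express the result in days, like DAYS_WORKED_ON_SHIFT above.
    print(start, end, worked, worked.total_seconds() / 86400)
```

Skipping non-positive overlaps matters for days on which the ticket ends before the shift starts or begins after it ends; without that check such days would subtract time from the total.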
I'm working in an Oracle DB and I'm trying to convert from 12 hours to 24 hours.
I have updated HH to HH24, but I'm still not seeing the full 24-hour range; in fact the query only pulls the first 12 hours of the day. Below is my working query, which runs without errors and returns correct results apart from the missing 12 hours.
SELECT
CASE(EXTRACT(HOUR FROM a.TIME))
WHEN 1 THEN '1'
WHEN 2 THEN '2'
WHEN 3 THEN '3'
WHEN 4 THEN '4'
WHEN 5 THEN '5'
WHEN 6 THEN '6'
WHEN 7 THEN '7'
WHEN 8 THEN '8'
WHEN 9 THEN '9'
WHEN 10 THEN '10'
WHEN 11 THEN '11'
WHEN 12 THEN '12'
WHEN 13 THEN '13'
WHEN 14 THEN '14'
WHEN 15 THEN '15'
WHEN 16 THEN '16'
WHEN 17 THEN '17'
WHEN 18 THEN '18'
WHEN 19 THEN '19'
WHEN 20 THEN '20'
WHEN 21 THEN '21'
WHEN 22 THEN '22'
WHEN 23 THEN '23'
WHEN 24 THEN '24'
ELSE 'OTHERS' END AS "ENTRY_TIME_HOUR",
COUNT(*) AS "TOTAL_WITHIN_THE_HOUR"
FROM Table1 a
WHERE
a.TIME >= TO_DATE('2018/01/05 01:00:01', 'YYYY/MM/DD HH24:MI:SS')
AND a.TIME <= TO_DATE('2018/01/05 12:59:59', 'YYYY/MM/DD HH24:MI:SS')
GROUP BY EXTRACT(HOUR FROM a.TIME)
The above query would output something like below but only up to 12 (should be 1-24)
ENTRY_TIME_HOUR TOTAL_WITHIN_THE_HOUR
11 68
8 3
9 83
10 26
12 62
In addition, I have found the TO_CHAR to be useful but I could not run the TO_CHAR and TO_DATE together. The ultimate goal is to SUM all outputs on the TIME for the given hour. The output would return 24 lines with a total count for each hour.
Below is the separate query that outputs the FULL time (not just the hour HH):
SELECT
to_char(a.TIME, 'DD/MM/YYYY HH24:MI:SS') AS "TIME in 24"
FROM TABLE a
WHERE
a.TIME >= TO_DATE('2018/01/05 01:00:01', 'YYYY/MM/DD HH24:MI:SS')
AND a.TIME <= TO_DATE('2018/01/05 23:59:59', 'YYYY/MM/DD HH24:MI:SS')
The above query would provide like the below:
05/01/2018 15:00:40
05/01/2018 16:01:45
05/01/2018 09:59:51
05/01/2018 10:04:58
However, I'm not able to merge the TO_CHAR and TO_DATE queries without running into multiple issues. Would it be possible to merge the second query with the first so that it provides the count per hour for the full 24 hours of the day?
Thanks
First, the hours of the day go from 0 to 23, not 1 to 24.
Second, you are complicating simple things. What's wrong with
select to_char(a.time, 'HH24') theHour
, count(*) occurrences
from yourTable a
where a.time >= date '2018-01-05'
and a.time < date '2018-01-06'
group by to_char(a.time, 'HH24')
Start by learning to use the DATE and TIMESTAMP keywords. Much simpler for inputting unambiguous timestamps:
WHERE a.TIME >= TIMESTAMP '2018-01-05 01:00:01' AND
a.TIME <= TIMESTAMP '2018-01-05 12:59:59'
Written like this, it is much clearer that you are only choosing hours between 1 and 12, which is why you are only getting those hours. Change the WHERE conditions and you might get additional hours.
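The two points made in these answers, that hours run from 0 to 23 and that the date filter decides which hours can appear, can be illustrated with a short Python sketch. It is not Oracle code: the function name counts_per_hour is made up, the range is half-open like the `>= date '2018-01-05'` and `< date '2018-01-06'` filter above, and unlike a plain GROUP BY it also lists hours with no rows so that 24 lines come back.

```python
from collections import Counter
from datetime import datetime


def counts_per_hour(timestamps, range_start, range_end):
    """Return 24 (hour, count) pairs for timestamps in [range_start, range_end)."""
    counts = Counter(
        ts.hour for ts in timestamps if range_start <= ts < range_end
    )
    # Counter returns 0 for hours that never occurred.
    return [(f"{hour:02d}", counts[hour]) for hour in range(24)]


# The four sample times from the question, plus one row that falls outside the day.
sample_times = [
    datetime(2018, 1, 5, 15, 0, 40),
    datetime(2018, 1, 5, 16, 1, 45),
    datetime(2018, 1, 5, 9, 59, 51),
    datetime(2018, 1, 5, 10, 4, 58),
    datetime(2018, 1, 6, 0, 0, 0),
]

for hour, total in counts_per_hour(
    sample_times, datetime(2018, 1, 5), datetime(2018, 1, 6)
):
    print(hour, total)
```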