date_format() equivalent in BigQuery - sql

I have a date column in my table that contains a string in the format 2017-08-05 09-AM, and I'm trying to format it so that there is a column Date with a date type and column Time with a time type.
SELECT ID, DATE_FORMAT(a.date, "%Y-%d-%m") as date, DATE_FORMAT(a.date, "%T") as time, Symbol
FROM `crypto_market_data.BTC_1H` a
ORDER BY ID
The query runs how I want it to in MySQL, but date_format() is not supported in BigQuery. I'm wondering if there is a similar way to cast my string date to a separate Date and Time object.

Below is for BigQuery Standard SQL. The hour is parsed with %I (12-hour clock) rather than %H (24-hour clock), so that the AM/PM marker read by %p is actually applied:
#standardSQL
SELECT id, Symbol,
DATE(PARSE_DATETIME('%Y-%m-%d %I-%p', a.date)) AS `date`,
TIME(PARSE_DATETIME('%Y-%m-%d %I-%p', a.date)) AS time
FROM `project.dataset.table` a
You can test and play with the above using dummy data, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '2017-08-05 09-AM' `date`, 'x' Symbol UNION ALL
SELECT 2, '2019-02-05 12-AM', 'y' UNION ALL
SELECT 3, '2019-01-31 11-PM', 'z'
)
SELECT id, Symbol,
DATE(PARSE_DATETIME('%Y-%m-%d %I-%p', a.date)) AS `date`,
TIME(PARSE_DATETIME('%Y-%m-%d %I-%p', a.date)) AS time
FROM `project.dataset.table` a
-- ORDER BY id
with result
Row id Symbol date time
1 1 x 2017-08-05 09:00:00
2 2 y 2019-02-05 00:00:00
3 3 z 2019-01-31 23:00:00
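As a cross-check outside BigQuery, the same parse-once-then-split idea can be sketched in Python. This is only an illustration: the function name split_date_time is hypothetical, and the format string assumes the 'YYYY-MM-DD HH-AM/PM' layout from the question. In Python, as here, the AM/PM marker only affects the hour when the hour is read with %I.

```python
from datetime import date, datetime, time
from typing import Tuple

# Assumed input layout, taken from the question: "2017-08-05 09-AM".
# %I reads a 12-hour clock value, which is what lets %p shift PM hours.
FORMAT_12H = "%Y-%m-%d %I-%p"


def split_date_time(raw: str) -> Tuple[date, time]:
    """Parse the string once, then return its date part and its time part."""
    parsed = datetime.strptime(raw, FORMAT_12H)
    return parsed.date(), parsed.time()


if __name__ == "__main__":
    for raw in ("2017-08-05 09-AM", "2019-02-05 12-AM", "2019-01-31 11-PM"):
        print(raw, "->", split_date_time(raw))
```

Parsing once and deriving both parts from the same value keeps the date and time columns consistent with each other.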

There is a PARSE_DATETIME function available for parsing custom datetime formats in BigQuery.
In your case, this should help:
select ID, extract(date from dt) as date, extract(time from dt) as time, Symbol from (
select
a.ID as ID,
parse_datetime('%Y-%m-%d %I-%p', a.date) as dt,
a.Symbol as Symbol
from
`crypto_market_data.BTC_1H` a
)
order by ID

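BigQuery's counterparts to DATE_FORMAT() are FORMAT_DATE, FORMAT_TIME and FORMAT_DATETIME. They take a DATE, TIME or DATETIME value rather than a string, so a string column has to be parsed first:
SELECT ID, FORMAT_DATETIME("%Y-%d-%m", PARSE_DATETIME("%Y-%m-%d %I-%p", a.date)) AS date, FORMAT_DATETIME("%T", PARSE_DATETIME("%Y-%m-%d %I-%p", a.date)) AS time, Symbol FROM crypto_market_data.BTC_1H a ORDER BY ID
Note that these functions return strings, not DATE or TIME values. For more info, see the BigQuery date functions documentation.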

Related

Big query, how to split TIMESTAMP data type column

I am trying to split a column of TIMESTAMP type into two separate columns, DATE and TIME, because I want to filter in a WHERE clause on the time being more than 2 minutes. In my case, WHERE ride_length > to_timestamp'00:02:00' is not working.
You can use EXTRACT to get date, time and minutes from a timestamp.
Example:
WITH table1 AS
(
SELECT TIMESTAMP("2022-06-27 10:00:00") AS dt
UNION ALL
SELECT TIMESTAMP("2022-06-27 12:03:00") AS dt
)
SELECT
EXTRACT(DATE FROM dt) AS date,
EXTRACT(TIME FROM dt) AS time
FROM table1
WHERE EXTRACT(MINUTE FROM dt) > 2
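If ride_length is meant as a duration rather than a point in time (an assumption, since the question does not show the schema), the comparison is against an interval of two minutes, not against a timestamp. A rough Python sketch of that reading follows; the start/end pairs and the function name long_rides are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical (started_at, ended_at) pairs; the real table is not shown.
SAMPLE_RIDES = [
    (datetime(2022, 6, 27, 10, 0, 0), datetime(2022, 6, 27, 10, 1, 30)),
    (datetime(2022, 6, 27, 12, 3, 0), datetime(2022, 6, 27, 12, 10, 0)),
]


def long_rides(rides, minimum=timedelta(minutes=2)):
    """Return (date, time, ride_length) for rides lasting longer than `minimum`."""
    rows = []
    for started_at, ended_at in rides:
        ride_length = ended_at - started_at  # a duration, not a timestamp
        if ride_length > minimum:
            rows.append((started_at.date(), started_at.time(), ride_length))
    return rows


if __name__ == "__main__":
    for row in long_rides(SAMPLE_RIDES):
        print(row)
```

By contrast, filtering on the minute field of a timestamp tests the minute of the hour at which an event happened, which is a different condition from a ride lasting more than two minutes.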

get count all with groupby timestamp into hourly intervals

I have a hive table that has a timestamp in string format as below,
20190516093836, 20190304125015, 20181115101358
I want to get row count with an aggregate timestamp into hourly as below
date_time count
-----------------------------
2019:05:16: 00:00:00 23
2019:05:16: 01:00:00 64
I followed several links like this but was unable to generate the desired results yet.
This is my final query:
SELECT
DATE_PART('day', b.date_time) AS date_prt,
DATE_PART('hour', b.date_time) AS hour_prt,
COUNT(*)
FROM
(SELECT
from_unixtime(unix_timestamp(`timestamp`, "yyyyMMddHHmmss")) AS date_time
FROM table_name
WHERE from_unixtime(unix_timestamp(`timestamp`, "yyyyMMddHHmmss"))
BETWEEN '2018-12-10 07:02:30' AND '2018-12-12 08:02:30') b
GROUP BY
date_prt, hour_prt
I hope for some guidance from you, thanks in advance
You can produce date_time directly in the required format 'yyyy-MM-dd HH:00:00'. I prefer using regexp_replace:
SELECT
date_time,
COUNT(*) as `count`
FROM
(SELECT
regexp_replace(`timestamp`, '^(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})$','$1-$2-$3 $4:00:00') AS date_time
FROM table_name
WHERE regexp_replace(`timestamp`, '^(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})$','$1-$2-$3 $4:$5:$6')
BETWEEN '2018-12-10 07:02:30' AND '2018-12-12 08:02:30') b
GROUP BY
date_time
This will also work:
from_unixtime(unix_timestamp('20190516093836', "yyyyMMddHHmmss"),'yyyy-MM-dd HH:00:00') AS date_time
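The bucketing idea is independent of Hive: rewrite each 14-digit string so that minutes and seconds become 00, then count rows per rewritten value. A small Python sketch of the same regular expression, with hypothetical helper names hour_bucket and hourly_counts:

```python
import re
from collections import Counter

# Same six capture groups as the Hive pattern: year, month, day, hour, minute, second.
TS_PATTERN = re.compile(r"^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$")


def hour_bucket(ts: str) -> str:
    """Rewrite 'yyyyMMddHHmmss' as 'yyyy-MM-dd HH:00:00' (start of the hour)."""
    return TS_PATTERN.sub(r"\1-\2-\3 \4:00:00", ts)


def hourly_counts(timestamps):
    """Count how many timestamps fall into each hourly bucket."""
    return Counter(hour_bucket(ts) for ts in timestamps)


if __name__ == "__main__":
    sample = ["20190516093836", "20190516091000", "20190304125015"]
    for bucket, count in sorted(hourly_counts(sample).items()):
        print(bucket, count)
```

A string that does not match the pattern is returned unchanged by the substitution, so malformed values end up in their own bucket rather than raising an error.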

How to filter date with where on SQLite with '2021-07-31 13:53:26' format?

I want to take just the year and month from '2021-07-31 13:53:26' and count how many rows fall into each year-month group.
I tried the date, datetime, and strftime functions.
date and datetime return null. strftime returns something, but I can't group the year and month it returns together with the count I want; that results in null again.
Here is the preview of the data.
The expected result is something like '2021-07' together with the number of times this year and month occurs.
This is the syntax I tried with strftime:
select strftime('%Y%m', started_at) year_month, count(year_month) from bike_trip
group by year_month
Thank You
SQLite doesn't have a dedicated date data type, so you can use string operations to achieve this.
with d as (
select '2021-07-31 13:53:26' as d, 'A' val union all
select '2021-08-30 13:53:26' as d, 'B' val
)
select substr(d,1,4) as yyyy, substr(d,6,2) as mm, count(*)
from d
group by substr(d,1,4), substr(d,6,2)
in your query:
select substr(started_at,1,4) as yyyy, substr(started_at,6,2) as mm, count(*)
from bike_trip
group by substr(started_at,1,4), substr(started_at,6,2)
Use a CTE to get your answer.
with
-- uncomment to test
/*bike_trip(started_at) as (
values
('2021-07-31 13:53:26'),
('2021-07-17 19:06:01'),
('2021-08-30 13:53:26')
),*/
bike_months(year_month) as (
select strftime('%Y-%m', started_at) year_month from bike_trip
)
select year_month, count(year_month) count_year_month from bike_months
group by year_month;
Output:
year_month|count_year_month
2021-07|2
2021-08|1
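To try the strftime grouping without touching the real database, it can be run against an in-memory SQLite database from Python. This is a minimal sketch: the table and column names follow the question, while the helper name count_by_month and the sample rows are only for illustration.

```python
import sqlite3

SAMPLE_STARTED_AT = [
    "2021-07-31 13:53:26",
    "2021-07-17 19:06:01",
    "2021-08-30 13:53:26",
]


def count_by_month(started_at_values):
    """Load the values into a throwaway bike_trip table and count rows per 'YYYY-MM'."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE bike_trip (started_at TEXT)")
        conn.executemany(
            "INSERT INTO bike_trip (started_at) VALUES (?)",
            [(value,) for value in started_at_values],
        )
        cursor = conn.execute(
            "SELECT strftime('%Y-%m', started_at) AS year_month, COUNT(*) "
            "FROM bike_trip GROUP BY year_month ORDER BY year_month"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_by_month(SAMPLE_STARTED_AT))
```

Using COUNT(*) avoids referring to the year_month alias inside another expression of the same SELECT list.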

ORACLE SQL: Hourly Date to be group by day time and sum of the amount

I have the following situation:
ID DATE_TIME AMOUNT
23 14-MAY-2021 10:47:01 5
23 14-MAY-2021 11:49:52 3
23 14-MAY-2021 12:03:18 4
How can get the sum of the amount and take the DATE by day not hourly?
Example:
ID DATE_TIME TOTAL
23 20210514 12
I tried this way but i got error:
SELECT DISTINCT ID, TO_CHAR(DATE_TIME, 'YYYYMMDD'), SUM(AMOUNT) AS TOTAL FROM MY_TABLE
WHERE ID ='23' AND DATE_TIME > SYSDATE-1
GROUP BY TOTAL, DATE_TIME
You don't need DISTINCT if you use GROUP BY: anything that is grouped is already distinct, unless it is later joined to something that causes it to repeat.
You were almost there, too:
SELECT ID, TO_CHAR(DATE_TIME, 'YYYYMMDD') AS DATE_TIME, SUM(AMOUNT) AS TOTAL
FROM MY_TABLE
WHERE ID ='23' AND DATE_TIME > SYSDATE-1
GROUP BY ID, TO_CHAR(DATE_TIME, 'YYYYMMDD')
You need to group by the output of the function, not the input. Not every database can GROUP BY aliases used in the SELECT (technically, the SELECT hasn't been done by the time the GROUP BY is done, so the aliases don't exist yet), and you wouldn't group by the total because that's an aggregate (the result of summing up every value in the group).
If you need to do further work with that date, don't convert it to a string. Cut the time off using TRUNC:
SELECT ID, TRUNC(DATE_TIME) as DATE_TIME, SUM(AMOUNT) AS TOTAL
FROM MY_TABLE
WHERE ID ='23' AND DATE_TIME > SYSDATE-1
GROUP BY ID, TRUNC(DATE_TIME)
TRUNC can cut a date down to other parts, for example TRUNC(DATE_TIME, 'HH24') will remove the minutes and seconds but leave the hours
Convert the DATE column to a string with the required accuracy and then group on that:
SELECT ID,
TO_CHAR("DATE", 'YYYY-MM-DD'),
SUM(AMOUNT) AS TOTAL FROM MY_TABLE
WHERE ID ='23'
AND "DATE" > SYSDATE-1
GROUP BY ID, TO_CHAR("DATE", 'YYYY-MM-DD')
or truncate the value so that the time component is set to midnight for each date:
SELECT ID,
TRUNC("DATE"),
SUM(AMOUNT) AS TOTAL FROM MY_TABLE
WHERE ID ='23'
AND "DATE" > SYSDATE-1
GROUP BY ID, TRUNC("DATE")
(Note: DATE is a keyword and cannot be used as an identifier unless you use a quoted identifier; you would then need to use the quotes, and the exact case, every time you refer to the column. You would be better off renaming the column to something that is not a keyword.)
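The grouping rule used in these answers (group by the truncated value, then aggregate) can be illustrated outside Oracle as well. Below is a short Python sketch using the three rows from the question; the function name daily_totals is hypothetical, and dropping the time part with .date() plays the role of TRUNC.

```python
from collections import defaultdict
from datetime import datetime

# The three sample rows from the question: (ID, DATE_TIME, AMOUNT).
SAMPLE_ROWS = [
    (23, datetime(2021, 5, 14, 10, 47, 1), 5),
    (23, datetime(2021, 5, 14, 11, 49, 52), 3),
    (23, datetime(2021, 5, 14, 12, 3, 18), 4),
]


def daily_totals(rows):
    """Sum AMOUNT per (ID, calendar day), ignoring the time of day."""
    totals = defaultdict(int)
    for row_id, date_time, amount in rows:
        # The grouping key is the truncated value, not the raw timestamp.
        totals[(row_id, date_time.date())] += amount
    return dict(totals)


if __name__ == "__main__":
    for (row_id, day), total in daily_totals(SAMPLE_ROWS).items():
        print(row_id, day.strftime("%Y%m%d"), total)
```

Grouping on the raw timestamp instead would give one group per row here, because every row has a different time of day.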

Compare array of datetime objects and pick all rows where difference between each and the next is less than 7 days

My table looks like this:
(can't post images yet)
I want to select all names from my table where the time difference between each of the datetime objects and the next is always more than 7 days.
So from the above I would get only Paul, since Adam's first two times are already only a day apart.
The best I can come up with is to get the time difference between the smallest and largest datetime in the array and then divide by array_length(datetime). So basically that's the average time between all datetime objects, but that's not helping me.
I'm using Standard SQL on BigQuery
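SELECT name
FROM dataset.table
WHERE NOT EXISTS(
SELECT 1 FROM UNNEST(datetime) AS dt WITH OFFSET off
WHERE DATETIME_DIFF(
dt, datetime[SAFE_OFFSET(off - 1)], DAY
) <= 7
)
This compares each entry in the array with the one before it, looking for any pair where the number of days is 7 or less. It assumes the array is stored in ascending order; for the first entry there is no previous one, so the difference is NULL and the entry is skipped.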
You can use unnest():
select t.*
from t
where not exists (select 1
from (select dt, lag(dt) over (order by dt) as prev_dt
from unnest(t.datetime) dt
) x
where dt < datetime_add(prev_dt, interval 7 day)
);
It is still not clear what exactly the schema of your data is: based on the layout, it looks like datetime is an array, but based on the data type you show in the image, it could be just a regular field. The queries below cover both cases (for BigQuery Standard SQL).
Case 1 - repeated field
#standardSQL
SELECT name
FROM `project.dataset.table`
WHERE 7 < (
SELECT DATETIME_DIFF(
datetime,
LAG(datetime) OVER(PARTITION BY name ORDER BY datetime),
DAY) distance
FROM UNNEST(datetime) datetime
ORDER BY IFNULL(distance, 777)
LIMIT 1
)
you can test, play with it using dummy data as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Adam' name,
[DATETIME '2018-07-26T17:55:03',
'2018-07-27T17:55:03',
'2018-06-29T17:55:03',
'2018-07-16T17:55:03',
'2018-08-19T17:55:03',
'2018-07-14T17:55:03'] datetime UNION ALL
SELECT 'Paul', [DATETIME '2018-08-26T17:55:03',
'2018-08-18T17:55:03',
'2018-06-20T17:55:03',
'2018-08-09T17:55:03',
'2018-07-16T17:55:03']
)
SELECT name
FROM `project.dataset.table`
WHERE 7 < (
SELECT DATETIME_DIFF(
datetime,
LAG(datetime) OVER(PARTITION BY name ORDER BY datetime),
DAY) distance
FROM UNNEST(datetime) datetime
ORDER BY IFNULL(distance, 777)
LIMIT 1
)
Case 2 - regular (not repeated field)
#standardSQL
SELECT name FROM (
SELECT name,
DATETIME_DIFF(
datetime,
LAG(datetime) OVER(PARTITION BY name ORDER BY datetime),
DAY
) distance
FROM `project.dataset.table`
)
GROUP BY name
HAVING MIN(distance) > 7
Dummy data example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Adam' name, DATETIME '2018-07-26T17:55:03' datetime UNION ALL
SELECT 'Adam', '2018-07-27T17:55:03' UNION ALL
SELECT 'Adam', '2018-06-29T17:55:03' UNION ALL
SELECT 'Adam', '2018-07-16T17:55:03' UNION ALL
SELECT 'Adam', '2018-08-19T17:55:03' UNION ALL
SELECT 'Adam', '2018-07-14T17:55:03' UNION ALL
SELECT 'Paul', '2018-08-26T17:55:03' UNION ALL
SELECT 'Paul', '2018-08-18T17:55:03' UNION ALL
SELECT 'Paul', '2018-06-20T17:55:03' UNION ALL
SELECT 'Paul', '2018-08-09T17:55:03' UNION ALL
SELECT 'Paul', '2018-07-16T17:55:03'
)
SELECT name FROM (
SELECT name,
DATETIME_DIFF(
datetime,
LAG(datetime) OVER(PARTITION BY name ORDER BY datetime),
DAY
) distance
FROM `project.dataset.table`
)
GROUP BY name
HAVING MIN(distance) > 7
both return same result
Row name
1 Paul
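The logic shared by these answers is: sort each person's datetimes, look at every consecutive pair, and keep the name only if the smallest gap exceeds 7 days. Here is a Python sketch of that check using the same dummy data; the function names are hypothetical. One difference to keep in mind is that this compares exact elapsed time, whereas DATETIME_DIFF with DAY counts whole day boundaries (the two agree on this sample because every value has the same time of day).

```python
from datetime import datetime, timedelta

# Same dummy data as in the BigQuery examples above.
SAMPLE_TABLE = {
    "Adam": ["2018-07-26T17:55:03", "2018-07-27T17:55:03", "2018-06-29T17:55:03",
             "2018-07-16T17:55:03", "2018-08-19T17:55:03", "2018-07-14T17:55:03"],
    "Paul": ["2018-08-26T17:55:03", "2018-08-18T17:55:03", "2018-06-20T17:55:03",
             "2018-08-09T17:55:03", "2018-07-16T17:55:03"],
}


def all_gaps_exceed(datetimes, min_gap=timedelta(days=7)):
    """True if every pair of consecutive datetimes (after sorting) is more than min_gap apart."""
    ordered = sorted(datetimes)
    return all(later - earlier > min_gap
               for earlier, later in zip(ordered, ordered[1:]))


def names_with_wide_gaps(table):
    """Return the names whose datetimes are always more than 7 days apart."""
    result = []
    for name, raw_values in table.items():
        parsed = [datetime.fromisoformat(value) for value in raw_values]
        if all_gaps_exceed(parsed):
            result.append(name)
    return result


if __name__ == "__main__":
    print(names_with_wide_gaps(SAMPLE_TABLE))
```

Sorting first matters because the stored order of the array is not guaranteed to be chronological, as the dummy data shows.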