Big query, how to split TIMESTAMP data type column - google-bigquery

I am trying to split a column that contains TIMESTAMP data type into two separate columns DATE and TIME. Because I am trying to use WHERE clause with condition where time is MORE THAN 2MINUTES in my case : WHERE ride_length > to_timestamp'00:02:00' and is not working.

You can use EXTRACT to get date, time and minutes from a timestamp.
Example:
WITH table1 AS
(
SELECT TIMESTAMP("2022-06-27 10:00:00") AS dt
UNION ALL
SELECT TIMESTAMP("2022-06-27 12:03:00") AS dt
)
SELECT
EXTRACT(DATE FROM dt) AS date,
EXTRACT(TIME FROM dt) AS time
FROM table1
WHERE EXTRACT(MINUTE FROM dt) > 2

Related

Select Max difference between two dates in the same column

I have a table of accident date I want to calculate the maximum between the difference of date i and date i + 1 which are in the same column. when we declare an accident date, I want to find the record of days without accidents.
You can use lag(). Assuming a table structure like mytable(dt), where dt is of a date-like datatype, you would do:
select max(diff)
from (select dt - lag(dt) over(order by dt) diff from mytable) t

Appending the result query in bigquery

I am doing a query where the query will append the data from previous date as the outcome in BigQuery.
So, the result data for today will be higher than yesterdays as the data is appending by days.
So far, what I only managed to get the outcome is the data by days (where you can see the number of ID declining and is not appending from previous day) as this result:
What should I do to add appending function in the query so each day will get the result of data from the previous day in bigquery?
code:
WITH
table1 AS (
SELECT
ID,
...
FROM t
WHERE DATE_SUB('2020-01-31', INTERVAL 31 DAY) and '2020-01-31'
),
table2 AS (
SELECT
ID,
COUNTIF((rating < 7) as bad,
COUNTIF((rating >= 7 AND SAFE_CAST(NPS_Rating as INT64) < 9) as intermediate,
COUNTIF((rating as good
FROM
t
WHERE DATE_SUB('2020-01-31', INTERVAL 31 DAY) and '2020-01-31'
)
SELECT
DATE_SUB('2020-01-31', INTERVAL 31 DAY) as date,
*
FROM table1
FULL OUTER JOIN table2 USING (ID)
If you have counts that you want to accumulate, then you want a cumulative sum. The query would look something like this:
select datecol, count(*), sum(count(*)) over (order by datecol)
from t
group by datecol
order by datecol;

adding all columns from mutiple tables

I have a simple question.
I need to count all records from multiple tables with day and hour and add all of them together in a single final table.
So the query for each tab is something like this
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_1
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_2
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_3
and so on so forth
I would like to combine all the results showing number of total records for each day and hour from these tables.
Expected results will be like this
date, hour, number of records of table 1, number of records of table 2, number of records of table 3 ........
What would the most optimum SQL query for this?
Probably the simplest way is to union them together and aggregation:
select timestamp_trunc(timestamp, hour) as hh,
countif(which = 1) as num_1,
countif(which = 2) as num_2
from ((select timestamp, 1 as which
from table_1
) union all
(select timestamp, 2 as which
from table_2
) union all
. . .
) t
group hh
order by hh;
You are using timestamp_trunc(). It returns a timestamp truncated to the hour -- there is no need to also include the date.
Below is for BigQuery Standard SQL
#standardSQL
SELECT
TIMESTAMP_TRUNC(TIMESTAMP, DAY) day,
EXTRACT(HOUR FROM TIMESTAMP) hour,
COUNT(*) cnt,
_TABLE_SUFFIX AS table
FROM `project.dataset.table_*`
GROUP BY day, hour, table

map/sample timeseries data to another timeserie db2

I am trying to combine the results of two SQL (DB2 on IBM bluemix) queries:
The first query creates a timeserie from startdate to enddate:
with dummy(minute) as (
select TIMESTAMP('2017-01-01')
from SYSIBM.SYSDUMMY1 union all
select minute + 1 MINUTES
from dummy
where minute <= TIMESTAMP('2018-01-01')
)
select to_char(minute, 'DD.MM.YYYY HH24:MI') AS minute
from dummy;
The second query selects data from a table which have a timestamp. This data should be joined to the generated timeseries above. The standalone query is like:
SELECT DISTINCT
to_char(date_trunc('minute', TIMESTAMP), 'DD.MM.YYYY HH24:MI') AS minute,
VALUE AS running_ct
FROM TEST
WHERE ID = 'abc'
AND NAME = 'sensor'
ORDER BY minute ASC;
What I suppose to get is a query with one result with contains of two columns:
first column with the timestamp from startdate to enddate and
the second with values which are sorted by there own timestamps to the
first column (empty timestamps=null).
How could I do that?
A better solution, especially if your detail table is large, is to generate a range. This allows the optimizer to use indices to fulfill the bucketing, instead of calling a function on every row (which is expensive).
So something like this:
WITH dummy(temporaer, rangeEnd) AS (SELECT a, a + 1 MINUTE
FROM (VALUES(TIMESTAMP('2017-12-01'))) D(a)
UNION ALL
SELECT rangeEnd, rangeEnd + 1 MINUTE
FROM dummy
WHERE rangeEnd < TIMESTAMP('2018-01-31'))
SELECT Dummy.temporaer, AVG(Test.value) AS TEXT
FROM Dummy
LEFT OUTER JOIN Test
ON Test.timestamp >= Dummy.temporaer
AND Test.timestamp < Dummy.rangeEnd
AND Test.id = 'abc'
AND Test.name = 'text'
GROUP BY Dummy.temporaer
ORDER BY Dummy.temporaer ASC;
Note that the end of the range is now exclusive, not inclusive like you had it before: you were including the very first minute of '2018-01-31', which is probably not what you wanted. Of course, excluding just the last day of a month also strikes me as a little strange - you most likely really want < TIMESTAMP('2018-02-01').
found a working solution:
with dummy(temporaer) as (
select TIMESTAMP('2017-12-01') from SYSIBM.SYSDUMMY1
union all
select temporaer + 1 MINUTES from dummy where temporaer <= TIMESTAMP('2018-01-31'))
select temporaer, avg(VALUE) as text from dummy
LEFT OUTER JOIN TEST ON temporaer=date_trunc('minute', TIMESTAMP) and ID='abc' and NAME='text'
group by temporaer
ORDER BY temporaer ASC;
cheers

SQL Statement Only latest entry of the day

seems it is too long ago that I needed create own SQL Statements. I have a table (GAS_COUNTER) with timestamps (TS) and values (VALUE).
There are hundreds of entries per day, but I only need the latest of the day. I tried different ways but never get what I need.
Edit
Thanks for the fast replies, but some do not meet my needs (I need the latest value of each day in the table) and some don't work. My best own statement was:
select distinct (COUNT),
from
(select
extract (DAY_OF_YEAR from TS) as COUNT,
extract (YEAR from TS) as YEAR,
extract (MONTH from TS) as MONTH,
extract (DAY from TS) as DAY,
VALUE as VALUE
from GAS_COUNTER
order by COUNT)
but the value is missing. If I put it in the first select all rows return. (logical correct as every line is distinct)
Here an example of the Table content:
TS VALUE
2015-07-25 08:47:12.663 0.0
2015-07-25 22:50:52.155 2.269999999552965
2015-08-10 11:18:07.667 52.81999999284744
2015-08-10 20:29:20.875 53.27999997138977
2015-08-11 10:27:21.49 54.439999997615814
2nd Edit and solution
select TS, VALUE from GAS_COUNTER
where TS in (
select max(TS) from GAS_COUNTER group by extract(DAY_OF_YEAR from TS)
)
This one would give you the very last record:
select top 1 * from GAS_COUNTER order by TS desc
Here is one that would give you last records for every day:
select VALUE from GAS_COUNTER
where TS in (
select max(TS) from GAS_COUNTER group by to_date(TS,'yyyy-mm-dd')
)
Depending on the database you are using you might need to replace/adjust to_date(TS,'yyyy-mm-dd') function. Basically it should extract date-only part from the timestamp.
Select the max value for the timestamp.
select MAX(TS), value -- or whatever other columns you want from the record
from GAS_COUNTER
group by value
Something like this would window the data and give you the last value on the day - but what happens if you get two TS the same? Which one do you want?
select *
from ( select distinct cast( TS as date ) as dt
from GAS_COUNTER ) as gc1 -- distinct days
cross apply (
select top 1 VALUE -- last value on the date.
from GAS_COUNTER as gc2
where gc2.TS < dateadd( day, 1, gc1.dt )
and gc2.TS >= gc1.dt
order by gc2.TS desc
) as x