Get data that is no more than an hour old in BigQuery - google-bigquery

Trying to use the statement:
SELECT *
FROM data.example
WHERE TIMESTAMP(timeCollected) < DATE_ADD(USEC_TO_TIMESTAMP(NOW()), 60, 'MINUTE')
to query my BigQuery table. It seems to return the same set of results even when the time is not within the range. timeCollected has the format 2015-10-29 16:05:06.
I'm trying to build a query that returns only data that is no more than an hour old. Data collected within the last hour should be returned; the rest should be ignored.

Using Standard SQL:
SELECT * FROM data
WHERE timestamp > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -60 MINUTE)

The query you wrote means "return anything with a collection time earlier than one hour in the future", which matches your whole table. You want the following (from what I understood from your comment, at least):
SELECT *
FROM data.example
WHERE TIMESTAMP(timeCollected) > DATE_ADD(USEC_TO_TIMESTAMP(NOW()), -60, 'MINUTE')
This means that any timeCollected that is not later than one hour ago will not be returned. I believe this is what you want.
Also, unless you need it, SELECT * is not ideal in BigQuery. Since the data is stored by column, you can save money by selecting only the columns you need. I don't know your use case, so * may be warranted.
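The difference between the two filters is easy to check outside BigQuery. Here is a minimal Python sketch of both comparisons. The function names and sample timestamps are made up for illustration, and `now` is passed in explicitly so the result is reproducible.

```python
from datetime import datetime, timedelta, timezone

def original_filter(time_collected, now):
    """Mirror of the question's filter: timeCollected < now + 60 minutes."""
    return time_collected < now + timedelta(minutes=60)

def collected_within_last_hour(time_collected, now):
    """Mirror of the corrected filter: timeCollected > now - 60 minutes."""
    return time_collected > now - timedelta(minutes=60)

# Hypothetical sample values in the question's timestamp format.
now = datetime(2015, 10, 29, 17, 0, 0, tzinfo=timezone.utc)
recent = datetime(2015, 10, 29, 16, 5, 6, tzinfo=timezone.utc)  # under an hour old
stale = datetime(2015, 10, 28, 16, 5, 6, tzinfo=timezone.utc)   # about a day old

for label, ts in (("recent", recent), ("stale", stale)):
    print(label, original_filter(ts, now), collected_within_last_hour(ts, now))
```

The original filter accepts both rows because every past timestamp is earlier than "one hour from now". Only the corrected filter separates them.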

To get table data added to the table within the last hour (legacy SQL table decorator):
SELECT * FROM [data.example@-3600000--1]
https://cloud.google.com/bigquery/table-decorators

Using Standard SQL:
SELECT * FROM data WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 60 MINUTE)

Related

Get data when date is equal to or greater than 90 days ago

I wonder if anyone here can help with a BigQuery piece I am working on.
I'm trying to pull the date, email and last interaction time from a dataset when the last interaction time is equal to or greater than 90 days ago.
I have the following query:
SELECT
date,
user_email,
DATE_FROM_UNIX_DATE(gmail.last_interaction_time) AS Last_Interaction_Date,
DATE_ADD(CURRENT_DATE(), INTERVAL -90 DAY) AS Days_ago
FROM
`bqadminreporting.adminlogtracking.usage`
WHERE
'Last_Interaction_Date' >= 'Days_ago'
However, I run into the following error:
DATE value is out of allowed range: from 0001-01-01 to 9999-12-31
As far as I can see, the query makes sense, so I'm not sure why it's throwing an error.
Looks like you have some inconsistent values in the field gmail.last_interaction_time, which you need to handle to avoid the error.
Moreover, the above query will not filter as you expect: the quoted names in the WHERE clause are string literals, not column references, so the condition compares two fixed strings. Use the following query to get the expected output.
SELECT * FROM
(SELECT
date,
user_email,
DATE_FROM_UNIX_DATE(gmail.last_interaction_time) AS Last_Interaction_Date,
DATE_ADD(CURRENT_DATE(), INTERVAL -90 DAY) AS Days_ago
FROM
`bqadminreporting.adminlogtracking.usage`)
WHERE
Last_Interaction_Date >= Days_ago
Presumably, your problem is DATE_FROM_UNIX_DATE(). Without sample data, it is not really possible to determine what the issue is.
However, you don't need to convert to a date to do this. You can do all the work in the Unix seconds space:
select u.*
from `bqadminreporting.adminlogtracking.usage` u
where gmail.last_interaction_time >= unix_seconds(timestamp(current_date)) - 90 * 60 * 60 * 24
Note that I suspect that the issue is that last_interaction_time is really measured in milliseconds or microseconds or some other unit. This will prevent your error, but it might not do what you want.
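To illustrate the suspected unit mismatch, here is a small Python sketch. It assumes, as the answer above only guesses, that the column holds seconds rather than days since 1970-01-01. The helper names are hypothetical. The first helper mimics what DATE_FROM_UNIX_DATE does with a day count, and the second mirrors the comparison done purely in seconds.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def date_from_unix_days(days):
    """Interpret an integer as days since 1970-01-01."""
    return EPOCH + timedelta(days=days)

def interacted_within_days(last_interaction_seconds, now_seconds, days=90):
    """Mirror of: last_interaction_time >= now_seconds - 90 * 60 * 60 * 24."""
    return last_interaction_seconds >= now_seconds - days * 24 * 60 * 60

# A plausible day count converts without trouble.
print(date_from_unix_days(17_000))

# A seconds-sized value treated as a day count falls far outside the calendar.
try:
    date_from_unix_days(1_550_000_000)
except OverflowError as exc:
    print("out of range:", exc)
```

If the column turns out to be in milliseconds or microseconds, the same comparison works once the cutoff is scaled to that unit.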

How to select data but without similar times?

I have a table with create_dt times and I need to get records, but without the ones that have a similar create_dt time (within 15 minutes).
So I need to get only one record instead of two if the second create_dt is within 15 minutes of the first one.
The format of the date and time is ('29.03.2019 00:00:00','DD.MM.YYYY HH24:MI:SS'). Thanks
It's a bit unclear what exactly you want, but one thing I can think of, is to round all values to the nearest "15 minute" and then only pick one row from those "15 minute" intervals:
with rounded as (
select create_dt,
date '0001-01-01' + (round((cast(create_dt as date) - date '0001-01-01') * 24 * 60 / 15) * 15 / 60 / 24) as rounded,
... other columns ....
from your_table
), numbered as (
select create_dt,
rounded,
row_number() over (partition by rounded order by create_dt) as rn,
... other columns ....
from rounded
)
select *
from numbered
where rn = 1;
The expression date '0001-01-01' + (round((cast(create_dt as date) - date '0001-01-01') * 24 * 60 / 15) * 15 / 60 / 24) will return create_dt rounded up or down to the nearest "15 minutes" interval.
The row_number() then assigns unique numbers for each distinct 15 minutes interval and the final select then always picks the first row for that interval.
Online example: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=e6c7ea651c26a6f07ccb961185652de7
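The same round-then-pick-first idea can be sketched in Python to see which rows survive. This is only an illustration of the logic, not the Oracle query itself. The function names and sample times are invented, and Python's round() sends exact halves to the even neighbour, which may differ from the database's rounding.

```python
from datetime import datetime, timedelta

BUCKET = timedelta(minutes=15)
ORIGIN = datetime(1, 1, 1)  # plays the role of date '0001-01-01'

def round_to_bucket(ts):
    """Round a timestamp to the nearest 15-minute mark."""
    buckets = round((ts - ORIGIN) / BUCKET)
    return ORIGIN + buckets * BUCKET

def first_per_bucket(timestamps):
    """Keep the earliest timestamp of every rounded 15-minute bucket."""
    first = {}
    for ts in sorted(timestamps):
        first.setdefault(round_to_bucket(ts), ts)
    return list(first.values())

sample = [
    datetime(2019, 3, 29, 0, 0, 0),
    datetime(2019, 3, 29, 0, 5, 0),   # rounds to 00:00, dropped
    datetime(2019, 3, 29, 0, 20, 0),  # rounds to 00:15
    datetime(2019, 3, 29, 0, 40, 0),  # rounds to 00:45
]
for ts in first_per_bucket(sample):
    print(ts)
```

One consequence of bucketing is that two rows only a minute apart (for example 00:07 and 00:08) can land in different buckets and both be kept.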
I'm going to walk you through this conceptually. First of all, there's a difficulty in doing this that you might not have noticed.
Let's say you wanted one record from the same hour or day. If there are two records created on the same day, you only want one in your results. Which one?
I mention this because the designers of SQL could not build in a single answer for which one to pick. They cannot show data from both records without both records being in the tabular output.
This is a common problem, but the feature the designers of SQL provided for it only works when there is no ambiguity about how to produce one result row from two records. That feature is GROUP BY, but it can only show fields other than the timestamp if they are the same for all the records that fall in the time period. You have to include all the fields in your select clause, and if records in your time period differ in any of them, they will create multiple rows in your output. So although GROUP BY exists for this problem, you might not be able to use it.
So here is the solution you want. If multiple records are close together, don't include the records after the first one. You want a WHERE clause that excludes a record if another record closely precedes it. The test for each record in the result therefore involves other records in the table, so you need to join the table to itself.
Let's say we have a table named error_events. If we get multiples of the same value in the field error_type very close to the time of other similar events, we only want to see the first one. The SQL will look something like this:
SELECT A.*
FROM error_events A
INNER JOIN error_events B ON A.error_type = B.error_type
WHERE ???
You will have to figure out the details of the WHERE clause, and the timestamp functions will depend on which RDBMS product you are using (MySQL and Postgres, for instance, may work differently).
You want only the records for which there is no record earlier by less than 15 minutes. You do want the original record. That record will match itself in the join, but it will be the only record in the time period between its timestamp and 15 minutes prior.
So an example WHERE clause would be
WHERE B.create_dt BETWEEN [15 minutes before A.create_dt] and A.create_dt
GROUP BY A.*
HAVING 1 = COUNT(B.pkey)
Like we said, you will have to find out how your database product subtracts time, and how 15 minutes is represented in that difference.
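The self-join test can be tried out in Python before working out the dialect-specific SQL. This is a rough sketch of the logic only. The tuple layout (pkey, error_type, create_dt) and the sample rows are assumptions for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)

def first_of_each_burst(events):
    """events: list of (pkey, error_type, create_dt) tuples.

    Keep an event only if the sole same-type event in the 15 minutes
    up to and including its own timestamp is the event itself.
    """
    kept = []
    for a in events:
        matches = [
            b for b in events
            if b[1] == a[1] and a[2] - WINDOW <= b[2] <= a[2]
        ]
        if len(matches) == 1:  # plays the role of HAVING 1 = COUNT(B.pkey)
            kept.append(a)
    return kept

events = [
    (1, "disk", datetime(2019, 3, 29, 10, 0)),
    (2, "disk", datetime(2019, 3, 29, 10, 5)),   # 5 minutes after pkey 1
    (3, "disk", datetime(2019, 3, 29, 10, 30)),  # nothing in the prior 15 minutes
    (4, "net", datetime(2019, 3, 29, 10, 6)),    # different error_type
]
print([e[0] for e in first_of_each_burst(events)])
```

Note that a steady stream of events less than 15 minutes apart keeps only its very first event, since every later one has a close predecessor.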

Oddities with postgres SQL [negative date interval and alias that doesn't work only in condition clause]

I'm coming to you guys with two small oddities I can't seem to understand in Postgres:
(1)
SELECT "LASTREQUESTED",
(DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
ORDER BY "TIME"
I'm calculating when users can make their next request (once every 8 hours), but for entry 16 I get "1 day -06:20:47" instead of roughly "18:00:00", unlike every other line. (The column LASTREQUESTED is a simple timestamp; line 16 is no different from the other entries.) Why is that?
(2)
On the same request, if I try to add a condition on the "TIME" column, the compiler says it doesn't exist although using it to order by is ok. I don't get why.
SELECT (DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
WHERE "TIME" > 0
ORDER BY "TIME";
Question #1: negative hours but positive days?
According to the PostgreSQL documentation, this is a situation where PostgreSQL differs from the SQL standard:
According to the SQL standard all fields of an interval value must have the same sign…. PostgreSQL allows the fields to have different signs….
Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. …
You can see a more extreme example of this with the following query:
=# select interval '1 day' - interval '300 hours';
?column?
------------------
1 day -300:00:00
(1 row)
So this is not a single interval in seconds expressed in a strange way; instead, it's an interval of 0 months, +1 day, and -1,080,000.0 seconds. If you are certain that there's no daylight savings time issues with the timestamps that you got these intervals from, you can use justify_hours to convert days into 24-hour periods and get an interval that makes more sense:
=# select justify_hours(interval '1 day' - interval '300 hours');
justify_hours
--------------------
-11 days -12:00:00
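The arithmetic behind that result can be reproduced by hand. The following Python sketch is not PostgreSQL's implementation. It is a simplified model that ignores the months field and only folds the seconds part into whole 24-hour days so that both parts end up with the same sign.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def justify_hours(days, seconds):
    """Fold 24-hour blocks of `seconds` into `days` (simplified model)."""
    total = days * SECONDS_PER_DAY + seconds
    sign = -1 if total < 0 else 1
    whole_days, rest = divmod(abs(total), SECONDS_PER_DAY)
    return sign * whole_days, sign * rest

# interval '1 day' - interval '300 hours' is stored as +1 day, -1,080,000 seconds.
print(justify_hours(1, -300 * 3600))

# The value from the question, "1 day -06:20:47".
print(justify_hours(1, -(6 * 3600 + 20 * 60 + 47)))
```

The first call gives -11 days and -43,200 seconds, which is the -11 days -12:00:00 shown above. The second gives 0 days and 63,553 seconds, or 17:39:13, which is in line with the other rows.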
Question #2: SELECT columns can't be used in WHERE?
This is standard PostgreSQL behavior. See this duplicate question. Solutions presented there include:
Repeat the expression twice, once in the SELECT list, and again in the WHERE clause. (I've done this more times than I want to remember…)
SELECT (my - big * expression) AS x
FROM stuff
WHERE (my - big * expression) > 5
ORDER BY x
Create a subquery without that WHERE filter, and put the WHERE conditions in the outer query
SELECT *
FROM (SELECT (my - big * expression) AS x
FROM stuff) AS subquery
WHERE x > 5
ORDER BY x
Use a WITH statement to achieve something similar to the subquery trick.
I don't know exactly why it's calculated this way (maybe because you subtract an interval from another interval), but when you change the calculation to timestamp minus timestamp it works as expected:
DATE_TRUNC('seconds', CURRENT_TIMESTAMP - ("LASTREQUESTED" + INTERVAL '8 hours'))
See Fiddle
Regarding #2: Based on Standard SQL the columns in the Select-list are calculated after FROM/WHERE/GROUP BY/HAVING, but before ORDER, that's why you can't use an alias in WHERE. There are some good articles on that topic written by Itzik Ben-Gan (based on MS SQL Server, but similar for PostgreSQL).
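The logical evaluation order can be mimicked with a few list operations. This Python sketch is only an analogy with made-up data. Each step sees only what earlier steps produced, which is why the filter has to repeat the expression while the sort can use the alias.

```python
# Hypothetical source rows: hours since each user's last request.
rows = [{"waited_hours": 30}, {"waited_hours": 5}, {"waited_hours": 9}]

def expression(row):
    """The computed column: time waited minus the 8-hour limit."""
    return row["waited_hours"] - 8

# WHERE runs first: the alias "TIME" does not exist yet, so repeat the expression.
filtered = [r for r in rows if expression(r) > 0]

# SELECT runs next and is where the alias is created.
selected = [{"TIME": expression(r)} for r in filtered]

# ORDER BY runs last, so it can refer to the alias.
ordered = sorted(selected, key=lambda r: r["TIME"])

print(ordered)
```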

Date inside current timestamp - IBM DB2

I have a column (ROW_UPDATE_TIME) in a table where it stores the timestamp when an update happens in this table.
I'd like to know how to select the rows whose timestamp is today.
This is what I'm using now, but it's not a pretty solution I think:
SELECT
*
FROM
TABLE
WHERE
ROW_UPDATE_TIME BETWEEN (CURRENT TIMESTAMP - 1 DAY) AND (CURRENT TIMESTAMP + 1 DAY);
Is there a better solution, example: ROW_UPDATE_TIME = CURRENT DATE, or something like that?
Found it:
SELECT
*
FROM
TABLE
WHERE
DATE(ROW_UPDATE_TIME) = CURRENT DATE;
The first version you provided will not return the results you expect: it matches every timestamp from 24 hours before now to 24 hours after now, so depending on the hour you run it you will also get rows from yesterday.
Use the query below to get the results from today:
SELECT
*
FROM
table
WHERE
row_update_time
BETWEEN TIMESTAMP(CURRENT_DATE,'00:00:00')
AND TIMESTAMP(CURRENT_DATE,'23:59:59')
Avoid applying a function to a column you compare in the WHERE clause (DATE(row_update_time) = CURRENT_DATE). That forces the database to run the function against each row just to locate the data you need, and it can prevent the use of an index on the column. It could slow down the query dramatically. Try running EXPLAIN against the two versions and you will see what I mean.
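One detail of the BETWEEN version is that an upper bound of 23:59:59 leaves out timestamps with fractional seconds in the last second of the day. A half-open range (start of today inclusive, start of tomorrow exclusive) avoids that and still compares the bare column. Here is a minimal Python sketch of that comparison. The function name and sample values are illustrative only.

```python
from datetime import date, datetime, time, timedelta

def updated_on(row_update_time, day):
    """True if the timestamp falls on `day`, using a half-open range."""
    start = datetime.combine(day, time.min)  # day at 00:00:00
    end = start + timedelta(days=1)          # next day at 00:00:00
    return start <= row_update_time < end

today = date(2019, 3, 29)
print(updated_on(datetime(2019, 3, 29, 23, 59, 59, 500000), today))
print(updated_on(datetime(2019, 3, 30, 0, 0, 0), today))
```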

SQL Getting data by the hour

Hi, I have a weather database in SQL Server 2008 filled with weather observations taken every 20 minutes. I want the weather records for each hour, not every 20 minutes. How can I filter the results so that only the first observation for each hour is returned?
Example:
7:00:00
7:20:00
7:40:00
8:00:00
Desired Output
7:00:00
8:00:00
To get exactly (less the fact that it's an INT instead of a TIME; nothing hard to fix) what you listed as your desired result,
SELECT DISTINCT DATEPART(HOUR, TimeStamp)
FROM Observations
You could also add in CAST(TimeStamp AS DATE) if you wanted that as well.
Assuming you want the data as well, however, it depends a little, but from exactly what you've described, the simple solution is just to say:
SELECT *
FROM Observations
WHERE DATEPART(MINUTE, TimeStamp) = 0
That fails if you have missing data, though, which is pretty common.
If you do have some hours where you want data but don't have a row at :00, you could do something like this:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY CAST(TimeStamp AS DATE), DATEPART(HOUR, TimeStamp) ORDER BY TimeStamp) AS n
FROM Observations
)
SELECT *
FROM cte
WHERE n = 1
That'll take the first one for any date/hour combination.
Of course, you're still leaving out anything where you had no data for an entire hour. That would require a numbers table, if you even want to return those instances.
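The partition-and-take-first step can be checked with a short Python sketch. The (timestamp, reading) tuple layout and the sample temperatures are assumptions made for illustration.

```python
from datetime import datetime

def first_per_hour(observations):
    """observations: list of (timestamp, reading) tuples.

    Keep the earliest observation of every (date, hour) combination.
    """
    first = {}
    for ts, reading in sorted(observations, key=lambda o: o[0]):
        first.setdefault((ts.date(), ts.hour), (ts, reading))
    return list(first.values())

observations = [
    (datetime(2012, 5, 1, 7, 0), 11.5),
    (datetime(2012, 5, 1, 7, 20), 11.9),
    (datetime(2012, 5, 1, 7, 40), 12.4),
    (datetime(2012, 5, 1, 8, 20), 13.0),  # the 8:00 row is missing
    (datetime(2012, 5, 1, 8, 40), 13.2),
]
for ts, reading in first_per_hour(observations):
    print(ts, reading)
```

With the 8:00 row missing, the 8:20 observation is kept for that hour, which a filter on minute = 0 would not do.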
You can use a formula like the following one to truncate a time point (in this case GETUTCDATE()) down to the start of its hour.
SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, 0, GETUTCDATE()) / 60 * 60, 0)
Then you can use this formula in the WHERE clause of your SQL query to get the data you want.
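Because the division is integer division, the formula truncates rather than rounds. Here is a small Python sketch of the same arithmetic. It uses 1900-01-01 as the base, which is what date 0 means in SQL Server, and the function name is made up.

```python
from datetime import datetime, timedelta

BASE = datetime(1900, 1, 1)  # SQL Server's date 0

def truncate_to_hour(ts):
    """Mirror of DATEADD(MINUTE, DATEDIFF(MINUTE, 0, ts) / 60 * 60, 0)."""
    minutes = (ts - BASE) // timedelta(minutes=1)  # whole minutes since the base
    return BASE + timedelta(minutes=minutes // 60 * 60)

print(truncate_to_hour(datetime(2012, 5, 1, 7, 59, 30)))
```

Even at 7:59:30 the result is 7:00:00, so comparing a timestamp with its truncated value finds the rows that fall exactly on the hour.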
What you need is to GROUP BY your desired time frame, such as the date and the hour. Then you take the MIN value within each time frame. Since you didn't specify which columns you are using, this is the most generic answer I can give.
Use as filter :
... where DATEPART(MINUTE, DateColumn) = 0
To filter the result for every whole hour, you can set your where clause to check for 00 minute since every whole hour is HH:00:00.
To get the minute part from a time-stamp, you can use DATEPART function.
SELECT *
FROM YOURTABLENAME
WHERE DATEPART(MINUTE, YOURDATEFIELDNAME) = 0
More information on datepart function can be found here: http://www.w3schools.com/sql/func_datepart.asp