How to run BigQuery for Selection of last month Records? - google-bigquery

I am trying to get last month's records from a BigQuery table. My column is of type TIMESTAMP, which is why the query gives this error:
No matching signature for operator BETWEEN for argument types: TIMESTAMP, DATE, DATE. Supported signature: (ANY) BETWEEN (ANY) AND (ANY) at [4:21]
Table Structure
Query
SELECT user_mobile,count(*) as total_customer FROM `Project.Dataset.Table` cr
where cr.DATE BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH) AND CURRENT_DATE()
group by user_mobile
having count(*) >=1;
Please guide me on how to use timestamps in my query to get the required results. Thank you.

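Use CURRENT_TIMESTAMP() and TIMESTAMP_SUB() instead, so that both bounds are TIMESTAMPs like the column. TIMESTAMP_SUB() does not support the MONTH part, hence INTERVAL 30 DAY: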
SELECT user_mobile,count(*) as total_customer
FROM `Project.Dataset.Table` cr
where cr.created_at BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND CURRENT_TIMESTAMP()
group by user_mobile
having count(*) >=1;
Alternatively, convert created_at to a date with DATE(cr.created_at) and keep the DATE bounds from your original query.
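Note that a 30-day window is a rolling range, not the previous calendar month. If calendar-month bounds are what you actually need, one option is to compute them in client code and pass them to the query as parameters. Below is a minimal Python sketch of that idea; the helper name `previous_month_bounds` is hypothetical and not part of any BigQuery API.

```python
from datetime import datetime, timezone


def previous_month_bounds(now: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the calendar month before `now`, end exclusive."""
    # First instant of the current month; this is also the exclusive end bound.
    this_month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if this_month_start.month == 1:
        # January rolls back to December of the previous year.
        start = this_month_start.replace(year=this_month_start.year - 1, month=12)
    else:
        start = this_month_start.replace(month=this_month_start.month - 1)
    return start, this_month_start


start, end = previous_month_bounds(datetime(2022, 3, 15, 10, 30, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
```

The filter would then be of the form created_at >= start AND created_at < end, with both values bound as TIMESTAMP parameters.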

Related

No matching signature for function DATE for argument types

There is a data column in my dataset called "date". The values look like this:
"2022-07-23 04:16:51 UTC"
I am trying to select rows from my table like this:
SELECT
date
type,
mid,
wikipediaUrl,
numMentions,
avgSalience
FROM
myTable,
UNNEST(entities)
WHERE type = "LOCATION" AND score < -0.1 AND (date BETWEEN DATE(current_date(), INTERVAL 40 DAY) AND current_date())
However, I get an error on the between function:
No matching signature for function DATE for argument types: DATE, INTERVAL. Supported signatures: DATE(TIMESTAMP, [STRING]); DATE(DATETIME); DATE(INT64, INT64, INT64); DATE(DATE); DATE(STRING) at [16:60]
What am I doing wrong?
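The value "2022-07-23 04:16:51 UTC" is a TIMESTAMP, not a DATE, so it cannot be compared directly with DATE bounds. The error itself comes from DATE(current_date(), INTERVAL 40 DAY): DATE() does not take an interval, and DATE_SUB() is the function that subtracts one.
So your WHERE clause should look something like this instead: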
DATE(date) BETWEEN date_sub(current_date(), INTERVAL 40 DAY) and current_date
If I change the window to 60 days, so that the sample timestamp falls inside the BETWEEN range:
select sample_data.timestamp_value
,date(timestamp_value) date_value
from sample_data
where date(timestamp_value) between date_sub(current_date(), INTERVAL 60 DAY) and current_date
This returns the sample row, since its timestamp falls within the 60-day window.

Retrieve mean time between failure within given date range parameter

Below is what I have tried:
select machine_id, count(incident_id) "No_Incident",
(fail_date BETWEEN (24 * to_date('&From_date_', 'DDMMYYYY') AND to_date('&To_Date','DDMMYYYY') / count(incident_id))) "MTBF"
from mytable;
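This should work for you (the date range is hard-coded here, and GROUP BY machine_id is needed because machine_id is selected alongside aggregates):
SELECT machine_id, count(incident_id) "No_Incident",
MAX(ROUND((ABS((TO_DATE('01-02-2019','DD-MM-YYYY') -
TO_DATE('03-02-2019'||' 23:59:59','DD-MM-YYYY HH24:MI:SS'))*24))))
/count(incident_id) AS MTBF
FROM
mytable
GROUP BY machine_id;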
You may consider the following examples.
Since machine_id is not being aggregated (e.g. counted), I have used GROUP BY to provide the mean time between failures (MTBF) for each machine. If you want it for all machines together, simply remove machine_id from the SELECT clause and drop the GROUP BY machine_id.
Your query had a syntax error because the BETWEEN date range was used inside the SELECT clause. I have moved it into a WHERE clause. How parameters are handled may differ depending on how you execute the query (e.g. which SQL client you use).
Using Oracle
SELECT
machine_id,
count(incident_id) as "No_Incidents",
(
EXTRACT(
HOUR FROM
CAST(MAX(fail_date) AS TIMESTAMP) - CAST(MIN(fail_date) AS TIMESTAMP)
) +
EXTRACT(
DAY FROM
CAST(MAX(fail_date) AS TIMESTAMP) - CAST(MIN(fail_date) AS TIMESTAMP)
) * 24
)/count(incident_id) as "MTBF"
FROM
mytable
WHERE
fail_date BETWEEN to_date('&From_date_', 'DDMMYYYY') AND to_date('&To_Date','DDMMYYYY')
GROUP BY
machine_id
In the Oracle example, I converted the dates to TIMESTAMP using CAST before finding the difference (subtracting two TIMESTAMPs yields an interval, whereas subtracting two DATEs yields a number of days). Extracting the hours and days with EXTRACT and summing them (hours + days * 24) gives the total hours.
Using MySQL
SELECT
machine_id,
count(incident_id) as "No_Incidents",
(
TIMESTAMPDIFF(HOUR,min(fail_date),max(fail_date))
)/count(incident_id) as "MTBF"
FROM
mytable
WHERE
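-- MySQL has no TO_DATE or & substitution variables; STR_TO_DATE with user variables
-- (@from_date and @to_date are placeholder names) is one equivalent.
fail_date BETWEEN STR_TO_DATE(@from_date, '%d%m%Y') AND STR_TO_DATE(@to_date, '%d%m%Y')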
GROUP BY
machine_id
In the MySQL example, I used the TIMESTAMPDIFF function to get the difference between the dates in hours.
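To sanity-check the arithmetic outside the database, the same calculation can be sketched in Python. This is only an illustration of the formula used in the queries above (hours between the first and last failure, divided by the number of incidents); the function name `mtbf_hours` and the sample dates are made up.

```python
from datetime import datetime


def mtbf_hours(fail_dates: list[datetime]) -> float:
    """Whole hours between first and last failure, divided by the failure count."""
    if not fail_dates:
        raise ValueError("at least one failure is required")
    span = max(fail_dates) - min(fail_dates)
    # Truncate to whole hours, as TIMESTAMPDIFF(HOUR, ...) does.
    whole_hours = int(span.total_seconds() // 3600)
    return whole_hours / len(fail_dates)


# Hypothetical failures for a single machine.
failures = [
    datetime(2019, 2, 1, 0, 0),
    datetime(2019, 2, 2, 12, 30),
    datetime(2019, 2, 4, 0, 0),
]
print(mtbf_hours(failures))  # 72 whole hours / 3 failures
```

Like the SQL versions, this ignores partial hours and measures only the span between the first and last failure inside the selected range.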

How to Calculate avg no of records added per day in BigQuery?

I have a table in BigQuery having a column Published_date with a datatype of "Timestamp". I want to calculate avg no of rows added per day (for a specific month) in that table. I have the following query
SELECT AVG(Num_Rows)
FROM (SELECT [Day]=DAY( Published_Date ), Num_Rows=COUNT(*)
FROM `mytable`
WHERE Published_Date BETWEEN '20190729' AND '20190729 '
GROUP BY DAY( Published_Date ) ) AS Z
But it's generating the following error:
Could not cast literal "20190729" to type TIMESTAMP
How should I deal with the timestamp, given that I only need the date from the timestamp column?
I want to calculate avg no of rows added per day (for a specific month) in that table
Below is an example for BigQuery Standard SQL:
#standardSQL
SELECT AVG(Num_Rows) AS avg_rows_per_day
FROM (
SELECT DATE(Published_Date) AS day, COUNT(*) AS Num_Rows
FROM `project.dataset.mytable`
WHERE DATE(Published_Date) BETWEEN '2019-07-01' AND '2019-07-31'
GROUP BY day
)
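One detail worth knowing about this approach: the average is taken over the days that have at least one row, because days with no rows produce no group at all. The Python sketch below mirrors the GROUP BY and AVG steps to make that visible; the function name `avg_rows_per_day` and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def avg_rows_per_day(published: list[datetime]) -> float:
    """Average row count over the days that have at least one row."""
    if not published:
        raise ValueError("no rows to average")
    # Equivalent of GROUP BY DATE(Published_Date) with COUNT(*).
    per_day = Counter(ts.date() for ts in published)
    # Equivalent of AVG(Num_Rows) over the grouped result.
    return sum(per_day.values()) / len(per_day)


# Three rows on 1 July, none on 2 July, one on 3 July.
rows = [
    datetime(2019, 7, 1, 8, 0, tzinfo=timezone.utc),
    datetime(2019, 7, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2019, 7, 1, 23, 0, tzinfo=timezone.utc),
    datetime(2019, 7, 3, 12, 0, tzinfo=timezone.utc),
]
print(avg_rows_per_day(rows))  # 4 rows over 2 days that have data
```

If empty days should count as zero, divide the total row count by the number of calendar days in the month instead.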
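Use explicit conversion. Note that BETWEEN two identical timestamps matches only rows stamped exactly at midnight, so a half-open range covering the whole day is safer:
WHERE Published_Date >= TIMESTAMP('2019-07-29') AND Published_Date < TIMESTAMP('2019-07-30')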
Note that your column name ends in "_date", but the error says the value is a timestamp. I find this confusing. We use a convention of ending column names with _ts for timestamps (and _dt for datetime and _date for date).
Why is this important? The timestamp is in UTC, so you might need to be careful about time zones and time components, which is not obvious from a column called Published_Date.
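A small Python sketch of the time zone pitfall: the same UTC instant can belong to different calendar days depending on the zone used for the conversion. The timestamp and the UTC+05:30 offset below are arbitrary values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical row value: stored in UTC, late in the evening.
published = datetime(2019, 7, 29, 23, 30, tzinfo=timezone.utc)

# Fixed UTC+05:30 offset, chosen only for illustration.
local_zone = timezone(timedelta(hours=5, minutes=30))

utc_day = published.date()
local_day = published.astimezone(local_zone).date()
print(utc_day, local_day)  # the two dates differ by one day
```

In BigQuery, DATE() applied to a TIMESTAMP accepts an optional time zone argument, which is the place to make this choice explicit.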

How can I extract just the hour of a timestamp using standardSQL

How can I extract just the hour of a timestamp using standardSQL.
I've tried everything and no function works. The problem is that I have to extract the time from a column, and this column is in the following format: 2018-07-09T02:40:23.652Z
If I just put the date, it works, but if I put the column it gives the error below:
Syntax error: Expected ")" but got identifier "searchIntention" at [4:32]
The query is below:
#standardSQL
select TOTAL, dia, hora FROM
(SELECT cast(replace(replace(searchIntention.createdDate,'T',' '),'Z','')as
DateTime) AS DIA,
FORMAT_DATETIME("%k", DATETIME searchIntention.createdDate) as HORA,
count(searchintention.id) as Total
from `searchs.searchs2016626`
GROUP BY DIA)
Please, help me. :(
How can I extract just the hour of a timestamp using standardSQL?
Below is for BigQuery Standard SQL
You can use EXTRACT(HOUR FROM yourTimeStampColumn)
for example:
SELECT EXTRACT(HOUR FROM CURRENT_TIMESTAMP())
or
SELECT EXTRACT(HOUR FROM TIMESTAMP '2018-07-09T02:40:23.652Z')
or
SELECT EXTRACT(HOUR FROM TIMESTAMP('2018-07-09T02:40:23.652Z'))
In BigQuery Standard SQL, you can use the EXTRACT timestamp function to return an INT64 value corresponding to the part of the timestamp that you want to retrieve.
The full list of available parts is in the BigQuery documentation, but for your use case you can use the HOUR part directly to retrieve the INT64 representation of the hour value in a field of TIMESTAMP type.
#standardSQL
# Create a table
WITH table AS (
SELECT TIMESTAMP("2018-07-09T02:40:23.652Z") time
)
# Extract values from a Timestamp expression
SELECT
EXTRACT(DAY FROM time) as day,
EXTRACT(MONTH FROM time) as month,
EXTRACT(YEAR FROM time) as year,
EXTRACT(HOUR FROM time) AS hour,
EXTRACT(MINUTE FROM time) as minute,
EXTRACT(SECOND from time) as second
FROM
table
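For comparison, the same parse-then-extract step can be sketched in Python, which may help when checking values client-side. This assumes the column holds strings in exactly the format quoted in the question; the helper name `hour_of` is hypothetical.

```python
from datetime import datetime, timezone


def hour_of(raw: str) -> int:
    """Parse a 'YYYY-MM-DDTHH:MM:SS.fffZ' string and return its UTC hour."""
    # The trailing 'Z' is matched literally, then the UTC zone is attached.
    parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return parsed.hour


print(hour_of("2018-07-09T02:40:23.652Z"))
```

As in the SQL examples, the string is first turned into a timestamp value and only then is the hour part read from it.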

PostgreSQL: SELECT * from table WHERE timestamp IS WITHIN THIS MONTH

For some reason I'm kind of lost on how to achieve:
SELECT * FROM table WHERE timestamp IS WITHIN THIS MONTH;
I've looked at https://www.postgresql.org/docs/9.4/static/functions-datetime.html, but I am only able to select X days backwards.
I'm running PostgreSQL 9.4
... WHERE date_trunc('month', timestamp)
= date_trunc('month', current_timestamp);
Alternatively:
... WHERE timestamp >= date_trunc('month', current_timestamp)
AND timestamp < date_trunc('month', current_timestamp) + INTERVAL '1 month';
The second version can use an index on timestamp, the first would need one on the expression date_trunc('month', timestamp).
Why not just filter the month with a range condition?
Pass the start of this month as one variable and the start of next month as another:
SELECT * FROM table WHERE
timestamp >= __month_start AND timestamp < __next_month_start
e.g.
SELECT * FROM table
WHERE
(
timestamp >= '20170701'::timestamp
AND
timestamp < '20170801'::timestamp
)
Unlike using functions in the where-clause, this maintains sargability.
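The two boundary values can be computed in the application before the query runs. Below is a minimal Python sketch under that assumption; `month_bounds` is a hypothetical helper name, and the resulting values would be passed as query parameters through whatever driver you use.

```python
from datetime import datetime


def month_bounds(now: datetime) -> tuple[datetime, datetime]:
    """Return (month_start, next_month_start) for the month containing `now`."""
    month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if month_start.month == 12:
        # December rolls over to January of the following year.
        next_month_start = month_start.replace(year=month_start.year + 1, month=1)
    else:
        next_month_start = month_start.replace(month=month_start.month + 1)
    return month_start, next_month_start


month_start, next_month_start = month_bounds(datetime(2017, 7, 14, 9, 0))
# A row stamped in the last second of July still falls inside the half-open range.
late_row = datetime(2017, 7, 31, 23, 59, 59, 500000)
print(month_start <= late_row < next_month_start)
```

Using an exclusive upper bound avoids having to guess the last representable instant of the month, which an inclusive end-of-month bound would require.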
What Laurenz Albe suggested will work, but the first form carries a performance penalty, because wrapping the column in a function prevents a plain index on that field from being used. You either have to index the expression you are going to query (PostgreSQL allows this: https://www.postgresql.org/docs/current/static/indexes-expressional.html) or create a separate column that stores yyyy-mm values and query that.