I am using BigQuery SQL and I can't use a couple of functions, most notably WEEKNUM. Every time I try to, it reports an unrecognized function:
WEEKNUM
During my search I found a reference that I think means I can't use derived functions, and that WEEKNUM is one of them. I could be totally wrong though.
So how can I use the WEEKNUM function, or achieve the same result another way?
Thanks
In Standard SQL, use EXTRACT():
SELECT EXTRACT(WEEK FROM CURRENT_DATE)
Your link to WEEKNUM is for Google Cloud DataPrep. The BigQuery documentation for date functions does not include WEEKNUM, but offers similar functionality through EXTRACT or FORMAT_DATE.
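For example, both of the following work in BigQuery Standard SQL (a sketch; `my_date_col` and `my_table` are placeholder names). Note that the two forms count weeks differently:

```sql
SELECT
  EXTRACT(WEEK FROM my_date_col) AS week_num,  -- INT64; weeks begin on Sunday
  FORMAT_DATE('%W', my_date_col) AS week_str   -- STRING '00'..'53'; weeks begin on Monday
FROM my_table
```

So pick the one whose week-start convention matches what WEEKNUM gave you in your previous tool.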
Related
This question is mostly for "optimizing code" purposes.
In SQL, specifically Google BigQuery, there are two ways I know of to transform a timestamp into a date or an hour: EXTRACT() or CAST(). There may be more, but those are the ones I know of currently.
CAST() example:
SELECT
CAST(tb.timestamp_field AS DATE) AS date_field, COUNT(*)
FROM
database.table tb
GROUP BY
CAST(tb.timestamp_field AS DATE)
EXTRACT() example:
SELECT
EXTRACT(DATE FROM tb.timestamp_field) AS date_field, COUNT(*)
FROM
database.table tb
GROUP BY
EXTRACT(DATE FROM tb.timestamp_field)
Both methods work for what I'm trying to do, but I would like to know which one is considered best practice. Or maybe the whole question is silly, like asking which is better: "4+3-2" or "4-2+3", which are basically the same.
my two cents -
Cast - preferable, because a lot of other big-data tools use a similar format, so if you ever have to migrate to another big-data platform, you can migrate smoothly.
Also, in your SQL, CAST is a direct operation, so I think it can be faster. I tested this using one record, and this SQL took 0.011 sec.
SELECT cast( TIMESTAMP "2018-12-25 05:30:00" as date)
Extract - the SQL you are using is not the documented form; I don't see EXTRACT(DATE FROM timestamp_col) among the date functions. A good way is to use what @Mikhail Berlyant mentioned. But your SQL is working, so I think internally the BigQuery engine converts the timestamp to a date and then removes the time part. So it's a two-part operation that depends heavily on internal conversion - a little unreliable, I think. Also, you can run both of your queries and check performance, because performance depends on a lot of factors: environment, amount of data, table optimization, etc.
Also, the SQL below took about 0.012 sec (not a great performance indicator, though).
SELECT EXTRACT(DATE FROM TIMESTAMP "2018-12-25 05:30:00")
You can refer to the link below for more on EXTRACT and DATE:
https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#extract
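As an aside, BigQuery also accepts DATE(timestamp_expression) as a third spelling of the same conversion, so all three of these should return the same value (2018-12-25, assuming the default UTC time zone):

```sql
SELECT
  CAST(TIMESTAMP "2018-12-25 05:30:00" AS DATE)      AS via_cast,
  EXTRACT(DATE FROM TIMESTAMP "2018-12-25 05:30:00") AS via_extract,
  DATE(TIMESTAMP "2018-12-25 05:30:00")              AS via_date
```

Any performance difference between the three is likely negligible compared to scan cost, so readability is a reasonable tiebreaker.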
I'm working with Google Analytics data in a Google BigQuery table using standard SQL. I want to group data at a weekly level. I've found this solution:
SELECT FORMAT_TIMESTAMP("%W", PARSE_TIMESTAMP("%Y%m%d", '20150519'))
Outputs:
20
Is there a shorter method to get to this point? It seems quite long to me and I'm wondering if I'm missing a trick.
The only "improvement" I can see is using PARSE_DATE instead of PARSE_TIMESTAMP:
SELECT FORMAT_DATE("%W", PARSE_DATE("%Y%m%d", '20150519'))
I'm having a problem trying to query for 15 months worth of data.
I know about bigquery's wildcard functions, but I can't seem to get them to work with my tables.
For example, if my tables are called:
xxxx_201501,
xxxx_201502,
xxxx_201503,
...
xxxx_201606
How can I select everything from 201501 until today (current_timestamp)?
It seems that the tables need to be partitioned per day - am I wrong?
I've also read that you can use a regex, but I can't find how.
With Standard SQL, you can use a WHERE clause on a _TABLE_SUFFIX pseudo column as described here:
Is there an equivalent of table wildcard functions in BigQuery with standard SQL?
In this particular case, it would be:
SELECT ... from `mydataset.xxx_*` WHERE _TABLE_SUFFIX >= '201501';
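To cap the range at the current month rather than hard-coding an upper bound, one option (a sketch, assuming the suffixes follow the YYYYMM pattern from the question) is:

```sql
SELECT ...
FROM `mydataset.xxxx_*`
WHERE _TABLE_SUFFIX BETWEEN '201501'
  AND FORMAT_DATE('%Y%m', CURRENT_DATE())
```

Because the suffixes are zero-padded YYYYMM strings, the lexicographic comparison on _TABLE_SUFFIX matches chronological order.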
This is a bit long for a comment.
If you are using the standard SQL dialect, then I don't think this functionality is implemented yet.
If you are using the legacy SQL dialect, then you can use a function such as TABLE_DATE_RANGE(). This and other table wildcard functions are well documented.
EDIT:
Oh, I see. The simplest way would be to name the tables YYYYMM01 so you can use the range query.
But, you can also use table_query():
from table_query(t, 'right(table_id, 6) >= ''201501'' ')
We have a set of events (a kind of log) that we want to combine to get the current state. To further improve performance/cost, we would like to create snapshots (so we don't have to scan the whole event history, only events since the last snapshot). Logs and snapshots are tables with date suffixes.
This approach works fine in BQ, but we need to define the query manually every time. Is there any way to define a 'view' with parameters (e.g. dates for the table range query)? Or any plans for something like that?
I know there are some topics related to TABLE_DATE_RANGE / TABLE_QUERY in views (e.g. Use of TABLE_DATE_RANGE function in Views). Is there any new information on this subject?
That's a great feature request, but it's currently not supported. Please leave more details at https://code.google.com/p/google-bigquery/issues/list - the BigQuery team takes these requests very seriously!
As a workaround, I wrote a small framework to generate complex queries with the help of Velocity templates. Just published it at https://github.com/softkot/gbq
Now you can use table functions (aka table-valued functions, or TVFs) to achieve this. They are very similar to a view, but they accept a parameter. I've tested them and they really help keep future queries simple, since the complexity lives inside the table function definition. It receives a parameter that you can then use inside the query for filtering.
This example is from the documentation:
CREATE OR REPLACE TABLE FUNCTION mydataset.names_by_year(y INT64)
AS
SELECT year, name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE year = y
GROUP BY year, name
Then you just query it like this:
SELECT * FROM mydataset.names_by_year(1950)
More details can be found in the official documentation.
You can have a look at BigQuery scripting, which has been released in beta: https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting
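With scripting, a declared variable can stand in for the view parameter. A minimal sketch (the dataset, table prefix, and suffix value are placeholders, and the suffix is assumed to follow a date pattern):

```sql
-- Sketch: a script variable as a stand-in for a parameterized view
DECLARE snapshot_suffix STRING DEFAULT '20190101';

SELECT *
FROM `mydataset.events_*`
WHERE _TABLE_SUFFIX >= snapshot_suffix;
```

The date still has to be set once per run, but only in the DECLARE line rather than throughout the query text.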
Is there any any way to use TABLE_DATE_RANGE or TABLE_QUERY with table decorators?
I would like to extract all "new records" (added in past 2 hours) from a set of tables.
Many thanks
Not currently. That's an interesting feature request, however. I'm trying to think of what the syntax would look like. Perhaps where the dataset name / prefix is used we could use a *, as in
TABLE_DATE_RANGE(dataset1.prefix*#time, timestamp1, timestamp2) or
TABLE_QUERY(dataset1.prefix*#time, 'where clause')
I've filed a feature request.