I'm working with Google Analytics data in a Google BigQuery table using standard SQL. I want to group data at a weekly level. I've found this solution:
SELECT FORMAT_TIMESTAMP("%W", PARSE_TIMESTAMP("%Y%m%d", '20150519'))
Outputs:
20
Is there a shorter way to get to this point? It seems quite long to me and I'm wondering if I'm missing a trick.
The only "improvement" I can see is using PARSE_DATE instead of PARSE_TIMESTAMP:
SELECT FORMAT_DATE("%W", PARSE_DATE("%Y%m%d", '20150519'))
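If you want to sanity-check the result outside BigQuery, here is a minimal Python sketch. It assumes Python's strftime "%W" follows the same convention as BigQuery's "%W" (Monday as the first day of the week, days before the first Monday in week 00). The helper name week_of_year is made up for illustration.

```python
from datetime import datetime


def week_of_year(yyyymmdd):
    """Return the Monday-based week number (00-53) as a zero-padded string."""
    # Parse the compact date string, then format only the week number.
    return datetime.strptime(yyyymmdd, "%Y%m%d").strftime("%W")


print(week_of_year("20150519"))  # prints 20
```

This agrees with the 20 shown in the question.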
I'm fairly new to SQL, but recently I've had to use it in its basic form for very simple tasks, like selecting only the relevant columns from a table.
I'm currently using DBeaver as my SQL client, and for this example I'm reading straight from a CSV, no problems there. The data I'm working with is transaction data and the table is structured as follows
My problem is that the data is in 15-minute intervals, whereas I need one value per day per store per metric (i.e. in the image example, it would return "Site" = 101 - "Metric" = FOODSER3 - "Date" = 2020-08-09 - "Value" = 6.0000)
Firstly, is this possible?
Secondly, if so, could someone let me in on the secret of how, ideally with an explanation of what the solution does and why, so that I can really understand what's going on?
I'm fairly proficient in JavaScript and VBA, but so far SQL defeats me at every hurdle.
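This kind of query is an aggregation: you truncate each timestamp to its date, group by that date together with site and metric, and sum the values in each group. Date/time functions are notoriously database-specific, but the idea is: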
select cast(date as date), site, metric, sum(value)
from t
group by cast(date as date), site, metric;
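To see the grouping in action, here is a small sketch using Python's built-in sqlite3 module. The column name ts and the sample rows are invented for illustration (only the site, metric, date and daily total of 6.0 come from the question), and SQLite's date() function stands in for the cast, since the exact date function depends on your database.

```python
import sqlite3

# In-memory database with a few invented 15-minute readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts TEXT, site INTEGER, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("2020-08-09 08:00:00", 101, "FOODSER3", 1.5),
        ("2020-08-09 08:15:00", 101, "FOODSER3", 2.5),
        ("2020-08-09 08:30:00", 101, "FOODSER3", 2.0),
        ("2020-08-10 08:00:00", 101, "FOODSER3", 4.0),
    ],
)

# date(ts) drops the time part, so all rows of one day fall into one group.
daily = conn.execute(
    """
    SELECT date(ts) AS day, site, metric, SUM(value) AS total
    FROM t
    GROUP BY date(ts), site, metric
    ORDER BY day, site, metric
    """
).fetchall()

for row in daily:
    print(row)
```

Each output row is one day/site/metric combination with the 15-minute values added up.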
I am using BigQuery SQL and I can't use a couple of functions, most notably WEEKNUM. Every time I try to, I get an "unrecognized function" error.
WEEKNUM
During my search I found this, which I think means I can't use derived functions, and I think WEEKNUM is one of them. I could be totally wrong though.
So how can I use the WEEKNUM function, or achieve the same thing another way?
Thanks
In Standard SQL, use EXTRACT():
select extract(week from current_date)
Your link to WEEKNUM is for Google Cloud DataPrep. The BigQuery Documentation for date functions does not use WEEKNUM, but allows similar functionality through EXTRACT or FORMAT_DATE.
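One thing to watch is which week convention you get. As far as I understand the BigQuery docs, EXTRACT(WEEK ...) counts weeks starting on Sunday, FORMAT_DATE("%W", ...) counts weeks starting on Monday, and EXTRACT(ISOWEEK ...) gives the ISO 8601 week. The Python sketch below uses the standard library equivalents ("%U", "%W" and isocalendar()) to show that these can differ for the same date; treat the mapping to BigQuery as an assumption and check it against your own data.

```python
from datetime import date

d = date(2015, 5, 19)  # a Tuesday

sunday_week = int(d.strftime("%U"))  # weeks start on Sunday
monday_week = int(d.strftime("%W"))  # weeks start on Monday
iso_week = d.isocalendar()[1]        # ISO 8601 week number

print(sunday_week, monday_week, iso_week)  # prints 20 20 21
```

So if your numbers need to match a report that uses ISO weeks, the ISO variant is the one to compare against.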
I have written a query in Google BigQuery and want to get the same number of users I see in Google Analytics. I ran it in both legacy and standard SQL and ended up with three different user counts (one from each dialect plus the one in Google Analytics), while the session counts were the same. What did I do wrong, or does anyone have an explanation/solution for it? Any help is appreciated!
Standard SQL
SELECT COUNT(DISTINCT fullVisitorId) AS users,
  SUM(IF(totals.visits IS NULL, 0, totals.visits)) AS sessions
FROM `XXX.XXX.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20181120' AND '20181120'
Legacy SQL
SELECT COUNT(DISTINCT fullVisitorId) AS users,
  SUM(IF(totals.visits IS NULL, 0, totals.visits)) AS sessions
FROM TABLE_DATE_RANGE([XXX:XXX.ga_sessions_], TIMESTAMP('2018-11-20'), TIMESTAMP('2018-11-20'))
I think this warning from the documentation explains what is happening:
In legacy SQL, COUNT(DISTINCT x) returns an approximate count. In standard SQL, it returns an exact count.
Standard SQL gives the exact number. You can check this by using EXACT_COUNT_DISTINCT() in legacy SQL, which should return the same count as the standard SQL query.
This is my problem.
I would like to query only the last hour of data from BigQuery.
I would like to use standard SQL.
I would like to pay only for the data read in that time interval.
Example:
My partition for the day is 200 GB. I query the last hour of data (40 GB). Is it possible to pay only for 40 GB in standard SQL?
Thanks!
You can use table decorators (specifically range decorators), but they are supported in BigQuery legacy SQL ONLY.
To get data from the last hour you can use the following:
SELECT <list_of_fields>
FROM [yourproject:yourdataset.yourtable@-3600000-]
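The 3600000 is one hour expressed in milliseconds, and the leading minus makes it relative to now. If you need other windows, a small helper along these lines can build the decorated table reference. This is only a sketch: the function name range_decorator is made up, and it assumes the legacy SQL [table@-<millis>-] relative range form.

```python
def range_decorator(table, hours_back):
    """Build a legacy SQL range decorator covering the last `hours_back` hours."""
    # Decorator offsets are given in milliseconds.
    millis = hours_back * 60 * 60 * 1000
    return "[{}@-{}-]".format(table, millis)


print(range_decorator("yourproject:yourdataset.yourtable", 1))
# prints [yourproject:yourdataset.yourtable@-3600000-]
```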
Of course, standard SQL is the preferred query syntax for BigQuery. So you can either write the whole query in legacy SQL and keep the logic in one place, or split it: first pull the last hour of data into a temp table using a legacy SQL decorator, then apply the rest of the logic in standard SQL.
In the meantime, see this open issue on Google's Issue Tracker:
Support an equivalent to table decorators in standard SQL
From that thread, it looks like the closest feature for your case could be hourly partitioning, once it becomes available.
I'm having a problem trying to query 15 months' worth of data.
I know about BigQuery's wildcard functions, but I can't seem to get them to work with my tables.
For example, if my tables are called:
xxxx_201501,
xxxx_201502,
xxxx_201503,
...
xxxx_201606
How can I select everything from 201501 until today (current_timestamp)?
It seems that it's necessary to have the tables per day, am I wrong?
I've also read that you can use regex but can't find the way.
With Standard SQL, you can use a WHERE clause on a _TABLE_SUFFIX pseudo column as described here:
Is there an equivalent of table wildcard functions in BigQuery with standard SQL?
In this particular case, it would be:
SELECT ... FROM `mydataset.xxxx_*` WHERE _TABLE_SUFFIX >= '201501';
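Note that _TABLE_SUFFIX is a string, so the >= is a plain string comparison. That works here because the suffixes are zero-padded YYYYMM values, which sort the same way as the dates they represent. The Python sketch below mimics which tables such a filter would pick; the helper name matching_tables is made up, and xxxx_201412 is an invented extra table to show one being excluded.

```python
def matching_tables(tables, prefix, min_suffix):
    """Return tables whose suffix after `prefix` is >= `min_suffix` as a string."""
    return [
        name
        for name in tables
        if name.startswith(prefix) and name[len(prefix):] >= min_suffix
    ]


tables = ["xxxx_201412", "xxxx_201501", "xxxx_201502", "xxxx_201606"]
selected = matching_tables(tables, "xxxx_", "201501")
print(selected)  # prints ['xxxx_201501', 'xxxx_201502', 'xxxx_201606']
```

So monthly tables are fine for this; you don't need one table per day.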
This is a bit long for a comment.
If you are using the standard SQL dialect, then I don't think the functionality is yet implemented.
If you are using the legacy SQL dialect, then you can use a function such as TABLE_DATE_RANGE(). This and other table wildcard functions are well documented.
EDIT:
Oh, I see. The simplest way would be to store the tables as YYYYMM01 so you can use the range query.
But, you can also use table_query():
from table_query(t, 'right(table_id, 6) >= ''201501'' ')