BigQuery calculate grand totals row - google-bigquery

Is there any way to add a row with grand totals under my result set, like the MS SQL ROLLUP operator? I want to avoid using an extra query for these totals.

Edited: BigQuery now supports the ROLLUP operator. See the query reference docs for more information.

Update: Since this question was posted, BigQuery has implemented support for the GROUP BY ROLLUP clause, compatible with the SQL standard. It is documented here: https://cloud.google.com/bigquery/query-reference#groupby
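To illustrate what the grand-total row looks like, here is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in. SQLite has no ROLLUP, so the total row is emulated with UNION ALL; the BigQuery form is only shown in a comment. The sales table, its columns, and the mydataset name are made up for this example.

```python
import sqlite3

# BigQuery form (not run here; dataset, table and column names are hypothetical):
#   SELECT category, SUM(amount) AS total
#   FROM mydataset.sales
#   GROUP BY ROLLUP(category)
# ROLLUP adds one extra row whose category is NULL and whose total is the grand total.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("books", 10), ("books", 5), ("toys", 7)],
)

# SQLite lacks ROLLUP, so emulate it: per-category totals plus one NULL-keyed total row.
rows = conn.execute(
    """
    SELECT category, SUM(amount) AS total FROM sales GROUP BY category
    UNION ALL
    SELECT NULL, SUM(amount) FROM sales
    """
).fetchall()

# Split the result into the ordinary group rows and the grand-total row.
per_category = sorted(r for r in rows if r[0] is not None)
grand_total = [r for r in rows if r[0] is None]
```

With ROLLUP the database produces both kinds of rows in a single query, which is what the question asks for; the UNION ALL above is only there because the stand-in engine does not support it.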

Related

CURRENT in BigQuery?

I've noticed that CURRENT is a reserved keyword for BigQuery at: https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical.
What exactly does CURRENT do? I've only seen it as a prefix for things such as CURRENT_TIME(), CURRENT_DATE(), and other such stuff but have never seen it by itself. Is this just reserved for future usage or do any SQL statements contain that as a keyword?
Just to add to the comment by @Jaytiger:
The CURRENT keyword seems to be reserved as part of the SQL:2016 spec (en.wikipedia.org/wiki/SQL_reserved_words). You can check its usage in other DBMS implementations such as Oracle and SQL Server. Hope this is helpful.
stackoverflow.com/questions/49110728/where-current-of-in-pl-sql
In BigQuery, the CURRENT keyword is used when defining frame_start and frame_end in window functions, as in CURRENT ROW.
A window function, also known as an analytic function, computes values
over a group of rows and returns a single result for each row.
A common usage for this is calculating a cumulative sum for each category in the table. See BigQuery window function examples for reference.
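A cumulative sum per category, with CURRENT ROW as the end of the frame, could look roughly like the sketch below. It runs against SQLite (3.25 or newer) through Python's sqlite3 module purely so it can be executed locally; the frame clause itself is standard SQL, and the table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("books", 1, 10),
        ("books", 2, 5),
        ("books", 3, 2),
        ("toys", 1, 7),
        ("toys", 2, 1),
    ],
)

# The frame runs from the first row of the partition (frame_start)
# up to the row being evaluated (frame_end = CURRENT ROW).
running = conn.execute(
    """
    SELECT category, day, amount,
           SUM(amount) OVER (
               PARTITION BY category
               ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY category, day
    """
).fetchall()
```

Each input row keeps its own output row; only the running_total column changes as the frame grows within a category.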

How to write Window functions using Druid?

For example, I want to write window functions like SUM(...) OVER (window).
Since the OVER clause is not supported by Druid, how do I achieve the same using the Druid native query API or SQL API?
You should use a GroupBy Query. As Druid is a time series database, you have to specify your interval (window) where you want to query data from. You can use aggregation methods over this data, for example a SUM() aggregation.
If you want, you can also do extra filtering within your aggregation, like "only sum records where city=paris". You could also apply the SUM aggregation only to records that exist in a certain time window within your selected interval.
If you are a PHP user then maybe this package is handy for you: https://github.com/level23/druid-client#sum
We have tried to implement an easy way to query such data.
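For those not using PHP, the group-by approach described above can be sketched as a native query body built in Python. This is only an illustration under assumptions: the datasource name (orders), the dimensions (country, city) and the metric (amount) are hypothetical, and the intervals are arbitrary.

```python
import json

# Hypothetical datasource, dimension and metric names; adjust to your schema.
query = {
    "queryType": "groupBy",
    "dataSource": "orders",
    "granularity": "all",
    # The interval plays the role of the window: only rows inside it are aggregated.
    "intervals": ["2020-01-01T00:00:00Z/2020-02-01T00:00:00Z"],
    "dimensions": ["country"],
    "aggregations": [
        # Plain sum over everything in the interval.
        {"type": "longSum", "name": "total_amount", "fieldName": "amount"},
        # Sum only the records where city = paris.
        {
            "type": "filtered",
            "filter": {"type": "selector", "dimension": "city", "value": "paris"},
            "aggregator": {"type": "longSum", "name": "paris_amount", "fieldName": "amount"},
        },
        # Sum only the records in a narrower time window inside the interval.
        {
            "type": "filtered",
            "filter": {
                "type": "interval",
                "dimension": "__time",
                "intervals": ["2020-01-01T00:00:00Z/2020-01-08T00:00:00Z"],
            },
            "aggregator": {"type": "longSum", "name": "first_week_amount", "fieldName": "amount"},
        },
    ],
}

# JSON body that would be sent to Druid as a native query.
body = json.dumps(query, indent=2)
```

This does not reproduce a per-row running total like SUM() OVER; it returns one aggregated row per group, so anything cumulative would still have to be computed from those rows on the client side.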

TABLE_DATE_RANGE for xxxx_yyyymm format tables

I'm having a problem trying to query for 15 months' worth of data.
I know about BigQuery's wildcard functions, but I can't seem to get them to work with my tables.
For example, if my tables are called:
xxxx_201501,
xxxx_201502,
xxxx_201503,
...
xxxx_201606
How can I select everything from 201501 until today (current_timestamp)?
It seems that it's necessary to have the tables per day, am I wrong?
I've also read that you can use regex but can't find the way.
With Standard SQL, you can use a WHERE clause on a _TABLE_SUFFIX pseudo column as described here:
Is there an equivalent of table wildcard functions in BigQuery with standard SQL?
In this particular case, it would be:
SELECT ... FROM `mydataset.xxxx_*` WHERE _TABLE_SUFFIX >= '201501';
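The suffix filter works on monthly tables because zero-padded YYYYMM strings sort in chronological order, so a plain string comparison is enough. A small Python sketch of that comparison, with a hypothetical helper name (matching) and table list:

```python
# Table names as in the question, plus one older table that should be excluded.
tables = ["xxxx_201412", "xxxx_201501", "xxxx_201502", "xxxx_201512", "xxxx_201606"]
prefix = "xxxx_"


def matching(names, start, end=None):
    """Return the tables whose suffix falls in [start, end], compared as strings."""
    selected = []
    for name in names:
        if not name.startswith(prefix):
            continue
        suffix = name[len(prefix):]  # what _TABLE_SUFFIX would hold for this table
        if suffix >= start and (end is None or suffix <= end):
            selected.append(name)
    return selected


# Equivalent in spirit to:  WHERE _TABLE_SUFFIX >= '201501'
from_2015 = matching(tables, "201501")

# Equivalent in spirit to:  WHERE _TABLE_SUFFIX BETWEEN '201501' AND '201512'
only_2015 = matching(tables, "201501", "201512")
```

To stop at the current month rather than a fixed one, the upper bound in the query could be computed with something like FORMAT_DATE('%Y%m', CURRENT_DATE()).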
This is a bit long for a comment.
If you are using the standard SQL dialect, then I don't think the functionality is yet implemented.
If you are using the legacy SQL dialect, then you can use a function such as TABLE_DATE_RANGE(). This and other table wildcard functions are well documented.
EDIT:
Oh, I see. The simplest way would be to store the tables as YYYYMM01 so you can use the range query.
But, you can also use table_query():
from table_query(t, 'right(table_id, 6) >= ''201501'' ')

Grouping by month in database and not with ruby

I'm trying to group calls by month, but I need to do it in the database and not with Ruby. Here is the current code:
Call.limit(1000).group_by { |t| t.created_at.month }
Which returns:
SELECT `calls`.* FROM `calls` ORDER BY created_at desc LIMIT 1000
Then Ruby does the grouping. What should I do to make the database do the work?
Thank you.
The short answer is that you cannot achieve the same result at the SQL level.
Here's the full explanation.
First of all, what should the result of that call be? You can use the SQL GROUP BY clause, but it's likely the result is not what you expect.
The GROUP BY syntax is designed to group rows by a pattern and compute an aggregate function. In your case, even assuming you create a query that uses date_trunc to group by a part of the timestamp, the aggregate function does not let you return a dataset structured like the one the Ruby group_by method returns.
Why do you want to compute such grouping at database level?
If you have specific requirements or computation limits, then work on a custom method.
Use Call.limit(1000).group("month(created_at)")
Please check out the MySQL date and time functions appropriate for your case; .group() will do the grouping in MySQL.
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_month
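To make the difference between the two answers concrete: grouping in the database returns one aggregated row per month, not a hash of full records as Ruby's group_by does. A minimal sketch, using Python's sqlite3 module as a local stand-in (SQLite has no MONTH(), so strftime is used instead; the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO calls (created_at) VALUES (?)",
    [
        ("2016-01-05 10:00:00",),
        ("2016-01-20 09:30:00",),
        ("2016-02-11 14:15:00",),
    ],
)

# MySQL would write MONTH(created_at); SQLite's closest equivalent is strftime('%m', ...).
counts = conn.execute(
    """
    SELECT CAST(strftime('%m', created_at) AS INTEGER) AS month, COUNT(*) AS n
    FROM calls
    GROUP BY month
    ORDER BY month
    """
).fetchall()
```

If a per-month count (or sum, average, and so on) is what you need, the database can do all of it. If you need the individual call records bucketed by month, you still have to fetch the rows and group them in application code.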

SQL: Ordering by how much greater than something is?

I have two datetime fields here: actual_delivery and scheduled_delivery
What I want to ORDER BY is how much greater actual_delivery is than scheduled_delivery.
I'm using MySQL locally and PostgreSQL in production, so it needs to work for both.
If I were doing it in SQL Server I'd calculate DATEDIFF(actual_delivery, scheduled_delivery) AS [DeliveryDifference] then order by that computed column.
A quick search indicates there's a DATEDIFF function in MySQL, but the syntax may be slightly different in PostgreSQL, so you may have to create your own function there.
Try this:
SELECT actual_delivery, scheduled_delivery, actual_delivery - scheduled_delivery as difference FROM tablename ORDER BY difference
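The idea of ordering by a computed difference can be tried locally with the sketch below. It uses Python's sqlite3 module as a stand-in, where julianday() turns each datetime into a number of days; that function is SQLite-specific, and the deliveries table and its rows are invented for the example. In PostgreSQL, subtracting two timestamps yields an interval, and in MySQL TIMESTAMPDIFF(SECOND, scheduled_delivery, actual_delivery) gives the difference in seconds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deliveries (id INTEGER, scheduled_delivery TEXT, actual_delivery TEXT)"
)
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?, ?)",
    [
        (1, "2020-01-01 10:00:00", "2020-01-01 10:30:00"),  # 30 minutes late
        (2, "2020-01-01 10:00:00", "2020-01-01 09:45:00"),  # 15 minutes early
        (3, "2020-01-01 10:00:00", "2020-01-02 10:00:00"),  # one day late
    ],
)

# Order by how far actual_delivery is past scheduled_delivery, latest first.
ordered_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id,
               julianday(actual_delivery) - julianday(scheduled_delivery) AS difference
        FROM deliveries
        ORDER BY difference DESC
        """
    )
]
```

Early deliveries get a negative difference and therefore sort last when ordering descending.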