BigQuery and GROUP BY clause - google-bigquery

I'm trying to figure out how Google BigQuery works in respect to aggregation and grouping. I read the documentation and it says for GROUP BY this:
The GROUP BY clause allows you to group rows that have the same values
for a given field. You can then perform aggregate functions on each of
the groups. Grouping occurs after any selection or aggregation in the
SELECT clause.
So it says that after grouping I can perform aggregate functions (I assume that's functions like COUNT). But than the sentence later it says that grouping occurs after any selection or aggregation in the SELECT clause.
So if I have
SELECT f1, COUNT(f2)
FROM ds.Table
GROUP BY f1;
Which happens first, grouping or counting?

You will have the group and then the count. In your case you would get a single line for each f1 and then the count.
However, if you want to do something interesting, you could use window functions in which first you can group by some fields, and then you can execute functions against the resulting rows, which is quite handy.
Take a look at the window functions section of the bigquery online docs for a few examples on this.

Related

Is there possible to use SQL Lag function itself?

I need to find solution for my SQL report issue.
My issue is related to find appropriate way about using lag function for the column that was came from lag function.
Here it is picture to clarify my problem
That looks like a simple running total. You don't need lag() for that. When using an aggregate like sum() together with an order by in the window definition, it will give you just that.
But you you have to specify an order by, because rows in a table don't have any implied ordering.
select column1,
sum(column1) over (order by ???) as column1_calculation
from the_table
order by ???
You need to replace the ??? in the above statement with the column that defines the sort order of your rows. Very often a date/time column is used for that to get the running total over time.
Online example

exclude a column from group by statement

I would like to exclude a column from group by statement, because it results in some redundant records. Are there any recommendations?
I use Oracle, and have a complex query which join 6 tables together, and want to use sql aggregate function (count), without duplicate result.
You can't.
When using aggregate functions every column/column expression which is not an aggregate must be in the GROUP BY.
This is completely logical. If you're not aggregating the column then excluding it from the GROUP BY would force Oracle to chose a random value, which is not very useful.
If you don't want this column in your GROUP BY then you must decide what aggregation to apply to this column in order to return the appropriate data for your situation. You can't hand this responsibility off to the database engine.

Sql error in simple query? Any ideas?

I'm getting the error:
Column 'A10000012VICKERS.dbo.IMAGES.idimage' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Any ideas why I would be getting this error or how to fix it? I thought that I was just asking for the size of a number of filestream columns and the values of two others?
SELECT
idimage,
filetype,
SUM(DATALENGTH(filestreamimageoriginal)) AS original,
SUM(DATALENGTH(filestreamimagefull)) AS [full],
SUM(DATALENGTH(filestreamimageextra)) AS extra,
SUM(DATALENGTH(filestreamimagelarge)) AS large,
SUM(DATALENGTH(filestreamimagemedium)) AS medium,
SUM(DATALENGTH(filestreamimagesmall)) AS small, SUM(DATALENGTH(filestreamimagethumbnail)) AS thumbnail
FROM A10000012VICKERS.dbo.IMAGES WHERE display = 1
I don't really see how that query could generate that message. There is no column with that name. However, the query does have an obvious error.
Your query is an aggregation query because it uses SUM() in the SELECT clause. However, this will return only one row, unless you also have a GROUP BY.
Add this to the end of your query:
GROUP BY idimage, filetype
Or, remove these columns from the SELECT.
By using an aggregation function (SUM) you are aggregating your records. As you have specified no GROUP BY clause you will get one result row, i.e. an aggregation over all rows. In this aggregation, however, there is no longer one idimage or one filetype that you could show in your results.
So either use an aggregation function on these, too (e.g. max(idimage), min(filetype)) or remove them from the query, if you really want one aggregate over all these rows.
If, however, you want to aggregate per idimage and filetype, then add GROUP BY idimage, filetype at the end of your query.

Why can't I perform an aggregate function on an expression containing an aggregate but I can do so by creating a new select statement around it?

Why is it that in SQL Server I can't do this:
select sum(count(id)) as 'count'
from table
But I can do
select sum(x.count)
from
(
select count(id) as 'count'
from table
) x
Are they not essentially the same thing? How am I meant to be thinking about this in order to understand why the first block of code isn't allowed?
SUM() in your example is a no-op - SUM() of a COUNT() means the same as just COUNT(). So neither of your example queries appear to do anything useful.
It seems to me that nesting aggregates would only make sense if you wanted to apply two different aggregations - meaning GROUP BY on different sets of columns. To specify two different aggregations you would need to use the GROUPING SETS feature or SUM() OVER feature. Maybe if you explain what you want to achieve someone could show you how.
The gist of the issue is that there is no such concept as aggregate of an aggregate applied to a relation, see Aggregation. Having such a concept would leave too many holes in the definition and makes the GROUP BY clause impossible to express: it needs to define both the inner aggregate GROUP BY clause and the outer aggregate as well! This applies also to the other aggregate attributes, like the HAVING clause.
However, the result of an aggregate applied to a relation is another relation, and this result relation in turn can support a new aggregate operator. This explains why you can aggregate the result into an outer SELECT. This leaves no ambiguity in the definition, each SELECT has its own distinct GROUP BY/HAVING clauses.
In simple terms, aggregation functions operate over a column and generate a scalar value, hence they cannot be applied over their result. When you create a select statement over a scalar value you transform it into an artificial column, that's why it can be used by an aggregation function again.
Please note that most of the times there's no point in applying an aggregation function over the result of another aggregation function: in your sample sum(count(id)) == count(id).
i would like to know what your expected result in this sql
select sum(count(id)) as 'count'
from table
when you use the count function, only 1 result(total count) will be return. So, may i ask why you want to sum the only 1 result.
You will surely got the error because an aggregate function cannot perform on an expression containing an aggregate or a subquery.
It's working for me using SQLFiddle, not sure why it would't work for you. But I do have an explanation as to why it might not be working for you and why the alternative would work...
Your example is using a keyword as a column name, that may not always work. But when the column is only in a sub expression, the query engine is free to discard the name (in fact it probaly does) so the fact that it potentially potentially conflicts with a key word may be disregarded.
EDIT: in response to your edit/comment. No, the two aren't equivalent. The RESULT would be equivalent, but the process of getting to that result is not at all similar. For the first to work, the parser has do some work that simply doesn't make sense for it to do (applying an aggregate to a single value, either on a row by row basis or as), in the second case, an aggregate is applied to a table. The fact that the table is a temporary virtual table will be unimportant to the aggregate function.
I think you can write the sql query, which produces 'count' of rows for the required output. Functions do not take aggregated functions like 'sum' or aggregated subquery. My problem was resolved by using a simple sql query to get the count out....
Microsoft SQL Server doesn’t support it.
You can get around this problem by using a Derived table:
select sum(x.count)
from
(
select count(id) as 'count'
from table
) x
On the other hand using the below code will give you an error message.
select sum(count(id)) as 'count'
from table
Cannot perform an aggregate function on an expression containing an
aggregate or a subquery

Any reason for GROUP BY clause without aggregation function?

I'm (thoroughly) learning SQL at the moment and came across the GROUP BYclause.
GROUP BY aggregates or groups the resultset according to the argument(s) you give it. If you use this clause in a query you can then perform aggregate functions on the resultset to find statistical information on the resultset like finding averages (AVG()) or frequency (COUNT()).
My question is: is the GROUP BY statement in any way useful without an accompanying aggregate function?
Update
Using GROUP BY as a synonym for DISTINCT is (probably) a bad idea because I suspect it is slower.
is the GROUP BY statement in any way useful without an accompanying aggregate function?
Using DISTINCT would be a synonym in such a situation, but the reason you'd want/have to define a GROUP BY clause would be in order to be able to define HAVING clause details.
If you need to define a HAVING clause, you have to define a GROUP BY - you can't do it in conjunction with DISTINCT.
You can perform a DISTINCT select by using a GROUP BY without any AGGREGATES.
Group by can used in Two way Majorly
1)in conjunction with SQL aggregation functions
2)to eliminate duplicate rows from a result set
SO answer to your question lies in second part of USEs above described.
Note: everything below only applies to MySQL
GROUP BY is guaranteed to return results in order, DISTINCT is not.
GROUP BY along with ORDER BY NULL is of same efficiency as DISTINCT (and implemented in the say way). If there is an index on the field being aggregated (or distinctified), both clauses use loose index scan over this field.
In GROUP BY, you can return non-grouped and non-aggregated expressions. MySQL will pick any random values from from the corresponding group to calculate the expression.
With GROUP BY, you can omit the GROUP BY expressions from the SELECT clause. With DISTINCT, you can't. Every row returned by a DISTINCT is guaranteed to be unique.
It is used for more then just aggregating functions.
For example, consider the following code:
SELECT product_name, MAX('last_purchased') FROM products GROUP BY product_name
This will return only 1 result per product, but with the latest updated value of that records.