SQL: How to disable result of aggregate on empty table?

When applying SQL aggregate functions (COUNT, MAX, etc.) on an empty table, I would like to get an empty result set (no rows) to simplify processing in the ORM.
Currently, the special return values (0 for COUNT, NULL for all other aggregates) are returned (assuming an empty table user):
sqlite> SELECT COUNT(id) FROM user;
count(id)
0
I know there is the trick to use GROUP BY plus HAVING clause to filter empty results, but this is rather cumbersome and I am unsure about performance:
sqlite> SELECT COUNT(id) FROM user GROUP BY 1=1 HAVING COUNT(id) > 0;
sqlite>
Thus the questions:
Is it possible to prevent aggregate functions from returning a row if the source table is empty?
Is there a performance impact of using a GROUP BY clause that is true for all entries?

A SQL aggregation query with no group by returns one row. This is by definition. It is how SQL works. Usually, this is considered a good thing and actually makes applications work better.
For instance, it is easier to check a single count column than to check the count (if there are rows) and separately check for no rows (in other cases).
In SQLite, you can do what you want by adding a GROUP BY. So:
select . . . -- aggregation functions only
from . . .
group by null;
This is grouping by a constant, which is functionally equivalent to no GROUP BY unless there are no rows. In that case, this version returns an empty result set.
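The difference can be checked with a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the single-column layout of the user table are assumptions made for illustration; only the table name comes from the question.

```python
import sqlite3

# In-memory database with an empty table named `user` (column layout is assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY)")

# Plain aggregate: exactly one row, even though the table is empty.
plain = conn.execute("SELECT COUNT(id) FROM user").fetchall()

# Grouping by a constant: no input rows means no groups, hence no output rows.
grouped_empty = conn.execute(
    "SELECT COUNT(id) FROM user GROUP BY NULL"
).fetchall()

# Once rows exist, the grouped query behaves like the plain aggregate.
conn.execute("INSERT INTO user (id) VALUES (1), (2)")
grouped_filled = conn.execute(
    "SELECT COUNT(id) FROM user GROUP BY NULL"
).fetchall()

print(plain, grouped_empty, grouped_filled)
```

An ORM layer could then treat an empty list as "no data" without inspecting the aggregate value itself.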

Related

Which row will be evaluated when using HAVING clause to a candidate group?

I'm new to SQL. Today I applied GROUP BY and a HAVING clause to a table, like this:
CREATE TABLE tab(a, b);
INSERT INTO tab VALUES(0, 1);
INSERT INTO tab VALUES(-1, 1);
SELECT COUNT(b) FROM tab GROUP BY b HAVING a;
I got no output.
But when I changed the order of the two INSERT statements, I got the output 2.
So I got two distinct outputs just by changing the order of the two INSERT statements.
The candidate group may be discarded when the result of evaluating the HAVING clause is false, but which row does SQLite use to evaluate the HAVING clause?
I wonder if this behavior is specified in the SQL standard or in the SQLite documentation.
Your code would not even run in most databases because it contains the non-aggregate expression a in the HAVING clause.
But SQLite allows it.
From SELECT/Simple Select Processing/Generation of the set of result rows:
If a HAVING clause is specified, it is evaluated once for each group
of rows as a boolean expression. If the result of evaluating the
HAVING clause is false, the group is discarded. If the HAVING clause
is an aggregate expression, it is evaluated across all rows in the
group. If a HAVING clause is a non-aggregate expression, it is
evaluated with respect to an arbitrarily selected row from the
group...
What happens is that SQLite chooses an arbitrary row of the group (usually the first) to evaluate the HAVING clause.
This is why you get different results when you change the order of insertion of the rows.
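One way to make the result independent of insertion order is to put an aggregate expression in the HAVING clause, so that it is evaluated across all rows of the group. The sketch below uses Python's sqlite3 module; the condition MIN(a) <> 0 is only an assumed stand-in for whatever the query was meant to test, and the helper name count_with_having is hypothetical.

```python
import sqlite3

def count_with_having(rows):
    """Build the table from `rows` in the given order and run the grouped query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab(a, b)")
    conn.executemany("INSERT INTO tab VALUES (?, ?)", rows)
    # The HAVING clause is an aggregate expression, so it is evaluated over
    # the whole group instead of an arbitrarily selected row.
    result = conn.execute(
        "SELECT COUNT(b) FROM tab GROUP BY b HAVING MIN(a) <> 0"
    ).fetchall()
    conn.close()
    return result

first_order = count_with_having([(0, 1), (-1, 1)])
second_order = count_with_having([(-1, 1), (0, 1)])
print(first_order, second_order)
```

Both insertion orders give the same answer here because MIN(a) does not depend on which row happens to be visited first.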

Select of calculated value always returns row

I have a database (running on Postgres 9.3) of bookings of resources. This database contains a table reservations which contains, besides other values, the start and stop time of each reservation (as timestamp with time zone).
Now I need to know how much reservation time a given company currently has booked in the future, in terms of the total hours of all these reservations added together.
I have put together the following query that does the job:
SELECT EXTRACT(EPOCH FROM Sum(stop-start))/3600 AS total
FROM (reservations JOIN partners ON partner = email)
WHERE stop > now() AND company = 'givencompany'
This works quite well if the given company has reservations in the future. The problem I am experiencing is that when the company doesn't have any reservations, the query does in fact return a row, but the column total is empty, whereas I would like it to return no row at all in that case (or a row containing 0, if returning nothing is too complicated).
Is this possible to accomplish with a different SELECT or another modification to the database, or does the consuming application have to check for null every time?
Sorry if my question is trivial, but I am very new to databases altogether.
Edit
I found out that I could default the returned value to 0 by using COALESCE, but I would much prefer it if no row were returned.
Short answer: just add HAVING Sum(stop-start) IS NOT NULL at the end of the query.
Long answer:
This query has no explicit GROUP BY, but since it aggregates the rows with sum(), it's implicitly turned into a GROUP BY query, with all the rows matching the WHERE condition taken as one group.
See the doc on SELECT :
without GROUP BY, an aggregate produces a single value computed across
all the selected rows
And about the HAVING clause:
The presence of HAVING turns a query into a grouped query even if
there is no GROUP BY clause. This is the same as what happens when the
query contains aggregate functions but no GROUP BY clause. All the
selected rows are considered to form a single group, and the SELECT
list and HAVING clause can only reference table columns from within
aggregate functions. Such a query will emit a single row if the HAVING
condition is true, zero rows if it is not true.
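The one-row-or-zero-rows behaviour can be imitated in a small sketch with Python's sqlite3 module. Two things here are assumptions rather than the answer's method: the reservations table is reduced to hypothetical company and hours columns, and because some SQLite versions reject HAVING without GROUP BY, the sketch swaps in a different technique, namely filtering the aggregate through a derived table with WHERE total IS NOT NULL. The outcome is the same: a row when there is something to sum, no row otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: one row per reservation with its length in hours.
conn.execute("CREATE TABLE reservations (company TEXT, hours REAL)")
conn.execute("INSERT INTO reservations VALUES ('acme', 2.5), ('acme', 1.5)")

# The inner query always yields one row (NULL when nothing matched);
# the outer WHERE drops that row when the sum is NULL.
QUERY = """
SELECT total
FROM (SELECT SUM(hours) AS total FROM reservations WHERE company = ?)
WHERE total IS NOT NULL
"""

with_rows = conn.execute(QUERY, ("acme",)).fetchall()
without_rows = conn.execute(QUERY, ("other",)).fetchall()
print(with_rows, without_rows)
```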

SQL - Using MAX in a WHERE clause

Assume value is an int and the following query is valid:
SELECT blah
FROM table
WHERE attribute = value
Though MAX(expression) returns int, the following is not valid:
SELECT blah
FROM table
WHERE attribute = MAX(expression)
Of course the desired effect can be achieved using a subquery, but my question is why SQL was designed this way. Is there some reason why this sort of thing is not allowed? Students coming from programming languages where you can always replace a value of some data type by a function call that returns that type find this issue confusing. Is there an explanation one can give them rather than just saying "that's the way it is"?
It's just because of the order of operations of a query.
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
WHERE just filters the rows returned by FROM. An aggregate function like MAX() can't have a result returned because it hasn't even been applied to anything.
That is also why you can't use aliases defined in the SELECT clause in a WHERE clause, but you can use aliases defined in the FROM clause.
A where clause checks every row to see if it matches the conditions specified.
A max computes a single value from a row set. If you put a max, or any other aggregate function, into a where clause, how can SQL Server figure out which rows the max function can use before the where clause has finished its filtering?
This deals with the order that SQL Server processes commands in. It runs the WHERE clause before a GROUP BY or any aggregate. Since a where clause runs first, SQL Server can't tell if a row will be included in an aggregate until it processes the where. That is what the HAVING clause is for. HAVING runs after the GROUP BY and the WHERE and can include MAX since you have already filtered out the rows you don't want to use. See http://www.bennadel.com/blog/70-SQL-Query-Order-of-Operations.htm for a good explanation of the order in which SQL commands run.
Maybe this works:
SELECT blah
FROM table
WHERE attribute = (SELECT MAX(expression) FROM table)
The WHERE clause is specifically designed to test conditions against raw data (individual rows of the table). However, MAX is an aggregate function over multiple rows of data. Basically, without a sub-select, the WHERE clause knows nothing about any rows in the table except for the current row. So how can you determine the maximum value over a whole bunch of rows when you don't even know what those rows are?
Yes, it's a little bit of a simplification, especially when dealing with joins, but the same principle applies. WHERE is always row-by-row, so that's all it really knows about.
Even if you have a GROUP BY clause, the WHERE clause still only processes one row at a time in the raw data before grouping. It doesn't know the value of a column in any other rows, so it has no way of knowing which row has the maximum value.
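The row-by-row nature of WHERE can be seen in a short sketch using Python's sqlite3 module. The table name items and its sample rows are hypothetical; blah and attribute mirror the placeholder names in the question. SQLite is used only because it ships with Python; other engines report the misuse with their own error messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (blah TEXT, attribute INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [("a", 1), ("b", 3), ("c", 3)]
)

# An aggregate directly in WHERE is rejected when the statement is compiled.
try:
    conn.execute("SELECT blah FROM items WHERE attribute = MAX(attribute)")
    rejected = False
except sqlite3.Error:
    rejected = True

# The scalar subquery is evaluated as its own query over the whole table,
# so the outer WHERE only compares each row against a single value.
matches = conn.execute(
    "SELECT blah FROM items "
    "WHERE attribute = (SELECT MAX(attribute) FROM items) "
    "ORDER BY blah"
).fetchall()
print(rejected, matches)
```

Note that the subquery form returns every row tied for the maximum, which a TOP 1 or LIMIT 1 query would not.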
Assuming this is MS SQL Server, the following would work.
SELECT TOP 1 blah
FROM table
ORDER BY expression DESC

How to get other columns in this query

I am using a GROUP BY clause in my query. I want to get other columns that are not specified in the GROUP BY parameters.
SELECT un.user, un.role
FROM [Unique] un
group by user, role
In the query above, [Unique] has 7 columns altogether. How do I get the other columns?
In most databases (MySQL and SQLite are the exceptions I know of), you cannot include a column in a GROUP BY SELECT unless:
The column is included in the GROUP BY clause.
The column is aggregated in one of the supported aggregate functions.
In MySQL and SQLite, the rows inside the aggregate groups from which the extra values get taken are undefined.
If you want extra columns in any other engine, you can wrap the column names in MAX():
SELECT un.user, un.role, MAX(un.city), MAX(un.bday)
FROM [Unique] un
GROUP BY user, role
In this case, the values for the extra columns are likely to come from different rows in the input record set. If this is important (sometimes it isn't since the extra columns come from the one side of a one-to-many JOIN), you can't use this technique.
Just to be clear: If you use GROUP BY in a SELECT, then each row you get back is constructed out of groups of multiple rows in the table you're SELECTing against. If you include columns that are not part of the GROUP BY clause, you're not giving the engine any instructions on which row from the table you want that value read from. Most engines, therefore, do not allow you to run this kind of SQL. MySQL does, with undefined results, but I personally consider it bad practice to do this.
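The caveat about MAX() pulling values from different rows can be made concrete with a small sketch in Python's sqlite3 module. The table name uniq (UNIQUE is a keyword, so the bracketed name is avoided), the city and bday columns, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uniq (user TEXT, role TEXT, city TEXT, bday TEXT)")
source_rows = [
    ("ann", "admin", "Zurich", "1990-01-01"),
    ("ann", "admin", "Berlin", "1995-05-05"),
]
conn.executemany("INSERT INTO uniq VALUES (?, ?, ?, ?)", source_rows)

# Each MAX() is computed independently over the group, so the city and the
# birthday in the output row can originate from different input rows.
merged = conn.execute(
    "SELECT user, role, MAX(city), MAX(bday) FROM uniq GROUP BY user, role"
).fetchall()
print(merged)
```

The single output row pairs Zurich with 1995-05-05, a combination that exists in neither source row, which is why the technique only suits cases where the extra columns are constant within each group.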
You have to choose on what basis you want the other columns. If multiple entries exist for the same user / role, do you want the first / last / random? You have to make choices on the other columns, by aggregating them or choosing to include them in the group by statement.
Some RDBMS do provide a default behaviour for performing this, but since the question is just marked SQL, we do not know if it applies.
Have you tried just specifying them?
SELECT un.user, un.role, un.col3, un.col4
FROM [Unique] un
group by user, role
You need to use an ORDER BY to get the extra columns, or you end up specifying every column in your GROUP BY.
Use LEFT JOIN to self-join the Unique or use the SELECT with GROUP BY as sub-query.

Any reason for GROUP BY clause without aggregation function?

I'm (thoroughly) learning SQL at the moment and came across the GROUP BY clause.
GROUP BY aggregates or groups the resultset according to the argument(s) you give it. If you use this clause in a query you can then perform aggregate functions on the resultset to find statistical information on the resultset like finding averages (AVG()) or frequency (COUNT()).
My question is: is the GROUP BY statement in any way useful without an accompanying aggregate function?
Update
Using GROUP BY as a synonym for DISTINCT is (probably) a bad idea because I suspect it is slower.
is the GROUP BY statement in any way useful without an accompanying aggregate function?
Using DISTINCT would be a synonym in such a situation, but the reason you'd want/have to define a GROUP BY clause would be in order to be able to define HAVING clause details.
If you need to define a HAVING clause, you have to define a GROUP BY - you can't do it in conjunction with DISTINCT.
You can perform a DISTINCT select by using a GROUP BY without any AGGREGATES.
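Both points, GROUP BY acting as DISTINCT and GROUP BY enabling a HAVING filter, can be tried in a minimal sketch with Python's sqlite3 module. The products table and its rows are hypothetical sample data; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, last_purchased TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("apple", "2020-01-01"), ("apple", "2021-06-01"), ("pear", "2019-03-03")],
)

# DISTINCT and an aggregate-free GROUP BY both collapse duplicate names.
distinct_names = conn.execute(
    "SELECT DISTINCT product_name FROM products ORDER BY product_name"
).fetchall()
grouped_names = conn.execute(
    "SELECT product_name FROM products GROUP BY product_name ORDER BY product_name"
).fetchall()

# With GROUP BY, groups can additionally be filtered through HAVING,
# even though the SELECT list contains no aggregate.
repeated_names = conn.execute(
    "SELECT product_name FROM products GROUP BY product_name HAVING COUNT(*) > 1"
).fetchall()
print(distinct_names, grouped_names, repeated_names)
```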
GROUP BY can be used in two major ways:
1) in conjunction with SQL aggregation functions
2) to eliminate duplicate rows from a result set
So the answer to your question lies in the second of the uses described above.
Note: everything below only applies to MySQL
GROUP BY is guaranteed to return results in order (in MySQL versions before 8.0, which removed implicit GROUP BY sorting), DISTINCT is not.
GROUP BY along with ORDER BY NULL is of the same efficiency as DISTINCT (and implemented in the same way). If there is an index on the field being aggregated (or distinctified), both clauses use a loose index scan over this field.
In GROUP BY, you can return non-grouped and non-aggregated expressions. MySQL will pick arbitrary values from the corresponding group to calculate the expression.
With GROUP BY, you can omit the GROUP BY expressions from the SELECT clause. With DISTINCT, you can't. Every row returned by a DISTINCT is guaranteed to be unique.
It is used for more than just aggregate functions.
For example, consider the following code:
SELECT product_name, MAX(last_purchased) FROM products GROUP BY product_name
This will return only one result per product, with the latest last_purchased value among that product's records.
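A quick way to check the one-row-per-product behaviour is the following sketch with Python's sqlite3 module. The sample rows are hypothetical. It also shows, for SQLite, what happens when the column name is wrapped in single quotes: the argument becomes a string literal rather than a column reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, last_purchased TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("apple", "2020-01-01"), ("apple", "2021-06-01"), ("pear", "2019-03-03")],
)

# Unquoted column name: MAX() picks the latest date within each product group.
latest = conn.execute(
    "SELECT product_name, MAX(last_purchased) FROM products "
    "GROUP BY product_name ORDER BY product_name"
).fetchall()

# Single-quoted name: MAX() is applied to the constant string 'last_purchased'.
literal = conn.execute(
    "SELECT product_name, MAX('last_purchased') FROM products "
    "GROUP BY product_name ORDER BY product_name"
).fetchall()
print(latest, literal)
```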