My professor is teaching SQL 2 and used a statement like the one below in a query:
HAVING SUM(column) > subselect
Where subselect is something like SELECT AVG(column) FROM ...
This subselect returns only one value, but I could not understand how it is possible to compare a function (the sum) with a subselect. The subselect should return a table, right? Then how is it possible to compare a table with a value? That did not make sense to me.
Thanks in advance.
SQL has a concept of scalar subqueries. These are subqueries that return exactly one column and at most one row. Scalar subqueries can be used in almost all cases where a single value ("scalar") can be used.
If the scalar subquery returns no rows, then the value is treated as NULL.
(I should add that some databases support tuples. A tuple is a set of scalar values that is treated as a single value. In such databases "scalar" subqueries can return more than one value, but these are converted to a tuple. This is not relevant to the question being asked; tuples are just another example of a "single" value.)
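As a minimal sketch of the scalar-subquery idea, the snippet below runs the HAVING comparison from the question against an in-memory SQLite database using Python's standard sqlite3 module. The sales table, its columns, and the data are hypothetical, and other databases may differ in details.

```python
import sqlite3

# Hypothetical table and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 10), ('north', 50),
        ('south', 5),  ('south', 10),
        ('east', 40);
""")

# The parenthesised subquery returns one row and one column,
# so it is used as a single value (the overall average, 23.0).
big_regions = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > (SELECT AVG(amount) FROM sales)
    ORDER BY region
""").fetchall()
print(big_regions)

# A scalar subquery that matches no rows evaluates to NULL (None in Python).
empty_scalar = conn.execute(
    "SELECT (SELECT amount FROM sales WHERE region = 'west')"
).fetchone()[0]
print(empty_scalar)
```

Only the groups whose sum exceeds the single averaged value survive the HAVING filter.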
In principle you are right, if you look at it in a relational way.
But SQL is an industry standard and allows, as a shorthand, comparing a scalar value with a result table that has only a single row and column.
Depending on the exact implementation, a scalar may even be compared with a list of values (a column with more than one value) without the ALL keyword, although you should really write ... value > ALL (subselect). Many databases instead raise an error when a subquery used as a scalar returns more than one row.
This syntax is valid in both the WHERE and the HAVING clause.
Related
I have a where clause that compares two columns to the following string. Does this concatenation run for every row? Should this string (and concatenation) be left in twice or should I create a variable to hold the result and use that in the WHERE clause?
CONCAT('%', #myVar, 'dr')
I checked the execution plan for your expression against a test table in my database. The table has two nvarchar(50) columns, firstname and fullname.
Even with three AND conditions that I deliberately put in the WHERE clause, the SQL Server engine shows 0% cost for both Compute Scalar steps. So whether you create a separate variable for the concatenation or leave it inline in your WHERE clause, it is not going to make any difference.
Create a variable to hold the result and use that in the WHERE clause
I had a discussion with a teammate on the topic whether the terms clause and expression can be used interchangeably. For example, is it correct/common to call a variable that stands for an expression a=b (e.g. that participates in a statement SELECT * WHERE expression) a clause?
Edit
It would be useful if someone could give precise definitions of what clause, expression and statement are in the SQL world.
In SQL terms, "clause" is usually used to refer to a section of a statement, usually introduced by the keyword it's named after - e.g. a typical SELECT statement would be composed of a SELECT clause, a FROM clause and a WHERE clause. Within the FROM clause, some people may refer to JOIN clauses and ON clauses. However, this is by no means 100% accepted usage.
When it comes to "statement" and "expression", it's fairly standard usage - an expression is something that produces a value. In most languages, this is understood, further, to be something that produces a scalar value. In SQL, this is slightly modified because when you encounter an expression when working with a row set, the expression will produce one scalar value per row (or per group or partition, if grouping or partitioning are involved and it's in the relevant location).
Finally, a statement is a complete "something" that your database engine can understand and produce results for. It doesn't produce a value but it may produce a result set. You can't just send a FROM clause to the database - it has to be part of a larger statement, such as the SELECT statement I mentioned in my first paragraph.
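To make the "one scalar value per row (or per group)" point concrete, here is a small sketch using Python's sqlite3 module. The table t and its columns are made up for the example.

```python
import sqlite3

# Hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (grp TEXT, a INTEGER, b INTEGER);
    INSERT INTO t VALUES ('x', 1, 2), ('x', 3, 4), ('y', 5, 6);
""")

# A complete statement made of SELECT, FROM and ORDER BY clauses.
# The expression a + b produces one scalar value per row.
per_row = conn.execute("SELECT a + b FROM t ORDER BY a").fetchall()
print(per_row)

# With a GROUP BY clause, the expression SUM(a + b) produces
# one scalar value per group instead.
per_group = conn.execute(
    "SELECT grp, SUM(a + b) FROM t GROUP BY grp ORDER BY grp"
).fetchall()
print(per_group)
```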
The answer is no. An expression evaluates to something, which may be a boolean value, a string, or a number, whereas a clause forms a rule that the data must satisfy before a record becomes part of the result.
select * from TABLE where /*clause 1*/ field1 = field2
and /*clause 2*/field3 = /*expression*/ field1 + field2
In the above select statement:
The first clause forms a rule that field1 should be equal to field2.
The second clause forms a rule that field3 should be equal to the result of the expression field1 + field2.
UPDATE
There are various clauses in SQL, such as FROM, WHERE, ORDER BY, GROUP BY and HAVING. The FROM clause tells which table to read from, and ORDER BY tells how to arrange the result. Clauses control where the data is read from, which data becomes part of the result of the select statement, and how the data is presented.
An expression, on the other hand, evaluates to a value of some datatype.
A statement is a structured query built from clauses.
Assume value is an int and the following query is valid:
SELECT blah
FROM table
WHERE attribute = value
Though MAX(expression) returns int, the following is not valid:
SELECT blah
FROM table
WHERE attribute = MAX(expression)
Of course the desired effect can be achieved using a subquery, but my question is why SQL was designed this way. Is there some reason why this sort of thing is not allowed? Students coming from programming languages where you can always replace a value of some data type with a function call that returns that type find this issue confusing. Is there an explanation one can give them rather than just saying "that's the way it is"?
It's just because of the order of operations of a query.
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
WHERE just filters the rows returned by FROM. An aggregate function like MAX() can't have a result returned because it hasn't even been applied to anything.
That's also the reason why you can't use aliases defined in the SELECT clause in a WHERE clause, but you can use aliases defined in the FROM clause.
A where clause checks every row to see if it matches the conditions specified.
A MAX computes a single value from a row set. If you put a MAX, or any other aggregate function, into a WHERE clause, how can SQL Server figure out which rows the MAX function can use before the WHERE clause has finished its filtering?
This deals with the order that SQL Server processes commands in. It runs the WHERE clause before a GROUP BY or any aggregate. Since a where clause runs first, SQL Server can't tell if a row will be included in an aggregate until it processes the where. That is what the HAVING clause is for. HAVING runs after the GROUP BY and the WHERE and can include MAX since you have already filtered out the rows you don't want to use. See http://www.bennadel.com/blog/70-SQL-Query-Order-of-Operations.htm for a good explanation of the order in which SQL commands run.
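The sketch below illustrates that ordering with Python's sqlite3 module rather than SQL Server, so the exact error text will differ. The orders table and its data are hypothetical. The aggregate in WHERE is rejected, while the same aggregate in HAVING works because the groups already exist by then.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('ann', 10), ('ann', 30), ('bob', 5), ('bob', 7);
""")

# WHERE filters individual rows before any grouping happens,
# so an aggregate inside it is rejected by the engine.
try:
    conn.execute("SELECT customer FROM orders WHERE amount = MAX(amount)")
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True
print(where_rejected)

# WHERE drops rows first, GROUP BY forms groups, then HAVING
# filters the groups using the aggregate.
having_rows = conn.execute("""
    SELECT customer, MAX(amount)
    FROM orders
    WHERE amount > 5
    GROUP BY customer
    HAVING MAX(amount) > 20
""").fetchall()
print(having_rows)
```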
Maybe this works:
SELECT blah
FROM table
WHERE attribute = (SELECT MAX(expression) FROM table1)
The WHERE clause is specifically designed to test conditions against raw data (individual rows of the table). However, MAX is an aggregate function over multiple rows of data. Basically, without a sub-select, the WHERE clause knows nothing about any rows in the table except for the current row. So how can you determine the maximum value over a whole bunch of rows when you don't even know what those rows are?
Yes, it's a little bit of a simplification, especially when dealing with joins, but the same principle applies. WHERE is always row-by-row, so that's all it really knows about.
Even if you have a GROUP BY clause, the WHERE clause still only processes one row at a time in the raw data before grouping. It doesn't know the value of a column in any other rows, so it has no way of knowing which row has the maximum value.
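A minimal sketch of the sub-select workaround, run against SQLite through Python's sqlite3 module. The readings table and its values are invented for the example. The subquery is evaluated over the whole table, and the outer WHERE then compares each row against that single value.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (sensor TEXT, value INTEGER);
    INSERT INTO readings VALUES ('a', 3), ('b', 9), ('c', 9), ('d', 1);
""")

# The inner SELECT sees all rows and yields one value (9);
# the outer WHERE still works row by row.
top = conn.execute("""
    SELECT sensor
    FROM readings
    WHERE value = (SELECT MAX(value) FROM readings)
    ORDER BY sensor
""").fetchall()
print(top)
```

Note that every row tied for the maximum is returned, not just one of them.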
Assuming this is MS SQL Server, the following would work.
SELECT TOP 1 blah
FROM table
ORDER BY expression DESC
Why is it that in SQL Server I can't do this:
select sum(count(id)) as 'count'
from table
But I can do
select sum(x.count)
from
(
select count(id) as 'count'
from table
) x
Are they not essentially the same thing? How am I meant to be thinking about this in order to understand why the first block of code isn't allowed?
SUM() in your example is a no-op - SUM() of a COUNT() means the same as just COUNT(). So neither of your example queries appears to do anything useful.
It seems to me that nesting aggregates would only make sense if you wanted to apply two different aggregations - meaning GROUP BY on different sets of columns. To specify two different aggregations you would need to use the GROUPING SETS feature or SUM() OVER feature. Maybe if you explain what you want to achieve someone could show you how.
The gist of the issue is that there is no such concept as an aggregate of an aggregate applied to a relation, see Aggregation. Having such a concept would leave too many holes in the definition and make the GROUP BY clause impossible to express: it would need to define the GROUP BY clause of both the inner aggregate and the outer aggregate! This also applies to the other aggregate attributes, like the HAVING clause.
However, the result of an aggregate applied to a relation is another relation, and this result relation in turn can support a new aggregate operator. This explains why you can aggregate the result into an outer SELECT. This leaves no ambiguity in the definition, each SELECT has its own distinct GROUP BY/HAVING clauses.
In simple terms, aggregation functions operate over a column and generate a scalar value, hence they cannot be applied over their result. When you create a select statement over a scalar value you transform it into an artificial column, that's why it can be used by an aggregation function again.
Please note that most of the time there's no point in applying an aggregation function over the result of another aggregation function: in your sample sum(count(id)) == count(id).
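The following sketch uses Python's sqlite3 module, not SQL Server, so the error wording differs, and the visits table is hypothetical. It shows the nested aggregate being rejected, the derived-table form being accepted, and a case where the outer aggregate is actually useful because the inner query groups by a column.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (page TEXT, id INTEGER);
    INSERT INTO visits VALUES ('home', 1), ('home', 2), ('about', 3);
""")

# An aggregate directly inside another aggregate is rejected.
try:
    conn.execute("SELECT SUM(COUNT(id)) FROM visits")
    nested_rejected = False
except sqlite3.OperationalError:
    nested_rejected = True
print(nested_rejected)

# The inner SELECT produces a one-row relation; the outer
# aggregate then runs over that relation.
total = conn.execute(
    "SELECT SUM(x.cnt) FROM (SELECT COUNT(id) AS cnt FROM visits) AS x"
).fetchone()[0]
print(total)

# With an inner GROUP BY, each SELECT has its own grouping,
# and the outer aggregate is no longer a no-op.
largest_group = conn.execute("""
    SELECT MAX(x.cnt)
    FROM (SELECT page, COUNT(id) AS cnt FROM visits GROUP BY page) AS x
""").fetchone()[0]
print(largest_group)
```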
I would like to know what result you expect from this SQL:
select sum(count(id)) as 'count'
from table
When you use the COUNT function, only one result (the total count) is returned. So may I ask why you want to sum that single result?
You will surely get the error, because an aggregate function cannot be applied to an expression containing an aggregate or a subquery.
It's working for me using SQLFiddle, not sure why it wouldn't work for you. But I do have an explanation as to why it might not be working for you and why the alternative would work...
Your example is using a keyword as a column name, which may not always work. But when the column appears only in a subexpression, the query engine is free to discard the name (in fact it probably does), so the fact that it potentially conflicts with a keyword may be disregarded.
EDIT: In response to your edit/comment: no, the two aren't equivalent. The RESULT would be equivalent, but the process of getting to that result is not at all similar. For the first to work, the parser has to do some work that simply doesn't make sense for it to do (applying an aggregate to a single value, either on a row by row basis or as); in the second case, an aggregate is applied to a table. The fact that the table is a temporary virtual table is unimportant to the aggregate function.
I think you can write a SQL query that produces the 'count' of rows for the required output. Functions do not take aggregate functions like 'sum' or an aggregate subquery as arguments. My problem was resolved by using a simple SQL query to get the count out.
Microsoft SQL Server doesn’t support it.
You can get around this problem by using a derived table:
select sum(x.count)
from
(
select count(id) as 'count'
from table
) x
On the other hand, the code below gives you an error message:
select sum(count(id)) as 'count'
from table
Cannot perform an aggregate function on an expression containing an aggregate or a subquery
I am using a PostgreSQL database, and in a table representing some measurements I have two columns: measurement and interpolated. The first holds the observation (measurement), and the second holds the interpolated value, which depends on nearby values. Every record with an original value also has an interpolated value. However, there are a lot of records without "original" observations (NULL), so the values are interpolated and stored in the second column. So basically there are just two cases in the database:
measurement   interpolated
Value         Value
NULL          Value
Of course, it is preferable to use the value from the first column if available, hence I need to build a query to select the data from the first column, and if not available (NULL), then the database returns the value from the second column for the record in question. I have no idea how to build the SQL query.
Please help. Thanks.
You can use Coalesce:
The COALESCE function returns the first of its arguments that is not null. Null is returned only if all arguments are null.
Select Coalesce( first_value, second_value )
From your_table
This would return first_value if it is not NULL, second_value otherwise.
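Here is a minimal sketch of that COALESCE query applied to the two cases from the question. It runs on SQLite through Python's sqlite3 module rather than PostgreSQL, where COALESCE behaves the same way for this case. The measurements table name and the values are assumptions.

```python
import sqlite3

# Hypothetical table mirroring the two cases in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurements (id INTEGER, measurement REAL, interpolated REAL);
    INSERT INTO measurements VALUES (1, 2.5, 2.4), (2, NULL, 3.1);
""")

# COALESCE returns the first non-NULL argument, so the original
# measurement wins whenever it exists.
rows = conn.execute("""
    SELECT id, COALESCE(measurement, interpolated) AS value
    FROM measurements
    ORDER BY id
""").fetchall()
print(rows)
```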
Peter nailed it. But for the sake of completeness (the title of your question is more general than your particular issue), here are the docs for the several conditional expressions available in PostgreSQL (and in several other databases): CASE, COALESCE, NULLIF, GREATEST, LEAST. The first one is the most general.
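As a rough illustration of CASE being the most general form, the sketch below writes the same fallback with CASE and also shows NULLIF. It uses SQLite via Python's sqlite3 module, which has no GREATEST or LEAST functions, so those are not shown. The measurements table and its data are hypothetical.

```python
import sqlite3

# Hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurements (id INTEGER, measurement REAL, interpolated REAL);
    INSERT INTO measurements VALUES (1, 2.5, 2.4), (2, NULL, 3.1);
""")

# CASE spelled out: same result as COALESCE(measurement, interpolated).
case_rows = conn.execute("""
    SELECT id,
           CASE WHEN measurement IS NOT NULL THEN measurement
                ELSE interpolated
           END AS value
    FROM measurements
    ORDER BY id
""").fetchall()
print(case_rows)

# NULLIF(a, b) returns NULL when a = b, otherwise a.
nullif_row = conn.execute("SELECT NULLIF(5, 5), NULLIF(5, 3)").fetchone()
print(nullif_row)
```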