My first foray into SQL and I'm having some difficulty applying the MAX() function.
If I run the following, I receive the correct returned value:
SELECT MAX(count)
FROM readings
However, when I try to also return fields related to that value, I get an incorrect result. Running the following returns the correct 'count' value but an incorrect 'location' value:
SELECT MAX(count), location
FROM readings
What I expected from the above are the same results as from:
SELECT count, location
FROM readings
ORDER BY count DESC
LIMIT 1
Could you please advise if it is possible to achieve this using the MAX() function or if I have just misunderstood what MAX actually does!
Your advice is greatly appreciated.
What database system are you using? MAX is an aggregation function that should be operating across the entire table, while selecting a single value (like location in your query) is operating only on a single row. In most databases, if you want to select another column, you must specify that column in a GROUP BY clause or also wrap it in a similar aggregation function.
To get the value of location in the same row, you typically should use a subselect, like this:
SELECT count, location
FROM readings
WHERE count = (SELECT MAX(count) FROM readings);
Note that this doesn't guarantee a single result, though; there could be several rows that match the maximum count value!
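To make the tie case concrete, here is a minimal sketch. The sample rows are made up; the table and column names follow the question, and LIMIT assumes a database that supports it (the question already uses it).

```sql
-- Hypothetical sample data; only the table and column names come from the question.
CREATE TABLE readings (location VARCHAR(20), count INT);
INSERT INTO readings (location, count) VALUES
    ('north', 7),
    ('south', 12),
    ('east', 12);

-- Returns two rows ('south' and 'east'), because both hold the maximum count of 12.
SELECT count, location
FROM readings
WHERE count = (SELECT MAX(count) FROM readings);

-- If exactly one row is wanted, add an explicit tie-breaker.
-- Here the alphabetically first location wins, so this returns 12, 'east'.
SELECT count, location
FROM readings
WHERE count = (SELECT MAX(count) FROM readings)
ORDER BY location
LIMIT 1;
```

Which tie-breaker is appropriate depends on the data; ordering by location is only an illustration.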
I'm trying to build a query that groups and sums two different fields for the same ID. Then I'm trying to extract only the records where the grouped total is different from the summed total.
For example, sum([EstimatedEmployeesAtLocation]) should equal Estimatedtotalemployees, which is the same for each record.
This is what three records would look like
ID 1,1,1
Estimatedtotalemployees 10,10,10
EstimatedEmployeesAtLocation 6,2,1
I know the issue is something with using an aggregate function in the where clause, because the query works until I add the where clause. But I don't know the correct syntax. Can someone please advise?
select ID, Estimatedtotalemployees, sum([EstimatedEmployeesAtLocation]) emploc
from Rawdata
where sum([EstimatedEmployeesAtLocation]) <> EstimatedTotalEmployees
group by policyNumber, EstimatedTotalEmployees
This is the error message: An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference. I'm just starting to use SQL beyond basic querying, so any help is much appreciated.
SQL queries are evaluated in roughly the following order:
FROM - the data is joined
WHERE - the data is filtered
GROUP BY - the data is aggregated
SELECT - the result set is produced
You can't use a SUM(...) in a WHERE clause because it hasn't been calculated yet; it's done as part of the GROUP BY and SELECT.
You also can't have ID in your select list because you haven't grouped by it.
You might be better off asking a new question, or editing some detail into this one, that sets out what raw data you have (and please put it with columns across the top, rows downwards, like SSMS shows you), what you've tried (e.g. the above SQL), and what your desired result is (again, rows run horizontally and columns run vertically, otherwise someone will end up giving you a pivot). Right now you seem to have given the raw data but have labelled it as the result data, and it looks the wrong way round, so I'm not sure if it needs rotating or not.
Here's my best stab at what you're after:
select ID, MAX(Estimatedtotalemployees), SUM([EstimatedEmployeesAtLocation])
from Rawdata
group by ID
having SUM([EstimatedEmployeesAtLocation]) <> MAX(EstimatedTotalEmployees)
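As a rough check of the HAVING approach, here is a small sketch using the three rows from the question plus a second, made-up ID whose locations do add up. The column types and the rows for ID 2 are assumptions.

```sql
-- Rows for ID 1 come from the question; rows for ID 2 are hypothetical.
CREATE TABLE Rawdata (
    ID INT,
    EstimatedTotalEmployees INT,
    EstimatedEmployeesAtLocation INT
);
INSERT INTO Rawdata (ID, EstimatedTotalEmployees, EstimatedEmployeesAtLocation) VALUES
    (1, 10, 6),
    (1, 10, 2),
    (1, 10, 1),
    (2, 5, 3),
    (2, 5, 2);

-- HAVING filters whole groups after aggregation.
-- ID 1 sums to 9 against a total of 10, so it is returned;
-- ID 2 sums to 5 against a total of 5, so it is filtered out.
SELECT ID,
       MAX(EstimatedTotalEmployees) AS total,
       SUM(EstimatedEmployeesAtLocation) AS emploc
FROM Rawdata
GROUP BY ID
HAVING SUM(EstimatedEmployeesAtLocation) <> MAX(EstimatedTotalEmployees);
```

MAX is used on the total only because every row of an ID carries the same value; MIN would work equally well under that assumption.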
This is hard to explain, but say I have this query:
SELECT *
FROM "late_fee_tiers"
And it returns this:
I have a validation in code set up to prevent duplicate days from being saved (notice there are 2 rows of days = 2).
I want my query to double-check there are only unique rows of day, and if there are multiple, select the first one (so it should return 3 rows with 2,3,5).
My first thought is to use GROUP BY day, while selecting a MIN("id").
The problem is, I don't understand SQL enough, because it forces me to add different aggregator functions to every single column... but what if I don't want to do that? I want THAT row to be "chosen" according to the single aggregator function I define, I don't need multiple aggregators creating some weird hybrid row. I just want the MIN() function to choose that 1 row and fill in all the rest of the values for that row.
What function do I use to do this, or how would I do it?
Thanks
You want to use DISTINCT ON:
select distinct on (day) *
from "late_fee_tiers"
order by day, id;
Why day is also required in the order by:
From the official documentation:
The DISTINCT ON expression(s) must match the leftmost ORDER BY
expression(s). The ORDER BY clause will normally contain additional
expression(s) that determine the desired precedence of rows within
each DISTINCT ON group.
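DISTINCT ON is a PostgreSQL extension. Where it is not available, the MIN("id") idea from the question can still be used by computing the first id per day in a derived table and joining back to fetch the whole row. This is only a sketch: the fee column and all sample values are hypothetical, and it assumes id is unique.

```sql
-- Hypothetical layout and data; only id and day are known from the question.
CREATE TABLE late_fee_tiers (id INT, day INT, fee INT);
INSERT INTO late_fee_tiers (id, day, fee) VALUES
    (1, 2, 10),
    (2, 2, 15),
    (3, 3, 20),
    (4, 5, 30);

-- The derived table picks the lowest id within each day;
-- the join then returns the complete row for each of those ids.
-- With the data above this yields the rows with id 1, 3 and 4.
SELECT t.*
FROM late_fee_tiers AS t
JOIN (
    SELECT MIN(id) AS first_id
    FROM late_fee_tiers
    GROUP BY day
) AS firsts ON t.id = firsts.first_id
ORDER BY t.day;
```

The aggregate only decides which row wins; every other column comes from that one row, so no "hybrid" row is produced.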
I'm getting the error:
Column 'A10000012VICKERS.dbo.IMAGES.idimage' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Any ideas why I would be getting this error or how to fix it? I thought that I was just asking for the size of a number of filestream columns and the values of two others?
SELECT
idimage,
filetype,
SUM(DATALENGTH(filestreamimageoriginal)) AS original,
SUM(DATALENGTH(filestreamimagefull)) AS [full],
SUM(DATALENGTH(filestreamimageextra)) AS extra,
SUM(DATALENGTH(filestreamimagelarge)) AS large,
SUM(DATALENGTH(filestreamimagemedium)) AS medium,
SUM(DATALENGTH(filestreamimagesmall)) AS small,
SUM(DATALENGTH(filestreamimagethumbnail)) AS thumbnail
FROM A10000012VICKERS.dbo.IMAGES
WHERE display = 1
I don't really see how that query could generate that message. There is no column with that name. However, the query does have an obvious error.
Your query is an aggregation query because it uses SUM() in the SELECT clause. However, this will return only one row, unless you also have a GROUP BY.
Add this to the end of your query:
GROUP BY idimage, filetype
Or, remove these columns from the SELECT.
By using an aggregation function (SUM) you are aggregating your records. As you have specified no GROUP BY clause you will get one result row, i.e. an aggregation over all rows. In this aggregation, however, there is no longer one idimage or one filetype that you could show in your results.
So either use an aggregation function on these, too (e.g. max(idimage), min(filetype)) or remove them from the query, if you really want one aggregate over all these rows.
If, however, you want to aggregate per idimage and filetype, then add GROUP BY idimage, filetype at the end of your query.
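The two options can be compared on a cut-down example. This sketch uses a hypothetical, simplified images table with a single size_bytes column in place of DATALENGTH() over the filestream columns, and it groups by filetype alone so that the groups contain more than one row.

```sql
-- Hypothetical, simplified stand-in for the IMAGES table.
CREATE TABLE images (idimage INT, filetype VARCHAR(10), size_bytes INT, display INT);
INSERT INTO images (idimage, filetype, size_bytes, display) VALUES
    (1, 'jpg', 100, 1),
    (2, 'jpg', 250, 1),
    (3, 'png', 400, 1),
    (4, 'png', 999, 0);

-- Option 1: a single aggregate over all displayed rows,
-- so no per-row columns appear in the select list. Returns 750.
SELECT SUM(size_bytes) AS total_bytes
FROM images
WHERE display = 1;

-- Option 2: one aggregate per group; every non-aggregated column
-- in the select list is also listed in GROUP BY.
-- Returns 350 for 'jpg' and 400 for 'png'.
SELECT filetype, SUM(size_bytes) AS total_bytes
FROM images
WHERE display = 1
GROUP BY filetype;
```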
Assume value is an int and the following query is valid:
SELECT blah
FROM table
WHERE attribute = value
Though MAX(expression) returns int, the following is not valid:
SELECT blah
FROM table
WHERE attribute = MAX(expression)
Of course the desired effect can be achieved using a subquery, but my question is why SQL was designed this way: is there some reason why this sort of thing is not allowed? Students coming from programming languages where you can always replace a value of some data type with a function call that returns that type find this issue confusing. Is there an explanation one can give them rather than just saying "that's the way it is"?
It's just because of the order of operations of a query.
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
WHERE just filters the rows returned by FROM. An aggregate function like MAX() can't have a result returned because it hasn't even been applied to anything.
That's also the reason why you can't use aliases defined in the SELECT clause in a WHERE clause, but you can use aliases defined in the FROM clause.
A where clause checks every row to see if it matches the conditions specified.
A max computes a single value from a row set. If you put a max, or any other aggregate function, into a where clause, how can SQL Server figure out which rows the max function can use until the where clause has finished its filtering?
This deals with the order that SQL Server processes commands in. It runs the WHERE clause before a GROUP BY or any aggregate. Since a where clause runs first, SQL Server can't tell if a row will be included in an aggregate until it processes the where. That is what the HAVING clause is for. HAVING runs after the GROUP BY and the WHERE and can include MAX since you have already filtered out the rows you don't want to use. See http://www.bennadel.com/blog/70-SQL-Query-Order-of-Operations.htm for a good explanation of the order in which SQL commands run.
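A short sketch of the difference between the two filters, using a hypothetical orders table (all names and values here are made up for illustration):

```sql
-- Hypothetical table and data.
CREATE TABLE orders (customer VARCHAR(10), amount INT);
INSERT INTO orders (customer, amount) VALUES
    ('a', 5),
    ('a', 50),
    ('b', 20),
    ('b', 30),
    ('c', 8);

-- WHERE drops individual rows before grouping (amounts 5 and 8 go away,
-- which removes customer 'c' entirely).
-- HAVING then drops whole groups after MAX has been computed:
-- 'a' has a maximum of 50 and is kept, 'b' has a maximum of 30 and is dropped.
SELECT customer, MAX(amount) AS biggest
FROM orders
WHERE amount >= 10
GROUP BY customer
HAVING MAX(amount) > 40;
```

Moving the MAX(amount) condition into WHERE would be rejected, since at that stage there are no groups for MAX to work on.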
Maybe this will work:
SELECT blah
FROM table
WHERE attribute = (SELECT MAX(expression) FROM table1)
The WHERE clause is specifically designed to test conditions against raw data (individual rows of the table). However, MAX is an aggregate function over multiple rows of data. Basically, without a sub-select, the WHERE clause knows nothing about any rows in the table except for the current row. So how can you determine the maximum value over a whole bunch of rows when you don't even know what those rows are?
Yes, it's a little bit of a simplification, especially when dealing with joins, but the same principle applies. WHERE is always row-by-row, so that's all it really knows about.
Even if you have a GROUP BY clause, the WHERE clause still only processes one row at a time in the raw data before grouping. It doesn't know the value of a column in any other rows, so it has no way of knowing which row has the maximum value.
Assuming this is MS SQL Server, the following would work.
SELECT TOP 1 blah
FROM table
ORDER BY expression DESC
Why is it that in SQL Server I can't do this:
select sum(count(id)) as 'count'
from table
But I can do
select sum(x.count)
from
(
select count(id) as 'count'
from table
) x
Are they not essentially the same thing? How am I meant to be thinking about this in order to understand why the first block of code isn't allowed?
SUM() in your example is a no-op: SUM() of a COUNT() means the same as just COUNT(). So neither of your example queries appears to do anything useful.
It seems to me that nesting aggregates would only make sense if you wanted to apply two different aggregations - meaning GROUP BY on different sets of columns. To specify two different aggregations you would need to use the GROUPING SETS feature or SUM() OVER feature. Maybe if you explain what you want to achieve someone could show you how.
The gist of the issue is that there is no such concept as an aggregate of an aggregate applied to a relation, see Aggregation. Having such a concept would leave too many holes in the definition and make the GROUP BY clause impossible to express: it would need to define both the inner aggregate's GROUP BY clause and the outer aggregate's as well! This also applies to the other aggregate attributes, like the HAVING clause.
However, the result of an aggregate applied to a relation is another relation, and this result relation in turn can support a new aggregate operator. This explains why you can aggregate the result into an outer SELECT. This leaves no ambiguity in the definition, each SELECT has its own distinct GROUP BY/HAVING clauses.
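A case where two levels of aggregation are genuinely different makes this easier to see. In the sketch below (the visits table and its rows are hypothetical), the inner SELECT groups by page and the outer SELECT aggregates over the resulting relation with no grouping at all:

```sql
-- Hypothetical table and data.
CREATE TABLE visits (id INT, page VARCHAR(10));
INSERT INTO visits (id, page) VALUES
    (1, 'home'),
    (2, 'home'),
    (3, 'home'),
    (4, 'about'),
    (5, 'faq'),
    (6, 'faq');

-- Inner aggregate: one row per page with its visit count (3, 1 and 2).
-- Outer aggregate: the largest of those counts, which is 3.
SELECT MAX(per_page.n) AS busiest
FROM (
    SELECT page, COUNT(id) AS n
    FROM visits
    GROUP BY page
) AS per_page;
```

Writing this as MAX(COUNT(id)) in a single SELECT would leave it unclear which GROUP BY belongs to which aggregate, which is the ambiguity described above.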
In simple terms, aggregation functions operate over a column and generate a scalar value, hence they cannot be applied over their result. When you create a select statement over a scalar value you transform it into an artificial column, that's why it can be used by an aggregation function again.
Please note that most of the time there's no point in applying an aggregation function over the result of another aggregation function: in your sample sum(count(id)) == count(id).
I would like to know what result you expect from this SQL:
select sum(count(id)) as 'count'
from table
When you use the count function without a GROUP BY, only one result (the total count) is returned. So may I ask why you want to sum that single result?
You will certainly get the error, because an aggregate function cannot be performed on an expression containing an aggregate or a subquery.
It's working for me using SQLFiddle, not sure why it wouldn't work for you. But I do have an explanation as to why it might not be working for you and why the alternative would work...
Your example uses a keyword as a column name, which may not always work. But when the column is only in a subexpression, the query engine is free to discard the name (in fact it probably does), so the fact that it potentially conflicts with a keyword may be disregarded.
EDIT: in response to your edit/comment. No, the two aren't equivalent. The RESULT would be equivalent, but the process of getting to that result is not at all similar. For the first to work, the parser has to do some work that simply doesn't make sense for it to do (applying an aggregate to a single value, either on a row by row basis or as); in the second case, an aggregate is applied to a table. The fact that the table is a temporary virtual table will be unimportant to the aggregate function.
I think you can write a SQL query that produces the 'count' of rows for the required output directly. Aggregate functions do not accept another aggregate such as 'sum', or an aggregated subquery, as their argument. My problem was resolved by using a simple SQL query to get the count out.
Microsoft SQL Server doesn’t support it.
You can get around this problem by using a derived table:
select sum(x.count)
from
(
select count(id) as 'count'
from table
) x
On the other hand using the below code will give you an error message.
select sum(count(id)) as 'count'
from table
Cannot perform an aggregate function on an expression containing an
aggregate or a subquery