Suppose I have an employee table with their salaries.
What is the difference between:
SELECT AVG(salary)
FROM employee;
and
SELECT AVG(ALL salary)
FROM employee;
What does ALL do? Both cases give the same result.
According to the documentation they are exactly the same regardless of the aggregation function:
The first form of aggregate expression invokes the aggregate once for each input row. The second form is the same as the first, since ALL is the default.
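To see this for yourself, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and it assumes SQLite accepts the ALL keyword inside an aggregate call the same way the quoted documentation describes. AVG(DISTINCT ...) is included only as a contrast, since that is the variant that actually changes the result.

```python
import sqlite3

# Hypothetical sample data: two employees share the same salary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ann", 1000), ("Bob", 1000), ("Cid", 4000)],
)


def scalar(sql):
    """Run a query that yields a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


avg_default = scalar("SELECT AVG(salary) FROM employee")
avg_all = scalar("SELECT AVG(ALL salary) FROM employee")
avg_distinct = scalar("SELECT AVG(DISTINCT salary) FROM employee")

print(avg_default, avg_all, avg_distinct)
```

With these rows the first two values agree, and only the DISTINCT variant differs because the duplicate salary is counted once.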
Related
SELECT
"employees"."FIRST_NAME",
"employees"."LAST_NAME",
"employees"."SALARY"
FROM
'employees'
WHERE
(("employees"."SALARY" > (SELECT "employees"."SALARY"
FROM 'employees'
WHERE (("employees"."FIRST_NAME" = "Alexander")))))
The subquery returns 2 values, because there are 2 employees with the first name "Alexander". How can they be compared with "employees"."SALARY"? Replacing the subquery with a 2-element tuple gives a query that the SQL client does not accept. That is, this query should be equivalent to the one above, but it does not execute:
SELECT
"employees"."FIRST_NAME", "employees"."LAST_NAME",
"employees"."SALARY"
FROM
'employees'
WHERE
(("employees"."SALARY" > (3500, 9000)))
What is going on?
From SQL Language Expressions/11. Subquery Expressions:
A SELECT statement enclosed in parentheses is a subquery. All types of
SELECT statement, including aggregate and compound SELECT queries
(queries with keywords like UNION or EXCEPT) are allowed as scalar
subqueries. The value of a subquery expression is the first row of
the result from the enclosed SELECT statement. The value of a
subquery expression is NULL if the enclosed SELECT statement returns
no rows.
Your query would not run in any database other than SQLite.
As you can see from the documentation, instead of throwing an error such as "the subquery returns more than 1 row", SQLite allows the subquery and keeps only the first of the rows it returns.
This is one of SQLite's non-standard SQL features, and in your case it leads to wrong results.
What you would want, I believe, is to compare each employee's salary to the max salary of all employees named 'Alexander'.
You can do this by changing the subquery to:
SELECT MAX(SALARY) FROM employees WHERE FIRST_NAME = 'Alexander'
This is a non-correlated scalar subquery, so there is no need for any aliases.
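A small sketch of the difference, using Python's sqlite3 module and a made-up employees table (the last names and the two extra employees are hypothetical). Which Alexander row SQLite picks for the bare subquery is not something to rely on, so the example only prints that result and does not assume it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (first_name TEXT, last_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Alexander", "Adams", 3500),
        ("Alexander", "Baker", 9000),
        ("Bruce", "Clark", 6000),
        ("Diana", "Davis", 12000),
    ],
)

# The bare subquery returns two rows; SQLite silently uses only one of them.
first_row_only = conn.execute(
    """
    SELECT last_name FROM employees
    WHERE salary > (SELECT salary FROM employees
                    WHERE first_name = 'Alexander')
    ORDER BY last_name
    """
).fetchall()

# MAX() collapses the two rows into one well-defined value (9000 here).
above_max = conn.execute(
    """
    SELECT last_name FROM employees
    WHERE salary > (SELECT MAX(salary) FROM employees
                    WHERE first_name = 'Alexander')
    ORDER BY last_name
    """
).fetchall()

print(first_row_only)
print(above_max)  # [('Davis',)]
```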
Note: this assumes that the SQLite tag is correct. There are many flavours of SQL, so which database manager is being used is important and relevant.
the subquery returns 2 values. How can they be compared with "employees"."salary" ?
You compare multiple values using a function that can take multiple values such as max, which could be what you require.
e.g.
SELECT "employees"."FIRST_NAME", "employees"."LAST_NAME", "employees"."SALARY" FROM 'employees' WHERE (("employees"."SALARY" > max(3500, 9000)))
What is going on?
The first is using a WHERE clause that is a valid expression that is either true or false. The second is misusing values i.e. a list of values is provided where a single value is expected.
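As a rough illustration, the sketch below runs both forms against an in-memory SQLite database from Python. The table contents are invented for the example. It only records that the tuple form is rejected, without assuming the exact error text, which can vary between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (first_name TEXT, last_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Alexander", "Adams", 3500),
        ("Alexander", "Baker", 9000),
        ("Diana", "Davis", 12000),
    ],
)

# A list of values where a single value is expected: SQLite rejects this.
try:
    conn.execute("SELECT last_name FROM employees WHERE salary > (3500, 9000)")
    tuple_error = None
except sqlite3.OperationalError as exc:
    tuple_error = str(exc)

# The multi-argument max() reduces the values to a single one first.
with_max = conn.execute(
    "SELECT last_name FROM employees WHERE salary > max(3500, 9000)"
).fetchall()

print(tuple_error)
print(with_max)  # [('Davis',)]
```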
First, when writing your queries, you should not have to quote every identifier; it gets cluttered and bloated. You can also use aliases to help readability, as you'll see below. If you use quotes, use single quotes around specific values, such as a date like > '2022-02-22'.
Now on to your query. Your query is looking for salaries greater than that of a given person (Alexander), but there are multiple people by that name. To get ONE answer, you might need the MAX() salary for the criteria. So this essentially becomes TWO queries... one relying upon the other.
So, to get you an answer, the outer query is what you will get as the results, the WHERE query is pre-qualifying that one salary you are interested in.
Select
e.first_name,
e.last_name,
e.salary
from
employees e
where
e.salary > ( select max( e2.salary )
from employees e2
where e2.first_name = 'Alexander' )
Notice the where clause is getting whatever the MAX() salary value is from the employee table for the employee 'Alexander'. So now, that ONE value comes back as the basis for the outer query.
Notice the indentation, you can better see how the outer query is reliant on that. Also makes for more readable queries.
I have a table emp, and EMPNO is its primary key.
I run this query:
SELECT ename
FROM emp
WHERE ename = (SELECT MIN(ename) FROM emp);
and it returns this:
Since the inner subquery is returning two rows, and since I am not using IN(), shouldn't I get an error of subquery returning multiple rows, so how am I getting this output?
P.S.: Sorry for my horrible English
The subquery:
SELECT MIN(ename) FROM emp
returns only 1 row with 1 column (it is called a scalar subquery) which has the value of the minimum ename of the table.
If you had used also a GROUP BY clause like this:
SELECT MIN(ename) FROM emp GROUP BY empno
then the subquery would return 2 rows, 1 for each empno.
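Here is a quick sketch of that row-count difference, run through Python's sqlite3 module. The two rows in emp are made-up sample data, since the original output is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, "ADAMS"), (2, "BLAKE")],
)

# No GROUP BY: the whole table is one group, so exactly one row comes back.
without_group = conn.execute("SELECT MIN(ename) FROM emp").fetchall()

# GROUP BY empno: one group (and one row) per distinct empno.
with_group = conn.execute(
    "SELECT MIN(ename) FROM emp GROUP BY empno"
).fetchall()

print(len(without_group), len(with_group))  # 1 2
```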
So subqueries are fun. Aside from using a sub query as something to select from, the general types are scalar or correlated.
Scalar subqueries can only return one value.
Correlated subqueries refer to something outside of the subquery (think EXISTS clause, as that is a kind of correlated subquery).
These two definitions are crude and not textbook, but they explain the ideas simply enough for now.
So your subquery is scalar and used for comparison. What makes it fit the rules for returning only one value is the MIN() function as MIN() only returns one value. If you did not use MIN(), then you would have to figure out another way to make it fit the rules of a scalar subquery. For example:
SELECT ename
FROM emp e
WHERE ename =
(
SELECT TOP 1 ename
FROM emp
WHERE emp.ename=e.ename
ORDER BY emp.ename ASC
);
This is a correlated scalar subquery, as the subquery refers to the e object from the outer query. You’ll notice I had to add TOP 1 because the sample data provided doesn’t give me any other unique columns for comparison; not doing so would have caused the “you can only have one value returned from scalar subquery” error.
So to answer your question – the reason it does not error is because the subquery only returns one value. For each row of your result set, the ename equals MIN(ename). This is intended behavior.
I am looking for clarification on this. I am writing two queries below:
We have an employee table with columns ID, name, salary.
1. Select name from employee
where sum(salary) > 1000 ;
2. Select name from employee
where substring_index(name,' ',1) = 'nishant' ;
Query 1 doesn't work but Query 2 does work. From my development experience, I feel the possible explanation to this is:
The sum() works on a set of values specified in the argument. Here
'salary' column is passed , so it must add up all the values of this
column. But inside where clause, the records are checked one by one ,
like first record 1 is checked for the test and so on. Thus
sum(salary) will not be computed as it needs access to all the column
values and then only it will return a value.
Query 2 works as substring_index() works on a single value and hence here it works on the value supplied to it.
Can you please validate my understanding.
The reason you can't use SUM() in the WHERE clause is the order of evaluation of clauses.
FROM tells you where to read rows from. Right as rows are read from disk to memory, they are checked for the WHERE conditions. (Actually in many cases rows that fail the WHERE clause will not even be read from disk. "Conditions" are formally known as predicates and some predicates are used - by the query execution engine - to decide which rows are read from the base tables. These are called access predicates.) As you can see, the WHERE clause is applied to each row as it is presented to the engine.
On the other hand, aggregation is done only after all rows (that satisfy all the predicates) have been read.
Think about this: SUM() applies ONLY to the rows that satisfy the WHERE conditions. If you put SUM() in the WHERE clause, you are asking for circular logic. Does a new row pass the WHERE clause? How would I know? If it will pass, then I must include it in the SUM, but if not, it should not be included in the SUM. So how do I even evaluate the SUM condition?
Why can't we use aggregate function in where clause
Aggregate functions work on sets of data. A WHERE clause doesn't have access to entire set, but only to the row that it is currently working on.
You can of course use HAVING clause:
select name from employee
group by name having sum(salary) > 1000;
If you must use WHERE, you can use a subquery:
select name from (
select name, sum(salary) total_salary from employee
group by name
) t where total_salary > 1000;
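The sketch below tries all three forms against an in-memory SQLite database from Python. The rows are hypothetical, and SQLite is used only because it ships with Python; the question itself appears to be about MySQL, where the exact error message will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "nishant", 600), (2, "nishant", 700), (3, "ravi", 500)],
)

# Aggregate in WHERE: rejected, because WHERE is checked row by row.
try:
    conn.execute("SELECT name FROM employee WHERE SUM(salary) > 1000")
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# HAVING filters the groups after SUM() has been computed.
having_rows = conn.execute(
    "SELECT name FROM employee GROUP BY name HAVING SUM(salary) > 1000"
).fetchall()

# Same result with WHERE, applied to an already aggregated derived table.
subquery_rows = conn.execute(
    """
    SELECT name FROM (
        SELECT name, SUM(salary) AS total_salary
        FROM employee
        GROUP BY name
    ) AS t
    WHERE total_salary > 1000
    """
).fetchall()

print(where_error)
print(having_rows, subquery_rows)
```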
sum() is an aggregation function. In general, you would expect it to work with group by. Hence, your first query is missing a group by. In a group by query, having is used for filtering after the aggregation:
Select name
from employee
group by name
having sum(salary) > 1000 ;
Using HAVING works because it is applied after the rows have been grouped and the aggregates computed, while WHERE fails because it is checked against each row before any aggregation takes place.
Just have a quick question, I don't think it's possible after reading sql manual but thought to give it a try here...
Q. Can we use column number in compute expression instead of column name or alias after OF?
Something like e.g
COMPUTE SUM LABEL 'TOTAL' OF 2 ON JOB_ID;
BREAK ON JOB_ID SKIP 1;
COMPUTE SUM LABEL 'TOTAL' OF SALARY ON JOB_ID;
SELECT JOB_ID, LAST_NAME, SALARY
FROM EMP_DETAILS_VIEW
WHERE JOB_ID IN ('AC_MGR', 'SA_MAN')
ORDER BY JOB_ID, SALARY;
The reference manual lists the following:
OF {expr|column|alias} ...
In the OF clause, you can refer to an expression or function reference in the SELECT statement by placing the expression or function reference in double quotes. Column names and aliases do not need quotes.
I tried following query:
SELECT
MAX(SUM(e.Empid))
FROM HR.Employees
and got following error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
My question is why isn't this allowed?
Each aggregate works on a group. You can only define one group per query. So multiple aggregates require subqueries. For example, to find the number of employees in the largest department:
SELECT MAX(EmpCount)
FROM (
SELECT COUNT(*) as EmpCount
FROM HR.Employees
GROUP BY
Department
) as SubQueryAlias
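For what it's worth, here is a minimal sketch of the same idea run against SQLite from Python. The question is about SQL Server, so the error text will not match; the table, column names and rows below are invented for the example. SQLite also refuses the nested aggregate, and the derived-table form returns the size of the largest department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (empid INTEGER, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "Sales"), (2, "Sales"), (3, "Sales"), (4, "IT"), (5, "IT")],
)

# Nesting one aggregate directly inside another is rejected.
try:
    conn.execute("SELECT MAX(SUM(empid)) FROM employees")
    nested_error = None
except sqlite3.OperationalError as exc:
    nested_error = str(exc)

# Inner query: one COUNT(*) per department. Outer query: MAX over those counts.
largest_department_size = conn.execute(
    """
    SELECT MAX(emp_count)
    FROM (
        SELECT COUNT(*) AS emp_count
        FROM employees
        GROUP BY department
    ) AS sub
    """
).fetchone()[0]

print(nested_error)
print(largest_department_size)  # 3
```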
Since you have not defined any columns to group by, SUM() returns a single value, so the MAX() of that value is just the value itself.
UPDATE
An error was thrown because MAX(SUM(e.Empid)) requires the results of two grouped selects, not just one.
SUM(x) evaluates to a single value, so it's not appropriate to MAX the result of it.
This query makes no sense, as even if it worked, it would return only one value : the sum of Empid. The MAX function applied on one value is not really useful.
Try this
SELECT MAX(_ID)
FROM (SELECT SUM(e.Empid) _ID FROM HR.Employees e) t
OK. I got your question now. Here is why:
The value expression simply contained in set function specification
shall not contain a set function specification or a subquery. If the
value expression contains a column reference that is an outer
reference, then that outer reference shall be the only column
reference contained in the value expression.
Further reading : SQL 92 Standards
Raj