SQL: aggregate function and group by - sql

Consider the Oracle emp table. I'd like to get the employees with the top salary with department = 20 and job = clerk. Also assume that there is no "empno" column, and that the primary key involves a number of columns. You can do this with:
select * from scott.emp
where deptno = 20 and job = 'CLERK'
and sal = (select max(sal) from scott.emp
where deptno = 20 and job = 'CLERK')
This works, but I have to duplicate the test deptno = 20 and job = 'CLERK', which I would like to avoid. Is there a more elegant way to write this, maybe using a group by? BTW, if this matters, I am using Oracle.

The following is slightly over-engineered, but is a good SQL pattern for "top x" queries.
SELECT
*
FROM
scott.emp
WHERE
(deptno,job,sal) IN
(SELECT
deptno,
job,
max(sal)
FROM
scott.emp
WHERE
deptno = 20
and job = 'CLERK'
GROUP BY
deptno,
job
)
Also note that this will work in Oracle and PostgreSQL (I think) but not MS SQL. For something similar in MS SQL, see the question "SQL Query to get latest price".
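For anyone who wants to try the tuple comparison without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with row-value support (3.15 or later), and the table is a made-up miniature of scott.emp, not the real data.

```python
import sqlite3

# Hypothetical miniature of scott.emp; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, sal INTEGER, deptno INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("SMITH", "CLERK", 800, 20),
        ("ADAMS", "CLERK", 1100, 20),
        ("BROWN", "CLERK", 1100, 20),   # ties with ADAMS for the top salary
        ("FORD", "ANALYST", 3000, 20),
        ("MILLER", "CLERK", 1300, 10),
    ],
)

# The deptno/job filter is written once, inside the grouped subquery.
top_clerks = conn.execute(
    """
    SELECT ename, sal
    FROM emp
    WHERE (deptno, job, sal) IN (
        SELECT deptno, job, MAX(sal)
        FROM emp
        WHERE deptno = 20 AND job = 'CLERK'
        GROUP BY deptno, job
    )
    ORDER BY ename
    """
).fetchall()
print(top_clerks)
```

Both tied clerks come back, and MILLER (higher salary, different department) does not, because the department and job are part of the compared tuple.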

If I was certain of the targeted database I'd go with Mark Nold's solution, but if you ever want some dialect agnostic SQL*, try
SELECT *
FROM scott.emp e
WHERE e.deptno = 20
AND e.job = 'CLERK'
AND e.sal = (
SELECT MAX(e2.sal)
FROM scott.emp e2
WHERE e.deptno = e2.deptno
AND e.job = e2.job
)
*I believe this should work everywhere, but I don't have the environments to test it.

In Oracle I'd do it with an analytic function, so you'd only query the emp table once:
SELECT *
FROM (SELECT e.*, MAX (sal) OVER () AS max_sal
FROM scott.emp e
WHERE deptno = 20
AND job = 'CLERK')
WHERE sal = max_sal
It's simpler, easier to read and more efficient.
If you want to modify it to list this information for all departments, then you'll need to use the "PARTITION BY" clause in OVER:
SELECT *
FROM (SELECT e.*, MAX (sal) OVER (PARTITION BY deptno) AS max_sal
FROM scott.emp e
WHERE job = 'CLERK')
WHERE sal = max_sal
ORDER BY deptno
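A rough way to check both analytic queries locally is sketched below with Python's sqlite3 module. This assumes a SQLite version with window functions (3.25 or later), and the rows are a hypothetical stand-in for scott.emp.

```python
import sqlite3

# Hypothetical stand-in for scott.emp; the values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, sal INTEGER, deptno INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("SMITH", "CLERK", 800, 20),
        ("ADAMS", "CLERK", 1100, 20),
        ("JAMES", "CLERK", 950, 30),
        ("MILLER", "CLERK", 1300, 10),
        ("FORD", "ANALYST", 3000, 20),
    ],
)

# Single department: MAX(sal) OVER () spans every row that survives the WHERE.
top_in_dept_20 = conn.execute(
    """
    SELECT ename, sal
    FROM (SELECT e.*, MAX(sal) OVER () AS max_sal
          FROM emp e
          WHERE deptno = 20 AND job = 'CLERK')
    WHERE sal = max_sal
    """
).fetchall()

# All departments: PARTITION BY restarts the maximum for each deptno.
top_per_dept = conn.execute(
    """
    SELECT deptno, ename, sal
    FROM (SELECT e.*, MAX(sal) OVER (PARTITION BY deptno) AS max_sal
          FROM emp e
          WHERE job = 'CLERK')
    WHERE sal = max_sal
    ORDER BY deptno, ename
    """
).fetchall()

print(top_in_dept_20)
print(top_per_dept)
```

In both queries the filter is written once and the table is read once; only the window definition changes.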

That's great! I didn't know you could do a comparison of (x, y, z) with the result of a SELECT statement. This works great with Oracle.
As a side-note for other readers, the above query is missing a "=" after "(deptno,job,sal)". Maybe the Stack Overflow formatter ate it (?).
Again, thanks Mark.

In Oracle you can also use an EXISTS condition, which in some cases is faster.
For example...
SELECT name, number
FROM cust
WHERE cust_id IN
( SELECT cust_id FROM big_table )
AND entered > SYSDATE -1
would be slow.
but
SELECT name, number
FROM cust c
WHERE EXISTS
( SELECT cust_id FROM big_table WHERE cust_id=c.cust_id )
AND entered > SYSDATE -1
would be very fast with proper indexing. You can also use this with multiple parameters.

There are many solutions. You could also keep your original query layout by adding table aliases and correlating the subquery on the column names. That way DEPTNO = 20 and JOB = 'CLERK' still appear only once in the query.
SELECT
*
FROM
scott.emp emptbl
WHERE
emptbl.DEPTNO = 20
AND emptbl.JOB = 'CLERK'
AND emptbl.SAL =
(
select
max(salmax.SAL)
from
scott.emp salmax
where
salmax.DEPTNO = emptbl.DEPTNO
AND salmax.JOB = emptbl.JOB
)
Note also that the keyword ALL can be used for these types of queries, which lets you remove the MAX function.
SELECT
*
FROM
scott.emp emptbl
WHERE
emptbl.DEPTNO = 20
AND emptbl.JOB = 'CLERK'
AND emptbl.SAL >= ALL
(
select
salmax.SAL
from
scott.emp salmax
where
salmax.DEPTNO = emptbl.DEPTNO
AND salmax.JOB = emptbl.JOB
)
I hope that helps and makes sense.

Related

GROUP BY - inline view and unjoined tables

I have the following Oracle query, where I attempt to find the department with the highest average salary. I would like to use an in-line view (i.e. retain the b dataset) for this implementation, but I'm struggling to get the WHERE and GROUP BY parts right. I know the GROUP BY below and the WHERE (which is non-existent) are wrong. But how do I correct them?
select a.deptno from emp a,
(select max(avg_sal) max_avg_sal from (select
avg(sal) avg_sal from emp group by deptno) ) b
group by a.deptno, b.max_avg_sal
having avg(a.sal) = b.max_avg_sal
Expected Result
deptno
10
Emp Structure
deptno staff sal
10 A 1000
10 B 1500
11 C 1100
12 D 1000
12 E 900
12 F 1000
Is this what you want?
select e.*
from (select e.*, avg(e.sal) over (partition by e.deptno) as avg_salary
from emp e
) e
order by avg_salary desc
fetch first 1 row only;
fetch first is available in Oracle 12c+. You can do similar things with an additional subquery in earlier versions.
You can use a subquery:
select deptno from tablename
group by deptno
having avg(sal)= (select max(asal) from (select avg(sal) as asal from tablename group by deptno)A)
The straight-forward way is:
select deptno
from emp
group by deptno
order by avg(sal) desc
fetch first row with ties;
FETCH FIRST is available as of Oracle 12c.
In Oracle 11g we could use this instead:
select deptno
from
(
select deptno, avg(sal) as avg_salary, max(avg(sal)) over () as max_avg_salary
from emp
group by deptno
)
where avg_salary = max_avg_salary;
But you want an inline view, another word for a derived table (a subquery in the from clause). That looks way more clumsy. One example without FETCH FIRST and without window functions:
with d as
(
select deptno, avg(sal) as avg_salary
from emp
group by deptno
)
, dmax as
(
select max(avg_salary) as max_avg_salary
from d
)
select d.*
from d
join dmax on dmax.max_avg_salary = d.avg_salary;
I find this very obfuscated and don't recommend it at all. You can do the same without WITH clauses of course. Then it is even less readable.
I don't know why you'd want to write it this way, but if you really want only inline views and no windowing clauses, you can write it this way:
select b.deptno
from (SELECT deptno, avg(sal) avgsal from emp group by deptno ) b
cross join (SELECT max(avgsal) maxavgsal FROM (SELECT avg(sal) avgsal FROM emp group by deptno )) c
where b.avgsal = c.maxavgsal;
This is the same thing, if you don't like CROSS JOIN for some reason:
select b.deptno
from (SELECT deptno, avg(sal) avgsal from emp group by deptno ) b
inner join ( SELECT max(avgsal) maxavgsal FROM
( SELECT avg(sal) avgsal FROM emp group by deptno ) ) c
on b.avgsal = c.maxavgsal;
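To check the inline-view version against the sample rows from the question, here is a small sketch using Python's sqlite3 module. SQLite stands in for Oracle here, which is an assumption; the table and rows are copied from the "Emp Structure" listing above.

```python
import sqlite3

# Sample rows taken from the question's "Emp Structure" listing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, staff TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (10, "A", 1000),
        (10, "B", 1500),
        (11, "C", 1100),
        (12, "D", 1000),
        (12, "E", 900),
        (12, "F", 1000),
    ],
)

# b holds one average per department, c holds the single highest average.
best_dept = conn.execute(
    """
    SELECT b.deptno
    FROM (SELECT deptno, AVG(sal) AS avgsal FROM emp GROUP BY deptno) b
    JOIN (SELECT MAX(avgsal) AS maxavgsal
          FROM (SELECT AVG(sal) AS avgsal FROM emp GROUP BY deptno)) c
      ON b.avgsal = c.maxavgsal
    """
).fetchall()
print(best_dept)
```

Department 10 averages 1250, ahead of 1100 for department 11 and roughly 967 for department 12, so it is the only row returned, matching the expected result in the question.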

Oracle query optimization on 100 million table

Below is the SAMPLE query. How can I write this optimally? It is scanning around 80 million records. It runs for hours without any results.
select * FROM (SELECT organization, ROW_NUMBER() OVER (PARTITION BY empno, sal ORDER BY deptno DESC) AS row_num
FROM emp) x
Where
x.row_num=1
And
x.organization !=3
Below is everything I tried, none of which helped at all:
UNIQUE index on deptno
COMPOSITE index on (empno, sal)
Additional NON-UNIQUE index on the organization column
I tried rewriting the inequality condition as x.organization < 3 or x.organization > 3, with no luck
Nothing is helping to produce results. The query just runs for hours with NO RESULTS.
Please advise.
For this query:
SELECT *
FROM (SELECT empname,
ROW_NUMBER() OVER (PARTITION BY empno, sal ORDER BY deptno DESC) AS row_num
FROM emp
) x
Where x.row_num = 1;
the optimal index is emp(empno, sal, deptno desc).
It might be faster to try a correlated subquery:
select e.*
from emp e
where e.deptno = (select max(e2.deptno) from emp e2 where e2.empno = e.empno and e2.sal = e.sal);
I think you need a composite index on (empno, sal, deptno).
This should speed things up, because the index already provides the ordering and the DB will not have to sort.
But I could be wrong. The explain plan will help.

Stuck with my query

Would really appreciate if someone could help me with my query.
I need it to show the difference between max salary and salary of others.
select e.ename, f.sal, e.sal
from emp e , (select max(sal) as "sal" from emp) f
where 1=1 ;
Where am I making the mistake?
Thanks in advance!
select
ename
, sal
, max_sal - sal as diff
from (
select
e.ename
, e.sal
, max(sal) over() as max_sal
from emp e
) d
I suggest using the max(sal) over() analytic (or window) function, which places the maximum salary on every row of the subquery; the difference is then a simple subtraction. No group by clause is required with this form of aggregate.
select e.ename,
e.sal sal_of_others,
(select max(sal) from emp where ename = e.ename ) - e.sal sal_diff
from emp e
There is no need for the where condition at all. You can also use a scalar subquery, as shown below:
SELECT e.ename, e.sal, (select max(sal) from emp) - E.Sal as difference
FROM emp e
You can use an analytic function to find the maximum salary - this negates the need to query the table twice:
select ename,
sal,
max(sal) over () max_overall_sal,
max(sal) over (partition by deptno) max_sal_per_dept
from emp;
I've given you two ways to find the max salary - one across the whole data set (over ()) and one per department (over (partition by deptno)) since I wasn't sure which you wanted.
If you are unfamiliar with analytic functions, I highly recommend you look them up. In their basic form, they're similar to how aggregate functions work, except instead of collapsing the rows, they report the value for each row. The partition by clause groups the data in the same way as the group by does in an aggregate query.
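The difference between collapsing rows and reporting a value on each row can be seen in a small sketch like the one below, written with Python's sqlite3 module. It assumes SQLite 3.25 or later for window functions, and the emp rows are hypothetical.

```python
import sqlite3

# Hypothetical emp rows; the values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER, deptno INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("KING", 5000, 10), ("CLARK", 2450, 10), ("SCOTT", 3000, 20), ("SMITH", 800, 20)],
)

# Aggregate query: one output row per department.
grouped = conn.execute(
    "SELECT deptno, MAX(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

# Analytic query: every employee row is kept, with the maxima attached to it.
windowed = conn.execute(
    """
    SELECT ename,
           sal,
           MAX(sal) OVER () AS max_overall_sal,
           MAX(sal) OVER (PARTITION BY deptno) AS max_sal_per_dept
    FROM emp
    ORDER BY ename
    """
).fetchall()

print(grouped)
for row in windowed:
    print(row)
```

With the maximum available on each row, the difference asked for in the question is just max_overall_sal - sal (or max_sal_per_dept - sal for the per-department variant).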
Use the query below:
select ename,sal,((select max(sal) from emp)-sal) as saldiff from emp

Selecting based on condition having multiple options

I'm totally confused about using aggregate functions after where clause or anywhere after mentioning the table_name
EMP Table as posted on http://viditkothari.co.in/post/27045365558/sql-commands-1
Query Info:
Display all the emp who have sal equal to any of the emp of deptno 30
Suggested query:
select *
from employee_4521
where sal having (select sal
from employee_4521
where deptno = 30);
Returns following error:
ERROR at line 1:
ORA-00920: invalid relational operator
with an asterisk marked under the 'h' of the having clause
There doesn't appear to be any reason to use an aggregate function here. Just use an IN or an EXISTS
select *
from employee_4521
where sal in (select sal
from employee_4521
where deptno=30);
or
select *
from employee_4521 a
where exists( select 1
from employee_4521 b
where b.deptno = 30
and a.sal = b.sal );
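As a quick sanity check that the IN and EXISTS forms select the same employees, here is a sketch using Python's sqlite3 module. The table name matches the question, but the rows are made up for illustration and SQLite stands in for Oracle.

```python
import sqlite3

# Table name from the question; the rows themselves are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_4521 (ename TEXT, sal INTEGER, deptno INTEGER)")
conn.executemany(
    "INSERT INTO employee_4521 VALUES (?, ?, ?)",
    [
        ("ALLEN", 1600, 30),
        ("WARD", 1250, 30),
        ("JONES", 1250, 20),   # same salary as WARD in deptno 30
        ("SMITH", 800, 20),    # no matching salary in deptno 30
    ],
)

with_in = conn.execute(
    """
    SELECT ename FROM employee_4521
    WHERE sal IN (SELECT sal FROM employee_4521 WHERE deptno = 30)
    ORDER BY ename
    """
).fetchall()

with_exists = conn.execute(
    """
    SELECT a.ename FROM employee_4521 a
    WHERE EXISTS (SELECT 1 FROM employee_4521 b
                  WHERE b.deptno = 30 AND a.sal = b.sal)
    ORDER BY a.ename
    """
).fetchall()

print(with_in)
print(with_exists)
```

Note that employees of department 30 match themselves, so they appear in the result along with anyone elsewhere who shares one of their salaries.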

why "ANY" isn't working properly?

I'm learning SQL using Oracle 10g. I need a query that returns the department with the most employees, to use it in an update statement. I already solved it, but I couldn't figure out why this query won't work:
select deptno
from (select deptno,
count(*) num
from emp
group by deptno)
where not num < any(select count(deptno)
from emp
group by deptno)
It puzzles me more since according to the documentation it should be equivalent and optimized into the following:
select deptno
from (select deptno,
count(*) num
from emp
group by deptno )
where not exists( select deptno,
count(*)
from emp
having count(*) > num
group by deptno)
That one works without errors. The following also work:
select deptno
from (select deptno,
count(*) num
from emp
group by deptno)
where num = (select max(alias)
from (select count(deptno) alias
from emp
group by deptno))
select deptno
from emp
group by deptno
having not count(deptno) < any( select count(deptno)
from emp
group by deptno)
Edit. Probably it'll help if I post the return values of the inner selects.
The first select returns:
Dept. Number Employees
30 6
20 5
10 3
The last one returns (3,5,6)
I checked them individually. It's also weird that if I put the values manually it works as expected and will return 30 as the department with most employees.
select deptno
from (select deptno,
count(*) num
from emp
group by deptno)
where not num < any(6,5,3)
I'm using Oracle 10g 10.2.0.1.0
Last edit, probably. Still don't know what's happening, but the behaviour is as if the last select is returning null somehow. So, even if I remove the "not", it still doesn't select anything.
If someone is interested I also found this useful:
TSQL - SOME | ANY why are they same with different names?
Read the first answer. It's probably better to avoid the use of any/some, all.
Here's a similar example which may clarify things (Standard SQL, can be easily transformed for Oracle):
WITH T
AS
(
SELECT *
FROM (
VALUES (0),
(1),
(2),
(NULL)
) AS T (c)
)
SELECT DISTINCT c
FROM T
WHERE 1 > ALL (SELECT c FROM T T2);
This returns the empty set. The comparisons 1 > 1 and 1 > 2 are FALSE, so the ALL predicate is FALSE for every row; the comparison 1 > NULL is UNKNOWN, but a single FALSE already decides the result.
However, adding the NOT operator:
WHERE NOT 1 > ALL (SELECT c FROM T T2);
returns all values in the set, including the null value. At first glance this may look surprising, but given that 1 > 2 is FALSE we can say with certainty that the value 1 is not greater than all values in the set, regardless of the null.
In this case the NOT is simply flipping the earlier result, i.e. the opposite of no rows is all rows! ;)
Further consider the negated comparison using a column (rather than the literal value 1):
WHERE NOT c > ALL (SELECT c FROM T T2);
This time it returns all rows except for the null value.
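The three results above can be reproduced with a small model of SQL's three-valued logic. This is only a sketch in Python, not how any database implements it: None stands in for both NULL and UNKNOWN, and the helper names (gt, all3, not3, gt_all) are made up for the illustration.

```python
from typing import Iterable, Optional

def gt(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """a > b under SQL rules: any NULL operand yields UNKNOWN (None)."""
    if a is None or b is None:
        return None
    return a > b

def all3(results: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND: one FALSE wins, otherwise UNKNOWN taints TRUE."""
    outcome: Optional[bool] = True
    for r in results:
        if r is False:
            return False
        if r is None:
            outcome = None
    return outcome

def not3(value: Optional[bool]) -> Optional[bool]:
    """NOT UNKNOWN stays UNKNOWN."""
    return None if value is None else (not value)

table = [0, 1, 2, None]  # the column c of T

def gt_all(x: Optional[int]) -> Optional[bool]:
    """x > ALL (SELECT c FROM T)."""
    return all3(gt(x, c) for c in table)

# A WHERE clause keeps a row only when the predicate is TRUE.
plain = [c for c in table if gt_all(1) is True]                  # WHERE 1 > ALL (...)
negated_literal = [c for c in table if not3(gt_all(1)) is True]  # WHERE NOT 1 > ALL (...)
negated_column = [c for c in table if not3(gt_all(c)) is True]   # WHERE NOT c > ALL (...)

print(plain, negated_literal, negated_column)
```

For the null row in the last query every comparison is UNKNOWN, so the predicate stays UNKNOWN even after NOT, and the row is filtered out.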
Correction (update)
not num < any(select ...)
should be the same as your other queries. You can also try this variation:
num >= ALL(select ...)
but I can't understand why yours is giving wrong results. Perhaps it is because of the precedence of NOT. Can you try this instead?
not ( num < ANY(select ...) )
Full queries:
select deptno
from (select deptno, count(*) num from emp group by deptno)
where num >= all(select count(deptno) from emp group by deptno)
and:
select deptno
from (select deptno, count(*) num from emp group by deptno)
where not ( num < any(select count(deptno) from emp group by deptno) )