Query to find No. of employees earning maximum salary

I am trying to find the number of employees in a table that earn exactly the maximum salary of all the employees in the table called tblPerson.
Select Max(x.[No of Employees]) as Number, x.Salary as Salary
from
(
Select Count(Id) as [No of Employees], Salary
from tblPerson
Group by Salary
Having Salary = MAX(Salary)
)x
where x.[No of Employees]=3
I know this is a long and complex way of doing it, but I was trying to do it using a derived table. I am getting this error:
"Column 'x.Salary' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause"
My question is: why am I getting this particular error, given that the outer query is a simple SELECT statement with a WHERE clause?

In general, a select list that contains an aggregate function may contain only other aggregate functions or columns listed in GROUP BY.
Why? Because an aggregate function needs to know which set of rows to calculate over.
In your case, MAX() uses all the rows available and produces a single result (a single row), while the other column, x.Salary, would be displayed row by row. So there's a conflict.
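
To make the conflict concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of SQL Server (an assumption made only so the example is self-contained), and the tblPerson rows are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data: three people earn 7000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblPerson (Id INTEGER PRIMARY KEY, Salary INTEGER)")
conn.executemany(
    "INSERT INTO tblPerson (Id, Salary) VALUES (?, ?)",
    [(1, 5000), (2, 7000), (3, 7000), (4, 7000), (5, 4000)],
)

# An aggregate with no GROUP BY collapses the whole table into a single row.
max_salary = conn.execute("SELECT MAX(Salary) FROM tblPerson").fetchone()[0]

# Grouping by Salary yields one row per distinct salary, so Salary may be
# selected next to COUNT(Id). Sorting by Salary puts the maximum first.
top_salary, top_count = conn.execute(
    "SELECT Salary, COUNT(Id) FROM tblPerson "
    "GROUP BY Salary ORDER BY Salary DESC LIMIT 1"
).fetchone()

print(max_salary, top_salary, top_count)
```

SQLite uses LIMIT 1 where SQL Server uses TOP 1; the grouping rule itself is the same in both.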

Thank you all. Every answer helped me. However, I think I found a pretty simple way to do it:
Select top 1 Count(Id) as [No of Employees], salary
from tblPerson
Group by Salary
Order by Salary DESC -- sort by salary, not by count, so the top row is the maximum salary

select count(*) from tblPerson where salary=(select max(salary) from tblPerson)

You get the error because MAX is an aggregate, while the outer query has no GROUP BY to tell it how to group x.Salary.
Select Max(x.[No of Employees]) as Number, x.Salary as Salary
from
(
Select Count(Id) as [No of Employees], Salary
from tblPerson
Group by Salary
Having Salary = MAX(Salary)
)x
where x.[No of Employees] = 3
Group by x.Salary -- added line: every non-aggregated column in the select list must appear here
However, you can also use a temporary table or a variable to find the persons.
To solve this with a variable, you could do the following:
declare @maxSalary Decimal(18, 2) -- precision is an assumption; match the type of the salary column
set @maxSalary = (Select max(salary) from tblperson) -- store the max value in a variable
Then either count the persons or apply some other logic, for example:
Select ID from tblperson where salary = @maxSalary
The reason for not using a group by is that the variable version only filters the table on a single value instead of aggregating over every salary group, which is usually cheaper.

Create a CTE (RESULT) that uses the DENSE_RANK function to rank the salaries, together with the EmployeeIDs.
The rows of RESULT with DenseRank = 1 hold the highest salary.
Using the aggregate function COUNT, get the number of employees with the highest salary.
with RESULT (EmployeeID, Salary, DenseRank) as
(select EmployeeID, Salary,
DENSE_RANK() over (ORDER BY Salary DESC) AS DenseRank
from Employee)
select TOP 1 Salary,
(select COUNT(EmployeeID)
from Employee
where Salary = (select TOP 1 Salary
from RESULT
where DenseRank = 1)
) as EmployeeCount
from RESULT
where DenseRank = 1;

Related

group by needed for window function?

I have this schema for employee table:
id int
first_name varchar
last_name varchar
department_id int
department_name varchar
position varchar
I want to rank departments by size. This works:
select
department_id d_id,
rank() over (order by count(*) desc) r
from employees
group by department_id
What I don't understand is why group by is required. If I remove it I get this error:
column "employee.department_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 2: department_id d_id,

This query:
select department_id d_id,
count(*) number_of_employees
from az_employees
group by department_id
returns 1 row for each department with the number of employees in the department.
Your query uses RANK() window function to rank the departments based on the results of the aggregate function count(*):
rank() over (order by count(*) desc) r
RANK() operates on the results of the aggregate query (1 row for each department with 2 columns: department_id and count(*)) and returns 1 more column for each department.
It would be the same as if you used the aggregate query as a subquery:
select d_id, rank() over (order by number_of_employees desc) r
from (
select department_id d_id,
count(*) number_of_employees
from az_employees
group by department_id
) t
but your query is simpler.
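
The equivalence described above can be checked with a small sketch. It assumes Python's built-in sqlite3 module (with SQLite 3.25 or newer for window functions) rather than the asker's database, and a made-up employees table with three departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, department_id INTEGER)"
)
# Hypothetical data: department 10 has 3 employees, 30 has 2, 20 has 1.
conn.executemany(
    "INSERT INTO employees (department_id) VALUES (?)",
    [(10,), (10,), (10,), (20,), (30,), (30,)],
)

# One step: the window function ranks the rows produced by GROUP BY.
one_step = conn.execute(
    "SELECT department_id, RANK() OVER (ORDER BY COUNT(*) DESC) AS r "
    "FROM employees GROUP BY department_id ORDER BY r, department_id"
).fetchall()

# Two steps: aggregate in a subquery first, then rank its rows.
two_step = conn.execute(
    "SELECT d_id, RANK() OVER (ORDER BY n DESC) AS r "
    "FROM (SELECT department_id AS d_id, COUNT(*) AS n "
    "      FROM employees GROUP BY department_id) AS t "
    "ORDER BY r, d_id"
).fetchall()

print(one_step == two_step, one_step)
```

Both forms return one row per department, which is why removing GROUP BY breaks the query: without it there is no per-department row for RANK() to work on.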

This is sort of an answer to a different question, but it might be what you really want.
select
department_id d_id,
count(*) count -- this line can be removed
from az_employees
group by department_id
having count(*) > 0 -- this line can be removed; if you change zero to one, only departments with more than one employee are shown
order by count(*)
As the previous answer explained, group by is used to aggregate, so there is no reason to go into details here.
I don't know your exact scenario, but it is generally a bad habit to discard the original data unless there is some gain from it. A rank is usually only used to sort by later on, and that can be done just as easily with the original value.
There are of course situations where rank is a good solution, but I don't think this is one of them.

Why doesn't DISTINCT work in this case? (SQL)

SELECT DISTINCT
employees.departmentname,
employees.firstname,
employees.salary,
employees.departmentid
FROM employees
JOIN (
SELECT MAX(salary) AS Highest, departmentID
FROM employees
GROUP BY departmentID
) departments ON employees.departmentid = departments.departmentid
AND employees.salary = departments.highest;
Why doesn't the DISTINCT work here?
I'm trying to have each department to show only once because the question is asking the highest salary in each department.

Use the ROW_NUMBER() function, as in:
select departmentname, firstname, salary, departmentid
from (
select e.*,
row_number() over(partition by departmentid order by salary desc) as rn
from employees e
) x
where rn = 1

I'm trying to have each department to show only once because the question is asking the highest salary in each department.
Use window functions:
SELECT e.*
FROM (SELECT e.*,
ROW_NUMBER() OVER (PARTITION BY departmentID ORDER BY salary DESC) as seqnum
FROM employees e
) e
WHERE seqnum = 1;
This is guaranteed to return one row per department, even when there are ties. If you want all rows when there are ties, use RANK() instead.
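
A short sketch of the difference between the two functions when salaries tie. It assumes Python's built-in sqlite3 module (SQLite 3.25 or newer) instead of the asker's database; Omar and Ravi and all salary figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (firstname TEXT, departmentid INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Gwen", 1, 900), ("Elena", 1, 900), ("Paula", 1, 900),
     ("Omar", 1, 500), ("Ravi", 2, 700)],
)

def top_rows(window_fn):
    # window_fn is one of the two fixed names used below, never user input.
    sql = f"""
        SELECT firstname, departmentid, salary
        FROM (SELECT e.*,
                     {window_fn}() OVER (PARTITION BY departmentid
                                         ORDER BY salary DESC) AS seqnum
              FROM employees e) AS ranked
        WHERE seqnum = 1
    """
    return conn.execute(sql).fetchall()

one_per_department = top_rows("ROW_NUMBER")  # ties are broken arbitrarily
all_top_earners = top_rows("RANK")           # every tied row is kept

print(len(one_per_department), len(all_top_earners))
```

With this data ROW_NUMBER() keeps two rows (one per department), while RANK() keeps four, because the three tied employees in department 1 all receive rank 1.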

Why doesn't the DISTINCT work here?
DISTINCT is not a function; it is a keyword that will eliminate duplicate rows when ALL the column values are duplicates. It does NOT apply to a single column.
The DISTINCT keyword has "worked" (i.e. done what it is intended to do) because there are no rows where all the column values are a duplicate of another row's values.
However, it hasn't solved your problem because DISTINCT is not the correct solution to your problem. For that, you want to "fetch the row which has the max value for a column [within each group]" (as per this question).
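
The whole-row behaviour of DISTINCT can be seen in a minimal sketch, again assuming Python's built-in sqlite3 module and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(departmentname TEXT, firstname TEXT, salary INTEGER, departmentid INTEGER)"
)
# Hypothetical rows: the third row is an exact duplicate of the first.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [("Sales", "Gwen", 900, 1),
     ("Sales", "Elena", 900, 1),
     ("Sales", "Gwen", 900, 1)],
)

# DISTINCT compares entire rows: only the exact duplicate is removed.
whole_rows = conn.execute(
    "SELECT DISTINCT departmentname, firstname, salary, departmentid "
    "FROM employees"
).fetchall()

# With a single column selected, the 'row' is just that column.
departments = conn.execute(
    "SELECT DISTINCT departmentname FROM employees"
).fetchall()

print(len(whole_rows), len(departments))
```

Gwen and Elena share a department and a salary but differ in firstname, so both rows survive DISTINCT.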

Gwen, Elena and Paula all have the same salary and they are in the same department.

Row_number function using directly

Is there any direct way of using the row_number() function? I want to find the 2nd highest salary.
SELECT DISTINCT id
,salary
,depid
,ROW_NUMBER() OVER (
PARTITION BY depid ORDER BY salary DESC
) AS rownum
FROM emp
WHERE rownum = 2;
It gives an error. However, the code below works fine.
SELECT *
FROM (
SELECT DISTINCT id
,salary
,depid
,ROW_NUMBER() OVER (
PARTITION BY depid ORDER BY salary DESC
) AS rownum
FROM emp
) AS t
WHERE t.rownum = 2;
Is there any way of using the row_number() function directly, as in the first option, which is giving the error?

You cannot reference a column alias defined in the select list from the WHERE clause of the same query. You also cannot use window functions as a condition in the WHERE clause.
For a detailed explanation, see "Why no windowed functions in where clauses?". This is why you need an outer query around the inner query, i.e. a subquery.
You can get the Nth highest salary in SQL Server with the query below (replace N with the position you want).
SELECT TOP 1 salary
FROM (
SELECT DISTINCT TOP N salary
FROM <YourTableNameHere>
ORDER BY salary DESC
) AS TEMP
ORDER BY salary

Will this query give you the second highest salary? No.
SELECT id
,salary
,depid
from emp
ORDER BY salary DESC
OFFSET 1 ROWS
FETCH FIRST 1 ROWS ONLY;
It actually gives you the salary in the second position when you order the salaries from highest to lowest. So if the highest is 100 and the second row is also 100, you get 100 as the result. In other words, it returns whichever row lands in second place according to the ORDER BY clause.
This next query will give you the second highest salary :
SELECT max(id)
, salary
, max(depid)
from emp
group by salary
ORDER BY salary DESC
OFFSET 1 ROWS
FETCH FIRST 1 ROWS ONLY;
But be aware: if two employees from two different departments have the same salary, it returns the higher id and the higher department id, and those two values may not belong to the same employee.
Finally, this gives you one employee who has the second largest salary, with correct data:
SELECT id
, salary
, depid
from emp
where id = (SELECT max(id)
from emp
group by depid, salary
ORDER BY salary DESC
OFFSET 1 ROWS
FETCH FIRST 1 ROWS ONLY);
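
The gap between "second position" and "second highest" can be sketched as follows. This assumes Python's built-in sqlite3 module, where LIMIT 1 OFFSET 1 plays the role of OFFSET 1 ROWS FETCH FIRST 1 ROWS ONLY, and the emp rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, salary INTEGER, depid INTEGER)")
# Hypothetical data: two employees share the top salary of 100.
conn.executemany(
    "INSERT INTO emp (id, salary, depid) VALUES (?, ?, ?)",
    [(1, 100, 1), (2, 100, 2), (3, 90, 1), (4, 80, 2)],
)

# Second row of the sorted list: still the top salary because of the tie.
second_position = conn.execute(
    "SELECT salary FROM emp ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

# Second distinct value: duplicates are removed before skipping a row.
second_highest = conn.execute(
    "SELECT DISTINCT salary FROM emp ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

print(second_position, second_highest)
```

Removing duplicates first, with DISTINCT or GROUP BY salary, is what turns a positional lookup into a lookup of the second highest value.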

First, you want dense_rank(), not row_number() if you want the second highest value -- ties might get in the way otherwise.
You can use an arithmetic trick:
SELECT TOP (1) WITH TIES id, salary, depid
FROM emp
ORDER BY ABS(DENSE_RANK() over (PARTITION BY depid ORDER BY salary DESC) - 2)
The "- 2" is an arithmetic trick that makes the rows with rank 2 sort first, because ABS(rank - 2) is 0 only for them.
That said, I would stick with the subquery because the intent is clearer.

You could use a variation on the trick that uses a TOP 1 WITH TIES in combination with an ORDER BY ROW_NUMBER
SELECT TOP 1 WITH TIES
id,
salary,
depid
FROM emp
ORDER BY IIF(2 = ROW_NUMBER() OVER (PARTITION BY depid ORDER BY salary DESC), 1, 2)
But this trick does have the disadvantage that you can't sort it by something else.
Well, not unless you wrap it in a sub-query and sort the outer query.

I prefer to use dense_rank() instead of the row_number() function, together with a CTE (common table expression), for the scenario you have mentioned. A CTE is easy to use, can be combined with DML statements (delete, update, insert), and makes the code easier to understand.
To find the Nth highest salary, the CTE looks like this:
;with findnthsalary
as
(
select empid, deptid, salary,
dense_rank() over(partition by deptid order by salary desc) salrank
from
Employee
)
select distinct empid, deptid, salary
from findnthsalary
where salrank = N
I used dense_rank() because if you use row_number() it will produce the wrong result in case multiple employees have the same salary in the same department.
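
A small sketch of why the choice of function matters. It assumes Python's built-in sqlite3 module (SQLite 3.25 or newer) and a hypothetical Employee table in which two people in department 1 share the top salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (empid INTEGER, deptid INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, 1, 900), (2, 1, 900), (3, 1, 700), (4, 2, 800), (5, 2, 600)],
)

def nth_salary(rank_column, n):
    # rank_column is one of the fixed aliases 'rn' or 'dr', never user input.
    sql = f"""
        WITH ranked AS (
            SELECT deptid, salary,
                   ROW_NUMBER() OVER (PARTITION BY deptid ORDER BY salary DESC) AS rn,
                   DENSE_RANK() OVER (PARTITION BY deptid ORDER BY salary DESC) AS dr
            FROM Employee
        )
        SELECT DISTINCT deptid, salary
        FROM ranked
        WHERE {rank_column} = ?
        ORDER BY deptid
    """
    return conn.execute(sql, (n,)).fetchall()

by_row_number = nth_salary("rn", 2)  # position 2 within each department
by_dense_rank = nth_salary("dr", 2)  # 2nd distinct salary within each department

print(by_row_number, by_dense_rank)
```

For department 1, ROW_NUMBER() reports 900 as the "second" salary because of the tie, while DENSE_RANK() reports 700.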

Using MAX() without grouping by

I'm using Oracle APEX.
I have a table with names and salaries. I want to get the name with the highest salary using MAX(salary).
So the query is like this:
SELECT NAME FROM EMPLOYEE
GROUP BY NAME
HAVING MAX(SALARY) = SALARY;
This doesn't work; the error ORA-00979: not a GROUP BY expression appears. So I use this to stop that error:
SELECT NAME FROM EMPLOYEE
GROUP BY NAME, SALARY
HAVING MAX(SALARY) = SALARY;
This groups every distinct name and salary pair into its own row and returns the max salary of each row. Since every salary is different, it returns every row.
How do I group everything into one single big group without modifying the tables? I mean, I want this to work:
SELECT NAME FROM EMPLOYEE
WHERE MAX(SALARY) = SALARY;
But with HAVING. It's simple really, but I can't find the way.

Use a subquery:
SELECT NAME FROM EMPLOYEE
where salary = (select max( salary) from EMPLOYEE)

You need a WHERE clause and compare the salary with the max salary of the table:
SELECT NAME FROM EMPLOYEE
WHERE SALARY = (SELECT MAX(SALARY) FROM EMPLOYEE);

You can get the name of person who has max salary using following nested query.
select NAME
from EMPLOYEE
where SALARY= ( select max(SALARY)
from EMPLOYEE )

Select records with maximum value

I have a table that is called: customers.
I'm trying to get the name and the salary of the people who have the maximum salary.
So I have tried this:
SELECT name, salary AS MaxSalary
FROM CUSTOMERS
GROUP BY salary
HAVING salary = max(salary)
Unfortunately, I got this error:
Column 'CUSTOMERS.name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I know I should add the name column to the group by clause, but I get all the records of the table.
I know that I can do it by:
SELECT name, salary
FROM CUSTOMERS
WHERE salary = (SELECT MAX(salary) FROM CUSTOMERS)
But I want to achieve it by group by and having clauses.

This requirement isn't really suited to a group by and having solution. The easiest way to do it, assuming you're using a modern-ish version of MS SQL Server, is to use the rank window function:
SELECT name, salary
FROM (SELECT name, salary, RANK() OVER (ORDER BY salary DESC) rk
FROM customers) c
WHERE rk = 1

Mureinik's answer is good with rank, but if you didn't want a windowed function for whatever reason, you can just use a CTE or a subquery.
with mxs as (
select
max(salary) max_salary
from
customers
)
select
name
,salary
from
customers cst
join mxs on mxs.max_salary = cst.salary

There was no need to use the GROUP BY and HAVING clauses there. But if you want to use them, then name has to be in the GROUP BY as well, and the maximum has to come from a subquery:
SELECT name, salary
FROM CUSTOMERS
GROUP BY name, salary
HAVING salary = (SELECT MAX(salary) FROM CUSTOMERS)
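
As a rough check, the sketch below compares a GROUP BY / HAVING form that lists both selected columns in GROUP BY with the plain WHERE form. It assumes Python's built-in sqlite3 module and three invented customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, salary INTEGER)")
# Hypothetical data: Bob and Cid share the maximum salary.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", 3000), ("Bob", 5000), ("Cid", 5000)],
)

# Every selected column is in GROUP BY; the maximum comes from a subquery.
having_rows = conn.execute(
    "SELECT name, salary FROM customers "
    "GROUP BY name, salary "
    "HAVING salary = (SELECT MAX(salary) FROM customers) "
    "ORDER BY name"
).fetchall()

# The simpler form without grouping.
where_rows = conn.execute(
    "SELECT name, salary FROM customers "
    "WHERE salary = (SELECT MAX(salary) FROM customers) "
    "ORDER BY name"
).fetchall()

print(having_rows == where_rows, having_rows)
```

One difference to keep in mind: grouping by name and salary collapses customers who share both values into a single row, which the WHERE form does not do.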
