Subquery with multiple averages - sql

I've been struggling with a problem that goes something like this: "Find all employees with a salary greater than the average salary of their department." My SQL subquery lumps all the departments' salaries together into one average salary, but I need a way to get each individual department's average salary.
My SQL statement looks like this:
SELECT EmployeeName
FROM dbo.EMP
WHERE Salary > (
SELECT AVG(Salary)
FROM dbo.EMP
)
GROUP BY DeptNo

Here is a quick variant:
select EmployeeName
from
dbo.EMP as a
inner join
(
SELECT DeptNo, AVG(Salary) as avgSalary
FROM dbo.EMP
GROUP BY DeptNo
) as b
on (a.DeptNo=b.DeptNo and a.Salary > b.avgSalary)
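A minimal sketch of the derived-table join above. It assumes a toy EMP table with invented sample rows and uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the dbo. schema prefix is dropped:

```python
import sqlite3

# In-memory database standing in for dbo.EMP (sample data is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EmployeeName TEXT, Salary INTEGER, DeptNo INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("Ann", 5000, 10), ("Bob", 3000, 10),                        # dept 10 average: 4000
     ("Cat", 2000, 20), ("Dan", 2500, 20), ("Eve", 3500, 20)],    # dept 20 average: ~2666.67
)

# Join each employee to the average salary of their own department only.
query = """
SELECT a.EmployeeName
FROM EMP AS a
INNER JOIN (
    SELECT DeptNo, AVG(Salary) AS avgSalary
    FROM EMP
    GROUP BY DeptNo
) AS b
  ON a.DeptNo = b.DeptNo AND a.Salary > b.avgSalary
ORDER BY a.EmployeeName
"""
above_average = [row[0] for row in conn.execute(query)]
print(above_average)  # ['Ann', 'Eve']
```

Each employee is compared only with the average of the DeptNo it joins to, which is exactly what the single overall AVG in the question could not do.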

You can just finish off that subquery to make it a correlated subquery:
SELECT EmployeeName
FROM dbo.EMP as t1
WHERE Salary > (
SELECT AVG(t2.Salary)
FROM dbo.EMP AS t2
WHERE t2.DeptNo = t1.DeptNo
)
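To check that the correlated subquery and the derived-table join agree, here is a small sketch under the same assumptions (invented rows, sqlite3 standing in for SQL Server, dbo. dropped):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EmployeeName TEXT, Salary INTEGER, DeptNo INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("Ann", 5000, 10), ("Bob", 3000, 10),
     ("Cat", 2000, 20), ("Dan", 2500, 20), ("Eve", 3500, 20)],
)

# Correlated subquery: conceptually, the inner AVG is evaluated for each outer row's DeptNo.
correlated = """
SELECT EmployeeName
FROM EMP AS t1
WHERE Salary > (SELECT AVG(t2.Salary) FROM EMP AS t2 WHERE t2.DeptNo = t1.DeptNo)
ORDER BY EmployeeName
"""

# Derived-table join: all department averages are computed up front, then joined.
joined = """
SELECT a.EmployeeName
FROM EMP AS a
JOIN (SELECT DeptNo, AVG(Salary) AS avgSalary FROM EMP GROUP BY DeptNo) AS b
  ON a.DeptNo = b.DeptNo AND a.Salary > b.avgSalary
ORDER BY a.EmployeeName
"""

correlated_rows = [r[0] for r in conn.execute(correlated)]
joined_rows = [r[0] for r in conn.execute(joined)]
print(correlated_rows == joined_rows)  # True
```

Aliasing both copies of the table (t1, t2) makes it unambiguous which DeptNo belongs to the outer query.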
Alternatively, you could use window functions:
SELECT
EmployeeName,
CASE WHEN Salary > AVG(Salary) OVER (PARTITION BY DeptNo) THEN 'X' END AS [HigherThanAverage]
FROM dbo.EMP
That gives you all employees and an indicator of whether their salary is higher than their department's average, which you could filter on later in an outer query. I figured I'd add this since it gives you some options as your query grows.
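Window functions are only allowed in the SELECT and ORDER BY clauses, so the flag cannot be tested directly in WHERE. A minimal sketch of filtering it in an outer query, assuming invented rows and sqlite3 (window functions need SQLite 3.25 or later; the T-SQL [brackets] are dropped):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EmployeeName TEXT, Salary INTEGER, DeptNo INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("Ann", 5000, 10), ("Bob", 3000, 10),
     ("Cat", 2000, 20), ("Dan", 2500, 20), ("Eve", 3500, 20)],
)

# The flag is computed per row in a derived table, then filtered outside it.
query = """
SELECT EmployeeName
FROM (
    SELECT EmployeeName,
           CASE WHEN Salary > AVG(Salary) OVER (PARTITION BY DeptNo)
                THEN 'X' END AS HigherThanAverage
    FROM EMP
) AS flagged
WHERE HigherThanAverage = 'X'
ORDER BY EmployeeName
"""
flagged = [r[0] for r in conn.execute(query)]
print(flagged)  # ['Ann', 'Eve']
```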

Related

Max Salary with a single GroupBy without Joins

Schema for EMPLOYEE
(ID, EMPLOYEENAME, SALARY, ORGANIZATIONID)
Query to solve: find the names of employees in each organization with the maximum salary, without a join.
SELECT E.*
FROM EMPLOYEE E,
(SELECT EMP.ORGANIZATIONID, MAX(EMP.SALARY) AS SALARY
FROM EMPLOYEE EMP
GROUP BY EMP.ORGANIZATIONID) MAXSALARY
WHERE MAXSALARY.SALARY = E.SALARY
AND E.ORGANIZATIONID = MAXSALARY.ORGANIZATIONID;
Is there a way to avoid the join? I am using Spark SQL API and joins cause an extra shuffle operation which is expensive. Is there a way to get the employee name while getting the max salary?
Assume that each organization has exactly one employee with the maximum salary.
You can use PARTITION BY with Spark SQL as shown below (although it requires a subquery):
SELECT E.*
FROM
(SELECT EMP.EMPLOYEENAME, EMP.ORGANIZATIONID, EMP.SALARY,
row_number() OVER (PARTITION BY ORGANIZATIONID ORDER BY SALARY DESC) AS rn
FROM EMPLOYEE EMP
) AS E
WHERE E.rn = 1
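The same ROW_NUMBER pattern can be tried outside Spark. This sketch uses sqlite3 (3.25 or later) as a stand-in, with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (ID INTEGER, EMPLOYEENAME TEXT, SALARY INTEGER, ORGANIZATIONID INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [(1, "Ann", 900, 1), (2, "Bob", 700, 1),
     (3, "Cat", 800, 2), (4, "Dan", 950, 2)],
)

# Number the rows within each organization by salary, highest first, and keep row 1.
query = """
SELECT E.ORGANIZATIONID, E.EMPLOYEENAME, E.SALARY
FROM (
    SELECT EMPLOYEENAME, ORGANIZATIONID, SALARY,
           ROW_NUMBER() OVER (PARTITION BY ORGANIZATIONID ORDER BY SALARY DESC) AS rn
    FROM EMPLOYEE
) AS E
WHERE E.rn = 1
ORDER BY E.ORGANIZATIONID
"""
top_paid = conn.execute(query).fetchall()
print(top_paid)  # [(1, 'Ann', 900), (2, 'Dan', 950)]
```

ROW_NUMBER keeps exactly one row per organization even if two employees share the maximum; swapping in RANK() would keep all tied employees instead.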
Try this:
SELECT P.ORGANIZATIONID, P.EMPLOYEENAME
FROM EMPLOYEE P
WHERE P.SALARY = (SELECT MAX(E.SALARY) FROM EMPLOYEE E WHERE P.ORGANIZATIONID = E.ORGANIZATIONID)
GROUP BY P.ORGANIZATIONID, P.EMPLOYEENAME
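Try this (note that it matches on salary alone, so an employee whose salary equals another organization's maximum is returned too):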
SELECT EMPLOYEENAME FROM EMPLOYEE
WHERE SALARY IN (SELECT MAX(SALARY) FROM EMPLOYEE GROUP BY ORGANIZATIONID)
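A small sketch of that pitfall, plus one possible fix that matches (organization, salary) pairs with a row-value IN. The rows are invented, sqlite3 (3.15 or later) is the stand-in engine, and support for row-value IN varies between databases:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMPLOYEENAME TEXT, SALARY INTEGER, ORGANIZATIONID INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("Ann", 900, 1), ("Bob", 800, 1),   # org 1 max: 900
     ("Cat", 800, 2), ("Dan", 700, 2)],  # org 2 max: 800
)

# Salary only: Bob (800, org 1) wrongly matches org 2's maximum.
salary_only = """
SELECT EMPLOYEENAME FROM EMPLOYEE
WHERE SALARY IN (SELECT MAX(SALARY) FROM EMPLOYEE GROUP BY ORGANIZATIONID)
ORDER BY EMPLOYEENAME
"""

# (organization, salary) pairs: each employee is compared with their own org's maximum.
per_org = """
SELECT EMPLOYEENAME FROM EMPLOYEE
WHERE (ORGANIZATIONID, SALARY) IN
      (SELECT ORGANIZATIONID, MAX(SALARY) FROM EMPLOYEE GROUP BY ORGANIZATIONID)
ORDER BY EMPLOYEENAME
"""

wrong = [r[0] for r in conn.execute(salary_only)]
right = [r[0] for r in conn.execute(per_org)]
print(wrong)  # ['Ann', 'Bob', 'Cat']
print(right)  # ['Ann', 'Cat']
```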

SQL combining two queries inside the same table without a subquery

A table, Employee, has columns EmployeeID and Salary.
How do I find the EmployeeIDs that have a salary greater than the average salary?
Using a subquery:
SELECT EmployeeID
FROM Employee
WHERE Salary > (SELECT AVG(Salary)
FROM Employee);
Is it possible using joins?
Is any other method possible?
It seems really silly to me to write it this way, but here it is without any subqueries:
SELECT a.EmployeeID
FROM Employee a
CROSS JOIN Employee b
GROUP BY a.EmployeeID, a.Salary
HAVING a.Salary > AVG(b.Salary)
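A quick sketch confirming that the CROSS JOIN/HAVING form returns the same IDs as the subquery, using sqlite3 and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(1, 1000), (2, 2000), (3, 3000), (4, 6000)])  # average: 3000

# Each row of a is paired with every row of b, so AVG(b.Salary) within a
# group is the overall average salary.
cross_join = """
SELECT a.EmployeeID
FROM Employee a
CROSS JOIN Employee b
GROUP BY a.EmployeeID, a.Salary
HAVING a.Salary > AVG(b.Salary)
ORDER BY a.EmployeeID
"""
subquery = """
SELECT EmployeeID FROM Employee
WHERE Salary > (SELECT AVG(Salary) FROM Employee)
ORDER BY EmployeeID
"""
cj = [r[0] for r in conn.execute(cross_join)]
sq = [r[0] for r in conn.execute(subquery)]
print(cj, sq)  # [4] [4]
```

The cross join pairs every employee with every other (n × n rows) before grouping, which is why it gets expensive quickly on a large table.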
SELECT EmployeeID
FROM Employee, (SELECT AVG(Salary) avg_salary
FROM Employee) sal
WHERE Salary > sal.avg_salary;
Move the average salary calculation into the FROM clause so that it is calculated once. By the way, most modern databases can optimize your original query to calculate it only once anyway.
SELECT * FROM
(SELECT EmployeeID, Salary, AVG(Salary) OVER () AS avg_salary FROM Employee) t
WHERE Salary > avg_salary
SELECT e.EmployeeID FROM Employee e
JOIN
(
SELECT avg(Salary) as Salary FROM Employee
) e1
ON e.Salary > e1.Salary
Declare @AverageSalary as Money;
Select @AverageSalary = AVG(Salary) From Employee;
Select * from Employee Where Salary > @AverageSalary;
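The same two-step idea from application code, sketched with Python's sqlite3 and invented rows: compute the average once, then pass it to the second query as a bound parameter (the ? placeholder plays the role of the T-SQL variable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(1, 1000), (2, 2000), (3, 3000), (4, 6000)])

# Step 1: compute the average once.
(average_salary,) = conn.execute("SELECT AVG(Salary) FROM Employee").fetchone()

# Step 2: reuse it as a bound parameter instead of aggregating again.
rows = conn.execute(
    "SELECT EmployeeID FROM Employee WHERE Salary > ? ORDER BY EmployeeID",
    (average_salary,),
).fetchall()
print(average_salary, rows)  # 3000.0 [(4,)]
```

Because these are two separate statements, the table could change between them on a busy database unless both run inside one suitably isolated transaction.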

What's wrong with my simple SQL query?

What is wrong with this SQL query?
SELECT
department_id, MAX(AVG(SALARY))
FROM
EMPLOYEES
GROUP BY
department_id;
It fails with the error "not a single-group group function".
You can't nest one aggregate function inside another while also selecting a grouped column like department_id; use a subquery to achieve your result.
I can't test it right now, so no guarantees on this query, but it should give you the idea.
select max(avg_salary)
from (select department_id, avg(SALARY) AS avg_salary
from EMPLOYEES
group by department_id) t;
The inner query selects department_id and the average salary.
The average salary is given the alias avg_salary using the AS keyword.
The outer query selects the maximum of avg_salary.
That may not be a complete solution to your problem and, as I said, it's not tested, so no guarantees, but you should now have an idea of how to start. ;-)
You can't nest aggregate functions like this. Try this one:
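Since that answer is untested, here is a small sketch that runs the same derived-table query with sqlite3 standing in for Oracle, on invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (department_id INTEGER, SALARY INTEGER)")
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?)",
                 [(10, 1000), (10, 3000),   # average 2000
                  (20, 4000), (20, 2000),   # average 3000
                  (30, 2500)])              # average 2500

# Inner query: one average per department. Outer query: the largest of those averages.
query = """
SELECT MAX(avg_salary)
FROM (SELECT department_id, AVG(SALARY) AS avg_salary
      FROM EMPLOYEES
      GROUP BY department_id) t
"""
(max_avg,) = conn.execute(query).fetchone()
print(max_avg)  # 3000.0
```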
select distinct max(average) over () as max_average
from (SELECT department_id dept,
(AVG(SALARY) OVER (PARTITION BY department_id)) average
FROM employees) t;
Alternative 1, double GROUP BY:
SELECT department_id, AVG(SALARY)
FROM EMPLOYEES
GROUP BY department_id
HAVING AVG(SALARY) = (select max(avg_sal)
from (select avg(salary) as avg_sal
from EMPLOYEES
group by department_id))
This returns both department_ids if there's a tie!
Alternative 2, use a cte (common table expression):
with cte as
(
SELECT department_id, AVG(SALARY) as avg_sal
FROM EMPLOYEES
GROUP BY department_id
)
select department_id, avg_sal
from cte
where avg_sal = (select max(avg_sal) from cte)
This too returns both department_ids if there's a tie!
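A sketch of the CTE version with a deliberate tie, using sqlite3 and invented rows, to show that both departments come back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (department_id INTEGER, SALARY INTEGER)")
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?)",
                 [(10, 1000), (10, 5000),   # average 3000
                  (20, 3000),               # average 3000 (tie)
                  (30, 2000), (30, 2400)])  # average 2200

# The CTE is referenced twice: once for the rows, once for the maximum.
query = """
WITH cte AS (
    SELECT department_id, AVG(SALARY) AS avg_sal
    FROM EMPLOYEES
    GROUP BY department_id
)
SELECT department_id, avg_sal
FROM cte
WHERE avg_sal = (SELECT MAX(avg_sal) FROM cte)
ORDER BY department_id
"""
tied = conn.execute(query).fetchall()
print(tied)  # [(10, 3000.0), (20, 3000.0)]
```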

Employee that has a higher salary than the AVERAGE of his department - optimized

We have only one table, named EMPLOYEESALARY, in our database, with the following 3 columns:
Employee_ID, Employee_Salary, Department_ID
Now I have to SELECT every employee that has a higher salary than the AVERAGE of his department. How do I do that?
I know this is a repeat question, but the best solution I found everywhere was:
SELECT * from employee join (SELECT AVG(employee_salary) as sal, department_ID
FROM employee GROUP BY Department_ID) as t1
ON employee.department_ID = t1.department_ID
where employee.employee_salary > t1.sal
Can we optimize it further and do it without a subquery?
Reference:
SELECT every employee that has a higher salary than the AVERAGE of his department
Employees with higher salary than their department average?
Find the schema here to test: SQL Fiddle
Can we do it without a subquery?
Not that I can think of. Had the condition been >=, the following would have worked:
SELECT TOP 1 WITH TIES *
FROM employee
ORDER BY CASE
WHEN employee_salary >= AVG(employee_salary)
OVER (
PARTITION BY Department_ID) THEN 0
ELSE 1
END
But this is not an optimisation, and it won't work correctly for the > condition if no employee earns more than their department's average (i.e. within every department, all employees have the same salary): every row would then sort as 1, and all of them would be returned as ties.
Can we optimize it further?
You could shorten the syntax a bit with
WITH T AS
(
SELECT *,
AVG(employee_salary) OVER (PARTITION BY Department_ID) AS sal
FROM employee
)
SELECT *
FROM T
WHERE employee_salary > sal
but it still has to do much the same work.
Assuming suitable indexes on the base table already exist, the only way to avoid more of that work at SELECT time would be to pre-calculate the grouped SUM and COUNT_BIG in an indexed view grouped by Department_ID (so that the average can be derived cheaply).
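To illustrate how an average is derived from stored SUM and COUNT values, here is a rough sketch. SQLite has no indexed views, so a plain summary table with the hypothetical name dept_salary_totals stands in; unlike an indexed view, it is not kept up to date automatically. The rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, "
             "employee_salary INTEGER, department_id INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, 5000, 10), (2, 3000, 10),                 # dept 10 average: 4000
                  (3, 2000, 20), (4, 4000, 20), (5, 3600, 20)])  # dept 20 average: 3200

# Hypothetical stand-in for the indexed view: grouped SUM and COUNT per department.
conn.execute("""
CREATE TABLE dept_salary_totals AS
SELECT department_id, SUM(employee_salary) AS total_salary, COUNT(*) AS headcount
FROM employee
GROUP BY department_id
""")

# The average is derived from the stored totals (multiplying by 1.0 avoids integer division).
query = """
SELECT e.employee_id
FROM employee e
JOIN dept_salary_totals d ON d.department_id = e.department_id
WHERE e.employee_salary > d.total_salary * 1.0 / d.headcount
ORDER BY e.employee_id
"""
above = [r[0] for r in conn.execute(query)]
print(above)  # [1, 4, 5]
```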
A more optimal form is likely to be:
select e.*
from (select e.*, avg(employee_salary) over (partition by department_id) as avgs
from employee e
) e
where employee_salary > avgs;
This (as well as the other versions) can use an index on employee(department_id, employee_salary). The final WHERE probably won't benefit from an index, because it selects a large share of the rows.

How can I get all employees with a salary less than the average salary?

I can get the count of employees and the average salary, but when I try to add a select that lists the number of employees paid below the average, it fails.
select count(employee_id),avg(salary)
from employees
Where salary < avg(salary);
select count(*), (select avg(salary) from employees)
from employees
where salary < (select avg(salary) from employees);
The problem is that AVG is an aggregation function, and an aggregate can't be used in WHERE, which filters individual rows before any aggregation happens. The traditional way is to use a join:
select count(*), avg(e.salary),
sum(case when e.salary < const.AvgSalary then 1 else 0 end) as NumBelowAverage
from employees e cross join
(select avg(salary) as AvgSalary from employees) as const
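A sketch of that cross join running end to end, with sqlite3 as the stand-in engine and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, 1000), (2, 2000), (3, 3000), (4, 6000)])  # average 3000

# The one-row derived table "const" is cross joined to every employee, so the
# overall average is available on each row for the conditional SUM.
query = """
SELECT COUNT(*), AVG(e.salary),
       SUM(CASE WHEN e.salary < const.AvgSalary THEN 1 ELSE 0 END) AS NumBelowAverage
FROM employees e
CROSS JOIN (SELECT AVG(salary) AS AvgSalary FROM employees) AS const
"""
result = conn.execute(query).fetchone()
print(result)  # (4, 3000.0, 2)
```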
select TotalNumberOfEmployees,
AverageSalary,
count(e.employee_id) NumberOfEmployeesBelowAverageSalary
from (
select count(employee_id) TotalNumberOfEmployees,
avg(salary) AverageSalary
from employees
) preagg
left join employees e on e.salary < preagg.AverageSalary
group by TotalNumberOfEmployees,
AverageSalary
Note: I used a LEFT join so that if, say, all 3 employees earned the same salary (nobody below average), it would show 0 instead of returning no rows.
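A sketch of that edge case, with sqlite3 and three invented employees on equal pay. The helper below_average_report and its join_type parameter are hypothetical, added only to compare LEFT and INNER joins side by side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER)")
# Three employees with equal pay: nobody is below the average.
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, 2000), (2, 2000), (3, 2000)])


def below_average_report(join_type):
    # Hypothetical helper; only the two join keywords are accepted.
    if join_type not in ("LEFT", "INNER"):
        raise ValueError("join_type must be LEFT or INNER")
    sql = f"""
    SELECT TotalNumberOfEmployees, AverageSalary,
           COUNT(e.employee_id) AS NumberOfEmployeesBelowAverageSalary
    FROM (SELECT COUNT(employee_id) AS TotalNumberOfEmployees,
                 AVG(salary) AS AverageSalary
          FROM employees) preagg
    {join_type} JOIN employees e ON e.salary < preagg.AverageSalary
    GROUP BY TotalNumberOfEmployees, AverageSalary
    """
    return conn.execute(sql).fetchall()


print(below_average_report("LEFT"))   # [(3, 2000.0, 0)]
print(below_average_report("INNER"))  # []
```

With LEFT JOIN the pre-aggregated row survives with NULL employee columns, and COUNT(e.employee_id) turns that into 0; with INNER JOIN the row disappears entirely.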
It isn't clear which columns you want in your result set, which makes it difficult to answer your question. Making the question clear improves the quality of the answers.
You seem to want 3 facts:
Number of employees.
Average salary.
Number of employees earning less than the average salary.
And you show a query which does the job for the first two facts:
SELECT COUNT(*) AS NumberOfEmployees,
AVG(Salary) AS AverageSalary
FROM Employees
What's the difference between COUNT(*) and COUNT(Employee_ID)? The difference is that the latter only counts the rows where Employee_ID is not NULL. A good optimizer will recognize that Employee_ID is a primary key and contains no NULL values, and will treat both forms the same. But COUNT(*) is more conventional and less reliant on the optimizer.
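A small sketch of the difference, using sqlite3 and a hypothetical nullable Manager_ID column (not in the question) next to the primary key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Manager_ID is a hypothetical nullable column used only for comparison.
conn.execute("CREATE TABLE Employees (Employee_ID INTEGER PRIMARY KEY, Manager_ID INTEGER)")
conn.executemany("INSERT INTO Employees VALUES (?, ?)", [(1, None), (2, 1), (3, 1)])

# COUNT(*) counts rows; COUNT(column) skips rows where that column is NULL.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(Employee_ID), COUNT(Manager_ID) FROM Employees"
).fetchone()
print(counts)  # (3, 3, 2)
```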
The other statistic can be generated as a simple value in the select-list via a sub-query:
SELECT COUNT(*) AS NumberOfEmployees,
AVG(Salary) AS AverageSalary,
(SELECT COUNT(*)
FROM Employees
WHERE Salary < (SELECT AVG(Salary) FROM Employees)
) AS NumberOfEmployeesPaidSubAverageWages
FROM Employees
Under many circumstances, it would not be appropriate to write the sub-query like that, but for the interpretation of the specified query, it is fine.
select * from <table name> where <salary column name> < (select avg(<salary column name>) from <table name>);
Example:
select * from EMPLOYEE where sal < (select avg(sal) from EMPLOYEE);
SELECT e.ename,e.deptno,e.sal,d.avg
FROM emp e,(SELECT deptno, avg(sal) avg
FROM emp
GROUP BY deptno) d
WHERE e.deptno=d.deptno
AND
e.sal < d.avg