SQL - Retrieving data within groups before and after some condition - sql

With the two following tables:
EMPLOYEE (Fname, Lname, SSN, DNO)
DEPARTMENT (Dname, Dnumber)
For each department that has more than five employees, retrieve the
department name and the number of its employees who are making more
than $40,000
Here is an incorrect solution to this:
SELECT
dname,
COUNT(*)
FROM
Department, Employee
WHERE
dnumber = dno
AND salary > 40000
GROUP BY
dname
HAVING
COUNT(*) > 5;
It is clear that it would not list any department that have five or more employees unless they all have more than $40,000 salary, because where is applied before group by clause. which is not what we want.
Here is the correct solution:
SELECT
dname, COUNT(*)
FROM
Department, Employee
WHERE
dnumber = dno
AND salary > 40000
AND dno IN (SELECT dno
FROM Employee
GROUP BY dno
HAVING COUNT(*) > 5)
GROUP BY
dname
I cant see why is this correct?
Isn't it going to restrict the rows first with employees who have more than $40,000, then do the grouping just like the first query? what is different here?

Sub-Query, the basic:
First, let make this query a bit easier to read :
SELECT
dname,
COUNT(*)
FROM
Department,
Employee
WHERE
dnumber = dno
AND salary > 40000
AND dno IN (
SELECT dno
FROM Employee
GROUP BY dno
HAVING COUNT(*) > 5
)
GROUP BY dname
As you can see, there is what we call a "sub-query": a query inside the query.
This is the part in dno IN (/*HERE is the Sub-query*/).
As in mathematics parenthesis are run first, so SQL will go find DNO that have more than 5 employees, producing the following query :
SELECT
dname,
COUNT(*)
FROM
Department,
Employee
WHERE
dnumber = dno
AND salary > 40000
AND dno IN (
'dno10emp', 'dno24emp', 'dno45emp'
)
GROUP BY dname
Now, you find yourself with a simple query that will produce the result:
of department that have a least one employee with >40k$ salary
and are part of the department with more the 5 employee
What's wrong ?!
Well, I'll said your "good query" isn't that good, and that's why you're struggling: It'll not bring department if they don't have at least one employee with > 40k$.
Here is the query that'll do this :
SELECT
Department.dname,
COUNT(Employee.salary)
FROM
Department
LEFT JOIN Employee
ON Department.dnumber = Employee.dno
AND Employee.salary > 40000
WHERE
Department.dnumber IN (
SELECT Employee.dno
FROM Employee
GROUP BY Employee.dno
HAVING COUNT(*) > 5
)
GROUP BY Department.dname
This will bring you all department that have at least 6 employee, then count the number of employee with at least 40K$ (a department could have 0).
Could you show me ?
As an image worth a thousand word :
SQL Fiddle
MySQL 5.6 Schema Setup:
| dname | nb | salary |
|-------------------|----|--------|
| accounting | 2 | 30000 |
| accounting | 4 | 50000 |
| boss | 6 | 150000 |
| garbage-collector | 6 | 15000 |
Query 1:
SELECT
dname,
COUNT(*)
FROM
Department,
Employee
WHERE
dnumber = dno
AND salary > 40000
GROUP BY dname
HAVING COUNT(*) > 5
Results:
| dname | COUNT(*) |
|-------|----------|
| boss | 6 |
Query 2:
SELECT
dname,
COUNT(*)
FROM
Department,
Employee
WHERE
dnumber = dno
AND salary > 40000
AND
dno IN (
SELECT dno FROM Employee
GROUP BY dno
HAVING COUNT(*) > 5
)
GROUP BY dname
Results:
| dname | COUNT(*) |
|------------|----------|
| accounting | 4 |
| boss | 6 |
Query 3:
SELECT
Department.dname,
COUNT(Employee.salary)
FROM
Department
LEFT JOIN Employee
ON Department.dnumber = Employee.dno
AND Employee.salary > 40000
WHERE
Department.dnumber IN (
SELECT Employee.dno
FROM Employee
GROUP BY Employee.dno
HAVING COUNT(*) > 5
)
GROUP BY Department.dname
Results:
| dname | COUNT(Employee.salary) |
|-------------------|------------------------|
| accounting | 4 |
| boss | 6 |
| garbage-collector | 0 |

See sample data below.
http://sqlfiddle.com/#!9/357d29/2
The first query will only get departments with 6 or more highy paid employees WHILE the 2nd query will get highly paid employees of those departments with 6 or more employees. Below sample will not show in the 1st query but will show in the 2nd query.
Department Employee Salary
accounting john doe 50k
jan smith 55k
dan brown 60k
eric murphy 60k
al daniels 70k
ellen boyle 30k
1st query: nothing because only five emp have > 40k salary
2nd query: All except ellen boyle. Department has > 5 employees and all except 1 has > 40k salary

For the record, you already got correct answers. I'll just try to explain it in a different way.
Your first query has 1 select statement. It only returns employees with salary > 40k and from departments > 5 employees. Every record will only contain information about an employee with salary > 40k and from departments > 5 employees.
Your second query has 2 select statements:
This is the first one:
Select dname, count(*)
from Department, Employee
where dnumber = dno
and salary > 40000
it returns the count of all employees, by department name who earn > 40000. There are no conditions on the count(*) here. And the condition on the salary has no power over the second select statement:
SELECT Employee.dno
FROM Employee
GROUP BY Employee.dno
HAVING COUNT(*) > 5
This one returns ALL employees in all departments. This is where we have the condition on the count(*) - but it is only applied locally, to limit the number of employees per department.
And then two statements are joined together - so, first we limit the departments to the ones we are interested in, and then from those only select high-salary employees.

First, never use commas in the FROM clause. Always use proper, explicit JOIN syntax.
I think the best and simplest solution uses conditional aggregation:
SELECT d.dname, SUM(CASE WHEN e.salary > 40000 THEN 1 ELSE 0 END) as num_40kplus
FROM Department d JOIN
Employee e
ON d.dno = e.dnumber
GROUP BY dname
HAVING COUNT(*) > 5;
I see no reason why a subquery would be necessary or desirable.

Related

SQL displaying staff with more salary than their managers

I'm trying to display staff from the same department who earn more than their managers.
SELECT ID, NAME, DEPARTMENT, SALARY, JOB
FROM STAFF
WHERE SALARY > ANY (SELECT SALARY FROM STAFF WHERE JOB = 'Manager')
This doesn't seem to work, and I'm reallly not sure why.
Here's a peep at how the tables are formatted:
ID | NAME | DEPARTMENT | SALARY | JOB
20 | JOHN | 180 | 52000 | Manager
30 | KATY | 180 | 60000 | Analyst
The problem is that you need to correlate the subquery to match the same departement:
SELECT s1.ID, s1.NAME, s1.DEPARTMENT, s1.SALARY, s1.JOB
FROM SALARY s1
WHERE
s1.JOB <> 'MANAGER' AND
s1.SALARY > (SELECT s2.SALARY FROM SALARY s2
WHERE s2.DEPARTMENT = s1.DEPARTMENT AND s2.JOB = 'MANAGER');
This answer assumes that each department would have only one manager. If there could be more than one manager, then it would be safer to write the above using exists logic:
SELECT s1.ID, s1.NAME, s1.DEPARTMENT, s1.SALARY, s1.JOB
FROM SALARY s1
WHERE
s1.JOB <> 'MANAGER' AND
NOT EXISTS (SELECT 1 FROM SALARY s2
WHERE s2.DEPARTMENT = s1.DEPARTMENT AND
s2.JOB = 'MANAGER' AND
s2.SALARY >= s1.SALARY);

How to get Details of all Department(dept_id, name, dept_manager and total_employees) from tables(Department, Employee)?

Here are the details of the tables:
Employee :
emp_ID | Primary Key
emp_name | Varchar
emp_email | varchar <br>
emp_dept_id | Foreign Key
Departments :
dept_ID | Primary Key
dept_name | Varchar
emp_id | Foreign Key
Manager details are already there in the employee table.
I am using Oracle Database.
Employee:
emp_ID emp_name emp_email emp_dept_id
1 Cyrus abc#xyz.com 10
2 Andrew xyz#abc.com 20
3 Mark xyz#abc.com 10
4 Tony xyz#abc.com 10
5 Elvis xyz#abc.com 20
6 Rock xyz#abc.com 10
7 George xyz#abc.com 20
8 Mary xyz#abc.com 10
9 Thomas xyz#abc.com 20
10 Martin xyz#abc.com 10
Depqartments:
dept_id dept_name emp_id
10 Accounts 4
20 Development 9
These are the data in the tables. In Department table, emp_id(Foreign key) indicates the head/manager of the department .
You can try following query:
SELECT
D.DEPT_ID,
D.DEPT_NAME,
D.EMP_ID AS MANAGER_ID,
E.EMP_NAME AS MANAGER_NAME,
E.EMP_EMAIL AS MANAGER_EMAIL,
E.CNT AS "number of employees"
FROM
DEPARTMENT D
JOIN (
SELECT
EMP_ID,
EMP_NAME,
EMP_EMAIL,
EMP_DEPT_ID,
COUNT(1) OVER(
PARTITION BY EMP_DEPT_ID
) AS CNT
FROM
EMPLOYEE
) E ON ( E.DEPT_ID = D.EMP_DEPT_ID
AND E.EMP_ID = D.EMP_ID );
Cheers!!
This is pretty straightforward. You want to get the department details (1) and also include the employees associated with that department (2).
Get the department details:
SELECT <dept_col1>, <dept_col2>, ...
FROM Department
Get a summary of employees per department:
SELECT dept_ID, <aggregation> AS MyAgg
FROM department
GROUP BY dept_ID
Here you replace <aggregation> with whatever you're trying to find. In your case it would be COUNT(*).
Combine the two results. You want to join the data from (2) to (1). How are these two tables related? Here you write a LEFT JOIN to connect them on the PK/FK relationship:
SELECT
dept.<col1>, dept.<col2>, ...,
COALESCE(emp.MyAgg,0) AS MyAgg
FROM Department dept
LEFT JOIN (
SELECT dept_ID, <aggregation> AS MyAgg
FROM department
GROUP BY dept_ID
) emp ON dept.<FK> = emp.<PK>
Get the department manager's info by adding another LEFT JOIN:
SELECT
dept.<col1>, dept.<col2>, ...,
mgr.<col1>,
COALESCE(emp.MyAgg,0) AS MyAgg -- In case there are 0 employees
FROM Department dept
LEFT JOIN employee mgr ON dept.<FK> = mgr.<PK>
LEFT JOIN (
SELECT dept_ID, <aggregation> AS MyAgg
FROM department
GROUP BY dept_ID
) emp ON dept.<FK> = emp.<PK>
Give it a try and see how far you get. If you get stuck let me know.

how to find the maximum salary of employees in a specific department

I have a table named SalaryTable containing salaries of employee in various departments:
dept_id name salary
12 a 100
13 b 200
12 c 300
14 d 400
12 e 500
13 f 600
I need to find the maximum salary of each department with given department id AND the name of that person along with maximum salary.
I am using the following sql query for this
select dept_id, name, max(salary)
from SalaryTable
group by salary
But the above code is giving me error:
dept_id must be an aggregate expression or appear in GROUP BY clause
I am able to get the following table easily with this below query:
select dept_id, max(salary) as max_salary
from SalaryTable
group by salary
dept_id max_salary
12 500
13 600
14 400
but I also need the name of that person as:
REQUIRED OUTPUT
dept_id name max_salary
12 e 500
13 f 600
14 d 400
You appear to be learning SQL, so you can build on what you have. The following gets the maximum salary:
select dept_id, max(salary)
from SalaryTable
group by dept_id;
You can use this as a subquery, to get all matching names:
select st.*
from SalaryTable st join
(select dept_id, max(salary) as max_salary
from SalaryTable
group by dept_id
) std
on st.dept_id = std.dept_id and
st.salary = std.max_salary
use correlated subquery
select dept_id, name, salary
from SalaryTable a
where salary =(select max(salary) from SalaryTable b where a.dept_id=b.dept_id)
To be exact:
SELECT dept_id, NAME, salary FROM SalaryTable a
WHERE salary =(SELECT MAX(salary) FROM SalaryTable b WHERE a.dept_id=b.dept_id)
ORDER BY dept_id;
Also see try by joins because see this
Remember: Whatever you put in between select and from in single sql statement that must be used in the group by clause (That's what your error says!).
You can do it with NOT EXISTS:
select s.* from SalaryTable s
where not exists (
select 1 from SalaryTable
where dept_id = s.dept_id and salary > s.salary
)
order by s.dept_id
See the demo.
Results:
> dept_id | name | salary
> ------: | :--- | -----:
> 12 | e | 500
> 13 | f | 600
> 14 | d | 400

Find the average value from the previous row and the current row for a same department id

I have 3 employees with the same departments. i wanted to find the average of their salaries based on the previous rows.
For example
Employee_id department_id salary avg(salary)
101 1 5000 5000
102 1 10000 7500
103 1 15000 10000
This is like for department id as 1 with first salary we find the average as 5000 for employee id 101 and for the same department id as 1 for the employee id as 102 , we find the average for the 2 values grouped by department id.
Hence
average is (10000+5000) /2 = 7500
But for the employee id as 103, department id is 1 and is grouped with all the above three values of amount.
Hence,
average of salary is (10000+5000+15000)/3 = 10000
The requirement is i have been asked to use query_partition_clause and order_by_clause.
Hence i tried as follows,
select avg(salary) OVER (partition by department_id ORDER BY department_id ) salary, department_id, salary from employee
But i am always getting the values by considering the department of 3 data values.
Henceforth can somebody help on this resolution?
Many Thanks for the help.
Use ORDER BY salary (or ORDER BY employee_id) rather than ORDER BY department_id:
Oracle Setup:
CREATE TABLE employees ( Employee_id, department_id, salary ) AS
SELECT 101, 1, 5000 FROM DUAL UNION ALL
SELECT 102, 1, 10000 FROM DUAL UNION ALL
SELECT 103, 1, 15000 FROM DUAL;
Query:
SELECT e.*,
AVG( salary ) OVER ( PARTITION BY department_id ORDER BY salary ) AS avg_salary
FROM employees e
Output:
EMPLOYEE_ID | DEPARTMENT_ID | SALARY | AVG_SALARY
----------: | ------------: | -----: | ---------:
101 | 1 | 5000 | 5000
102 | 1 | 10000 | 7500
103 | 1 | 15000 | 10000
db<>fiddle here
SELECT EMPLOYEE_ID,
DEPARTMENT_ID,
SALARY,
(SELECT AVG(SALARY)
FROM EMPLOYEES B
WHERE B.EMPLOYEE_ID <= A.EMPLOYEE_ID) AVG_SALARY
FROM EMPLOYEES A
GROUP BY EMPLOYEE_ID,
DEPARTMENT_ID,
SALARY
A sub query can be done in the query itself by filtering the employee ID. I hope I have helped with something.

Removing duplicates by adding them up [SQL]

I have a query like this:
select employee_id, salary
from salary
left join employee on salary.employee_id=employee.id_employee;
It returns me these results
EMPLOYEE ID | SALARY
-------------|-------
1 | 50
2 | 50
3 | 50
1 | 30
How do I remove duplicates by adding them up, like this:
EMPLOYEE ID | SALARY
------------|--------
1 | 80
2 | 50
3 | 50
There is no reason for a left join from salary to employee. Presumably, every employee_id in salary refers to a valid employee. So, this should do what you want:
select s.employee_id, sum(s.salary) as salary
from salary s
group by s.employee_id;
If you want all employees, even those who are not in the salary table, then an outer join is appropriate -- but employee should be first:
select e.id_employee, sum(s.salary) as salary
from employee e left join
salary s
on s.employee_id = e.id_employee
group by e.id_employee;
Employees not in salary will have a value of NULL.
Note that the group by condition in this query is on employee, the first table in the left join.
select employee_id, SUM(salary) as salary
from salary
left join employee on salary.employee_id=employee.id_employee
group by emplyee_id;
This is exactly what a group by clause is for. You'll have to group by the emplopyee_id and specify how you want to aggregate the salary:
SELECT employee_id, SUM(salary)
FROM salary
GROUP BY employee_id