How to use GROUP BY when fetching values from more than one table

We have two tables, Employee and Department.
We want to show the maximum salary from each department and their corresponding employee name from the employee table and the department name from the department table.
Employee Table
EmpId | EmpName | salary | DeptId
101 | shubh1 | 1000 | 1
101 | shubh2 | 4000 | 1
102 | shubh3 | 3000 | 2
102 | shubh4 | 5000 | 2
103 | shubh5 | 12000 | 3
103 | shubh6 | 1000 | 3
104 | shubh7 | 1400 | 4
104 | shubh8 | 1000 | 4
Department Table
DeptId | DeptName
1 | ComputerScience
2 | Mechanical
3 | Aeronautics
4 | Civil
I tried the following query, but it returned an error:
SELECT DeptName FROM Department where deptid IN(select MAX(salary),empname,deptid
FROM Employee
GROUP By Employee.deptid)
Error
Token error: 'Column 'Employee.EmpName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.' on server 4e0652f832fd executing on line 1 (code: 8120, state: 1, class: 16)
Can someone please help me.

select salary
      ,EmpName
      ,DeptName
from (
      select e.salary
            ,e.EmpName
            ,d.DeptName
            ,rank() over(partition by e.DeptId order by e.salary desc) as rnk
      from Employee e join Department d on d.DeptId = e.DeptId
     ) t
where rnk = 1
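+--------+---------+-----------------+
| salary | EmpName | DeptName        |
+--------+---------+-----------------+
| 4000   | shubh2  | ComputerScience |
| 5000   | shubh4  | Mechanical      |
| 12000  | shubh5  | Aeronautics     |
| 1400   | shubh7  | Civil           |
+--------+---------+-----------------+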

Now that I know it's MS SQL Server, we could use CROSS APPLY or OUTER APPLY. Technically, APPLY evaluates a table-valued expression for each outer row rather than performing a join per se. Whether it is available depends on the version of SQL Server, and the choice between CROSS and OUTER depends on whether you want a department returned even when it has no matching employee.
I find this the "best" design pattern to use for this type of query.
For each record in Department, the engine runs a query against Employee, finds the employees in that department, and returns the one record with the maximum salary. With TOP we could specify WITH TIES to return more than one record, but we would need to know how ties on salary should be handled: either use TOP 1 WITH TIES, or extend the ORDER BY so you get the single "top" result you want.
Demo: dbfiddle.uk
SELECT Sub.empName, Sub.Salary, D.DeptName
FROM Department D
CROSS APPLY (SELECT TOP 1 *
             --(SELECT TOP 1 WITH TIES * -- could use this if we want ties
             FROM Employee E
             WHERE E.DeptID = D.DeptID
             ORDER BY Salary DESC) Sub -- add additional ORDER BY columns if we don't want ties
The cross apply gives us:
+---------+--------+-----------------+
| empName | Salary | DeptName |
+---------+--------+-----------------+
| shubh2 | 4000 | ComputerScience |
| shubh4 | 5000 | Mechanical |
| shubh5 | 12000 | Aeronautics |
| shubh7 | 1400 | Civil |
+---------+--------+-----------------+
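The point about ties can be sketched with a small, self-contained script. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of SQL Server (SQLite has no APPLY or TOP, so window functions stand in for them, and they need SQLite 3.25 or later), and the row shubh9 is a hypothetical addition that ties with shubh2 on salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptId INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (EmpName TEXT, salary INTEGER, DeptId INTEGER);
INSERT INTO Department VALUES (1, 'ComputerScience'), (2, 'Mechanical');
-- shubh9 is a hypothetical row that ties with shubh2 in department 1
INSERT INTO Employee VALUES
    ('shubh1', 1000, 1), ('shubh2', 4000, 1), ('shubh9', 4000, 1),
    ('shubh3', 3000, 2), ('shubh4', 5000, 2);
""")


def top_per_dept(window_fn, order_by):
    """Return the rows numbered 1 within each department.

    window_fn and order_by are trusted literals chosen below,
    not user input, so plain string formatting is acceptable here.
    """
    sql = f"""
        SELECT DeptName, EmpName, salary
        FROM (
            SELECT d.DeptName, e.EmpName, e.salary,
                   {window_fn} OVER (PARTITION BY e.DeptId
                                     ORDER BY {order_by}) AS rn
            FROM Employee e
            JOIN Department d ON d.DeptId = e.DeptId
        ) t
        WHERE rn = 1
        ORDER BY DeptName, EmpName
    """
    return conn.execute(sql).fetchall()


# RANK() gives every tied row the same rank, so both 4000 earners survive.
with_ties = top_per_dept("RANK()", "e.salary DESC")

# ROW_NUMBER() with an extra sort key picks exactly one row per department.
one_row = top_per_dept("ROW_NUMBER()", "e.salary DESC, e.EmpName")

print(with_ties)
print(one_row)
```

RANK() plays roughly the role of TOP 1 WITH TIES here, while ROW_NUMBER() with an extra ordering column corresponds to extending the ORDER BY so that only one row is returned.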
Before window functions, and before CROSS APPLY or LATERAL, we would write an inline view.
The inline view gets the maximum salary for each department; we then join it back to the base tables to find the employee within each department who has that maximum salary.
Demo: DbFiddle.uk
SELECT E.*, D.*
FROM Employee E
INNER JOIN Department D
on E.DeptID = D.DeptID
INNER JOIN (SELECT MAX(SALARY) maxSal , DeptID
FROM Employee
GROUP BY DeptID) Sub
on Sub.DeptID = E.DeptID
and Sub.MaxSal = E.Salary
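The inline-view pattern above can be tried end to end with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server and loads the sample rows from the question; only the three columns of interest are selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptId INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (EmpId INTEGER, EmpName TEXT, salary INTEGER, DeptId INTEGER);
INSERT INTO Department VALUES
    (1, 'ComputerScience'), (2, 'Mechanical'), (3, 'Aeronautics'), (4, 'Civil');
INSERT INTO Employee VALUES
    (101, 'shubh1', 1000, 1), (101, 'shubh2', 4000, 1),
    (102, 'shubh3', 3000, 2), (102, 'shubh4', 5000, 2),
    (103, 'shubh5', 12000, 3), (103, 'shubh6', 1000, 3),
    (104, 'shubh7', 1400, 4), (104, 'shubh8', 1000, 4);
""")

# Step 1 (inline view): one row per department with its maximum salary.
# Step 2: join back to Employee on both DeptId and salary to recover the name.
rows = conn.execute("""
    SELECT e.EmpName, e.salary, d.DeptName
    FROM Employee e
    JOIN Department d ON d.DeptId = e.DeptId
    JOIN (SELECT DeptId, MAX(salary) AS maxSal
          FROM Employee
          GROUP BY DeptId) sub
      ON sub.DeptId = e.DeptId AND sub.maxSal = e.salary
    ORDER BY d.DeptId
""").fetchall()

for row in rows:
    print(row)
```

Because the join matches on salary, two employees sharing the maximum salary in a department would both be returned.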
One has to do a join to get the department info and the employee info. However, we can eliminate the join to the maximum-salary view by using EXISTS and correlation instead.
Demo DbFiddle.uk
SELECT E.*, D.*
FROM Employee E
INNER JOIN Department D
on E.DeptID = D.DeptID
WHERE EXISTS (SELECT MAX(Sub.SALARY) maxSal , Sub.DeptID
FROM Employee Sub
WHERE sub.DeptID=E.DeptID --correlation 1
GROUP BY Sub.DeptID
HAVING E.Salary = max(Sub.Salary)) --correlation 2
We could eliminate the last join too I suppose:
Demo: Dbfiddle.uk
SELECT E.*, (SELECT DeptName from Department where E.DeptID = DeptID)
FROM Employee E
WHERE EXISTS (SELECT MAX(Sub.SALARY) maxSal , Sub.DeptID
FROM Employee Sub
WHERE sub.DeptID=E.DeptID --correlation 1
GROUP BY Sub.DeptID
HAVING E.Salary = max(Sub.Salary)) --correlation 2
The top 3 give us this result:
+-----+---------+--------+--------+--------+-----------------+
| id | empName | salary | deptID | DeptID | DeptName |
+-----+---------+--------+--------+--------+-----------------+
| 101 | shubh2 | 4000 | 1 | 1 | ComputerScience |
| 102 | shubh4 | 5000 | 2 | 2 | Mechanical |
| 103 | shubh5 | 12000 | 3 | 3 | Aeronautics |
| 104 | shubh7 | 1400 | 4 | 4 | Civil |
+-----+---------+--------+--------+--------+-----------------+

Related

SQL displaying staff with more salary than their managers

I'm trying to display staff from the same department who earn more than their managers.
SELECT ID, NAME, DEPARTMENT, SALARY, JOB
FROM STAFF
WHERE SALARY > ANY (SELECT SALARY FROM STAFF WHERE JOB = 'Manager')
This doesn't seem to work, and I'm really not sure why.
Here's a peep at how the tables are formatted:
ID | NAME | DEPARTMENT | SALARY | JOB
20 | JOHN | 180 | 52000 | Manager
30 | KATY | 180 | 60000 | Analyst
The problem is that you need to correlate the subquery so that it matches the same department:
SELECT s1.ID, s1.NAME, s1.DEPARTMENT, s1.SALARY, s1.JOB
FROM STAFF s1
WHERE
    s1.JOB <> 'Manager' AND
    s1.SALARY > (SELECT s2.SALARY FROM STAFF s2
                 WHERE s2.DEPARTMENT = s1.DEPARTMENT AND s2.JOB = 'Manager');
This answer assumes that each department would have only one manager. If there could be more than one manager, then it would be safer to write the above using exists logic:
SELECT s1.ID, s1.NAME, s1.DEPARTMENT, s1.SALARY, s1.JOB
FROM STAFF s1
WHERE
    s1.JOB <> 'Manager' AND
    NOT EXISTS (SELECT 1 FROM STAFF s2
                WHERE s2.DEPARTMENT = s1.DEPARTMENT AND
                      s2.JOB = 'Manager' AND
                      s2.SALARY >= s1.SALARY);
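The correlated NOT EXISTS form can be checked with a minimal sketch, assuming Python's built-in sqlite3 module. JOHN and KATY come from the question; PAUL and ANNA are hypothetical rows added to show that a lower-paid manager in another department does not affect the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STAFF (ID INTEGER PRIMARY KEY, NAME TEXT, DEPARTMENT INTEGER,
                    SALARY INTEGER, JOB TEXT);
INSERT INTO STAFF VALUES
    (20, 'JOHN', 180, 52000, 'Manager'),
    (30, 'KATY', 180, 60000, 'Analyst'),
    -- hypothetical rows: a cheaper manager elsewhere, and an analyst who
    -- out-earns that manager but not the manager of her own department
    (40, 'PAUL', 200, 40000, 'Manager'),
    (50, 'ANNA', 180, 45000, 'Analyst');
""")

# Keep non-managers for whom no manager in the SAME department
# earns at least as much as they do.
rows = conn.execute("""
    SELECT s1.ID, s1.NAME, s1.DEPARTMENT, s1.SALARY, s1.JOB
    FROM STAFF s1
    WHERE s1.JOB <> 'Manager'
      AND NOT EXISTS (SELECT 1
                      FROM STAFF s2
                      WHERE s2.DEPARTMENT = s1.DEPARTMENT
                        AND s2.JOB = 'Manager'
                        AND s2.SALARY >= s1.SALARY)
    ORDER BY s1.ID
""").fetchall()

print(rows)
```

ANNA earns more than PAUL, so an uncorrelated comparison against any manager's salary would include her; correlating on DEPARTMENT keeps only KATY. Note that NOT EXISTS would also return staff in a department that has no manager at all.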

An SQL query to pull count of employees absent under each manager on all dates

The objective of the query is to get a count of the employees absent under each manager.
Attendance (Dates when employees are present)
id date
1 16/05/2020
2 16/05/2020
1 17/05/2020
2 18/05/2020
3 18/05/2020
Employee
id manager_id
1 2
2 3
3 NA
The desired output should be in this format:
Date manager_id Number_of_absent_employees
16/05/2020 NA 1
17/05/2020 3 1
17/05/2020 NA 1
18/05/2020 2 1
I have tried writing the code below but only partially understand it. The intuition is to calculate the total number of employees under each manager and subtract from it the number of employees present on a given day. Please help me complete this query, many thanks!
with t1 as /* for counting total employees under each manager */
(
select employee.manager_id,count(*) as totalc
from employee as e
inner join employee on e.employee_id=employee.employee_id
group by employee.manager_id
)
,t2 as /* for counting total employees present each day */
(
select Attendence.date, employee.manager_id,count(*) as present
from employee
Left join Attendence on employee.employee_id=Attendence.employee_id
group by Attendence.date, employee.manager_id
)
select * from t2
Left join t1 on t2.manager_id=t1.manager_id
order by date
Cross join the distinct dates from Attendance to Employee and left join Attendance to filter out the matching rows.
The remaining rows are the absences so then you need to aggregate:
select d.date, e.manager_id,
count(*) Number_of_absent_employees
from (select distinct date from Attendance) d
cross join Employee e
left join Attendance a on a.date = d.date and a.id = e.id
where a.id is null
group by d.date, e.manager_id
See the demo.
Results:
| date | manager_id | Number_of_absent_employees |
| ---------- | ---------- | -------------------------- |
| 16/05/2020 | NA | 1 |
| 17/05/2020 | 3 | 1 |
| 17/05/2020 | NA | 1 |
| 18/05/2020 | 2 | 1 |
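The cross join plus anti-join described above can be reproduced with a small sketch. It assumes Python's built-in sqlite3 module, stores the dates as ISO strings so that they sort correctly, and represents the "NA" manager as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Attendance (id INTEGER, date TEXT);
CREATE TABLE Employee (id INTEGER PRIMARY KEY, manager_id INTEGER);
INSERT INTO Attendance VALUES
    (1, '2020-05-16'), (2, '2020-05-16'),
    (1, '2020-05-17'),
    (2, '2020-05-18'), (3, '2020-05-18');
-- employee 3 has no manager ("NA" in the question), stored as NULL
INSERT INTO Employee VALUES (1, 2), (2, 3), (3, NULL);
""")

# 1. Cross join every known date with every employee (all expected rows).
# 2. Left join the actual attendance and keep only the unmatched rows.
# 3. Count those absences per date and manager.
rows = conn.execute("""
    SELECT d.date, e.manager_id, COUNT(*) AS Number_of_absent_employees
    FROM (SELECT DISTINCT date FROM Attendance) d
    CROSS JOIN Employee e
    LEFT JOIN Attendance a ON a.date = d.date AND a.id = e.id
    WHERE a.id IS NULL
    GROUP BY d.date, e.manager_id
    ORDER BY d.date, e.manager_id
""").fetchall()

for row in rows:
    print(row)
```

SQLite sorts NULL before other values, so the row for the manager-less employee appears first within each date. A date on which nobody was present never appears in Attendance and is therefore not reported; a separate calendar table would be needed for that case.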
Try this query. The first CTE simplifies your counting code, and the final query calculates the absent employees.
--this CTE just simplifies the counting
with t1 as /* for counting total employees under each manager */
(
select employee.manager_id,count(*) as totalc
from employee
group by manager_id
)
,t2 as
(
select Attendence.date, employee.manager_id,count(*) as present
from employee
Left join Attendence on employee.employee_id=Attendence.employee_id
group by Attendence.date, employee.manager_id
)
select t2.date,t2.manager_id, (t1.totalc-t2.present) as employees_absent from t2
Left join t1 on t2.manager_id=t1.manager_id
order by date
Select ec.manager_id, date, (total_employees - employee_attended) as employees_absent from
(Select manager_id, count(id) as total_employees
from employee
group by manager_id) ec,
(Select distinct e.manager_id, a.date, count(a.id) over (partition by e.manager_id, a.date) as employee_attended
from Employee e, attendence a
where e.id = a.id(+)) ea
where ec.manager_id = ea.manager_id (+)
I guess this should work

Select rows where every child row meets a condition

In my Oracle DB, I have two tables in a one-to-many relationship: Managers and Employees.
+------------+-------+------------+
| Manager_ID | Name | Department |
+------------+-------+------------+
| 1 | Steve | Sales |
| 2 | Ben | Sales |
| 3 | Molly | Accounts |
+------------+-------+------------+
+-------------+------------+--------+-----+
| Employee_ID | Manager_ID | Name | Age |
+-------------+------------+--------+-----+
| 1 | 1 | Kyle | 25 |
| 2 | 1 | Gary | 31 |
| 3 | 2 | Renee | 31 |
| 4 | 2 | Oliver | 32 |
+-------------+------------+--------+-----+
How do I select only those Managers where every one of his Employees is over the age of 30?
In my example data, the only Manager who meets this condition is Ben, because both of his employees are over 30.
I thought something like this would do it, but it's wrong:
SELECT m.manager_id
FROM managers m
WHERE m.manager_id IN (SELECT e.manager_id
FROM employees e
GROUP BY e.manager_id
HAVING e.age > 30)
Use NOT EXISTS:
select m.*
from Managers m
where not exists (select 1
                  from Employees e
                  where e.Manager_ID = m.Manager_ID and e.Age <= 30
                 ) and
      exists (select 1 from Employees e where e.Manager_ID = m.Manager_ID)
The only thing I don't like about Yogesh's answer (which I upvoted, since it's probably the way I'd write it) is that you have to go to the employees table a second time, to make sure the manager actually has at least one employee.
On the plus side, the NOT EXISTS that Yogesh used will allow Oracle to stop looking at a manager's employees once it finds one that is too young. So, maybe it's a toss-up.
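Why the second visit to the employees table matters can be seen in a short sketch. It assumes Python's built-in sqlite3 module rather than Oracle, uses the sample data from the question, and treats "over the age of 30" as Age > 30. The GROUP BY variant at the end is an illustrative single-pass alternative, not one of the answers quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Managers (Manager_ID INTEGER PRIMARY KEY, Name TEXT, Department TEXT);
CREATE TABLE Employees (Employee_ID INTEGER PRIMARY KEY, Manager_ID INTEGER,
                        Name TEXT, Age INTEGER);
INSERT INTO Managers VALUES
    (1, 'Steve', 'Sales'), (2, 'Ben', 'Sales'), (3, 'Molly', 'Accounts');
INSERT INTO Employees VALUES
    (1, 1, 'Kyle', 25), (2, 1, 'Gary', 31),
    (3, 2, 'Renee', 31), (4, 2, 'Oliver', 32);
""")

# No employee of this manager is 30 or younger.
no_young = """NOT EXISTS (SELECT 1 FROM Employees e
                          WHERE e.Manager_ID = m.Manager_ID AND e.Age <= 30)"""
# This manager has at least one employee.
has_staff = """EXISTS (SELECT 1 FROM Employees e
                       WHERE e.Manager_ID = m.Manager_ID)"""


def names(where_clause):
    sql = f"SELECT m.Name FROM Managers m WHERE {where_clause} ORDER BY m.Manager_ID"
    return [name for (name,) in conn.execute(sql)]


without_guard = names(no_young)                   # Molly qualifies vacuously
with_guard = names(f"{no_young} AND {has_staff}")

# Single pass over Employees: the inner join drops managers without staff,
# and HAVING MIN(Age) > 30 means every employee is over 30.
single_pass = [name for (name,) in conn.execute("""
    SELECT m.Name
    FROM Managers m
    JOIN Employees e ON e.Manager_ID = m.Manager_ID
    GROUP BY m.Manager_ID, m.Name
    HAVING MIN(e.Age) > 30
    ORDER BY m.Manager_ID
""")]

print(without_guard, with_guard, single_pass)
```

Molly has no employees, so "no employee is too young" holds for her trivially; the EXISTS guard, or an inner join, is what removes her from the result.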
I'll offer this alternative. It is shorter than the NOT EXISTS and does not have to go to the employees table a second time.
SELECT m.*
FROM managers m
CROSS APPLY (
    SELECT min(age) min_age
    FROM employees e
    WHERE e.manager_id = m.manager_id ) ma
where ma.min_age > 30;
Using sub-query for counts
WITH manager(Manager_ID, Name, Department) AS (
  SELECT 1, 'Steve', 'Sales' FROM dual UNION ALL
  SELECT 2, 'Ben', 'Sales' FROM dual UNION ALL
  SELECT 3, 'Molly', 'Accounts' FROM dual),
employee(Employee_ID, Manager_ID, Name, Age) AS (
  SELECT 1, 1, 'Kyle', 25 FROM dual UNION ALL
  SELECT 2, 1, 'Gary', 31 FROM dual UNION ALL
  SELECT 3, 2, 'Renee', 31 FROM dual UNION ALL
  SELECT 4, 2, 'Oliver', 32 FROM dual)
---------------------------
--- End of data preparation
---------------------------
SELECT m.name
FROM manager m
JOIN (SELECT manager_id,
             COUNT(1) total,
             COUNT(CASE WHEN age > 30 THEN 1 ELSE NULL END) age_30_above
      FROM employee
      GROUP BY manager_id) ee
ON m.manager_id = ee.manager_id
WHERE total = age_30_above;
Output
NAME
-----
Ben
Your query will be:
SELECT m.name
FROM manager m
JOIN (SELECT manager_id,
COUNT(1) total,
COUNT(CASE WHEN age > 30 THEN 1 ELSE NULL END) age_30_above
FROM employee
GROUP BY manager_id) ee
ON m.manager_id = ee.manager_id
WHERE total = age_30_above;
SELECT manager_id
FROM employees -- managers
minus
select manager_id
from employees
where age <= 30
You can use the ALL operator like this:
SELECT m.manager_id
FROM managers m
WHERE (30 < ALL (SELECT e.age FROM employees e WHERE e.manager_id = m.manager_id));
You might want to reverse the condition and select all managers who don't have any employee aged 30 or under:
select * from managers
where manager_id not in (select manager_id
                         from employees
                         where age <= 30)

How to Improve This Self-Joins

I am learning Oracle SQL by working with its primitive HR schema where there is EMPLOYEES table which has three columns that I'm mainly interested in: MANAGER_ID, which is basically a self reference to EMPLOYEES.EMPLOYEE_ID, DEPARTMENT_ID, and SALARY. (You can find the schema diagram and schema objects here).
I wish, for each employee, to retrieve his/her SALARY, alongside of employee's manager's departmental average salary. For instance, if we have the following (EMPLOYEE_ID = 140 is the interested party here):
+-------------+--------+---------------+------------+
| EMPLOYEE_ID | SALARY | DEPARTMENT_ID | MANAGER_ID |
+-------------+--------+---------------+------------+
| 140 | 12000 | 50 | 110 |
| 110 | 20000 | 60 | 101 |
| 156 | 18000 | 60 | 101 |
| 175 | 15000 | 60 | 105 |
| 320 | 24000 | 60 | 105 |
+-------------+--------+---------------+------------+
I am interested in obtaining the average salary of all the managers (excluding non-managerial employees) in the department where the employee's manager works (in this case, DEPARTMENT_ID = 60), and comparing it with the employee's salary (in this case, that of employee 140). For the sample data above, the output should be:
+-------------+--------+-------------+-------------+------------+
| EMPLOYEE_ID | SALARY | AVG_MGR_SAL | MGR_DEPT_ID | MANAGER_ID |
+-------------+--------+-------------+-------------+------------+
| 140 | 12000 | 19250 | 60 | 110 |
+-------------+--------+-------------+-------------+------------+
where we have four (4) managers working in department 60, and $19250 being calculated as (20000 + 18000 + 15000 + 24000) / 4. I have come up with the following query that seems to work (and excludes those employees that don't have a manager):
select
employee_id
, salary employee_salary
, trunc(mgr_info.avg_manager_salary_per_dept, 0) emp_manager_avg_sal_dept
, mgr_info.manager_dept_id
, mgr_info.manager_id
from employees
join (
select
e1.employee_id manager_id
, e1.department_id manager_dept_id
, e1.salary manager_salary
, avg(e1.salary) over (partition by e1.department_id) avg_manager_salary_per_dept
from employees e1
join (
select distinct manager_id
from employees
where manager_id is not null
) mgr_ids
on e1.employee_id = mgr_ids.manager_id
) mgr_info
on employees.manager_id = mgr_info.manager_id
order by employee_id
However, I feel that there should be a better way of getting the same result with fewer self-joins. Is there a way to get better performance?
Something like this... You only need one join, you can compute the average salary for the manager's department on the "manager" copy of the table. I only included a few columns, you may need more, or fewer, but I believe the core of what you wanted is covered.
(NOTE: Edited since I realized I missed one detail in the requirement)
select e.employee_id as employee_id,
e.salary as employee_salary,
m.employee_id as manager_id,
m.department_id as manager_dept_id,
m.avg_salary as avg_sal_of_mgr_dept
from hr.employees e inner join
( select employee_id, department_id,
avg(salary) over (partition by department_id) as avg_salary
from hr.employees
where employee_id in (select manager_id from hr.employees)
) m
on e.manager_id = m.employee_id
;
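The single-join query above can be exercised with a small sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) instead of Oracle's HR schema. The five rows from the question are kept as-is; employees 201 to 203 are hypothetical subordinates added only so that 156, 175 and 320 actually appear as someone's manager, as the question states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER,
                        department_id INTEGER, manager_id INTEGER);
INSERT INTO employees VALUES
    (140, 12000, 50, 110),
    (110, 20000, 60, 101),
    (156, 18000, 60, 101),
    (175, 15000, 60, 105),
    (320, 24000, 60, 105),
    -- hypothetical subordinates so that 156, 175 and 320 count as managers
    (201,  9000, 50, 156),
    (202,  9500, 50, 175),
    (203,  8000, 50, 320);
""")

# The derived table m keeps only employees who manage someone and attaches
# the average manager salary of their department to each of them.
rows = conn.execute("""
    SELECT e.employee_id,
           e.salary        AS employee_salary,
           m.employee_id   AS manager_id,
           m.department_id AS manager_dept_id,
           m.avg_salary    AS avg_sal_of_mgr_dept
    FROM employees e
    JOIN (SELECT employee_id, department_id,
                 AVG(salary) OVER (PARTITION BY department_id) AS avg_salary
          FROM employees
          WHERE employee_id IN (SELECT manager_id FROM employees)) m
      ON e.manager_id = m.employee_id
    ORDER BY e.employee_id
""").fetchall()

by_employee = {row[0]: row for row in rows}
print(by_employee[140])
```

Managers 101 and 105 have no row of their own in this sample, so the employees reporting to them drop out of the inner join, which matches the question's note about excluding employees without a manager record.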
Here is an option which uses a series of joins to get your result:
SELECT DISTINCT t1.EMPLOYEE_ID,
t1.SALARY,
t1.DEPARTMENT_ID,
COALESCE(t2.SALARY, 0.0) AS ManagerAvgSal
FROM employees t1
LEFT JOIN
(
SELECT e1.DEPARTMENT_ID, AVG(e1.SALARY) AS SALARY
FROM employees e1
WHERE e1.EMPLOYEE_ID IN (SELECT DISTINCT MANAGER_ID FROM employees)
GROUP BY e1.DEPARTMENT_ID
) t2
ON t1.DEPARTMENT_ID = t2.DEPARTMENT_ID

SQL - how to get certain column with MIN and MAX id for every department?

I'm trying to select some information using SQL, but with no success. Here's what I'm trying to do.
I have 2 tables:
Table employees with following columns:
IDemployee | name | surname | department_id
1 | John | Smith | 1
2 | Jane | Smith | 1
3 | Neo | Anderson | 1
4 | John | Mason | 2
5 | James | Cameron | 2
6 | Morpheus| Grumpy | 2
Table departments with columns:
IDdepartment | name
1 | Thieves
2 | Madmen
I want to select surnames of first and last employees of every department and count of their employees.
Result:
department_name | first_employee | last_employee | employee_count
Thieves | Smith | Anderson | 3
Madmen | Mason | Grumpy | 3
I was able to get count and ID's of first and last employees with following query:
SELECT d.IDdepartment, COUNT(*) as "employee_count", MIN(e.IDemployee) as "first_employee", MAX(e.IDemployee) as "last_employee"
FROM ( employees e INNER JOIN departments d ON d.IDdepartment=e.department_id)
GROUP BY d.name;
However, I can't find the right way to select their surnames. Any help would be greatly appreciated.
While there might be another way, one way is to use your query as a subquery:
SELECT d.name department_name,
e.surname first_employee,
e2.surname last_employee,
t.employee_count
FROM (
SELECT d.IDdepartment,
COUNT(*) as "employee_count",
MIN(e.IDemployee) as "first_employee",
MAX(e.IDemployee) as "last_employee"
FROM employees e
INNER JOIN departments d
ON d.IDdepartment=e.department_id
GROUP BY d.IDdepartment
) t JOIN employees e on t.first_employee = e.IDemployee
JOIN employees e2 on t.last_employee = e2.IDemployee
JOIN departments d on t.IDdepartment = d.IDdepartment
And here is the fiddle: http://sqlfiddle.com/#!2/17a5b/2
Good luck.
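The subquery-then-join-back approach can be checked against the sample data with a minimal sketch, assuming Python's built-in sqlite3 module. The aliases first_id, last_id, first_e and last_e are illustrative names, and the aggregate is grouped by department_id directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (IDdepartment INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (IDemployee INTEGER PRIMARY KEY, name TEXT,
                        surname TEXT, department_id INTEGER);
INSERT INTO departments VALUES (1, 'Thieves'), (2, 'Madmen');
INSERT INTO employees VALUES
    (1, 'John', 'Smith', 1), (2, 'Jane', 'Smith', 1), (3, 'Neo', 'Anderson', 1),
    (4, 'John', 'Mason', 2), (5, 'James', 'Cameron', 2), (6, 'Morpheus', 'Grumpy', 2);
""")

# t holds one row per department with the smallest and largest employee id;
# employees is then joined twice to turn each id into a surname.
rows = conn.execute("""
    SELECT d.name           AS department_name,
           first_e.surname  AS first_employee,
           last_e.surname   AS last_employee,
           t.employee_count
    FROM (SELECT department_id,
                 COUNT(*)        AS employee_count,
                 MIN(IDemployee) AS first_id,
                 MAX(IDemployee) AS last_id
          FROM employees
          GROUP BY department_id) t
    JOIN employees first_e ON first_e.IDemployee = t.first_id
    JOIN employees last_e  ON last_e.IDemployee  = t.last_id
    JOIN departments d     ON d.IDdepartment     = t.department_id
    ORDER BY d.IDdepartment
""").fetchall()

for row in rows:
    print(row)
```

Here "first" and "last" mean the lowest and highest IDemployee in each department, as in the question.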
This is a general Oracle example based on an existing Oracle table. You do not specify which SQL dialect you are using, but if analytic functions are available in your version, you should use them. If the FIRST and LAST analytic functions are available, then this should work:
SELECT empno, deptno, sal,
MIN(sal) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER (PARTITION BY deptno) "Lowest",
MAX(sal) KEEP (DENSE_RANK LAST ORDER BY sal) OVER (PARTITION BY deptno) "Highest"
FROM scott.emp
ORDER BY deptno, sal
/
See lowest and highest salary by dept in output of above query:
DEPTNO SAL Lowest Highest
---------------------------------
10 1300 1300 5000
10 2450 1300 5000
10 5000 1300 5000
20 800 800 3000
20 1100 800 3000
20 2975 800 3000
....