How do I join two results with avg()? - sql

I have two query results:
Select avg(salary), department_id
from employees
Group by department_id
and
Select d.department_name, e.department_id
From departments d, employees e
Where e.department_id=d.department_id
How can I combine them into one result? department_id is the key that links both tables.
I tried UNION and creating new tables... Nothing works. I need something like this:
Dept_id | Dept_name  | avg_salary
--------|------------|-----------
121     | Management | 324000
102     | Tax        | 432555
etc.

I am assuming you want the average salary for each department. For that, use a JOIN with GROUP BY, and the AVG aggregate to get the average salary per department. Try this:
SELECT
d.department_id,
d.department_name,
AVG(e.salary) AS avg_salary
FROM
departments d JOIN employees e
ON d.department_id = e.department_id
GROUP BY
d.department_id, d.department_name
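A quick way to sanity-check this pattern is to run it against an in-memory SQLite database from Python. This is only a sketch: SQLite stands in for the asker's DBMS, and the salary rows are hypothetical values picked so that the averages match the desired output above.

```python
import sqlite3


def average_salary_by_department(conn):
    """Join departments to employees and average the salaries per department."""
    query = """
        SELECT d.department_id,
               d.department_name,
               AVG(e.salary) AS avg_salary
        FROM departments d
        JOIN employees e
          ON d.department_id = e.department_id
        GROUP BY d.department_id, d.department_name
        ORDER BY d.department_id
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data, chosen so the averages match the desired output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary REAL, department_id INTEGER);
    INSERT INTO departments VALUES (102, 'Tax'), (121, 'Management');
    INSERT INTO employees VALUES (1, 300000, 121), (2, 348000, 121), (3, 432555, 102);
""")

rows = average_salary_by_department(conn)
for row in rows:
    print(row)
```

Grouping by both department_id and department_name keeps the query valid in databases that require every non-aggregated column to appear in GROUP BY.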

Related

Can't find departments with no employees

DEPARTMENT table:
DEPARTMENT_ID
1
2
EMPLOYEES table:
EMPLOYEE_ID | DEPARTMENT_ID
ANDY | 1
Expected output:
DEPARTMENT_ID
2
Here is my code:
SELECT DEPARTMENT_ID
FROM DEPARTMENTS
where department_id!= all ( SELECT department_id
FROM employees
);
The code doesn't show 2; the output is blank, like this:
DEPARTMENT_ID
If you want to find the number of employees in each department, use a LEFT JOIN with aggregation:
SELECT d.DEPARTMENT_ID, COUNT(e.EMPLOYEE_ID) AS cnt
FROM DEPARTMENT d
LEFT JOIN EMPLOYEES e
ON d.DEPARTMENT_ID = e.DEPARTMENT_ID
GROUP BY d.DEPARTMENT_ID;
COUNT(e.EMPLOYEE_ID) counts only the non-NULL EMPLOYEE_ID values from the EMPLOYEES table. If a department has no employees, the LEFT JOIN fills that column with NULL, so COUNT returns zero for that department.
To list only the departments that have no employees, use a LEFT JOIN and keep the rows where no employee matched:
SELECT d.department_id
FROM departments d
LEFT JOIN employees e ON d.department_id = e.department_id
WHERE e.department_id IS NULL
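Why does the original `!= ALL` query return nothing at all? One likely cause, assuming the real EMPLOYEES table contains at least one row whose DEPARTMENT_ID is NULL (the sample above does not show one): comparing any value with NULL yields UNKNOWN, so no department passes the filter. The sketch below uses SQLite through Python's sqlite3; SQLite has no `!= ALL`, so it uses `NOT IN`, which the SQL standard defines as equivalent to `<> ALL`. The employee 'KIM' is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'KIM' is a hypothetical employee with no department (DEPARTMENT_ID is NULL).
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY);
    CREATE TABLE employees (employee_id TEXT PRIMARY KEY, department_id INTEGER);
    INSERT INTO departments VALUES (1), (2);
    INSERT INTO employees VALUES ('ANDY', 1), ('KIM', NULL);
""")

# NOT IN behaves like <> ALL: 2 <> NULL is UNKNOWN, so department 2
# is filtered out along with department 1.
not_in_rows = conn.execute("""
    SELECT department_id FROM departments
    WHERE department_id NOT IN (SELECT department_id FROM employees)
""").fetchall()

# The LEFT JOIN anti-join is not affected by the NULL row.
anti_join_rows = conn.execute("""
    SELECT d.department_id
    FROM departments d
    LEFT JOIN employees e ON d.department_id = e.department_id
    WHERE e.department_id IS NULL
""").fetchall()

print(not_in_rows)
print(anti_join_rows)
```

Adding `WHERE department_id IS NOT NULL` inside the subquery would also make the `NOT IN` / `!= ALL` version return department 2.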

I want to get data from 3 tables

I have three tables, employee, jobs and department, as below:
employee (empId, fname, lname, jobId, managerId, departmentId)
jobs (jobId, jobTitle)
department (deptId, deptName, managerId)
I want to select all data from employee, plus the job title through jobId, the deptName through deptId, and the manager's name through managerId and empId.
SELECT EMPLOYEES.EMPLOYEE_ID, EMPLOYEES.FIRST_NAME, EMPLOYEES.LAST_NAME,
EMPLOYEES.JOB_ID, JOBS.JOB_TITLE AS JOB_TITLE, EMPLOYEES.SALARY,
DEPARTMENTS.DEPARTMENT_ID, DEPARTMENTS.DEPARTMENT_NAME AS DEPARTMENT_NAME
FROM EMPLOYEES
LEFT JOIN JOBS ON EMPLOYEES.JOB_ID = JOBS.JOB_ID
LEFT JOIN DEPARTMENTS ON EMPLOYEES.DEPARTMENT_ID = DEPARTMENTS.DEPARTMENT_ID
What should I do to get the name of the manager using the empId?
Or is there any other way to simplify this block of code?
Just add another LEFT JOIN to the EMPLOYEES table, but give it an alias, say MANAGERS.
SELECT
EMPLOYEES.EMPLOYEE_ID,
EMPLOYEES.FIRST_NAME,
EMPLOYEES.LAST_NAME,
EMPLOYEES.JOB_ID,
JOBS.JOB_TITLE AS JOB_TITLE,
EMPLOYEES.SALARY,
DEPARTMENTS.DEPARTMENT_ID,
DEPARTMENTS.DEPARTMENT_NAME AS DEPARTMENT_NAME,
MANAGERS.FIRST_NAME AS MANAGER_FIRST_NAME,
MANAGERS.LAST_NAME AS MANAGER_LAST_NAME
FROM EMPLOYEES
LEFT JOIN EMPLOYEES MANAGERS
ON EMPLOYEES.MANAGER_ID = MANAGERS.EMPLOYEE_ID
LEFT JOIN JOBS
ON EMPLOYEES.JOB_ID = JOBS.JOB_ID
LEFT JOIN DEPARTMENTS
ON EMPLOYEES.DEPARTMENT_ID = DEPARTMENTS.DEPARTMENT_ID
You can use another LEFT JOIN on the EMPLOYEES table to get the manager's name.
SELECT
EMPLOYEES.EMPLOYEE_ID,
EMPLOYEES.FIRST_NAME,
EMPLOYEES.LAST_NAME,
EMPLOYEES.JOB_ID,
JOBS.JOB_TITLE AS JOB_TITLE,
EMPLOYEES.SALARY,
DEPARTMENTS.DEPARTMENT_ID,
DEPARTMENTS.DEPARTMENT_NAME AS DEPARTMENT_NAME,
-- '+' concatenates strings in SQL Server; Oracle, PostgreSQL and SQLite use '||'
(MGR.FIRST_NAME + ' ' + MGR.LAST_NAME) AS MANAGER_NAME
FROM EMPLOYEES
LEFT JOIN JOBS
ON EMPLOYEES.JOB_ID = JOBS.JOB_ID
LEFT JOIN DEPARTMENTS
ON EMPLOYEES.DEPARTMENT_ID = DEPARTMENTS.DEPARTMENT_ID
LEFT JOIN EMPLOYEES MGR
ON EMPLOYEES.MANAGER_ID = MGR.EMPLOYEE_ID
Try using aliases and joining back to EMPLOYEES on the manager ID.
SELECT
E.EMPLOYEE_ID, E.FIRST_NAME, E.LAST_NAME,
E.JOB_ID, J.JOB_TITLE AS JOB_TITLE, E.SALARY,
D.DEPARTMENT_ID, D.DEPARTMENT_NAME AS DEPARTMENT_NAME,
M.EMPLOYEE_ID as mgr_id, M.FIRST_NAME as mgr_name, M.LAST_NAME as mgr_lname
FROM EMPLOYEES E LEFT JOIN JOBS J ON
E.JOB_ID = J.JOB_ID
LEFT JOIN DEPARTMENTS D ON
E.DEPARTMENT_ID = D.DEPARTMENT_ID
join employees M ON
E.manager_ID = M.employee_ID
This query works because I am doing a self-join back to the EMPLOYEES table from the main result set you already established. To put it simply, imagine you made an exact copy of the EMPLOYEES table and called it M. If you joined your original query to M by matching the original query's E.manager_id to M.employee_id, each row would then carry the employee's manager.
There is no need to create an exact copy of the employee table just to look up the manager. We can just reference the employee table a second time and use an alias (I used M for manager). Then we join from your list of employees using the manager_id to get the employee's manager.
You could do this again to get the manager's manager if you need to. Here is that query:
SELECT
E.EMPLOYEE_ID, E.FIRST_NAME, E.LAST_NAME,
E.JOB_ID, J.JOB_TITLE AS JOB_TITLE, E.SALARY,
D.DEPARTMENT_ID, D.DEPARTMENT_NAME AS DEPARTMENT_NAME,
M.EMPLOYEE_ID as mgr_id, M.FIRST_NAME as mgr_name, M.LAST_NAME as mgr_lname
FROM EMPLOYEES E LEFT JOIN JOBS J ON
E.JOB_ID = J.JOB_ID
LEFT JOIN DEPARTMENTS D ON
E.DEPARTMENT_ID = D.DEPARTMENT_ID
join employees M ON /* The employee's manager */
E.manager_ID = M.employee_ID
LEFT join employees MM ON /* The employee's manager's manager */
M.manager_ID = MM.employee_ID
I used a LEFT JOIN for this last one because at some point you reach the top of the management hierarchy, where there are no more managers. You could also use a LEFT JOIN for employees M, so that employees without a manager are still listed.
Here it is in tabular form:
Employee_id | Name | manager_id
1 | Fred | 10
2 | Jane | 10
10 | Bob | 20
20 | Betty | Null
Take employee #1: E.employee_id = 1, E.Name = Fred and E.manager_id = 10.
So the relevant lines of the query evaluate as follows:
...
join employees M ON /* The employee's manager */
E.manager_ID /* i.e. 10 */ = M.employee_ID
...
So the M alias now refers to the employee record where M.employee_ID = 10, and as such, M.Name = Bob and M.manager_id = 20.
Using the last version of the query, we can then work out Fred's manager's manager: Fred's manager is Bob, Bob's M.manager_ID is 20, so MM.employee_id = 20 refers to Betty, who doesn't seem to have a manager herself.
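The Fred/Jane/Bob/Betty example can be checked directly. The sketch below assumes SQLite via Python's sqlite3 and a simplified employees table with only employee_id, name and manager_id; it uses LEFT JOINs for both M and MM so that Bob (no manager's manager) and Betty (no manager) are still listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Fred', 10), (2, 'Jane', 10), (10, 'Bob', 20), (20, 'Betty', NULL);
""")

# E is the employee, M their manager, MM the manager's manager.
# Both joins are LEFT JOINs, so people near the top of the hierarchy
# get NULL (None in Python) instead of disappearing from the result.
rows = conn.execute("""
    SELECT E.name, M.name AS manager, MM.name AS managers_manager
    FROM employees E
    LEFT JOIN employees M  ON E.manager_id = M.employee_id
    LEFT JOIN employees MM ON M.manager_id = MM.employee_id
    ORDER BY E.employee_id
""").fetchall()

for row in rows:
    print(row)
```

Switching the first LEFT JOIN to an inner JOIN would drop Betty, because her manager_id is NULL and matches no row in M.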

Removing duplicates by adding them up [SQL]

I have a query like this:
select employee_id, salary
from salary
left join employee on salary.employee_id=employee.id_employee;
It returns these results:
EMPLOYEE ID | SALARY
-------------|-------
1 | 50
2 | 50
3 | 50
1 | 30
How do I remove duplicates by adding them up, like this:
EMPLOYEE ID | SALARY
------------|--------
1 | 80
2 | 50
3 | 50
There is no reason for a left join from salary to employee. Presumably, every employee_id in salary refers to a valid employee. So, this should do what you want:
select s.employee_id, sum(s.salary) as salary
from salary s
group by s.employee_id;
If you want all employees, even those who are not in the salary table, then an outer join is appropriate, but employee should be first:
select e.id_employee, sum(s.salary) as salary
from employee e left join
salary s
on s.employee_id = e.id_employee
group by e.id_employee;
Employees not in salary will have a salary of NULL.
Note that the group by condition in this query is on employee, the first table in the left join.
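Both versions can be compared on the sample rows from the question. This is a sketch in SQLite via Python's sqlite3; employee 4, who has no salary rows, is hypothetical and only there to show the NULL that the outer-join version produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id_employee INTEGER PRIMARY KEY);
    CREATE TABLE salary (employee_id INTEGER, salary INTEGER);
    INSERT INTO employee VALUES (1), (2), (3), (4);  -- employee 4 is hypothetical
    INSERT INTO salary VALUES (1, 50), (2, 50), (3, 50), (1, 30);
""")

# Aggregate the salary table alone: one row per employee_id that has salaries.
totals = conn.execute("""
    SELECT s.employee_id, SUM(s.salary) AS salary
    FROM salary s
    GROUP BY s.employee_id
    ORDER BY s.employee_id
""").fetchall()

# Start from employee so that employee 4 still appears; SUM over no rows is NULL.
all_employees = conn.execute("""
    SELECT e.id_employee, SUM(s.salary) AS salary
    FROM employee e
    LEFT JOIN salary s ON s.employee_id = e.id_employee
    GROUP BY e.id_employee
    ORDER BY e.id_employee
""").fetchall()

print(totals)
print(all_employees)
```

Wrapping the sum as COALESCE(SUM(s.salary), 0) would report 0 instead of NULL for employees without salary rows, if that is preferred.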
select employee_id, SUM(salary) as salary
from salary
left join employee on salary.employee_id=employee.id_employee
group by employee_id;
This is exactly what a GROUP BY clause is for. You'll have to group by employee_id and specify how you want to aggregate the salary:
SELECT employee_id, SUM(salary)
FROM salary
GROUP BY employee_id

SQL - Retrieving data within groups before and after some condition

With the two following tables:
EMPLOYEE (Fname, Lname, SSN, DNO)
DEPARTMENT (Dname, Dnumber)
For each department that has more than five employees, retrieve the department name and the number of its employees who are making more than $40,000.
Here is an incorrect solution to this:
SELECT
dname,
COUNT(*)
FROM
Department, Employee
WHERE
dnumber = dno
AND salary > 40000
GROUP BY
dname
HAVING
COUNT(*) > 5;
It would not list a department with more than five employees unless more than five of them earn over $40,000, because WHERE is applied before GROUP BY. That is not what we want.
Here is the correct solution:
SELECT
dname, COUNT(*)
FROM
Department, Employee
WHERE
dnumber = dno
AND salary > 40000
AND dno IN (SELECT dno
FROM Employee
GROUP BY dno
HAVING COUNT(*) > 5)
GROUP BY
dname
I can't see why this is correct.
Isn't it going to restrict the rows first to employees who earn more than $40,000 and then do the grouping, just like the first query? What is different here?
Sub-query basics:
First, let's make this query a bit easier to read:
SELECT
dname,
COUNT(*)
FROM
Department,
Employee
WHERE
dnumber = dno
AND salary > 40000
AND dno IN (
SELECT dno
FROM Employee
GROUP BY dno
HAVING COUNT(*) > 5
)
GROUP BY dname
As you can see, there is what we call a "sub-query": a query inside the query.
This is the part in dno IN (/*HERE is the Sub-query*/).
Just as parentheses are evaluated first in mathematics, you can think of SQL first finding the DNO values that have more than 5 employees, which effectively produces the following query:
SELECT
dname,
COUNT(*)
FROM
Department,
Employee
WHERE
dnumber = dno
AND salary > 40000
AND dno IN (
'dno10emp', 'dno24emp', 'dno45emp'
)
GROUP BY dname
Now you have a simple query that returns the departments that
have at least one employee with a salary above $40k
and have more than 5 employees in total.
What's wrong?!
Well, I'd say your "good query" isn't that good, and that's why you're struggling: it will not return a department unless that department has at least one employee earning more than $40k.
Here is a query that returns those departments too:
SELECT
Department.dname,
COUNT(Employee.salary)
FROM
Department
LEFT JOIN Employee
ON Department.dnumber = Employee.dno
AND Employee.salary > 40000
WHERE
Department.dnumber IN (
SELECT Employee.dno
FROM Employee
GROUP BY Employee.dno
HAVING COUNT(*) > 5
)
GROUP BY Department.dname
This returns every department that has more than 5 employees, then counts its employees earning more than $40K (a department could have 0).
Could you show me?
A picture is worth a thousand words:
SQL Fiddle
MySQL 5.6 schema setup (summarized: nb is the number of employees in the department at that salary):
| dname | nb | salary |
|-------------------|----|--------|
| accounting | 2 | 30000 |
| accounting | 4 | 50000 |
| boss | 6 | 150000 |
| garbage-collector | 6 | 15000 |
Query 1:
SELECT
dname,
COUNT(*)
FROM
Department,
Employee
WHERE
dnumber = dno
AND salary > 40000
GROUP BY dname
HAVING COUNT(*) > 5
Results:
| dname | COUNT(*) |
|-------|----------|
| boss | 6 |
Query 2:
SELECT
dname,
COUNT(*)
FROM
Department,
Employee
WHERE
dnumber = dno
AND salary > 40000
AND
dno IN (
SELECT dno FROM Employee
GROUP BY dno
HAVING COUNT(*) > 5
)
GROUP BY dname
Results:
| dname | COUNT(*) |
|------------|----------|
| accounting | 4 |
| boss | 6 |
Query 3:
SELECT
Department.dname,
COUNT(Employee.salary)
FROM
Department
LEFT JOIN Employee
ON Department.dnumber = Employee.dno
AND Employee.salary > 40000
WHERE
Department.dnumber IN (
SELECT Employee.dno
FROM Employee
GROUP BY Employee.dno
HAVING COUNT(*) > 5
)
GROUP BY Department.dname
Results:
| dname | COUNT(Employee.salary) |
|-------------------|------------------------|
| accounting | 4 |
| boss | 6 |
| garbage-collector | 0 |
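The fiddle can be reproduced locally with a sketch like the one below. It assumes SQLite via Python's sqlite3 instead of MySQL 5.6 and rebuilds the Employee rows from the nb/salary summary; the department numbers 1 to 3 are hypothetical, and ORDER BY is added only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (dname TEXT, dnumber INTEGER PRIMARY KEY);
    CREATE TABLE Employee (ssn INTEGER PRIMARY KEY, dno INTEGER, salary INTEGER);
    INSERT INTO Department VALUES ('accounting', 1), ('boss', 2), ('garbage-collector', 3);
""")
# (dno, nb, salary): nb employees of department dno earn this salary.
for dno, nb, salary in [(1, 2, 30000), (1, 4, 50000), (2, 6, 150000), (3, 6, 15000)]:
    conn.executemany("INSERT INTO Employee (dno, salary) VALUES (?, ?)", [(dno, salary)] * nb)

# Query 1: WHERE drops low earners before HAVING counts what is left.
query1 = conn.execute("""
    SELECT dname, COUNT(*) FROM Department, Employee
    WHERE dnumber = dno AND salary > 40000
    GROUP BY dname HAVING COUNT(*) > 5 ORDER BY dname
""").fetchall()

# Query 2: the subquery counts every employee, regardless of salary.
query2 = conn.execute("""
    SELECT dname, COUNT(*) FROM Department, Employee
    WHERE dnumber = dno AND salary > 40000
      AND dno IN (SELECT dno FROM Employee GROUP BY dno HAVING COUNT(*) > 5)
    GROUP BY dname ORDER BY dname
""").fetchall()

# Query 3: the salary filter sits in the ON clause, so departments with
# no high earners survive the LEFT JOIN and get a count of 0.
query3 = conn.execute("""
    SELECT Department.dname, COUNT(Employee.salary)
    FROM Department
    LEFT JOIN Employee
      ON Department.dnumber = Employee.dno AND Employee.salary > 40000
    WHERE Department.dnumber IN (
        SELECT Employee.dno FROM Employee GROUP BY Employee.dno HAVING COUNT(*) > 5)
    GROUP BY Department.dname ORDER BY Department.dname
""").fetchall()

for result in (query1, query2, query3):
    print(result)
```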
See sample data below.
http://sqlfiddle.com/#!9/357d29/2
The first query only returns departments with 6 or more highly paid employees, while the 2nd query returns the highly paid employees of departments with 6 or more employees. The sample below will not show up in the 1st query but will show up in the 2nd query.
Department | Employee | Salary
accounting | john doe | 50k
accounting | jan smith | 55k
accounting | dan brown | 60k
accounting | eric murphy | 60k
accounting | al daniels | 70k
accounting | ellen boyle | 30k
1st query: nothing, because only five employees have a salary > 40k.
2nd query: everyone except ellen boyle (a count of 5), because the department has more than 5 employees and all but one have a salary > 40k.
For the record, you already got correct answers. I'll just try to explain it in a different way.
Your first query has 1 select statement. It only counts employees with salary > 40k, and HAVING then keeps only the departments with more than 5 such employees. Every record it works with is about an employee with salary > 40k.
Your second query has 2 select statements:
This is the first one:
Select dname, count(*)
from Department, Employee
where dnumber = dno
and salary > 40000
It returns the count of all employees who earn more than 40000, by department name. There are no conditions on count(*) here, and the condition on salary has no effect on the second select statement:
SELECT Employee.dno
FROM Employee
GROUP BY Employee.dno
HAVING COUNT(*) > 5
This one looks at ALL employees in all departments and returns the departments that have more than 5 of them. This is where the condition on count(*) is, but it only applies locally, to the total number of employees per department.
The two statements are then combined: first we limit the departments to the ones we are interested in, and then from those we count only the high-salary employees.
First, never use commas in the FROM clause. Always use proper, explicit JOIN syntax.
I think the best and simplest solution uses conditional aggregation:
SELECT d.dname, SUM(CASE WHEN e.salary > 40000 THEN 1 ELSE 0 END) AS num_40kplus
FROM Department d JOIN
Employee e
ON d.dnumber = e.dno
GROUP BY d.dname
HAVING COUNT(*) > 5;
I see no reason why a subquery would be necessary or desirable.
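Here is a sketch of the conditional-aggregation query run on data like the fiddle's, assuming SQLite via Python's sqlite3. The small 'interns' department (3 employees) and all department numbers are hypothetical; 'interns' shows that HAVING COUNT(*) > 5 counts every employee, while the CASE expression counts only the ones above $40,000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (dname TEXT, dnumber INTEGER PRIMARY KEY);
    CREATE TABLE Employee (ssn INTEGER PRIMARY KEY, dno INTEGER, salary INTEGER);
    INSERT INTO Department VALUES
        ('accounting', 1), ('boss', 2), ('garbage-collector', 3), ('interns', 4);
""")
# (dno, nb, salary); department 4 ('interns') is hypothetical.
for dno, nb, salary in [(1, 2, 30000), (1, 4, 50000), (2, 6, 150000),
                        (3, 6, 15000), (4, 3, 45000)]:
    conn.executemany("INSERT INTO Employee (dno, salary) VALUES (?, ?)", [(dno, salary)] * nb)

# One pass over the join: COUNT(*) sees every employee of the department
# (used by HAVING), while the CASE expression adds 1 only above 40000.
results = conn.execute("""
    SELECT d.dname,
           SUM(CASE WHEN e.salary > 40000 THEN 1 ELSE 0 END) AS num_40kplus
    FROM Department d
    JOIN Employee e ON d.dnumber = e.dno
    GROUP BY d.dname
    HAVING COUNT(*) > 5
    ORDER BY d.dname
""").fetchall()

print(results)
```

The result matches Query 3 from the fiddle, and 'interns' is excluded even though all three of its employees earn more than $40,000.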

How to Improve These Self-Joins

I am learning Oracle SQL by working with its basic HR sample schema, whose EMPLOYEES table has three columns I'm mainly interested in: MANAGER_ID (basically a self-reference to EMPLOYEES.EMPLOYEE_ID), DEPARTMENT_ID, and SALARY.
For each employee, I want to retrieve his/her SALARY alongside the average salary of the managers in the department of the employee's manager. For instance, if we have the following (EMPLOYEE_ID = 140 is the employee of interest here):
+-------------+--------+---------------+------------+
| EMPLOYEE_ID | SALARY | DEPARTMENT_ID | MANAGER_ID |
+-------------+--------+---------------+------------+
| 140 | 12000 | 50 | 110 |
| 110 | 20000 | 60 | 101 |
| 156 | 18000 | 60 | 101 |
| 175 | 15000 | 60 | 105 |
| 320 | 24000 | 60 | 105 |
+-------------+--------+---------------+------------+
I am interested in obtaining the average salary of all the managers (not the other, non-managerial employees) in the department where the employee's manager works (in this case, DEPARTMENT_ID = 60), and comparing it with the employee's salary (in this case, employee 140). For the sample data above, the output should be:
+-------------+--------+-------------+-------------+------------+
| EMPLOYEE_ID | SALARY | AVG_MGR_SAL | MGR_DEPT_ID | MANAGER_ID |
+-------------+--------+-------------+-------------+------------+
| 140 | 12000 | 19250 | 60 | 110 |
+-------------+--------+-------------+-------------+------------+
where four (4) managers work in department 60, and $19250 is calculated as (20000 + 18000 + 15000 + 24000) / 4. I have come up with the following query, which seems to work (and excludes employees that don't have a manager):
select
employee_id
, salary employee_salary
, trunc(mgr_info.avg_manager_salary_per_dept, 0) emp_manager_avg_sal_dept
, mgr_info.manager_dept_id
, mgr_info.manager_id
from employees
join (
select
e1.employee_id manager_id
, e1.department_id manager_dept_id
, e1.salary manager_salary
, avg(e1.salary) over (partition by e1.department_id) avg_manager_salary_per_dept
from employees e1
join (
select distinct manager_id
from employees
where manager_id is not null
) mgr_ids
on e1.employee_id = mgr_ids.manager_id
) mgr_info
on employees.manager_id = mgr_info.manager_id
order by employee_id
However, I feel there should be a better way of getting the same result with fewer self-joins. Is there a way to get better performance?
Something like this: you only need one join, because you can compute the average salary for the manager's department on the "manager" copy of the table. I only included a few columns; you may need more or fewer, but I believe the core of what you wanted is covered.
(NOTE: Edited since I realized I missed one detail in the requirement)
select e.employee_id as employee_id,
e.salary as employee_salary,
m.employee_id as manager_id,
m.department_id as manager_dept_id,
m.avg_salary as avg_sal_of_mgr_dept
from hr.employees e inner join
( select employee_id, department_id,
avg(salary) over (partition by department_id) as avg_salary
from hr.employees
where employee_id in (select manager_id from hr.employees)
) m
on e.manager_id = m.employee_id
;
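To see the window-function answer produce the expected 19250 for employee 140, the sketch below runs it in SQLite (3.25 or newer, for window functions) via Python's sqlite3, without the hr. schema prefix. One caveat: in the five sample rows only 110 appears as anyone's MANAGER_ID, so employees 901 to 903 are hypothetical reports added so that 156, 175 and 320 count as managers, as the question assumes.

```python
import sqlite3  # AVG(...) OVER (...) needs SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER,
                            department_id INTEGER, manager_id INTEGER);
    INSERT INTO employees VALUES
        (140, 12000, 50, 110), (110, 20000, 60, 101), (156, 18000, 60, 101),
        (175, 15000, 60, 105), (320, 24000, 60, 105);
""")
# Hypothetical reports, so that 156, 175 and 320 also count as managers.
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)",
                 [(901, 5000, 50, 156), (902, 5000, 50, 175), (903, 5000, 50, 320)])

# The subquery keeps only managers, then averages their salaries per
# department with a window function; one join attaches it to each employee.
rows = conn.execute("""
    SELECT e.employee_id, e.salary AS employee_salary,
           m.employee_id AS manager_id, m.department_id AS manager_dept_id,
           m.avg_salary AS avg_sal_of_mgr_dept
    FROM employees e
    INNER JOIN (
        SELECT employee_id, department_id,
               AVG(salary) OVER (PARTITION BY department_id) AS avg_salary
        FROM employees
        WHERE employee_id IN (SELECT manager_id FROM employees)
    ) m ON e.manager_id = m.employee_id
    ORDER BY e.employee_id
""").fetchall()

for row in rows:
    print(row)
```

The WHERE filter runs before the window function, so the average covers managers only; employees 110, 156, 175 and 320 drop out because their own managers (101 and 105) are not in this sample.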
Here is an option which uses a series of joins to get your result:
SELECT t1.EMPLOYEE_ID,
t1.SALARY,
mgr.DEPARTMENT_ID AS MGR_DEPT_ID,
COALESCE(t2.SALARY, 0.0) AS ManagerAvgSal
FROM employees t1
LEFT JOIN employees mgr
ON t1.MANAGER_ID = mgr.EMPLOYEE_ID
LEFT JOIN
(
SELECT e1.DEPARTMENT_ID, AVG(e1.SALARY) AS SALARY
FROM employees e1
WHERE e1.EMPLOYEE_ID IN (SELECT MANAGER_ID FROM employees)
GROUP BY e1.DEPARTMENT_ID
) t2
-- join on the manager's department, not the employee's own department
ON mgr.DEPARTMENT_ID = t2.DEPARTMENT_ID