Some results disappear after GROUP BY - sql

I have an Employment table, and I select every employee whose ID appears in it more than once:
SELECT e.emp_id, e.first_name, e.last_name
FROM Employment Emp
JOIN Employee e
ON e.emp_id = Emp.emp_id
JOIN Department d
ON Emp.dept_id = d.dept_id
GROUP BY e.emp_id, e.first_name, e.last_name
HAVING count(Emp.emp_id) > 1
But after I add d.name (the department name) to SELECT and GROUP BY, some employees disappear:
SELECT e.emp_id, e.first_name, e.last_name, d.name
FROM Employment Emp
JOIN Employee e
ON e.emp_id = Emp.emp_id
JOIN Department d
ON Emp.dept_id = d.dept_id
GROUP BY e.emp_id, e.first_name, e.last_name,d.name
HAVING count(Emp.emp_id) > 1
I want to list all departments of every employee returned by the first query.

You can use a window function instead:
SELECT ed.*
FROM (SELECT e.emp_id, e.first_name, e.last_name, d.name,
COUNT(*) OVER (PARTITION BY e.emp_id, e.first_name, e.last_name) as cnt
FROM Employment Emp JOIN
Employee e
ON e.emp_id = Emp.emp_id JOIN
Department d
ON Emp.dept_id = d.dept_id
) ed
WHERE cnt > 1
ORDER BY ed.emp_id, ed.first_name, ed.last_name;

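Your two queries ask different questions. The second count() is more granular because it also groups by department.name, so it counts rows per employee and department. It looks like none of the "disappeared" persons is employed in the same department twice or more, so each of their groups has a count of 1 and fails the HAVING condition.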
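To make the difference concrete, here is a minimal sketch using Python's sqlite3 module. The sample rows are invented for illustration, and the window query needs SQLite 3.25 or later. It shows the grouped query returning nothing while the window-function query keeps both departments of the repeated employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee   (emp_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Employment (emp_id INTEGER, dept_id INTEGER);
-- Hypothetical data: Ann works in two different departments, Bob in one.
INSERT INTO Employee   VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Kim');
INSERT INTO Department VALUES (10, 'Sales'), (20, 'IT');
INSERT INTO Employment VALUES (1, 10), (1, 20), (2, 10);
""")

# Grouping by department as well: every (employee, department) group has one row.
grouped_with_dept = conn.execute("""
    SELECT e.emp_id, d.name
    FROM Employment emp
    JOIN Employee e   ON e.emp_id = emp.emp_id
    JOIN Department d ON emp.dept_id = d.dept_id
    GROUP BY e.emp_id, d.name
    HAVING COUNT(emp.emp_id) > 1
""").fetchall()

# Counting per employee with a window function keeps the department rows.
windowed = conn.execute("""
    SELECT emp_id, name
    FROM (SELECT e.emp_id, d.name,
                 COUNT(*) OVER (PARTITION BY e.emp_id) AS cnt
          FROM Employment emp
          JOIN Employee e   ON e.emp_id = emp.emp_id
          JOIN Department d ON emp.dept_id = d.dept_id) ed
    WHERE cnt > 1
    ORDER BY emp_id, name
""").fetchall()

print(grouped_with_dept)
print(windowed)
```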

Related

Oracle: range of standard deviation of salary in correlated subquery

I am using an Oracle Developer Database and I have two tables, Employees and Departments:
I am supposed to find (using a correlated subquery), for each department, all the employees whose salary lies within one standard deviation of that department's average salary.
This is what I have tried so far:
SELECT d.department_name, e.employee_id, e.last_name, AVG(e.salary), e.salary
FROM hr.employees e
JOIN hr.departments d
on e.department_id = d.department_id
WHERE e.salary IN
(SELECT salary
FROM hr.employees
WHERE salary >
(SELECT ROUND(AVG(e.salary)-STDDEV(e.salary),2)
FROM hr.employees) AND salary < (SELECT ROUND(AVG(e.salary) + STDDEV(e.salary),2) FROM hr.employees)
GROUP BY d.department_name,
e.employee_id, e.last_name, e.salary;
Even though this query has the correct syntax, it does not show any result.
This other (partial) approach works but how can I query at the same time the employee table?
SELECT d.department_name, e.department_id, ROUND(AVG(e.salary) + STDDEV(e.salary), 2)
AS standard_deviation_max,
ROUND(AVG(e.salary) - STDDEV(e.salary), 2)
AS standard_deviation_min
FROM hr.employees e
JOIN hr.departments d
on e.department_id = d.department_id
GROUP BY department_name, e.department_id;
Since I am new to SQL, I would really appreciate any hint. Thank you in advance
You can use your query to find the employee details and then correlate the salary to the average ± the standard deviation:
SELECT d.department_name,
e.employee_id,
e.last_name,
e.salary
FROM hr.employees e
JOIN hr.departments d
on e.department_id = d.department_id
WHERE EXISTS (
SELECT 1
FROM hr.employees x
WHERE x.department_id = e.department_id
HAVING e.salary BETWEEN AVG(x.salary) - STDDEV(x.salary)
AND AVG(x.salary) + STDDEV(x.salary)
);
or, you can use analytic functions:
SELECT department_name,
employee_id,
last_name,
salary
FROM (
SELECT d.department_name,
e.employee_id,
e.last_name,
e.salary,
AVG(e.salary) OVER (PARTITION BY e.department_id) AS avg_salary,
STDDEV(e.salary) OVER (PARTITION BY e.department_id) AS stddev_salary
FROM hr.employees e
JOIN hr.departments d
on e.department_id = d.department_id
)
WHERE salary BETWEEN avg_salary - stddev_salary
AND avg_salary + stddev_salary;
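The per-department band check can be tried outside Oracle with a small sketch in Python's sqlite3 module. Several things here are assumptions: the sample rows are invented, SQLite has no built-in STDDEV so a hypothetical SampleStdDev aggregate is registered under that name (sample standard deviation, 0 for a single row), and the bounds are written as two scalar correlated subqueries rather than EXISTS with HAVING, because older SQLite versions reject HAVING without GROUP BY.

```python
import sqlite3
import statistics

class SampleStdDev:
    """Hypothetical stand-in for STDDEV: sample standard deviation, 0 for one row."""
    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        if not self.values:
            return None
        if len(self.values) == 1:
            return 0.0
        return statistics.stdev(self.values)

conn = sqlite3.connect(":memory:")
conn.create_aggregate("STDDEV", 1, SampleStdDev)
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, last_name TEXT,
                        salary INTEGER, department_id INTEGER);
-- Invented rows: one high outlier per department.
INSERT INTO employees VALUES
  (1, 'Abel', 1000, 10), (2, 'Baer', 2000, 10), (3, 'Chen', 3000, 10), (4, 'Davies', 10000, 10),
  (5, 'Ernst', 4000, 20), (6, 'Fay', 5000, 20), (7, 'Grant', 9000, 20);
""")

# Each bound is a subquery correlated on the outer row's department.
in_band = conn.execute("""
    SELECT e.department_id, e.last_name, e.salary
    FROM employees e
    WHERE e.salary BETWEEN
          (SELECT AVG(x.salary) - STDDEV(x.salary)
           FROM employees x WHERE x.department_id = e.department_id)
      AND (SELECT AVG(x.salary) + STDDEV(x.salary)
           FROM employees x WHERE x.department_id = e.department_id)
    ORDER BY e.department_id, e.salary
""").fetchall()

for row in in_band:
    print(row)
```

With this data the two outliers (Davies and Grant) fall outside their department's band and are filtered out.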

I am getting a "missing keyword" error in my code

select e.department_id,e.salary,d.department_name
from employees e
join departments d
on d.department_id=e.department_id
inner join
(
select department_id,max(salary) as max_sal from employees
group by department_id
) as t
on e.department_id=t.department_id
where e.salary =t.max_sal;
Oracle does not support AS for table aliases, so you can try:
select e.department_id, e.salary, d.department_name
from employees e join
departments d
on d.department_id = e.department_id inner join
(select department_id, max(salary) as max_sal
from employees
group by department_id
) t
on e.department_id = t.department_id
where e.salary = t.max_sal;
Of course, this would be better written using window functions, but this answers the question that you asked.
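As a rough idea of what the window-function version could look like, here is a sketch run through Python's sqlite3 module (SQLite 3.25 or later). The rows are invented; the column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE employees   (employee_id INTEGER PRIMARY KEY, salary INTEGER, department_id INTEGER);
-- Invented rows; department 20 has a tie for the top salary.
INSERT INTO departments VALUES (10, 'Sales'), (20, 'IT');
INSERT INTO employees   VALUES (1, 3000, 10), (2, 5000, 10), (3, 4000, 20), (4, 4000, 20);
""")

# MAX() as a window function attaches the department maximum to every row,
# so no separate aggregated subquery and join are needed.
top_paid = conn.execute("""
    SELECT department_id, salary, department_name
    FROM (SELECT e.department_id, e.salary, d.department_name,
                 MAX(e.salary) OVER (PARTITION BY e.department_id) AS max_sal
          FROM employees e
          JOIN departments d ON d.department_id = e.department_id) t
    WHERE salary = max_sal
    ORDER BY department_id
""").fetchall()

print(top_paid)
```

Note that the derived table alias is written without AS, which is also the form Oracle accepts.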

Highest paid employee + average salary of the departments

An employee belongs to a department (foreign key = D_ID). An employee has a SSN (primary key), name, salary and D_ID.
A department has multiple employees (primary key = ID)
I want a query that returns Department name| Name of Highest Paid Employee of that department | His Salary | Average salary of employees working in the same department.
I know how to select the first part:
SELECT
D.name, E.name, E.salary
FROM
Employee E, Department D
WHERE
salary IN (SELECT MAX(E.salary)
FROM Employee E
GROUP BY E.D_ID)
AND E.D_ID = D.ID
I know also how to select the last part:
SELECT AVG(E.salary)
FROM Employee E
GROUP BY E.D_ID
How do I put these together in a single query?
You can use window functions for that:
select department_name, employee_name, salary, avg_dept_salary
from (
select e.name as employee_name,
d.name as department_name,
e.salary,
max(e.salary) over (partition by d.id) as max_dept_salary,
avg(e.salary) over (partition by d.id) as avg_dept_salary
from Employee E
join Department D on e.d_id = d.id
) t
where salary = max_dept_salary
order by department_name;
The above is standard ANSI SQL and runs on any DBMS that supports window functions.
I would do something like this:
SELECT d.name
, e.name
, e.salary
, n.avg_salary
FROM Department d
JOIN ( SELECT m.d_id
, MAX(m.salary) AS max_salary
, AVG(m.salary) AS avg_salary
FROM Employee m
GROUP BY m.d_id
) n
ON n.d_id = d.id
JOIN Employee E
ON e.d_id = d.id
AND e.salary = n.max_salary
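The derived-table approach can be checked with a small sketch in Python's sqlite3 module. The table layout follows the question; the rows and the added ORDER BY are assumptions made for a stable demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Employee   (ssn TEXT PRIMARY KEY, name TEXT, salary INTEGER,
                         d_id INTEGER REFERENCES Department(id));
-- Invented rows for illustration.
INSERT INTO Department VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO Employee   VALUES ('a', 'Ann', 3000, 1), ('b', 'Bob', 5000, 1),
                              ('c', 'Cid', 4000, 2), ('d', 'Dee', 2000, 2);
""")

# One pass over Employee computes both MAX and AVG per department;
# joining back on the maximum picks the highest-paid employee(s).
rows = conn.execute("""
    SELECT d.name, e.name, e.salary, n.avg_salary
    FROM Department d
    JOIN (SELECT m.d_id,
                 MAX(m.salary) AS max_salary,
                 AVG(m.salary) AS avg_salary
          FROM Employee m
          GROUP BY m.d_id) n ON n.d_id = d.id
    JOIN Employee e ON e.d_id = d.id AND e.salary = n.max_salary
    ORDER BY d.name
""").fetchall()

for row in rows:
    print(row)
```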

How do I combine two working SELECT queries into one query?

I have two queries which work independently of one another:
SELECT e.employee_id,
e.first_name,
e.last_name,
e.job_id,
e.salary,
e.commission_pct, e.manager_id,
e.department_id,
j.start_date,
j.end_date
FROM hr.employees e
LEFT OUTER JOIN hr.job_history j
ON e.employee_id = j.employee_id
WHERE commission_pct IS NULL
This first one retrieves information from two different tables, hr.employees and hr.job_history.
Here is the second:
SELECT e.employee_id,
e.last_name,
m.employee_id "MgrNo",
m.last_name "MgrName"
FROM hr.employees e
INNER JOIN hr.employees m ON e.manager_id=m.employee_id
This is to link employee_id with manager_id to display each employee's manager surname. However, when I try to include the two together I keep getting an error telling me I have an invalid prefix. Any ideas?
"I keep getting an error telling me I have an invalid prefix."
That is most likely a typo, but we can't spot it in a version of the query you haven't posted. Basically, you just need another join in the first query, like this:
SELECT e.employee_id,
e.first_name,
e.last_name,
e.job_id,
e.salary,
e.commission_pct,
m.employee_id "MgrNo",
m.last_name "MgrName",
e.department_id,
j.start_date,
j.end_date
FROM hr.employees e
LEFT OUTER JOIN hr.job_history j
ON e.employee_id = j.employee_id
LEFT OUTER JOIN hr.employees m
ON m.employee_id = e.manager_id
WHERE e.commission_pct IS NULL
I suggest using an outer join, because normally not every employee has a manager (depending on how you have implemented the hierarchy).
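The effect of the join type on employees without a manager can be seen in a short sketch using Python's sqlite3 module. The three rows are invented; only the top-level employee has a NULL manager_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, last_name TEXT, manager_id INTEGER);
-- Invented rows: King has no manager.
INSERT INTO employees VALUES (100, 'King', NULL), (101, 'Kochhar', 100), (102, 'De Haan', 100);
""")

self_join = """
    SELECT e.last_name, m.last_name
    FROM employees e
    {join_type} JOIN employees m ON m.employee_id = e.manager_id
    ORDER BY e.employee_id
"""

inner_rows = conn.execute(self_join.format(join_type="INNER")).fetchall()
outer_rows = conn.execute(self_join.format(join_type="LEFT OUTER")).fetchall()

print(inner_rows)  # King is missing: no row in m matches a NULL manager_id
print(outer_rows)  # King is kept, with NULL (None) in the manager column
```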
Try:
WITH temptable AS (
SELECT e.employee_id, e.last_name, m.employee_id "MgrNo", m.last_name "MgrName"
FROM hr.employees e
INNER JOIN hr.employees m ON
e.manager_id=m.employee_id)
SELECT e.employee_id, e.first_name, e.last_name, e.job_id, e.salary, e.commission_pct, e.manager_id, e.department_id, j.start_date, j.end_date, t."MgrNo", t."MgrName"
FROM hr.employees e
LEFT OUTER JOIN hr.job_history j ON e.employee_id = j.employee_id
LEFT JOIN temptable t ON t.employee_id = e.employee_id
WHERE commission_pct IS NULL
This wraps the second query in a common table expression and joins it to the first query on employee_id.
It would seem that you are attempting to create a query which will produce employee salary/commission history with their manager info. You can combine the queries like this:
SELECT e.employee_id, e.first_name, e.last_name, e.job_id, e.salary,
e.commission_pct, e.manager_id, e.department_id, j.start_date, j.end_date,
m.employee_id "MgrNo", m.last_name "MgrName"
FROM hr.employees e
INNER JOIN hr.employees m ON e.manager_id=m.employee_id
LEFT OUTER JOIN hr.job_history j ON e.employee_id = j.employee_id
WHERE e.commission_pct IS NULL

Employees with largest salary in department

I found a couple of SQL tasks on Hacker News today, however I am stuck on solving the second task in Postgres, which I'll describe here:
You have the following, simple table structure:
List the employees who have the biggest salary in their respective departments.
I set up an SQL Fiddle here for you to play with. It should return Terry Robinson, Laura White. Along with their names it should have their salary and department name.
Furthermore, I'd be curious to know of a query which would return Terry Robinson (maximum salary in the Sales department) and Laura White (maximum salary in the Marketing department) and an empty row for the IT department, with null as the employee, explicitly stating that there are no employees (thus nobody with the highest salary) in that department.
Return one employee with the highest salary per dept.
Use DISTINCT ON for a much simpler and faster query that does all you are asking for:
SELECT DISTINCT ON (d.id)
d.id AS department_id, d.name AS department
,e.id AS employee_id, e.name AS employee, e.salary
FROM departments d
LEFT JOIN employees e ON e.department_id = d.id
ORDER BY d.id, e.salary DESC;
Also note the LEFT [OUTER] JOIN that keeps departments with no employees in the result.
This picks only one employee per department. If there are multiple sharing the highest salary, you can add more ORDER BY items to pick one in particular. Else, an arbitrary one is picked from peers.
If there are no employees, the department is still listed, with NULL values for employee columns.
You can simply add any columns you need in the SELECT list.
Find a detailed explanation, links and a benchmark for the technique in this related answer:
Select first row in each GROUP BY group?
Aside: It is an anti-pattern to use non-descriptive column names like name or id. Should be employee_id, employee etc.
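DISTINCT ON is specific to Postgres. To try the "one row per department" idea elsewhere, the sketch below swaps in ROW_NUMBER() as a stand-in and runs it through Python's sqlite3 module (SQLite 3.25 or later). This is a different technique from the query above, and the salaries and the extra employee are invented, since the original fiddle data is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees   (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, department_id INTEGER);
-- Invented rows; IT deliberately has no employees.
INSERT INTO departments VALUES (1, 'Sales'), (2, 'Marketing'), (3, 'IT');
INSERT INTO employees   VALUES (1, 'Terry Robinson', 39500, 1), (2, 'Ed Low', 20000, 1),
                               (3, 'Laura White', 28300, 2);
""")

# ROW_NUMBER() numbers the rows of each department by descending salary;
# keeping rn = 1 yields exactly one row per department, including empty ones.
one_per_dept = conn.execute("""
    SELECT department, employee, salary
    FROM (SELECT d.id AS department_id, d.name AS department,
                 e.name AS employee, e.salary,
                 ROW_NUMBER() OVER (PARTITION BY d.id ORDER BY e.salary DESC) AS rn
          FROM departments d
          LEFT JOIN employees e ON e.department_id = d.id) ranked
    WHERE rn = 1
    ORDER BY department_id
""").fetchall()

for row in one_per_dept:
    print(row)
```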
Return all employees with the highest salary per dept.
Use the window function rank() (like @Scotch already posted, just simpler and faster):
SELECT d.name AS department, e.employee, e.salary
FROM departments d
LEFT JOIN (
SELECT name AS employee, salary, department_id
,rank() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
FROM employees e
) e ON e.department_id = d.id AND e.rnk = 1;
Same result as with the above query with your example (which has no ties), just a bit slower.
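To see how rank() behaves when there are ties, here is a sketch run through Python's sqlite3 module (SQLite 3.25 or later). The rows are invented, with two Sales employees sharing the top salary, and an ORDER BY is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees   (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, department_id INTEGER);
-- Invented rows: a tie at the top of Sales, and no employees in IT.
INSERT INTO departments VALUES (1, 'Sales'), (2, 'Marketing'), (3, 'IT');
INSERT INTO employees   VALUES (1, 'Terry Robinson', 39500, 1), (2, 'Sam Hill', 39500, 1),
                               (3, 'Ed Low', 20000, 1), (4, 'Laura White', 28300, 2);
""")

# rank() gives every tied top earner rank 1, so all of them survive the join filter.
all_top = conn.execute("""
    SELECT d.name AS department, e.employee, e.salary
    FROM departments d
    LEFT JOIN (SELECT name AS employee, salary, department_id,
                      rank() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
               FROM employees) e
           ON e.department_id = d.id AND e.rnk = 1
    ORDER BY d.id, e.employee
""").fetchall()

for row in all_top:
    print(row)
```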
This is with reference to your fiddle:
SELECT * -- or whatever is your columns list.
FROM employees e JOIN departments d ON e.Department_ID = d.id
WHERE (e.Department_ID, e.Salary) IN (SELECT Department_ID, MAX(Salary)
FROM employees
GROUP BY Department_ID)
EDIT :
As mentioned in a comment below, if you want to see the IT department also, with all NULL for the employee records, you can use the RIGHT JOIN and put the filter condition in the joining clause itself as follows:
SELECT e.name, e.salary, d.name -- or whatever is your columns list.
FROM employees e RIGHT JOIN departments d ON e.Department_ID = d.id
AND (e.Department_ID, e.Salary) IN (SELECT Department_ID, MAX(Salary)
FROM employees
GROUP BY Department_ID)
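Why the filter has to sit in the join condition can be shown with a small sketch in Python's sqlite3 module. Two things are assumptions: the rows are invented, and the outer join is written as departments LEFT JOIN employees, which is equivalent to the RIGHT JOIN above but also runs on SQLite versions that lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees   (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, department_id INTEGER);
-- Invented rows; IT has no employees.
INSERT INTO departments VALUES (1, 'Sales'), (2, 'Marketing'), (3, 'IT');
INSERT INTO employees   VALUES (1, 'Terry Robinson', 39500, 1), (2, 'Ed Low', 20000, 1),
                               (3, 'Laura White', 28300, 2);
""")

max_filter = """(e.department_id, e.salary) IN
                (SELECT department_id, MAX(salary) FROM employees GROUP BY department_id)"""

# Filter inside ON: unmatched departments are still emitted with NULLs.
filter_in_on = conn.execute(f"""
    SELECT e.name, e.salary, d.name
    FROM departments d
    LEFT JOIN employees e ON e.department_id = d.id AND {max_filter}
    ORDER BY d.id
""").fetchall()

# Filter in WHERE: it runs after the join and discards the NULL-extended IT row.
filter_in_where = conn.execute(f"""
    SELECT e.name, e.salary, d.name
    FROM departments d
    LEFT JOIN employees e ON e.department_id = d.id
    WHERE {max_filter}
    ORDER BY d.id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```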
This is basically what you want. Rank() Over
SELECT ename ,
departments.name
FROM ( SELECT ename ,
dname
FROM ( SELECT employees.name as ename ,
departments.name as dname ,
rank() over (
PARTITION BY employees.department_id
ORDER BY employees.salary DESC
)
FROM Employees
JOIN Departments on employees.department_id = departments.id
) t
WHERE rank = 1
) s
RIGHT JOIN departments on s.dname = departments.name
Good old classic sql:
select e1.name, e1.salary, e1.department_id
from employees e1
where e1.salary=
(select max(e.salary)
from employees e
where e.department_id = e1.department_id
group by e.department_id
)
Table1 is emp - empno, ename, sal, deptno
Table2 is dept - deptno, dname.
Query could be (includes ties & runs on 11.2g):
select e1.empno, e1.ename, e1.sal, e1.deptno as department
from emp e1
where (e1.deptno, e1.sal) in
(SELECT e.deptno, max(e.sal) from emp e group by e.deptno)
order by e1.deptno asc;
SELECT
e.first_name, d.department_name, e.salary
FROM
employees e
JOIN
departments d
ON
(e.department_id = d.department_id)
WHERE
e.first_name
IN
(SELECT TOP 2
first_name
FROM
employees
WHERE
department_id = d.department_id);
select d.Name, e.Name, e.Salary from Employees e, Departments d,
(select DepartmentId as DeptId, max(Salary) as Salary
from Employees e
group by DepartmentId) m
where m.Salary = e.Salary
and m.DeptId = e.DepartmentId
and e.DepartmentId = d.DepartmentId
The maximum salary of each department is computed in the inner query using GROUP BY. The outer query then selects the employees whose department and salary match one of those maxima.
Assuming Postgres
Return the highest-paid employees with their details, assuming a table named emp whose dept_id column holds each employee's department:
select e1.*
from emp e1
inner join (select max(sal) as max_sal, dept_id from emp group by dept_id) as e2
on e1.dept_id = e2.dept_id and e1.sal = e2.max_sal
Returns one or more people for each department with the highest salary:
SELECT result.Name Department, Employee2.Name Employee, result.salary Salary
FROM ( SELECT dept.name, dept.department_id, max(Employee1.salary) salary
FROM Departments dept
JOIN Employees Employee1 ON Employee1.department_id = dept.department_id
GROUP BY dept.name, dept.department_id ) result
JOIN Employees Employee2 ON Employee2.department_id = result.department_id
WHERE Employee2.salary = result.salary
SQL query:
select d.name,e.name,e.salary
from employees e, depts d
where e.dept_id = d.id
and (d.id,e.salary) in
(select dept_id,max(salary) from employees group by dept_id);
Take a look at this solution:
SELECT
E.SALARY,
E.NAME,
D.NAME as Department
FROM employees E
INNER JOIN DEPARTMENTS D ON D.ID = E.DEPARTMENT_ID
WHERE E.SALARY = (SELECT MAX(M.SALARY) FROM employees M WHERE M.DEPARTMENT_ID = E.DEPARTMENT_ID)