subquery and join not giving the same result - sql

1
select *
from employees
where salary > (select max(salary) from employees where department_id=50)
2
select *
from employees e left join
employees d
on e.DEPARTMENT_ID =d.DEPARTMENT_ID
where d.salary > (select max(salary) from employees where department_id=50)
why the second query is giving multiple record
i want achieve the same result as of 1st query using join.....
Thanks in Advance......

Rocky, the first select is correct. Why do you want to do any join? Without further information the objective of the second select is not clear (nonsense).
I can't see the point about joining against the same table by DEPARTMENT_ID. Anyway, the problem about duplicates is because you are joining the same two tables by a key is not pk, basically you are multiplyng each employee for all the employees of the same department. This version eliminate duplicates but still has no improvement from the first one.
select *
from employees e left join
employees d
on e.employee_ID = d.employee_ID
where d.salary > (select max(salary) from employees where department_id=50)

You are probably looking for an anti join. This is a pattern mainly used in a young DBMS where IN and EXISTS clauses are slow compared to joins, because the developers focused on joins only.
You are looking for all employees whose salaries are greater than all salaries in department 50. With other words: WHERE NOT EXISTS a salary greater or equal in department 50.
Your query can hence be written as:
select *
from employees e
where not exists
(
select null
from employees e50
where e50.department_id = 50
and e50.salary >= e.salary
);
As an anti join (an outer join where you dismiss all matches):
select *
from employees e
left join employees e50 on e50.department_id = 50 and e50.salary >= e.salary
where e50.salary is null;

Related

INNER JOIN and Count POSTGRESQL

I am learning postgresql and Inner join I have following table.
Employee
Id Name DepartmentId
1 John S. 1
2 Smith P. 1
3 Anil K. 2
Department
Department
Id Name
1 HR
2 Admin
I want to query to return the Department Name and numbers of employee in each department.
SELECT Department.name , COUNT(Employee.id) FROM Department INNER JOIN Employee ON Department.Id = Employee.DepartmentId Group BY Employee.department_id;
I dont know what I did wrong as I am new to database Query.
When involving all rows or major parts of the "many" table, it's typically faster to aggregate first and join later. Certainly the case here, since we are after counts for "each department", and there is no WHERE clause at all.
SELECT d.name, COALESCE(e.ct, 0) AS nr_employees
FROM department d
LEFT JOIN (
SELECT department_id AS id, count(*) AS ct
FROM employee
GROUP BY department_id
) e USING (id);
Also made it a LEFT [OUTER] JOIN, to keep departments without any employees in the result. And COALESCE to report 0 employees instead of NULL in that case.
Related, with more explanation:
Query with LEFT JOIN not returning rows for count of 0
Your original query would work too, after fixing the GROUP BY clause:
SELECT department.name, COUNT(employee.id)
FROM department
INNER JOIN employee ON department.id = employee.department_id
Group BY department.id; --!
That's assuming department.id is the PRIMARY KEY of the table, in which case it covers all columns of that table, including department.name. And you may want LEFT JOIN like above.
Aside: Consider legal, lower-case names exclusively in Postgres. See:
Are PostgreSQL column names case-sensitive?

practice sql explanation

http://studybyyourself.com/seminar/sql/exercises/8-3/?lang=en
Please provide data about all employees whose salary is higher or equal to the average salary of employees working in the same department (regardless if employees have left the company or not). Required attributes are last name, first name, salary and department name.
SELECT emp.Last_name, emp.First_name, emp.Salary, D.Name
FROM Employee AS emp
INNER JOIN Department AS D ON emp.Department_id = D.ID
WHERE emp.Salary >=
(
SELECT AVG(e.Salary)
FROM Employee AS e
GROUP BY e.Department_id
HAVING e.Department_id = emp.Department_id
)
Can anyone please help explain the solution? Specifically, what does the 'having' clause do in this case that allows the sub-query to work? I get stuck up until that point without the having clause and I expectedly get the 'subquery returns more than 1 row' error but I am not sure how the having clause is fixing the problem.

JOIN - 2 tasks - sql developer

I have 2 tasks:
1. FIRST TASK
Show first_name, last_name (from employees), job_title, employee_id (from jobs) start_date, end_date (from job_history)
My idea:
SELECT s.employee_id
, first_name
, last_name
, job_title
, employee_id
, start_date
, end_date
FROM employees
INNER JOIN jobs hp
on s.employee_id = hp.employee_id
INNER JOIN job_history
on hp.jobs = h.jobs
I know it doesn't work. I'm receiving: "HP"."EMPLOYEE_ID": invalid identifier
What does it mean "on s.employee_id = hp.employee_id". Maybe I should write sthg else instead of this.
2. SECOND TASK
Show department_name (from departments), average and max salary for each department (those data are from employees) and how many employees are working in those departments (from employees). Choose only departments with more than 1 person. The result round to 2 decimal places.
I have the pieces, but i don't know to connect it
My idea:
SELECT department_name,average(salary),max(salary),count(employees_id)
FROM employees
INNER JOIN departments
on employees_id = departments_id
HAVING count(department) > 1
SELECT ROUND(average(salary),2) from employees
I modified your queries a bit by improving table aliasing. Hopefully, if the right columns are present in the tables as you say, it should work:
SELECT s.employee_id, s.first_name, s.last_name,
hp.job_title, hp.employee_id,
h.start_date, h.end_date
FROM employees s
INNER JOIN jobs hp
on s.employee_id = hp.employee_id
INNER JOIN job_history h
on hp.jobs = h.jobs;
When we say on s.employee_id = hp.employee_id it means that if, for example, there is an employee_id = 1234 present in both the tables employees and jobs, then SQL will bring all the columns from both the tables in the same line that corresponds to employee_id = 1234. You can now pick different columns in the SELECT clause as if they are in the same/single table(which was not the case before joining). This is the main logic behind SQL joins.
As to your 2nd task, try the below query. I made some modifications in aggregation by introducing COUNT(DISTINCT s.employees_id). If the same employees_id is present twice for some reason, you still want to count that as one person.
SELECT d.department_name, avg(s.salary), max(s.salary), count(distinct s.employees_id)
FROM employees s
INNER JOIN departments d
on e.employees_id = d.departments_id
GROUP BY d.department_name
HAVING COUNT(DISTINCT s.employees_id) > 1;
Let me know if there is still any issue. Hopefully, this works.

Employees without subordinates

I'm trying to print all the employees that don't have subordinates.
I have been thinking about a tree data structure. Practically, most of the employees have subordonates (those are called managers). The only ones without subordonates are the leafs (they don't have any children).
However, I don't understand how can I select the leafs from this tree.
--following prints employees without manager.
SELECT e.employee_id, e.last_name, e.first_name
FROM employees e
WHERE e.employee_id = (SELECT employee_id FROM employees WHERE manager_id IS NULL AND employee_id = e.employee_id);
In short, you want to select all employees, who don't act as managers for other employees. That means, you want to select such employees, whose employee_id is not used as manager_id for any other employee.
Try this:
SELECT *
FROM employees e
WHERE NOT EXISTS (SELECT 1
FROM employees e2
WHERE e2.manager_id = e.employee_id)
You can do this via an outer join:
SELECT e.employee_id, e.last_name, e.first_name
FROM
employees e
LEFT JOIN employees sub
ON e.employee_id = sub.manager_id
WHERE sub.manager_id IS NULL
The filter condition selects only those rows of the left table that have no matching rows in the right table.
This is preferable to filtering via a correlated subquery, as the latter may require performing the subquery separately for every single employee row. (If the query planner avoids that, it will be by transforming it into an equivalent of the outer join.)
SELECT e.employee_ID, e.last_name, e.First_name, CONNECT_BY_ISLEAF "IsLeaf",
LEVEL, SYS_CONNECT_BY_PATH(e.employee_ID, '/') "Path"
FROM employees e
CONNECT BY PRIOR E.employeeID = E.Manager_ID;
where isLeaf =1
Basically stolen from help docs:
http://docs.oracle.com/cd/B12037_01/server.101/b10759/pseudocolumns001.htm#i1007332
or another stack question: get ALL last level children (leafs) from a node (hierarhical queries Oracle 11G)
SELECT *
FROM employees e
WHERE e.employee_id NOT IN ( SELECT nvl(manager_id, 0)
FROM employees );

Representing 'not in' subquery as join

I am trying to convert the following query:
select *
from employees
where emp_id not in (select distinct emp_id from managers);
into a form where I represent the subquery as a join. I tried doing:
select *
from employees a, (select distinct emp_id from managers) b
where a.emp_id!=b.emp_id;
I also tried:
select *
from employees a, (select distinct emp_id from managers) b
where a.emp_id not in b.emp_id;
But it does not give the same result. I have tried the 'INNER JOIN' syntax as well, but to no avail. I have become frustrated with this seemingly simple problem. Any help would be appreciated.
Assume employee Data set of
Emp_ID
1
2
3
4
5
6
7
Assume Manger data set of
Emp_ID
1
2
3
4
5
8
9
select *
from employees
where emp_id not in (select distinct emp_id from managers);
The above isn't joining tables so no Cartesian product is generated... you just have 7 records you're looking at...
The above would result in 6 and 7 Why? only 6 and 7 from Employee Data isn't in the managers table. 8,9 in managers is ignored as you're only returning data from employee.
select *
from employees a, (select distinct emp_id from managers) b
where a.emp_id!=b.emp_id;
The above didnt' work because a Cartesian product is generated... All of Employee to all of Manager (assuming 7 records in each table 7*7=49)
so instead of just evaluating the employee data like you were in the first query. Now you also evaluate all managers to all employees
so Select * results in
1,1
1,2
1,3
1,4
1,5
1,8
1,9
2,1
2,2...
Less the where clause matches...
so 7*7-7 or 42. and while this may be the answer to the life universe and everything in it, it's not what you wanted.
I also tried:
select *
from employees a, (select distinct emp_id from managers) b
where a.emp_id not in b.emp_id;
Again a Cartesian... All of Employee to ALL OF Managers
So this is why a left join works
SELECT e.*
FROM employees e
LEFT OUTER JOIN managers m
on e.emp_id = m.emp_id
WHERE m.emp_id is null
This says join on ID first... so don't generate a Cartesian but actually join on a value to limit the results. but since it's a LEFT join return EVERYTHING from the LEFT table (employee) and only those that match from manager.
so in our example would be returned as e.emp_Di = m.Emp_ID
1,1
2,2
3,3
4,4
5,5
6,NULL
7,NULL
now the where clause so
6,Null
7,NULL are retained...
older ansii SQL standards for left joins would have been *= in the where clause...
select *
from employees a, managers b
where a.emp_id *= b.emp_id --I never remember if the * is the LEFT so it may be =*
and b.emp_ID is null;
But I find this notation harder to read as the join can get mixed in with the other limiting criteria...
Try this:
select e.*
from employees e
left join managers m on e.emp_id = m.emp_id
where m.emp_id is null
This will join the two tables. Then we discard all rows where we found a matching manager and are left with employees who aren't managers.
Your best bet would probably be a left join:
select
e.*
from employees e
left join managers m on e.emp_id = m.emp_id
where
m.emp_id is null;
The idea here is you're saying that you want to select everything from employees, including anything that matches in the manager table based on emp_id and then filtering out the rows that actually have something in the manager table.
Use Left Outer Join instead
select e.*
from employees e
left outer join managers m
on e.emp_id = m.emp_id
where m.emp_id is null
left outer join will preserve the rows from m table even if they do not have a match i e table based on the emp_id field. The we filter on where m.emp_id is null - give me all the rows from e where there's no matching record in m table.
A bit more on the subject can be found here:
Visual representation of joins
from employees a, (select distinct emp_id from managers) b implies cross join - all posible combinations between tables (and you needed left outer join instead)
The MINUS keyword should do the trick:
SELECT e.* FROM employees e
MINUS
Select m.* FROM managers m
Hope that helps...
select *
from employees
where Not (emp_id in (select distinct emp_id from managers));