Difference between old and new SQL JOINS - sql

I am currently learning SQL and I can't understand why these two queries return different numbers of rows (the first one returns 53, while the second returns 69).
SELECT d.department_name "dept_name",
j.job_title "job_title",
e.manager_id "manager_id",
MAX(e.salary) "max_salary",
SUM(e.salary) "sum_salary"
FROM employees e, jobs j, departments d
WHERE j.job_id = e.job_id AND
d.department_id(+) = e.department_id
GROUP BY GROUPING SETS( (d.department_name, j.job_title), (j.job_title,e.manager_id), ());
And the second one:
SELECT d.department_name AS "dept_name",
j.job_title AS "job_title",
e.manager_id AS "manager_id",
MAX(e.salary) AS "max_salary",
SUM(e.salary) AS "sum_salary"
FROM employees e
INNER JOIN jobs j ON j.job_id = e.job_id
RIGHT OUTER JOIN departments d ON d.department_id = e.department_id
GROUP BY GROUPING SETS( (d.department_name, j.job_title), (j.job_title,e.manager_id), ());
Thank you for your help!

Your queries differ because in the first one you outer join the departments table, whereas in the second you outer join jobs and employees. (This is why I don't like right outer joins much; I don't find them very readable.) In the second query you take employees, join them with jobs, and then the RIGHT OUTER JOIN says: give me all departments anyhow, and supply outer-joined (NULL) records for jobs and employees where a department has no match. The second statement is equivalent to:
SELECT d.department_name AS "dept_name",
j.job_title AS "job_title",
e.manager_id AS "manager_id",
MAX(e.salary) AS "max_salary",
SUM(e.salary) AS "sum_salary"
FROM departments d
LEFT OUTER JOIN employees e ON e.department_id = d.department_id
LEFT OUTER JOIN jobs j ON j.job_id = e.job_id
GROUP BY GROUPING SETS( (d.department_name, j.job_title), (j.job_title,e.manager_id), ());
In old Oracle syntax you would write:
SELECT d.department_name "dept_name",
j.job_title "job_title",
e.manager_id "manager_id",
MAX(e.salary) "max_salary",
SUM(e.salary) "sum_salary"
FROM employees e, jobs j, departments d
WHERE e.department_id(+) = d.department_id
AND j.job_id(+) = e.job_id
GROUP BY GROUPING SETS( (d.department_name, j.job_title), (j.job_title,e.manager_id), ());

In old-style joins the syntax
FROM employees e, departments d
WHERE d.department_id(+) = e.department_id
is a left join and is equal to:
FROM employees e
LEFT JOIN departments d ON d.department_id = e.department_id
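For the two queries to work exactly the same, you should either change the RIGHT JOIN to a LEFT JOIN in the second query, or change d.department_id(+) = e.department_id to d.department_id = e.department_id(+) in the first (in which case j.job_id = e.job_id also needs the (+) on the jobs side, otherwise the departments without employees are filtered out again).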
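To see why the driving table matters, here is a minimal sketch using Python's built-in sqlite3 module. The toy tables and rows are invented for illustration, and SQLite stands in for Oracle, so only the ANSI form is shown (the (+) syntax is Oracle-specific).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'IT'), (30, 'Empty');
    INSERT INTO employees VALUES (1, 10), (2, 20), (3, NULL);
""")

# Employees drive the join: every employee is kept, matched or not.
employees_first = conn.execute("""
    SELECT e.employee_id, d.department_name
    FROM employees e
    LEFT JOIN departments d ON d.department_id = e.department_id
    ORDER BY e.employee_id
""").fetchall()

# Departments drive the join: every department is kept, matched or not.
departments_first = conn.execute("""
    SELECT e.employee_id, d.department_name
    FROM departments d
    LEFT JOIN employees e ON e.department_id = d.department_id
    ORDER BY d.department_id
""").fetchall()

print(employees_first)    # employee 3 appears with no department
print(departments_first)  # department 'Empty' appears with no employee
```

Both queries return three rows here, but not the same three rows, which is the same effect that produces the different counts once the real data contains unmatched rows on either side.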

Related

How can I write this query correctly?

How can I write a query that gives the country name, city, postal code, street address and the number of departments where at least 2 employees work? Below is the query I wrote, but I get "not a GROUP BY expression" error as a result of the query.
SELECT k.COUNTRY_NAME,
l.CITY,
l.POSTAL_CODE,
l.STREET_ADDRESS,
e.DEPARTMENT_ID,
COUNT(EMPLOYEE_ID)
FROM hr.employees e
JOIN hr.departments c
ON (c.DEPARTMENT_ID = e.DEPARTMENT_ID)
JOIN hr.locations l
ON (c.LOCATION_ID = l.LOCATION_ID)
JOIN hr.countries k
ON (k.COUNTRY_ID = l.COUNTRY_ID)
GROUP BY e.DEPARTMENT_ID
HAVING COUNT(EMPLOYEE_ID) > 2;
The error occurs because every non-aggregated column in the SELECT list (c.country_name, l.city, l.postal_code, l.street_address, e.department_id) must also appear in the GROUP BY list, which is not suitable for your case. Instead, use the COUNT(...) OVER (...) analytic function with PARTITION BY e.department_id to count employees per department, such as:
SELECT DISTINCT *
FROM
(
SELECT c.country_name,
l.city,
l.postal_code,
l.street_address,
e.department_id,
COUNT(e.employee_id) OVER (PARTITION BY e.department_id) AS count
FROM hr.employees e
JOIN hr.departments d
ON d.department_id = e.department_id
JOIN hr.locations l
ON d.location_id = l.location_id
JOIN hr.countries c
ON c.country_id = l.country_id
)
WHERE count >= 2 -- equality is added considering "at least" 2
ORDER BY count
By the way, the parentheses around the ON conditions are redundant.
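The analytic-count idea can be tried on toy data with Python's sqlite3 module. This is only a sketch: the tables and rows are made up, and it assumes the bundled SQLite is recent enough to support window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, city TEXT);
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Seattle'), (20, 'Toronto');
    INSERT INTO employees VALUES (1, 10), (2, 10), (3, 20);
""")

# The window function attaches the per-department count to every employee row,
# so no GROUP BY is needed; DISTINCT then collapses the repeated rows.
rows = conn.execute("""
    SELECT DISTINCT city, department_id, cnt
    FROM (
        SELECT d.city,
               e.department_id,
               COUNT(e.employee_id) OVER (PARTITION BY e.department_id) AS cnt
        FROM employees e
        JOIN departments d ON d.department_id = e.department_id
    )
    WHERE cnt >= 2
    ORDER BY department_id
""").fetchall()

print(rows)
```

Only department 10 has two employees in this toy data, so it is the only row that survives the filter.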
You would first get the set of all departments that have at least 2 employees (atleast_two below).
After that, you would join that data with the rest of your query and pull the attributes of interest.
with atleast_two
as (select c.DEPARTMENT_ID
, c.LOCATION_ID -- needed below to reach hr.locations
, count(e.employee_id) as cnt_employees
from hr.employees e
join hr.departments c
on (c.DEPARTMENT_ID=e.DEPARTMENT_ID)
group by c.DEPARTMENT_ID, c.LOCATION_ID
having count(e.employee_id)>=2 -- "at least 2"
)
select k.COUNTRY_NAME
, l.CITY
, l.POSTAL_CODE
, l.STREET_ADDRESS
, c.DEPARTMENT_ID
, c.cnt_employees
from atleast_two c -- one row per qualifying department
join hr.locations l
on (c.LOCATION_ID=l.LOCATION_ID)
join hr.countries k
on (k.COUNTRY_ID=l.COUNTRY_ID);
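The two-step idea (aggregate first, then join for the descriptive columns) can be sketched on toy data with Python's sqlite3 module. The tables, rows, and the reduced column list are assumptions made for illustration, not the HR schema itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (location_id INTEGER, city TEXT);
    CREATE TABLE departments (department_id INTEGER, location_id INTEGER);
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER);
    INSERT INTO locations VALUES (1, 'Seattle'), (2, 'Toronto');
    INSERT INTO departments VALUES (10, 1), (20, 2);
    INSERT INTO employees VALUES (1, 10), (2, 10), (3, 10), (4, 20);
""")

rows = conn.execute("""
    WITH atleast_two AS (
        -- Step 1: one row per department that has at least 2 employees.
        SELECT d.department_id, d.location_id,
               COUNT(e.employee_id) AS cnt_employees
        FROM employees e
        JOIN departments d ON d.department_id = e.department_id
        GROUP BY d.department_id, d.location_id
        HAVING COUNT(e.employee_id) >= 2
    )
    -- Step 2: join the aggregated rows to the descriptive tables.
    SELECT l.city, a.department_id, a.cnt_employees
    FROM atleast_two a
    JOIN locations l ON l.location_id = a.location_id
""").fetchall()

print(rows)
```

Because the grouping happens inside the CTE, the outer query can select any location columns without listing them in a GROUP BY.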

What are Oracle's old-syntax join equivalents of these queries?

What are the equivalents of these queries written in Oracle's old join syntax?
SELECT first_name, last_name, department_name, job_title
FROM employees e RIGHT JOIN departments d
ON(e.department_id = d.department_id)
RIGHT JOIN jobs j USING(job_id);
-->106 rows returned
SELECT first_name, last_name, department_name, job_title
FROM employees e RIGHT JOIN jobs j
ON(e.job_id = j.job_id)
RIGHT JOIN departments d
USING(department_id);
--> 122 rows returned
I would do something like this (for the first query) - making explicit the fact that a multiple join is, by definition, an iteration of joins of two tables (or more generally "rowsets") at a time. Think of it as "using parentheses explicitly".
select first_name, last_name, department_name, job_title
from (
select first_name, last_name, job_id, department_name
from employees e, departments d
where e.department_id (+) = d.department_id
) sq
, jobs j
where sq.job_id (+) = j.job_id
;
This can be rewritten (perhaps) using a single SELECT statement, with more WHERE conditions, but the query will be less readable; it won't be quite as clear what it is doing.
Respectively:
SELECT first_name,
last_name,
department_name,
job_title
FROM employees e,
jobs j,
departments d
WHERE e.job_id (+) = j.job_id
AND e.department_id = d.department_id (+);
and:
SELECT first_name,
last_name,
department_name,
job_title
FROM employees e,
departments d,
jobs j
WHERE e.department_id (+) = d.department_id
AND e.job_id = j.job_id (+);
However, please just use the ANSI join syntax. The legacy join syntax is confusing to read, and it is easy to get errors or wrong results by putting the (+) on the wrong side of the join condition. You should be teaching people the less confusing "new" syntax (it's hard to call it new when it has been around since Oracle 9i in 2001) rather than reverting to old methods.
Just to add to Mathguy's answer, this is interesting because those innocent-looking right joins are not what they seem. My first (incorrect) attempt was this:
select e.department_id, e.job_id, e.first_name, e.last_name, d.department_name
from jobs j
, departments d
, employees e
where e.job_id(+) = j.job_id
and e.department_id(+) = d.department_id;
but as Mathguy points out it gives different results because of the departments with no employees and the cross join between departments and jobs, and a subtle join precedence effect that appears as a result of the right joins not being in one chain.
I'm not sure what the intention of the original query is. Using the Oracle HR demo schema, the results are the same as an inner join, but only because every job has at least one employee. This illustrates a pitfall in testing outer join queries, as you might run a test, get the same results, and think your rewrite was logically the same thing when it is not.
If you rewrite the original right joins as left joins, it would have to become something like this:
select e.department_id, e.job_id, e.first_name, e.last_name, d.department_name
from jobs j
left join (
departments d
left join employees e on e.department_id = d.department_id
)
on e.job_id = j.job_id;
(You could also expand the departments > employees join into an inline view or with clause, or use an outer apply construction to include the job_id join.)
This is because the two right joins in the original query are driven from jobs and departments, so even though the outer join from departments to employees includes the 16 departments with no employees, once we outer join from jobs to that, we implicitly exclude rows with no job_id, because we are driving it from jobs. So the outer join to departments is filtered to become in effect an inner join, and so long as all jobs have corresponding employees then that gives the same results as an inner join too. To see the difference you would have to insert another job, which adds a row in the results with the job title but no employee details.
Therefore the old-style version needs to be either this:
select de.first_name, de.last_name, de.department_name, j.job_title
from jobs j
, lateral (
select e.department_id, e.job_id, e.first_name, e.last_name, d.department_name
from departments d
, employees e
where e.department_id(+) = d.department_id
) de
where de.job_id(+) = j.job_id;
or without lateral:
select first_name, last_name, department_name, job_title
from jobs j
, ( select e.first_name, e.last_name, e.job_id, d.department_name
from departments d, employees e
where e.department_id (+) = d.department_id ) de
where de.job_id(+) = j.job_id
The second query just switches jobs and departments:
select first_name, last_name, department_name, job_title
from departments d
, ( select e.first_name, e.last_name, e.department_id, e.job_id, j.job_title
from jobs j, employees e
where e.job_id(+) = j.job_id ) je
where je.department_id(+) = d.department_id
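The testing pitfall described above (an outer join that looks identical to an inner join until the data changes) can be reproduced with a small sketch in Python's sqlite3 module. The toy rows are invented, and the inline view is written with ANSI LEFT JOINs because SQLite does not support Oracle's (+) notation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (job_id TEXT, job_title TEXT);
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees (first_name TEXT, job_id TEXT, department_id INTEGER);
    INSERT INTO jobs VALUES ('DEV', 'Developer'), ('QA', 'Tester');
    INSERT INTO departments VALUES (10, 'IT'), (20, 'Empty');
    INSERT INTO employees VALUES ('Ann', 'DEV', 10), ('Bob', 'QA', 10);
""")

# jobs drives the outer join to an inline view of departments + employees.
OUTER = """
    SELECT de.first_name, de.department_name, j.job_title
    FROM jobs j
    LEFT JOIN (
        SELECT e.first_name, e.job_id, d.department_name
        FROM departments d
        LEFT JOIN employees e ON e.department_id = d.department_id
    ) de ON de.job_id = j.job_id
    ORDER BY j.job_id
"""
INNER = """
    SELECT e.first_name, d.department_name, j.job_title
    FROM jobs j
    JOIN employees e ON e.job_id = j.job_id
    JOIN departments d ON d.department_id = e.department_id
    ORDER BY j.job_id
"""

# While every job has an employee, the two queries are indistinguishable.
same_before = conn.execute(OUTER).fetchall() == conn.execute(INNER).fetchall()

# Add a job nobody holds: only the outer-join version reports it.
conn.execute("INSERT INTO jobs VALUES ('PM', 'Manager')")
outer_after = conn.execute(OUTER).fetchall()
inner_after = conn.execute(INNER).fetchall()

print(same_before, outer_after, inner_after)
```

Note that the department named 'Empty' never shows up in either result: its row in the inline view has a NULL job_id, so the join driven from jobs cannot match it.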

How to debug this SQL query?

I run this code and it does not work (I am expecting a result set but I do not get one):
select e.employee_id,e.first_name,e.last_name,e.salary,d.department_name,l.city from employees e
join departments d on e.department_id = d.department_id
join locations l on l.location_id = d.location_id
where e.salary = select max(salary) from employees where hire_date between '2002-01-01' and '2003-12-31';
however if I run the queries
select max(salary) from employees where hire_date between '2002-01-01' and '2003-12-31';
and
select e.employee_id,e.first_name,e.last_name,e.salary,d.department_name,l.city from employees e
join departments d on e.department_id = d.department_id
join locations l on l.location_id = d.location_id
where e.salary = 24000.00
they run fine. max(salary) from the second query is 24000.00.
This is the website where I am trying to practice (question no 34)
https://www.w3resource.com/sql-exercises/sql-subqueries-exercises.php
You are missing the parentheses around the subquery:
SELECT e.employee_id,
e.first_name,
e.last_name,
e.salary,
d.department_name,
l.city
FROM employees e
JOIN departments d
ON e.department_id = d.department_id
JOIN locations l
ON l.location_id = d.location_id
WHERE e.salary = (SELECT Max(salary)
FROM employees
WHERE hire_date BETWEEN '2002-01-01' AND '2003-12-31');
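Alternatively, use IN instead of = when filtering with a subquery. IN is required when the subquery can return more than one row; here MAX returns a single value, so = also works once the subquery is parenthesized.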
SELECT e.employee_id,
e.first_name,
e.last_name,
e.salary,
d.department_name,
l.city
FROM employees e
JOIN departments d
ON e.department_id = d.department_id
JOIN locations l
ON l.location_id = d.location_id
WHERE e.salary IN (SELECT Max(salary)
FROM employees
WHERE hire_date BETWEEN '2002-01-01' AND '2003-12-31');
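As a quick check of both answers, the sketch below runs the three variants against a toy employees table with Python's sqlite3 module. The rows are invented, and SQLite is only a stand-in for whatever engine the exercise site uses, so the exact error message will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, salary REAL, hire_date TEXT);
    INSERT INTO employees VALUES
        (1, 24000, '2003-06-17'),
        (2, 17000, '2005-09-21'),
        (3, 9000,  '2002-08-17');
""")

SUBQUERY = ("SELECT MAX(salary) FROM employees "
            "WHERE hire_date BETWEEN '2002-01-01' AND '2003-12-31'")

# Without parentheses the statement is rejected by the parser.
try:
    conn.execute(f"SELECT employee_id FROM employees WHERE salary = {SUBQUERY}")
    unparenthesized_ok = True
except sqlite3.OperationalError:
    unparenthesized_ok = False

# With parentheses, = and IN give the same result for a single-row subquery.
with_equals = conn.execute(
    f"SELECT employee_id FROM employees WHERE salary = ({SUBQUERY})").fetchall()
with_in = conn.execute(
    f"SELECT employee_id FROM employees WHERE salary IN ({SUBQUERY})").fetchall()

print(unparenthesized_ok, with_equals, with_in)
```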

SQL group functions using joins

Problem:
Create a list of department names, the manager id,
manager name (employee last name) of that department, and the average salary in each
department.
SELECT d.department_name, d.manager_id, AVG(e.salary)
FROM employees e
INNER JOIN departments d ON (e.department_id = d.department_id)
GROUP BY d.department_name, d.manager_id;
And it works nicely, but when I add e.last_name, I get all the last names from the employees table.
I do believe the answer to be out here and not quite far, although out of my reach at this point.
In order to pull the name of the manager, you need to join employees again, this time on d.manager_id:
SELECT d.department_name, d.manager_id, m.last_name, AVG(e.salary)
FROM employees e
INNER JOIN departments d ON (e.department_id = d.department_id)
LEFT OUTER JOIN employees m ON (m.employee_id = d.manager_id)
GROUP BY d.department_name, d.manager_id, m.last_name;
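The kind of join (inner or outer) matters only for departments whose manager_id is NULL or matches no employee: the outer join keeps them with a NULL manager name, while an inner join drops them.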
It looks like you need to join d.manager_id to employees again to get the manager's last_name:
SELECT d.department_name, d.manager_id, e2.last_name, AVG(e.salary)
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
INNER JOIN employees e2 ON d.manager_id = e2.employee_id
GROUP BY d.department_name, d.manager_id, e2.last_name
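The second join to employees can be sketched on toy data with Python's sqlite3 module. The rows below are invented; the point is only that the table appears twice under different aliases, once for the staff being averaged and once for the manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT,
                              manager_id INTEGER);
    CREATE TABLE employees (employee_id INTEGER, last_name TEXT,
                            salary REAL, department_id INTEGER);
    INSERT INTO departments VALUES (10, 'IT', 1), (20, 'Sales', 3);
    INSERT INTO employees VALUES
        (1, 'King', 3000, 10),
        (2, 'Chen', 1000, 10),
        (3, 'Abel', 2500, 20);
""")

rows = conn.execute("""
    SELECT d.department_name, d.manager_id, m.last_name, AVG(e.salary)
    FROM employees e                                           -- staff being averaged
    JOIN departments d ON e.department_id = d.department_id
    JOIN employees m   ON m.employee_id = d.manager_id         -- the manager's own row
    GROUP BY d.department_name, d.manager_id, m.last_name
    ORDER BY d.department_name
""").fetchall()

print(rows)
```

Each department yields exactly one row because the manager alias contributes a single matching employee per department.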

How can I show in a single SQL query: What departments don't have any employees and what employees don't have a department (Oracle SQL)?

I've been trying to figure this out for a while and still no luck. Would I combine the following with a 'is null'?
select distinct
e.employee_id, e.last_name, e.department_id,
d.department_id, d.location_id
from employees e
join departments d on (e.department_id = d.department_id)
Example based on Joe's comment
select distinct
e.employee_id, e.last_name, e.department_id,
d.department_id, d.location_id
from employees e
full outer join departments d on (e.department_id = d.department_id)
where e.department_id is null or d.department_id is null
Or this way with union
select distinct
e.employee_id, e.last_name, e.department_id,
d.department_id, d.location_id
from employees e
left outer join departments d on (e.department_id = d.department_id)
where d.department_id is null
union
select distinct
e.employee_id, e.last_name, e.department_id,
d.department_id, d.location_id
from employees e
right outer join departments d on (e.department_id = d.department_id)
where e.department_id is null
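The anti-join pattern behind both answers can be tried on toy data with Python's sqlite3 module. This is a sketch with invented rows; it uses two LEFT JOINs with the tables swapped instead of FULL or RIGHT OUTER JOIN, on the assumption that the bundled SQLite may be too old to support those.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees (employee_id INTEGER, last_name TEXT,
                            department_id INTEGER);
    INSERT INTO departments VALUES (10, 'IT'), (30, 'Payroll');
    INSERT INTO employees VALUES (1, 'King', 10), (2, 'Grant', NULL);
""")

rows = conn.execute("""
    -- Employees with no department: keep all employees, filter unmatched ones.
    SELECT e.employee_id, e.last_name, d.department_id, d.department_name
    FROM employees e
    LEFT JOIN departments d ON d.department_id = e.department_id
    WHERE d.department_id IS NULL
    UNION ALL
    -- Departments with no employees: same pattern with the tables swapped.
    SELECT e.employee_id, e.last_name, d.department_id, d.department_name
    FROM departments d
    LEFT JOIN employees e ON e.department_id = d.department_id
    WHERE e.employee_id IS NULL
""").fetchall()

print(rows)
```

The matched pair (King in IT) is excluded by both filters, leaving one unassigned employee and one empty department.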