How can I write this query correctly? - sql

How can I write a query that gives the country name, city, postal code, street address and the number of departments where at least 2 employees work? Below is the query I wrote, but I get "not a GROUP BY expression" error as a result of the query.
SELECT k.COUNTRY_NAME,
l.CITY,
l.POSTAL_CODE,
l.STREET_ADDRESS,
e.DEPARTMENT_ID,
COUNT(EMPLOYEE_ID)
FROM hr.employees e
JOIN hr.departments c
ON (c.DEPARTMENT_ID = e.DEPARTMENT_ID)
JOIN hr.locations l
ON (c.LOCATION_ID = l.LOCATION_ID)
JOIN hr.countries k
ON (k.COUNTRY_ID = l.COUNTRY_ID)
GROUP BY e.DEPARTMENT_ID
HAVING COUNT(EMPLOYEE_ID) > 2;

Because all non-aggregated columns(c.country_name,l.city,l.postal_code,l.street_address,e.department_id) should be listed within the GROUP BY list which is not suitable for your case. Rather use COUNT(.) OVER (..) analytic function with PARTITION BY e.department_id option in order to group by department_id column such as
SELECT DISTINCT *
FROM
(
SELECT c.country_name,
l.city,
l.postal_code,
l.street_address,
e.department_id,
COUNT(e.employee_id) OVER (PARTITION BY e.department_id) AS count
FROM hr.employees e
JOIN hr.departments d
ON d.department_id = e.department_id
JOIN hr.locations l
ON d.location_id = l.location_id
JOIN hr.countries c
ON c.country_id = l.country_id
)
WHERE count >= 2 -- equality is added considering "at least" 2
ORDER BY count
Btw, the parentheses next to the ON clause are redundant

You would first get the set of all departments that have at least 2 employees as follows(atleast_two)
After that you would join the data with the rest of your query and pull the attributes of interest.
with atleast_two
as (select c.DEPARTMENT_ID
,count(employee_id) as cnt_employees
from hr.employees e
join hr.departments c
on (c.DEPARTMENT_ID=e.DEPARTMENT_ID)
group by c.deptid
having count(employee_id)>2
)
select k.COUNTRY_NAME
, l.CITY
, l.POSTAL_CODE
, l.STREET_ADDRESS
, e.DEPARTMENT_ID
, c.cnt_employees
from hr.employees e
join atleast_two c
on (c.DEPARTMENT_ID=e.DEPARTMENT_ID)
join hr.locations l
on (c.LOCATION_ID=l.LOCATION_ID)
join hr.countries k
on (k.COUNTRY_ID=l.COUNTRY_ID);

Related

I need to get number of employees per depatment

This gets me the number of employees per department
SELECT department, COUNT(idEmployees) empCount
FROM employees
GROUP BY department;
You could join on departments to get the buildingID, and then group by that:
SELECT buildingID, COUNT(*)
FROM employees e
JOIN department d ON e.department = d.departmentID
GROUP BY buildingID
i think this is the solution
select d.buildingID, count(e.ID)
from department d
inner join employees e
on d.departmentID = e.department
group by d.buildingID
This question can be answered in two ways. Firstly, if all employees of a department must be attached in a building. Secondly few employees of a department are attached to a building.
For one where employees are attached to a building it's mandatory
SELECT d.buildingId
, COUNT(e.id) count_employee
FROM departments d
INNER JOIN employees e
ON d.departmentid = e.department
GROUP BY d.buildingId
For second where employees are attached to a building but it's optional. In that case LEFT JOIN is used.
SELECT b.buildingId
, COALESCE(t.count_employee, 0) count_employee
FROM building b
LEFT JOIN (SELECT d.buildingId
, COUNT(e.id) count_employee
FROM departments d
INNER JOIN employees e
ON d.departmentid = e.department
GROUP BY d.buildingId) t
ON b.buildingId = t.buildingId
If a building is attached with multiple departments and one employee is assigned with multiple departments then count building wise same employee is only onetime not multiple times. In that case DISTINCT keyword is used inside COUNT().
SELECT d.buildingId
, COUNT(DISTINCT e.id) count_employee
FROM departments d
INNER JOIN employees e
ON d.departmentid = e.department
GROUP BY d.buildingId

How to check if a value is larger than the average of these values?

I need to find the a list of people who earn more money than the average salary in their country
SELECT e1.first_name || ' ' ||e1.last_name
FROM hr.employees e1
WHERE salary >= (
Select AVG(salary)
FROM HR.employees e2
INNER JOIN HR.departments ON e2.department_id = departments.department_id
INNER JOIN HR.locations ON HR.departments.location_id = HR.locations.location_id
INNER JOIN HR.countries ON HR.locations.country_id = hr.countries.country_id
GROUP BY hr.countries.country_id
);
The subquery gives me the average salary values per country and works right.
However when I try to combine them I get the following error:
ORA-01427: Subquery returns more than one row
Any idea what I should do to get the query working?
Use AVG as an analytic function:
SELECT full_name
FROM (
SELECT e.first_name || ' ' ||e.last_name AS full_name,
e.salary,
AVG( e.salary ) OVER ( PARTITION BY l.country_id ) AS avg_salary
FROM hr.employees e
INNER JOIN HR.departments d
ON e.department_id = d.department_id
INNER JOIN HR.locations l
ON d.location_id = l.location_id
)
WHERE salary > avg_salary;
Use window functions:
SELECT e.first_name || ' ' || e.last_name
FROM (SELECT e.*,
AVG(salary) OVER (PARTITION BY l.country_id) as country_avg
FROM HR.employees e JOIN
HR.departments d
ON e.department_id = d.department_id JOIN
HR.locations l
ON d.location_id = l.location_id
) e
WHERE e.salary >= e.country_avg;
Note that the country table is not needed. The country_id is sufficient to identify the country.

Left semi Join in Hive for multiple table

How can we use left semi join in multiple tables . For example, in SQL the query to retrieve no. of employees working in US is :
select name,job_id,sal
from emp
where dept_id IN (select dept_id
from dept d
INNER JOIN Location L
on d.location_id = L.location_id
where L.city='US'
)
As IN query is not supported in Hive, how can we write this in Hive.
Seems like a simple inner join
select e.name
,e.job_id
,e.sal
from emp as e
join dept as d
on d.dept_id =
e.dept_id
join location as l
on l.location_id =
d.location_id
where l.city='US'
P.s.
Hive does support IN.
The only issue with your query is that dept_id of emp is not qualified (should be emp.dept_id).
This works:
select name,job_id,sal
from emp
where emp.dept_id IN (select dept_id
from dept d
INNER JOIN Location L
on d.location_id = L.location_id
where L.city='US'
)
Use exists instead:
select e.name, e.job_id, e.sal
from emp e
where exists (select 1
from dept d join
location L
on d.location_id = L.location_id
where l.city = 'US' and d.dept_id = e.dept_id
);
You can refer to the documentation, which covers subqueries in the WHERE clause.
This query appears to be answering the question: What employees work in departments that have a location in the US. You can also do this in the FROM clause with a subquery;
select e.name, e.job_id, e.sal
from emp e join
(select distinct d.dept_id
from dept d join
location L
on d.location_id = L.location_id
where l.city = 'US'
) d
on d.dept_id = e.dept_id;
I should note, though, that "US" is not usually considered a city.
EDIT:
Obviously, if a department can only have one location, then "semi-join" is not necessary. The SELECT DISTINCT can just be SELECT . . . Or, you can use the JOINs as in Dudu's answer. In any case, the EXISTS will work. In many databases it would have good (sometimes the best performance); I'm not sure about the performance implications in Hive.

Not a single-group group function Oracle SQL

I have the following table:
I have to find the name and the country of the department with the highest number of employees.
SELECT d.department_name, c.country_name FROM employees e, departments d, locations l, countries c
WHERE d.location_id = l.location_id AND l.country_id = c.country_id
HAVING MAX(e.employee_id) = (SELECT MAX(MAX(employee_id)) FROM employees GROUP BY department_id);
I'm getting a not a single-group group function error. Why is that?
If am not wrong this is what you are looking for
SELECT *
FROM (SELECT d.department_name,
c.country_name
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id
INNER JOIN locations l
ON d.location_id = l.location_id
INNER JOIN countries c
ON l.country_id = c.country_id
GROUP BY d.department_name,
c.country_name
ORDER BY Count(1) DESC)
WHERE ROWNUM = 1
is this what you want?
SELECT d.department_name, c.country_name FROM employees e, departments d, locations l, countries c
WHERE d.location_id = l.location_id AND l.country_id = c.country_id
AND ROWNUM = 1
ORDER BY e.employee_id DESC

Difference between old and new SQL JOINS

I am currently learing SQL and I can't understand why these two queries return different numbers of rows (the first one returns 53, while the second returns 69).
SELECT d.department_name "dept_name",
j.job_title "job_title",
e.manager_id "manager_id",
MAX(e.salary) "max_salary",
SUM(e.salary) "sum_salary"
FROM employees e, jobs j, departments d
WHERE j.job_id = e.job_id AND
d.department_id(+) = e.department_id
GROUP BY GROUPING SETS( (d.department_name, j.job_title), (j.job_title,e.manager_id), ());
And the second one:
SELECT d.department_name AS "dept_name",
j.job_title AS "job_title",
e.manager_id AS "manager_id",
MAX(e.salary) AS "max_salary",
SUM(e.salary) AS "sum_salary"
FROM employees e
INNER JOIN jobs j ON j.job_id = e.job_id
RIGHT OUTER JOIN departments d ON d.department_id = e.department_id
GROUP BY GROUPING SETS( (d.department_name, j.job_title), (j.job_title,e.manager_id), ());
Thank you for your help!
Your queries differ because in the first one you outer join table departments, whereas in the second you outer join jobs and employees. (Which is why I don't like right outer joins so much. I don't find them very readable.) You take employees, join with jobs and then by RIGHT OUTER JOIN you say: and give me all departments anyhow, so give me additional (outer joined records) for jobs and employees. The second statement is equal to:
SELECT d.department_name AS "dept_name",
j.job_title AS "job_title",
e.manager_id AS "manager_id",
MAX(e.salary) AS "max_salary",
SUM(e.salary) AS "sum_salary"
FROM departments d
LEFT OUTER JOIN employees e ON e.department_id = d.department_id
LEFT OUTER JOIN jobs j ON j.job_id = e.job_id
GROUP BY GROUPING SETS( (d.department_name, j.job_title), (j.job_title,e.manager_id), ());
In old Oracle syntax you would write:
SELECT d.department_name "dept_name",
j.job_title "job_title",
e.manager_id "manager_id",
MAX(e.salary) "max_salary",
SUM(e.salary) "sum_salary"
FROM employees e, jobs j, departments d
WHERE e.department_id(+) = d.department_id
AND j.job_id(+) = e.job_id;
In old-style joins the syntax
FROM employees e, departments d
WHERE d.department_id(+) = e.department_id
is a left join and is equal to:
FROM employees e
LEFT JOIN departments d ON d.department_id = e.department_id
For these queries two work exactly the same, you should also change the RIGHT JOIN to LEFT JOIN in the second query, or d.department_id(+) = e.department_id to d.department_id = e.department_id(+) in the first.