Hive SQL: converting NOT IN to a JOIN

I have a query with a NOT IN clause and need to convert it into a JOIN statement.
SELECT EMP_NBR
FROM employees
WHERE EMP_NBR NOT IN (select emp_id from departments where dept_id = 10 and division = 'sales')

I think the proper transformation would be a left join:
select e.EMP_NBR
from employees e left join
departments d
on e.dept_id = d.dept_id and
d.dept_id = 10 and
d.division = 'sales'
where d.dept_id is null;
Note: I added what I consider to be correct JOIN conditions.
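To check that the LEFT JOIN / IS NULL anti-join returns the same rows as NOT IN, here is a small sketch using Python's built-in sqlite3 module, with SQLite standing in for Hive. It assumes, as the answer's JOIN conditions do, that employees and departments are related by dept_id; the sample rows are made up. The two forms agree here because the subquery filters on dept_id = 10 and so never returns NULL. They can still differ for an employee whose own dept_id is NULL: NOT IN drops that row whenever the subquery returns any rows, while the LEFT JOIN keeps it.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; table and column names
# follow the answer above, the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_nbr INTEGER, dept_id INTEGER);
    CREATE TABLE departments (dept_id INTEGER, division TEXT);
    INSERT INTO employees VALUES (1, 10), (2, 20), (3, 10), (4, 30);
    INSERT INTO departments VALUES (10, 'sales'), (20, 'sales'), (30, 'hr');
""")

# NOT IN form.
not_in_rows = [r[0] for r in conn.execute("""
    SELECT e.emp_nbr
    FROM employees e
    WHERE e.dept_id NOT IN (SELECT dept_id FROM departments
                            WHERE dept_id = 10 AND division = 'sales')
    ORDER BY e.emp_nbr
""")]

# Anti-join: the filters live in the ON clause, and WHERE keeps only
# employees for which the LEFT JOIN found no matching department row.
left_join_rows = [r[0] for r in conn.execute("""
    SELECT e.emp_nbr
    FROM employees e
    LEFT JOIN departments d
      ON e.dept_id = d.dept_id
     AND d.dept_id = 10
     AND d.division = 'sales'
    WHERE d.dept_id IS NULL
    ORDER BY e.emp_nbr
""")]

print(not_in_rows, left_join_rows)  # [2, 4] [2, 4]
```

Putting d.dept_id = 10 and d.division = 'sales' in the ON clause rather than the WHERE clause is what makes this an anti-join: in WHERE they would discard the NULL-extended rows the IS NULL test is looking for.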

If each employee has exactly one row in departments, NOT IN can be mimicked by negating the condition in the WHERE clause, e.g.:
SELECT EMP_NBR FROM employees inner join departments on
employees.emp_id = departments.emp_id
where NOT (dept_id = 10 and division = 'sales')


Complex query - retrieve employees working in both of two specific departments

I'm trying to figure out a way to retrieve employees working in two different departments.
I have 3 simple tables:
employee (employee_id, employee_name)
department (department_id, department_name)
working (eid, did, work_time)
So I tried to write a SQL query:
select employee_name
from employee, working,department
where eid = employee_id
and did = department_id
and department_name = 'software'
and dname = 'hardware';
But it doesn't work. What is my problem?
The problem is that you are requiring department to be both 'software' and 'hardware'. Also, dname is not a field.
Correcting your query:
select employee_name
from employee, working, department
where eid = employee_id and did = department_id
and (department_name = 'software' or department_name = 'hardware');
But I would prefer this kind of query:
SELECT DISTINCT e.employee_name
FROM employee e
JOIN working w ON w.eid = e.employee_id
JOIN department d ON d.department_id = w.did
WHERE d.department_name IN ('software', 'hardware');
This returns employees who work in either of the two departments (or both).
If you want only employees that work in both departments, try this:
SELECT e.employee_id, e.employee_name
FROM employee e
JOIN working w ON w.eid = e.employee_id
JOIN department d ON d.department_id = w.did
WHERE d.department_name IN ('software', 'hardware')
GROUP BY e.employee_id, e.employee_name HAVING COUNT(DISTINCT d.department_id) = 2;
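Why COUNT(DISTINCT ...) rather than COUNT(*) matters can be seen with a small sketch in Python's sqlite3 module. The schema follows the question; the rows are made up, and Bob is given two working rows for the same (software) department, which is an assumption about what the data might contain.

```python
import sqlite3

# In-memory SQLite database; schema from the question, made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER, employee_name TEXT);
    CREATE TABLE department (department_id INTEGER, department_name TEXT);
    CREATE TABLE working (eid INTEGER, did INTEGER, work_time INTEGER);
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO department VALUES (1, 'software'), (2, 'hardware'), (3, 'sales');
    -- Ann: software + hardware; Bob: software twice; Cid: hardware + sales
    INSERT INTO working VALUES (1, 1, 10), (1, 2, 5), (2, 1, 3), (2, 1, 4),
                               (3, 2, 6), (3, 3, 2);
""")

def employees_with_count(count_expr):
    # count_expr is a trusted literal chosen below, only for this demo.
    return conn.execute(f"""
        SELECT e.employee_id, e.employee_name
        FROM employee e
        JOIN working w ON w.eid = e.employee_id
        JOIN department d ON d.department_id = w.did
        WHERE d.department_name IN ('software', 'hardware')
        GROUP BY e.employee_id, e.employee_name
        HAVING {count_expr} = 2
        ORDER BY e.employee_id
    """).fetchall()

both = employees_with_count("COUNT(DISTINCT d.department_id)")
naive = employees_with_count("COUNT(*)")
print(both)   # [(1, 'Ann')]
print(naive)  # [(1, 'Ann'), (2, 'Bob')]  Bob's two software rows count twice
```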
Would something like this work for you?
SELECT
count(DISTINCT department.department_id) as cnt,
employee.employee_name
FROM
employee
JOIN working ON working.eid = employee.employee_id
JOIN department ON department.department_id = working.did
WHERE
department.department_name = 'software' or department.department_name = 'hardware'
GROUP BY employee.employee_id, employee.employee_name
HAVING count(DISTINCT department.department_id) > 1
This returns each employee who is linked to both the software and the hardware department. Or you can leave the WHERE clause out to get all employees working in more than one department.
What is my problem?
There is no dname column in your tables.
You can simplify the problem as you don't need the department table since the working table contains the department id in the did column.
Then you need to GROUP BY each employee and find those HAVING a COUNT of two DISTINCT department ids:
SELECT MAX(e.employee_name)
FROM employee e
INNER JOIN working w
ON e.employee_id = w.eid
GROUP BY e.employee_id
HAVING COUNT(DISTINCT w.did) = 2
If you want to consider only the software and hardware departments then:
SELECT MAX(e.employee_name)
FROM employee e
INNER JOIN working w
ON e.employee_id = w.eid
INNER JOIN department d
ON w.did = d.department_id
WHERE d.department_name IN ('software', 'hardware')
GROUP BY e.employee_id
HAVING COUNT(DISTINCT w.did) = 2
You can easily obtain employees who work in one specific department:
select *
from Employee e inner join
Working w on e.employee_id = w.eid inner join
Department d on w.did = d.department_id
where d.department_name = 'software'
Now an ambiguity arises. If you want all employees who work either in software or in hardware:
-- Employees who work in software, hardware, or both departments
select *
from Employee e inner join
Working w on e.employee_id = w.eid inner join
Department d on w.did = d.department_id
where d.department_name in ('software', 'hardware')
If you want employees who work in both the software and hardware departments:
-- Employees who work in both the hardware and software departments simultaneously
select e.employee_id, e.employee_name
from Employee e inner join
Working w on e.employee_id = w.eid inner join
Department d on w.did = d.department_id
where d.department_name = 'software'
intersect
select e.employee_id, e.employee_name
from Employee e inner join
Working w on e.employee_id = w.eid inner join
Department d on w.did = d.department_id
where d.department_name = 'hardware'
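The INTERSECT approach only works if both halves select the same employee-level columns. This sketch (Python's sqlite3, made-up rows, schema from the question) runs the intersection twice: once on employee columns, and once with the department name added, which is effectively what SELECT * does since it also pulls in the department and working columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER, employee_name TEXT);
    CREATE TABLE department (department_id INTEGER, department_name TEXT);
    CREATE TABLE working (eid INTEGER, did INTEGER, work_time INTEGER);
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO department VALUES (1, 'software'), (2, 'hardware'), (3, 'sales');
    INSERT INTO working VALUES (1, 1, 10), (1, 2, 5), (2, 1, 3), (3, 2, 6);
""")

# One half of the INTERSECT: the given columns for one department name.
HALF = """
    SELECT {cols}
    FROM employee e
    JOIN working w ON e.employee_id = w.eid
    JOIN department d ON w.did = d.department_id
    WHERE d.department_name = ?
"""

def in_both(cols):
    sql = HALF.format(cols=cols) + " INTERSECT " + HALF.format(cols=cols)
    return sorted(conn.execute(sql, ("software", "hardware")).fetchall())

both = in_both("e.employee_id, e.employee_name")
# With a department column, no software row can equal a hardware row.
with_dept = in_both("e.employee_id, e.employee_name, d.department_name")
print(both)       # [(1, 'Ann')]
print(with_dept)  # []
```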

How to return a result only if there is exactly one match in an INNER JOIN

How can I write a query that joins two tables and returns a result only if there is exactly one match? I have to discard results with zero matches or with more than one match.
All I am looking for is to extend the INNER JOIN. Let me get straight to the point: I have two tables, Dept and Emp. One Dept can have multiple Emps, but not the other way around.
Table Dept
Table Emp
I need to JOIN them on Dept_id.
Expected Results
You can join with a not exists condition:
select d.*, e.emp_id, e.emp_name
from dept d
inner join emp e
on d.dept_id = e.dept_id
and not exists (
select 1
from emp e1
where e1.dept_id = d.dept_id and e1.emp_id != e.emp_id
)
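A quick way to see the NOT EXISTS condition at work is this sketch with Python's sqlite3 module. The table and column names (dept, emp, dept_id, emp_id, emp_name) follow the answers; the rows are hypothetical: Sales has one employee, IT has two, HR has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE emp (emp_id INTEGER, emp_name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'IT'), (30, 'HR');
    INSERT INTO emp VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cid', 20);
""")

rows = conn.execute("""
    SELECT d.*, e.emp_id, e.emp_name
    FROM dept d
    INNER JOIN emp e
      ON d.dept_id = e.dept_id
     AND NOT EXISTS (
           -- any other employee in the same department disqualifies the match
           SELECT 1 FROM emp e1
           WHERE e1.dept_id = d.dept_id AND e1.emp_id != e.emp_id)
""").fetchall()
print(rows)  # [(10, 'Sales', 1, 'Ann')]
```

HR disappears because the inner join finds no employee at all, and IT disappears because each of its two employees has a colleague.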
One alternative to the existing solutions uses analytic (window) functions instead of joining twice:
select dept_id, dept_name, emp_id, emp_name
from
(
SELECT
d.Dept_id, d.Dept_name, e.Emp_id, e.Emp_Name,
count(*) over (partition by d.dept_id) cnt
FROM dept d
INNER JOIN emp e
ON d.Dept_id = e.Dept_id
) t
where t.cnt = 1;
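The window-function version can be tried the same way. This sketch assumes the sqlite3 module is linked against SQLite 3.25 or newer (the release that added window functions) and reuses hypothetical dept/emp rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE emp (emp_id INTEGER, emp_name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'IT'), (30, 'HR');
    INSERT INTO emp VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cid', 20);
""")

rows = conn.execute("""
    SELECT dept_id, dept_name, emp_id, emp_name
    FROM (
        -- every joined row carries the size of its department's group
        SELECT d.dept_id, d.dept_name, e.emp_id, e.emp_name,
               COUNT(*) OVER (PARTITION BY d.dept_id) AS cnt
        FROM dept d
        INNER JOIN emp e ON d.dept_id = e.dept_id
    ) t
    WHERE t.cnt = 1
""").fetchall()
print(rows)  # [(10, 'Sales', 1, 'Ann')]
```

Unlike GROUP BY, the window function keeps every joined row, so the employee columns are still available after filtering on the count.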
You could use a subquery that groups by dept_id and keeps only departments having count(*) = 1:
select t.dept_id, dept.dept_name, emp.Emp_name
from (
select dept_id
from emp
group by dept_id
having count(*) = 1
) t
INNER JOIN dept on t.dept_id = dept.dept_id
INNER JOIN emp ON t.dept_id = emp.dept_id
You can phrase this as an aggregation query in Oracle:
select d.dept_id, d.dept_name,
max(e.emp_id) as emp_id,
max(e.emp_name) as emp_name
from dept d inner join
emp e
on d.dept_id = e.dept_id
group by d.dept_id, d.dept_name
having count(*) = 1;
This works because if there is only one match, then max() returns the value from the one row.
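The MAX() trick can be checked with the same kind of sketch (Python's sqlite3, hypothetical rows). It uses an explicit ON join condition; because HAVING COUNT(*) = 1 leaves only one row per group, MAX() simply returns that row's values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE emp (emp_id INTEGER, emp_name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'IT'), (30, 'HR');
    INSERT INTO emp VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cid', 20);
""")

rows = conn.execute("""
    SELECT d.dept_id, d.dept_name,
           MAX(e.emp_id) AS emp_id,      -- the only emp_id in the group
           MAX(e.emp_name) AS emp_name   -- the only emp_name in the group
    FROM dept d
    INNER JOIN emp e ON d.dept_id = e.dept_id
    GROUP BY d.dept_id, d.dept_name
    HAVING COUNT(*) = 1
""").fetchall()
print(rows)  # [(10, 'Sales', 1, 'Ann')]
```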
Also, try the query below:
SELECT a.depid dept_id,dept_name,emp_id,emp_name
FROM
(SELECT case WHEN count(*)=1 THEN dept_id END depid FROM emp GROUP BY dept_id) a INNER JOIN emp ON depid=dept_id
INNER JOIN dept b ON a.depid = b.dept_id
WHERE depid IS NOT NULL
Another way would be:
select d.dept_id, d.dept_name, e.emp_name
from emp e
join dept d on d.dept_id = e.dept_id
where e.dept_id in
( select dept_id from emp group by dept_id having count(*) = 1 )

SQL Query with two conditions

I'm new to SQL, so I have some trouble writing queries. I'm using SQL Server Management Studio.
My task is to select the regional groups of departments where the average salary + commission of employees is less than 2500.
My SQL statement:
select regional_group
from LOCATION
join DEPARTMENT on location.location_id = DEPARTMENT.location_id
join EMPLOYEE on DEPARTMENT.department_id = EMPLOYEE.department_id
where EMPLOYEE.department_id in (select avg(salary + commission)
from employee)
Structure of the database
You have to put the condition in the WHERE clause of the inner query:
select regional_group
from LOCATION
join DEPARTMENT on location.location_id = DEPARTMENT.location_id
join EMPLOYEE on DEPARTMENT.department_id = EMPLOYEE.department_id
where EMPLOYEE.department_id in (select department_id
from employee
where salary + commission < 2500)
SQL Query
Select L.Regional_group
From Employee E
Join Department D ON
D.Department_id = E.Department_id
Join Location L ON
D.Location_id = L.Location_id
Group By L.Regional_group
Having avg(E.salary+E.commission) < 2500
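An aggregate such as AVG() cannot appear in WHERE; the per-group condition belongs in HAVING after a GROUP BY. This sketch (Python's sqlite3 standing in for SQL Server) takes one reading of the task, the average per regional group, with trimmed-down tables and made-up numbers. Note that if commission can be NULL, salary + commission is NULL for that employee and AVG skips the row; wrapping it as COALESCE(commission, 0) is one way to count it as zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (location_id INTEGER, regional_group TEXT);
    CREATE TABLE department (department_id INTEGER, location_id INTEGER);
    CREATE TABLE employee (department_id INTEGER, salary REAL, commission REAL);
    INSERT INTO location VALUES (1, 'North'), (2, 'South');
    INSERT INTO department VALUES (100, 1), (200, 2);
    -- North: (1200 + 1800) / 2 = 1500;  South: (3500 + 2500) / 2 = 3000
    INSERT INTO employee VALUES (100, 1000, 200), (100, 1500, 300),
                                (200, 3000, 500), (200, 2500, 0);
""")

rows = conn.execute("""
    SELECT L.regional_group, AVG(E.salary + E.commission) AS avg_pay
    FROM employee E
    JOIN department D ON D.department_id = E.department_id
    JOIN location L ON D.location_id = L.location_id
    GROUP BY L.regional_group
    HAVING AVG(E.salary + E.commission) < 2500
""").fetchall()
print(rows)  # [('North', 1500.0)]
```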

Left semi join in Hive with multiple tables

How can we use a left semi join across multiple tables? For example, in SQL the query to retrieve the employees working in the US is:
select name,job_id,sal
from emp
where dept_id IN (select dept_id
from dept d
INNER JOIN Location L
on d.location_id = L.location_id
where L.city='US'
)
As IN subqueries are not supported in Hive, how can we write this in Hive?
This seems like a simple inner join:
select e.name
,e.job_id
,e.sal
from emp as e
join dept as d
on d.dept_id =
e.dept_id
join location as l
on l.location_id =
d.location_id
where l.city='US'
P.S. Hive does support IN.
The only issue with your query is that dept_id of emp is not qualified (should be emp.dept_id).
This works:
select name,job_id,sal
from emp
where emp.dept_id IN (select dept_id
from dept d
INNER JOIN Location L
on d.location_id = L.location_id
where L.city='US'
)
Use exists instead:
select e.name, e.job_id, e.sal
from emp e
where exists (select 1
from dept d join
location L
on d.location_id = L.location_id
where l.city = 'US' and d.dept_id = e.dept_id
);
You can refer to the documentation, which covers subqueries in the WHERE clause.
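To see that the IN and EXISTS forms return the same employees, here is a sketch using Python's sqlite3 module as a stand-in for Hive. Table and column names follow the question; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, job_id TEXT, sal INTEGER, dept_id INTEGER);
    CREATE TABLE dept (dept_id INTEGER, location_id INTEGER);
    CREATE TABLE location (location_id INTEGER, city TEXT);
    INSERT INTO location VALUES (1, 'US'), (2, 'UK');
    INSERT INTO dept VALUES (10, 1), (20, 2);
    INSERT INTO emp VALUES ('Ann', 'J1', 100, 10), ('Bob', 'J2', 200, 20);
""")

# IN with an uncorrelated subquery.
in_rows = conn.execute("""
    SELECT name, job_id, sal FROM emp
    WHERE emp.dept_id IN (SELECT d.dept_id
                          FROM dept d JOIN location L
                            ON d.location_id = L.location_id
                          WHERE L.city = 'US')
""").fetchall()

# EXISTS with a subquery correlated on the outer row's dept_id.
exists_rows = conn.execute("""
    SELECT e.name, e.job_id, e.sal FROM emp e
    WHERE EXISTS (SELECT 1
                  FROM dept d JOIN location L
                    ON d.location_id = L.location_id
                  WHERE L.city = 'US' AND d.dept_id = e.dept_id)
""").fetchall()
print(in_rows, exists_rows)  # [('Ann', 'J1', 100)] [('Ann', 'J1', 100)]
```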
This query appears to answer the question: which employees work in departments that have a location in the US? You can also do this in the FROM clause with a subquery:
select e.name, e.job_id, e.sal
from emp e join
(select distinct d.dept_id
from dept d join
location L
on d.location_id = L.location_id
where l.city = 'US'
) d
on d.dept_id = e.dept_id;
I should note, though, that "US" is not usually considered a city.
EDIT:
Obviously, if a department can only have one location, then a "semi-join" is not necessary. The SELECT DISTINCT can just be SELECT . . . Or, you can use the JOINs as in Dudu's answer. In any case, the EXISTS will work. In many databases it has good (sometimes the best) performance; I'm not sure about the performance implications in Hive.
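The difference between a plain join and a semi-join only shows up when a department can have several matching locations. This sketch (Python's sqlite3, hypothetical rows) assumes dept stores one row per (dept_id, location_id), and gives department 10 two US locations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept_id INTEGER);
    CREATE TABLE dept (dept_id INTEGER, location_id INTEGER);
    CREATE TABLE location (location_id INTEGER, city TEXT);
    INSERT INTO location VALUES (1, 'US'), (2, 'UK'), (3, 'US');
    -- department 10 has two US locations, department 20 is UK-only
    INSERT INTO dept VALUES (10, 1), (10, 3), (20, 2);
    INSERT INTO emp VALUES ('Ann', 10), ('Bob', 20);
""")

# Plain inner join: one output row per matching location.
plain = conn.execute("""
    SELECT e.name FROM emp e
    JOIN dept d ON d.dept_id = e.dept_id
    JOIN location l ON l.location_id = d.location_id
    WHERE l.city = 'US'
""").fetchall()

# Join to a DISTINCT list of US departments: at most one row per employee.
semi = conn.execute("""
    SELECT e.name FROM emp e
    JOIN (SELECT DISTINCT d.dept_id
          FROM dept d JOIN location l ON d.location_id = l.location_id
          WHERE l.city = 'US') d
      ON d.dept_id = e.dept_id
""").fetchall()
print(plain)  # [('Ann',), ('Ann',)]
print(semi)   # [('Ann',)]
```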

Sub-Query Problem

Department(DepartID,DepName)
Employees(Name,DepartID)
What I need is the count of employees in each department, together with its DepName.
If you are using SQL Server 2005 or above, here is another possible way of getting the employee count by department:
SELECT DPT.DepName
, EMP.EmpCount
FROM dbo.Department DPT
CROSS APPLY (
SELECT COUNT(EMP.DepartId) AS EmpCount
FROM dbo.Employees EMP
WHERE EMP.DepartId = DPT.DepartId
) EMP
ORDER BY DPT.DepName
Hope that helps.
Sample test query output:
I'd use an outer join rather than a subquery.
SELECT d.DepName, COUNT(e.Name)
FROM Department d
LEFT JOIN Employees e ON e.DepartID = d.DepartID
GROUP BY d.DepartID, d.DepName
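With an outer join, which expression is counted matters for departments without employees. This sketch (Python's sqlite3, table names from the question, made-up rows) compares COUNT(e.Name), which ignores the NULL-extended row and gives 0, with COUNT(*), which counts it and gives 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DepartID INTEGER, DepName TEXT);
    CREATE TABLE Employees (Name TEXT, DepartID INTEGER);
    INSERT INTO Department VALUES (1, 'IT'), (2, 'HR'), (3, 'Ops');
    INSERT INTO Employees VALUES ('Ann', 1), ('Bob', 1), ('Cid', 2);
""")

def counts(count_expr):
    # count_expr is a trusted literal chosen below, only for this demo.
    return conn.execute(f"""
        SELECT d.DepName, {count_expr}
        FROM Department d
        LEFT JOIN Employees e ON e.DepartID = d.DepartID
        GROUP BY d.DepartID, d.DepName
        ORDER BY d.DepartID
    """).fetchall()

by_name = counts("COUNT(e.Name)")
by_row = counts("COUNT(*)")
print(by_name)  # [('IT', 2), ('HR', 1), ('Ops', 0)]
print(by_row)   # [('IT', 2), ('HR', 1), ('Ops', 1)]
```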
SELECT d.DepName, COUNT(e.Name)
FROM Department d
LEFT JOIN Employees e
ON d.DepartID = e.DepartID
GROUP BY d.DepName
No need for a subquery.
SELECT dep.DepName, COUNT(emp.Name)
FROM Department dep
LEFT OUTER JOIN Employees emp ON dep.DepartID = emp.DepartID
GROUP BY dep.DepName
SELECT COUNT(DISTINCT Name) FROM
Department AS d, Employees AS e
WHERE d.DepartID=e.DepartID AND d.DepName = '$thename'
And to avoid a GROUP BY and save a Sort operation in the query plan:
SELECT
Department.DepName,
(SELECT COUNT(*)
FROM Employees
WHERE Employees.DepartID = Department.DepartID)
FROM
Department
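The correlated-subquery version returns the same counts as the LEFT JOIN / GROUP BY version, including 0 for an empty department. Whether it actually saves a sort depends on the engine's optimizer; this sketch (Python's sqlite3, made-up rows) only checks that the results match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DepartID INTEGER, DepName TEXT);
    CREATE TABLE Employees (Name TEXT, DepartID INTEGER);
    INSERT INTO Department VALUES (1, 'IT'), (2, 'HR'), (3, 'Ops');
    INSERT INTO Employees VALUES ('Ann', 1), ('Bob', 1), ('Cid', 2);
""")

# Correlated scalar subquery: one COUNT per department row, no GROUP BY.
rows = conn.execute("""
    SELECT Department.DepName,
           (SELECT COUNT(*)
            FROM Employees
            WHERE Employees.DepartID = Department.DepartID) AS EmpCount
    FROM Department
    ORDER BY Department.DepartID
""").fetchall()

# LEFT JOIN + GROUP BY version from the earlier answer, for comparison.
join_rows = conn.execute("""
    SELECT d.DepName, COUNT(e.Name)
    FROM Department d
    LEFT JOIN Employees e ON e.DepartID = d.DepartID
    GROUP BY d.DepartID, d.DepName
    ORDER BY d.DepartID
""").fetchall()
print(rows)               # [('IT', 2), ('HR', 1), ('Ops', 0)]
print(rows == join_rows)  # True
```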