SQL count occurrences of an id from another table in multiple rows - sql

TABLE 1 employee:
employee_id, first_name, last_name
2 John Appleseed
TABLE 2 performance_review:
employee_id, reviewer_id
2 1
2 3
2 4
1 2
3 2
QUESTION: print the first_name and last_name in a single row, then how many times that id is found in the employee_id column, then how many times that same id is found in the reviewer_id column.
Example output:
Name Employee_id count Received_review count
-------------------------------------------------------------
John Appleseed 3 2
What I got so far (it doesn't work)
SELECT
CONCAT([employee_first_name], ' ' , [employee_last_name]) AS employee_full_name,
(SELECT COUNT(employee.employee_id)
FROM performance_review AS received_review
LEFT JOIN performance_review ON employee.employee_id = performance_review.employee_id) AS received_reviews
FROM
employee

Since this involves separate aggregation over two different columns you need two subqueries, one for each.
Here is an example. [Edit] Left joins should be used here: with inner joins, an employee missing from either subquery (for example, one who has never acted as a reviewer) would drop out of the result entirely.
with
emp as (select employee_id,count(*) employee_count
from performance_review
group by employee_id),
rev as (select reviewer_id,count(*) reviewer_count
from performance_review
group by reviewer_id)
select
first_name,
last_name,
employee_count,
reviewer_count
from
employee
left join emp on employee.employee_id=emp.employee_id
left join rev on employee.employee_id=rev.reviewer_id;
The result
first_name last_name employee_count reviewer_count
John Appleseed 3 2
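The two-subquery approach above can be tried end to end. The following is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. The second employee (Jane Doe, id 5) is a made-up row, added only to show what the left joins do for someone with no reviews, and the COALESCE calls are an addition that turns the resulting NULLs into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE performance_review (employee_id INTEGER, reviewer_id INTEGER);
    INSERT INTO employee VALUES (2, 'John', 'Appleseed');
    -- hypothetical employee with no rows in performance_review
    INSERT INTO employee VALUES (5, 'Jane', 'Doe');
    INSERT INTO performance_review VALUES (2, 1), (2, 3), (2, 4), (1, 2), (3, 2);
""")

query = """
    WITH emp AS (SELECT employee_id, COUNT(*) AS employee_count
                 FROM performance_review GROUP BY employee_id),
         rev AS (SELECT reviewer_id, COUNT(*) AS reviewer_count
                 FROM performance_review GROUP BY reviewer_id)
    SELECT e.first_name, e.last_name,
           COALESCE(emp.employee_count, 0) AS employee_count,
           COALESCE(rev.reviewer_count, 0) AS reviewer_count
    FROM employee e
    LEFT JOIN emp ON e.employee_id = emp.employee_id
    LEFT JOIN rev ON e.employee_id = rev.reviewer_id
    ORDER BY e.employee_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With inner joins instead of left joins, the made-up Jane Doe row would not appear at all.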

Robert's answer is the clearest way to do it, but here is another way that uses a single join: a CASE test inside SUM counts only the rows that match each condition. The join condition covers both cases.
SELECT e.first_name, e.last_name,
SUM(CASE WHEN e.employee_id = p.employee_id THEN 1 ELSE 0 END) as employee_count,
SUM(CASE WHEN e.employee_id = p.reviewer_id THEN 1 ELSE 0 END) as reviewer_count
FROM employee e
LEFT JOIN performance_review p on e.employee_id = p.reviewer_id
or e.employee_id = p.employee_id
GROUP BY e.first_name, e.last_name
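As a quick check of the conditional-sum query, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module on the question's sample rows. Grouping by employee_id as well as by name is an addition here, assumed useful so that two employees who share a name are not merged into one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE performance_review (employee_id INTEGER, reviewer_id INTEGER);
    INSERT INTO employee VALUES (2, 'John', 'Appleseed');
    INSERT INTO performance_review VALUES (2, 1), (2, 3), (2, 4), (1, 2), (3, 2);
""")

query = """
    SELECT e.first_name, e.last_name,
           -- counts joined rows where this employee was the one reviewed
           SUM(CASE WHEN e.employee_id = p.employee_id THEN 1 ELSE 0 END) AS employee_count,
           -- counts joined rows where this employee was the reviewer
           SUM(CASE WHEN e.employee_id = p.reviewer_id THEN 1 ELSE 0 END) AS reviewer_count
    FROM employee e
    LEFT JOIN performance_review p
           ON e.employee_id = p.reviewer_id OR e.employee_id = p.employee_id
    GROUP BY e.employee_id, e.first_name, e.last_name
"""
rows = conn.execute(query).fetchall()
print(rows)
```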

Related

Include Groups not having values for count

I have the following query:
SELECT DepartmentID
, Count(EmployeeID) AS CountEmployee
FROM HR.EmployeeTransfer
GROUP BY DepartmentID
This is my current output:
DepartmentID CountEmployee
1 15
2 20
I want to include Departments which don't have any employees count like below:
DepartmentID CountEmployee
1 15
2 20
3 NULL
4 NULL
Presumably, you have a "departments" table of some sort. For this query, you want a LEFT JOIN from this table:
SELECT d.DepartmentID, Count(et.EmployeeID) AS CountEmployee
FROM HR.Departments d LEFT JOIN
HR.EmployeeTransfer et
ON d.DepartmentID = et.DepartmentID
GROUP BY d.DepartmentID;
Note that this returns 0 instead of NULL. If you really want NULL, you can use:
SELECT d.DepartmentID, NULLIF(Count(et.EmployeeID), 0) AS CountEmployee
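To see the difference between the plain COUNT and the NULLIF variant, here is a small sketch using Python's sqlite3 module. The rows are made up, and the HR schema prefix is dropped because an in-memory SQLite database has no such schema; the table and column names otherwise follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (DepartmentID INTEGER);
    CREATE TABLE EmployeeTransfer (EmployeeID INTEGER, DepartmentID INTEGER);
    INSERT INTO Departments VALUES (1), (2), (3);
    -- made-up rows: two employees in department 1, one in 2, none in 3
    INSERT INTO EmployeeTransfer VALUES (10, 1), (11, 1), (12, 2);
""")

query = """
    SELECT d.DepartmentID,
           COUNT(et.EmployeeID) AS CountEmployee,           -- 0 for empty departments
           NULLIF(COUNT(et.EmployeeID), 0) AS CountOrNull   -- NULL for empty departments
    FROM Departments d
    LEFT JOIN EmployeeTransfer et ON d.DepartmentID = et.DepartmentID
    GROUP BY d.DepartmentID
    ORDER BY d.DepartmentID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Counting et.EmployeeID rather than * matters: COUNT(*) would count the single NULL-extended row that the left join produces for an empty department.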
SELECT ET.DepartmentID
,Count(E.EmployeeID) AS CountEmployee
FROM HR.EmployeeTransfer AS ET
LEFT JOIN EmployeesTable AS E ON E.DepartmentID=ET.DepartmentID
GROUP BY ET.DepartmentID

How to get the department name in this tutorial sql query?

Following the tutorial on SQL here I want to query the number of employees per department together with the department name.
I tried the following query in that tutorial:
SELECT count(*), dept_name
FROM employees, departments
WHERE employees.dept_id = departments.dept_id
GROUP BY departments.dept_id
but it returns
COUNT(*) dept_name
2 NULL
2 NULL
instead of the expected output
COUNT(*) dept_name
2 Accounting
2 Sales
What am I doing wrong here?
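First, use an explicit JOIN instead of joining in the WHERE clause.
Then group by both dept_id and dept_name: grouping by dept_id keeps two departments that happen to share a name (say, two Sales departments) as separate rows, and listing dept_name in the GROUP BY lets you select it.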
SELECT departments.dept_id, dept_name, count(*)
FROM employees
JOIN departments
ON employees.dept_id = departments.dept_id
GROUP BY departments.dept_id, departments.dept_name
Group by dept_name not dept_id
SELECT count(*), dept_name
FROM employees, departments
WHERE employees.dept_id = departments.dept_id
GROUP BY departments.dept_name
And you can better use join like Juan Carlos Oropeza's answer:
SELECT count(*), dept_name
FROM employees JOIN departments ON employees.dept_id = departments.dept_id
GROUP BY departments.dept_name
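The answers above agree that the selected name column should appear in the GROUP BY. A minimal sketch of that query, run in SQLite through Python's sqlite3 module, is shown below. The employee names and the emp_id and emp_name columns are assumptions, since the tutorial's actual schema is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    -- emp_id and emp_name are hypothetical columns
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Accounting'), (2, 'Sales');
    INSERT INTO employees VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2), (4, 'Di', 2);
""")

query = """
    SELECT COUNT(*) AS headcount, d.dept_name
    FROM employees e
    JOIN departments d ON e.dept_id = d.dept_id
    -- every non-aggregated column in the SELECT list is also grouped on
    GROUP BY d.dept_id, d.dept_name
    ORDER BY d.dept_name
"""
rows = conn.execute(query).fetchall()
print(rows)
```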

Select users belonging only to particular departments

I have the following table with two fields, empID and department, as shown below:
create table employe
(
empID varchar(10),
department varchar(10)
);
Inserting some records:
insert into employe values('A101','Z'),('A101','X'),('A101','Y'),('A102','Z'),('A102','X'),
('A103','Z'),('A103','Y'),('A104','X'),('A104','Y'),('A105','Z'),('A106','X');
select * from employe;
empID department
------------------
A101 Z
A101 X
A101 Y
A102 Z
A102 X
A103 Z
A103 Y
A104 X
A104 Y
A105 Z
A106 X
Note: Now I want to show only the employees who belong to departments Z and Y and to no other department.
So according to this condition only employee A103 should be displayed, because he belongs only
to departments Z and Y. Employee A101 should not appear, because he belongs to Z, X, and Y.
Expected Result:
If condition is : Z and Y then result should be:
empID
------
A103
If condition is : Z and X then result should be:
empID
------
A102
If condition is : Z,X and Y then result should be:
empID
------
A101
Note: I want to do it in the where clause only (don't want to use the group by and having clauses), because I'm going to include this one in the other where also.
This is a Relational Division with no Remainder (RDNR) problem. See this article by Dwain Camps, which provides many solutions to this kind of problem.
First Solution
SQL Fiddle
SELECT empId
FROM (
SELECT
empID, cc = COUNT(DISTINCT department)
FROM employe
WHERE department IN('Y', 'Z')
GROUP BY empID
)t
WHERE
t.cc = 2
AND t.cc = (
SELECT COUNT(*)
FROM employe
WHERE empID = t.empID
)
Second Solution
SQL Fiddle
SELECT e.empId
FROM employe e
WHERE e.department IN('Y', 'Z')
GROUP BY e.empID
HAVING
COUNT(e.department) = 2
AND COUNT(e.department) = (SELECT COUNT(*) FROM employe WHERE empID = e.empId)
Without using GROUP BY and HAVING:
SELECT DISTINCT e.empID
FROM employe e
WHERE
EXISTS(
SELECT 1 FROM employe WHERE department = 'Z' AND empID = e.empID
)
AND EXISTS(
SELECT 1 FROM employe WHERE department = 'Y' AND empID = e.empID
)
AND NOT EXISTS(
SELECT 1 FROM employe WHERE department NOT IN('Y', 'Z') AND empID = e.empID
)
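The EXISTS / NOT EXISTS form generalises to any list of departments. The sketch below builds it with Python's sqlite3 module and runs it on the question's sample rows; the helper name employees_only_in is hypothetical, and it assumes a non-empty department list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employe (empID TEXT, department TEXT)")
conn.executemany("INSERT INTO employe VALUES (?, ?)", [
    ("A101", "Z"), ("A101", "X"), ("A101", "Y"), ("A102", "Z"), ("A102", "X"),
    ("A103", "Z"), ("A103", "Y"), ("A104", "X"), ("A104", "Y"), ("A105", "Z"),
    ("A106", "X"),
])

def employees_only_in(conn, departments):
    """Return empIDs that belong to every listed department and to no other."""
    # One EXISTS test per required department (assumes the list is non-empty).
    exists = " AND ".join(
        "EXISTS (SELECT 1 FROM employe WHERE department = ? AND empID = e.empID)"
        for _ in departments
    )
    placeholders = ", ".join("?" for _ in departments)
    sql = f"""
        SELECT DISTINCT e.empID
        FROM employe e
        WHERE {exists}
          AND NOT EXISTS (
              SELECT 1 FROM employe
              WHERE department NOT IN ({placeholders}) AND empID = e.empID)
        ORDER BY e.empID
    """
    # Parameters are bound in textual order: the EXISTS tests, then the NOT IN list.
    params = list(departments) + list(departments)
    return [row[0] for row in conn.execute(sql, params)]

result_zy = employees_only_in(conn, ["Z", "Y"])
result_zx = employees_only_in(conn, ["Z", "X"])
result_zxy = employees_only_in(conn, ["Z", "X", "Y"])
print(result_zy, result_zx, result_zxy)
```

Only placeholders are interpolated into the SQL text; the department values themselves are passed as bound parameters.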
I know this question has already been answered, but it was a fun problem and I tried to solve it in a way no one else has. The benefit of this approach is that you can supply any list of values, as long as each value is followed by a comma, and you don't have to check counts.
Note: Values must be listed in alphabetic order.
XML Solution with CROSS APPLY
select DISTINCT empID
FROM employe A
CROSS APPLY
(
SELECT department + ','
FROM employe B
WHERE A.empID = B.empID
ORDER BY department
FOR XML PATH ('')
) CA(Deps)
WHERE deps = 'Y,Z,'
Results:
empID
----------
A103
For condition 1 (Z and Y):
select z.empID from (select empID from employe where department = 'z' ) as z
inner join (select empID from employe where department = 'y' ) as y
on z.empID = y.empID
where z.empID Not in(select empID from employe where department = 'x' )
For condition 2 (Z and X):
select z.empID from (select empID from employe where department = 'z' ) as z
inner join (select empID from employe where department = 'x' ) as x
on z.empID = x.empID
where z.empID Not in(select empID from employe where department = 'y' )
For condition 3 (Z, Y and X):
select z.empID from (select empID from employe where department = 'z' ) as z
inner join (select empID from employe where department = 'x' ) as x
on z.empID = x.empID
inner join (select empID from employe where department = 'y' ) as y on
y.empID=Z.empID
You can use GROUP BY with having like this. SQL Fiddle
SELECT empID
FROM employe
GROUP BY empID
HAVING SUM(CASE WHEN department= 'Y' THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN department= 'Z' THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN department NOT IN('Y','Z') THEN 1 ELSE 0 END) = 0
Without GROUP BY and Having
SELECT empID
FROM employe E1
WHERE (SELECT COUNT(DISTINCT department) FROM employe E2 WHERE E2.empid = E1.empid and department IN ('Z','Y')) = 2
EXCEPT
SELECT empID
FROM employe
WHERE department NOT IN ('Z','Y')
If you want to use any of the above queries with other tables in a join, you can use a CTE or a derived table like this.
;WITH CTE AS
(
SELECT empID
FROM employe
GROUP BY empID
HAVING SUM(CASE WHEN department= 'Y' THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN department= 'Z' THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN department NOT IN('Y','Z') THEN 1 ELSE 0 END) = 0
)
SELECT cols from CTE join othertable on col_cte = col_othertable
Try this:
select empID from employe
where empId in (select empId from employe
where department = 'Z')
and empId in (select empId from employe
where department = 'Y')
and empId not in (select empId from employe
where department = 'X') ;
For the condition Z and Y:
SELECT EMPID FROM EMPLOYE WHERE DEPARTMENT='Z' AND
EMPID IN (SELECT EMPID FROM EMPLOYE WHERE DEPARTMENT ='Y')AND
EMPID NOT IN(SELECT EMPID FROM EMPLOYE WHERE DEPARTMENT NOT IN ('Z','Y'))
The following query works when you want employees from departments 'Y' and 'Z' and not 'X'.
select empId from employe
where empId in (select empId from employe
where department = 'Z')
and empId in (select empId from employe
where department = 'Y')
and empId not in (select empId from employe
where department = 'X') ;
For the case where the employee must also belong to X (Z, X and Y), simply replace not in with in in the last condition.
Try this,
SELECT a.empId
FROM employe a
INNER JOIN
(
SELECT empId
FROM employe
WHERE department IN ('X', 'Y', 'Z')
GROUP BY empId
HAVING COUNT(*) = 3
)b ON a.empId = b.empId
GROUP BY a.empId
The count must equal the number of departments in the condition.
You can still use GROUP BY and HAVING; you just need to do it in a subquery.
For example, let's start with a simple query to find all employees in departments X and Y (and not in any other departments):
SELECT empID,
GROUP_CONCAT(DISTINCT department ORDER BY department ASC) AS depts
FROM emp_dept GROUP BY empID
HAVING depts = 'X,Y'
I've used MySQL's GROUP_CONCAT() function as a convenient shortcut here, but you could get the same results without it, too, e.g. like this:
SELECT empID,
COUNT(DISTINCT department) AS all_depts,
COUNT(DISTINCT CASE
WHEN department IN ('X', 'Y') THEN department ELSE NULL
END) AS wanted_depts
FROM emp_dept GROUP BY empID
HAVING all_depts = wanted_depts AND wanted_depts = 2
Now, to combine this with other query condition, simply take a query that includes the other conditions, and join your employees table against the output of the query above:
SELECT empID, name, depts
FROM employees
JOIN (
SELECT empID,
GROUP_CONCAT(DISTINCT department ORDER BY department ASC) AS depts
FROM emp_dept GROUP BY empID
HAVING depts = 'X,Y'
) AS tmp USING (empID)
WHERE -- ...add other conditions here...
Here's an SQLFiddle demonstrating this query.
Ps. The reason why you should use a JOIN instead of an IN subquery for this is because MySQL is not so good at optimizing IN subqueries.
Specifically (as of v5.7, at least), MySQL always converts IN subqueries into dependent subqueries, so that the subquery must be re-executed for every row of the outer query, even if the original subquery was independent. For example, the following query (from the documentation linked above):
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
gets effectively converted into:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
This may still be reasonably fast, if t2 is small and/or has an index allowing fast lookups. However, if (like in the original example above) executing the subquery might take a lot of work, the performance can suffer badly. Using a JOIN instead allows the subquery to only be executed once, and thus typically offers much better performance.
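The join-against-a-grouped-subquery pattern described above is not specific to MySQL. Here is a minimal sketch in SQLite via Python's sqlite3 module, using the COUNT(DISTINCT ...) variant rather than GROUP_CONCAT. The emp_dept rows come from the question; the employees table and its names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_dept (empID TEXT, department TEXT);
    CREATE TABLE employees (empID TEXT PRIMARY KEY, name TEXT);
    INSERT INTO emp_dept VALUES
        ('A101','Z'), ('A101','X'), ('A101','Y'), ('A102','Z'), ('A102','X'),
        ('A103','Z'), ('A103','Y'), ('A104','X'), ('A104','Y'), ('A105','Z'),
        ('A106','X');
    -- hypothetical names
    INSERT INTO employees VALUES
        ('A101','Ann'), ('A102','Bob'), ('A103','Cy'),
        ('A104','Di'), ('A105','Ed'), ('A106','Flo');
""")

query = """
    SELECT e.empID, e.name
    FROM employees e
    JOIN (
        -- the grouped subquery is evaluated on its own, then joined
        SELECT empID
        FROM emp_dept
        GROUP BY empID
        HAVING COUNT(DISTINCT department) = 2
           AND COUNT(DISTINCT CASE WHEN department IN ('X', 'Y')
                                   THEN department END) = 2
    ) AS tmp ON tmp.empID = e.empID
    -- further WHERE conditions on e would go here
    ORDER BY e.empID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The HAVING clause repeats the aggregate expressions instead of referring to column aliases, which keeps the query portable to engines that do not accept aliases there.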
What about a self join? (ANSI Compliant - worked for 20+ years)
SELECT * FROM employee e JOIN employee e2 ON e.empid = e2.empid
WHERE e.department = 'x' AND e2.department ='y'
This shows that a101 and a104 both work in both departments.
Solution using where clause:
select distinct e.empID
from employe e
where exists( select *
from employe
where empID = e.empID
having count(department) = count(case when department in('Y','X','Z') then department end)
and count(distinct department) = 3)
exists checks if there are records for specific EmpId that have total count of departments equal to conditional count of only matching departments and that it is also equal to the number of departments provided to the in clause. Also worth mentioning that here we apply having clause without the group by clause, on the whole set, but with already specified, only one empID.
SQLFiddle
You can achieve this without the correlated subquery, but with the group by clause:
select e.empId
from employe e
group by e.empID
having count(department) = count(case when department in('Y','X','Z') then department end)
and count(distinct department) = 3
SQLFiddle
You can also use another variation of having clause for the query above:
having count(case when department not in('Y','X', 'Z') then department end) = 0
and count(distinct case when department in('Y','X','Z') then department end) = 3
SQLFiddle
In Postgres this can be simplified using arrays:
select empid
from employee
group by empid
having array_agg(department order by department)::text[] = array['Y','Z'];
It's important to sort the elements in the array_agg() and compare them to a sorted list of departments in the same order. Otherwise this won't return correct answers.
E.g. array_agg(department) = array['Z', 'Y'] might potentially return wrong results.
This can be done in a more flexible manner using a CTE to supply the departments:
with depts_to_check (dept) as (
values ('Z'), ('Y')
)
select empid
from employee
group by empid
having array_agg(department order by department) = array(select dept from depts_to_check order by dept);
That way the sorting of the elements is always done by the database and will be consistent between the values in the aggregated array and the one to which it is compared.
An option with standard SQL is to check that no row for the employee has a different department, while also counting the matching rows:
select empid
from employee
group by empid
having min(case when department in ('Y','Z') then 1 else 0 end) = 1
and count(case when department in ('Y','Z') then 1 end) = 2;
The above solution won't work if it's possible that a single employee is assigned twice to the same department!
The having min (...) can be simplified in Postgres using the aggregate bool_and().
When applying the standard filter() condition to do conditional aggregation this can also be made to work with situation where an employee can be assigned to the same department twice
select empid
from employee
group by empid
having bool_and(department in ('Y','Z'))
and count(distinct department) filter (where department in ('Y','Z')) = 2;
bool_and(department in ('Y','Z')) only returns true if the condition is true for all rows in the group.
Another solution with standard SQL is to use the intersection between those employees that have at least those two departments and those that are assigned to exactly two departments:
-- employees with at least those two departments
select empid
from employee
where department in ('Y','Z')
group by empid
having count(distinct department) = 2
intersect
-- employees with exactly two departments
select empid
from employee
group by empid
having count(distinct department) = 2;
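The INTERSECT variant is plain enough SQL to run in SQLite as well. The sketch below uses Python's sqlite3 module and the question's rows, loaded into a table named employee as in this answer. The extra duplicate row for A103 is made up, to check that COUNT(DISTINCT ...) tolerates an employee being assigned to the same department twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empid TEXT, department TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [
    ("A101", "Z"), ("A101", "X"), ("A101", "Y"), ("A102", "Z"), ("A102", "X"),
    ("A103", "Z"), ("A103", "Y"), ("A104", "X"), ("A104", "Y"), ("A105", "Z"),
    ("A106", "X"),
    ("A103", "Z"),  # hypothetical duplicate assignment
])

query = """
    -- employees with at least those two departments
    SELECT empid FROM employee
    WHERE department IN ('Y', 'Z')
    GROUP BY empid
    HAVING COUNT(DISTINCT department) = 2
    INTERSECT
    -- employees with exactly two distinct departments
    SELECT empid FROM employee
    GROUP BY empid
    HAVING COUNT(DISTINCT department) = 2
"""
rows = conn.execute(query).fetchall()
print(rows)
```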

SQL Server Query using GROUP BY

I am having trouble writing a query that will select all Skills, joining the Employee and Competency records, but only return one skill per employee, their newest Skill. Using this sample dataset
Skills
======
id employee_id competency_id created
1 1 1 Jan 1
2 2 2 Jan 1
3 1 2 Jan 3
Employees
===========
id first_name last_name
1 Mike Jones
2 Steve Smith
Competencies
============
id title
1 Problem Solving
2 Compassion
I would like to retrieve the following data
Skill.id Skill.employee_id Skill.competency_id Skill.created Employee.id Employee.first_name Employee.last_name Competency.id Competency.title
2 2 2 Jan 1 2 Steve Smith 2 Compassion
3 1 2 Jan 3 1 Mike Jones 2 Compassion
I was able to select the employee_id and max created using
SELECT MAX(created) as created, employee_id FROM skills GROUP BY employee_id
But when I start to add more fields in the select statement or add in a join I get the 'Column 'xyz' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.' error.
Any help is appreciated and I don't have to use GROUP BY, it's just what I'm familiar with.
You get this error because, when a query aggregates, SQL Server requires every column in the SELECT list that is not wrapped in an aggregate function to appear in the GROUP BY clause.
The problem with that is you might have unique values in some columns which can throw off the result. So you will want to rewrite the query to use one of the following:
You can use a subquery to get this result. This gets the max(created) in a subquery and then you use that result to get the correct employee record:
select s.id SkillId,
s.employee_id,
s.competency_id,
s.created,
e.id employee,
e.first_name,
e.last_name,
c.id competency,
c.title
from Employees e
left join Skills s
on e.id = s.employee_id
inner join
(
SELECT MAX(created) as created, employee_id
FROM skills
GROUP BY employee_id
) s1
on s.employee_id = s1.employee_id
and s.created = s1.created
left join Competencies c
on s.competency_id = c.id
See SQL Fiddle with Demo
Or another way to do this is to use row_number():
select *
from
(
select s.id SkillId,
s.employee_id,
s.competency_id,
s.created,
e.id employee,
e.first_name,
e.last_name,
c.id competency,
c.title,
row_number() over(partition by s.employee_id
order by s.created desc) rn
from Employees e
left join Skills s
on e.id = s.employee_id
left join Competencies c
on s.competency_id = c.id
) src
where rn = 1
See SQL Fiddle with Demo
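The subquery approach (join each skill to its employee's MAX(created)) also works outside SQL Server. Here is a minimal sketch in SQLite through Python's sqlite3 module using the question's sample data. The year in the dates is an assumption, since the question only gives "Jan 1" and "Jan 3", and the dates are stored as ISO text so that MAX compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE Competencies (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Skills (id INTEGER PRIMARY KEY, employee_id INTEGER,
                         competency_id INTEGER, created TEXT);
    INSERT INTO Employees VALUES (1, 'Mike', 'Jones'), (2, 'Steve', 'Smith');
    INSERT INTO Competencies VALUES (1, 'Problem Solving'), (2, 'Compassion');
    -- the year 2013 is hypothetical
    INSERT INTO Skills VALUES (1, 1, 1, '2013-01-01'),
                              (2, 2, 2, '2013-01-01'),
                              (3, 1, 2, '2013-01-03');
""")

query = """
    SELECT s.id, s.employee_id, s.competency_id, s.created,
           e.first_name, e.last_name, c.title
    FROM Skills s
    -- keep only the rows whose created date is the latest for that employee
    JOIN (SELECT employee_id, MAX(created) AS created
          FROM Skills GROUP BY employee_id) latest
      ON latest.employee_id = s.employee_id AND latest.created = s.created
    JOIN Employees e ON e.id = s.employee_id
    JOIN Competencies c ON c.id = s.competency_id
    ORDER BY s.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If an employee has two skills with the same latest created value, this form returns both rows, whereas the row_number() form keeps one of them.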
For every non-aggregated column you add to your SELECT statement you need to update your GROUP BY to include it.
This article may help you understand why.
;WITH
MAX_SKILL_created AS
(
SELECT
MAX(skills.created) as created,
skills.employee_id
FROM
skills
GROUP BY
skills.employee_id
),
MAX_SKILL_id AS
(
SELECT
MAX(skills.id) as id,
skills.employee_id
FROM
skills
INNER JOIN MAX_SKILL_created
ON MAX_SKILL_created.employee_id = skills.employee_id
AND MAX_SKILL_created.created = skills.created
GROUP BY
skills.employee_id
)
SELECT
* -- type all your columns here
FROM
employees
INNER JOIN MAX_SKILL_id
ON MAX_SKILL_id.employee_id = employees.id
INNER JOIN skills
ON skills.id = MAX_SKILL_id.id
INNER JOIN competencies
ON competencies.id = skills.competency_id
If you are using SQL Server, then you can use OUTER APPLY:
SELECT *
FROM employees E
OUTER APPLY (
SELECT TOP 1 *
FROM skills
WHERE employee_id = E.id
ORDER BY created DESC
) S
INNER JOIN competencies C
ON C.id = S.competency_id

Joining three tables (link table)

Need help with a query that I wrote:
I have three tables
Company
id name
1 Gary's
Employee
id name company_id
1 Tim Jones 1
2 Sam Adams 1
reports to
employee_id reports_to_id
1 2
My current query is:
select
temp.company.name as comp_name,
temp.employee.name as employee_name,
temp.employee.id as employee_id
from temp.company, temp.employee
where temp.company.id = temp.employee.company_id and temp.company.id = 1
Which gives me the output of:
comp_name employee_name employee_id
Gary's Tim Jones 1
I need something like this:
comp_name employee_name reports_to
Gary's Tim Jones Sam Adams
What's a good way to modify my query to do this? I have a query and then I take those results and run a second query against that result set (which is excessively unnecessary).
Assuming an employee only reports to one person then we could have (no link table)
Employee (Id, Name, CompanyId, ReportsToId)
Company (Id, Name)
Then you could have a query similar to
select e.Name EmployeeName, c.Name CompanyName, r.Name ReportsTo
from
Employee e
inner join Company c on e.CompanyId = c.Id
inner join Employee r on e.ReportsToId = r.Id
where
e.CompanyId = 1
If the employee reports to multiple people then we would use a link table
Employee (Id, Name, CompanyId)
EmployeeReportsTo (EmployeeId, ManagerId)
Company (Id, Name)
select e.Name EmployeeName, c.Name CompanyName, r.Name ReportsTo
from
Employee e
inner join Company c on e.CompanyId = c.Id
inner join EmployeeReportsTo ert on ert.EmployeeId = e.Id
inner join Employee r on ert.ManagerId = r.Id
where
e.CompanyId = 1
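The link-table query can be checked against the question's sample rows. The sketch below uses Python's sqlite3 module with the table and column names proposed in this answer. The second query, with left joins, is an addition showing how to keep employees who report to nobody, which the inner joins drop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, CompanyId INTEGER);
    CREATE TABLE EmployeeReportsTo (EmployeeId INTEGER, ManagerId INTEGER);
    INSERT INTO Company VALUES (1, 'Gary''s');
    INSERT INTO Employee VALUES (1, 'Tim Jones', 1), (2, 'Sam Adams', 1);
    INSERT INTO EmployeeReportsTo VALUES (1, 2);
""")

inner_query = """
    SELECT e.Name, c.Name, r.Name
    FROM Employee e
    INNER JOIN Company c ON e.CompanyId = c.Id
    INNER JOIN EmployeeReportsTo ert ON ert.EmployeeId = e.Id
    INNER JOIN Employee r ON ert.ManagerId = r.Id   -- self join for the manager
    WHERE e.CompanyId = 1
    ORDER BY e.Id
"""
left_query = """
    SELECT e.Name, c.Name, r.Name
    FROM Employee e
    INNER JOIN Company c ON e.CompanyId = c.Id
    LEFT JOIN EmployeeReportsTo ert ON ert.EmployeeId = e.Id
    LEFT JOIN Employee r ON ert.ManagerId = r.Id
    WHERE e.CompanyId = 1
    ORDER BY e.Id
"""
inner_rows = conn.execute(inner_query).fetchall()
left_rows = conn.execute(left_query).fetchall()
print(inner_rows)
print(left_rows)
```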