Selecting the Id's that have the same EmailAddress column value - sql

What I need:
I am looking for a solution that can give me all the Employee Id's that have the same EmailAddress Column (the filter needs to be by EmailAddress).
I want to know what are the Id's correspondent to the duplicated Email Addresses and retrieve that information.
Table Employee:
Id | PlNumber | EmailAddress | EmployeeBeginingDate | EmployedEndDate | Name UserId(FK) | CreatedBy | CreatedOn
SELECT a.Id,a.EmailAddress
FROM Employee a
INNER JOIN (SELECT
Employee.Id as EmployeeId,
Employee.EmailAddress as EmailAddress,
FROM Employee
GROUP BY Employee.Id,Employee.EmailAddress
HAVING count(Employee.EmailAddress) > 1
) b
ON a.Id= b.EmployeeId
ORDER BY a.Id
I am always getting an error:
the multi-part identifier could not be bound.
I know why the error is happening but I couldn't solve this.
UPDATE: After a few changes the query is returning 0 rows but I know it should return at least 3 rows that I have duplicate values.

Try the below query as you have an aliased table Employee as a. So in place of Employee, you have to use a.
SELECT a.Id, a.EmailAddress
FROM Employee a
INNER JOIN (SELECT
Employee.EmailAddress as EmailAddress
FROM Employee
GROUP BY Employee.EmailAddress
HAVING count(Employee.EmailAddress) > 1
) b
ON a.EmailAddress = b.EmailAddress
ORDER BY a.Id
Live db<>fiddle demo.

Assuming the ids are different on each row, I would go for exists:
SELECT e.Id, e.EmailAddress
FROM Employee e
WHERE EXISTS (SELECT 1
FROM Employee e2
WHERE e2.EmailAddress = e.EmailAddress AND
e2.Id <> e.Id
)
ORDER BY e.EmailAddress;
Or, if you want to know the number of matches, use window functions:
SELECT e.Id, e.EmailAddress, cnt
FROM (SELECT e.*, COUNT(*) OVER (PARTITION BY e.EmailAddress) as cnt
FROM Employee e
) e
WHERE cnt >= 2;

Related

Paging in sql server for top level query

In sql server I have an Employee table and an Address table.
An Employee can have many Addresses.
I want to get the first 10 Employees with their Addresses.
SELECT *
FROM
Employee e
LEFT JOIN Address a ON a.EmployeeID = e.Id
WHERE
a.Street LIKE '%a%'
AND e.Name LIKE '%bob%'
ORDER BY e.Id
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
If each employee has 2 addresses associated with it I will get 10 rows but only 5 employees.
How do I get the 10 employees I want?
You can use dense_rank():
SELECT ea.*
FROM (SELECT e.*, a.*, -- should select the columns you really need
DENSE_RANK() OVER (ORDER BY e.id) as seqnum
FROM Employee e JOIN
Address a
ON a.EmployeeID = e.Id
WHERE a.Street LIKE '%a%' AND e.Name LIKE '%bob%'
) ea
WHERE seqnum <= 10
ORDER BY e.Id ;

SQL count number of occurrences in different tables

I have 4 tables in my db: incoming letters, outgoing letters, local letters and employee. In "letters" tables, there is a register_employee that have register this letter.
Every employee has his/her unique id (for example "3de9a23e-b927-4a27-a66a-c1f30b8c1464").
Tables look like this:
employee:
id first_name last_name age
incoming:
id register_employee summary letter adress date_sent
"Outgoing" and "local" tables look same as "incoming" table.
I need to count how many letters have been registered by each employee and which type.
Example results:
employee incoming outgoing local
Bob Marley 45 33 5
John Travolta 31 10 9
George Bush 98 15 38
I tried to use full joins and nested Select statments, but results that I have got are probably wrong cause I am getting millions of count for each employee (I don't have so many letters in db).
You can use OUTER JOIN and GROUP BY as follows:
-- Updated
You can use multiple sub-query:
select e.id, e.firstname,
(select count(*) from incoming i where i.register_employee = e.id) as incoming_letter,
(select count(*) from outgoing o where o.register_employee = e.id) as outgoing_letter,
(select count(*) from local l where l.register_employee = e.id) as local_letter
from employee e
We count the number of letters under different categories & left join them to the master employee table
select employee.first_name , t1.incoming, t2.outgoing, t3.local from (
select * from employee
left join
(select register_employee, count(*) as incoming from incoming_letters group by register_employee) t1 on employee.id = t1.register_employee
left join
(select register_employee, count(*) as outgoing from outgoing_letters group by register_employee) t2 on employee.id = t2.register_employee
left join
(select register_employee, count(*) as local from local group by register_employee) t3 on employee.id = t3.register_employee ) v
SELECT incom.id,incom.name,incom.incoming,out.outgoings,loc.local
FROM (SELECT
employees.id as id,
employees.short_name as name,
COUNT(incomings.register_employee) as incoming
FROM employees
LEFT JOIN incomings ON incomings.register_employee = employees.id)
GROUP BY employees.id, employees.short_name) as incom
LEFT JOIN (SELECT
employees.id as id,
COUNT(outgoings.register_employee) as outgoings
FROM employees
LEFT JOIN outgoings ON outgoings.register_employee = employees.id
GROUP BY employees.id) as out
ON incom.id=out.id
LEFT JOIN (SELECT
employees.id as id,
COUNT(local.register_employee) as local
FROM employees
LEFT JOIN outgoings ON local.register_employee = employees.id
GROUP BY employees.id) as loc
ON incom.id=loc.id;

Select users belonging only to particular departments

I have the following table with two fields namely a and b as shown below:
create table employe
(
empID varchar(10),
department varchar(10)
);
Inserting some records:
insert into employe values('A101','Z'),('A101','X'),('A101','Y'),('A102','Z'),('A102','X'),
('A103','Z'),('A103','Y'),('A104','X'),('A104','Y'),('A105','Z'),('A106','X');
select * from employe;
empID department
------------------
A101 Z
A101 X
A101 Y
A102 Z
A102 X
A103 Z
A103 Y
A104 X
A104 Y
A105 Z
A106 X
Note: Now I want to show the employee who is only and only belongs to the department Z and Y.
So according to the condition the only employee A103 should be displayed because of he only belongs
to the department Z and Y. But employee A101 should not appear because he belong to Z,X, and Y.
Expected Result:
If condition is : Z and Y then result should be:
empID
------
A103
If condition is : Z and X then result should be:
empID
------
A102
If condition is : Z,X and Y then result should be:
empID
------
A101
Note: I want to do it in the where clause only (don't want to use the group by and having clauses), because I'm going to include this one in the other where also.
This is a Relational Division with no Remainder (RDNR) problem. See this article by Dwain Camps that provides many solution to this kind of problem.
First Solution
SQL Fiddle
SELECT empId
FROM (
SELECT
empID, cc = COUNT(DISTINCT department)
FROM employe
WHERE department IN('Y', 'Z')
GROUP BY empID
)t
WHERE
t.cc = 2
AND t.cc = (
SELECT COUNT(*)
FROM employe
WHERE empID = t.empID
)
Second Solution
SQL Fiddle
SELECT e.empId
FROM employe e
WHERE e.department IN('Y', 'Z')
GROUP BY e.empID
HAVING
COUNT(e.department) = 2
AND COUNT(e.department) = (SELECT COUNT(*) FROM employe WHERE empID = e.empId)
Without using GROUP BY and HAVING:
SELECT DISTINCT e.empID
FROM employe e
WHERE
EXISTS(
SELECT 1 FROM employe WHERE department = 'Z' AND empID = e.empID
)
AND EXISTS(
SELECT 1 FROM employe WHERE department = 'Y' AND empID = e.empID
)
AND NOT EXISTS(
SELECT 1 FROM employe WHERE department NOT IN('Y', 'Z') AND empID = e.empID
)
I know that this question has already been answered, but it was a fun problem to do and I tried to do it in a way that no one else has. Benefits of mine is that you can input any list of strings as long as each value has a comma afterwards and you don't have to worry about checking counts.
Note: Values must be listed in alphabetic order.
XML Solution with CROSS APPLY
select DISTINCT empID
FROM employe A
CROSS APPLY
(
SELECT department + ','
FROM employe B
WHERE A.empID = B.empID
ORDER BY department
FOR XML PATH ('')
) CA(Deps)
WHERE deps = 'Y,Z,'
Results:
empID
----------
A103
For condition 1:z and y
select z.empID from (select empID from employe where department = 'z' ) as z
inner join (select empID from employe where department = 'y' ) as y
on z.empID = y.empID
where z.empID Not in(select empID from employe where department = 'x' )
For condition 1:z and x
select z.empID from (select empID from employe where department = 'z' ) as z
inner join (select empID from employe where department = 'x' ) as x
on z.empID = x.empID
where z.empID Not in(select empID from employe where department = 'y' )
For condition 1:z,y and x
select z.empID from (select empID from employe where department = 'z' ) as z
inner join (select empID from employe where department = 'x' ) as x
on z.empID = x.empID
inner join (select empID from employe where department = 'y' ) as y on
y.empID=Z.empID
You can use GROUP BY with having like this. SQL Fiddle
SELECT empID
FROM employe
GROUP BY empID
HAVING SUM(CASE WHEN department= 'Y' THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN department= 'Z' THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN department NOT IN('Y','Z') THEN 1 ELSE 0 END) = 0
Without GROUP BY and Having
SELECT empID
FROM employe E1
WHERE (SELECT COUNT(DISTINCT department) FROM employe E2 WHERE E2.empid = E1.empid and department IN ('Z','Y')) = 2
EXCEPT
SELECT empID
FROM employe
WHERE department NOT IN ('Z','Y')
If you want to use any of the above query with other tables using a join you can use CTE or a derived table like this.
;WITH CTE AS
(
SELECT empID
FROM employe
GROUP BY empID
HAVING SUM(CASE WHEN department= 'Y' THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN department= 'Z' THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN department NOT IN('Y','Z') THEN 1 ELSE 0 END) = 0
)
SELECT cols from CTE join othertable on col_cte = col_othertable
try this
select empID from employe
where empId in (select empId from employe
where department = 'Z' and department = 'Y')
and empId not in (select empId from employe
where department = 'X') ;
for If condition is : Z and Y
SELECT EMPID FROM EMPLOYE WHERE DEPARTMENT='Z' AND
EMPID IN (SELECT EMPID FROM EMPLOYE WHERE DEPARTMENT ='Y')AND
EMPID NOT IN(SELECT EMPID FROM EMPLOYE WHERE DEPARTMENT NOT IN ('Z','Y'))
The following query works when you want employees from departments 'Y' and 'Z' and not 'X'.
select empId from employe
where empId in (select empId from employe
where department = 'Z')
and empId in (select empId from employe
where department = 'Y')
and empId not in (select empId from employe
where department = 'X') ;
For your second case, simply replace not in with in in the last condition.
Try this,
SELECT a.empId
FROM employe a
INNER JOIN
(
SELECT empId
FROM employe
WHERE department IN ('X', 'Y', 'Z')
GROUP BY empId
HAVING COUNT(*) = 3
)b ON a.empId = b.empId
GROUP BY a.empId
Count must based on number of conditions.
You can too use GROUP BY and HAVING — you just need to do it in a subquery.
For example, let's start with a simple query to find all employees in departments X and Y (and not in any other departments):
SELECT empID,
GROUP_CONCAT(DISTINCT department ORDER BY department ASC) AS depts
FROM emp_dept GROUP BY empID
HAVING depts = 'X,Y'
I've used MySQL's GROUP_CONCAT() function as a convenient shortcut here, but you could get the same results without it, too, e.g. like this:
SELECT empID,
COUNT(DISTINCT department) AS all_depts,
COUNT(DISTINCT CASE
WHEN department IN ('X', 'Y') THEN department ELSE NULL
END) AS wanted_depts
FROM emp_dept GROUP BY empID
HAVING all_depts = wanted_depts AND wanted_depts = 2
Now, to combine this with other query condition, simply take a query that includes the other conditions, and join your employees table against the output of the query above:
SELECT empID, name, depts
FROM employees
JOIN (
SELECT empID,
GROUP_CONCAT(DISTINCT department ORDER BY department ASC) AS depts
FROM emp_dept GROUP BY empID
HAVING depts = 'X,Y'
) AS tmp USING (empID)
WHERE -- ...add other conditions here...
Here's an SQLFiddle demonstrating this query.
Ps. The reason why you should use a JOIN instead of an IN subquery for this is because MySQL is not so good at optimizing IN subqueries.
Specifically (as of v5.7, at least), MySQL always converts IN subqueries into dependent subqueries, so that the subquery must be re-executed for every row of the outer query, even if the original subquery was independent. For example, the following query (from the documentation linked above):
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
gets effectively converted into:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
This may still be reasonably fast, if t2 is small and/or has an index allowing fast lookups. However, if (like in the original example above) executing the subquery might take a lot of work, the performance can suffer badly. Using a JOIN instead allows the subquery to only be executed once, and thus typically offers much better performance.
What about a self join? (ANSI Compliant - worked for 20+ years)
SELECT * FROM employee e JOIN employee e2 ON e.empid = e2.empid
WHERE e.department = 'x' AND e2.department ='y'
This shows that a101 and a104 both work in both departments.
Solution using where clause:
select distinct e.empID
from employe e
where exists( select *
from employe
where empID = e.empID
having count(department) = count(case when department in('Y','X','Z') then department end)
and count(distinct department) = 3)
exists checks if there are records for specific EmpId that have total count of departments equal to conditional count of only matching departments and that it is also equal to the number of departments provided to the in clause. Also worth mentioning that here we apply having clause without the group by clause, on the whole set, but with already specified, only one empID.
SQLFiddle
You can achieve this without the correlated subquery, but with the group by clause:
select e.empId
from employe e
group by e.empID
having count(department) = count(case when department in('Y','X','Z') then department end)
and count(distinct department) = 3
SQLFiddle
You can also use another variation of having clause for the query above:
having count(case when department not in('Y','X', 'Z') then department end) = 0
and count(distinct case when department in('Y','X','Z') then department end) = 3
SQLFiddle
In Postgres this can be simplified using arrays:
select empid
from employee
group by empid
having array_agg(department order by department)::text[] = array['Y','Z'];
It's important to sort the elements in the array_agg() and compare them to a sorted list of departments in the same order. Otherwise this won't return correct answers.
E.g. array_agg(department) = array['Z', 'Y'] might potentially return wrong results.
This can be done in a more flexible manner using a CTE to supply the departments:
with depts_to_check (dept) as (
values ('Z'), ('Y')
)
select empid
from employee
group by empid
having array_agg(department order by department) = array(select dept from depts_to_check order by dept);
That way the sorting of the elements is always done by the database and will be consistent between the values in the aggregated array and the one to which it is compared.
An option with standard SQL is to check if at least one row has a different department together with counting all rows
select empid
from employee
group by empid
having min(case when department in ('Y','Z') then 1 else 0 end) = 1
and count(case when department in ('Y','Z') then 1 end) = 2;
The above solution won't work if it's possible that a single employee is assigned twice to the same department!
The having min (...) can be simplified in Postgres using the aggregate bool_and().
When applying the standard filter() condition to do conditional aggregation this can also be made to work with situation where an employee can be assigned to the same department twice
select empid
from employee
group by empid
having bool_and(department in ('Y','Z'))
and count(distinct department) filter (where department in ('Y','Z')) = 2;
bool_and(department in ('Y','Z')) only returns true if the condition is true for all rows in the group.
Another solution with standard SQL is to use the intersection between those employees that have at least those two departments and those that are assigned to exactly two departments:
-- employees with at least those two departments
select empid
from employee
where department in name in ('Y','Z')
group by empid
having count(distinct department) = 2
intersect
-- employees with exactly two departments
select empid
from employee
group by empid
having count(distinct department) = 2;

SQL Server Query using GROUP BY

I am having trouble writing a query that will select all Skills, joining the Employee and Competency records, but only return one skill per employee, their newest Skill. Using this sample dataset
Skills
======
id employee_id competency_id created
1 1 1 Jan 1
2 2 2 Jan 1
3 1 2 Jan 3
Employees
===========
id first_name last_name
1 Mike Jones
2 Steve Smith
Competencies
============
id title
1 Problem Solving
2 Compassion
I would like to retrieve the following data
Skill.id Skill.employee_id Skill.competency_id Skill.created Employee.id Employee.first_name Employee.last_name Competency.id Competency.title
2 2 2 Jan 1 2 Steve Smith 2 Compassion
3 1 2 Jan 3 1 Mike Jones 2 Compassion
I was able to select the employee_id and max created using
SELECT MAX(created) as created, employee_id FROM skills GROUP BY employee_id
But when I start to add more fields in the select statement or add in a join I get the 'Column 'xyz' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.' error.
Any help is appreciated and I don't have to use GROUP BY, it's just what I'm familiar with.
The error that you were getting is because SQL Server requires any item in the SELECT list to be included in the GROUP BY if there is an aggregate function being used.
The problem with that is you might have unique values in some columns which can throw off the result. So you will want to rewrite the query to use one of the following:
You can use a subquery to get this result. This gets the max(created) in a subquery and then you use that result to get the correct employee record:
select s.id SkillId,
s.employee_id,
s.competency_id,
s.created,
e.id employee,
e.first_name,
e.last_name,
c.id competency,
c.title
from Employees e
left join Skills s
on e.id = s.employee_id
inner join
(
SELECT MAX(created) as created, employee_id
FROM skills
GROUP BY employee_id
) s1
on s.employee_id = s1.employee_id
and s.created = s1.created
left join Competencies c
on s.competency_id = c.id
See SQL Fiddle with Demo
Or another way to do this is to use row_number():
select *
from
(
select s.id SkillId,
s.employee_id,
s.competency_id,
s.created,
e.id employee,
e.first_name,
e.last_name,
c.id competency,
c.title,
row_number() over(partition by s.employee_id
order by s.created desc) rn
from Employees e
left join Skills s
on e.id = s.employee_id
left join Competencies c
on s.competency_id = c.id
) src
where rn = 1
See SQL Fiddle with Demo
For every non-aggregated column you add to your SELECT statement you need to update your GROUP BY to include it.
This article may help you understand why.
;WITH
MAX_SKILL_created AS
(
SELECT
MAX(skills.created) as created,
skills.employee_id
FROM
skills
GROUP BY
skills.employee_id
),
MAX_SKILL_id AS
(
SELECT
MAX(skills.id) as id,
skills.employee_id
FROM
skills
INNER JOIN MAX_SKILL_created
ON MAX_SKILL_created.employee_id = skills.employee_id
AND MAX_SKILL_created.created = skills.created
GROUP BY
skills.employee_id
)
SELECT
* -- type all your columns here
FROM
employees
INNER JOIN MAX_SKILL_id
ON MAX_SKILL_id.employee_id = employees.employee_id
INNER JOIN skills
ON skills.id = MAX_SKILL_id.id
INNER JOIN competencies
ON competencies.id = skills.competency_id
If you are using SQL Server than you can use OUTER APPLY
SELECT *
FROM employees E
OUTER APPLY (
SELECT TOP 1 *
FROM skills
WHERE employee_id = E.id
ORDER BY created DESC
) S
INNER JOIN competencies C
ON C.id = S.competency_id

Can I use more than one column in subquery?

I want to show the names of all employees from the EMPLOYEES table who are working on more than three projects from the PROJECT table.
PROJECTS.PersonID is a a foreign key referencing EMPLOYEES.ID:
SELECT NAME, ID
FROM EMPLOYEES
WHERE ID IN
(
SELECT PersonID, COUNT(*)
FROM PROJECTS
GROUP BY PersonID
HAVING COUNT(*) > 3
)
Can I have both PersonID, COUNT(*) in that subquery, or there must be only one column?
Not in an IN clause (or at least not the way you are trying to use it. Some RDBMSs allow tuples with more than one column in the IN clause but it wouldn't help your case here)
You just need to remove the COUNT(*) from the SELECT list to achieve your desired result.
SELECT NAME, ID
FROM EMPLOYEES
WHERE ID IN
(
SELECT PersonID
FROM PROJECTS
GROUP BY PersonID
HAVING COUNT(*) > 3
)
If you wanted to also return the count you could join onto a derived table or common table expression with more than one column though.
SELECT E.NAME,
E.ID,
P.Cnt
FROM EMPLOYEES E
JOIN (SELECT PersonID,
Count(*) AS Cnt
FROM PROJECTS
GROUP BY PersonID
HAVING Count(*) > 3) P
ON E.ID = P.PersonID
To answer your question, you can only have 1 column for the IN subquery. You could get your results using the query below:
SELECT e.ID
,e.Name
FROM dbo.Projects p
LEFT OUTER JOIN dbo.Employees e
ON p.PersonID = e.ID
GROUP BY e.ID
,e.Name
HAVING COUNT(*) > 3