Oracle SQL - finding sets that contain another set - sql

Say I have a table A that contains a list of a potential employees ID's and their professional skills in the form of a skill code:
ID | skill code
005 12
005 3
007 42
007 8
013 6
013 22
013 18
And I have another table B that lists several job position ID's and their corresponding required skill ID's:
Job ID | skill code
1 3
1 32
1 21
1 44
2 15
2 62
.
.
.
How can I find out which Job Id's a specific person is qualified for? I need to select all Job Id's that contain all the person's skills. Say for instance I need to find all job ID's that employee ID 003 is qualified for, how would I structure an Oracle SQL query to get this information?
I want to be able to enter any employee ID in a WHERE clause to find what jobs that person is qualified for.

An idea would be to count the number of skills for every person and job:
SELECT A.id as person_id,
B.JOB_ID
FROM A
JOIN B
ON A.skill_code=B.skill_code
GROUP BY a.id, b.job_id
HAVING count(*) = (select count(*) from b b2 where b2.job_id = b.job_id);
Not tested and assuming that tables are well normalized.
UPDATE after the OP's comment.
It is asked for all the jobs which necessitate all skills of a person:
SELECT A.id as person_id,
B.JOB_ID
FROM A
JOIN B
ON A.skill_code=B.skill_code
GROUP BY a.id, b.job_id
HAVING count(*) = (select count(*) from a a2 where a2.job_id = b.job_id);
Update2: The question was updated with:
I want to be able to enter any employee ID in a WHERE clause to find what jobs that person is qualified for.
For this, you just add WHERE a.id = :emp_id to the first query. (above group by)

Try this one
WITH b1 AS
(SELECT job_id,
skill,
COUNT(*) over (partition BY job_id order by job_id) rr
FROM b
) ,
res1 AS
(SELECT a.id,
b1.job_id,
rr,
COUNT(*) over (partition BY id, job_id order by id) rr2
FROM A
JOIN B1
ON A.skill=B1.skill
)
SELECT id, job_id FROM res1 WHERE rr=rr2

Related

Hive - How to combine a group by over columns A and B and a distinct over column C

I need to create a query which selects from a particular table the users which have more than one different email. To distinguish users, I group them based on two fields: name and age. Let's see this with an example.
So I have a table like this:
name age email phone
----------------------------------
Andy 20 Andy#du 1234
Berni 21 Berni#du 2345
Carol 22 Carol#du 3456
Andy 20 Andy#du 4321
Berni 21 Berni#et 2345
Dody 28 Dodi#du 7869
Carol 22 Carol#pt 3456
What I want to get is:
Berni 21 Berni#du, Berni#et
Carol 22 Carol#du, Carol#pt
Note that Andy is also twice in the database but with same email (what changes is the phone number). Because of this user I need to make a distinc over email, so only users with two different emails are selected.
With this query I am able to solve the issue and I have the desired result.
select * from
(
select aux.name,
aux.age,
concat_ws(',',collect_set(email)) as email
FROM
(select a.name, a.age, a.email
FROM TestUsers a
RIGHT JOIN
(select name,
age
FROM TestUsers
GROUP BY
name,
age
having count(*) > 1
)b
ON a.name = b.name
AND a.age = b.age
)aux
GROUP BY aux.name,
aux.age
)tr
where locate(",",tr.email) > 0;
But I am sure it has to be a more efficient way than checking when there is not a comma in the email field(which means more than one email).
Has anyone in mind a better approach?
If I understand correctly, you should be able to do this using a having clause:
select tu.name, tu.age,
concat_ws(',', collect_list(tu.email)) as emails
from (select distinct tu.name, tu.age, tu.email
from TestUsers tu
) tu
group by tu.name, tu.age
having count(*) > 1;
Actually, because collect_set() removes duplicates, this should work without a subquery:
select tu.name, tu.age,
concat_ws(',', collect_set(tu.email)) as emails
from testusers tu
group by tu.name, tu.age
having min(tu.email) <> max(tu.email);

SQL: Finding duplicate records based on custom criteria

I need to find duplicates based on two tables and based on custom criteria. The following determines whether it's a duplicate, and if so, show only the most recent one:
If Employee Name and all EmployeePolicy CoverageId(s) are an exact match another record, then that's considered a duplicate.
--Employee Table
EmployeeId Name Salary
543 John 54000
785 Alex 63000
435 John 75000
123 Alex 88000
333 John 67000
--EmployeePolicy Table
EmployeePolicyId EmployeeId CoverageId
1 543 8888
2 543 7777
3 785 5555
4 435 8888
5 435 7777
6 123 4444
7 333 8888
8 333 7776
For example, the duplicates in the example above are the following:
EmployeeId Name Salary
543 John 54000
435 John 75000
This is because they are the only ones that have a matching name in the Employee table as well as both have the same exact CoverageIds in the EmployeePolicy table.
Note: EmployeeId 333 also with Name = John is not a match because both of his CoverageIDs are not the same as the other John's CoverageIds.
At first I have been trying to find duplicates the old fashioned way by Grouping records and saying having count(*) > 1, but then quickly realized that it would not work because while in English my criteria defines a duplicate, in SQL the CoverageIDs are different so they are NOT considered duplicates.
By that same accord, I tried something like:
-- Create a TMP table
INSERT INTO #tmp
SELECT *
FROM Employee e join EmployeePolicy ep on e.EmpoyeeId = ep.EmployeeId
SELECT info.*
FROM
(
SELECT
tmp.*,
ROW_NUMBER() OVER(PARTITION BY tmp.Name, tmp.CoverageId ORDER BY tmp.EmployeeId DESC) AS RowNum
FROM #tmp tmp
) info
WHERE
info.RowNum = 1 AND
Again, this does not work because SQL does not see this as duplicates. Not sure how to translate my English definition of duplicate into SQL definition of duplicate.
Any help is most appreciated.
The easiest way is to concatenate the policies into a string. That, alas, is cumbersome in SQL Server. Here is a set-based approach:
with ep as (
select ep.*, count(*) over (partition by employeeid) as cnt
from employeepolicy ep
)
select ep.employeeid, ep2.employeeid
from ep join
ep ep2
on ep.employeeid < ep2.employeeid and
ep.CoverageId = ep2.CoverageId and
ep.cnt = ep2.cnt
group by ep.employeeid, ep2.employeeid, ep.cnt
having count(*) = cnt -- all match
The idea is to match the coverages for different employees. A simple criteria is that the number of coverages need to match. Then, it checks that the number of matching coverages is the actual count.
Note: This puts the employee id pairs in a single row. You can join back to the employees table to get the additional information.
I have not tested the T-SQL but I believe the following should give you the output you are looking for.
;WITH CTE_Employee
AS
(
SELECT E.[Name]
,E.[EmployeeId]
,P.[CoverageId]
,E.[Salary]
FROM Employee E
INNER JOIN EmployeePolicy P ON E.EmployeeId = P.EmployeeId
)
, CTE_DuplicateCoverage
AS
(
SELECT E.[Name]
,E.[CoverageId]
FROM CTE_Employee E
GROUP BY E.[Name], E.[CoverageId]
HAVING COUNT(*) > 1
)
SELECT E.[EmployeeId]
,E.[Name]
,MAX(E.[Salary]) AS [Salary]
FROM CTE_Employee E
INNER JOIN CTE_DuplicateCoverage D ON E.[Name] = D.[Name] AND E.[CoverageId] = D.[CoverageId]
GROUP BY E.[EmployeeId], E.[Name]
HAVING COUNT(*) > 1
ORDER BY E.[EmployeeId]

sql select tuples and group by id

I have the current database schema
EMPLOYEES
ID | NAME | JOB
JOBS
ID | JOBNAME | PRICE
I want to query so that it goes through each employee, and gets all their jobs, but I want each employee ID to be grouped so that it returns the employee ID followed by all the jobs they have. e.g if employee with ID 1 had jobs with ID, JOBNAME (1, Roofing), (1,Brick laying)
I want it to return something like
1 Roofing Bricklaying
I was trying
SELECT ID,JOBNAME FROM JOBS WHERE ID IN (SELECT ID FROM EMPLOYEES) GROUP BY ID;
but get the error
not a GROUP BY expression
Hope this is clear enough, if not please say and I'll try to explain better
EDIT:
WITH ALL_JOBS AS
(
SELECT ID,LISTAGG(JOBNAME || ' ') WITHIN GROUP (ORDER BY ID) JOBNAMES FROM JOBS GROUP BY ID
)
SELECT ID,JOBNAMES FROM ALL_JOBS A,EMPLOYEES B
WHERE A.ID = B.ID
GROUP BY ID,JOBNAMES;
In the with clause, I am grouping by on ID and concatenating the columns corresponding to an ID(also concatenating with ' ' to distinguish the columns).
For example, if we have
ID NAME
1 Roofing
1 Brick laying
2 Michael
2 Schumacher
we will get the result set as
ID NAME
1 Roofing Brick laying
2 Michael Schumacher
Then, I am join this result set with the EMPLOYEES table on ID.
You need to put JobName to group by expression too.
SELECT ID,JOBNAME FROM JOBS WHERE ID IN (SELECT ID FROM EMPLOYEES) GROUP BY ID,JOBNAME;

SQL - Removing Duplicate without 'hard' coding?

Heres my scenario.
I have a table with 3 rows I want to return within a stored procedure, rows are email, name and id. id must = 3 or 4 and email must only be per user as some have multiple entries.
I have a Select statement as follows
SELECT
DISTINCT email,
name,
id
from table
where
id = 3
or id = 4
Ok fairly simple but there are some users whose have entries that are both 3 and 4 so they appear twice, if they appear twice I want only those with ids of 4 remaining. I'll give another example below as its hard to explain.
Table -
Email Name Id
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3
jimmy#domain.com jimmy 3
So in the above scenario I would want to ignore the jimmy with the id of 3, any way of doing this without hard coding?
Thanks
SELECT
email,
name,
max(id)
from table
where
id in( 3, 4 )
group by email, name
Is this what you want to achieve?
SELECT Email, Name, MAX(Id) FROM Table WHERE Id IN (3, 4) GROUP BY Email;
Sometimes using Having Count(*) > 1 may be useful to find duplicated records.
select * from table group by Email having count(*) > 1
or
select * from table group by Email having count(*) > 1 and id > 3.
The solution provided before with the select MAX(ID) from table sounds good for this case.
This maybe an alternative solution.
What RDMS are you using? This will return only one "Jimmy", using RANK():
SELECT A.email, A.name,A.id
FROM SO_Table A
INNER JOIN(
SELECT
email, name,id,RANK() OVER (Partition BY name ORDER BY ID DESC) AS COUNTER
FROM SO_Table B
) X ON X.ID = A.ID AND X.NAME = A.NAME
WHERE X.COUNTER = 1
Returns:
email name id
------------------------------
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3

write a query to identify discrepancy

I have a table with Student ID's and Student Names. There has been issues with assigning unique Student Id's to students and Hence I want to find the duplicates
Here is the sample Table:
Student ID Student Name
1 Jack
1 John
1 Bill
2 Amanda
2 Molly
3 Ron
4 Matt
5 James
6 Kathy
6 Will
Here I want a third column "Duplicate_Count" to display count of duplicate records.
For e.g. "Duplicate_Count" would display "3" for Student ID = 1 and so on. How can I do this?
Thanks in advance
Select StudentId, Count(*) DupCount
From Table
Group By StudentId
Having Count(*) > 1
Order By Count(*) desc,
Select
aa.StudentId, aa.StudentName, bb.DupCount
from
Table as aa
join
(
Select StudentId, Count(*) as DupCount from Table group by StudentId
) as bb
on aa.StudentId = bb.StudentId
The virtual table gives the count for each StudentId, this is joined back to the original table to add the count to each student record.
If you want to add a column to the table to hold dupcount, this query can be used in an update statement to update that column in the table
This should work:
update mytable
set duplicate_count = (select count(*) from mytable t where t.id = mytable.id)
UPDATE:
As mentioned by #HansUp, adding a new column with the duplicate count probably doesn't make sense, but that really depends on what the OP originally thought of using it for. I'm leaving the answer in case it is of help for someone else.