PL/SQL Comparing Tables - sql

I have the task of matching candidates in my database with suitable job vacancies based on skill and availability, using SQL and PL/SQL only.
I have managed to write the following code, which matches available candidates to open vacancies by position type.
DECLARE
CURSOR availableCandidates_cur IS
SELECT * FROM candidate
WHERE candidate.available = 'True';
CURSOR availableJobs_cur IS
SELECT *
FROM position WHERE status = 'Open';
BEGIN
DBMS_OUTPUT.PUT_LINE('Available Candidates with matching vacancies');
FOR availableCandidates_rec IN availableCandidates_cur
LOOP
DBMS_OUTPUT.PUT_LINE('Candidate: ' || availableCandidates_rec.firstName || ' ' || availableCandidates_rec.lastName);
FOR availableJobs_rec IN availableJobs_cur
LOOP
IF (availableCandidates_rec.positionType = availableJobs_rec.positionType) THEN
DBMS_OUTPUT.PUT_LINE(availableJobs_rec.positionName);
END IF;
END LOOP;
END LOOP;
END;
I am now struggling to figure out how to match candidates to positions based on matching skills. The tables in question are:
candidateSkills
candidateID | skillID
1 | 2
1 | 3
2 | 1
3 | 1
3 | 3
positionSkills
positionID | skillID
1 | 1
1 | 3
2 | 1
3 | 2
3 | 3
So, for example, I would like the output to be:
Candidate 1 Matches
position 3
Candidate 2 Matches
position 2
Candidate 3 Matches
position 2
position 3
I fear I may have gone down the wrong path initially, which has led to my confusion.
I would appreciate it if someone could help steer me in the right direction.
Thanks

Corrected. Candidate 3 matches jobs 1 and 2, candidate 2 matches job 2, candidate 1 matches job 3
select distinct c.cid, j.jid
from candidate c, jobs j
where j.sid=c.sid
and not exists
(select 'x' from jobs j2 where j2.jid=j.jid
and j2.sid not in (select c2.sid from candidate c2
where c2.cid=c.cid))

--All candidates that match every skill in a position
select distinct candidateID, positionID
from
(
--Match candidates and positions, count number of skills that match
select candidateID, positionID, skills_per_position
,count(*) over (partition by candidateID, positionID) matched_skills
from candidateSkills
inner join
(
--Number of skills per position
select positionID, skillID
,count(*) over (partition by positionID) skills_per_position
from positionSkills
where status = 'Open'
) positionSkills_with_count
on candidateSkills.skillID = positionSkills_with_count.skillID
where available = 'True'
)
where matched_skills = skills_per_position
order by candidateID, positionID;
Using these scripts to build the tables:
create table candidateSkills as
select 1 candidateid, 2 skillID, 'True' available from dual union all
select 1 candidateid, 3 skillID, 'True' available from dual union all
select 2 candidateid, 1 skillID, 'True' available from dual union all
select 3 candidateid, 1 skillID, 'True' available from dual union all
select 3 candidateid, 3 skillID, 'True' available from dual;
create table positionSkills as
select 1 positionID, 1 skillID, 'Open' status from dual union all
select 1 positionID, 3 skillID, 'Open' status from dual union all
select 2 positionID, 1 skillID, 'Open' status from dual union all
select 3 positionID, 2 skillID, 'Open' status from dual union all
select 3 positionID, 3 skillID, 'Open' status from dual;
However, my results are slightly different. Candidate 3 matches positions 1 and 2, not 2 and 3. I hope this is just a typo in your example.
Also, I didn't format my output exactly like yours. It can be a bit tricky to get SQL to display results in a multi-line format. However, leaving the SQL output unformatted also makes it more useful if you want to feed it into some other process.
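If it helps to see the matching rule and the multi-line formatting together, here is a minimal sketch in Python using the standard sqlite3 module. It is only an illustration under some assumptions: it runs against an in-memory SQLite database rather than Oracle, it leaves out the available and status columns, it assumes there are no duplicate (candidate, skill) or (position, skill) rows, and the function names match_candidates and format_matches are made up for this example. It uses a GROUP BY / HAVING count instead of the analytic functions above, and the grouping into "Candidate N Matches" blocks is done on the client side.

```python
import sqlite3
from itertools import groupby


def match_candidates():
    """Return (candidateID, positionID) pairs where the candidate has every skill the position needs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE candidateSkills (candidateID INTEGER, skillID INTEGER);
        CREATE TABLE positionSkills (positionID INTEGER, skillID INTEGER);
        INSERT INTO candidateSkills VALUES (1,2),(1,3),(2,1),(3,1),(3,3);
        INSERT INTO positionSkills VALUES (1,1),(1,3),(2,1),(3,2),(3,3);
    """)
    rows = conn.execute("""
        SELECT cs.candidateID, ps.positionID
        FROM candidateSkills cs
        JOIN positionSkills ps ON ps.skillID = cs.skillID
        GROUP BY cs.candidateID, ps.positionID
        -- keep the pair only if the matched skills cover the whole position
        HAVING COUNT(*) = (SELECT COUNT(*)
                           FROM positionSkills p2
                           WHERE p2.positionID = ps.positionID)
        ORDER BY cs.candidateID, ps.positionID
    """).fetchall()
    conn.close()
    return rows


def format_matches(rows):
    """Group the flat rows into the multi-line layout from the question."""
    lines = []
    for candidate_id, group in groupby(rows, key=lambda r: r[0]):
        lines.append(f"Candidate {candidate_id} Matches")
        lines.extend(f"position {position_id}" for _, position_id in group)
    return lines


matches = match_candidates()
print("\n".join(format_matches(matches)))
```

Keeping the query output flat and formatting it in the caller is the same trade-off mentioned above: the SQL stays reusable and only the presentation layer knows about the layout.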

Related

How to replace subquery with in statement in bigquery

I have stumbled on a problem using BigQuery. I have to build a query where I need to limit the ids within the left join to a subset returned by another query; unfortunately, BigQuery does not support a subquery inside the join predicate.
I've been trying to find a solution that lets me place this constraint within the join, but haven't been successful. The solutions I encounter usually suggest a cross join, but I haven't had success with that so far. Here, in a nutshell, is the table structure I have and the query I'm trying to construct:
#standardSQL
WITH User AS (
SELECT 1 AS id, "A" AS items UNION ALL
SELECT 2 AS id, "B" AS items UNION ALL
SELECT 3 AS id, "C" AS items),
Label_User AS (
SELECT 1 AS user_id, 1 AS label_id UNION ALL
SELECT 1 AS user_id, 4 AS label_id UNION ALL
SELECT 1 AS user_id, 3 AS label_id UNION ALL
SELECT 2 AS user_id, 1 AS label_id UNION ALL
SELECT 2 AS user_id, 2 AS label_id),
Labels AS (
SELECT 1 AS id, "Test" AS label UNION ALL
SELECT 2 AS id, "Admin" AS label UNION ALL
SELECT 3 AS id, "Local" AS label UNION ALL
SELECT 4 AS id, "External" AS label)
select * from User left join Label_User on id=user_id and
label_id in (select id from Labels where label = "External" or label ="Local")
-- This works for a single record of label id
-- select * from User left join Label_User on id=user_id and label_id = 1
Any help would be very appreciated.
Edit 1
Thanks @mikhail-berlyant for the suggestion, but the issue I've found with putting the condition in the WHERE clause is that it filters out some records that I need. The result I'm looking for looks like this:
id items user_id label_id
1 A 1 4
1 A 1 3
2 B null null
3 C null null
But with the filter in the WHERE clause, the output is this:
id items user_id label_id
1 A 1 4
1 A 1 3
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM User
LEFT JOIN (
SELECT *
FROM User
LEFT JOIN Label_User
ON id = user_id
WHERE label_id IN (SELECT id FROM Labels WHERE label = "External" OR label ="Local")
)
USING (id, items)
When applied to the sample data from your question, the output is as below:
Row id items user_id label_id
1 1 A 1 4
2 1 A 1 3
3 2 B null null
4 3 C null null
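To see why the placement of the filter matters, here is a small sketch in Python with the standard sqlite3 module. It is an illustration only, run on an in-memory SQLite database rather than BigQuery, and the lower-case table names users, label_user and labels are stand-ins for the CTEs in the question. It shows a simpler variant of the same idea (pre-filter Label_User in a derived table, then LEFT JOIN to it) next to the WHERE-clause version that drops the unmatched users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, items TEXT);
    CREATE TABLE label_user (user_id INTEGER, label_id INTEGER);
    CREATE TABLE labels (id INTEGER, label TEXT);
    INSERT INTO users VALUES (1,'A'),(2,'B'),(3,'C');
    INSERT INTO label_user VALUES (1,1),(1,4),(1,3),(2,1),(2,2);
    INSERT INTO labels VALUES (1,'Test'),(2,'Admin'),(3,'Local'),(4,'External');
""")

# Filter the label links first, then LEFT JOIN: users without a match are kept.
filtered_join = conn.execute("""
    SELECT u.id, u.items, lu.user_id, lu.label_id
    FROM users u
    LEFT JOIN (
        SELECT user_id, label_id
        FROM label_user
        WHERE label_id IN (SELECT id FROM labels
                           WHERE label IN ('External', 'Local'))
    ) lu ON u.id = lu.user_id
    ORDER BY u.id, lu.label_id DESC
""").fetchall()

# Filter after the join: rows with a NULL label_id fail the IN test and vanish.
where_filter = conn.execute("""
    SELECT u.id, u.items, lu.user_id, lu.label_id
    FROM users u
    LEFT JOIN label_user lu ON u.id = lu.user_id
    WHERE lu.label_id IN (SELECT id FROM labels
                          WHERE label IN ('External', 'Local'))
    ORDER BY u.id, lu.label_id DESC
""").fetchall()

print(filtered_join)
print(where_filter)
```

The first query keeps users 2 and 3 with NULL label columns; the second returns only the two rows for user 1, which is the behaviour described in Edit 1.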

Database Peer to peer relationship

I have a table case which has the columns caseId and CaseName:
caseId | CaseName
-------|---------
1      | Case 1
2      | Case 2
3      | Case 3
4      | Case 4
I have a requirement where all these cases are related, something like this:-
1-2
1-3
1-4
2-3
2-4
3-4
How can I store these records in an efficient way?
Make another table. Case Id and Related Case Id columns should have same data type as Case Id in your first table.
Case Id Related Case Id
1 4
2 4
2 3
Since it is a many-to-many relation, you can create a junction table with both columns as foreign keys to case.
Something like this: case_relation (case_id_1, case_id_2).
If the caseId column has unique values, a self join generates every pair exactly once:
with t( caseId )as (
select 1 from dual union all
select 2 from dual union all
select 3 from dual union all
select 4 from dual
)
select t1.caseId, t2.caseId from t t1
cross join t t2
where
t1.caseId < t2.caseId
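A junction table along these lines could be sketched as follows, in Python with the standard sqlite3 module. This is only an illustration under assumptions: it uses an in-memory SQLite database, the table is called cases because case is a reserved word, and the CHECK (case_id_1 < case_id_2) constraint and the related_cases helper are my own additions so that each unordered pair is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (caseId INTEGER PRIMARY KEY, caseName TEXT);
    CREATE TABLE case_relation (
        case_id_1 INTEGER NOT NULL REFERENCES cases(caseId),
        case_id_2 INTEGER NOT NULL REFERENCES cases(caseId),
        PRIMARY KEY (case_id_1, case_id_2),
        CHECK (case_id_1 < case_id_2)  -- store each unordered pair only once
    );
    INSERT INTO cases VALUES (1,'Case 1'),(2,'Case 2'),(3,'Case 3'),(4,'Case 4');
    -- relate every case to every other case, smaller id first
    INSERT INTO case_relation
    SELECT c1.caseId, c2.caseId
    FROM cases c1
    JOIN cases c2 ON c1.caseId < c2.caseId;
""")


def related_cases(conn, case_id):
    """Return the ids related to case_id, looking at both columns of the pair."""
    rows = conn.execute("""
        SELECT case_id_2 FROM case_relation WHERE case_id_1 = ?
        UNION
        SELECT case_id_1 FROM case_relation WHERE case_id_2 = ?
        ORDER BY 1
    """, (case_id, case_id)).fetchall()
    return [r[0] for r in rows]


pair_count = conn.execute("SELECT COUNT(*) FROM case_relation").fetchone()[0]
print(pair_count, related_cases(conn, 2))
```

Storing the smaller id first halves the row count compared with storing both directions, at the cost of having to check both columns when looking up the relations of one case.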

How to write optimized query for multiple prioritize conditional joins in SQL server?

The scenario I'm after is:
Result = Nothing
CollectionOfTables = Tbl1, Tbl2, Tbl3
While(True){
CurrentTable = GetHighestPriorityTable(CollectionOfTables)
If(CurrentTable = Nothing) Then Break Loop;
RemoveCurrentTableFrom(CollectionOfTables)
ForEach ID in CurrentTable as TempRow {
If(Result.DoesntContainsId(ID)) Then Result.AddRow(TempRow)
}
}
Assume I have the following three tables.
Table1, Priority 1
Id Name
1 John
2 Mary
3 Elsa
Table2, Priority 2
Id Name
2 Steve
3 Max
4 Peter
Table3, Priority 3
Id Name
4 Frank
5 Harry
6 Mona
Here is the final result I need.
Result
Id Name
1 John
2 Mary
3 Elsa
4 Peter
5 Harry
6 Mona
A few things to keep in mind:
The actual number of tables is 10.
Each table has more than 1 million rows.
It's not necessary to use a join in the query, but because of the amount of data I'm working with, the query must be optimized and use set-based SQL operations rather than a cursor script.
Here's a way to do it using UNION and ROW_NUMBER():
;With Cte As
(
Select Id, Name, 1 As Prio
From Table1
Union All
Select Id, Name, 2 As Prio
From Table2
Union All
Select Id, Name, 3 As Prio
From Table3
), Ranked As
(
Select Id, Name, Row_Number() Over (Partition By Id Order By Prio) As RN
From Cte
)
Select Id, Name
From Ranked
Where RN = 1
Order By Id Asc;
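As a quick check of the idea on the sample data, here is a sketch in Python with the standard sqlite3 module. It runs on an in-memory SQLite database rather than SQL Server (window functions need SQLite 3.25 or later), so it only demonstrates the logic, not the performance on millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER, Name TEXT);
    CREATE TABLE Table2 (Id INTEGER, Name TEXT);
    CREATE TABLE Table3 (Id INTEGER, Name TEXT);
    INSERT INTO Table1 VALUES (1,'John'),(2,'Mary'),(3,'Elsa');
    INSERT INTO Table2 VALUES (2,'Steve'),(3,'Max'),(4,'Peter');
    INSERT INTO Table3 VALUES (4,'Frank'),(5,'Harry'),(6,'Mona');
""")

result = conn.execute("""
    WITH Cte AS (
        -- tag every row with the priority of the table it came from
        SELECT Id, Name, 1 AS Prio FROM Table1
        UNION ALL
        SELECT Id, Name, 2 AS Prio FROM Table2
        UNION ALL
        SELECT Id, Name, 3 AS Prio FROM Table3
    ), Ranked AS (
        -- RN = 1 marks the row from the highest-priority table for each Id
        SELECT Id, Name,
               ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Prio) AS RN
        FROM Cte
    )
    SELECT Id, Name FROM Ranked WHERE RN = 1 ORDER BY Id
""").fetchall()

print(result)
```

Id 4 appears in Table2 and Table3, and the row from Table2 (Peter) wins because it has the lower Prio value.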

Select row which not followed by specific one

I have a table with a list of candidates and a linked table with the history of candidate statuses:
CandidateId FirstName LastName
--------------------------------
1 User One
2 User Two
and
CandidateStatusId CandidateId Status Timestamp
--------------------------------------------------------
1 1 Assigned ...
2 1 Interviewed ...
3 1 Offer Accepted ...
1 2 Assigned ...
2 2 Interviewed ...
3 2 Offer Accepted ...
4 2 Hired ...
5 2 Bench ...
6 2 Hired ...
1 3 Assigned ...
2 3 Interviewed ...
3 3 Offer Accepted ...
4 3 Hired ...
5 3 Bench ...
I want to select the candidates whose last status is 'Offer Accepted' and who were never 'Hired' before. In my example only the first candidate should be selected, because the second is already hired and the third was hired before (and is currently on the bench).
UPD: I prepared an SQL statement which should filter candidates, but I'm not sure about its speed; the number of candidates may be quite big:
SELECT * FROM dbo.CandidatePositionStatus
WHERE CandidateId=34841
AND 'Hired' NOT IN (SELECT Status FROM dbo.CandidatePositionStatus WHERE CandidateId=34841)
But I do not know how to embed it in another SELECT that provides the CandidateId.
UPD2: I prepared another query, but it only checks whether the candidate has an 'Offer Accepted' status and no 'Hired' status; the speed of the query is still an open question.
SELECT DISTINCT CandidateId
FROM dbo.CandidatePositionStatus
WHERE
CandidateId IN (
SELECT CandidateId FROM dbo.CandidatePositionStatus WHERE PositionStatusForCandidateCode='Offer Accepted' AND FirstWorkingDay IS NOT NULL
)
AND CandidateId NOT IN (
SELECT CandidateId FROM dbo.CandidatePositionStatus WHERE PositionStatusForCandidateCode='Hired'
)
Please check whether the following query is enough. I have omitted the 'Hired' check, since the basic check is that the last status is 'Offer Accepted'.
select
CandidateId
From(
select
*,
MAX(CandidateStatusId) over(partition by CandidateId) MaxVal
From
CandidatePositionStatus
)x
where MaxVal=CandidateStatusId and [Status]='Offer Accepted'
Based on your requirement I wrote this.
Not tested.
select CandidateId
from dbo.CandidatePositionStatus
group by CandidateId
having sum(case when PositionStatusForCandidateCode = 'Offer Accepted' then 1 else 0 end) = 1
and sum(case when PositionStatusForCandidateCode = 'Hired' then 1 else 0 end) = 0;
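For comparison, here is a sketch that applies both conditions at once (last status is 'Offer Accepted' and there is no 'Hired' row at all), written in Python with the standard sqlite3 module. Treat it as an illustration under assumptions: it runs on an in-memory SQLite database rather than SQL Server, it uses the Status column name from the sample table, it assumes CandidateStatusId increases over time within each candidate, and candidate 4 is a hypothetical extra row I added to show why the 'Hired' check still matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CandidatePositionStatus (
        CandidateStatusId INTEGER, CandidateId INTEGER, Status TEXT);
    INSERT INTO CandidatePositionStatus VALUES
        (1,1,'Assigned'),(2,1,'Interviewed'),(3,1,'Offer Accepted'),
        (1,2,'Assigned'),(2,2,'Interviewed'),(3,2,'Offer Accepted'),
        (4,2,'Hired'),(5,2,'Bench'),(6,2,'Hired'),
        (1,3,'Assigned'),(2,3,'Interviewed'),(3,3,'Offer Accepted'),
        (4,3,'Hired'),(5,3,'Bench'),
        -- hypothetical candidate: hired earlier, latest status is an accepted offer
        (1,4,'Hired'),(2,4,'Offer Accepted');
""")

selected = [row[0] for row in conn.execute("""
    SELECT s.CandidateId
    FROM CandidatePositionStatus s
    WHERE s.Status = 'Offer Accepted'
      -- the row must be the candidate's latest status
      AND s.CandidateStatusId = (SELECT MAX(m.CandidateStatusId)
                                 FROM CandidatePositionStatus m
                                 WHERE m.CandidateId = s.CandidateId)
      -- and the candidate must never have been hired
      AND NOT EXISTS (SELECT 1
                      FROM CandidatePositionStatus h
                      WHERE h.CandidateId = s.CandidateId
                        AND h.Status = 'Hired')
    ORDER BY s.CandidateId
""")]

print(selected)
```

On the real table an index on (CandidateId, CandidateStatusId) would probably help both correlated subqueries, but that is a guess that should be checked against the actual execution plan.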

Evaluate WHERE predicates on analytic functions before other predicates (Oracle analytic functions)

Background
Sample data set
#Employee
Id | Period | Status
---------------------
1 | 1 | L
1 | 2 | G
2 | 3 | L
I want a simple select query to yield employees' latest record (by period) only if the status='L'.
The results would look like this:
#Desired Results
Id | Period | Status | Sequence
-------------------------------
2 | 3 | L | 1
Naive attempt
Obviously, my naive attempt at a query does not work:
#select query
SELECT *, RANK() OVER (PARTITION BY id ORDER BY period ASC) sequence
FROM employees
WHERE status = 'L'
AND sequence = 1
Which results in the following:
#Naive (incorrect) Results
ID | Period | Status | Sequence
-------------------------------
1 | 1 | L | 1
2 | 3 | L | 1
Knowing the order that clauses are evaluated in SQL explains why it doesn't work. Here is how my query is evaluated:
Isolate rows where status='L'
Rank the rows
Isolate top rank row
I want the following:
Rank rows
Isolate the top ranked rows
Isolate where status='L'
Questions
Is it possible, with only a simple modification to the SELECT/WHERE clauses and using only basic predicate operators, to ensure that predicates based on analytic functions in the WHERE clause get evaluated before the other predicates?
Does anyone have other solutions that can be implemented as an end user in Oracle Discoverer Plus?
Thanks!
Is it possible to do this without a sub-query?
Technically the following is not a sub-query but a derived table:
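SELECT *
FROM (
-- Oracle needs a table alias to combine * with other columns
SELECT e.*,
-- DESC so that rank 1 is the latest period for each id
RANK() OVER (PARTITION BY id ORDER BY period DESC) sequence
FROM employees e
) t
WHERE status = 'L'
AND sequence = 1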
I can't think of a different solution to your problem.
The classic GROUP BY:
SELECT e.id, e.period, e.status, 1 sequence
FROM
(
-- max(period) picks the latest record per id
SELECT id, max(period) period
FROM employees
GROUP BY id
) X
JOIN employees e on e.period=X.period and e.id=X.id
WHERE e.status = 'L'
Exists
select e.id, e.period, e.status, 1 sequence
FROM employees e
WHERE e.status = 'L'
AND NOT EXISTS (select *
from employees e2
where e2.id=e.id and e2.period>e.period)
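To compare the formulations side by side, here is a sketch in Python with the standard sqlite3 module that runs a derived-table version, a GROUP BY version and the NOT EXISTS version against the sample data. It is only an illustration: it uses an in-memory SQLite database rather than Oracle (window functions need SQLite 3.25 or later), and the ranking and grouping are written to pick the latest period per id (ORDER BY period DESC, MAX(period)), which is what the desired result in the question calls for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, period INTEGER, status TEXT);
    INSERT INTO employees VALUES (1,1,'L'),(1,2,'G'),(2,3,'L');
""")

queries = {
    # rank inside the derived table first, filter on status afterwards
    "derived_table": """
        SELECT id, period, status, sequence
        FROM (SELECT e.*,
                     RANK() OVER (PARTITION BY id ORDER BY period DESC) AS sequence
              FROM employees e) t
        WHERE status = 'L' AND sequence = 1
    """,
    # find the latest period per id, join back, then filter on status
    "group_by": """
        SELECT e.id, e.period, e.status, 1 AS sequence
        FROM (SELECT id, MAX(period) AS period
              FROM employees GROUP BY id) x
        JOIN employees e ON e.id = x.id AND e.period = x.period
        WHERE e.status = 'L'
    """,
    # keep 'L' rows for which no later period exists for the same id
    "not_exists": """
        SELECT e.id, e.period, e.status, 1 AS sequence
        FROM employees e
        WHERE e.status = 'L'
          AND NOT EXISTS (SELECT 1 FROM employees e2
                          WHERE e2.id = e.id AND e2.period > e.period)
    """,
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return only employee 2, because employee 1's latest record (period 2) has status 'G'.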
I'll probably have to do a "Dobby" and slam my ear in the oven door and iron my hands for this...
You can create a function which evaluates the current row.
Note that this is inherently non-scalable. But I guess it's better than nothing.
Create the sample data:
--drop table employee purge;
create table employee(
id number not null
,period number not null
,status char(1) not null
,constraint employee_pk primary key(id, period)
);
insert into employee(id,period, status) values(1, 1, 'L');
insert into employee(id,period, status) values(1, 2, 'G');
insert into employee(id,period, status) values(2, 3, 'L');
commit;
Create the slowest function in the database:
create or replace function i_am_slow(
ip_id employee.id%type
,ip_period employee.period%type
)
return varchar2
as
l_count number := 0;
begin
select count(*)
into l_count
from employee e
where e.id = ip_id
and e.period = ip_period
and e.status = 'L'
and not exists(
select 'x'
from employee e2
where e2.id = e.id
and e2.period > e.period);
if l_count = 1 then
return 'Y';
end if;
return 'N';
end;
/
Demonstrates the use of the function:
select id, period, status
from employee
where i_am_slow(id, period) = 'Y';
ID PERIOD STATUS
---------- ---------- ------
2 3 L
Rushes towards the oven...
select * from
(SELECT a.*, rank() OVER (ORDER BY period ASC) sequence
from
(select * from
(
select 1 id, 1 period, 'L' status from dual
union all
select 1 id, 2 period, 'G' status from dual
union all
select 2 id, 3 period, 'L' status from dual
)
where status = 'L'
) a
)
where sequence = 1