SQL server matching two table on a column

SQL server matching two table on a column - sql

I have two tables one storing user skills another storing skills required for a job. I want to match how many skills a of each user matches with a job.
The table structure is
Table1: User_Skills
| ID | User_ID | Skill |
---------------------------
| 1 | 1 | .Net |
---------------------------
| 2 | 1 | Software|
---------------------------
| 3 | 1 | Engineer|
---------------------------
| 4 | 2 | .Net |
---------------------------
| 5 | 2 | Software|
---------------------------
Table2: Job_Skills_Requirement
| ID | Job_ID | Skill |
--------------------------
| 1 | 1 | .Net |
---------------------------
| 2 | 1 | Engineer|
---------------------------
| 3 | 1 | HTML |
---------------------------
| 4 | 2 | Software|
---------------------------
| 5 | 2 | HTML |
---------------------------
I was trying to have comma separated skills and compare but these can be in different order.
Edit
All the answers here are excellent. The result I am looking for is matching all jobs with all users as later on I will match other properties as well.

You could join the tables by the skill columns and count the matches:
SELECT user_id, job_id, COUNT(*) AS matching_skills
FROM user_skills u
JOIN job_skills_requirement j ON u.skill = j.skill
GROUP BY user_id, job_id
EDIT:
IF you want to also show users and jobs that have no matching skills, you can use a full outer join instead.
SELECT user_id, job_id, COUNT(*) AS matching_skills
FROM user_skills u
FULL OUTER JOIN job_skills_requirement j ON u.skill = j.skill
GROUP BY user_id, job_id
EDIT 2:
As Jiri Tousek commented, the above query will produce nulls where there's no match between a user and a job. If you want a full Cartesian products between them, you could use (abuse?) the cross join syntax and count how many skills actually match between each user and each job:
SELECT user_id,
job_id,
COUNT(CASE WHEN u.skill = j.skill THEN 1 END) AS matching_skills
FROM user_skills u
CROSS JOIN job_skills_requirement j
GROUP BY user_id, job_id

If you want to match all users and all jobs, then Mureinik's otherwise excellent answer is not correct.
You need to generate all the rows first, which I would do using a cross join and then count the matching ones:
select u.user_id, j.job_id, count(jsr.job_id) as skills_in_common
from users u cross join
jobs j left join
user_skills us
on us.user_id = u.user_id left join
Job_Skills_Requirement jsr
on jsr.job_id = j.job_id and
jsr.skill = us.skill
group by u.user_id, j.job_id;
Note: This assumes the existence of a users and a jobs table. You can of course generate these using subqueries.

WITH User_Skills(ID,User_ID,Skill)AS(
SELECT 1,1,'.Net' UNION ALL
SELECT 2,1,'Software' UNION ALL
SELECT 3,1,'Engineer' UNION ALL
SELECT 4,2,'.Net' UNION ALL
SELECT 5,2 ,'Software'
),Job_Skills_Requirement(ID,Job_ID,Skill)AS(
SELECT 1,1,'.Net' UNION ALL
SELECT 2,1,'Engineer' UNION ALL
SELECT 3,1,'HTML' UNION ALL
SELECT 4,2,'Software' UNION ALL
SELECT 5,2 ,'HTML'
),Job_User_Skill AS (
SELECT j.Job_ID,u.User_ID,u.Skill
FROM Job_Skills_Requirement AS j INNER JOIN User_Skills AS u ON u.Skill=j.Skill
)
SELECT jus.Job_ID,jus.User_ID,COUNT(jus.Skill),STUFF(c.Skills,1,1,'') AS Skill
FROM Job_User_Skill AS jus
CROSS APPLY(SELECT ','+j.Skill FROM Job_User_Skill AS j WHERE j.Job_ID=jus.Job_ID AND j.User_ID=jus.User_ID FOR XML PATH('')) c(Skills)
GROUP BY jus.Job_ID,jus.User_ID,c.Skills
ORDER BY jus.Job_ID
Job_ID User_ID Skill
----------- ----------- ----------- -------------
1 1 2 .Net,Engineer
1 2 1 .Net
2 1 1 Software
2 2 1 Software

Related

A better way to aggregate into a default value

For this example I have three tables (individual, business, and ind_to_business). Individual has information on people. Business has information on businesses. And ind_to_business has information on which people are linked to which business. Here are their DDL:
CREATE TABLE individual
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE business
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE ind_to_business
(
ID INTEGER PRIMARY KEY,
IND_ID REFERENCES individual(id),
BUS_ID REFERENCES business(id),
START_DT DATE NOT NULL,
END_DT DATE
);
I'm looking for the best way to display one row for each person. If they are linked to one business, I want to display the the business's ENTERPRISE_ID. If they are linked to more than one business, I want to display the default value 'Multiple'. They will always be linked to a business, so there is no LEFT JOIN necessary. They can also be linked to a business more than once (Leaving and coming back). Multiple records for the same business would be aggregated.
So for the following sample data:
Individual:
+----+------------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+------------+---------------+
| 1 | John Smith | 53a23B7 |
| 2 | Jane Doe | 63f2a35 |
+----+------------+---------------+
Business:
+----+----------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+----------+---------------+
| 3 | ABC Corp | 2a34d9b |
| 4 | XYZ Inc | 34bf21e |
+----+----------+---------------+
ind_to_business
+----+--------+--------+-------------+-------------+
| ID | IND_ID | BUS_ID | START_DT | END_DT |
+----+--------+--------+-------------+-------------+
| 5 | 1 | 3 | 01-JAN-2000 | 31-DEC-2002 |
| 6 | 1 | 3 | 01-JAN-2015 | |
| 7 | 2 | 3 | 01-JAN-2000 | |
| 8 | 2 | 4 | 01-MAR-2006 | 05-JUN-2010 |
| 9 | 2 | 4 | 15-DEC-2019 | |
+----+--------+--------+-------------+-------------+
I would expect the following output:
+---------+------------+------------+
| IND_ID | NAME | LINKED_BUS |
+---------+------------+------------+
| 53a23B7 | John Smith | 2a34d9b |
| 63f2a35 | Jane Doe | Multiple |
+---------+------------+------------+
Here is my current query:
SELECT DISTINCT
sub.ind_id,
sub.name,
DECODE(sub.bus_count, 1, sub.bus_id, 'Multiple') AS LINKED_BUS
FROM (SELECT i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID,
COUNT(DISTINCT b.enterprise_id) OVER (PARTITION BY i.id) AS BUS_COUNT
FROM individual i
INNER JOIN ind_to_business i2b ON i.id = i2b.ind_id
INNER JOIN business b ON i2b.bus_id = b.id) sub;
My query works, but this is running on a large dataset and taking a long time to run. I'm wondering if anyone has any ideas on how improve this so that there isn't so much wasted processing (i.e Needing to do a DISTINCT on the final result or doing COUNT(DISTINCT) in the inline view only to use that value in the DECODE above).
I've also created a DBFiddle for this question. (Link)
Thanks in advance for any input.

You could try and use a correlated subquery. This removes the need for outer distinct:
SELECT
i.enterprise_id ind_id,
i.name,
(
SELECT DECODE(COUNT(DISTINCT b.enterprise_id), 1, MIN(bus_id), 'Multiple')
FROM ind_to_business i2b
INNER JOIN business b ON i2b.bus_id = b.id
WHERE i2b.ind_id = i.id
) linked_bus
FROM individual i

You can join with the aggregated ind_to_business per individual. One way to do this:
select i.id, i.name, coalesce(b.enterprise_id, 'Multiple')
from individual i
join
(
select
ind_id,
case when min(bus_id) = max(bus_id) then min(bus_id) else null end as bus_id
from ind_to_business
group by ind_id
) ib on ib.ind_id = i.id
left join business b on b.id = ib.bus_id
order by i.id;

First you should sub-query to get all needed dimensions and then do all your final aggregation using CASE statement.
select
ind_id,
name,
case
when count(*) > 1 then 'Multiple'
else ind_id
end as linked_bus
from
(
select
distinct i.enterprise_id as ind_id,
i.name,
b.enterprise_id as bus_id
from individual i
join ind_to_business i2b
on i.id = i2b.ind_id
join business b
on i2b.bus_id = b.id
) vals
group by
ind_id,
name
order by
ind_id

No need of using DISTINCT twice. You could use subquery factoring and put the in-line view in WITH clause, and make the data set DISTINCT in the subquery itself.
WITH data AS
(
SELECT distinct
i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID
FROM individual i
JOIN ind_to_business i2b ON i.id = i2b.ind_id
JOIN business b ON i2b.bus_id = b.id
)
SELECT ind_id,
name,
case
when count(*) = 1 then MIN(bus_id)
else 'Multiple'
end AS LINKED_BUS
FROM data
GROUP BY ind_id, name;
IND_ID NAME LINKED_BUS
---------- ---------- -------------------------
53a23B7 John Smith 2a34d9b
63f2a35 Jane Doe Multiple

Count how many times a value appears in tables SQL

Here's the situation:
So, in my database, a person is "responsible" for job X and "linked" to job Y. What I want is a query that returns: name of person, his ID and he number of jobs it's linked/responsible. So far I got this:
select id_job, count(id_job) number_jobs
from
(
select responsible.id
from responsible
union all
select linked.id
from linked
GROUP BY id
) id_job
GROUP BY id_job
And it returns a table with id in the first column and number of occurrences in the second. Now, what I can't do is associate the name of person to the table. When i put that in the "select" from beginning it gives me all the possible combinations... How can I solve this? Thanks in advance!
Example data and desirable output:
| Person |
id | name
1 | John
2 | Francis
3 | Chuck
4 | Anthony
| Responsible |
process_no | id
100 | 2
200 | 2
300 | 1
400 | 4
| Linked |
process_no | id
101 | 4
201 | 1
301 | 1
401 | 2
OUTPUT:
| OUTPUT |
id | name | number_jobs
1 | John | 3
2 | Francis | 3
3 | Chuck | 0
4 | Anthony | 2

Try this way
select prs.id, prs.name, count(*) from Person prs
join(select process_no, id
from Responsible res
Union all
select process_no, id
from Linked lin ) a on a.id=prs.id
group by prs.id, prs.name

I would recommend aggregating each of the tables by the person and then joining the results back to the person table:
select p.*, coalesce(r.cnt, 0) + coalesce(l.cnt, 0) as numjobs
from person p left join
(select id, count(*) as cnt
from responsible
group by id
) r
on r.id = p.id left join
(select id, count(*) as cnt
from linked
group by id
) l
on l.id = p.id;

select id, name, count(process_no) FROM (
select pr.id, pr.name, res.process_no from Person pr
LEFT JOIN Responsible res on pr.id = res.id
UNION
select pr.id, pr.name, lin.process_no from Person pr
LEFT JOIN Linked lin on pr.id = lin.id) src
group by id, name
order by id
Query ain't tested, give it a shot, but this is the way you want to go

INNER JOIN and COUNT in the same query

I am having trouble with putting together INNER JOIN and COUNT in the same query.
Tables are:
TABLE STREETS
ID | STREET_NAME
------------------------
1 | Elm street
2 | Some other street
3 | Unknown street
4 | Killer street
5 | Dead-end street
TABLE ACCIDENTS_STREETS
STREET_ID | ACCIDENT_ID
-----------------------
2 | 4
2 | 7
2 | 2
2 | 1
5 | 3
I would like to get the street name where most accidents have occured.
This is for COUNT:
SELECT TOP 1 COUNT(STREET_ID) AS dangerous_street FROM ACCIDENTS_STREETS GROUP BY STREET_ID ORDER BY dangerous_street DESC
How to add INNER JOIN there to get only the name of the street?
Any advice is appreciated!

The Following should work
SELECT TOP 1 S.STREET_NAME,COUNT(a.*) AS dangerous_street
FROM ACCIDENTS_STREETS A
inner Join STREET S on S.ID = A.STREET_ID
GROUP BY S.STREET_NAME ORDER BY dangerous_street DESC

try this...
After joining the streets table, you would have to use an aggregation function to get the name in the resultset
SELECT
TOP 1 COUNT(STREET_ID) AS dangerous_street
, min( STREET_NAME) dangerous_STREET_NAME
FROM ACCIDENTS_STREETS acc
inner join STREETS str
on acc.STREET_ID=str.id
GROUP BY STREET_ID
ORDER BY dangerous_street DESC

SQL left join two tables independently

If I have these tables:
Thing
id | name
---+---------
1 | thing 1
2 | thing 2
3 | thing 3
Photos
id | thing_id | src
---+----------+---------
1 | 1 | thing-i1.jpg
2 | 1 | thing-i2.jpg
3 | 2 | thing2.jpg
Ratings
id | thing_id | rating
---+----------+---------
1 | 1 | 6
2 | 2 | 3
3 | 2 | 4
How can I join them to produce
id | name | rating | photo
---+---------+--------+--------
1 | thing 1 | 6 | NULL
1 | thing 1 | NULL | thing-i1.jpg
1 | thing 1 | NULL | thing-i2.jpg
2 | thing 2 | 3 | NULL
2 | thing 2 | 4 | NULL
2 | thing 2 | NULL | thing2.jpg
3 | thing 3 | NULL | NULL
Ie, left join on each table simultaneously, rather than left joining on one than the next?
This is the closest I can get:
SELECT Thing.*, Rating.rating, Photo.src
From Thing
Left Join Photo on Thing.id = Photo.thing_id
Left Join Rating on Thing.id = Rating.thing_id

You can get the results you want with a union, which seems the most obvious, since you return a field from either ranking or photo.
Your additional case (have none of either), is solved by making the joins left join instead of inner joins. You will get a duplicate record with NULL, NULL in ranking, photo. You can filter this out by moving the lot to a subquery and do select distinct on the main query, but the more obvious solution is to replace union all by union, which also filters out duplicates. Easier and more readable.
select
t.id,
t.name,
r.rating,
null as photo
from
Thing t
left join Rating r on r.thing_id = t.id
union
select
t.id,
t.name,
null,
p.src
from
Thing t
left join Photo p on p.thing_id = t.id
order by
id,
photo,
rating

Here's what I came up with:
SELECT
Thing.*,
rp.src,
rp.rating
FROM
Thing
LEFT JOIN (
(
SELECT
Photo.src,
Photo.thing_id AS ptid,
Rating.rating,
Rating.thing_id AS rtid
FROM
Photo
LEFT JOIN Rating
ON 1 = 0
)
UNION
(
SELECT
Photo.src,
Photo.thing_id AS ptid,
Rating.rating,
Rating.thing_id AS rtid
FROM
Rating
LEFT JOIN Photo
ON 1 = 0
)
) AS rp
ON Thing.id IN (rp.rtid, rp.ptid)

MySQL has no support for full outer joins so you have to hack around it using a UNION:
Here's the fiddle: http://sqlfiddle.com/#!2/d3d2f/13
SELECT *
FROM (
SELECT Thing.*,
Rating.rating,
NULL AS photo
FROM Thing
LEFT JOIN Rating ON Thing.id = Rating.thing_id
UNION ALL
SELECT Thing.*,
NULL,
Photo.src
FROM Thing
LEFT JOIN Photo ON Thing.id = Photo.thing_id
) s
ORDER BY id, photo, rating

Select distinct where date is max

This feels really stupid to ask, but i can't do this selection in SQL Server Compact (CE)
If i have two tables like this:
Statuses Users
id | status | thedate id | name
------------------------- -----------------------
0 | Single | 2014-01-01 0 | Lisa
0 | Engaged | 2014-01-02 1 | John
1 | Single | 2014-01-03
0 | Divorced | 2014-01-04
How can i now select the latest status for each person in statuses?
the result should be:
Id | Name | Date | Status
--------------------------------
0 | Lisa | 2014-01-04 | Divorced
1 | John | 2014-01-03 | Single
that is, select distinct id:s where the date is the highest, and join the name. As bonus, sort the list so the latest record is on top.

In SQL Server CE, you can do this using a join:
select u.id, u.name, s.thedate, s.status
from users u join
statuses s
on u.id = s.id join
(select id, max(thedate) as mtd
from statuses
group by id
) as maxs
on s.id = maxs.id and s.thedate = maxs.mtd;
The subquery calculates the maximum date and uses that as a filter for the statuses table.

Use the following query:
SELECT U.Id AS Id, U.Name AS Name, S.thedate AS Date, S.status AS Status
FROM Statuses S
INNER JOIN Users U on S.id = U.id
WHERE S.thedate IN (
SELECT MAX(thedate)
FROM statuses
GROUP BY id);

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL server matching two table on a column - sql

Related

A better way to aggregate into a default value

Count how many times a value appears in tables SQL

INNER JOIN and COUNT in the same query

SQL left join two tables independently

Select distinct where date is max

Categories

Resources