SQL query: for users who have a given skill, what other skills do they have? - sql

I have two tables :
UserInfo
Skill
and the join table between them called UserSkill as you can see at the
right part of the diagram.
I want to know, for whoever is skilled in Java, what else they are skilled at. For example, I know Java, GO, PHP and Python, and user number 2 knows Java, Python and CSS. So the answer to the question "whoever knows Java, what else do they know?" would be GO, PHP, Python and CSS.
It's like a recommendation system: whoever bought this product, what else did they buy? Like what Amazon has.
What would be the best query for this ?
Thank you
More information:
UserInfo
U-id U-name
1 A
2 B
3 C
SkillInfo
S-id S-Name
1 Java
2 GO
3 PHP
4 Python
5 CSS
UserSkill:
U-id S-id
1 1
1 2
1 3
1 4
2 1
2 4
2 5

In SQL Server 2017 and Azure SQL DB you can use the new graph database capabilities and the new MATCH clause to answer queries like this, e.g.:
SELECT FORMATMESSAGE ( 'User %s has skill %s and other skill %s.', [user1].[U-name], mainSkill.[S-name], otherSkill.[S-name] ) result
FROM
dbo.users user1,
dbo.hasSkill hasSkill1, dbo.skills mainSkill, dbo.hasSkill hasSkill2, dbo.skills otherSkill
WHERE mainSkill.[S-name] = 'CSS'
AND otherSkill.[S-name] != mainSkill.[S-name]
AND MATCH ( mainSkill<-(hasSkill1)-user1-(hasSkill2)->otherSkill);
My results:
Obviously you can answer the same queries with a relational approach, it's just a different way of doing things. Full script available here.

To make this more dynamic, replace the hard-coded 'java' with a variable so you can filter by any skill, possibly inside a stored procedure that takes the skill as a parameter.
Edited column names as I didn't look at the image you provided:
--Outer query selects all skills of users which are not java and user has skill of java,
--the inner query selects all user ids where the user has a skill of java
SELECT sk.[SkillName], ui.[UserName], ui.[UserId]
FROM [dbo].[Skill] AS sk
INNER JOIN [dbo].[UserSkill] AS us
ON us.[SkillId] = sk.[SkillId]
INNER JOIN [dbo].[UserInfo] AS ui
ON ui.[UserId] = us.[UserId]
WHERE sk.[SkillName] <> 'java' AND ui.[UserId] IN (
SELECT ui.[UserId]
FROM [dbo].[UserInfo] ui
INNER JOIN [dbo].[UserSkill] us
ON us.[UserId] = ui.[UserId]
INNER JOIN [dbo].[Skill] sk
ON sk.[SkillId] = us.[SkillId]
WHERE sk.[SkillName] = 'java')

This is what I have found
--Formatted query
select
o.UserName, k.SkillName
from
UserSkill S
inner join
UserSkill SS on s.UserID = ss.UserID
and s.SkillID = 1
and ss.SkillID <> 1
inner join
Skill k on k.SkillID = ss.SkillID
inner join
UserInfo O on o.UserID = ss.UserID
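
For anyone who wants to try the self-join idea against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The column names (UserID, SkillID, SkillName) follow the answers above, not the diagram, which I can't see, so treat them as assumptions. The helper name other_skills is made up for illustration.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserInfo  (UserID INTEGER PRIMARY KEY, UserName TEXT);
    CREATE TABLE Skill     (SkillID INTEGER PRIMARY KEY, SkillName TEXT);
    CREATE TABLE UserSkill (UserID INTEGER, SkillID INTEGER);
    INSERT INTO UserInfo VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO Skill VALUES (1, 'Java'), (2, 'GO'), (3, 'PHP'), (4, 'Python'), (5, 'CSS');
    INSERT INTO UserSkill VALUES (1,1), (1,2), (1,3), (1,4), (2,1), (2,4), (2,5);
""")

def other_skills(conn, skill_name):
    """Distinct skills held by users who also hold skill_name (hypothetical helper)."""
    sql = """
        SELECT DISTINCT other.SkillName
        FROM Skill AS main
        JOIN UserSkill AS has_main  ON has_main.SkillID = main.SkillID
        JOIN UserSkill AS has_other ON has_other.UserID = has_main.UserID
                                   AND has_other.SkillID <> main.SkillID
        JOIN Skill AS other ON other.SkillID = has_other.SkillID
        WHERE main.SkillName = ?
        ORDER BY other.SkillName
    """
    # The skill is passed as a bound parameter instead of being hard-coded.
    return [row[0] for row in conn.execute(sql, (skill_name,))]

print(other_skills(conn, "Java"))  # ['CSS', 'GO', 'PHP', 'Python']
```

The DISTINCT is there because Python is held by both users A and B; drop it and add the user name to the select list if you want one row per user and skill.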

Related

How do I use SQL to search a many to many relationship using AND

Can anyone help to create a SQL code which could list movies which have been searched under 2 or more tags for the tables below? E.g. I want to list all movies which have the tags “4star” AND “Drama”.
Tables
I have managed to create one which lists movies which have either one or another tag… thus.
Select tblMovies.MovieName
FROM tblMovies, tblBridge, tblTags
WHERE ((tblTags.TagID=1) OR (tblTags.TagID=5))
And tblTags.TagID = tblBridge.TagID
And tblBridge.MediaID= tblMovies.MovieID
Which gives Star Wars, Aliens, Goodfellows, Mermaids.
But I'm struggling with the AND code which would give Goodfellows and The Godfather if I search for movies which have tags 1 (4star) and 7 (Drama) for example.
Many thanks.
You are looking for movies for which exist both tags 1 and 7. We don't use joins usually when we only want to check whether data exists. We use EXISTS. Or IN, which expresses the same thing (movies that are in the set of tag 1 movies and also in the set of tag 7 movies).
The idea is that we select FROM the table we want to see results from. And we use the WHERE clause to tell the DBMS which rows we want to see.
With EXISTS
SELECT m.moviename
FROM tblmovies m
WHERE EXISTS (SELECT null FROM tblbridge b WHERE b.tagid = 1 AND b.movieid = m.movieid)
AND EXISTS (SELECT null FROM tblbridge b WHERE b.tagid = 7 AND b.movieid = m.movieid)
ORDER BY m.moviename;
With IN
SELECT m.moviename
FROM tblmovies m
WHERE m.movieid IN (SELECT b.movieid FROM tblbridge b WHERE b.tagid = 1)
AND m.movieid IN (SELECT b.movieid FROM tblbridge b WHERE b.tagid = 7)
ORDER BY m.moviename;
I should add that these are not the only options available to get that result. But they are the straightforward ones. (Another is conditional aggregation, but you'll learn this later.)
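
Since conditional aggregation was mentioned, here is a rough sketch of what it could look like, run through Python's sqlite3 module so it can be tried without a server. The table image from the question is not available, so the rows below are invented and the movieid column follows the answer's naming. The helper name movies_with_both_tags is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblmovies (movieid INTEGER PRIMARY KEY, moviename TEXT);
    CREATE TABLE tblbridge (movieid INTEGER, tagid INTEGER);
    -- Made-up sample rows: only movies 2 and 3 carry both tag 1 and tag 7.
    INSERT INTO tblmovies VALUES
        (1, 'Star Wars'), (2, 'Goodfellows'), (3, 'The Godfather'), (4, 'Mermaids');
    INSERT INTO tblbridge VALUES (1, 1), (2, 1), (2, 7), (3, 1), (3, 7), (4, 5);
""")

def movies_with_both_tags(conn, tag_a, tag_b):
    """Movies tagged with tag_a AND tag_b, via conditional aggregation."""
    sql = """
        SELECT m.moviename
        FROM tblmovies m
        JOIN tblbridge b ON b.movieid = m.movieid
        GROUP BY m.movieid, m.moviename
        -- Each CASE flags rows carrying one tag; MAX = 1 means the group has it.
        HAVING MAX(CASE WHEN b.tagid = ? THEN 1 ELSE 0 END) = 1
           AND MAX(CASE WHEN b.tagid = ? THEN 1 ELSE 0 END) = 1
        ORDER BY m.moviename
    """
    return [row[0] for row in conn.execute(sql, (tag_a, tag_b))]

print(movies_with_both_tags(conn, 1, 7))  # ['Goodfellows', 'The Godfather']
```

The bridge table is read once and grouped per movie, whereas the EXISTS and IN versions probe it once per tag.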

Conditional join that changes number of join conditions

I am trying to join data based on the following scenario.
Let's say there are two businesses. Business 1 has one field for customer data, business 2 has two fields. I need to join to multiple other tables using these customer fields.
I would like to create a join that joins on just field 1 for business 1, but field 1 AND field 2 for business 2. In other words, there is a more granular identifier available for business 2, but it is still valid to join on just field 1 for business 1 as well. It also needs to function like an inner join, in that we are only preserving the relevant data that match these conditions.
The code would look something like this for business 1:
FROM customer_data a
INNER JOIN marketing_data b
ON a.member_number = b.member_number
WHERE business_number = 1
And something like this for business 2:
FROM customer_data a
INNER JOIN marketing_data b
ON a.member_number = b.member_number
AND a.sub_member_number = b.sub_member_number
WHERE business_number = 2
I am hoping to extract both sets of data in one join statement. Also, just in case it helps, I am using the Snowflake platform to write my queries.
The following should work for both cases.
FROM customer_data a
INNER JOIN marketing_data b ON a.member_number = b.member_number
WHERE (
a.sub_member_number = b.sub_member_number
AND business_number = 2
)
OR business_number = 1
You can put the conditions in the ON clause like this:
FROM customer_data cd INNER JOIN
marketing_data md
ON cd.member_number = md.member_number AND
( cd.business_number <> 2 OR
cd.sub_member_number = md.sub_member_number
)
Note: this generalizes beyond just businesses 1 and 2, with the special condition only applying to 2. The first condition can be = 1 if you want to be more specific.
Also note that this introduces meaningful table aliases rather than arbitrary letters. This makes queries much easier to understand.
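
To see the conditional ON clause in action, here is a small sketch against SQLite (not Snowflake, so dialect details may differ). The rows and the campaign column are invented for the demo; only the table names and the member/sub-member/business columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_data (
        business_number INTEGER, member_number INTEGER, sub_member_number INTEGER);
    CREATE TABLE marketing_data (
        member_number INTEGER, sub_member_number INTEGER, campaign TEXT);
    -- Business 1 has no sub-member; business 2 has two sub-members of member 200.
    INSERT INTO customer_data VALUES (1, 100, NULL), (2, 200, 1), (2, 200, 2);
    INSERT INTO marketing_data VALUES
        (100, 9, 'spring'), (200, 1, 'summer'), (200, 3, 'winter');
""")

rows = conn.execute("""
    SELECT cd.business_number, cd.member_number, cd.sub_member_number, md.campaign
    FROM customer_data cd
    INNER JOIN marketing_data md
        ON cd.member_number = md.member_number
       AND (cd.business_number <> 2
            OR cd.sub_member_number = md.sub_member_number)
    ORDER BY cd.business_number, cd.member_number, cd.sub_member_number
""").fetchall()

for row in rows:
    print(row)
# (1, 100, None, 'spring')   business 1: matched on member_number alone
# (2, 200, 1, 'summer')      business 2: member and sub-member both match
```

The business 2 customer with sub-member 2 drops out because no marketing row has that sub-member, which is the inner-join behaviour the question asked for.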

Moodle schema changes?

With a recent upgrade to Moodle 2.7, a customer of ours is reporting their CustomSQL reports are failing. For example, this query used to report gradeable items, but fails now:
SELECT
u.firstname AS "First",
u.lastname AS "Last",
c.fullname AS "Course",
a.name AS "Assignment"
FROM prefix_assignment_submissions AS asb
JOIN prefix_assignment AS a ON a.id = asb.assignment
JOIN prefix_user AS u ON u.id = asb.userid
JOIN prefix_course AS c ON c.id = a.course
JOIN prefix_course_modules AS cm ON c.id = cm.course
WHERE asb.grade < 0 AND cm.instance = a.id
AND cm.module = 1
ORDER BY c.fullname, a.name, u.lastname
A quick query or two to the DB shows there are zero rows in prefix_assignment_submissions and prefix_assignment. Suggestions?
The assignment module was replaced by the assign module in Moodle 2.2.
The old assignment module was disabled by default in Moodle 2.5 (I think) and removed completely in Moodle 2.7.
The query will need rewriting to use the assign_submission table (and any other assign_* tables that are relevant).
I don't have a complete answer for you, but I can tell you that I also admin a Moodle 2.7 system, and my prefix_assignment_submissions table also has no records.
Additionally, I can give you the below query that I wrote to report on course final grades. We use this query for retention modeling through the semester, and for importing final grades to our student information system at the end of each term, where the idnumber in the mdl_course table will always match the course code followed by the year/term code in our student information system. I think it might be helpful because of how it uses the mdl_grade_items table: there are more itemtypes in that table than just course. In this table, an ungraded item would have a NULL value in the finalgrade field. Unfortunately, I don't know the Moodle internals enough to guarantee there will be a record in this table for every assignment, but it's a starting place.
SELECT u.username,u.lastname, u.firstname,c.shortname, left(c.idnumber, character_length(c.idnumber)-6) AS crs_cde,
right(c.idnumber,5) as yearterm,cast((gg.finalgrade/case when gi.grademax = 0 then 1 else gi.grademax end) * 100 as numeric(5,2)) finalgrade,
(SELECT l.letter
FROM mdl_context x
INNER JOIN mdl_grade_letters l ON l.contextid = x.id
WHERE x.instanceid in (c.id, 0) and l.lowerboundary <= round((gg.finalgrade/case when gi.grademax = 0 then 1 else gi.grademax end)*100,2)
ORDER BY x.id desc, lowerboundary desc limit 1) letter
FROM mdl_grade_grades gg
INNER JOIN mdl_grade_items gi ON gi.id=gg.itemid
INNER JOIN mdl_user u ON u.id=gg.userid
INNER JOIN mdl_course c on c.id = gi.courseid
INNER JOIN mdl_course_categories c2 on c2.id = c.category
WHERE gi.itemtype='course' and c2.visible = 1 and gg.finalgrade is not null
and char_length(c.idnumber) > 0 and right(c.idnumber,5)='20151';
We moved from MySQL to PostgreSQL when we updated to 2.7, but the only changes I needed to make to our queries were for date handling.
It's also worth mentioning that the assignment module was completely overhauled for version 2.3, and a lot of the docs for 2.3, 2.4, 2.5, etc. were just copied over from the prior version. I've seen other changes be missed by this process. This especially holds for something like a contributed report. It's possible you're still seeing SQL that hasn't been valid since 2.3.

How to get Django QuerySet 'exclude' to work right?

I have a database that contains schemas for skus, kits, kit_contents, and checklists. Here is a query for "Give me all the SKUs defined for kitcontent records defined for kit records defined in checklist 1":
SELECT DISTINCT s.* FROM skus s
JOIN kit_contents kc ON kc.sku_id = s.id
JOIN kits k ON k.id = kc.kit_id
JOIN checklists c ON k.checklist_id = 1;
I'm using Django, and I mostly really like the ORM because I can express that query by:
skus = SKU.objects.filter(kitcontent__kit__checklist_id=1).distinct()
which is such a slick way to navigate all those foreign keys. Django's ORM produces basically the same as the SQL written above. The trouble is that it's not clear to me how to get all the SKUs not defined for checklist 1. In the SQL query above, I'd do this by replacing the "=" with "!=". But Django's models don't have a not equals operator. You're supposed to use the exclude() method, which one might guess would look like this:
skus = SKU.objects.filter().exclude(kitcontent__kit__checklist_id=1).distinct()
but Django produces this query, which isn't the same thing:
SELECT distinct s.* FROM skus s
WHERE NOT ((skus.id IN
(SELECT kc.sku_id FROM kit_contents kc
INNER JOIN kits k ON (kc.kit_id = k.id)
WHERE (k.checklist_id = 1 AND kc.sku_id IS NOT NULL))
AND skus.id IS NOT NULL))
(I've cleaned up the query for easier reading and comparison.)
I'm a beginner to the Django ORM, and I'd like to use it when possible. Is there a way to get what I want here?
EDIT:
karthikr gave an answer that doesn't work for the same reason the original ORM .exclude() solution doesn't work: a SKU can be in kit_contents in kits that exist on both checklist_id=1 and checklist_id=2. Using the by-hand query I opened my post with, using "checklist_id = 1" produces 34 results, using "checklist_id = 2" produces 53 results, and the following query produces 26 results:
SELECT DISTINCT s.* FROM skus s
JOIN kit_contents kc ON kc.sku_id = s.id
JOIN kits k ON k.id = kc.kit_id
JOIN checklists c ON k.checklist_id = 1
JOIN kit_contents kc2 ON kc2.sku_id = s.id
JOIN kits k2 ON k2.id = kc2.kit_id
JOIN checklists c2 ON k2.checklist_id = 2;
I think this is one reason why people don't seem to find the .exclude() solution a reasonable replacement for some kind of not_equals filter -- the latter allows you to say, succinctly, exactly what you mean. Presumably the former could also allow the query to be expressed, but I increasingly despair of such a solution being simple.
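You could do this: get all the objects for checklist 1, and exclude them from the complete list.
# skus is the checklist-1 queryset from the question
skus = SKU.objects.filter(kitcontent__kit__checklist_id=1)
sku_ids = skus.values_list('pk', flat=True)
non_checklist_1 = SKU.objects.exclude(pk__in=sku_ids).distinct()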
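
It may help to see why the two readings of "not checklist 1" give different rows. This is a sketch in plain SQL through Python's sqlite3 module, not the Django ORM, with a tiny invented data set. The first query is the "replace = with !=" join from the question, and the second is the NOT IN shape that the question shows Django generating for exclude().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skus (id INTEGER PRIMARY KEY);
    CREATE TABLE kits (id INTEGER PRIMARY KEY, checklist_id INTEGER);
    CREATE TABLE kit_contents (sku_id INTEGER, kit_id INTEGER);
    -- Invented data: SKU 1 is only on checklist 1, SKU 2 is on both checklists,
    -- SKU 3 is only on checklist 2, and SKU 4 is in no kit at all.
    INSERT INTO skus VALUES (1), (2), (3), (4);
    INSERT INTO kits VALUES (10, 1), (20, 2);
    INSERT INTO kit_contents VALUES (1, 10), (2, 10), (2, 20), (3, 20);
""")

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# "!=" on the join: SKUs that appear in at least one kit NOT on checklist 1.
in_some_other_checklist = ids("""
    SELECT DISTINCT s.id FROM skus s
    JOIN kit_contents kc ON kc.sku_id = s.id
    JOIN kits k ON k.id = kc.kit_id
    WHERE k.checklist_id != 1
    ORDER BY s.id
""")

# NOT IN subquery: SKUs that appear in NO kit on checklist 1.
in_no_checklist_1_kit = ids("""
    SELECT s.id FROM skus s
    WHERE s.id NOT IN (
        SELECT kc.sku_id FROM kit_contents kc
        JOIN kits k ON k.id = kc.kit_id
        WHERE k.checklist_id = 1 AND kc.sku_id IS NOT NULL)
    ORDER BY s.id
""")

print(in_some_other_checklist)  # [2, 3]
print(in_no_checklist_1_kit)    # [3, 4]
```

SKU 2, which sits on both checklists, shows up only in the first result, and SKU 4, which is in no kit, only in the second. That is the mismatch described in the EDIT above.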

Why is this SQL query returning repeated records when they're not repeated in the database?

SELECT *
FROM support_systems,tickets
INNER JOIN user_access ON tickets.support_system_id = user_access.support_system_id
WHERE support_systems.account_id = #session.account_id#
AND user_access.user_access_level >= 1
AND user_access.user_id = #session.user_id#
Any clue why this query would return a record set with repeated records? The results are looking like this:
Priority ID Subject Status
high 1 First Subject open
high 1 First Subject open
low 3 Weeee open
low 3 Weeee open
medium 4 hhhhh closed
medium 4 hhhhh closed
medium 5 neat open
medium 5 neat open
Let me know if you guys need more information, thanks a lot.
You are selecting records from the table support_systems but have not specified the join condition. What is the relationship between this table and the others you are interrogating?
You may want something like this
SELECT *
FROM support_systems
INNER JOIN tickets ON
support_systems.support_system_id = tickets.support_system_id
INNER JOIN user_access ON
tickets.support_system_id = user_access.support_system_id
WHERE support_systems.account_id = #session.account_id#
AND user_access.user_access_level >= 1
AND user_access.user_id = #session.user_id#
The problem is this line:
FROM support_systems,tickets
I would remove tickets from the FROM clause and make it an inner join instead. Right now you have what's called a "cross product" (a Cartesian product: every support_systems row is paired with every ticket): http://en.wikipedia.org/wiki/Cross_product
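
Here is a small reproduction of the effect, sketched with Python's sqlite3 module. The rows, the ticket_id and subject columns, and the literal account and user ids standing in for the #session...# values are all made up. The sketch assumes the account owns two support systems, which would explain every ticket showing up twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE support_systems (support_system_id INTEGER, account_id INTEGER);
    CREATE TABLE tickets (ticket_id INTEGER, support_system_id INTEGER, subject TEXT);
    CREATE TABLE user_access (
        user_id INTEGER, support_system_id INTEGER, user_access_level INTEGER);
    -- Account 10 owns two support systems; both tickets belong to system 1.
    INSERT INTO support_systems VALUES (1, 10), (2, 10);
    INSERT INTO tickets VALUES (1, 1, 'First Subject'), (3, 1, 'Weeee');
    INSERT INTO user_access VALUES (7, 1, 1);
""")

# Same filters for both queries; 10 and 7 stand in for the session values.
FILTER = """
    WHERE support_systems.account_id = 10
      AND user_access.user_access_level >= 1
      AND user_access.user_id = 7
    ORDER BY tickets.ticket_id
"""

# Original shape: support_systems is listed with a comma and never joined.
cross = conn.execute("""
    SELECT tickets.ticket_id, tickets.subject
    FROM support_systems, tickets
    INNER JOIN user_access
        ON tickets.support_system_id = user_access.support_system_id
""" + FILTER).fetchall()

# Fixed shape: every table is tied to the others by a join condition.
joined = conn.execute("""
    SELECT tickets.ticket_id, tickets.subject
    FROM support_systems
    INNER JOIN tickets
        ON support_systems.support_system_id = tickets.support_system_id
    INNER JOIN user_access
        ON tickets.support_system_id = user_access.support_system_id
""" + FILTER).fetchall()

print(cross)   # [(1, 'First Subject'), (1, 'First Subject'), (3, 'Weeee'), (3, 'Weeee')]
print(joined)  # [(1, 'First Subject'), (3, 'Weeee')]
```

Each ticket is repeated once per support_systems row that passes the account filter, whether or not the ticket belongs to that system.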
I would say it's probably because you have an explicit join and an implicit join that isn't handled in the WHERE clause, which is producing a Cartesian product.
You have three tables, but only two of them are used in the join. You need a second join: you need to include support_systems in your join somewhere.
Probably something like:
from support_systems a left join user_access b on a.support_system_id = b.support_system_id
left join tickets c on c.support_system_id = b.support_system_id
Then your WHERE clause would stay the same, and it would return rows based on the correctly joined tables.