SQL Server query: tagging query - sql

I'm stuck with a SQL query (for SQL Server). I'm new to SQL and I'm not making much progress. I've created a test project to test tagging.
I have 3 tables as follows:
Monster:
Name Description EatsPeople
Vampire Pale, afraid of light True
Ghost See-through, annoying False
Wraith Green-ish, ugly, dumb True
TagLookup:
Name ID
Ghost 1
Ghost 2
Wraith 1
Tags:
ID Text Value
1 Green green-skin
2 Screams like a banshee banshee-call
I'm trying to select all monsters that have the tag value 'green-skin'.

Assuming that Monsters.Name, Tags.Value, and the Name-ID combination in TagLookup are all unique:
SELECT m.Name, m.Description, m.EatsPeople
FROM dbo.Monster AS m
INNER JOIN dbo.TagLookup AS tl
ON m.Name = tl.Name
INNER JOIN dbo.Tags AS t
ON t.ID = tl.ID
AND t.Value = 'green-skin';

Related

How do I use SQL to search a many to many relationship using AND

Can anyone help to create a SQL code which could list movies which have been searched under 2 or more tags for the tables below? E.g. I want to list all movies which have the tags “4star” AND “Drama”.
Tables
I have managed to create one which lists movies which have either one or another tag… thus.
Select tblMovies.MovieName
FROM tblMovies, tblBridge, tblTags
WHERE ((tblTags.TagID=1) OR (tblTags.TagID=5))
And tblTags.TagID = tblBridge.TagID
And tblBridge.MediaID= tblMovies.MovieID
Which gives Star Wars, Aliens, Goodfellows, Mermaids.
But I'm struggling with the AND code which would give Goodfellows and The Godfather if I search for movies which have tags 1 (4star) and 7 (Drama) for example.
Many thanks.
You are looking for movies for which exist both tags 1 and 7. We don't use joins usually when we only want to check whether data exists. We use EXISTS. Or IN, which expresses the same thing (movies that are in the set of tag 1 movies and also in the set of tag 7 movies).
The idea is that we select FROM the table we want to see results from. And we use the WHERE clause to tell the DBMS which rows we want to see.
With EXISTS
SELECT m.moviename
FROM tblmovies m
WHERE EXISTS (SELECT null FROM tblbridge b WHERE b.tagid = 1 AND b.movieid = m.movieid)
AND EXISTS (SELECT null FROM tblbridge b WHERE b.tagid = 7 AND b.movieid = m.movieid)
ORDER BY m.moviename;
With IN
SELECT m.moviename
FROM tblmovies m
WHERE m.movieid IN (SELECT b.movieid FROM tblbridge b WHERE b.tagid = 1)
AND m.movieid IN (SELECT b.movieid FROM tblbridge b WHERE b.tagid = 7)
ORDER BY m.moviename;
I should add that these are not the only options available to get that result. But they are the straight-forward ones. (Another is conditional aggregation, but you'll learn this later.)

SQL query for crystal reports produces duplicate results

I created 3 tables TstInvoice, TstProd, TstPersons and added some data:
INVOICE_NBR CLIENT_NR VK_CONTACT
A10304 003145 AT
A10305 000079 EA
A10306 004458 AT
A10307 003331 JDJ
PROD_NR INVOICE_NBR
P29366 A10304
P29367 A10304
P29368 A10305
P29369 A10306
P29370 A10306
P29371 A10307
PERS_NR INITIALEN STATUS PERSOON
0001 AT 7 Alice Thompson
0002 EA 1 Edgar Allen
0003 JDJ 1 John Doe Joe
0004 AT 1 Arthur Twins
The parameter that is passed to the crystal report is the INVOICE_NBR.
On my crystal report I put some fields from the databases and one sql expression:
(
SELECT "TstPersons"."PERSOON" FROM "TstPersons"
WHERE "TstPersons"."INITIALEN" = "TstInvoice"."VK_CONTACT" AND "TstPersons"."STATUS" = 1
)
The full query that is generated:
SELECT "TstInvoice"."INVOICE_NBR", "TstInvoice"."CLIENT_NR", "TstPersons"."STATUS", "TstPersons"."PERSOON", "TstProd"."PROD_NR", "TstProd"."INVOICE_NBR", (
SELECT "TstPersons"."PERSOON" FROM "TstPersons"
WHERE "TstPersons"."INITIALEN" = "TstInvoice"."VK_CONTACT" AND "TstPersons"."STATUS" = 1
)
FROM ("GCCTEST"."dbo"."TstInvoice" "TstInvoice" INNER JOIN "GCCTEST"."dbo"."TstProd" "TstProd" ON "TstInvoice"."INVOICE_NBR"="TstProd"."INVOICE_NBR") INNER JOIN "GCCTEST"."dbo"."TstPersons" "TstPersons" ON "TstInvoice"."VK_CONTACT"="TstPersons"."INITIALEN"
WHERE "TstInvoice"."INVOICE_NBR"='A10304'
The result is as shown in the screenshot:
As you can see the TstPersons.PERSOON field is populated with Alice Thompson and the sql expression field is correctly populated with Arthur Twins. However, I would like only to see the prod_nr once. With this query it produces the prod numbers twice because of the double entry for "AT" despite the fact that I ask for only status 1. I could just delete the old entry but I want to know if it's possible this way.
* edit * I added the status = 1 to the "record selection formula editor" and that seems to work. Not need the sql expression field at all. Not sure if this is the correct way to go though.
So now it looks like this:
SELECT "TstInvoice"."INVOICE_NBR", "TstInvoice"."CLIENT_NR", "TstPersons"."STATUS", "TstPersons"."PERSOON", "TstProd"."PROD_NR", "TstProd"."INVOICE_NBR"
FROM ("GCCTEST"."dbo"."TstInvoice" "TstInvoice" INNER JOIN "GCCTEST"."dbo"."TstProd" "TstProd" ON "TstInvoice"."INVOICE_NBR"="TstProd"."INVOICE_NBR") INNER JOIN "GCCTEST"."dbo"."TstPersons" "TstPersons" ON "TstInvoice"."VK_CONTACT"="TstPersons"."INITIALEN"
WHERE "TstInvoice"."INVOICE_NBR"='A10304' AND "TstPersons"."STATUS"=1
You have a very weak join in your query due to the duplicate values found in the INITIALEN column. Using the STATUS = 1 criteria is a work-around more than a solution because if you ever need to report on an invoice where the contact has a status other than 1, you will need to modify the report's design to allow your join to work because the STATUS value is not found on the invoice to allow a proper join to occur.
You are also running a risk of this work-around breaking down completely should you have another contact with both the same initials and status values as another.
The correct way to solve this problem would be to join TstInvoice to TstPersons through a field that has unique values. The PERS_NR column appears to be a good choice for this.
This is also going to require a redesign of the TstInvoice table to include the PERS_NR column as a Foreign Key.
A stronger join between invoices and persons would also remove the need for that sub-query in you selection statement. This would simplify your query down to the following:
SELECT "TstInvoice"."INVOICE_NBR", "TstInvoice"."CLIENT_NR", "TstPersons"."STATUS", "TstPersons"."PERSOON", "TstProd"."PROD_NR", "TstProd"."INVOICE_NBR"
FROM "GCCTEST"."dbo"."TstInvoice" "TstInvoice"
INNER JOIN "GCCTEST"."dbo"."TstProd" "TstProd"
ON "TstInvoice"."INVOICE_NBR"="TstProd"."INVOICE_NBR"
INNER JOIN "GCCTEST"."dbo"."TstPersons" "TstPersons"
ON "TstInvoice"."PERS_NR"="TstPersons"."PERS_NR"
WHERE "TstInvoice"."INVOICE_NBR"='A10304'

SQL query to find whoever has a skill what other skill do they have?

enter image description here
I have two tables :
UserInfo
Skill
and the join table between them called UserSkill as you can see at the
right part of the diagram.
I want to know whoever knows or is skillful in Java, what else he is skillful at. I mean for example I know java, Go, PHP, python and user number 2 knows java and python and CSS. So the answer to the question: whoever knows java what else he knows would be GO, PHP, Python and CSS.
It's like recommendation systems for example whoever but this product what else do they bought? Like what we have in amazon ..
What would be the best query for this ?
Thank you
More information:
UserInfo
U-id U-name
1 A
2 B
3 C
SkillInfo
S-id S-Name
1 Java
2 GO
3 PHP
4 Python
5 CSS
UserSkill:
U-id S-id
1 1
1 2
1 3
1 4
2 1
2 4
2 5
In SQL Server 2017 and Azure SQL DB you can use the new graph database capabilities and the new MATCH clause to answer queries like this, eg
SELECT FORMATMESSAGE ( 'User %s has skill %s and other skill %s.', [user1].[U-name], mainSkill.[S-name], otherSkill.[S-name] ) result
FROM
dbo.users user1,
dbo.hasSkill hasSkill1, dbo.skills mainSkill, dbo.hasSkill hasSkill2, dbo.skills otherSkill
WHERE mainSkill.[S-name] = 'CSS'
AND otherSkill.[S-name] != mainSkill.[S-name]
AND MATCH ( mainSkill<-(hasSkill1)-user1-(hasSkill2)->otherSkill);
My results:
Obviously you can answer the same queries with a relational approach, it's just a different way of doing things. Full script available here.
To make this more dynamic, replace the hard coded 'java' with a variable that you can pass to filter by any skill type, possibly make a stored procedure so you can pass the variable,
Edited column names as I didn't look at the image you provided:
--Outer query selects all skills of users which are not java and user has skill of java,
--the inner query selects all user ids where the user has a skill of java
SELECT sk.[SkillName], ui.[UserName], ui.[UserId]
FROM [dbo].[Skill] AS sk
INNER JOIN [dbo].[UserSkill] AS us
ON us.[SkillId] = sk.[SkillId]
INNER JOIN [dbo].[UserInfo] AS ui
ON ui.[UserId] = us.[UserId]
WHERE sk.[Skill] <> 'java' AND ui.[UserId] IN (
SELECT [UserId]
FROM [dbo].[UserInfo] ui
INNER JOIN [dbo].[UserSkill] us
ON us.[UserId] = ui.[UserId]
INNER JOIN [dbo].[Skill] sk
ON sk.[SkillId] = us.[SkillId]
WHERE sk.[SkillName] = 'java')
This is what I have found
--Formatted query
select
o.UserName, k.SkillName
from
UserSkill S
inner join
UserSkill SS on s.UserID = ss.UserID
and s.SkillID = 1
and ss.SkillID <> 1
inner join
Skill k on k.SkillID = ss.SkillID
inner join
UsersINFO O on o.UserID = ss.UserID

SQL Server : multi-join with tuple IN clause

I'm trying to join 4 tables that have a somewhat complex relationship. Because of where this will be used, it needs to be contained in a single query, but I'm having trouble since the primary query and the IN clause query both join 2 tables together and the lookup is on two columns.
The goal is to input a SalesNum and SalesType and have it return the Price
Tables and relationships:
sdShipping
SalesNum[1]
SalesType[2]
Weight[3]
sdSales
SalesNum[1]
SalesType[2]
Zip[4]
spZones
Zip[4]
Zone[5]
spPrices
Zone[5]
Price
Weight[3]
Here's my latest attempt in T-SQL:
SELECT
spp.Price
FROM
spZones AS spz
LEFT OUTER JOIN
spPrices AS spp ON spz.Zone = spp.Zone
WHERE
(spp.Weight, spz.Zip) IN (SELECT ship.Weight, sales.Zip
FROM sdShipping AS ship
LEFT OUTER JOIN sdSales AS sales ON sales.SalesNum = ship.SalesNum
AND sales.SalesType = ship.SalesType
WHERE sales.SalesNum = (?)
AND ship.SalesType = (?));
SQL Server Management Studio says I have an error in my syntax near ',' (appropriately useless error message). Does anybody have any idea whether this is even allowed in Microsoft's version of SQL? Is there perhaps another way to accomplish it? I've seen the multi-key IN questions answered on here, but never in the case where both sides require a JOIN.
Many databases do support IN on tuples. SQL Server is not one of them.
Use EXISTS instead:
SELECT spp.Price
FROM spZones spz LEFT OUTER JOIN
spPrices spp
ON spz.Zone = spp.Zone
WHERE EXISTS (SELECT 1
FROM sdShipping ship LEFT JOIN
sdSales sales
ON sales.SalesNum = ship.SalesNum AND
sales.SalesType = ship.SalesType
WHERE spp.Weight = ship.Weight AND spz.Zip = sales.Zip AND
sales.SalesNum = (?) AND
ship.SalesType = (?)
);

Timeout running SQL query

I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min
It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.
what about this?
SELECT auth_user.*,
C1.tickets_captured__count
C2.assigned_tickets__count
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with beter performance)