Large Anti JOIN in SQL - sql

I am working with a training database and I am trying to find the best way to create a list of all the users and all the classes they HAVEN'T taken. I have two tables a courses master table of 150 classes and a registration table that is 175K. I can create an anti join for a single person:
SELECT *
FROM tblCourses C --150 rows
Left JOIN tblTraineeCourses TC -- 175K
ON C.CourseId = TC.CourseId AND TraineeId = 2283539
WHERE TC.CourseID IS NULL
And an Anti Join to see what classes have never been taken
SELECT *
FROM tblCourses C --150 rows
Left JOIN tblTraineeCourses TC
ON C.CourseId = TC.CourseId --175K
WHERE TC.CourseID IS NULL
but i am looking for a way to produce a list table of every User ID and the classes they haven't taken.
I fully realize that this will produce a massive table but I need to do it for an analysis

Use a cross join to generate all combinations. Then filter out the ones that exist:
SELECT T.TraineeId, C.CourseId
FROM tblTrainees t CROSs JOIN
tblCourses C LEFT JOIN
tblTraineeCourses TC
ON TC.CourseId = C.CourseId AND
TC.TraineeId = T.TraineeId
WHERE TC.CourseID IS NULL;
This assumes that you have a list of trainees, in tblTrainees. You could replace that with:
(SELECt DISTINCT TraineeId FROM tblTraineeCourses
) t
If you don't have such a table.

Related

Is it possible to have multiple joins between two tables in stored procedure?

I have two tables, "Booking" and "City". CityName field is primary key in City table and I have used it as foreign key for two columns "SourceCity" and "DestinationCity" in Booking table. I want to create a stored procedure to select all existing data from the Booking table for creating a view list, for which I have written the following.
SELECT [dbo].[Booking].[BookingID],
[dbo].[Booking].[CustomerName],
[dbo].[City].[CityName],
[dbo].[City].[CityName],
[dbo].[Booking].[StartingDate],
[dbo].[Booking].[EndingDate],
[dbo].[Car].[LicensePlateNumber],
[dbo].[Driver].[DriverName],
[dbo].[Booking].[AdvanceTaken],
[dbo].[Booking].[PendingPayment],
[dbo].[Booking].[TotalRent],
[dbo].[Booking].[BookingDate],
[dbo].[Booking].[IDProof]
FROM [dbo].[Booking]
**LEFT OUTER JOIN [dbo].[City]
ON [dbo].[Booking].[SourceCity] = [dbo].[City].[CityName]
AND [dbo].[Booking].[DestinationCity] = [dbo].[City].[CityName]**
LEFT OUTER JOIN [dbo].[Driver]
ON [dbo].[Driver].[DriverID] = [dbo].[Booking].[DriverAllotted]
LEFT OUTER JOIN [dbo].[Car]
ON [dbo].[Car].[CarID] = [dbo].[Booking].[CarAllotted]
ORDER BY [dbo].[Booking].[BookingID]
I am not sure if it is possible to do the following
LEFT OUTER JOIN [dbo].[City]
ON [dbo].[Booking].[SourceCity] = [dbo].[City].[CityName]
AND [dbo].[Booking].[DestinationCity] = [dbo].[City].[CityName]
I guess you need a different JOIN
FROM [dbo].[Booking] as booking
LEFT OUTER JOIN [dbo].[City] as source_city
ON booking.[SourceCity] = source_city.[CityName]
LEFT OUTER JOIN [dbo].[City] as destination_city
ON booking.[DestinationCity] = destination_city.[CityName]
....
Yes it is possible, you just need to use a different table alias. Beyond referencing the same table twice, table aliases can make your code look a lot cleaner, e.g.
SELECT b.CustomerName,
sc.CityName AS SourceCity,
dc.CityName AS DestinationCity,
b.StartingDate,
b.EndingDate,
c.LicensePlateNumber,
d.DriverName,
b.AdvanceTaken,
b.PendingPayment,
b.TotalRent,
b.BookingDate,
b.IDProof
FROM dbo.Booking AS b
LEFT OUTER JOIN dbo.City AS sc
ON sc.CityName= b.SourceCity
LEFT OUTER JOIN dbo.City AS dc -- Different Alias here
ON dc.CityName = b.DestinationCity
LEFT OUTER JOIN dbo.Driver AS d
ON d.DriverID = b.DriverAllotted
LEFT OUTER JOIN dbo.Car AS c
ON c.CarID = b.CarAllotted
ORDER BY
b.BookingID;
I appreciate that cleaner is somewhat subjective, but I would be astonished if anyone found this harder to read than your original query

Entity Framework with Multiple Joins and Subquery

I have a complex query with multiple left joins and subqueries which I need to implement in Entify Framework. I've received the monster SQL and
my goal is to do it on a elegant way with EF. The query consumes multiple tables and creates a "WITH" subquery on top which
is included in the joins later. I've done a first approach with EF but when I inspect the output that EF sends to the DB, inner joins are sent when
I am expecting LEFT JOINs.
A summary of the SQL follows:
WITH SUB_QUERY
AS ( SELECT FIELD_A,
FIELD_B,
FIELD_C,
MAX (FIELD_D) MAX_FIELD_D
FROM TABLE_X
WHERE SOME FIELD_A = 'WHATEVER'
GROUP BY FIELD_A, FIELD_B, FIELD_C)
SELECT C.FIELD_A,
C.FIELD_B,
B.FIELD_X,
D.FIELD_S,
E.FIELD_J,
F.FIELD_Y
FROM TABLE_A A
LEFT JOIN SUB_QUERY B
ON A.FIELD_C = B.FIELD_C
LEFT JOIN TABLE_C C
ON B.FIELD_A = C.FIELD_A
LEFT JOIN TABLE_D D
ON A.FIELD_C = D.FIELD_C
LEFT JOIN TABLE_E E
ON A.FIELD_X = E.FIELD_X
LEFT JOIN TABLE_F F
ON A.FIELD_W = F.FIELD_W
WHERE A.FIELD_H = D.FIELD_H
AND A.FIELD_D = B.MAX_FIELD_D
As you see, a subquery on top filters and groups some data to be consumed in a join below. Then all the joins take place
and some fields are taken from different tables as the output of the query.
Which approach would you recommend me to accomplish this task? I've tried different approaches and no one of them works (either retrieve nothing, or many more rows than the SQL query on the DB, etc..)
Please note that the Domain Model in Entity Framework is properly setup: Primary Keys, collections, nested objects etc.. so I believe some of these
joins are not even required because my EF entities contain already references to the child collections and parent objects (navigation properties).
Thanks a lot!!
if you really need a left join you should mode the where condition related to a left joined table in the proper on clause
FROM TABLE_A A
LEFT JOIN SUB_QUERY B
ON A.FIELD_C = B.FIELD_C
LEFT JOIN TABLE_C C
ON B.FIELD_A = C.FIELD_A
LEFT JOIN TABLE_D D
ON A.FIELD_C = D.FIELD_C AND A.FIELD_D = B.MAX_FIELD_D
LEFT JOIN TABLE_E E
ON A.FIELD_X = E.FIELD_X
LEFT JOIN TABLE_F F
ON A.FIELD_W = F.FIELD_W
the use of a left join table column in where force the relation to work as a INNER JOIN

How to Query across multiple tables

I have the following DB. Is this a proper query to get the first name of the authors loaned by Members ? or is there a more efficient way ?
select M.MemberNo, A.FirstName
from Person.Members as M
INNER JOIN Booking.Loans as L
on M.MemberNo = L.MemberNo
INNER JOIN Article.Copies as C
on L.ISBN = C.ISBN and L.CopyNo = C.CopyNo
INNER JOIN Article.Items as I
on C.ISBN = I.ISBN
INNER JOIN Article.Titles as T
on I.TitleID = T.TitleID
INNER JOIN Article.TitleAuthors as TA
on T.TitleID = TA.TitleID
INNER JOIN Article.Authors as A
on TA.AuthorID = A.AuthorID
;
go
The most direct way to get the names of authors of books that are lent out to members would be like this:
Get member loans (Booking.Loans)
Get books for member loans (Article.Items)
Get authors for the books for member loans (Article.TitleAuthors)
Get first names for authors for the books for member loans (Article.Authors)
select L.MemberNo, A.FirstName
FROM Booking.Loans as L
INNER JOIN Article.Items as I
on L.ISBN = I.ISBN
INNER JOIN Article.TitleAuthors as TA
on I.TitleID = TA.TitleID
INNER JOIN Article.Authors as A
on TA.AuthorID = A.AuthorID
The advantage to joining all the tables like you have your original query is to ensure that each intermediary table was INSERTed into or DELETEd from correctly.
For example, your original query wouldn't return any rows for a member if they had a loan (i.e. a row in the Loans table) but for some reason didn't have an entry in the Members table. If you can assume that (or don't care if) there will always be a row in the Members table if there is a row in the Loans table, then you can exclude the join to Person.Members.
You can apply this same logic to the other intermediary tables (Member, Copies, Title) too, if needed. If you need a specific piece of data from one of the these tables, though, then you would need to include them in the joins.

Outer Joining SQL Tables?

I have three table in the Database -
Activity table with activity_id, activity_type
Category table with category_id, category_name
Link table with mapping between activity_id and category_id
I need to write a select statement to get the following data:
activity_id, activity_type, Category_name.
The issue is some of the activity_id have no entry in the link table.
If I write:
select a.activity_id, a.activity_type, c.category_name
from activity a, category c, link l
where a.activity_id = l.activity_id and c.category_id = l.category_id
then I do not get the data for the activity_ids that are not present in the link table.
I need to get data for all the activities with empty or null value as category_name for those not having any linking for category_id.
Please help me with it.
PS. I am using MS SQL Server DB
I believe you're looking for a LEFT OUTER JOIN for your activity table to return all rows.
SELECT
a.activity_id, a.activity_type, c.category_name
FROM activity a
LEFT OUTER JOIN link l
ON a.activity_id = l.activity_id
LEFT OUTER JOIN category c
ON c.category_id = l.category_id;
You should use proper explicit joins:
select a.activity_id, a.activity_type, c.category_name
from activity a
LEFT JOIN link l
ON a.activity_id = l.activity_id
LEFT JOIN category c
ON l.category_id = c.category_id
If writing this type of logic will be part of your ongoing responsibilities, I would strongly suggest that you do some research on joins, including the interactions between joins and where clauses. Joins and where clauses combine to form the backbone of query writing, regardless of the technology used to retrieve the data.
Most critical join information to understand:
Left Outer Join: retrieves all information from the 'left' table and any records that exist in the joined table
Inner Join: retrieves only records that exist in both tables
Where clauses: used to limit data, regardless of inner or outer join definitions.
In the example you posted, the where clause is limiting your overall data to rows that exist in all 3 tables. Replacing the where clause with appropriate join logic will do the trick:
select a.activity_id, a.activity_type, c.category_name
from activity a
left outer join link l --return all activity rows regardless of whether the link exists
on a.activity_id = l.activity_id
left outer join category c --return all activity rows regardless of whether the link exists
on c.category_id = l.category_id
Best of luck!
What about?
select a.activity_id, a.activity_type, c.category_name from category c
left join link l on c.category_id = l.category_id
left join activity a on l.activity_id = a.activity_id
Actually, the first join seems that it could be an inner join, because you didn't mention that there might be some missing elements there

trying to inner join multiple tables but always returning 0 rows

I am trying to figure out why this query returns 0 rows as there is data in all 3 tables.
Here are my tables:
Table1: Applications
Columns: ID, Name
Table2: Resources
Columns: ID, Name
Table3: ApplicationResourceBridge
Columns: ID, app_id, resource_id
And Here is the query
SELECT Resources.name
, ApplicationResourceBridge.resource_id AS Expr3
FROM Resources
INNER JOIN Applications
ON Resources.id = Applications.id
INNER JOIN ApplicationsResources
ON Resources.id = ApplicationResourceBridge.resource_id
Your current query tries to match Resources.id with Applications.id, but they're different things. It should match Applications.id with ApplicationResources.app_id.
It's generally clearer to have the bridge table as the middle join, so it looks like a link. For example:
SELECT Resources.name
, ar.resource_id
FROM Resources r
INNER JOIN ApplicationsResources ar
ON r.id = ar.resource_id
INNER JOIN Applications a
ON a.id = ar.app_id
SELECT r.Name AS "Resource Name" , a.Name AS "Application Name"
FROM ApplicationResourceBridge as b
INNER JOIN Resources as r ON r.ID = b. resource_id
INNER JOIN Applications as a ON a.ID = b.app_id
UPDATE:
When constructing FROM ... JOIN ... JOIN ... follow "the path" do not "jump over".
Change this:-
Applications ON Resources.id = Applications.id INNER JOIN
to this:-
Applications ON Applications.id = ApplicationResourceBridge.app_id INNER JOIN
You were trying to join the Appliciation ID to the Resource ID but these have no relationship. What you really want is to join the Application table to the Bridge table and the Resource table to the Bridge table.
The fact that all three have data in them is immaterial. The question is whether there is data in "Resources" that has the same ID as rows in BOTH "Applications" and "ApplicationResourceBridge".
Update: don't want to post this as if I thought of it but Andomar makes a good point that you seem to be attempting to link the wrong tables fields. Thus, my answer might be right in the abstract but his might explain why you aren't getting field matches.