It seems to me that there are two scenarios in which to use JOINs:
When data would otherwise be duplicated
When data from one query would otherwise be used in another query
Are these scenarios right? Are there any other scenarios in which to use JOIN?
EDIT: I think I've miscommunicated. I understand how a JOIN works, what I'm not so sure about is when to use one.
JOINS are used to JOIN tables together with related information.
Tipical situations are where you have lets say
A user table where the user has specific security settings. The join would be used such that you can determine which settings the user has.
Users
-UserID
-UserName
UserSecurityRoles
-UserID
-SecurityRoleID
SecurityRoles
-SecurityRoleID
-SecurityRole
SELECT *
FROM Users u INNER JOIN
UserSecurityRoles usr ON u.UserID = usr.UserID INNER JOIN
SecurityRoles sr ON usr.SecurityRoleID = sr.SecurityRoleID
WHERE sr.SecurityRole = 'Admin'
LEFT joins will be used in the cases where you wish to retrieve all the data from the table in the left hand side, and only data from the right that match.
JOINS are used when you start normalizing your table structure. You can crete a table with 100s on columns, where a lot of the data could possibly be NULL, or you can normalize the tables, such that you avoid having too many columns with null values, where you group the appropriate data into table structures.
The answer to this Question has a VERY good link that has graphical display of using JOINs
JOINS are used to return data that is related in a relational database. Data can be related in 3 ways
One to Many relationship (A Person can have many Transactions)
Many to Many relationship (A Doctor can have many Patients, but a Patient can have more than one Doctor)
One to One relationship (One Person can have exactly one Passport number)
JOINS come in various flavours:
AN INNER JOIN will return data from both tables where the keys in each table match
A LEFT JOIN or RIGHT JOIN will return all the rows from one table and matching data from the other table
A CROSS JOIN will return the product of each table
You use joins when you need information from more than one table :)
Related
Well I am designing a domain model and data mapper system for my site. The system has different kind of user domain objects, with the base class for users with child classes for admins, mods and banned users. Every domain object uses data from a table called 'users', while the child classes have an additional table to store admin/mod/banned information. The user type is determined by a column in the table 'users' called 'userlevel', its value is 3 for admins, 2 for mods, 1 for registered users and -1 for banned users.
Now it comes a problem when I work on a members list feature for my site, since the members list is supposed to load all users from the database(with pagination, but lets not worry about this now). The issue is that I want to load the data from both the base user table and additional admin/mod/banned table. As you see, the registered users do not have additional table to store extra data, while for admin/mod/banned users the table is different. Moreover, the columns in these tables are also different.
So How am I supposed to handle this situation using SQL queries? I know I can simply just select from the base user table and then use multiple queries to load additional data if the user level is found to be a given value, but this is a bad idea since it will results in n+1 queries for n admins/mods/banned users, a very expensive trip to database. What else am I supposed to do? Please help.
If you want to query all usertypes with one query you will have to have the columns from all tables in your result-table, several of them filled with null-values.
To get them filled with data use a left-join like this:
SELECT *
FROM userdata u
LEFT OUTER JOIN admindata a
ON ( u.userid = a.userid
AND u.usertype = 3 )
LEFT OUTER JOIN moddata m
ON ( u.userid = m.userid
AND u.usertype = 2 )
LEFT OUTER JOIN banneddata b
ON ( u.userid = b.userid
AND u.usertype = -1 )
WHERE...
You could probably drop the usertype-condition, since there should only be data in one of the joined tables, but you never know...
Then your program-code will have the job to pick the correct columns based on the usertype.
P.S.: Not that select * is only for sake of simplicity, in real code better list all of the column-names...
While is totally fine having this hierarchy in your domain classes, I would suggest changing the approach in your database. Otherwise your queries are going to be very complex and slow.
You can have just another table called e.g. users_additional_info with the mix of the columns that you need for all your user types. Then you can do
SELECT * FROM users
LEFT JOIN users_additional_info ON users.id = users_additional_info.user_id
to get all the information in a single simple query. Believe me or not, this approach will save you a lots of headaches in the future when your tables start to grow or you decide to add another type of user.
I'm working on enhancing a query for a DB2 database and I'm having some problems getting acceptable performance due to the number of joins across large tables that need to be performed to get all of the data and I'm hoping that there's a SQL function or technique that can simplify and speed up the process.
To break it down, let's say there are two tables: People and Groups. Groups contain multiple people, and a person can be part of multiple groups. It's a many-to-many, but bear with me. Basically, there's a subquery that will return a set of groups. From that, I can join to People (which requires additional joins across other tables) to get all of the people from those groups. However, I also need to know all of the groups that those people are in, which means joining back to the Groups table again (several more joins) to get a superset of the original subquery. There are additional joins in the query as well to get other pieces of relevant data, and the cost is adding up in a very ugly way. I also need to return information from both tables, so that rules out a number of techniques.
What I'd like to do is be able to start with the People table, join it to Groups, and then compare Groups with the subquery. If the Groups attached to one person has one match in the subquery, it should return ALL Group items associated with that person.
Essentially, let's say that Bob is part of Group A, B, and C. Currently, I start with groups, and let's say that only Group A comes out of the subquery. Then I join A to Bob, but then I have to come back and join Bob to Group again to get B and C. SQL example:
SELECT p.*, g2.*
FROM GROUP g
JOIN LINKA link
ON link.GROUPID = g.GROUPID
JOIN LINKB link1
ON link1.LISTID = link.LISTID
JOIN PERSON p
ON link1.PERSONID = p.PERSONID
JOIN LINKB link2
ON link2.PERSONID = p.PERSONID
JOIN LINKA link3
ON link2.LISTID = link3.LISTID
JOIN GROUP g2
ON link3.GROUPID = g2.GROUPID
WHERE
g.GROUPID IN (subquery)
Yes, the linking tables aren't ideal, but they're basically normalized tables containing additional information that is not relevant to the query I'm running. We have to start with a filtered Group set, join to People, then come back to get all of the Groups that the People are associated to.
What I'd like to do is start with People, join to Group, and if ANY Group that Bob is in returns from the subquery, ALL should be returned, so if we have Bob joined to A, B, and C, and A is in the subquery, it will return three rows of Bob to A, B, and C as there was at least one match. In this way, it could be treated as a one-to-many relationship if we're only concerned with the Groups for each Person and not the other way around. SQL example:
SELECT p.*, g.*
FROM PEOPLE p
JOIN LINKB link
ON link.PERSONID = p.PERSONID
JOIN LINKA link1
ON link.LISTID = link1.LISTID
JOIN GROUP g
ON link1.GROUPID = g.GROUPID
WHERE
--SQL function, expression, or other method to return
--all groups for any person who is part of any group contained in the subquery
The number of joins in the first query make it largely unusable as these are some pretty big tables. The second would be far more ideal if this sort of thing is possible.
From the question, I think you are querying hierarchical data. DB2 provides facility to deal with such data. There are two clauses Start with and Connect by in DB2 which will be useful. They are explained here.
Although I'm using Rails, this question is more about database design. I have several entities in my database, with the schema a bit like this: http://fishwebby.posterous.com/40423840
If I want to get a list of people and order it by surname, that's no problem. However, if I want to get a list of people, ordered by surname, enrolled in a particular group, I have to use an SQL statement that includes several joins across four tables, something like this:
SELECT group_enrolment.*, person.*
FROM person INNER JOIN member ON person.id = member.person_id
INNER JOIN enrolment ON member.id = enrolment.member_id
INNER JOIN group_enrolment ON enrolment.id = group_enrolment.enrolment_id
WHERE group_enrolment.id = 123
ORDER BY person.surname;
Although this works, it strikes me as a bit inefficient, and potentially as my schema grows, these queries could get more and more complicated.
Another option could be to join the person table to all the other tables in the query by including person_id in the other tables, then it would just be one single join, for example
SELECT group_enrolment.*, person.*
FROM person INNER JOIN group_enrolment ON group_enrolment.person_id
WHERE group_enrolment.id = 123
ORDER BY person.surname;
But this would mean that in my schema, the person table is joined to a lot of other tables. Aside from a complicated schema diagram, does anyone see any disadvantages to this?
I'd be very grateful for any comments on this - whether what I'm doing now (the many table join) or the second solution or another one that hasn't occurred to me is the best way to go.
Many thanks in advance
Well, joins are what databases do. Having said that, you may consider propagating natural keys in your model, which would then allow you to skip over some tables in joins. Take a look at this example.
EDIT
I'm not saying that this will match your model (problem), but just for fun try similar queries on something like this:
I have a database with two tables
One with games
and one with participants
A game is able to have more participants and these are in a different table.
Is there a way to combine these two into one query?
Thanks
You can combine them using the JOIN operator.
Something like
SELECT *
FROM games g
INNER JOIN participants p ON p.gameid = g.gameid
Explanation on JOIN operators
INNER JOIN - Match rows between the two tables specified in the INNER
JOIN statement based on one or more
columns having matching data.
Preferably the join is based on
referential integrity enforcing the
relationship between the tables to
ensure data integrity.
o Just to add a little commentary to the basic definitions
above, in general the INNER JOIN
option is considered to be the most
common join needed in applications
and/or queries. Although that is the
case in some environments, it is
really dependent on the database
design, referential integrity and data
needed for the application. As such,
please take the time to understand the
data being requested then select the
proper join option.
o Although most join logic is based on matching values between
the two columns specified, it is
possible to also include logic using
greater than, less than, not equals,
etc.
LEFT OUTER JOIN - Based on the two tables specified in the join
clause, all data is returned from the
left table. On the right table, the
matching data is returned in addition
to NULL values where a record exists
in the left table, but not in the
right table.
o Another item to keep in mind is that the LEFT and RIGHT OUTER
JOIN logic is opposite of one another.
So you can change either the order of
the tables in the specific join
statement or change the JOIN from left
to right or vice versa and get the
same results.
RIGHT OUTER JOIN - Based on the two tables specified in the join
clause, all data is returned from the
right table. On the left table, the
matching data is returned in addition
to NULL values where a record exists
in the right table but not in the left
table.
Self -Join - In this circumstance, the same table is
specified twice with two different
aliases in order to match the data
within the same table.
CROSS JOIN - Based on the two tables specified in the join clause, a
Cartesian product is created if a
WHERE clause does filter the rows.
The size of the Cartesian product is
based on multiplying the number of
rows from the left table by the number
of rows in the right table. Please
heed caution when using a CROSS JOIN.
FULL JOIN - Based on the two tables specified in the join clause,
all data is returned from both tables
regardless of matching data.
example
table Game has columns (gameName, gameID)
table Participant has columns (participantID, participantName, gameID)
the GameID column is the "link" between the 2 tables. you need a common column you can join between 2 tables.
SELECT gameName, participantName
FROM Game g
JOIN Participat p ON g.gameID = p.gameID
This will return a data set of all games and the participants for those games.
The list of games will be redundant unless you structure it some other way due to multiple participants to that game.
sample data
WOW Bob
WOW Jake
StarCraft2 Neal
Warcraft3 James
Warcraft3 Rich
Diablo Chris
Is an outer join only used for analysis by the developer? I'm having trouble finding a use case for why you would want to include data in two or more tables that is unrelated or does not "match" your select criteria.
An example use case would be to produce a report that shows ALL customers and their purchases. That is, show even customers who have not purchased anything. If you do an ordinary join of customers and purchases, the report would show only those customers with at least one purchase.
Here is a very good list of examples by none other than Jeff Atwood - A Visual Explanation of SQL Joins he shows a new representation visually of the SQL joins.
The "doesn't match" case is useful for optional data. Something which is present for some rows, but not for others.
The OUTER JOIN is intended to address the cases where you wish to select a set of records from a primary table, which may or may not have related records contained in a secondary table.
An INNER join would omit from the list any primary records not also represented in the secondary table. Any OUTER JOIN guarantees the visiblity of all qualified primary records.
The difference between outer and inner joins is primarily that the outer joins retain each record from both the joined tables, whether the join predicate is matched or not. So a use case would need to revolve around knowing about "missing stuff", eg. customers who haven't made purchases, or members who have missing personal details.
Perhaps you have a users table and an address table related to it, since your users may choose not to enter an address, or they may have several (shipping, street, postal, etc). If you wanted to list all your users, but also display addresses for those that have them, you'd have to use an outer join for the address table.
Say you had two tables one for users and one for something like qualifications, you want to do a query that gets all users with their qualifications if they have any. In this case you would use an outer join to get all the users back and those with qualifications. An inner join would only give you the users who had a qualification.
A great article about outer joins:
Meet the experts: Terry Purcell on coding predicates in outer joins: A comparison of simple outer join constructs