Get all Many:Many relationships from reference/join-table - sql

I am having difficulty querying for possible relationships in a Many:Many scenario.
I present my schema:
What I do know how to query with this schema is:
All Bands that a given User belongs to.
All Users that belong to a given Band.
What I am trying to do is:
Get all Band Members across all Bands that a given User belongs to.
ie, say I am in 5 bands, I want to know who all of my bandmates are.
My first questions are:
Is there a name for this type of query? Where I am more interested in the joined relationships than what I am joined to (just saying that made me want to put this whole system into a Graph DB :/ )? I'd like to learn proper terminology to help me google for problems down the road.
Is this a terrible idea in RDBMS land in general? I feel like this should be a common use case but I want to know if I'm totally approaching this wrong.
To recap:
I am looking to query the above schema with the expected output being one row per User as Band Members that a given User shares a Band with.

Terminology
Your terminology seems to be correct - "many to many", often written as "many:many" with a colon. Sometimes the middle table (band_members) is called the "bridge table".
You can probably drop band_members.id, since the two foreign keys also make up a composite primary key (and the primary key can actually be defined that way, since normally a User cannot be a member of the same Band twice. The only exception to that is if a User could have more than one role in the same Band).
Solution
On the surface of it, this sounds easy - we can see the relationships of the tables, and one would normally just use an INNER JOIN between them. There are three tables, so that would be two joins.
However, we have to conceptualise the problem correctly first. The problem we have is that the join between Users and Band Members (user ID) is actually to be used for two things:
which User is in what Band
filtering by User
So to do this we need to introduce one table with multiple purposes:
SELECT
Users.first_name, Users.last_name
FROM Users
INNER JOIN Band_Members Band_Members1 ON (Band_Members1.user_id = Users.Id)
INNER JOIN Band_Members Band_Members2 ON (Band_Members1.band_id = Band_Members2.band_id)
WHERE
Band_Members2.user_id = 1
You can see here that I have joined Band_Members twice, and when one does that, one has to alias them differently, so they can be separately referenced. The first instance does the obvious join between the Users table and the bridge table, and the second one does a link between "Users who are in Bands" and "Bands that I am in".
Of course, this solution requires that you know your User ID. If you had wanted to do a similar query but filter based on your name, then you would have to join to another (re-aliased) copy of the User table, so that you can differentiate between the two different purposes: "Users who are in bands" and "your User".

Related

SQL - how to get out chained data?

I have 4 tables which were auto generated for me:
User
Challenge
Exercise
Challenge_Exercise
One User may have many Challenges, and one Challenge will have many Exercises.
What I noticed is that the Challenge table has a reference to it's parent User (called user_id) but Exercise do not have a reference in it's table to Challenge; their relation is stored in Challenge_Exercise as Challenge_id and exercise_id.
My question is, how would I take out every Exercise that is linked to a specific user? For instance User with id = 1?
SELECT *
FROM excerise,
challenge_excerise,
challenge
WHERE challenge.user_id = 1
AND challenge_excerise.challenge_id = challenge.id
AND challenge_excerise.exercise_id = excercise.id
What I'm doing here is a join, you could also explicitly do it with inner joins (google it if you wanna know more).
This table is needed because you have a many to many relationship, which means each challenge can have multiple exercises, but also each exercise can have multiple challenges. It's a standard to make an extra table then, so you don't have redundant data, this table is often called junction table.
If you want background just google it, there are tons of data to this topic.

How to best explain on what fields should a user join on?

I need to explain to somebody how they can determine what fields from multiple tables/views they should join on. Any suggestions? I know how to do it but am having difficulty trying to explain it.
One of the issues they have is they will take two fields from two tables that are the same (zip code) and join on those, when in reality they should be joining on ID columns. When they choose the wrong column to join on it increases records they receive in return.
Should I work in PK and FK somewhere?
While it is indeed typical to join a PK to an FK any conversation about JOIN clauses that only revolve around PK's and FK's is fairly limited
For example I had this FROM clause in a recent SQL answer I gave
FROM
YourTable firstNames
LEFT JOIN YourTable lastNames
ON firstnames.Name = lastNames.Name
AND lastNames.NameType =2
and firstnames.FrequencyPercent < lastNames.FrequencyPercent
The table referenced on each side of the table is the same table (a self join) and it includes three condidtions one of which is an inequality. Furthermore there would never be an FK here because its looking to join on a field, that is by design, not a Candidate Key.
Also you don't have even have to join one table to another. You can join inline queries to each other which of course can't possibly have a Key.
So in order to properly understand JOIN you just need to understand that it combines the records from two relations (tables, views, inline queries) where some conditions evaluate to true. This means you need to understand boolean logic and the database and the data in the database.
If your user is having a problem with a specific JOIN ask them to SELECT some rows from one table and also the other and then ask them under what conditions would you want to combine the rows.
You don't need to talk in terms of a primary key of a table but you should point to it and explain that it uniquely identifies a given row and that you must join to related tables using it or you could get duplicated results.
Give them examples of joining with it and joining without it.
An ER diagram showing all of the tables they use and their key relationships would help ensure that they always use the correct keys.
It sounds to me like neither you, nor the person you are trying to help understands how this particular database is constructed and perhaps don't really even understand basic database fundamentals, like PK's and FK's. Most often a PK from one table is joined to a FK to another table.
Assuming the database has the proper PK's and FK's in place, it would probably help a great deal to generate an ER diagram. That would make the joining concept much easier to grasp.
Another approach you could take is to find someone who does understand these things and create some views for this person to use. This way he doesn't need to understand how to join the tables together.
A user shouldn't typically be doing joins. A user should have an interface that lets them get the data that they need in the way that they need it. If you don't have the developer resources to do that then you're going to be stuck with this problem of having to teach a user technical details. You also need to be very careful about what kind of damage the user can do. Do they have update rights on the data? I hope they don't accidentally do a DELETE FROM Table with no WHERE clause. Even if you restrict their permissions, a poorly written query can crush the database server or block resources causing problems for other users (and more work for you).
If you have no choice, then I think that you need to certainly teach them about primary and foreign keys, even if you don't call them that. Point out that the id on your table (or whatever your PK is) identifies a row. Then explain how the id appears in other tables to show the relationship. For example, "See, in the address table we have a person_id which tells us who that address belongs to."
After that, expect to spend a large portion of your time with that user as they make mistakes or come up with other things that they want to get from the database, but which they can't figure out how to get.
From theory, and ideally, you should define primary keys on all tables, and join tables using a primary key to the matching field or fields (foreign key) in the other table.
Even if you don't define or if they're not defined as primary keys, you need to make sure the fields uniquely identify the records in the table, and that they should be properly indexed.
For example, let's say the 'person' table has a SSN and a driver's license field. The SSN could be considered and flagged as the 'primary key', but if you join that table to a 'drivers' table which might not have the SSN, but does have the driver's license #, you could join them by the driver's license field (even if it's not flagged as primary key), but you need to make sure that the field is properly indexed in both tables.
...explain to somebody how they can determine what fields from multiple tables/views they should join on.
Simply put, look for the columns with values that match between the tables/views. Preferably, match exactly but some massaging might be necessary.
The existence of foreign key constraints would help to know what matches to what, but the constraint might not be directly to the table/view that is to be joined.
The existence of a primary key doesn't mean it is the criteria that is necessary for the query, so I would overlook this detail (depending on the audience).
I would recommend attacking the desired result set by starting with the columns desired, and working back from there. If there's more than one table's columns in the result set, focus on the table whose columns should be returning distinct results first and then gradually add joins, checking the result set between each JOIN addition to confirm the results are still the same. Otherwise, need to review the JOIN or if a JOIN is actually necessary vs IN or EXISTS.
I did this when I first started out, it comes from thinking of joins as just linking tables together, so I linked at all possible points.
Once you think of joins as a way to combine AND filter the data it becomes easier to understand them.
Writing out your request as a sentence is helpful too, "I want to see all the times Table A interacted with Table B". Then build a query from that using only the ID, noting that if you wanted to know "All the times Table A was in the same zip code as Table B" then you would join by zip code.

Relational Database Design (MySQL)

I am starting a new Project for a website based on "Talents" - for example:
Models
Actors
Singers
Dancers
Musicians
The way I propose to do this is that each of these talents will have its own table and include a user_id field to map the record to a specific user.
Any user who signs up on the website can create a profile for one or more of these talents. A talent can have sub-talents, for example an actor can be a tv actor or a theatre actor or a voiceover actor.
So for example I have User A - he is a Model (Catwalk Model) and an Actor (TV actor, Theatre actor, Voiceover actor).
My questions are:
Do I need to create separate tables to store sub-talents of this user?
How should I perform the lookups of the top-level talents for this user? I.e. in the user table should there be fields for the ID of each talent? Or should I perform a lookup in each top-level talent table to see if that user_id exists in there?
Anything else I should be aware of?
before answering your questions... i think that user_id should not be in the Talents table... the main idea here is that "for 1 talent you have many users, and for one user you have multiple talent".. so the relation should be NxN, you'll need an intermediary table
see: many to many
now
Do I need to create seperate tables to store sub-talents of this
user?
if you want to do something dynamic (add or remove subtalents) you can use a recursive relationship. That is a table that is related to itself
TABLE TALENT
-------------
id PK
label
parent_id PK FK (a foreign key to table Talent)
see : recursive associations
How should I perform the lookups of the top-level talents for this
user? I.e. in the user table should
there be fields for the ID of each
talent? Or should I perform a lookup
in each top-level talent table to see
if that user_id exists in there?
if you're using the model before, it could be a nightmare to make queries, because your table Talents is now a TREE that can contain multiple levels.. you might want to restrict yourself to a certain number of levels that you want in your Talent's table i guess two is enough.. that way your queries will be easier
Anything else I should be aware of?
when using recursive relations... the foreign key should alow nulls because the top levels talents wont have a parent_id...
Good luck! :)
EDIT: ok.. i've created the model.. to explain it better
Edit Second model (in the shape of a Christmas tree =D ) Note that the relation between Model & Talent and Actor & Talent is a 1x1 relation, there are different ways to do that (the same link on the comments)
to find if user has talents.. join the three tables on the query =)
hope this helps
You should have one table that has everything about the user (name, dob, any other information about the user). You should have one table that has everything about talents (id, talentName, TopLevelTalentID (to store the "sub" talents put a reference to the "Parent" talent)). You should have a third table for the many to many relationship between users and talents: UserTalents which stores the UserID and the TalentID.
Here's an article that explains how to get to 3rd NF:
http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx
This is a good question to show some of the differences and similarities between object oriented thinking and relational modelling.
First of all there are no strict rules regarding creating the tables, it depends on the problem space you are trying to model (however, having a field for each of the tables is not necessary at all and constitutes a design fault - mainly because it is inflexible and hard to query).
For example perfectly acceptable design in this case is to have tables
Names (Name, Email, Bio)
Talents (TalentType references TalentTypes, Email references Names)
TalentTypes (TalentType, Description, Parent references TalentTypes)
The above design would allow you to have hierarchical TalentTypes and also to keep track which names have which talents, you would have a single table from which you could get all names (to avoid registering duplicates), you have a single table from which you could get a list of talents and you can add new talent types and/or subtypes easily.
If you really need to store some special fileds on each of the talent types you can still add these as tables that reference general talents table.
As an illustration
Models (Email references Talents, ModelingSalary) -- with a check constraint that talents contain a record with modelling talent type
Do notice that this is only an illustration, it might be sensible to have Salary in the Talents table and not to have tables for specific talents.
If you do end up with tables for specific talents in a sense you can look at Talents table as sort of a class from which a particular talent or sub-talent inherits properties.
ok sorry for the incorrect answer.. this is a different approach.
The way i see it, a user can have multiple occupations (Actor, Model, Musician, etc.) Usually what i do is think in objects first then translate it into tables. In P.O.O. you'd have a class User and subclasses Actor, Model, etc. each one of them could also have subclasses like TvActor, VoiceOverActor... in a DB you'd have a table for each talent and subtalent, all of them share the same primary key (the id of the user) so if the user 4 is and Actor and a Model, you would have one registry on the Actor's Table and another on the Model Table, both with id=4
As you can see, storing is easy.. the complicated part is to retrieve the info. That's because databases dont have the notion of inheritance (i think mysql has but i haven't tried it).. so if you want to now the subclases of the user 4, i see three options:
multiple SELECTs for each talent and subtalent table that you have, asking if their id is 4.
SELECT * FROM Actor WHERE id=4;SELECT * FROM TvActor WHERE id=4;
Make a big query joining all talent and subtalent table on a left join
SELECT * from User LEFT JOIN Actor ON User.id=Actor.id LEFT JOIN TvActor ON User.id=TvActor.id LEFT JOIN... WHERE User.id=4;
create a Talents table in a NxN relation with User to store a reference of each talent and subtalents that the User has, so you wont have to ask all of the tables. You'd have to make a query on the Talents table to find out what tables you'll need to ask on a second query.
Each one of these three options have their pros and cons.. maybe there's another one =)
Good Luck
PS: ahh i found another option here or maybe it's just the second option improved

Adding new fields vs creating separate table

I am working on a project where there are several types of users (students and teachers). Currently to store the user's information, two tables are used. The users table stores the information that all users have in common. The teachers table stores information that only teachers have with a foreign key relating it to the users table.
users table
id
name
email
34 other fields
teachers table
id
user_id
subject
17 other fields
In the rest of the database, there are no references to teachers.id. All other tables who need to relate to a user use users.id. Since a user will only have one corresponding entry in the teachers table, should I just move the fields from the teachers table into the users table and leave them blank for users who aren't teachers?
e.g.
users
id
name
email
subject
51 other fields
Is this too many fields for one table? Will this impede performance?
I think this design is fine, assuming that most of the time you only need the user data, and that you know when you need to show the teacher-specific fields.
In addition, you get only teachers just by doing a JOIN, which might come in handy.
Tomorrow you might have another kind of user who is not a teacher, and you'll be glad of the separation.
Edited to add: yes, this is an inheritance pattern, but since he didn't say what language he was using I didn't want to muddy the waters...
In the rest of the database, there are no references to teachers.id. All other tables who need to relate to a user
use users.id.
I would expect relating to the teacher_id for classes/sections...
Since a user will only have one corresponding entry in the teachers table, should I just move the fields from the teachers table into the users table and leave them blank for users who aren't teachers?
Are you modelling a system for a high school, or post-secondary? Reason I ask is because in post-secondary, a user can be both a teacher and a student... in numerous subjects.
I would think it fine provided neither you or anyone else succumbs to the temptation to reuse 'empty' columns for other purposes.
By this I mean, there will in your new table be columns that are only populated for teachers. Someone may decide that there is another value they need to store for non-teachers, and use one of the teacher's columns to hold it, because after all it'll never be needed for this non-teacher, and that way we don't need to change the table, and pretty soon your code fills up with things testing row types to find what each column holds.
I've seen this done on several systems (for instance, when loaning a library book, if the loan is a long loan the due date holds the date the book is expected back. but if it's a short loan the due date holds the time it's expected back, and woe betide anyone who doesn't somehow know that).
It's not too many fields for one table (although without any details it does seem kind of suspicious). And worrying about performance at this stage is premature.
You're probably dealing with very few rows and a very small amount of data. You concerns should be 1) getting the job done 2) designing it correctly 3) performance, in that order.
It's really not that big of a deal (at this stage/scale).
I would not stuff all fields in one table. Student to teacher ratio is high, so for 100 teachers there may be 10000 students with NULLs in those 17 fields.
Usually, a model would look close to this:
I your case, there are no specific fields for students, so you can omit the Student table, so the model would look like this
Note that for inheritance modeling, the Teacher table has UserID, same as the User table; contrast that to your example which has an Id for the Teacher table and then a separate user_id.
it won't really hurt the performance, but the other programmers might hurt you if you won't redisign it :) (55 fielded tables ??)

What to do if 2 (or more) relationship tables would have the same name?

So I know the convention for naming M-M relationship tables in SQL is to have something like so:
For tables User and Data the relationship table would be called
UserData
User_Data
or something similar (from here)
What happens then if you need to have multiple relationships between User and Data, representing each in its own table? I have a site I'm working on where I have two primary items and multiple independent M-M relationships between them. I know I could just use a single relationship table and have a field which determines the relationship type, but I'm not sure whether this is a good solution. Assuming I don't go that route, what naming convention should I follow to work around my original problem?
To make it more clear, say my site is an auction site (it isn't but the principle is similar). I have registered users and I have items, a user does not have to be registered to post an item but they do need to be to do anything else. I have table User which has info on registered users and Items which has info on posted items. Now a user can bid on an item, but they can also report a item (spam, etc.), both of these are M-M relationships. All that happens when either event occurs is that an email is generated, in my scenario I have no reason to keep track of the actual "report" or "bid" other than to know who bid/reported on what.
I think you should name tables after their function. Lets say we have Cars and People tables. Car has owners and car has assigned drivers. Driver can have more than one car. One of the tables you could call CarsDrivers, second CarsOwners.
EDIT
In your situation I think you should have two tables: AuctionsBids and AuctionsReports. I believe that report requires additional dictinary (spam, illegal item,...) and bid requires other parameters like price, bid date. So having two tables is justified. You will propably be more often accessing bids than reports. Sending email will be slightly more complicated then when this data is stored in one table, but it is not really a big problem.
I don't really see this as a true M-M mapping table. Those usually are JUST a mapping. From your example most of these will have additional information as well. For example, a table of bids, which would have a User and an Item, will probably have info on what the bid was, when it was placed, etc. I would call this table... wait for it... Bids.
For reporting items you might want what was offensive about it, when it was placed, etc. Call this table OffenseReports or something.
You can name tables whatever you want. I would just name them something that makes sense. I think the convention of naming them Table1Table2 is just because sometimes the relationships don't make alot of sense to an outside observer.
There's no official or unofficial convention on relations or tables names. You can name them as you want, the way you like.
If you have multiple user_data relationships with the same keys that makes absolutely no sense. If you have different keys, name the relation in a descriptive way like: stores_products_manufacturers or stores_products_paymentMethods
I think you're only confused because the join tables are currently simple. Once you add more information, I think it will be obvious that you should append a functional suffix. For example:
Table User
UserID
EmailAddress
Table Item
ItemID
ItemDescription
Table UserItem_SpamReport
UserID
ItemID
ReportDate
Table UserItem_Post
UserID -- can be (NULL, -1, '', ...)
ItemID
PostDate
Table UserItem_Bid
UserId
ItemId
BidDate
BidAmount
Then the relation will have a Role. For instance a stock has 2 companies associated: an issuer and a buyer. The relationship is defined by the role the parent and child play to each other.
You could either put each role in a separate table that you name with the role (IE Stock_Issuer, Stock_Buyer etc, both have a relationship one - many to company - stock)
The stock example is pretty fixed, so two tables would be fine. When there are multiple types of relations possible and you can't foresee them now, normalizing it into a relationtype column would seem the better option.
This also depends on the quality of the developers having to work with your model. The column approach is a bit more abstract... but if they don't get it maybe they'd better stay away from databases altogether..
Both will work fine I guess.
Good luck, GJ
GJ