Recommended Table Set up for one to many/many to one situation - sql

I need to create a script where someone will post an opening for a position, and anyone who is eligible will see the opening but anyone who is not (or opts out) will not see the opening. So two people could go to the same page and see different content, some potentially the same, some totally unique. I'm not sure the best way to arrange that data in a MySQL DB/table.
For instance, I could have it arranged by the posting, but that would look sort of like:
PostID VisibleTo
PostingA user1,user2
And that seems wrong (the CSV style in the column). Or I could go with by person:
User VisiblePosts
user1 posting1, posting2
But it's the same problem. Is there a way to make the user's unique, the posting unique, and have them join only where they match?
The decision is initially made by doing a series of queries to another set of tables, but once that is run, it seems inefficient to have that some chunk of code run again and again when it won't change after the user posts the position.
...On second thought, it MIGHT change, but if we assume it doesn't (as it is unlikely, and as little consequence if a user sees something that they are no longer eligible for), is there a standard solution for this scenario?

Three tables...
User:
[UserId]
[OtherField]
Post:
[PostId]
[OtherFields]
UserPost:
[UserId]
[PostId]
User.UserId Joins to UserPost.UserId,
Post.PostId Joins to UserPost.PostId
Then look up the table UserPost, joining to Post when you are selecting which posts to show

This is a many-to-many relationship or n:m relationship.
You would create an additional table, say PostVisibility, with a column PostID and UserID. If a combination of PostID and UserID is present in the table, that post is visible to that user.

Edit: Sorry, I think you are speaking in Posting-User terms, which is many-to-many. I was thinking of this in terms of posting-"viewing rights" terms, which is one-to-many.
Unless I am missing something, this is a one-to-many situation, which requires two tables. E.g., each posting has n users who can view it. Postings are unique to an individual user, so you don't need to do the reverse.
PostingTable with PostingID (and other data)
PostingVisibilityTable with PostingID and UserID
UserTable with UserID and user data
Create the postings independently of their visibility rights, and then separately add/remove PostingID/UserID pairs against the Visibility table.
To select all postings visible to the current user:
SELECT * FROM PostingTable A INNER JOIN PostingVisibilityTable B ON A.PostingID = B.PostingID WHERE B.UserID = "currentUserID"

Related

Get all Many:Many relationships from reference/join-table

I am having difficulty querying for possible relationships in a Many:Many scenario.
I present my schema:
What I do know how to query with this schema is:
All Bands that a given User belongs to.
All Users that belong to a given Band.
What I am trying to do is:
Get all Band Members across all Bands that a given User belongs to.
ie, say I am in 5 bands, I want to know who all of my bandmates are.
My first questions are:
Is there a name for this type of query? Where I am more interested in the joined relationships than what I am joined to (just saying that made me want to put this whole system into a Graph DB :/ )? I'd like to learn proper terminology to help me google for problems down the road.
Is this a terrible idea in RDBMS land in general? I feel like this should be a common use case but I want to know if I'm totally approaching this wrong.
To recap:
I am looking to query the above schema with the expected output being one row per User as Band Members that a given User shares a Band with.
Terminology
Your terminology seems to be correct - "many to many", often written as "many:many" with a colon. Sometimes the middle table (band_members) is called the "bridge table".
You can probably drop band_members.id, since the two foreign keys also make up a composite primary key (and the primary key can actually be defined that way, since normally a User cannot be a member of the same Band twice. The only exception to that is if a User could have more than one role in the same Band).
Solution
On the surface of it, this sounds easy - we can see the relationships of the tables, and one would normally just use an INNER JOIN between them. There are three tables, so that would be two joins.
However, we have to conceptualise the problem correctly first. The problem we have is that the join between Users and Band Members (user ID) is actually to be used for two things:
which User is in what Band
filtering by User
So to do this we need to introduce one table with multiple purposes:
SELECT
Users.first_name, Users.last_name
FROM Users
INNER JOIN Band_Members Band_Members1 ON (Band_Members1.user_id = Users.Id)
INNER JOIN Band_Members Band_Members2 ON (Band_Members1.band_id = Band_Members2.band_id)
WHERE
Band_Members2.user_id = 1
You can see here that I have joined Band_Members twice, and when one does that, one has to alias them differently, so they can be separately referenced. The first instance does the obvious join between the Users table and the bridge table, and the second one does a link between "Users who are in Bands" and "Bands that I am in".
Of course, this solution requires that you know your User ID. If you had wanted to do a similar query but filter based on your name, then you would have to join to another (re-aliased) copy of the User table, so that you can differentiate between the two different purposes: "Users who are in bands" and "your User".

Proper Design For SQL table connecting to many other tables

I have an app where users can log comments related to any specific entity that the system has and I wanted to know if there is a "best practice" way to handle the design of the db for this kind of feature.
For example: Three current entities (tables) deal are Question, Documentation, ReferenceMaterial (self explanatory what each hold). The user can leave a comment on any one of those particular items and all comments are simply a varchar field, user id, and date of comment. EDIT: Comments can also belong to more than one entity. For example, all Question entities belong to a Quiz or Test entity. Each of those (Quiz and Test), can also have comments associated with themselves. So you could run a report to see all comments left for a test and easily just query the Comment table for every record with that test foreign key, or you could limit your query to just the comments left for questions in that test, or a particular question itself. It offered a lot of flexibility END EDIT
Right now the way that I hvae this is one Comment table with a foreign key relationship with each of the other entity tables (i.e. fkQuestion, fkDocumentation, fkReferenceMaterial, etc). So all comments in the system are stored in this table and based on what page the user is on, I conduct the join to that particular entity's records.
Is there a best practice way of doing this?
Thanks in advance for any help.

Link table(s) or redundant columns, SQL optimisation

I have two tables, Users and People, both of which share a common attribute, email address, of which they should be allowed to have many email addresses.
I can see three options myself:
One link table with redundant columns:
Users [id,email_id] and People [id,email_id]
EmailAddress [id,user_id,person_id,email_id]
Emails [id,address,type]
Two link tables without redundancies:
Users [id,email_id] and People [id,email_id]
PersonEmail [id,person_id,email_id]
UserEmail [id,user_id,email_id]
Emails [id,address,type]
No link tables with redundant columns:
Users [id] and People [id]
Emails [id,address,type,user_id,person_id]
Does anyone have any idea what would be the best option, or if there is any other ways? Also, if anyone knows how to implement or feel it is better to have link tables without the generated id column please also specify.
Update: a User has many People, a person belongs to a User
First off, the relationship between user and e-mail is 1:N, not M:N, so in any case you don't need the "link" table EmailAddress.
You need to decide which of these possibilities is true for your application:
User is always person.
Person is always user.
There can be a person that is not user and there can be a user that is not person.
Option 1:
Assuming the option (1) is the correct one, the logical model should look like this:
The symbol between Person and User is "category", which at the level of the physical database can be implemented either:
as a "1 to 0 or 1" relationship between separate tables Person and User,
or a single table containing both person and user fields, where user fields are NULL for persons that are not also users.
If you have...
many user-specific fields,
there are user-specific foreign keys,
new kinds of persons could be added in the future
and you don't need to squeeze-out every last drop of performance,
...choose the implementation strategy with two tables.
If there are:
relatively few user-specific fields,
there are no user-specific relationships,
low "evolvability" is acceptable
and performance is of high importance,
...choose the implementation strategy with the single table.
Similar analysis can be done for each of the remaining possibilities...
Option 2:
Option 3:
If the two entities are conceptually related, then it might make sense to have one table. But if they are two different concepts, then in my experience it is best to have separate tables in order to avoid future confusion. And you're not going to take a big hit anywhere by doing so.
Isn't the User a Person (People)?
That would solve the redundant field issue right away.
----------
| Person |
----------
|
--------
| User |
--------
The User should have the single e-mail field, or mantain the relation with the e-mails table, since Person is an abstract concept not related to any application.
I would say start thinking about (re)modeling your schema, so you won't have problems like this.
Read the Multiple Table Inheritance in Rails guide, that should get you started.

What to do if 2 (or more) relationship tables would have the same name?

So I know the convention for naming M-M relationship tables in SQL is to have something like so:
For tables User and Data the relationship table would be called
UserData
User_Data
or something similar (from here)
What happens then if you need to have multiple relationships between User and Data, representing each in its own table? I have a site I'm working on where I have two primary items and multiple independent M-M relationships between them. I know I could just use a single relationship table and have a field which determines the relationship type, but I'm not sure whether this is a good solution. Assuming I don't go that route, what naming convention should I follow to work around my original problem?
To make it more clear, say my site is an auction site (it isn't but the principle is similar). I have registered users and I have items, a user does not have to be registered to post an item but they do need to be to do anything else. I have table User which has info on registered users and Items which has info on posted items. Now a user can bid on an item, but they can also report a item (spam, etc.), both of these are M-M relationships. All that happens when either event occurs is that an email is generated, in my scenario I have no reason to keep track of the actual "report" or "bid" other than to know who bid/reported on what.
I think you should name tables after their function. Lets say we have Cars and People tables. Car has owners and car has assigned drivers. Driver can have more than one car. One of the tables you could call CarsDrivers, second CarsOwners.
EDIT
In your situation I think you should have two tables: AuctionsBids and AuctionsReports. I believe that report requires additional dictinary (spam, illegal item,...) and bid requires other parameters like price, bid date. So having two tables is justified. You will propably be more often accessing bids than reports. Sending email will be slightly more complicated then when this data is stored in one table, but it is not really a big problem.
I don't really see this as a true M-M mapping table. Those usually are JUST a mapping. From your example most of these will have additional information as well. For example, a table of bids, which would have a User and an Item, will probably have info on what the bid was, when it was placed, etc. I would call this table... wait for it... Bids.
For reporting items you might want what was offensive about it, when it was placed, etc. Call this table OffenseReports or something.
You can name tables whatever you want. I would just name them something that makes sense. I think the convention of naming them Table1Table2 is just because sometimes the relationships don't make alot of sense to an outside observer.
There's no official or unofficial convention on relations or tables names. You can name them as you want, the way you like.
If you have multiple user_data relationships with the same keys that makes absolutely no sense. If you have different keys, name the relation in a descriptive way like: stores_products_manufacturers or stores_products_paymentMethods
I think you're only confused because the join tables are currently simple. Once you add more information, I think it will be obvious that you should append a functional suffix. For example:
Table User
UserID
EmailAddress
Table Item
ItemID
ItemDescription
Table UserItem_SpamReport
UserID
ItemID
ReportDate
Table UserItem_Post
UserID -- can be (NULL, -1, '', ...)
ItemID
PostDate
Table UserItem_Bid
UserId
ItemId
BidDate
BidAmount
Then the relation will have a Role. For instance a stock has 2 companies associated: an issuer and a buyer. The relationship is defined by the role the parent and child play to each other.
You could either put each role in a separate table that you name with the role (IE Stock_Issuer, Stock_Buyer etc, both have a relationship one - many to company - stock)
The stock example is pretty fixed, so two tables would be fine. When there are multiple types of relations possible and you can't foresee them now, normalizing it into a relationtype column would seem the better option.
This also depends on the quality of the developers having to work with your model. The column approach is a bit more abstract... but if they don't get it maybe they'd better stay away from databases altogether..
Both will work fine I guess.
Good luck, GJ
GJ

What is the best way to store a threaded message list/tree in SQL?

I'm looking for the best way to store a set of "posts" as well as comments on those posts in SQL. Imagine a design similar to a "Wall" on Facebook where users can write posts on their wall and other users can comment on those posts. I need to be able to display all wall posts as well as the comments.
When I first started out, I came up with a table such as:
CREATE Table wallposts
(
id uuid NOT NULL,
posted timestamp NOT NULL,
userid uuid NOT NULL,
posterid uuid NOT NULL,
parentid uuid NOT NULL,
comment text NOT NULL
)
id is unique, parentid will be null on original posts and point to an id if the row is a comment on an existing post. Easy enough and super fast to insert new data. However, doing a select which would return me:
POST 1
COMMENT 1
COMMENT 2
POST 2
COMMENT 1
COMMENT 2
Regardless of which order the rows existed in the database proved to be extremely difficult. I obviously can't just order by date, as someone might comment on post 1 after post 2 has been posted. If I do a LEFT JOIN to get the parent post on all rows, and then sort by that date first, all the original posts group together as they'd have a value of null.
Then I got this idea:
CREATE TABLE wallposts
(
id uuid NOT NULL,
threadposted timestamp,
posted timestamp,
...
comment text
)
On an original post, threadposted and posted would be the same. On a comment, timestamp would be the time the original post was posted and "posted" would be the time the comment on that thread was posted. Now I can just do:
select * from wallposts order by threadposted, posted;
This works great, however one thing irks me. If two people create a post at the same time, comments on the two posts would get munged together as they'd have the same timestamp. I could use "ticks" instead of a datetime, but still the accuracy is only 1/1000 of a second. I could also setup a unique constraint on threadposted and posted which makes inserts a bit more expensive, but if I had multiple database servers in a farm, the chance of a collision is still there. I almost went ahead with this anyway since the chances of this happening are extremely small, but I wanted to see if I could eat my cake and still have it too. Mostly for my own educational curiosity.
Third solution would be to store this data in the form of a graph. Each node would have a v-left and v-right pointer. I could order by "left" which would traverse the tree in the order I need. However, every time someone inserts a comment I'd have to re balance the whole tree. This would create a ton of row locking, and all sorts of problems if the site was very busy. Plus, it's kinda extreme and also causes replication problems. So I tossed this idea quickly.
I also thought about just storing the original posts and then serializing the comments in a binary form, since who cares about individual comments. This would be very fast, however if a user wants to delete their comment or append a new comment to the end, I have to deserialize this data, modify the structure, then serialize it back and update the row. If a bunch of people are commenting on the same post at the same time, I might have random issues with that.
So here's what I eventually did. I query for all the posts ordered by date entered. In the middle ware layer, I loop through the recordset and create a "stack" of original posts, each node on the stack points to a linked list of comments. When I come across an original post, I push a new node on the stack and when I come across a comment I add a node to the linked list. I organize this in memory so I can traverse the recordset once and have O(n). After I create the in-memory representation of the wall, I traverse through this data structure again and write out HTML. This works great and has super fast inserts and super fast selects, and no weird row locking issues; however it's a bit heavier on my presentation layer and requires me to build an in memory representation of the user's wall to move stuff around so it's in the right order. Still, I believe this is the best approach I've found so far.
I thought I'd check with other SQL experts to see if there's a better way to do this using some weird JOINS or UNIONS or something which would still be performant with millions of users.
I think you're better off using a simpler model with a "ParentID" on Comment to allow for nesting comments. I don't think it's usually a good practice to use datetimes as keys, especially in this case, where you don't really need to, and an identity ID will be sufficient. Here's a basic example that might work:
Post
----
ID (PK)
Timestamp
UserID (FK)
Text
Comment
-------
ID (PK)
Timestamp
PostID (FK)
ParentCommentID (FK nullable) -- allows for nested comments
Text
Do you want people to be able to comment on other comments, i.e. does the tree have infinite depth?
If you just want to have posts and then comments on those posts then you were on the right lines to start with and I believe the following SQL would meet that requirement (Untested so may be typos)
SELECT posts.id,
posts.posted AS posted_at,
posts.userid AS posted_by,
posts.posterid,
posts.comment AS post_text,
comments.posted AS commented_at,
comments.userid AS commented_by,
comments.comment AS comment_text
FROM wallposts AS posts
LEFT OUTER JOIN wallposts AS comments ON comments.parent_id = posts.id
ORDER BY posts.posted, comments.posted
This technique, a self-join, simply joins the table to itself using table aliases to specify the joins.
You should look into "nested sets". They allow retrieving a hierarchy very easily with a single query.
Here's an article about them
If you are using SQL server 2008, it has built-in support for it through the "hierarchyID" type.
Inserts and updates are more costly and complicated if you don't have the built in support), but querying is much faster and easier.
EDIT:
Damn, missed the part where you already knew about it. (was checking from a mobile phone).
If we stick to your table design … I think you would need some special value in the parentid column to separate original posts from comments (maybe just NULL, if you change definition of that column to nullable). Then, self-join will work. Something like this:
SELECT posts.comment as [Original Post],
comments.comment as Comment
FROM wallposts AS posts
LEFT OUTER JOIN wallposts AS comments
ON posts.id=comments.parentID
WHERE posts.parentID IS NULL
ORDER BY posts.posted, comments.posted
The result set shows Original Post before every comment, and has the right order.
(This was done using SQL Server, so I'm not sure if it works in your environment.)