How should I design a table where a row can have different columns depending on the type of row? - sql

I'm planning to use the Reddit API and store my saved posts in a database. A save can be one of two types, a comment or a post. Both share a few common columns (author, score, subreddit, etc.) and each has a few columns of its own:
comment - body_text, comment_id, parent_id, etc.
posts - selftext,link_url,is_video, etc.
I decided to separate the 2 categories into their own tables - Comments table and Posts table. But I don't know how to link these tables to the master table "saves".
My current solution is to have a column kind for the type of save. The comment_id and post_id columns link the save to its own table. However, this feels messy and a bit cumbersome. A save must have either a comment_id or a post_id (but not both, and not neither), and I also have to enforce this constraint myself.
Saves Table :
+----+---------+-------+---------------------------------------------+---------+------------+---------+
| ID | Kind | title | post_url | author | comment_id | post_id |
+----+---------+-------+---------------------------------------------+---------+------------+---------+
| 1 | comment | abc | https://redd.i/redditpostid/redditcommentid | FusionX | 1 | NULL |
| 2 | post | xyz | https://redd.i/redditpostid | XnoisuF | NULL | 1 |
+----+---------+-------+---------------------------------------------+---------+------------+---------+
Post Table :
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
| ID | is_self | selftext | post_url | num_comments | thumbnail |
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
| 1 | no | NULL | i.imgur.com/xyz.jpg | 1020 | someimageurl |
| 2 | yes | "some random selftext of variable length" | redd.it/redditpostid/ | 10 | |
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
Comment table:
+----+---------------------------------+---------------------+--------------------+
| ID | body_html | reddit_comment_id | reddit_parent_id |
+----+---------------------------------+---------------------+--------------------+
| 1 | comment text of variable length | <reddit comment id> | <reddit parent id> |
+----+---------------------------------+---------------------+--------------------+
(reddit ID's are different from my table's own IDs and are only relevant at reddit's end)
Is there a better way to design this database?

I think you should move the owning side of the relation to the two other tables.
So instead of having comment_id and post_id columns in saves table, have a saves_id column in post table and comment table.
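A minimal sketch of that layout, using Python's built-in sqlite3 module only so that it can be run as-is. The column subset and the sample rows are assumptions for illustration, and the UNIQUE constraint on saves_id is my addition to keep each save tied to at most one row per child table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE saves (
    id     INTEGER PRIMARY KEY,
    kind   TEXT NOT NULL CHECK (kind IN ('comment', 'post')),
    title  TEXT,
    author TEXT
);
-- Each child row points back at its save (the child owns the relation).
CREATE TABLE posts (
    id       INTEGER PRIMARY KEY,
    saves_id INTEGER NOT NULL UNIQUE REFERENCES saves(id),
    is_self  INTEGER,
    selftext TEXT
);
CREATE TABLE comments (
    id        INTEGER PRIMARY KEY,
    saves_id  INTEGER NOT NULL UNIQUE REFERENCES saves(id),
    body_html TEXT
);
""")

conn.execute("INSERT INTO saves (id, kind, title, author) VALUES (1, 'comment', 'abc', 'FusionX')")
conn.execute("INSERT INTO saves (id, kind, title, author) VALUES (2, 'post', 'xyz', 'XnoisuF')")
conn.execute("INSERT INTO comments (saves_id, body_html) VALUES (1, 'comment text')")
conn.execute("INSERT INTO posts (saves_id, is_self, selftext) VALUES (2, 1, 'some selftext')")

# One row per save, with the type-specific columns filled in where they apply.
rows = conn.execute("""
    SELECT s.id, s.kind, c.body_html, p.selftext
    FROM saves AS s
    LEFT JOIN comments AS c ON c.saves_id = s.id
    LEFT JOIN posts    AS p ON p.saves_id = s.id
    ORDER BY s.id
""").fetchall()
print(rows)
```

Note that this removes the two nullable columns from saves, but on its own it does not stop a save from having both a comment row and a post row; that rule would still need to be enforced separately.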

Related

A good SQL Structure for grouped comments with replies

I have two tables: News and Images. Both can have comments, so I decided to try a generic comments table. Comments can also have replies. I have come up with two possible methods, but I don't know which to choose in terms of good practice or good performance.
Method 1 (which i am using):
News:
| ID | CommentGroup | Content | ...etc
Images:
| ID | CommentGroup | Url | ...etc
Considering the next image:
| 14 | 22 | http://image.gif | ...etc
Where the comments could be these:
|UserA:
| Comment1
|
|--|UserB -> UserA:
| Comment2
|
|---|UserC -> UserB:
| | Comment4
|
|UserD -> UserA:
| Comment3
Resulting Comments:
| ID | Group | ReplyGroup | Replied | Content | User |
| 13 | 22 | NULL | 1 | Comment1 | UserA |
| 17 | 22 | 13 | 1 | Comment2 | UserB |
| 11 | 22 | 13 | NULL | Comment3 | UserD |
| 15 | 22 | 17 | NULL | Comment4 | UserC |
If a news item is created after Image 14 has been commented on, I choose the group number for its future comments by taking the maximum of the Group column (22) and adding 1 (23).
News:
| ID | CommentGroup | Content | ...etc
| 14 | 23 | A new | ...etc
Comments:
| ID | Group | ReplyGroup | Replied | Content | User |
| 22 | 23 | NULL | 1 | Comment1 | UserA |
| 30 | 23 | 22 | NULL | Comment2 | UserB |
Method 2
taken from this question:
News:
| ID | Content | ...etc
Images:
| ID | Url | ...etc
Comments:
| ID | Group | Type | ReplyGroup | Replied | Content | User |
Where Type distinguishes between a News group and an Images group.
Which method do you think is better?
Or what other solutions are possible?
Thanks.
Initially as a basic implementation I would treat everything as 'content' grouping common attributes.
CREATE TABLE content (
    id int primary key,
    created_on datetime,
    created_by int
);
Then have more specific tables for each type of content,
e.g.
CREATE TABLE news (
    content_id int primary key foreign key references content(id),
    article nvarchar(max)
);
and
CREATE TABLE images (
    content_id int primary key foreign key references content(id),
    url varchar(1000)
);
and
CREATE TABLE comments (
    content_id int primary key foreign key references content(id),
    parent_id int foreign key references content(id),
    root_id int foreign key references content(id),
    level int,
    text nvarchar(2000)
);
Each of these would have a 1:1 relationship with CONTENT.
COMMENTS would then reference other content 'directly' via the parent_id, the reference being either an image, news or indeed another comment.
The root_id in the COMMENTS would reference the actual image or news content (as would the parent_id of all 'top level' comments). This adds the overhead of maintaining the root_id (which shouldn't be too difficult) but will aid selecting comments for some content.
e.g.
-- get the article
SELECT *
FROM content
JOIN news
ON news.content_id = content.id
JOIN users
ON users.id = content.created_by
WHERE content.id = @news_id;
-- get the comments
SELECT *
FROM content
JOIN comments
ON comments.content_id = content.id
JOIN users
ON users.id = content.created_by
WHERE comments.root_id = @news_id;
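To show what maintaining root_id could look like in practice, here is a rough sketch using Python's sqlite3 module (SQLite types instead of the SQL Server ones above). The add_comment helper, the trimmed-down columns and the sample ids are all hypothetical, not part of the design itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE content (
    id         INTEGER PRIMARY KEY,
    created_by INTEGER
);
CREATE TABLE news (
    content_id INTEGER PRIMARY KEY REFERENCES content(id),
    article    TEXT
);
CREATE TABLE comments (
    content_id INTEGER PRIMARY KEY REFERENCES content(id),
    parent_id  INTEGER REFERENCES content(id),
    root_id    INTEGER REFERENCES content(id),
    level      INTEGER,
    text       TEXT
);
""")

def add_comment(conn, comment_id, parent_id, text):
    """Insert a comment, deriving root_id and level from its parent."""
    parent = conn.execute(
        "SELECT root_id, level FROM comments WHERE content_id = ?", (parent_id,)
    ).fetchone()
    if parent is None:
        # The parent is not a comment, so it is the news/image item itself.
        root_id, level = parent_id, 1
    else:
        # Replies inherit the root of the comment they answer.
        root_id, level = parent[0], parent[1] + 1
    conn.execute("INSERT INTO content (id) VALUES (?)", (comment_id,))
    conn.execute(
        "INSERT INTO comments (content_id, parent_id, root_id, level, text) "
        "VALUES (?, ?, ?, ?, ?)",
        (comment_id, parent_id, root_id, level, text),
    )

# A news item (content id 1) with two top-level comments and one reply.
conn.execute("INSERT INTO content (id, created_by) VALUES (1, 7)")
conn.execute("INSERT INTO news (content_id, article) VALUES (1, 'An article')")
add_comment(conn, 2, 1, "Comment1")
add_comment(conn, 3, 2, "Comment2, a reply to Comment1")
add_comment(conn, 4, 1, "Comment3")

# All comments for the article, however deeply nested, in a single lookup.
thread = conn.execute(
    "SELECT content_id, level FROM comments WHERE root_id = ? ORDER BY content_id", (1,)
).fetchall()
print(thread)
```

The extra lookup on insert is the maintenance overhead mentioned above; in exchange, fetching a whole thread needs no recursion.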

SQL relationships and best practices

If I have a User table and a Roles table.
What is the usual practice/pattern for adding the relationship?
Do I create an extra column in the User table for the RoleID or do people usually create a Relationships table like so:
Relationships Table
RelationshipID | UserID | RoleID |... any other relations a user might have
As for the last bit: as a user you might create an endless number of different types of things that all need to be related to you. Do you instead add the relationship to each individual table created for each individual thing? For example:
Pages Table
PageID | Title | Content | Author (UserID)
and so another table would also be similar to this:
Comments Table
CommentID | Comment | Author (UserID)
In this case, I would need to expand upon the Relationships table if I were to do it that way:
Relationships Table
RelationshipID | UserID | RoleID | CommentID
and I'd probably only want to fill in the UserID and CommentID, as this relationship is not for the Roles (that is governed by another entry). So, for example, the values for a comment relationship might be:
AUTO | 2 | NULL | 16
I could imagine a multi purpose Revisions table being handy...
Revisions Table
RevisionID | DateCreated | UserID | ActionTypeID | ModelTypeID | Status | RelatedItemID
---------------------------------------------------------------------------------------
1 | <Now> | 3 | 4 (Delete) | 6 (Page) | TRUE | 38
2 | <Now> | 3 | 1 (Delete) | 5 (Comment) | TRUE | 10
3 | <Now> | 3 | 1 (Add) | 5 (Comment) | FALSE | 10
but not for a general Relationships table...
Does this sound correct?
Edit since comments:
They stated that the relationships table should be made due to the many-to-many (data model)
So let's take my previous example of my possible relationship table:
Relationships Table Old
RelationshipID | UserID | RoleID | CommentID... etc
Should it actually be something more like this:
Relationships Table New
RelationshipID | ItemID | LinkID | ItemType | LinkType | Status
---------------------------------------------------------------------------------
1 | 23(PageID) | 7(UserID) | ("Page") | ("User") | TRUE
2 | 22(CommentID) | 7(UserID) | ("Comment") | ("User") | TRUE
3 | 22(CommentID) | 23(PageID) | ("Comment") | ("Page") | TRUE

RDBMS schema for unknown columns

I have a project with a MySQL database, and I would like to be able to upload various datasets. Say I am building a restaurant reviews aggregator. So we would like to keep adding all sources of restaurant reviews we could get our hands on, and keeping all the information.
I have a table review_sources
=========================
| id | name |
=========================
| 1 | Zagat |
| 2 | GoodEats Magazine|
| ... |
| 50 | Allergy News |
=========================
Now say I have a table reviews
=====================================================================
| id | Restaurant Name | source_id | Star Rating | Description |
=====================================================================
| 0 | Joey's Burgers | 1 | 3.5 | Wow! |
| 1 | Jamal's Steaks | 1 | 3.5 | Yummy! |
| 2 | Jenny's Crepes | 1 | 4.5 | Sweet! |
| .... |
| 253| Jeeva's Curries | 3 | 4 | Spicy! |
=====================================================================
Now suppose someone wants to add reviews from "Allergy News", they have a field "nut-free". Or a source of reviews could describe the degree of kashrut compliance, or halal compliance or vegan-friendliness. I as a designer don't know the possible optional fields future data sources may have. I want to be able to answer queries:
What are all the fields in the Zagat reviews?
For review id=x, what is value of the optional field "vegan-friendly"?
So how do I design a schema that can handle these disparate data sources and answer these queries? My reasons for not going for NoSQL are that I do want certain types of normalization, and that this is part of an existing MySQL based project.
I'd use a many-to-many relationship with a table containing a review_id, a field (e.g. "vegan-friendly") and the value of the field. Then of course a reviews_fields table to map one to the other.
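Roughly what I have in mind, as a runnable sketch with Python's sqlite3 module (the same SQL should carry over to MySQL with minor type changes). The review_fields table name, the review with id 300 and all the sample values are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE review_sources (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE reviews (
    id              INTEGER PRIMARY KEY,
    restaurant_name TEXT NOT NULL,
    source_id       INTEGER NOT NULL REFERENCES review_sources(id)
);
-- One row per optional field of a review (hypothetical table name).
CREATE TABLE review_fields (
    review_id INTEGER NOT NULL REFERENCES reviews(id),
    field     TEXT NOT NULL,
    value     TEXT,
    PRIMARY KEY (review_id, field)
);
""")
conn.executemany("INSERT INTO review_sources VALUES (?, ?)",
                 [(1, "Zagat"), (50, "Allergy News")])
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)",
                 [(0, "Joey's Burgers", 1), (300, "Some Cafe", 50)])
conn.executemany("INSERT INTO review_fields VALUES (?, ?, ?)",
                 [(0, "vegan-friendly", "no"),
                  (300, "nut-free", "yes"),
                  (300, "vegan-friendly", "yes")])

# "What are all the fields in the Zagat reviews?"
zagat_fields = [row[0] for row in conn.execute("""
    SELECT DISTINCT f.field
    FROM review_fields AS f
    JOIN reviews        AS r ON r.id = f.review_id
    JOIN review_sources AS s ON s.id = r.source_id
    WHERE s.name = ?
    ORDER BY f.field
""", ("Zagat",))]

# "For review id=x, what is the value of the optional field 'vegan-friendly'?"
value = conn.execute(
    "SELECT value FROM review_fields WHERE review_id = ? AND field = ?",
    (300, "vegan-friendly"),
).fetchone()[0]
print(zagat_fields, value)
```

All values end up stored as text here, so any typing or validation of the optional fields would have to happen in the application.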
Cheers

How to store Goals (think RPG Quest) in SQL

Someone asked me today how they should store quest goals in a SQL database. In this context, think of an RPG. Goals could include some of the following:
Discover [Location]
Kill n [MOB Type]
Acquire n of [Object]
Achieve a [Skill] in [Skillset]
All the other things you get in RPGs
The best I could come up with is:
Quest 1-* QuestStep
QuestStep 1-* MobsToKill
QuestStep 1-* PlacesToFind
QuestStep 1-* ThingsToAcquire
QuestStep 1-* etc.
This seems a little clunky - Should they be storing a query of some description instead (or a formula or ???)
Any suggestions appreciated
User can embark on many quests.
One quest belongs to one user only (in this model).
One quest has many goals, one goal belongs to one quest only.
Each goal is one of possible goals.
A possible goal is an allowed combination of an action and an object of the action.
PossibleGoals table lists all allowed combinations of actions and objects.
Goals are ordered by StepNo within a quest.
Quantity defines how many objects should an action act upon, (kill 5 MOBs).
Object is a super-type for all possible objects.
Location, MOBType, and Skill are object sub-types, each with different properties (columns).
I would create something like this.
For the Quest table:
| ID | Title | FirstStep (Foreign key to QuestStep table) | etc.
The QuestStep table
| ID | Title | Goal (Foreign key to Goal table) | NextStep (ID of next QuestStep) | etc.
Of course, this is where the hard part starts: how do we describe the goals? I'd say create one record for the goal in the Goal table and save each of the goal's fields (i.e. how many mobs of what type to kill, what location to visit, etc.) in a GoalFields table, thus:
Goal table:
| ID | Type (type is one from an Enum of goal types) |
The GoalFields Table
| ID | Goal (Foreign key to goal) | Field | Value |
I understand that this can be a bit vague, so here is an example of what the data in the database could look like.
Quest table
| 0 | "Opening quest" | 0 | ...
| 1 | "Time for a Sword" | 2 | ...
QuestStep table
| 0 | "Go to the castle" | 0 | 1 | ...
| 1 | "Kill two fireflies" | 1 | NULL | ...
| 2 | "Get a sword" | 2 | NULL | ...
Goal table
| 0 | PlacesToFind |
| 1 | MobsToKill |
| 2 | ThingsToAcquire |
GoalFields table
| 0 | 0 | Place | "Castle" |
| 1 | 1 | Type | "firefly" |
| 2 | 1 | Amount | 2 |
| 3 | 2 | Type | "sword" |
| 4 | 2 | Amount | 1 |
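To make that concrete, here is a small sketch (Python's sqlite3, purely for illustration) that loads the Goal and GoalFields rows above and reassembles one goal. The snake_case table names and the load_goal helper are hypothetical, and every value is stored as text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE goal (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL
);
CREATE TABLE goal_fields (
    id      INTEGER PRIMARY KEY,
    goal_id INTEGER NOT NULL REFERENCES goal(id),
    field   TEXT NOT NULL,
    value   TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO goal (id, type) VALUES (?, ?)", [
    (0, "PlacesToFind"), (1, "MobsToKill"), (2, "ThingsToAcquire"),
])
conn.executemany(
    "INSERT INTO goal_fields (id, goal_id, field, value) VALUES (?, ?, ?, ?)", [
        (0, 0, "Place", "Castle"),
        (1, 1, "Type", "firefly"),
        (2, 1, "Amount", "2"),
        (3, 2, "Type", "sword"),
        (4, 2, "Amount", "1"),
    ])

def load_goal(conn, goal_id):
    """Return a goal's type plus its field/value pairs as a dict."""
    (goal_type,) = conn.execute(
        "SELECT type FROM goal WHERE id = ?", (goal_id,)
    ).fetchone()
    fields = dict(conn.execute(
        "SELECT field, value FROM goal_fields WHERE goal_id = ? ORDER BY id",
        (goal_id,),
    ))
    return {"type": goal_type, "fields": fields}

print(load_goal(conn, 1))
```

The game code would then interpret the fields according to the goal type, e.g. converting Amount to a number for MobsToKill.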

Retrieve comma delimited data from a field

I've created a form in PHP that collects basic information. I have a list box that allows multiple items selected (i.e. Housing, rent, food, water). If multiple items are selected they are stored in a field called Needs separated by a comma.
I have created a report ordered by each person's needs. The people who have only one need are sorted correctly, but the people who have multiple needs are sorted by the exact string passed to the database (i.e. housing, rent, food, water), which is not what I want.
Is there a way to separate the multiple values in this field using SQL to count each need instance/occurrence as 1 so that there are no comma delimitations shown in the results?
Your database is not in the first normal form. A non-normalized database will be very problematic to use and to query, as you are actually experiencing.
In general, you should be using at least the following structure. It can still be normalized further, but I hope this gets you going in the right direction:
CREATE TABLE users (
user_id int,
name varchar(100)
);
CREATE TABLE users_needs (
need varchar(100),
user_id int
);
Then you should store the data as follows:
-- TABLE: users
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | joe |
| 2 | peter |
| 3 | steve |
| 4 | clint |
+---------+-------+
-- TABLE: users_needs
+---------+----------+
| need | user_id |
+---------+----------+
| housing | 1 |
| water | 1 |
| food | 1 |
| housing | 2 |
| rent | 2 |
| water | 2 |
| housing | 3 |
+---------+----------+
Note how the users_needs table is defining the relationship between one user and one or many needs (or none at all, as for user number 4.)
To normalise your database further, you should also use another table called needs, as follows:
-- TABLE: needs
+---------+---------+
| need_id | name |
+---------+---------+
| 1 | housing |
| 2 | water |
| 3 | food |
| 4 | rent |
+---------+---------+
Then the users_needs table should just refer to a candidate key of the needs table instead of repeating the text.
-- TABLE: users_needs (instead of the previous one)
+---------+----------+
| need_id | user_id |
+---------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 4 | 2 |
| 2 | 2 |
| 1 | 3 |
+---------+----------+
You may also be interested in checking out the following Wikipedia article for further reading about repeating values inside columns:
Wikipedia: First normal form - Repeating groups within columns
UPDATE:
To fully answer your question, if you follow the above guidelines, sorting, counting and aggregating the data should then become straight-forward.
To sort the result-set by needs, you would be able to do the following:
SELECT users.name, users_needs.need
FROM users
INNER JOIN users_needs ON (users_needs.user_id = users.user_id)
ORDER BY users_needs.need;
You would also be able to count how many needs each user has selected, for example:
SELECT users.name, COUNT(users_needs.need) as number_of_needs
FROM users
LEFT JOIN users_needs ON (users_needs.user_id = users.user_id)
GROUP BY users.user_id, users.name
ORDER BY number_of_needs;
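For completeness, here is the fully normalised variant (with the separate needs lookup table) as a runnable sketch using Python's sqlite3 module. The sample rows are the ones from the tables above; the primary keys and the secondary sort on the user name are my additions to make the output deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE needs (need_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users_needs (
    need_id INTEGER REFERENCES needs(need_id),
    user_id INTEGER REFERENCES users(user_id),
    PRIMARY KEY (need_id, user_id)
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "joe"), (2, "peter"), (3, "steve"), (4, "clint")])
conn.executemany("INSERT INTO needs VALUES (?, ?)",
                 [(1, "housing"), (2, "water"), (3, "food"), (4, "rent")])
conn.executemany("INSERT INTO users_needs VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (1, 2), (4, 2), (2, 2), (1, 3)])

# Report sorted by need: one row per (user, need) pair, no commas involved.
by_need = conn.execute("""
    SELECT users.name, needs.name
    FROM users
    INNER JOIN users_needs ON users_needs.user_id = users.user_id
    INNER JOIN needs       ON needs.need_id = users_needs.need_id
    ORDER BY needs.name, users.name
""").fetchall()

# Number of needs per user; the LEFT JOIN keeps users with no needs at all.
counts = conn.execute("""
    SELECT users.name, COUNT(users_needs.need_id) AS number_of_needs
    FROM users
    LEFT JOIN users_needs ON users_needs.user_id = users.user_id
    GROUP BY users.user_id, users.name
    ORDER BY number_of_needs, users.name
""").fetchall()
print(by_need)
print(counts)
```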
I'm a little confused by the goal. Is this a UI problem or are you just having trouble determining who has multiple needs?
The number of needs is the difference:
Len([Needs]) - Len(Replace([Needs],',','')) + 1
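The same arithmetic in Python, just to show why it works (the sample string is the one from the question; the count_needs name is only for illustration): removing the commas shortens the string by exactly the number of commas, and n commas separate n + 1 items.

```python
def count_needs(needs: str) -> int:
    """Count comma-delimited items via the length difference described above."""
    return len(needs) - len(needs.replace(",", "")) + 1

needs = "housing, rent, food, water"
print(count_needs(needs))
```

One caveat: an empty Needs value still gives 1, so rows with no needs would have to be handled separately.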
Can you provide more information about the Sort you're trying to accomplish?
UPDATE:
I think these Oracle-based posts may have what you're looking for: post and post. The only difference is that you would probably be better off using the method I list above to find the number of comma-delimited pieces rather than doing the translate(...) that the author suggests. Hope this helps - it's Oracle-based, but I don't see .