What is the best way to structure my database tables? - sql

I'm having trouble designing my database schema.
I am having doubts on whether to separate or group tables.
I have a blog and a News section in my application and database. Both receive Comments and Likes.
TABLE.BLOG | TABLE.COMMENTS.BLOG | TABLE.LIKE.BLOG
TABLE.NEWS | TABLE.COMMENTS.NEWS | TABLE.LIKE.NEWS
Or should I share everything?
TABLE.BLOG | TABLE.NEWS
TABLE.COMMENTS | TABLE.LIKE
Or should I keep only the comments grouped?
In that case, should I have some kind of type reference (blog | news)?
I am very confused about the best way to structure the database.
I would really appreciate it if someone could help me.

Although the exact layout depends mostly on the way that you are going to access and use your data, you are probably going to be better off with comments and likes sitting in the same table. Your second approach is close, although I would probably introduce a third table, called CONTENT, with an ID of anything that could be liked or commented.
Each row in the NEWS and the BLOG table would have a corresponding row in the CONTENT table. A CONTENT row could correspond to either a NEWS or a BLOG, but not both. The CONTENT table holds the attributes common to blogs and news (date, title, author, and so on).
The LIKE and COMMENT tables would then be connected to the CONTENT table, so you would not need to duplicate these two tables for NEWS and for BLOG.
Here is an illustration:
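A minimal sketch of that layout, using Python's built-in sqlite3 module so it can be run as-is. Only the table roles (CONTENT, NEWS, BLOG, LIKE, COMMENT) come from the answer; every column name, the kind discriminator column, and the sample rows are assumptions made for illustration. LIKE is an SQL keyword, so the tables are named likes and comments here. Note that this sketch does not by itself enforce the "either NEWS or BLOG, but not both" rule.

```python
import sqlite3

# Hypothetical columns; only the table roles come from the answer above.
SCHEMA = """
CREATE TABLE content (
    id      INTEGER PRIMARY KEY,
    kind    TEXT NOT NULL CHECK (kind IN ('news', 'blog')),  -- assumed discriminator
    title   TEXT NOT NULL,
    author  TEXT NOT NULL,
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE news (
    content_id INTEGER PRIMARY KEY REFERENCES content(id),
    source     TEXT
);
CREATE TABLE blog (
    content_id INTEGER PRIMARY KEY REFERENCES content(id),
    body       TEXT
);
CREATE TABLE comments (
    id         INTEGER PRIMARY KEY,
    content_id INTEGER NOT NULL REFERENCES content(id),
    body       TEXT NOT NULL
);
CREATE TABLE likes (
    content_id INTEGER NOT NULL REFERENCES content(id),
    user_name  TEXT NOT NULL,
    PRIMARY KEY (content_id, user_name)  -- one like per user per item
);
"""


def build_demo():
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO content (id, kind, title, author) VALUES (?, ?, ?, ?)",
        [(1, "blog", "First blog post", "ann"), (2, "news", "Breaking news", "bob")],
    )
    conn.execute("INSERT INTO blog (content_id, body) VALUES (1, 'Hello')")
    conn.execute("INSERT INTO news (content_id, source) VALUES (2, 'Newswire')")
    conn.executemany(
        "INSERT INTO comments (content_id, body) VALUES (?, ?)",
        [(1, "Nice post"), (2, "Interesting")],
    )
    conn.executemany(
        "INSERT INTO likes (content_id, user_name) VALUES (?, ?)",
        [(1, "carol"), (1, "dave"), (2, "carol")],
    )
    conn.commit()
    return conn


def like_counts(conn):
    """Return (title, number of likes) for every content row, blog or news alike."""
    return conn.execute(
        """
        SELECT c.title, COUNT(l.user_name)
        FROM content AS c
        LEFT JOIN likes AS l ON l.content_id = c.id
        GROUP BY c.id
        ORDER BY c.id
        """
    ).fetchall()
```

Because likes and comments point at content rather than at news or blog, the same query serves both kinds of item.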

Best is subjective, though I'd have two tables:
Content (news and blog content, store like count)
Comment (news and blog comments)
and then add a type field on both that distinguishes between blog and news. That way you can reuse a lot of the db code.
IOW, minimize the number of tables if you can.
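As a rough sketch of this two-table variant (again with Python's sqlite3 module; the column names and sample rows are assumptions, and only the Content/Comment split, the type field and the stored like count come from the answer):

```python
import sqlite3


def build_demo():
    """Two tables only; a type column tells blog rows from news rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE content (
            id         INTEGER PRIMARY KEY,
            type       TEXT NOT NULL CHECK (type IN ('blog', 'news')),
            title      TEXT NOT NULL,
            like_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE comment (
            id         INTEGER PRIMARY KEY,
            content_id INTEGER NOT NULL REFERENCES content(id),
            type       TEXT NOT NULL CHECK (type IN ('blog', 'news')),
            body       TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO content (id, type, title) VALUES (?, ?, ?)",
        [(1, "blog", "Why I switched editors"), (2, "news", "Budget approved")],
    )
    conn.executemany(
        "INSERT INTO comment (content_id, type, body) VALUES (?, ?, ?)",
        [(1, "blog", "Same here"), (2, "news", "About time")],
    )
    conn.commit()
    return conn


def add_like(conn, content_id):
    """Bump the stored counter; works the same for blog and news rows."""
    conn.execute(
        "UPDATE content SET like_count = like_count + 1 WHERE id = ?", (content_id,)
    )
    conn.commit()


def titles_of_type(conn, content_type):
    """List titles for one kind of content, ordered by id."""
    rows = conn.execute(
        "SELECT title FROM content WHERE type = ? ORDER BY id", (content_type,)
    ).fetchall()
    return [r[0] for r in rows]
```

A stored counter is cheap to read but cannot tell you who liked what, which is the trade-off against a separate likes table.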


Is a two table solution a performant and scalable solution to implement tagging in Postgres 9.5?

Background
I work for a real estate technology company. An upcoming project involves building out functionality to allow users to affix tags/labels (plural) to a MLS listing (real estate property). The second requirement is to allow a user to search by one or more tags. We won't be dealing with keeping track of counts or building word clouds or anything like that.
Solutions Researched
I found this SO Q&A and think the solution is pretty straightforward and have attempted to adapt some ideas from it below. Also, I understand that JSONB support is much better in 9.5 and it may be a possibility. If you have any insight here I'd love to hear your thoughts as well in an answer.
Attempted Solution
Table: Tags
Columns: ID, OwnerID, TagName, CreatedDate
Table: TaggedItems
Columns: ID, TagID (references above), PropertyID, CreatedDate, (Possibly some denormalized data to assist with presenting search results; property name, original listor, etc.)
Inserting new tags should be straightforward. Searching tags should also be straightforward since the user will select one or multiple tags from a searchable dropdown, thus affording me access to the actual TagID which I can use to query the TaggedItems table. When showing the full profile view for a listing, I can use its PropertyID and the UserID to query my tables for the existence of one or more Tags to display in the view.
Edit: It's probably worth noting that we don't keep an entire database of properties, we access them via an API partner; hence the two table solution and not 3.
If you want to fully normalize, you would actually use 3 tables:
1 Property/Listing
2 Tags
3 Cross-reference between the two
The 3rd table creates a many to many relationship between the other 2 tables.
In this case only the 3rd table would carry both the tag ID and the property ID.
Going with 2 tables is fine too, depending on how heavily it will be used, as a small string won't bloat your database too much.
I would say that it is strongly preferable to move the tags into a separate table when you need to do lookups on them. Otherwise you need a delimited list, and then what happens if a user injects the delimiter into a tag value? And how do you plan on searching the delimited list? You would constantly have to expand it into a table or use a regex, and a regex might give you false positives, since "some" can match both "some" and "something" depending on how you write your code.
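To make the two-table variant from the question concrete, here is a small sketch of a "has all of these tags" search. It uses Python's sqlite3 module as a stand-in for Postgres; the snake_case column names mirror the question's columns, while the UNIQUE constraints, the sample tags and the MLS-style property IDs are assumptions.

```python
import sqlite3


def build_demo():
    """Tags plus TaggedItems, as in the question; properties live with the API partner."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tags (
            id           INTEGER PRIMARY KEY,
            owner_id     INTEGER NOT NULL,
            tag_name     TEXT NOT NULL,
            created_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            UNIQUE (owner_id, tag_name)
        );
        CREATE TABLE tagged_items (
            id           INTEGER PRIMARY KEY,
            tag_id       INTEGER NOT NULL REFERENCES tags(id),
            property_id  TEXT NOT NULL,  -- external ID, no local property table
            created_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            UNIQUE (tag_id, property_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO tags (id, owner_id, tag_name) VALUES (?, ?, ?)",
        [(1, 7, "pool"), (2, 7, "garage"), (3, 7, "view")],
    )
    conn.executemany(
        "INSERT INTO tagged_items (tag_id, property_id) VALUES (?, ?)",
        [(1, "MLS-1"), (2, "MLS-1"), (1, "MLS-2"),
         (1, "MLS-3"), (2, "MLS-3"), (3, "MLS-3")],
    )
    conn.commit()
    return conn


def properties_with_all_tags(conn, tag_ids):
    """Return property IDs that carry every tag in tag_ids."""
    wanted = sorted(set(tag_ids))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT property_id
        FROM tagged_items
        WHERE tag_id IN ({placeholders})
        GROUP BY property_id
        HAVING COUNT(DISTINCT tag_id) = ?
        ORDER BY property_id
    """
    rows = conn.execute(sql, [*wanted, len(wanted)]).fetchall()
    return [r[0] for r in rows]
```

The GROUP BY with HAVING is plain SQL, so the same query shape should carry over to Postgres; matching whole tag IDs also avoids the "some" versus "something" problem mentioned above.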

How to structure a SQL table for many authors for one post?

I've gotten far enough where I realize I need a relationship between two tables.
I've got everything working, except for this scenario:
Let's say one table is called "posts," and the other one is called "authors."
They both have keys and all that good stuff.
What if a "post" is written by two or more "authors?"
I can't seem to figure out how I can make a "post" link to multiple "author" keys.
i.e.
PostID | postText | date| etc | authorkey1, authorkey2, authorkey3 ...
I'm extremely sorry if this has been answered before, but I've scoured stackoverflow and other online sources and have not found anything that applies to my scenario.
what if a "post" is written by two or more "authors?"
You will want another table if you can have multiple authors per post.
You can have:
1 author per post (two tables: Authors and Posts)
2 authors per post (two tables: Authors and Posts, with extra author key columns on the post, like you describe)
Any number of authors per post (three tables: Authors, Posts, and PostAuthors)
I would strongly recommend the third. If you choose the second approach, it will work, but you are going to run into scaling issues. What if you have 3 authors? Or 10?
If you add new columns for each, your table is going to get messy fast. But if your PostAuthors table just contains posts and authors, you can have as many entries in that table as you want.
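A short sketch of the three-table option, written against Python's sqlite3 module. The table names follow the answer (Authors, Posts, PostAuthors); the column names and sample rows are made up for illustration.

```python
import sqlite3


def build_demo():
    """Authors, Posts, and a PostAuthors table holding one row per (post, author) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE authors (
            author_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE posts (
            post_id   INTEGER PRIMARY KEY,
            post_text TEXT NOT NULL
        );
        CREATE TABLE post_authors (
            post_id   INTEGER NOT NULL REFERENCES posts(post_id),
            author_id INTEGER NOT NULL REFERENCES authors(author_id),
            PRIMARY KEY (post_id, author_id)
        );
        """
    )
    conn.executemany("INSERT INTO authors VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bob"), (3, "Cy")])
    conn.executemany("INSERT INTO posts VALUES (?, ?)",
                     [(1, "Co-written post"), (2, "Solo post")])
    # Post 1 has two authors, post 2 has one; a third author is just another row.
    conn.executemany("INSERT INTO post_authors VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1)])
    conn.commit()
    return conn


def authors_of(conn, post_id):
    """Return the names of every author of one post, alphabetically."""
    rows = conn.execute(
        """
        SELECT a.name
        FROM post_authors AS pa
        JOIN authors AS a ON a.author_id = pa.author_id
        WHERE pa.post_id = ?
        ORDER BY a.name
        """,
        (post_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

Adding a tenth author to a post is one more INSERT into post_authors, with no change to the posts table.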

Best way to add content (large list) to relational database

I apologize if this may seem like somewhat of a novice question (which it probably is), but I'm just introducing myself to the idea of relational databases and I'm struggling with this concept.
I have a database with roughly 75 fields which represent different characteristics of a 'user'. One of those fields represents the locations that the user has been to, and I'm wondering what the best way is to store the data so that it is easily retrievable and can be used later on (e.g. tracking a route on Google Maps, identifying whether two users shared the same location, etc.)
The problem is that some users may have 5 locations in total while others may be well over 100.
Is it best to store these locations in a text file named using the unique id of each user(one location on each line, or in a csv)?
Or to create a separate table for each individual user connected to their unique id (that seems like overkill to me)?
Or, is there a way to store all of the locations directly in the single field in the original table?
I'm hoping that I'm missing a concept, or there is a link to a tutorial that will help my understanding.
If it helps, you can assume that the locations will be stored in order and will not be changed once stored. Also, these locations are static (I don't need to add any more locations once they are stored, as they can't be updated).
Thank you for your time in helping me. I appreciate it!
Store the location data for the user in a separate table. The location table would link back to the user table by a common user_id.
Keeping multiple locations for a particular user in a single field of the user table is not a good idea - you'll end up with denormalized data.
You may want to read up on:
Referential Integrity
Relational denormalization
The most common way would be to have a separate table, something like
USER_LOCATION
+---------+-------------+
| USER_ID | LOCATION_ID |
+---------+-------------+
|         |             |
If user 3 has 5 locations, there will be five rows containing user_id 3.
However, if you say the order of locations matters, then an additional field specifying the ordinal position of the location within a user's list can be used.
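A minimal sketch of the USER_LOCATION table with such an ordinal column, using Python's sqlite3 module. The position column name, the choice of primary key and the sample IDs are assumptions; the separate user and location tables are left out to keep the example short.

```python
import sqlite3


def build_demo():
    """One row per visit; position records the order within each user's list."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user_location (
            user_id     INTEGER NOT NULL,
            location_id INTEGER NOT NULL,
            position    INTEGER NOT NULL,      -- assumed name for the ordinal
            PRIMARY KEY (user_id, position)    -- allows revisiting a location
        );
        CREATE INDEX idx_user_location_location
            ON user_location (location_id, user_id);
        """
    )
    visits = {3: [10, 11, 12, 10, 13], 4: [11, 14]}
    for user_id, locations in visits.items():
        conn.executemany(
            "INSERT INTO user_location (user_id, location_id, position) VALUES (?, ?, ?)",
            [(user_id, loc, pos) for pos, loc in enumerate(locations, start=1)],
        )
    conn.commit()
    return conn


def route_for(conn, user_id):
    """Return a user's locations in the order they were visited."""
    rows = conn.execute(
        "SELECT location_id FROM user_location WHERE user_id = ? ORDER BY position",
        (user_id,),
    ).fetchall()
    return [r[0] for r in rows]


def users_at(conn, location_id):
    """Return the users who have been at a location, each listed once."""
    rows = conn.execute(
        "SELECT DISTINCT user_id FROM user_location WHERE location_id = ? ORDER BY user_id",
        (location_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

User 3 has five rows and user 4 has two, so a user with 100 locations simply has 100 rows; no per-user table or text file is needed.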
The separate table approach is what we call normalized.
If you store a location list as a comma-separated string of location ids, for example, it is trivial to maintain the order, but you lose the ability for the database to quickly answer the question "which users have been at location x?". Your data would be what we call denormalized.
You do have options, of course, but relational databases are pretty good with joining tables, and they are not overkill. They do look a little funny when you have ordering requirements, like the one you mention. But people use them all the time.
In a relational database you would use a mapping table. So you would have user, location and userlocation tables (user is a reserved word so you may wish to use a different name). This allows you to have a many-to-many relationship, i.e. many users can visit many locations. If you want to model a route as an ordered collection of locations then you will need to do more work. This site gives an example

Sql design question - many tables or not?

15 ECTS credits' worth of database design down the bin... I really can't come up with the best design solution for my problem.
Which is this: Basically I'm making a tool that gathers a lot of information concerning the user. At the most the user would fill in 50 fields of data, ranging from simple checkboxes to text input. I'm designing the db right now (with mySql) and can't decide whether or not to use a single User table with all of those fields, or to have a table for each category of input.
One example would be "type of payment". This one has three options, and if I went with the "table" way I would add a table paymentType and give it binary fields for each payment type. Then I would need an id table to identify which paymentType the user has chosen, whereas if I use a single user table, the data would already be there.
The site will probably see a lot of users (tv, internet and radio marketing) so I'm concerned which alternative would be the best.
I'll be happy to provide more details if you need more to base a decision.
Thanks for reading.
Read this article "Database Normalization Basics", and come back here if you still have questions. It should help a lot.
The most fundamental idea behind these decisions, as you will see in this article, is that each table should represent one and only one "thing", and each field should relate directly and only to that thing.
In your payment types example, it probably makes sense to break it out into a separate table if you anticipate the need to store additional information about each payment type.
Create your "Type of Payment" table; there's no real question there. That's proper normalization and the power behind using relational databases. One of the many reasons to do so is the ability to update a Type of Payment record and not have to touch the related data in your users table. Your join between the two tables will allow your app to see the updated type of payment info by changing it in just the 1 place.
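The "change it in just the 1 place" point can be sketched as follows with Python's sqlite3 module. The table and column names, the sample users, and the payment type names other than Visa are assumptions made for illustration.

```python
import sqlite3


def build_demo():
    """Users reference a payment_type row by key instead of storing its name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE payment_type (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE users (
            id              INTEGER PRIMARY KEY,
            user_name       TEXT NOT NULL,
            payment_type_id INTEGER NOT NULL REFERENCES payment_type(id)
        );
        """
    )
    conn.executemany("INSERT INTO payment_type VALUES (?, ?)",
                     [(1, "Visa"), (2, "Invoice"), (3, "Direct debit")])
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                     [(1, "alice", 1), (2, "bob", 1), (3, "carol", 2)])
    conn.commit()
    return conn


def payment_names(conn):
    """Return (user, payment type name) pairs via a join."""
    return conn.execute(
        """
        SELECT u.user_name, p.name
        FROM users AS u
        JOIN payment_type AS p ON p.id = u.payment_type_id
        ORDER BY u.id
        """
    ).fetchall()


def rename_payment_type(conn, type_id, new_name):
    """Update one lookup row; no row in users is touched."""
    conn.execute("UPDATE payment_type SET name = ? WHERE id = ?", (new_name, type_id))
    conn.commit()
```

After rename_payment_type(conn, 1, "Credit card"), every user who pointed at row 1 shows the new name through the join.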
Regarding your other fields, they may not be as clear cut. The question to ask yourself about each field is "does this field relate only to a user or does it have meaning and possible use in its own right?". If you can never imagine a field having meaning outside of the context of a user you're safe leaving it as a field on the user table, otherwise do the primary key-foreign key relationship and put the information in its own table.
If you are building a form with variable inputs, I wouldn't recommend building it as one table. This is inflexible and dirty.
Normalization is the key, though if you end up with a key/value setup, or effectively a scalar type implementation across many tables, and can't cache:
a) the form definition from table data and
b) the joined result of storage (either a caching view or otherwise),
c) or don't build in proper sharding,
then you may hit a performance boundary.
In this KVP setup, you might want to look at something like CouchDB or a less table-driven storage format.
You may also want to look at trickier setups such as serialized object storage and cache tables if your internal data is heavily related to other data already in the database.
50 columns is a lot. Have you considered a table that stores values like a property sheet? This would only be useful if you didn't need to regularly query the values it contains.
INSERT INTO UserProperty(UserID, Name, Value)
VALUES(1, 'PaymentType', 'Visa');
INSERT INTO UserProperty(UserID, Name, Value)
VALUES(1, 'TrafficSource', 'TV');
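For reference, a small sketch of how such a property sheet might be created and read back, using Python's sqlite3 module. The UserProperty table and its three columns come from the statements above; the primary key and the helper function names are assumptions.

```python
import sqlite3


def build_demo():
    """Create the property-sheet table and load the two rows shown above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE UserProperty (
            UserID INTEGER NOT NULL,
            Name   TEXT NOT NULL,
            Value  TEXT,
            PRIMARY KEY (UserID, Name)  -- assumed: one value per property per user
        )
        """
    )
    conn.executemany(
        "INSERT INTO UserProperty (UserID, Name, Value) VALUES (?, ?, ?)",
        [(1, "PaymentType", "Visa"), (1, "TrafficSource", "TV")],
    )
    conn.commit()
    return conn


def properties_for(conn, user_id):
    """Return all of a user's properties as a plain dict."""
    rows = conn.execute(
        "SELECT Name, Value FROM UserProperty WHERE UserID = ?", (user_id,)
    ).fetchall()
    return dict(rows)


def users_with(conn, name, value):
    """Find users by property value; every value is text, so typed queries get awkward."""
    rows = conn.execute(
        "SELECT UserID FROM UserProperty WHERE Name = ? AND Value = ? ORDER BY UserID",
        (name, value),
    ).fetchall()
    return [r[0] for r in rows]
```

Reading a whole sheet for one user is easy; filtering or aggregating across many properties needs one condition or self-join per property, which is why the answer limits this to values that are not queried regularly.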
I think I figured out a great way of solving this. Thanks to a friend of mine for suggesting this!
I have three tables, Field {IdField, FieldName, FieldType}, FieldInput {IdInput, IdField, IdUser} and User { IdUser, UserName... etc }
This way it becomes very easy to see what a user has answered, the solution is somewhat scalable and it provides a good overview. I will constrain the alternatives in another layer, farther away from the db. I believe it's a tradeoff worth doing.
Any suggestions or criticism of this solution?

A database schema for Tags (eg. each Post has some optional tags)

I have a site like SO, WordPress, etc., where you make a post and you can have (optional) tags against it.
What is a common database schema to handle this? I'm assuming it's a many<->many structure, with three tables.
Anyone have any ideas?
A three table many to many structure should be fine.
Eg. Posts, PostsToTags(post_id,tag_id), Tags
The key is indexing. Make sure your PostsToTags table is indexed both ways (post_id,tag_id and tag_id,post_id). Also, if read performance is ultra critical, you could introduce an indexed view (which could give you post_name, tag_name).
You will of course need indexes on Posts and Tags as well.
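A sketch of the "indexed both ways" idea, using Python's sqlite3 module. Posts, PostsToTags(post_id, tag_id) and Tags come from the answer; the remaining column names, the index name and the sample rows are assumptions. Here the composite primary key covers lookups by post, and one extra index covers lookups by tag.

```python
import sqlite3


def build_demo():
    """Three-table many-to-many layout with an index in each direction."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (
            post_id   INTEGER PRIMARY KEY,
            post_name TEXT NOT NULL
        );
        CREATE TABLE tags (
            tag_id   INTEGER PRIMARY KEY,
            tag_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE posts_to_tags (
            post_id INTEGER NOT NULL REFERENCES posts(post_id),
            tag_id  INTEGER NOT NULL REFERENCES tags(tag_id),
            PRIMARY KEY (post_id, tag_id)           -- index for "tags of a post"
        );
        CREATE INDEX idx_posts_to_tags_tag_post
            ON posts_to_tags (tag_id, post_id);     -- index for "posts with a tag"
        """
    )
    conn.executemany("INSERT INTO posts VALUES (?, ?)",
                     [(1, "Schema design"), (2, "Index tuning"), (3, "Untagged post")])
    conn.executemany("INSERT INTO tags VALUES (?, ?)",
                     [(1, "sql"), (2, "performance")])
    conn.executemany("INSERT INTO posts_to_tags VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2)])
    conn.commit()
    return conn


def tags_for_post(conn, post_id):
    """Tag names attached to one post (may be empty, since tags are optional)."""
    rows = conn.execute(
        """
        SELECT t.tag_name
        FROM posts_to_tags AS pt
        JOIN tags AS t ON t.tag_id = pt.tag_id
        WHERE pt.post_id = ?
        ORDER BY t.tag_name
        """,
        (post_id,),
    ).fetchall()
    return [r[0] for r in rows]


def posts_for_tag(conn, tag_name):
    """Post names carrying a given tag."""
    rows = conn.execute(
        """
        SELECT p.post_name
        FROM tags AS t
        JOIN posts_to_tags AS pt ON pt.tag_id = t.tag_id
        JOIN posts AS p ON p.post_id = pt.post_id
        WHERE t.tag_name = ?
        ORDER BY p.post_id
        """,
        (tag_name,),
    ).fetchall()
    return [r[0] for r in rows]
```

Whether the planner actually uses each index depends on the database engine and the data, so it is worth checking the query plan on the real system.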
"I'm assuming it's a many<->many structure, with three tables. Anyone have any ideas?"
More to the point, there aren't any serious alternatives, are there? Two relational tables in a many-to-many relationship require at least an association table to carry all the combination of foreign keys.
Does SO do this? Who knows. Their data model includes reference counts, and -- for all anyone knows -- date time stamps and original creator and a lot of other junk about the tag.
Minimally, there have to be three tables.
What they do on SO is hard to know.
I'm not entirely sure if this is what SO uses. But there is a good discussion here.
It would be a good idea to look at how WordPress handles tags for posts; it will give you some ideas.
The other possibility of course is that there are only two tables.
Given there are at most 5 tags, a Question table with five nullable foreign-key references to a Tag table is a possibility.
Not very normalized, but it could be more performant.