Best way to store and retrieve comment replies in SQL Server

I want to store comment replies in a database table.
I have a table to store comments:
comment_id, comment_par_id, comment_from, comment_text, comment_date, ...
A new comment has par_id=0, while a reply has par_id set to the id of the comment it replies to.
The nesting is just one level deep. A reply to a reply gets the same parent id.
Is this the best way to store the replies?
I looked at a few articles that recommend creating a separate table to store the replies,
with a mapping column that points to the comment in the main table.
Another alternative is to create a third table that stores the mapping, like:
reply_id, comment_id
Which is the best way?
No matter what, the only query I run returns the replies for a given comment.
It is the most frequently run query and must be fast, as we have millions of rows in the comment table.

It's a one (comment) to many (replies) relationship, so you should use two tables, with the replies table foreign keyed to the comments table.
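A minimal sketch of that two-table layout, using SQLite through Python's sqlite3 module only so that it can be run as-is; SQL Server syntax differs in places. The replies table, its column names and the index name are assumptions for illustration, not taken from the question. The index on comment_id is what keeps the "replies for a given comment" query cheap.

```python
import sqlite3


def build_comment_db() -> sqlite3.Connection:
    """Create an in-memory database with comments and replies tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE comments (
            comment_id   INTEGER PRIMARY KEY,
            comment_from TEXT NOT NULL,
            comment_text TEXT NOT NULL
        );
        -- Hypothetical replies table: one row per reply, keyed to its comment.
        CREATE TABLE replies (
            reply_id   INTEGER PRIMARY KEY,
            comment_id INTEGER NOT NULL REFERENCES comments (comment_id),
            reply_from TEXT NOT NULL,
            reply_text TEXT NOT NULL
        );
        -- The hot query filters on comment_id, so index that column first.
        CREATE INDEX ix_replies_comment_id ON replies (comment_id, reply_id);
    """)
    return conn


def replies_for(conn: sqlite3.Connection, comment_id: int) -> list:
    """Return the reply texts for one comment, oldest first."""
    rows = conn.execute(
        "SELECT reply_text FROM replies WHERE comment_id = ? ORDER BY reply_id",
        (comment_id,),
    )
    return [text for (text,) in rows]


conn = build_comment_db()
conn.execute("INSERT INTO comments VALUES (1, 'alice', 'First comment')")
conn.execute("INSERT INTO comments VALUES (2, 'bob', 'Second comment')")
conn.executemany(
    "INSERT INTO replies (comment_id, reply_from, reply_text) VALUES (?, ?, ?)",
    [(1, "bob", "Reply A"), (2, "alice", "Reply B"), (1, "carol", "Reply C")],
)
print(replies_for(conn, 1))  # ['Reply A', 'Reply C']
```

The same index would serve the single-table design from the question equally well if it were placed on comment_par_id.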

If I understand you right, you have an "original post" of some kind, with a set of replies? Similar to how StackOverflow works, with an initial question, with a set of answers? If that is the case, there are a few options. There is the option of using a single table that supports different "types" of records. This choice has the benefit of only requiring a single table, however it also has the drawback of more ambiguity. One has to know that multiple types of records are stored in such a table, making it more confusing.
A better alternative is to have multiple tables, one for each "type" of record. This removes the ambiguity, while adding complexity. From a different perspective, different "types" of similar records often have different data, even if some of the data is the same. By using separate tables, it is easier to add distinct traits to each type of comment (original vs. reply), without having to resort to a variety of oddball ways of storing and referencing the extra "unique" data in a single-table system.

Since it is similar to StackOverflow, check out the Schema. Look at the Posts and Comments table.
http://sqlserverpedia.com/wiki/Understanding_the_StackOverflow_Database_Schema

Related

SQL Server database design with foreign keys

I have the following partial database design:
All the tables depend on each other, so the table bvd_docflow_subdocuments depends on the table bvd_docflow_subsets. So I thought I could be smart and use foreign keys on every table (with ON DELETE CASCADE). However, the FKs accumulate the further down I go through the tables.
The problem is that there is no point in the table bvd_docflow_documents having a reference to the `docflow_documentset_id` PK / FK. Is there a way (and maybe my design is crappy) for a table to have an FK relationship only with the table directly above it, and not with all the tables above it?
Edit:
More explanation:
The bvd_docflow_subsets table stores information about the objects used to create documents. There is a relation between that table and the bvd_docflow_subdocuments table, which stores master data about all the documents for a subset. docflow_subset_id is in both tables; this is the link between those two tables.
Going further down, there is also the table bvd_docflow_documents, which contains the actual document data. The link between bvd_docflow_documents and bvd_docflow_subdocuments is bvd_docflow_subdocument_id.
On every table I have a foreign key defined, so when data is removed from a table, all the data linked to it is also removed.
However, when we look at the bvd_docflow_documents table, it has all the foreign keys from the other tables (docflow_subset_id and docflow_documentset_id), and that is the problem. The only foreign key needed for the bvd_docflow_documents table is docflow_subdocument_id and no other.
Edit 2
I have changed my design further and removed information that I don't need after initial import of the data.
See the following link for the (total) database design:
https://sqldbm.com/Project/SQLServer/Share/_AUedvNutCEV2DGLJleUWA
The tables subsets, subdocuments and documents have a many-to-many relationship, so I thought a table in between those three, documents_subdocuments, is the way to go, where I define all the different keys for those tables.
I am not used to designing the database first and then building it. But there is a first time for everything, and I am trying to make a database that follows standards and uses the power of SQL Server the correct way.
I'll address the bottom-most table and ignore the rest for the most part.
But first some comments. Your schema is simply a model of a system. To provide feedback, one must understand this "system" and how it actually works to evaluate your model. In addition, it is important to understand your entities and your reasons for choosing them and modelling them in the specified manner. Without that understanding, all of this is guessing based on experience.
And another comment. Slapping an identity column into every table is just lazy modelling IMO. Others will disagree, but you need to also enforce all natural keys. Do you have natural keys? It is rare not to have any. Enforce those that do exist.
And one last comment. Stop the ridiculous pattern of prepending the column names with the table names. And you should really think long and hard about using very long table names. Given what you have, I sense you need a schema for your docflow stuff.
For the documents table, your current PK makes no sense. Again, you've slapped an identity column into the table. By itself, this column is a key for the table. The inclusion of any other columns does not make the key any more "unique" - that inclusion is logical nonsense. Following your pattern, you would designate the identity column as the primary key. But ...
According to your image, the documents table is related to one and only one subdocument. You added a foreign key to that table - which matches the image. You also added additional columns and foreign keys to the "higher" tables. So now a document "points" to a specific subdocument. It also points to a specific subset - which may have no relationship to the subdocument. The same thought applies to the other FK. I doubt that this is logically correct. So why do these columns (and related FKs) exist? Perhaps this is the result of premature optimization - which everyone knows is the root of all evil in coding. Again, it is impossible to know if this is "right" or even "useful" for your model.
To answer your question "... is there a way", the answer is obviously yes. You remove the columns of which you complain. You added them - Why? Is this perhaps a problem with the tool you are using?
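A rough sketch of what "remove the columns" could look like: each table references only the table directly above it, and ON DELETE CASCADE still propagates down the whole chain. The shortened table and column names are placeholders, not the real bvd_docflow_* schema, and SQLite (via Python's sqlite3) stands in for SQL Server so the example can be run as-is.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
db.executescript("""
    CREATE TABLE subsets (
        subset_id INTEGER PRIMARY KEY
    );
    -- References only its direct parent.
    CREATE TABLE subdocuments (
        subdocument_id INTEGER PRIMARY KEY,
        subset_id      INTEGER NOT NULL
            REFERENCES subsets (subset_id) ON DELETE CASCADE
    );
    -- No subset_id column here: the subset is reachable through subdocuments.
    CREATE TABLE documents (
        document_id    INTEGER PRIMARY KEY,
        subdocument_id INTEGER NOT NULL
            REFERENCES subdocuments (subdocument_id) ON DELETE CASCADE
    );
    INSERT INTO subsets VALUES (1), (2);
    INSERT INTO subdocuments VALUES (10, 1), (20, 2);
    INSERT INTO documents VALUES (100, 10), (101, 10), (200, 20);
""")

# A document's subset is found with a join instead of a redundant column.
subset_of_200 = db.execute("""
    SELECT sd.subset_id
    FROM documents AS d
    JOIN subdocuments AS sd ON sd.subdocument_id = d.subdocument_id
    WHERE d.document_id = 200
""").fetchone()[0]

# Deleting a subset cascades through subdocuments down to documents.
db.execute("DELETE FROM subsets WHERE subset_id = 1")
remaining = [row[0] for row in
             db.execute("SELECT document_id FROM documents ORDER BY document_id")]
print(subset_of_200, remaining)  # 2 [200]
```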
And some last comments. There is nothing special about "varchar(50)". Perhaps this is a placeholder that will be updated later. It may also be another sign of laziness. And generally speaking, columns with names like "type" and "code" tend to be foreign keys to "lookup" tables - because people like to add, modify, or remove these sorts of categorization values over time. I'm also concerned about the column name overlap among the tables. "Location" exists in multiple tables, as do action_code and action_id. And a column named "id" (action_id) suggests a lookup to another table - is it? Should it be? Is there a relationship between action_id and action_code? From a distance it is impossible to answer any of these questions.
But designing a database is more art than science. Sometimes you just need to create something, populate it with some sample data, and then determine if it works for your needs. Everyone will get something wrong in the first try. That is expected; that is how you learn. The most difficult part is actually completing your first attempt.

Can There be a Table in a Relational Database that Doesn't Have a Relationship to Any Other Table? [duplicate]

I have an application in which I store PostId and keywords (Keyword) belonging to a Post in a table named KeywordsForPost. The primary key for that table is the combination of PostId and Keyword. PostId is not unique nor is Keyword.
I needed this implementation because I might need to search for posts regarding the keywords they contain.
I have another table named NewKeywords. The one and only column in that table is Keyword. When a post is created, the keywords in that post are inserted into both the KeywordsForPost and NewKeywords tables. An operation is applied to the keywords in the NewKeywords table at the user's command, after which they are no longer "new keywords". So I delete those keywords after that operation is applied. Currently my NewKeywords table does not have a relationship with any other table. Is this practice justified? Or is there a better practice?
I searched and found this answer.
can we have a table without any relation with the other tables
But did not find it satisfactory.
I also find it different to the question previously asked because it asks a general question, whereas mine is specific. I need to know if a relationship can be added to the table. So far I came up with nothing.
Yes, you can. The only consequence is that the table won't have a relationship to any other table. I would not say that this is the best way to go, because it all depends on your situation. And, like the answer says: it can still be given a relationship later.
Either I'm misreading your question, or there actually is a relationship between NewKeyWords and KeyWordsForPost. It's a value (Keyword) that's common to both tables, and could be used for a relational join. That might be a stupid join that no one would want to do, it might be real slow, for lack of a relevant index, and the keywords aren't a declared key anywhere, but it's still a relationship.
The relationship is inherent in the data, whether you have declared it or not.
I am going to take @rlartiga's approach, guys. I am going to create a Keywords table with the column Keyword and have it as the primary key. Then I am going to have both the KeywordsForPost and NewKeywords tables refer to Keyword in Keywords. Thanks for your support, guys! Comment if you think this is not the appropriate move.
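One way the design described above might look, sketched with SQLite through Python's sqlite3 module so it can be run directly. The table and column names come from the question; making Keyword the primary key of NewKeywords and the helper add_post_keywords are assumptions added for illustration.

```python
import sqlite3

kw = sqlite3.connect(":memory:")
kw.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
kw.executescript("""
    CREATE TABLE Keywords (
        Keyword TEXT PRIMARY KEY
    );
    CREATE TABLE KeywordsForPost (
        PostId  INTEGER NOT NULL,
        Keyword TEXT NOT NULL REFERENCES Keywords (Keyword),
        PRIMARY KEY (PostId, Keyword)
    );
    -- Assumption: each pending keyword is listed at most once.
    CREATE TABLE NewKeywords (
        Keyword TEXT PRIMARY KEY REFERENCES Keywords (Keyword)
    );
""")


def add_post_keywords(conn, post_id, keywords):
    """Hypothetical helper: register a post's keywords in all three tables."""
    for word in keywords:
        conn.execute("INSERT OR IGNORE INTO Keywords VALUES (?)", (word,))
        conn.execute("INSERT INTO KeywordsForPost VALUES (?, ?)", (post_id, word))
        conn.execute("INSERT OR IGNORE INTO NewKeywords VALUES (?)", (word,))


add_post_keywords(kw, 1, ["sql", "design"])
add_post_keywords(kw, 2, ["sql"])

# After the user-triggered operation, the pending keywords are cleared;
# Keywords and KeywordsForPost are unaffected.
kw.execute("DELETE FROM NewKeywords")

posts_for_sql = [row[0] for row in kw.execute(
    "SELECT PostId FROM KeywordsForPost WHERE Keyword = 'sql' ORDER BY PostId")]

# A keyword that is not in Keywords is now rejected by the foreign key.
try:
    kw.execute("INSERT INTO NewKeywords VALUES ('unknown')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(posts_for_sql, rejected)  # [1, 2] True
```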

Redundant field in SQL for Performance

Let's say I have two Tables, called Person, and Couple, where each Couple record stores a pair of Person id's (also assume that each person is bound to at most another different person).
I am planning to support a lot of queries where I will ask for Person records that are not married yet. Do you guys think it's worthwhile to add a 'partnerId' field to Person? (It would be set to null if that person is not married yet)
I am hesitant to do this because the partnerId field is something that is computable - just go through the Couple table to find out. The performance cost for creating new couple will also increase because I have to do this extra book keeping.
I hope that it doesn't sound like I am asking two different questions here, but I felt that this is relevant. Is it a good/common idea to include extra fields that are redundant (computable/inferable by joining with other tables), but will make your query a lot easier to write and faster?
Thanks!
A better option is to keep the data normalized, and utilize a view (indexed, if supported by your rdbms). This gets you the convenience of dealing with all the relevant fields in one place, without denormalizing your data.
Note: Even if a database doesn't support indexed views, you'll likely still be better off with a view as the indexes on the underlying tables can be utilized.
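A small sketch of the view approach, run against SQLite through Python's sqlite3 module. SQLite has no indexed views, so this is a plain view; the column names PersonId1 and PersonId2 and the view name UnmarriedPerson are assumptions, since the question does not give the Couple table's layout.

```python
import sqlite3

people = sqlite3.connect(":memory:")
people.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
people.executescript("""
    CREATE TABLE Person (
        PersonId INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL
    );
    -- UNIQUE lets a person appear at most once per column and also
    -- creates an index on each column.
    CREATE TABLE Couple (
        PersonId1 INTEGER NOT NULL UNIQUE REFERENCES Person (PersonId),
        PersonId2 INTEGER NOT NULL UNIQUE REFERENCES Person (PersonId),
        CHECK (PersonId1 <> PersonId2)
    );
    -- The data stays normalized; the view hides the lookup.
    CREATE VIEW UnmarriedPerson AS
    SELECT p.PersonId, p.Name
    FROM Person AS p
    WHERE NOT EXISTS (
        SELECT 1
        FROM Couple AS c
        WHERE c.PersonId1 = p.PersonId OR c.PersonId2 = p.PersonId
    );
    INSERT INTO Person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Couple VALUES (1, 2);
""")

unmarried = [name for (name,) in people.execute(
    "SELECT Name FROM UnmarriedPerson ORDER BY PersonId")]
print(unmarried)  # ['Cy']
```

Queries for unmarried people then read from UnmarriedPerson as if it were a table, and creating a couple still touches only the Couple table.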
Is there always a zero to one relationship between Person and Couples? i.e. a person can have zero or one partner? If so then your Couple table is actually redundant, and your new field is a better approach.
The only reason to split Couple off to another table is if one Person can have many partners.
When someone gets a partner you either write one record to the Couple table or update one record in the Person table. I argue that your Couple table is redundant here. You haven't indicated that there is any extra info on the Couple record besides the link, and it appears that there is only ever zero or one Couple record for every Person record.
How about one table?
-- T-SQL: a self-referencing foreign key links each person to a partner
CREATE TABLE Person
(
    PersonId  int NOT NULL PRIMARY KEY,  -- one row per person
    PartnerId int NULL                   -- NULL while the person has no partner
        FOREIGN KEY REFERENCES Person (PersonId)
);
With this,
Everyone on the system has a row and a PersonId
If you have a partner, they are listed in the PartnerId column
Unnormalized data is always bad. Denormalized data, now, that can be beneficial under very specific circumstances. The best advice I ever heard on this subject is to first fully normalize your data, assess performance/goals/objectives, and then carefully denormalize only if it's demonstrably worth the extra overhead.
I agree with Nick. Also consider the need for a history of the couples. You could use row versioning in the same table, but this doesn't work very well for application databases; it works best in a DW scenario. A history table would in theory duplicate all the data in the table, not just the relationship. A secondary table would give you the flexibility to add additional information about the relationship, including StartDate and EndDate.

Are relationship tables really needed?

Relationship tables mostly contain two columns: IDTABLE1, and IDTABLE2.
Only thing that seems to change between relationship tables is the names of those two columns, and table name.
Would it be better if we create one table Relationships and in this table we place 3 columns:
TABLE_NAME, IDTABLE1, IDTABLE2, and then use this table for all relationships?
Is this a good/acceptable solution in web/desktop application development? What would be downside of this?
Note:
Thank you all for feedback. I appreciate it.
But, I think you are taking it a bit too far... Every solution works until one point.
For data storage, a simple text file is good up to a certain point, then Excel is better, then MS Access, then SQL Server, then...
To be honest, I haven't seen any argument that states why this solution is bad for small projects (with a DB size of a few GB).
It would be a monster of a table; it would also be cumbersome. Performance-wise, such a table would not be a great idea. Also, foreign keys are impossible to add to such a table. I really can't see a lot of advantages to such a solution.
Bad idea.
How would you enforce the foreign keys if IDTABLE1 could contain ids from any table at all?
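To make the foreign key point concrete, here is a small sketch using SQLite through Python's sqlite3 module. Student, Course and Enrollment are illustrative names, and try_insert is a hypothetical helper. The dedicated table rejects ids that do not exist, while the generic Relationships table has nothing to check them against.

```python
import sqlite3

rel = sqlite3.connect(":memory:")
rel.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
rel.executescript("""
    CREATE TABLE Student (StudentId INTEGER PRIMARY KEY);
    CREATE TABLE Course  (CourseId  INTEGER PRIMARY KEY);

    -- Dedicated relationship table: both ids are checked by foreign keys.
    CREATE TABLE Enrollment (
        StudentId INTEGER NOT NULL REFERENCES Student (StudentId),
        CourseId  INTEGER NOT NULL REFERENCES Course (CourseId),
        PRIMARY KEY (StudentId, CourseId)
    );

    -- Generic table: IDTABLE1/IDTABLE2 may refer to any table, so no
    -- foreign key can be declared on them.
    CREATE TABLE Relationships (
        TABLE_NAME TEXT    NOT NULL,
        IDTABLE1   INTEGER NOT NULL,
        IDTABLE2   INTEGER NOT NULL
    );

    INSERT INTO Student VALUES (1);
    INSERT INTO Course  VALUES (10);
""")


def try_insert(conn, sql, params):
    """Hypothetical helper: True if the row was accepted, False if rejected."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Student 999 and course 888 do not exist.
generic_ok = try_insert(
    rel, "INSERT INTO Relationships VALUES (?, ?, ?)", ("Enrollment", 999, 888))
dedicated_ok = try_insert(
    rel, "INSERT INTO Enrollment VALUES (?, ?)", (999, 888))
print(generic_ok, dedicated_ok)  # True False
```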
To achieve acceptable performance on joins without a load of unnecessary IO to bring in completely unrelated rows you would need a composite index with leading column TABLE_NAME that basically ends up partitioning the table into sections anyway.
Obviously even with this pseudo partitioning going on you would still be wasting a lot of space in the table/indexes just repeating the table name for each row.
Isn't it a big IF that you're only going to store the 2 ID fields? If I have a StudentCourse (or better yet Enrollment) table that has StudentID & CourseID, wouldn't EnrollmentDate go in this table as well, since not all students enroll on the first day of class? It seems like a bad idea to add this column to an already bloated table where it will be null for most records.
A single table could be beneficial if the application must let a user or admin create these relationships as data (similar to having a single lookup or reference-list table), without creating a new table for each user-created reference. The need for dynamic querying may be another benefit. An application with such dynamic data structure requirements might be better suited to a schemaless or NoSQL database.

A database schema for Tags (eg. each Post has some optional tags)

I have a site like SO, Wordpress, etc., where you make a post and you can have (optional) tags against it.
What is a common database schema to handle this? I'm assuming it's a many<->many structure, with three tables.
Anyone have any ideas?
A three table many to many structure should be fine.
Eg. Posts, PostsToTags(post_id,tag_id), Tags
The key is indexing. Make sure your PostsToTags table is indexed both ways (post_id,tag_id and tag_id,post_id). Also, if read performance is ultra critical, you could introduce an indexed view (which could give you post_name, tag_name).
You will of course need indexes on Posts and Tags as well.
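A minimal sketch of the three-table structure with both index orders, using SQLite through Python's sqlite3 module so it can be run as-is. The index name ix_tag_post, the sample rows and the two helper functions are assumptions; here the composite primary key provides the (post_id, tag_id) index, and a second index covers (tag_id, post_id).

```python
import sqlite3

tags_db = sqlite3.connect(":memory:")
tags_db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
tags_db.executescript("""
    CREATE TABLE Posts (post_id INTEGER PRIMARY KEY, post_name TEXT NOT NULL);
    CREATE TABLE Tags  (tag_id  INTEGER PRIMARY KEY, tag_name  TEXT NOT NULL UNIQUE);
    CREATE TABLE PostsToTags (
        post_id INTEGER NOT NULL REFERENCES Posts (post_id),
        tag_id  INTEGER NOT NULL REFERENCES Tags (tag_id),
        PRIMARY KEY (post_id, tag_id)           -- index for post -> tags
    );
    CREATE INDEX ix_tag_post ON PostsToTags (tag_id, post_id);  -- tag -> posts

    INSERT INTO Posts VALUES (1, 'Intro'), (2, 'Indexes'), (3, 'Untagged');
    INSERT INTO Tags  VALUES (1, 'sql'), (2, 'performance');
    INSERT INTO PostsToTags VALUES (1, 1), (2, 1), (2, 2);
""")


def posts_with_tag(conn, tag_name):
    """Names of all posts carrying the given tag."""
    rows = conn.execute("""
        SELECT p.post_name
        FROM Tags AS t
        JOIN PostsToTags AS pt ON pt.tag_id = t.tag_id
        JOIN Posts AS p ON p.post_id = pt.post_id
        WHERE t.tag_name = ?
        ORDER BY p.post_id
    """, (tag_name,))
    return [name for (name,) in rows]


def tags_of_post(conn, post_id):
    """Names of all tags on the given post (empty list if it has none)."""
    rows = conn.execute("""
        SELECT t.tag_name
        FROM PostsToTags AS pt
        JOIN Tags AS t ON t.tag_id = pt.tag_id
        WHERE pt.post_id = ?
        ORDER BY t.tag_id
    """, (post_id,))
    return [name for (name,) in rows]


print(posts_with_tag(tags_db, "sql"))  # ['Intro', 'Indexes']
print(tags_of_post(tags_db, 2))        # ['sql', 'performance']
print(tags_of_post(tags_db, 3))        # []
```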
"I'm assuming it's a many<->many structure, with three tables. Anyone have any ideas?"
More to the point, there aren't any serious alternatives, are there? Two relational tables in a many-to-many relationship require at least an association table to carry all the combination of foreign keys.
Does SO do this? Who knows. Their data model includes reference counts, and -- for all anyone knows -- date time stamps and original creator and a lot of other junk about the tag.
Minimally, there have to be three tables.
What they do on SO is hard to know.
I'm not entirely sure if this is what SO uses. But there is a good discussion here.
It would be a good idea to look at how Wordpress handles tags for posts; it will give you some ideas.
The other possibility of course is that there are only two tables.
Given there are at most 5 tags, a Question table with five nullable foreign-key references to a Tag table is a possibility.
Not very normalized, but it could be more performant.