Best practice for a "comment" table in a relational database [closed]

Assume you want to build a database for some web application. This database already contains many tables and you might have to extend it in the future.
Also, you want the end user to be able to comment on any kind of object in the database.
I would like to find a solution that is generic enough that I don't have to extend it each time I add a new table to the database.
I thought of the following:
Table Name: comment
columns:
id : the id of a comment
user_id : the id of the user making the comment
object_table_name : the table where the commented object is
object_id : the id of the commented object in the object_table_name table.
text : the text
date : the date
This table sort of solves my problem. The only thing that troubles me is that the relational aspect of it is rather weak (I can't make object_id a foreign key, for instance).
Also, if some day I need to rename a table, I will have to change all the affected entries in the comment table.
What do you think of this solution? Is there a design pattern that would help me out?
Thanks.

Isn't that cleaner ?
table comment_set
id
table comment
id
comment_set_id -> foreign key to comment_set
user_id
date
text
existing table foo
...
comment_set_id -> foreign key to comment_set
existing table bar
...
comment_set_id -> foreign key to comment_set
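For illustration, here is a minimal sketch of that layout. It uses Python's sqlite3 module only so that it is runnable as-is; the column types, the sample row and the name column on foo are assumptions, not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE comment_set (id INTEGER PRIMARY KEY);
CREATE TABLE comment (
    id             INTEGER PRIMARY KEY,
    comment_set_id INTEGER NOT NULL REFERENCES comment_set(id),
    user_id        INTEGER NOT NULL,
    date           TEXT    NOT NULL,
    text           TEXT    NOT NULL
);
-- "foo" stands in for any existing table that should be commentable.
CREATE TABLE foo (
    id             INTEGER PRIMARY KEY,
    name           TEXT,
    comment_set_id INTEGER REFERENCES comment_set(id)
);
""")

# One comment_set row per commented object; the object and its comments share it.
set_id = conn.execute("INSERT INTO comment_set DEFAULT VALUES").lastrowid
conn.execute("INSERT INTO foo (name, comment_set_id) VALUES (?, ?)", ("widget", set_id))
conn.execute(
    "INSERT INTO comment (comment_set_id, user_id, date, text) "
    "VALUES (?, 1, '2020-01-01', 'Nice widget')",
    (set_id,),
)

foo_comments = conn.execute("""
    SELECT c.text
    FROM foo f JOIN comment c ON c.comment_set_id = f.comment_set_id
    WHERE f.name = 'widget'
""").fetchall()

# A comment that points at a non-existent set is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO comment (comment_set_id, user_id, date, text) "
        "VALUES (999, 1, '2020-01-01', 'orphan')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Adding a new commentable table then only means giving it a comment_set_id column; the comment table itself does not change.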

You are mixing data and metadata, which is not the best design pattern. They should be separated.
However, since the comments don't seem to be very important anyway, your solution is OK. The worst that can happen is that you lose comments on your objects.
Some databases, most notably PostgreSQL, support a COMMENT clause for cases like this.
Update:
If you want to comment on individual records in each table, it's OK to have such a table.
object_table_name does not have to change if you rename a table, since it's data, not metadata.
You cannot write a single static SQL query that fetches the comments for a record of any table (one not known when the query is written), though you can build dynamic queries to do that.
In this case, you will have to keep your data and metadata in sync (UPDATE the comment table when you RENAME the table they refer to). The first one is a DML statement (changes data), the second one is DDL (changes metadata).
Also make sure that all PRIMARY KEYs have the same types (same as object_id).
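A rough sketch of what such a dynamic query could look like, again via Python's sqlite3 so it can be run directly. The article table, the helper name commented_row and the assumption that every commentable table has an integer id primary key are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE comment (
    id                INTEGER PRIMARY KEY,
    user_id           INTEGER,
    object_table_name TEXT    NOT NULL,
    object_id         INTEGER NOT NULL,
    text              TEXT
);
INSERT INTO article (id, title) VALUES (1, 'Hello');
INSERT INTO comment (user_id, object_table_name, object_id, text)
VALUES (7, 'article', 1, 'First!');
""")


def commented_row(conn, table, object_id):
    """Return (row, comments) for one record of an arbitrary table."""
    # Table names cannot be bound as parameters, so check the name against
    # the catalog before splicing it into the SQL text.
    known = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if known is None:
        raise ValueError(f"unknown table: {table}")
    row = conn.execute(
        f'SELECT * FROM "{table}" WHERE id = ?', (object_id,)
    ).fetchone()
    comments = conn.execute(
        "SELECT text FROM comment WHERE object_table_name = ? AND object_id = ?",
        (table, object_id),
    ).fetchall()
    return row, comments


row, comments = commented_row(conn, "article", 1)
```

Nothing in the database ties object_id to a real row, so a deleted article leaves its comments behind unless the application cleans them up.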

Read about EAV.
You can make your whole database like that. But then it will be hell working with that data.
Why don't you want to place a Comment attribute for each database entity which should support comments? This way you can get all the data you need in a single query, and many GUI programs for databases will provide you with full code completion in SQL, which will prevent errors that can easily occur when operating with strings. That way the code is heavily dependent on procedural code, which isn't right for database systems.

You can enumerate the table names in a separate table, so that changes in names do not affect the system in any major way. Just update the enumeration table.
Although you are distancing yourself from referential integrity, I can see another way to accomplish what you want.
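A small sketch of the enumeration idea, using Python's sqlite3 for runnability. The table name commentable_table and the sample data are hypothetical; the point is only that a rename touches one row instead of every comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Hypothetical enumeration of the tables that can be commented on.
CREATE TABLE commentable_table (
    id         INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL UNIQUE
);
CREATE TABLE comment (
    id        INTEGER PRIMARY KEY,
    table_id  INTEGER NOT NULL REFERENCES commentable_table(id),
    object_id INTEGER NOT NULL,
    text      TEXT
);
INSERT INTO commentable_table (id, table_name) VALUES (1, 'invoice');
INSERT INTO comment (table_id, object_id, text) VALUES (1, 10, 'a'), (1, 11, 'b');
""")

# After renaming the real table, only the enumeration row needs updating.
cur = conn.execute(
    "UPDATE commentable_table SET table_name = 'bill' WHERE table_name = 'invoice'"
)
rows_touched = cur.rowcount

bill_comments = conn.execute("""
    SELECT c.text
    FROM comment c JOIN commentable_table t ON t.id = c.table_id
    WHERE t.table_name = 'bill'
    ORDER BY c.id
""").fetchall()
```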

I generally prefer to keep the comments with the rows to which they apply. Assuming your database efficiently stores empty VARCHAR fields, you shouldn't pay a penalty for this. There isn't really anything to "extend" when you implement this approach, the maintenance of the comment becomes part of the queries you are already using to update the rows.
The only advantage to the single-note-table approach is that it allows easy searches across notes for different kinds of database entries.

Assuming MS SQL, and if the volume is relatively small, as you seem to suggest, then Extended Properties might be worth exploring. I've used them successfully in the past and they seem to be a permanent fixture.


Altering existing table vs creating new table which approach is best [closed]

We ran into a scenario today. We have a database table that already holds more than 1 million records, and we now want to alter the table to add one column. That column will have a default value of 0. Will altering the table to add a column that remains 0 for all existing records affect performance?
Also, the column value will remain 0 for 99.9% of rows; it changes to 1 only if some action is triggered by the user. So should I create a new table to hold those values, or alter the existing table?
I would like to know the advantages and disadvantages of both approaches.
This is an interesting question. If you use 0, then the existing data will need to be rewritten to make space for the new column. (There might be an exception if you already have bit columns and the new column is a bit.)
Rewriting the table is a one-time operation and it does take time, although on one million rows, it shouldn't take too long.
The alternative is to create a second table to store flags that are set. This could be as either columns or one row per flag. You would use left join to load data from this table.
I would be biased to having a second table, but not for performance reasons. Rather, I would like to include other information about the flag being set -- notably the date/time of when the flag is set. Also, I might want to distinguish between values that default to 0 versus those that are explicitly reset to 0.
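As a sketch of the second-table variant (run here through Python's sqlite3; the item and item_flag names and the sample rows are invented for the example), a row exists in the flag table only for records whose flag has been set, together with the time it was set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
-- One row per item whose flag has been set; absence of a row means 0.
CREATE TABLE item_flag (
    item_id INTEGER PRIMARY KEY REFERENCES item(id),
    set_at  TEXT NOT NULL
);
INSERT INTO item (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO item_flag (item_id, set_at) VALUES (2, '2024-01-01 12:00:00');
""")

# LEFT JOIN keeps every item and derives the 0/1 flag from the presence of a row.
flags = conn.execute("""
    SELECT i.id,
           CASE WHEN f.item_id IS NULL THEN 0 ELSE 1 END AS flag,
           f.set_at
    FROM item i
    LEFT JOIN item_flag f ON f.item_id = i.id
    ORDER BY i.id
""").fetchall()
```

The existing million-row table is never rewritten in this variant; the cost moves to the join at read time.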
"By altering the table with column which will be remain 0 for all existing records will it affect performance."
The answer depends on nullability of your new column and on your server version:
Misconceptions around adding columns to a table by P.Randal
So if your server @@VERSION is 2012 or later and your column is nullable, then even filling it in with the default value via the WITH VALUES clause will not affect performance, since it is a metadata-only operation.
I'm going to assume that you care mostly about "what is best for the ongoing operation of my system", rather than which option is easiest for the one-off change.
To make sure I understand your question, I'd like to restate it with a (made up) example. Please comment if I've misunderstood.
You have a table with > 1 million rows; let's pretend it's called people. The table has several foreign key relationships (e.g. countries: a person lives in a country), and is in turn related to many other tables via foreign keys (e.g. interactions: a person interacts with the system).
You have now identified an additional attribute of people. The attribute is mandatory (so you are setting a default value), and of type integer.
You say it's "only incremented when the user interacts with the system"; it would be great if you could be more explicit about that.
In (very) general terms, adding an integer column with a default value to your table will have no noticeable effect on the performance of your existing queries. Performance impact happens primarily when you include columns in where clauses and/or joins. So, select * from people where name = ? and age > ? will behave exactly as it does today.
When you add that column to a where clause, you may get a slightly better performance if the column is included in an index, because it may filter out more rows. select * from people where name = ? and age > ? and new_column = ? might reduce the number of rows to inspect, and thus improve speed.
If I've understood your question, you're considering creating a new table to hold that data instead, presumably with a foreign key relationship (person_id), and a value for that new attribute.
So, if it's true that the attribute is mandatory, and has a default value of 0, in the business domain sense, then creating a separate table makes no real sense. You want to keep all mandatory attributes for an entity in the same table.
If the attribute is not mandatory, it gets a little murkier: you can make a case for attributes that belong together (e.g. "address") to have their own tables. However, for a single attribute, it would be counterintuitive.

Using one numerical database column vs several boolean columns [closed]

So I have a db with a table user.
The user can be a moderator, an owner, neither, or both.
I can design this with two booleans, isModerator and isOwner,
which would become two db columns.
Or I could create a column hasUserRight
with 1 for moderator
2 for owner
What is the better approach to design the db with it and why?
You are clearly talking about user roles. Your second option uses a flag value for each role; this will limit you to a certain number of roles and is not easy to understand. The first option is not normalized, so adding functionality will be more work, etc.
Adding a table with roles and a userrole table will give you a more generic solution.
As long as there are only the two roles, both your solutions will work. But I agree with the others that one column per role would be easier to read and should be preferred hence.
However, problems occur when having to add a third role. If this is something that you know for sure will never happen, okay. But if it can happen, you should think of the consequences. Let's add a new role "revisor" and let's say that a revisor must be a moderator.
Solution 1: isModerator, isOwner
Add isRevisor. All written code will run as before. You can add code for isRevisor. Add a check constraint so that isRevisor cannot be set true if isModerator is false. Done.
=> database (DDL) changes only
Solution 2: hasUserRight 0=none, 1=moderator, 2=owner, 3=all=moderator+owner and a constraint hasUserRight in (0,1,2,3)
(I wouldn't recommend this, because it's not obvious what the different values mean.)
You need more values: 4=moderator+revisor, 5=all=moderator+owner+revisor (or better 3=all=moderator+owner+revisor and 5=moderator+owner?). Your code will break, because hasUserRight in (1,3) no longer selects all moderators. You will have to fix the code. Change the constraint to hasUserRight in (0,1,2,3,4,5).
=> code changes + database (DDL) changes
Solution 3: hasUserRight 0=none, 1=moderator, 2=owner, 3=all=moderator+owner and a table UserRight holding the values 0 to 3 along with an explanatory text.
Again, you need more values: 4=moderator+revisor, 5=all=moderator+owner+revisor (or better 3=all=moderator+owner+revisor and 5=moderator+owner?). Add them to your UserRight table. Your code will break, because hasUserRight in (1,3) no longer selects all moderators. You will have to fix the code. No need to change any constraint; the foreign key only allows valid values.
=> only code changes
Solution 4: a table role and a bridge table user_role
Simply insert the new role in table role. Add entries to table user_role if you like. Done. All you need is inserts. Your dbms cannot guarantee however that each revisor is a moderator; you will have to care about this yourself.
=> no changes at all to code or database (DDL)
As you see solution 2 and 3 (hasUserRight) are bad. Decide for either solution 1 or 4, whatever you prefer and find more appropriate.
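To make solution 4 concrete, here is a minimal sketch using Python's sqlite3. The column names, the sample users and the helper users_with are assumptions for the example; only the role and user_role table names come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Bridge table: one row per (user, role) pair.
CREATE TABLE user_role (
    user_id INTEGER NOT NULL REFERENCES user(id),
    role_id INTEGER NOT NULL REFERENCES role(id),
    PRIMARY KEY (user_id, role_id)
);
INSERT INTO user (id, name) VALUES (1, 'ann'), (2, 'bob');
INSERT INTO role (id, name) VALUES (1, 'moderator'), (2, 'owner');
INSERT INTO user_role (user_id, role_id) VALUES (1, 1), (1, 2), (2, 1);
""")


def users_with(role_name):
    """Names of all users holding the given role, sorted alphabetically."""
    rows = conn.execute("""
        SELECT u.name
        FROM user u
        JOIN user_role ur ON ur.user_id = u.id
        JOIN role r ON r.id = ur.role_id
        WHERE r.name = ?
        ORDER BY u.name
    """, (role_name,)).fetchall()
    return [name for (name,) in rows]


moderators_before = users_with("moderator")

# Adding the "revisor" role needs inserts only: no DDL and no query changes.
conn.execute("INSERT INTO role (id, name) VALUES (3, 'revisor')")
conn.execute("INSERT INTO user_role (user_id, role_id) VALUES (2, 3)")

moderators_after = users_with("moderator")
revisors = users_with("revisor")
```

As noted above, nothing in this schema stops a revisor row from existing without a matching moderator row; that rule would have to live in application code or a trigger.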
An issue arises with your second solution. What if the user is an owner and a moderator? You would have to assign a fourth numerical value for that situation assuming that 0 would represent neither owner nor moderator. Even if this layout was well documented, it still would not be intuitive.
Your first solution is much cleaner and easier to understand.

Storing a multiple choice quiz in a database - deciding the schema

I am trying to implement a multiple choice quiz and will want to store all my questions and answers in a SQLite database. I will have many questions, and for each question there will 2 or more possible answers to display.
My question is, how should I store the questions and answers in a database? I have two ideas for a schema (primary key in bold)
1. (many to many)
questions (questionID:int , questionString:String, correctAnswerID:int)
answers (answerID:int , answerString:String)
questions_and_answers (questionID, answerID)
2.
questions (questionID:int, questionString:String, correctAnswerID:int)
answers (answerID:int, answerString:String, questionID:int foreign key)
I'm not sure which one is better, or if there is another way?
Maybe questions_and_answers would get very large and cause long retrieval times and memory problems? Then again, I assume question_and_answers would be indexed on the primary keys. In the second schema, answers would be indexed on answerID and not questionID? meaning the search times would go up as the whole table would have to be searched?
There may be ~10,000 - 20,000 answers. (the quiz may be run on a mobile device and questions will need to be shown "instantly")
Note: I don't expect there to much overlap of answers between questions. I wouldn't think the amount of overlap would mean less data being stored, considering the extra space required by the questions_and_answers table
Your second schema is the better one, because it models the actual domain: each question has a set of answers. Even if you can "compress" the data by storing duplicate answers once, it does not match the actual domain.
Down the road you'll want to edit answers. With schema 1, that means first searching if that answer already exists. If it does exist, you then would have to check if any questions still rely on the old answer. If it did not exist, you would still have to check if any other questions relied on that answer, and then either edit that answer in place or create a new answer.
Schema 1 just makes life really hard.
To answer your index questions, you would need to add an index on questionId. Once you have that index, looking up answers for a question should scale.
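A short sketch of the second schema with that index, run through Python's sqlite3 since the question targets SQLite. The index name idx_answers_question and the sample data are made up, and correctAnswerID is left out to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (
    questionID     INTEGER PRIMARY KEY,
    questionString TEXT NOT NULL
);
CREATE TABLE answers (
    answerID     INTEGER PRIMARY KEY,
    answerString TEXT NOT NULL,
    questionID   INTEGER NOT NULL REFERENCES questions(questionID)
);
-- Without this index, finding a question's answers scans the whole table.
CREATE INDEX idx_answers_question ON answers(questionID);
""")
conn.execute("INSERT INTO questions VALUES (1, '2 + 2 = ?')")
conn.executemany(
    "INSERT INTO answers (answerString, questionID) VALUES (?, 1)",
    [("3",), ("4",), ("5",)],
)

query = "SELECT answerString FROM answers WHERE questionID = ?"
answers_for_q1 = sorted(row[0] for row in conn.execute(query, (1,)))

# Ask SQLite how it would run the lookup; the last column holds the plan text.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1,))]
uses_index = any("idx_answers_question" in step for step in plan)
```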
Now, on a completely different note, why use a database for this? Consider storing them as simple documents in a standard format like json. Anytime you query a question, you will almost always want the answers, and vice versa. Instead of executing multiple queries, you can load the entire document in one step.
If you then find you need more advanced storage (queries, redundancy, etc) you can move to a document database like MongoDB or CouchDB.
This looks like a deadlock (a circular reference): the questionID column is a foreign key in the answers table, and the correctAnswerID column is a foreign key in the questions table.
It's better to create a bit-type column in the answers table to mark the correct answer, and to remove the correctAnswerID column.
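A sketch of that variant in SQLite via Python's sqlite3. The column name isCorrect is hypothetical, and the partial unique index that limits each question to one correct answer is an extra assumption on top of the suggestion above, not something the answer asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (
    questionID     INTEGER PRIMARY KEY,
    questionString TEXT NOT NULL
);
CREATE TABLE answers (
    answerID     INTEGER PRIMARY KEY,
    questionID   INTEGER NOT NULL REFERENCES questions(questionID),
    answerString TEXT NOT NULL,
    -- SQLite has no bit type; an integer restricted to 0/1 plays that role.
    isCorrect    INTEGER NOT NULL DEFAULT 0 CHECK (isCorrect IN (0, 1))
);
-- Assumption: at most one correct answer per question.
CREATE UNIQUE INDEX one_correct_per_question
    ON answers(questionID) WHERE isCorrect = 1;
""")
conn.execute("INSERT INTO questions VALUES (1, '2 + 2 = ?')")
conn.executemany(
    "INSERT INTO answers (questionID, answerString, isCorrect) VALUES (1, ?, ?)",
    [("3", 0), ("4", 1), ("5", 0)],
)

correct = conn.execute(
    "SELECT answerString FROM answers WHERE questionID = 1 AND isCorrect = 1"
).fetchall()

# Marking a second answer as correct violates the partial unique index.
try:
    conn.execute(
        "INSERT INTO answers (questionID, answerString, isCorrect) VALUES (1, 'four', 1)"
    )
    second_correct_rejected = False
except sqlite3.IntegrityError:
    second_correct_rejected = True
```

With the flag on answers, questions no longer references answers, so rows can be inserted in a simple parent-then-child order.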

Is a one column table good design? [closed]

Is it OK to have a table with just one column? I know it isn't technically illegal, but is it considered poor design?
EDIT:
Here are a few examples:
You have a table with the 50 valid US state codes, but you have no need to store the verbose state names.
An email blacklist.
Someone mentioned adding a key field. The way I see it, this single column WOULD be the primary key.
In terms of relational algebra this would be a unary relation, meaning "this thing exists"
Yes, it's fine to have a table defining such a relation: for instance, to define a domain.
The values of such a table should be natural primary keys of course.
A lookup table of prime numbers is what comes to my mind first.
Yes, it's certainly good design to design a table in such a way as to make it most efficient. "Bad RDBMS Design" is usually centered around inefficiency.
However, I have found that most cases of single column design could benefit from an additional column. For example, State Codes can typically have the Full State name spelled out in a second column. Or a blacklist can have notes associated. But, if your design really does not need that information, then it's perfectly ok to have the single column.
I've used them in the past. One client of mine wanted to auto block anyone trying to sign up with a phone number in this big list he had so it was just one big blacklist.
If there is a valid need for it, then I don't see a problem. Maybe you just want a list of possibilities to display for some reason and you want to be able to dynamically change it, but have no need to link it to another table.
One case that I found sometimes is something like this:
Table countries_id, contains only one column with numeric ID for each country.
Table countries_description, contains a column with the country ID, a column with the language ID and a column with the localized country name.
Table company_factories, contains information for each factory of the company, including the country in which it is located.
So to maintain data coherence and language independent data in the tables the database uses this schema with tables with only one column to allow foreign keys without language dependencies.
In this case I think the existence of one column tables are justified.
Edited in response to the comment by: Quassnoi
In this schema I can define a foreign key in the table company_factories that does not require me to include Language column on the table, but if I don't have the table countries_id, I must include Language column on the table to define the foreign key.
There would be rare cases where a single-column table makes sense. I did one database where the list of valid language codes was a single-column table used as a foreign key. There was no point in having a different key, since the code itself was the key. And there was no fixed description since the language code descriptions would vary by language for some contexts.
In general, any case where you need an authoritative list of values that do not have any additional attributes is a good candidate for a one-column table.
I use single-column tables all the time -- depending, of course, on whether the app design already uses a database. Once I've endured the design overhead of establishing a database connection, I put all mutable data into tables where possible.
I can think of two uses of single-column tables OTMH:
1) Data item exists. Often used in dropdown lists. Also used for simple legitimacy tests.
Eg. two-letter U.S. state abbreviations; Zip codes that we ship to; words legal in Scrabble; etc.
2) Sparse binary attribute, ie., in a large table, a binary attribute that will be true for only a very few records. Instead of adding a new boolean column, I might create a separate table containing the keys of the records for which the attribute is true.
Eg. employees that have a terminal disease; banks with a 360-day year (most use 365); etc.
-Al.
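The sparse-attribute case (2) could look roughly like this; the sketch uses Python's sqlite3, and the bank and bank_360_day_year names and rows are invented to match the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bank (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Single-column table: a row here means "this bank uses a 360-day year".
CREATE TABLE bank_360_day_year (
    bank_id INTEGER PRIMARY KEY REFERENCES bank(id)
);
INSERT INTO bank (id, name) VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
INSERT INTO bank_360_day_year (bank_id) VALUES (2);
""")

# The attribute is recovered with an existence test instead of a boolean column.
rows = conn.execute("""
    SELECT b.name,
           EXISTS (SELECT 1 FROM bank_360_day_year s WHERE s.bank_id = b.id) AS uses_360
    FROM bank b
    ORDER BY b.id
""").fetchall()
```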
Mostly I've seen this in lookup type tables such as the state table you described. However, if you do this be sure to set the column as the primary key to force uniqueness. If you can't set this value as unique, then you shouldn't be using one column.
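As a sketch of a one-column lookup table whose single column is the primary key (Python's sqlite3; the us_state and address tables and their contents are assumptions for the example), both duplicate codes and references to unknown codes are refused:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce REFERENCES
conn.executescript("""
-- The code itself is the key; there is nothing else to store.
CREATE TABLE us_state (code TEXT PRIMARY KEY);
CREATE TABLE address (
    id         INTEGER PRIMARY KEY,
    city       TEXT,
    state_code TEXT NOT NULL REFERENCES us_state(code)
);
INSERT INTO us_state (code) VALUES ('CA'), ('NY');
""")


def rejected(sql):
    """True if the statement is refused by a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_rejected = rejected("INSERT INTO us_state (code) VALUES ('CA')")
unknown_state_rejected = rejected(
    "INSERT INTO address (city, state_code) VALUES ('Nowhere', 'ZZ')"
)
valid_accepted = not rejected(
    "INSERT INTO address (city, state_code) VALUES ('Albany', 'NY')"
)
```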
No problem as long as it contains unique values.
I would say in general, yes. Not sure why you need just one column. There are some exceptions to this that I have seen used effectively. It depends on what you're trying to achieve.
They are not really good design when you're thinking of the schema of the database, but really should only be used as utility tables.
I've seen numbers tables used effectively in the past.
The purpose of a database is to relate pieces of information to each other. How can you do that when there is no data to relate to?
Maybe this is some kind of compilation table (i.e. FirstName + LastName + Birthdate), though I'm still not sure why you would want to do that.
EDIT: I could see using this kind of table for a simple list of some kind. Is that what you are using it for?
Yes, as long as the field is the primary key, as you said it would be. The reason is that if you insert duplicate data, those rows will be read-only: if you try to delete one of the duplicated rows, it will not work, because the server will not know which row to delete.
The only use case I can conceive of is a table of words perhaps for a word game. You access the table just to verify that a string is a word: select word from words where word = ?. But there are far better data structures for holding a list of words than a relational database.
Otherwise, data in a database is usually placed in a database to take advantage of the relationships between various attributes of the data. If your data has no attributes beyond its value how will these relationship be developed?
So, while not illegal, in general you probably should not have a table with just one column.
All my tables have at least four technical fields: a serial primary key, creation and modification timestamps, and a soft-delete boolean. In any blacklist, you will also want to know who added the entry. So for me the answer is no: a table with only one column would not make sense, except when prototyping something.
Yes, that is perfectly fine. But an ID field couldn't hurt, right?

What is the best way to handle this constraint in SQL Server 2005?

I have an SMS-based survey application which takes in a survey domain and an answer.
I've gotten requests for detailed DDL, so.... The database looks like this
SurveyAnswer.Answer must be unique within all active Surveys for that SurveyDomain. In SQL terms, this should always return 0..1 rows:
select * from survey s, surveyanswer sa
where s.surveyid = sa.surveyid and
s.active = 1 and
s.surveydomainid = @surveydomainid and
sa.answer = @answer
I plan on handling this constraint at the application level, but would also like some database integrity to be enforced. What is the best way to do this? Trigger? Possible in a constraint?
As you are covering 2 tables there is AFAIK only 2 ways to enforce this.
Trigger as you suggested.
Indexed view with unique constraint across the 3 columns.
As far as reliability is concerned I would go for the Indexed view but the only downside is that it will be difficult to understand by third parties.
It is possible to add a constraint that is implemented in a UDF like this:
alter table MyTable add constraint complexConstraint
check (dbo.complexConstraintFct()=0)
Where complexConstraintFct would be a function containing a query on other tables. However, this approach has some issues, as check constraints were designed to be evaluated on a single row at a time, but updates can affect more than one row at a time.
So, the bottom line is: stick with triggers.
Assuming you are using stored procedures to perform DML operations, you could add a guard clause to the SP that adds answers to surveys to check for the existence of an equivalent answer. You could then either throw an exception or return a status code to indicate that the answer could not be added.
You can't do it at the row level (eg CHECK constraint) so you have to have something that can view all rows
A trigger can send "nice" messages, but they run after the DML statement. You have fine control over processing.
An indexed view prevents the DML statement, but it gives a technical error message. It's an extra object and indexes to maintain.
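To illustrate the trigger option, here is a rough sketch. It is written for SQLite (driven from Python's sqlite3) purely so it can be run, so the trigger syntax is not SQL Server 2005 T-SQL, and it fires before the insert rather than after it. The column types and the trigger name are assumptions, and it only guards inserts into surveyanswer; activating a survey or updating an answer would need similar triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE survey (
    surveyid       INTEGER PRIMARY KEY,
    surveydomainid INTEGER NOT NULL,
    active         INTEGER NOT NULL
);
CREATE TABLE surveyanswer (
    surveyanswerid INTEGER PRIMARY KEY,
    surveyid       INTEGER NOT NULL REFERENCES survey(surveyid),
    answer         TEXT    NOT NULL
);
-- Reject an answer if an active survey in the same domain already has it.
CREATE TRIGGER surveyanswer_unique_active
BEFORE INSERT ON surveyanswer
WHEN EXISTS (
    SELECT 1
    FROM survey s_new
    JOIN survey s ON s.surveydomainid = s_new.surveydomainid AND s.active = 1
    JOIN surveyanswer sa ON sa.surveyid = s.surveyid
    WHERE s_new.surveyid = NEW.surveyid
      AND s_new.active = 1
      AND sa.answer = NEW.answer
)
BEGIN
    SELECT RAISE(ABORT, 'answer already used by an active survey in this domain');
END;
INSERT INTO survey (surveyid, surveydomainid, active)
VALUES (1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 2, 1);
""")


def try_answer(surveyid, answer):
    """Insert an answer; return False if the trigger (or a constraint) rejects it."""
    try:
        conn.execute(
            "INSERT INTO surveyanswer (surveyid, answer) VALUES (?, ?)",
            (surveyid, answer),
        )
        return True
    except sqlite3.DatabaseError:
        return False


first = try_answer(1, "A")             # first use in domain 1
duplicate = try_answer(2, "A")         # same domain, both surveys active
inactive = try_answer(3, "A")          # survey 3 is inactive, so not checked
other_domain = try_answer(4, "A")      # different domain
stored = conn.execute("SELECT COUNT(*) FROM surveyanswer").fetchone()[0]
```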
I think what you're saying is that for any active question, the tuple (surveyDomain, surveyQuestion, surveyAnswer) must be unique?
Or in other words, survey:surveyanswer is 1:1 if the survey is active, even though survey:surveyanswer is set up to be 1:many.
If so, the answer is to change your table structure. Adding a nullable activeAnswerId column to survey will effectively make the relation 1:1; your existing constraint unique SurveyId (or unique SurveyId, SurveyDomainId) will suffice to enforce uniqueness.
Indeed, unless I'm misunderstanding, I'm surprised that Survey has a Question column; I'd expect Survey:Question to be 1:many (a survey has many questions) or even many:many, if a question can show up on more than one survey.
More generally, I suspect the reason that figuring out how to enforce the constraint is difficult and requires "heroics" like triggers or user defined functions, is a symptom of a schema that doesn't accurately model your problem domain.
OP comments:
no, you're missing it. Survey:Answer is 1:n. "Question" is the survey question – Tuple would be (SurveyDomain.SurveyDomainId, Survey.Answer)
You mean that for every domain, there's at most one answer? Again, looking at your schema, it's misleading at best. A SurveyDomain has many Surveys (each of which has a Question column) and a Survey has many Answers? (Schema)
But if the Survey's active bit is set, there should be only one Answer?
Is Survey a misnomer for Question?
It's really not clear what you're trying to model.
Again, if it's hard to add a constraint, that suggests that your model doesn't work.