Storing a multiple choice quiz in a database - deciding the schema - sql

I am trying to implement a multiple choice quiz and will want to store all my questions and answers in a SQLite database. I will have many questions, and for each question there will be 2 or more possible answers to display.
My question is, how should I store the questions and answers in a database? I have two ideas for a schema (primary keys are questionID and answerID):
1. (many to many)
questions (questionID:int , questionString:String, correctAnswerID:int)
answers (answerID:int , answerString:String)
questions_and_answers (questionID, answerID)
2.
questions (questionID:int, questionString:String, correctAnswerID:int)
answers (answerID:int, answerString:String, questionID:int foreign key)
I'm not sure which one is better, or if there is another way?
Maybe questions_and_answers would get very large and cause long retrieval times and memory problems? Then again, I assume questions_and_answers would be indexed on the primary keys. In the second schema, would answers be indexed on answerID but not on questionID, meaning search times would go up because the whole table would have to be searched?
There may be ~10,000 - 20,000 answers. (the quiz may be run on a mobile device and questions will need to be shown "instantly")
Note: I don't expect there to be much overlap of answers between questions. I don't think the amount of overlap would mean less data being stored, considering the extra space required by the questions_and_answers table.

Your second schema is the better one, because it models the actual domain: each question has a set of answers. Even if you can "compress" the data by storing duplicate answers once, that does not match the actual domain.
Down the road you'll want to edit answers. With schema 1, that means first searching if that answer already exists. If it does exist, you then would have to check if any questions still rely on the old answer. If it did not exist, you would still have to check if any other questions relied on that answer, and then either edit that answer in place or create a new answer.
Schema 1 just makes life really hard.
To answer your index question: you would need to add an index on questionID. Once you have that index, looking up the answers for a question should scale.
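A minimal sketch of schema 2 with that index, using Python's built-in sqlite3 module (the question targets SQLite). correctAnswerID is left out here to keep the example small; the sample data is made up. EXPLAIN QUERY PLAN lets you confirm that the lookup by questionID uses the index rather than scanning the table.

```python
import sqlite3

# In-memory database for illustration; names follow schema 2 from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (
    questionID     INTEGER PRIMARY KEY,
    questionString TEXT NOT NULL
);
CREATE TABLE answers (
    answerID     INTEGER PRIMARY KEY,
    answerString TEXT NOT NULL,
    questionID   INTEGER NOT NULL REFERENCES questions(questionID)
);
-- Without this index, fetching one question's answers scans the whole table.
CREATE INDEX idx_answers_questionID ON answers(questionID);
""")

conn.execute("INSERT INTO questions VALUES (1, 'What is 2 + 2?')")
conn.executemany(
    "INSERT INTO answers (answerString, questionID) VALUES (?, ?)",
    [("3", 1), ("4", 1), ("5", 1)],
)

def answers_for(question_id):
    """Return the answer strings for one question, in insertion order."""
    rows = conn.execute(
        "SELECT answerString FROM answers WHERE questionID = ? ORDER BY answerID",
        (question_id,),
    )
    return [r[0] for r in rows]

# The last column of each plan row describes the access path SQLite chose.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT answerString FROM answers WHERE questionID = ?", (1,)
).fetchall()
```

With 10,000 to 20,000 answers this is an index seek per question, which should comfortably feel "instant" on a phone.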
Now, on a completely different note, why use a database for this? Consider storing them as simple documents in a standard format like JSON. Whenever you query a question, you will almost always want its answers, and vice versa. Instead of executing multiple queries, you can load the entire document in one step.
If you then find you need more advanced storage (queries, redundancy, etc) you can move to a document database like MongoDB or CouchDB.
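To make the document idea concrete, here is a small sketch using Python's json module. The layout (a "questions" list whose entries carry "text", "answers" and the index of the "correct" answer) is a hypothetical format chosen for illustration, not a standard.

```python
import json

# Hypothetical quiz document: each question carries its own answers.
QUIZ_JSON = """
{
  "questions": [
    {"id": 1, "text": "What is 2 + 2?", "answers": ["3", "4", "5"], "correct": 1},
    {"id": 2, "text": "Capital of France?", "answers": ["Paris", "Rome"], "correct": 0}
  ]
}
"""

def load_quiz(raw):
    """Parse the whole document once and index the questions by id."""
    return {q["id"]: q for q in json.loads(raw)["questions"]}

quiz = load_quiz(QUIZ_JSON)
q = quiz[1]
print(q["text"], q["answers"], "correct:", q["answers"][q["correct"]])
```

One parse gives you every question together with its answers, with no joins or per-question queries.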

This looks like a circular reference: questionID in the answers table is a foreign key to questions, while correctAnswerID in the questions table is a foreign key to answers.
It's better to add a bit-type column to the answers table to mark the correct answer, and remove the correctAnswerID column.
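A sketch of that suggestion in SQLite via Python's sqlite3. SQLite has no BIT type, so this assumes an INTEGER isCorrect column restricted to 0/1. A partial unique index (supported since SQLite 3.8.0) additionally guarantees at most one correct answer per question; drop it if a question may have several correct answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE questions (
    questionID     INTEGER PRIMARY KEY,
    questionString TEXT NOT NULL
);
CREATE TABLE answers (
    answerID     INTEGER PRIMARY KEY,
    questionID   INTEGER NOT NULL REFERENCES questions(questionID),
    answerString TEXT NOT NULL,
    isCorrect    INTEGER NOT NULL DEFAULT 0 CHECK (isCorrect IN (0, 1))
);
CREATE INDEX idx_answers_questionID ON answers(questionID);
-- Only rows with isCorrect = 1 enter this index, so each question
-- can have at most one correct answer.
CREATE UNIQUE INDEX one_correct_per_question ON answers(questionID) WHERE isCorrect = 1;
""")

conn.execute("INSERT INTO questions VALUES (1, 'What is 2 + 2?')")
conn.executemany(
    "INSERT INTO answers (questionID, answerString, isCorrect) VALUES (?, ?, ?)",
    [(1, "3", 0), (1, "4", 1), (1, "5", 0)],
)

correct = conn.execute(
    "SELECT answerString FROM answers WHERE questionID = ? AND isCorrect = 1", (1,)
).fetchone()[0]

# A second "correct" answer for the same question is rejected.
try:
    conn.execute(
        "INSERT INTO answers (questionID, answerString, isCorrect) VALUES (1, '22', 1)"
    )
    second_correct_rejected = False
except sqlite3.IntegrityError:
    second_correct_rejected = True
```

The foreign keys now point in one direction only (answers to questions), so the cycle goes away.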


Can There be a Table in a Relational Database that Doesn't Have a Relationship to Any Other Table? [duplicate]

I have an application in which I store PostId and keywords (Keyword) belonging to a Post in a table named KeywordsForPost. The primary key for that table is the combination of PostId and Keyword. PostId is not unique nor is Keyword.
I needed this implementation because I might need to search for posts regarding the keywords they contain.
I have another table named NewKeywords. The one and only column in that table is Keyword. When a post is created, the keywords in that post are inserted into both the KeywordsForPost and NewKeywords tables. An operation is applied to the keywords in the NewKeywords table at the user's command, after which they are no longer "new keywords", so I delete those keywords once the operation has been applied. Currently my NewKeywords table does not have a relationship with any other table. Is this practice justified? Or is there a better practice?
I searched and found this answer: can we have a table without any relation with the other tables. But I did not find it satisfactory.
I also find my question different from the one previously asked, because that one is general whereas mine is specific. I need to know if a relationship can be added to the table. So far I have come up with nothing.
Yes, you can. The only consequence is that the table won't have a relationship to other tables. I would not say this is the best way to go, because it all depends on your situation. And, as that answer says, it can still be given a relationship later.
Either I'm misreading your question, or there actually is a relationship between NewKeywords and KeywordsForPost. It's a value (Keyword) that's common to both tables, and could be used for a relational join. That might be a stupid join that no one would want to do, it might be really slow for lack of a relevant index, and the keywords aren't a declared key anywhere, but it's still a relationship.
The relationship is inherent in the data, whether you have declared it or not.
I am going to take @rlartiga's approach. I am going to create a Keywords table with a single column, Keyword, as its primary key. Then both the KeywordsForPost and NewKeywords tables will reference Keyword in Keywords. Thanks for your support! Comment if you think this is not the appropriate move.
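A sketch of that plan in SQLite via Python's sqlite3, reusing the table names from the question. The helper functions add_post_keywords and mark_processed are hypothetical names for the "post created" and "operation applied" steps. Both KeywordsForPost and NewKeywords now reference Keywords, so a "new keyword" can no longer exist without a known keyword behind it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Keywords (
    Keyword TEXT PRIMARY KEY
);
CREATE TABLE KeywordsForPost (
    PostId  INTEGER NOT NULL,
    Keyword TEXT NOT NULL REFERENCES Keywords(Keyword),
    PRIMARY KEY (PostId, Keyword)
);
CREATE TABLE NewKeywords (
    Keyword TEXT PRIMARY KEY REFERENCES Keywords(Keyword)
);
""")

def add_post_keywords(post_id, keywords):
    """Hypothetical 'post created' step: register keywords, link them, flag them as new."""
    with conn:  # one transaction per post
        for kw in keywords:
            conn.execute("INSERT OR IGNORE INTO Keywords VALUES (?)", (kw,))
            conn.execute("INSERT OR IGNORE INTO KeywordsForPost VALUES (?, ?)", (post_id, kw))
            conn.execute("INSERT OR IGNORE INTO NewKeywords VALUES (?)", (kw,))

def mark_processed(keywords):
    """Hypothetical 'operation applied' step: keywords stop being new."""
    with conn:
        conn.executemany("DELETE FROM NewKeywords WHERE Keyword = ?", [(k,) for k in keywords])

add_post_keywords(1, ["sql", "schema"])
add_post_keywords(2, ["sql"])
mark_processed(["sql"])

# The foreign key rejects a "new keyword" that was never registered.
try:
    conn.execute("INSERT INTO NewKeywords VALUES ('orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```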

Database Table Design for storing yes, no, and quantity type responses to questions

I have a table that stores answers to checklist questions, where the checklists are in the format of yes, no, not applicable, or resolved.
Table: CHECKLIST_ANSWER
ATTRIBUTE_ID PK, FK
CHECKLIST_INSTANCE_ID PK, FK
TOGGLE_VALUE (1=yes, 2=No, 3=n/a, 4=was a no then it was resolved)
FAIL_REASON
ATTRIBUTE_ID is a foreign key to a table of questions, i.e. Was the part measured within some tolerance?
Now I want to model a checklist that would store quantity responses, i.e. How many incorrect dimensions were found on the drawing?
I feel confident that I can store these questions in the same table as the yes/no/na type attributes, but can I use the same table to store the quantity value? Should I add a new column, say QUANTITY_VALUE? Then either QUANTITY_VALUE or TOGGLE_VALUE would be null depending on the attribute.
Table: CHECKLIST_ANSWER
ATTRIBUTE_ID PK, FK
CHECKLIST_INSTANCE_ID PK, FK
TOGGLE_VALUE (1=yes, 2=No, 3=n/a, 4=was a no then it was resolved)
QUANTITY_VALUE
FAIL_REASON
The goal of this database application is to move paper and Excel checklists online and capture them in Oracle, to provide more efficient collection of metrics and better aggregation of the inputs. Am I asking for trouble down the road by blending the two into one table? Or should I create a separate table, CHECKLIST_QTY_ANSWER?
If you have many options, you usually create a separate table with only an id and a description (or name). To connect the two tables, you add a field to the CHECKLIST_ANSWER table and define it as a foreign key that references the id (primary key) of the new table.
Hope it is clear :)
I don't see any problem with adding the new column to your existing table. I would include a check constraint that required that either TOGGLE_VALUE or QUANTITY_VALUE be null (but not both).
There's no good reason to create a second, nearly identical table, where only a single column varies. In my experience, that tends to lead to more problems than the single-table solution (it's practically an invitation to use dynamic SQL).
I definitely would not re-use the existing column (as suggested in another answer), as that would prevent the use of a foreign key on the toggle value.
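Here is a sketch of the single-table design with that check constraint, plus a lookup table so TOGGLE_VALUE can keep its foreign key. It runs in SQLite via Python's sqlite3 as a stand-in for Oracle. The CHECK is written with plain AND/OR so the same expression should also work in Oracle. The TOGGLE_OPTION table name is hypothetical, and the foreign keys for ATTRIBUTE_ID and CHECKLIST_INSTANCE_ID are omitted because their parent tables aren't shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE TOGGLE_OPTION (            -- hypothetical lookup table for toggle codes
    TOGGLE_VALUE INTEGER PRIMARY KEY,
    DESCRIPTION  TEXT NOT NULL
);
INSERT INTO TOGGLE_OPTION VALUES
    (1, 'Yes'), (2, 'No'), (3, 'N/A'), (4, 'Was a no, then it was resolved');

CREATE TABLE CHECKLIST_ANSWER (
    ATTRIBUTE_ID          INTEGER NOT NULL,
    CHECKLIST_INSTANCE_ID INTEGER NOT NULL,
    TOGGLE_VALUE          INTEGER REFERENCES TOGGLE_OPTION(TOGGLE_VALUE),
    QUANTITY_VALUE        INTEGER,
    FAIL_REASON           TEXT,
    PRIMARY KEY (ATTRIBUTE_ID, CHECKLIST_INSTANCE_ID),
    -- exactly one of the two value columns must be filled in
    CHECK ((TOGGLE_VALUE IS NULL AND QUANTITY_VALUE IS NOT NULL)
        OR (TOGGLE_VALUE IS NOT NULL AND QUANTITY_VALUE IS NULL))
);
""")

def try_insert(attr, inst, toggle, qty):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO CHECKLIST_ANSWER "
                "(ATTRIBUTE_ID, CHECKLIST_INSTANCE_ID, TOGGLE_VALUE, QUANTITY_VALUE) "
                "VALUES (?, ?, ?, ?)",
                (attr, inst, toggle, qty),
            )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(1, 100, 1, None),     # yes/no attribute: accepted
    try_insert(2, 100, None, 3),     # quantity attribute: accepted
    try_insert(3, 100, 1, 3),        # both filled: rejected by CHECK
    try_insert(4, 100, None, None),  # neither filled: rejected by CHECK
    try_insert(5, 100, 9, None),     # unknown toggle code: rejected by FK
]
print(results)  # [True, True, False, False, False]
```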
If I understand your question correctly you're looking for advice on how to store the new type of answers in your schema?
Since this is a new type of answer you'd need to denote that the format of the data is now different from your y/n/na answer type. You could do this by adding another table CheckListAnswerType and a FK in your CHECKLIST_ANSWER table.
However, your CHECKLIST_INSTANCE_ID could easily indicate that this is a type of checklist that follows a certain answer pattern. I'm not sure about the rest of your schema, but you could have a CHECKLIST_INSTANCE table that specifies its answer type...
Your TOGGLE_VALUE could follow a numeric scheme for your new answer types. With the aforementioned CheckListAnswerType, you could (and would have to) always take the type into account when querying the data, to make sure you don't pick the wrong answer type for the question context; for example, you don't want to get a Yes value while looking for the answer to "How many incorrect dimensions were found on the drawing?".
I would think all of that would be fine, UNTIL you start wanting to store answers of a different data-type. Then it would be time to redesign schema.
TL;DR: If you're using the same data-type for answers then you would be okay re-using the existing schema (column) while adding a way to tell the answer, or question/answer, types apart to query accurately. If you want to store other data-types in TOGGLE_VALUE, implement new schema objects to do so. Don't try and force other data-types into the current schema if you can avoid it. Also if you did this consider renaming TOGGLE_VALUE as it no longer represents a Toggle. answerValue might better fit the new design.

Survey Data Model - How to avoid EAV and excessive denormalization?

My database skills are mediocre at best and I have to design a data model for survey data. I have spent some thoughts on this and right now I feel that I am stuck between some kind of EAV model and a design involving hundreds of tables, each with hundreds of columns (and thousands of records). There must be a better way to do this and I hope that the wise folks on this forum can help me.
My question is: how should I model the answers to survey questions in an RDBMS? Using SQL Server is mandatory. So alternative data storage systems should be excluded from this discussion. (Sure, some should and will be evaluated, but not here please.) I don't need a solution for the entire data model, for now I'm only interested in the Answers part.
I have already searched various forums, but I couldn't really find a solution. If it has already been given elsewhere, please excuse me and provide me with a link so I can read it up.
Some assumptions about the data I have to deal with:
Each survey consists of 1 to n questionnaires
Each questionnaire consists of 100-2,000 questions (please ignore that 2,000 questions really sound like a lot to answer...)
Questions can be of various types: multiple-choice, free text, a number (like age, income, percentages, ...)
Each survey involves 10-200 countries (These are not the respondents. The respondents are actually people in the countries.)
Depending on the type of questionnaire, each questionnaire is answered by 100-20,000 respondents per country.
A country can adapt the questionnaires for a survey, i.e. add, remove or edit questions
The data for one country is gathered in a separate database in that country. There is no possibility for online integration from the start.
The data for all countries has to be integrated later. This means for example, if a country has deleted a question, that data must somehow be derived from what they sent in order to achieve a uniform design across all countries
I will have to write the integration and cleaning software, which will need to work with every country's data
In the end the data needs to be exported to flat files, one rectangular grid per country and questionnaire.
I have already discussed this topic with people from various backgrounds and have not come to a good solution yet. I mainly got two kinds of opinions.
The domain experts, who are used to working with flat files (spreadsheet-style) for data processing and analysis, vote for a denormalized structure with loads of tables and columns as I described above (1 table per country and questionnaire). This sounds terrible to me: I learned that wide tables are to be avoided, it will be annoying to determine which columns are actually in a table when working with it, the database will become cluttered with hundreds of tables (or I may even need to set up multiple databases, each with a similar yet slightly different design), etc.
O-O programmers vote for a strongly "normalized" design, which would effectively lead to a central table containing all the answers from all respondents to all questions. This table would need either a column of type sql_variant or multiple answer columns with different types to store answers of different types (multiple choice, free text, ...). The former would essentially be an EAV model. I tend to follow Joe Celko here, who strongly discourages its use (he calls it OTLT or "One True Lookup Table"). The latter would imply that each row contains null cells for the non-applicable types by design.
Another alternative I could think of would be to create one table per answer type, i.e., one for multiple-choice questions, one for free text questions, etc. That's not very generic, it would lead to a lot of UNIONs, I think, and I would have to add a table whenever a new answer type is invented.
Sorry for boring you with all this text and thank you for your input!
Cheers,
Alex
PS: I asked the same question here: http://www.eggheadcafe.com/community/aspnet/13/10242616/survey-data-model--how-to-avoid-eav-and-excessive-denormalization.aspx
Well, imgur is down, so I'll post the picture later.
I think this is completely feasible within a relational model. I've built a CDM to show how I would do this.
Outbound
It takes 4 entities to define a Country's Survey. Some Parent Survey, the country and a list of questions. Your questions have an internal relationship so when one country "edits" a question, you can track both the question asked by the country and the question it came from. The other thing you need is a Possible Answer entity/table. Each question may have an associated list of possible answers (multiple choice or ranges etc). Those 4 should completely define the "OUTBOUND" side of this.
Inbound
The "INBOUND" side is just 2 new entities, The Respondent and the answer. The respondent is straightforward, just the demographics of that person if you know them and here you can include a relationship back to country. Each respondent answered the survey in a given country. (Person may be 1:n with Respondent if the person travels or has dual citizenship)
The answer is basic; either it is one of the choices listed in the list of Possible Answers or it is provided. Don't get all caught up in the fact that the answer may be a number, date, etc just yet. Either it's a FK or a string of characters.
Reporting
A report is a join over all of these... You'll choose a country and a survey, get the list of questions and answers.
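To make the conceptual model concrete, here is one possible physical version in SQLite via Python's sqlite3 (a stand-in for SQL Server). All table and column names and the sample rows are hypothetical. The question table carries a parent_question_id so a country's edited question points back to the one it came from. An answer holds either a possible_answer_id or free text, never both. The report is the join described above, and the parent link gives the integration step a way to map edited questions back to the master version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE survey  (survey_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE question (
    question_id        INTEGER PRIMARY KEY,
    survey_id          INTEGER NOT NULL REFERENCES survey(survey_id),
    country_id         INTEGER REFERENCES country(country_id),    -- NULL = master version
    parent_question_id INTEGER REFERENCES question(question_id),  -- set when a country edits it
    text               TEXT NOT NULL
);
CREATE TABLE possible_answer (
    possible_answer_id INTEGER PRIMARY KEY,
    question_id        INTEGER NOT NULL REFERENCES question(question_id),
    text               TEXT NOT NULL
);
CREATE TABLE respondent (
    respondent_id INTEGER PRIMARY KEY,
    country_id    INTEGER NOT NULL REFERENCES country(country_id)
);
CREATE TABLE answer (
    respondent_id      INTEGER NOT NULL REFERENCES respondent(respondent_id),
    question_id        INTEGER NOT NULL REFERENCES question(question_id),
    possible_answer_id INTEGER REFERENCES possible_answer(possible_answer_id),
    answer_text        TEXT,
    PRIMARY KEY (respondent_id, question_id),
    -- "either it's a FK or a string of characters"
    CHECK ((possible_answer_id IS NULL AND answer_text IS NOT NULL)
        OR (possible_answer_id IS NOT NULL AND answer_text IS NULL))
);

INSERT INTO survey  VALUES (1, 'Values survey');
INSERT INTO country VALUES (1, 'Austria'), (2, 'Chile');
INSERT INTO question VALUES (10, 1, NULL, NULL, 'Age?');
INSERT INTO question VALUES (11, 1, NULL, NULL, 'Do you vote?');
INSERT INTO question VALUES (12, 1, 2, 11, 'Did you vote in the last election?');
INSERT INTO possible_answer VALUES (100, 11, 'Yes'), (101, 11, 'No'),
                                   (102, 12, 'Yes'), (103, 12, 'No');
INSERT INTO respondent VALUES (1, 1), (2, 2);
INSERT INTO answer VALUES (1, 10, NULL, '34'), (1, 11, 100, NULL),
                          (2, 10, NULL, '51'), (2, 12, 103, NULL);
""")

# Report: pick a country and a survey, get the questions and answers.
REPORT = """
SELECT r.respondent_id, q.text, COALESCE(pa.text, a.answer_text) AS answer
FROM answer a
JOIN respondent r ON r.respondent_id = a.respondent_id
JOIN question   q ON q.question_id   = a.question_id
LEFT JOIN possible_answer pa ON pa.possible_answer_id = a.possible_answer_id
WHERE r.country_id = ? AND q.survey_id = ?
ORDER BY r.respondent_id, q.question_id
"""
rows = conn.execute(REPORT, (2, 1)).fetchall()

# Integration: map every (possibly edited) question to its master question.
master_of = dict(conn.execute(
    "SELECT question_id, COALESCE(parent_question_id, question_id) FROM question"
))
```

This sketch does not check that a possible_answer_id belongs to the same question as the answer row; a composite foreign key or the integration layer would have to handle that.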
Answer Complexity
Depends on where you want to do your calculations. If you used a Varchar2(4000) column for your user-provided answers, you could add an attribute to question to describe the datatype of the answer. Q: Age? DT: Integer Between (0 and 130). Then your integration layer can do the validation instead of the database enforcing it. Or you can have 4 columns, one for number, date, character and CLOB. And your integration layer will determine the column to use. When you report those answers out, you'll just select all four columns with Coalesce().
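A sketch of the "typed columns plus integration-layer validation" option, in SQLite via Python's sqlite3. The names datatype, num_min, num_max and store_answer are hypothetical. SQLite has no separate DATE or CLOB types, so dates are stored as ISO-8601 text and the character and CLOB columns are merged into one TEXT column. The report uses COALESCE() over the typed columns, casting the number to text so every row yields one comparable value.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (
    question_id INTEGER PRIMARY KEY,
    text        TEXT NOT NULL,
    datatype    TEXT NOT NULL CHECK (datatype IN ('number', 'date', 'text')),
    num_min     INTEGER,   -- optional range for numeric answers
    num_max     INTEGER
);
CREATE TABLE answer (
    respondent_id INTEGER NOT NULL,
    question_id   INTEGER NOT NULL REFERENCES question(question_id),
    answer_num    NUMERIC,
    answer_date   TEXT,    -- ISO-8601 string
    answer_char   TEXT,
    PRIMARY KEY (respondent_id, question_id)
);
INSERT INTO question VALUES
    (1, 'Age?', 'number', 0, 130),
    (2, 'Interview date?', 'date', NULL, NULL),
    (3, 'Comments?', 'text', NULL, NULL);
""")

def store_answer(respondent_id, question_id, raw):
    """Integration-layer validation: choose the column from the question's datatype."""
    dtype, lo, hi = conn.execute(
        "SELECT datatype, num_min, num_max FROM question WHERE question_id = ?",
        (question_id,),
    ).fetchone()
    num = dt = txt = None
    if dtype == "number":
        num = int(raw)
        if (lo is not None and num < lo) or (hi is not None and num > hi):
            raise ValueError(f"{num} outside [{lo}, {hi}]")
    elif dtype == "date":
        dt = date.fromisoformat(raw).isoformat()  # raises ValueError if malformed
    else:
        txt = raw
    conn.execute("INSERT INTO answer VALUES (?, ?, ?, ?, ?)",
                 (respondent_id, question_id, num, dt, txt))

store_answer(1, 1, "34")
store_answer(1, 2, "2010-05-17")
store_answer(1, 3, "none")

try:
    store_answer(2, 1, "200")  # age out of range
    rejected = False
except ValueError:
    rejected = True

rows = conn.execute("""
    SELECT q.text, COALESCE(CAST(a.answer_num AS TEXT), a.answer_date, a.answer_char)
    FROM answer a JOIN question q ON q.question_id = a.question_id
    WHERE a.respondent_id = ?
    ORDER BY q.question_id""", (1,)).fetchall()
```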
Is this an EAV because there's a slight ambiguity in the datatype of "Answer"?
No, it's not.
An EAV model breaks an Entity down into a list of attributes,
like so:
Entity Attribute Value
1 Fname Stephanie
1 Lname Page
1 Age 30
Because the Answer column of the survey schema holds both words and numbers, like the Value column does here, you think that defines EAV. It does not. Likewise, adding 3 datatype columns to this model wouldn't stop it from being an EAV.
I soooo hate it when
I've had people tell me that the query I'm tuning has to go "as fast as possible". Ok, so give me a billion dollars and 30 years. "Wait, a Billion what?" "As much as", "as fast as" aren't requirements. You can validate anything you want in a database... build a shedload of Before triggers, voila! Validation galore.
What's the datatype of an age column? Or a birthdate column? It depends on what your data source is. Some older records may only have month and year, or just year, or 'around' or 'circa' some year. You couldn't have just a number column and do 'as much validation as possible'. And NUMBER(2) may be BETTER validation than just NUMBER. So now you'll have NUMBER(1), NUMBER(2), NUMBER... to have "as much as".
Where I think you are getting tripped up
Think of this as a Conceptual Data Model, not a Physical one. In those terms Survey is an entity. Is Question an entity, or just an attribute of Survey? If you built one table per country and questionnaire, you'd clearly be saying that Question is just an attribute of Survey, and then storing questions vertically would make this an EAV. What this model shows is that Question is actually another entity. There is a relationship between Questions, e.g. 'a country [can] edit questions': there is the original question and the edited one. Each question has a collection of possible answers. And the most important thing is that they are all questions. In an EAV, fname, lname, bdate, age, major, salary, etc. are all very disparate things that I call just attributes. In this case we're not treating the name of the agency that originated the survey, the date it was issued, the date it is due back, etc. as questions.
Let me put this another way. You're Fedex. You want to store timestamps for certain events. Each time a package enters or leaves a facility or vehicle. Time on the picking up truck, time off the truck and into the first facility, time out of that facility and onto a plane, etc. Do you store them Horizontally? How do you know the number of hops in advance? If you store them vertically does that automatically make it an EAV? And if so why.
You're a weather company getting temperatures from stations around the country. Let's say the sensors are designed to send a reading when the temperature changes by +/- a full degree. If you store sensor_ID|timestamp|temp in a Reading table, is that an EAV? Each reading isn't an attribute of the sensor; readings are themselves entities that belong to a collection/series.
One thing that vertical storage of answers has in common with an EAV is the difficulty of performing analytic queries. Getting a list of all the people who answered TRUE to questions 5 and 10 but FALSE to 6 and 11 would be very difficult when the answers are stored vertically. Maybe that's why you see this as an EAV. If you want to do that, you need a different storage. The relational storage of the questions and answers isn't the best reporting database. Let's go back to the Fedex example: it's not simple to do "transit" time reporting when the rows are vertical.
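For illustration, here is roughly what that query looks like against vertically stored answers. It is sketched in SQLite via Python's sqlite3 with a hypothetical answer(respondent_id, question_no, value) table and made-up rows. Conditional aggregation makes it possible, but every extra condition adds another term, which is the awkwardness described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE answer (
    respondent_id INTEGER, question_no INTEGER, value TEXT,
    PRIMARY KEY (respondent_id, question_no))""")
conn.executemany("INSERT INTO answer VALUES (?, ?, ?)", [
    (1, 5, "TRUE"), (1, 10, "TRUE"), (1, 6, "FALSE"), (1, 11, "FALSE"),
    (2, 5, "TRUE"), (2, 10, "TRUE"), (2, 6, "TRUE"),  (2, 11, "FALSE"),
    (3, 5, "TRUE"), (3, 10, "TRUE"), (3, 6, "FALSE"),  # never answered 11
])

# One SUM(CASE ...) term per condition; a respondent qualifies only if
# each condition is matched by exactly one of their rows.
QUERY = """
SELECT respondent_id
FROM answer
GROUP BY respondent_id
HAVING SUM(CASE WHEN question_no = 5  AND value = 'TRUE'  THEN 1 ELSE 0 END) = 1
   AND SUM(CASE WHEN question_no = 10 AND value = 'TRUE'  THEN 1 ELSE 0 END) = 1
   AND SUM(CASE WHEN question_no = 6  AND value = 'FALSE' THEN 1 ELSE 0 END) = 1
   AND SUM(CASE WHEN question_no = 11 AND value = 'FALSE' THEN 1 ELSE 0 END) = 1
ORDER BY respondent_id
"""
matches = [r[0] for r in conn.execute(QUERY)]
print(matches)  # [1]
```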
This sounds like you are wrestling with a common problem: how to use a hammer to fasten a screw.
Both alternatives you listed are bad, each for different reasons. But that's because you are trying to stuff your particular data model into a relational database system. A good approach would be to look beyond the relational database at some other database/storage systems, try a couple out, and find the best fit for your project.
I have tried the EAV model and gave up because it was far too complex, and I am afraid to try the multi-tables model with a relational database system. The easiest solution I have found with a relational database is: store each complete response as a single CLOB, serialized into JSON or YAML (or something else lightweight), in a responses table.
create table responses (
  id uuid primary key,
  questionnaire_id uuid references questionnaires(id),
  data text
);
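A runnable sketch of the same idea in SQLite via Python's sqlite3 and json modules. SQLite has no uuid type, so the ids are stored as TEXT generated with uuid.uuid4(). save_response and load_response are hypothetical helper names, and the answer keys are made up. The trade-off is that the database can no longer validate or easily query individual answers inside the blob.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE questionnaires (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE responses (
    id               TEXT PRIMARY KEY,
    questionnaire_id TEXT NOT NULL REFERENCES questionnaires(id),
    data             TEXT NOT NULL      -- one complete response, serialized as JSON
);
""")

qid = str(uuid.uuid4())
conn.execute("INSERT INTO questionnaires VALUES (?, ?)", (qid, "Household"))

def save_response(questionnaire_id, answers):
    """Serialize the whole response dict into a single row."""
    rid = str(uuid.uuid4())
    conn.execute("INSERT INTO responses VALUES (?, ?, ?)",
                 (rid, questionnaire_id, json.dumps(answers)))
    return rid

def load_response(rid):
    """Fetch one response and deserialize it back into a dict."""
    (raw,) = conn.execute("SELECT data FROM responses WHERE id = ?", (rid,)).fetchone()
    return json.loads(raw)

rid = save_response(qid, {"age": 34, "income": 52000, "comment": "n/a"})
```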
If I were using SQL Server (Express would be OK), I would do this:
Table with a list of questions, with flags for type (bit) and required (bit), the correct answer if one exists, etc.
Table with a list of countries
Table linking countries and questions (some countries may not get some questions)
Table for answers, with columns for the question(s) and an XML column for the optional questions, including those which are added
If you are not versed in shredding XML, then use sparse columns for all the optional questions. As far as I recall, a table with a sparse column set (a "wide table") can have up to 30,000 columns. SQL Server can expose all of a table's sparse columns together as a single XML column set, and the sparse columns themselves can be indexed.
The diagram below shows a model created with SQL Server. The column AL_A4 holds the answer to QL_Id = 4 and is sparse. QL_Id 4 in the QuestionList table is not flagged as required, which tells you to make the corresponding column in AnswerList sparse.
Since countries will add questions, create QuestionListCustom, QuestiontoCountryCustom and AnswerListCustom tables and store the information for the custom questions there.
I am sure there are other ways to design the storage; this is the way I would turn in the homework. If this is not homework, then you surely work for the UN.
Have you considered not reinventing the wheel? There are open source survey applications already built. Even if they don't meet your needs, download a few and check out their data models.

Best practice for a "comment" table in a relational database [closed]

Assume you want to build a database for some web application. This database already contains many tables and you might have to extend it in the future.
Also, you want the end user to be able to comment any kind of object in the database.
I would like to find a solution that is generic enough that I don't have to extend it each time I add a new table to the database.
I thought of the following:
Table Name: comment
columns:
id : the id of a comment
user_id : the id of the user making the comment
object_table_name : the table where the commented object is
object_id : the id of the commented object in the object_table_name table.
text : the text
date : the date
This table sort of solves my problem; the only thing that troubles me is that its relational aspect is rather weak (I can't make object_id a foreign key, for instance).
Also if some day I need to rename a table I will have to change all the concerned entries in the comment table.
What do you think of this solution? Is there a design pattern that would help me out?
Thanks.
Isn't this cleaner?
table comment_set
id
table comment
id
comment_set_id -> foreign key to comment_set
user_id
date
text
existing table foo
...
comment_set_id -> foreign key to comment_set
existing table bar
...
comment_set_id -> foreign key to comment_set
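A sketch of the comment_set layout in SQLite via Python's sqlite3, with foo and bar standing in for existing tables as above. The COMMENTABLE whitelist and the helper names are hypothetical. Every link is a real foreign key, and reading a row's comments is an ordinary join. The table name is only formatted into the SQL after being checked against the whitelist, because identifiers can't be bound as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE comment_set (id INTEGER PRIMARY KEY);
CREATE TABLE comment (
    id             INTEGER PRIMARY KEY,
    comment_set_id INTEGER NOT NULL REFERENCES comment_set(id),
    user_id        INTEGER NOT NULL,
    date           TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    text           TEXT NOT NULL
);
-- Existing tables just gain a nullable comment_set_id column.
CREATE TABLE foo (id INTEGER PRIMARY KEY, name  TEXT,
                  comment_set_id INTEGER REFERENCES comment_set(id));
CREATE TABLE bar (id INTEGER PRIMARY KEY, title TEXT,
                  comment_set_id INTEGER REFERENCES comment_set(id));
""")

COMMENTABLE = {"foo", "bar"}  # hypothetical whitelist of tables with comment_set_id

def add_comment(table, row_id, user_id, text):
    """Create the row's comment set on first use, then attach the comment."""
    if table not in COMMENTABLE:
        raise ValueError(f"table {table!r} does not support comments")
    with conn:
        (set_id,) = conn.execute(
            f"SELECT comment_set_id FROM {table} WHERE id = ?", (row_id,)).fetchone()
        if set_id is None:
            set_id = conn.execute("INSERT INTO comment_set DEFAULT VALUES").lastrowid
            conn.execute(f"UPDATE {table} SET comment_set_id = ? WHERE id = ?",
                         (set_id, row_id))
        conn.execute("INSERT INTO comment (comment_set_id, user_id, text) VALUES (?, ?, ?)",
                     (set_id, user_id, text))

def comments_for_foo(foo_id):
    """Plain join; no table names are stored as data."""
    return [r[0] for r in conn.execute("""
        SELECT c.text FROM foo f JOIN comment c ON c.comment_set_id = f.comment_set_id
        WHERE f.id = ? ORDER BY c.id""", (foo_id,))]

conn.execute("INSERT INTO foo (id, name) VALUES (1, 'first foo')")
conn.execute("INSERT INTO bar (id, title) VALUES (1, 'first bar')")
conn.commit()
add_comment("foo", 1, 42, "Looks good")
add_comment("foo", 1, 43, "Agreed")
add_comment("bar", 1, 42, "Needs work")
```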
You are mixing data and metadata, which is not the best design pattern. They should be separated.
However, since the comments don't seem to be very important anyway, your solution is OK. The worst thing that can happen is that you lose comments on your objects.
Some databases, most notably PostgreSQL, support a COMMENT ON statement for cases like this.
Update:
If you want to comment on individual records in each table, it's OK to have such a table.
object_table_name will not change by itself when you rename a table, since it's data, not metadata.
You cannot write a native SQL query that will fetch comments for the record of any table (not known by the moment of the query development), though you can build the dynamic queries to do that.
In this case, you will have to keep your data and metadata in sync (UPDATE the comment table when you RENAME the table they refer to). The first one is a DML statement (changes data), the second one is DDL (changes metadata).
Also make sure that all PRIMARY KEYs have the same types (same as object_id).
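A sketch of the dynamic-query approach for the original object_table_name design, in SQLite via Python's sqlite3. The article and photo tables, the sample comments and the helper name are hypothetical. The table name is checked against the catalog (sqlite_master) before it is formatted into the SQL. The last query shows the weak referential integrity the question worries about: nothing stops a comment from pointing at a row that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE photo   (id INTEGER PRIMARY KEY, caption TEXT);
CREATE TABLE comment (
    id                INTEGER PRIMARY KEY,
    user_id           INTEGER NOT NULL,
    object_table_name TEXT NOT NULL,
    object_id         INTEGER NOT NULL,   -- cannot be a FK: the target table varies
    text              TEXT NOT NULL
);
INSERT INTO article VALUES (1, 'Hello');
INSERT INTO photo   VALUES (1, 'Sunset');
INSERT INTO comment (user_id, object_table_name, object_id, text) VALUES
    (7, 'article', 1, 'Nice post'),
    (8, 'photo',   1, 'Great colours'),
    (9, 'photo',   2, 'Points at a photo that does not exist');
""")

def comments_with_object(table):
    """Dynamic SQL: validate the table name, since it can't be a bound parameter."""
    known = {r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known or table == "comment":
        raise ValueError(f"unknown table {table!r}")
    sql = f"""
        SELECT c.text, t.id
        FROM comment c JOIN "{table}" t ON t.id = c.object_id
        WHERE c.object_table_name = ?
        ORDER BY c.id"""
    return conn.execute(sql, (table,)).fetchall()

# Orphaned comments are silently dropped by the join above; finding them
# needs a separate check per table.
orphans = conn.execute("""
    SELECT c.id FROM comment c
    WHERE c.object_table_name = 'photo'
      AND NOT EXISTS (SELECT 1 FROM photo p WHERE p.id = c.object_id)""").fetchall()
```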
Read about EAV.
You can make your whole database like that. But then it will be hell working with that data.
Why don't you want to place a Comment attribute on each database entity that should support comments? This way you can get all the data you need in a single query, and many GUI programs for databases will provide you with full code completion in SQL, which prevents errors that easily occur when operating with table names as strings. With the string-based approach, the code becomes heavily dependent on procedural code, which isn't right for database systems.
You can enumerate the table names in a separate table, so that changes in names do not affect the system in any major way. Just update the enumeration table.
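A sketch of that enumeration table in SQLite via Python's sqlite3. The commentable_table name and the sample data are hypothetical. Comments reference a row in the enumeration rather than storing the table name, so renaming a table means updating one row instead of every comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE photo (id INTEGER PRIMARY KEY, caption TEXT);
CREATE TABLE commentable_table (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- current physical table name
);
CREATE TABLE comment (
    id              INTEGER PRIMARY KEY,
    object_table_id INTEGER NOT NULL REFERENCES commentable_table(id),
    object_id       INTEGER NOT NULL,
    text            TEXT NOT NULL
);
INSERT INTO photo VALUES (1, 'Sunset');
INSERT INTO commentable_table VALUES (1, 'photo');
INSERT INTO comment (object_table_id, object_id, text)
    VALUES (1, 1, 'Great colours'), (1, 1, 'Too dark');
""")

# Renaming the table touches one enumeration row, not every comment.
conn.executescript("""
ALTER TABLE photo RENAME TO picture;
UPDATE commentable_table SET name = 'picture' WHERE name = 'photo';
""")

def table_name_for(comment_id):
    """Resolve a comment's target table through the enumeration."""
    (name,) = conn.execute("""
        SELECT t.name FROM comment c JOIN commentable_table t ON t.id = c.object_table_id
        WHERE c.id = ?""", (comment_id,)).fetchone()
    return name
```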
Although you are distancing yourself from referential integrity, I can see another way to accomplish what you want.
I generally prefer to keep the comments with the rows to which they apply. Assuming your database efficiently stores empty VARCHAR fields, you shouldn't pay a penalty for this. There isn't really anything to "extend" when you implement this approach, the maintenance of the comment becomes part of the queries you are already using to update the rows.
The only advantage to the single-note-table approach is that it allows easy searches across notes for different kinds of database entries.
Assuming MS SQL, and if the volume is relatively small, as you seem to suggest, then Extended Properties might be worth exploring. I've used them successfully in the past and they seem to be a permanent fixture.

Is a one column table good design? [closed]

Is it OK to have a table with just one column? I know it isn't technically illegal, but is it considered poor design?
EDIT:
Here are a few examples:
You have a table with the 50 valid US state codes, but you have no need to store the verbose state names.
An email blacklist.
Someone mentioned adding a key field. The way I see it, this single column WOULD be the primary key.
In terms of relational algebra this would be a unary relation, meaning "this thing exists"
Yes, it's fine to have a table defining such a relation: for instance, to define a domain.
The values of such a table should be natural primary keys of course.
A lookup table of prime numbers is what comes to my mind first.
Yes, it's certainly good design to design a table in such a way as to make it most efficient. "Bad RDBMS Design" is usually centered around inefficiency.
However, I have found that most cases of single column design could benefit from an additional column. For example, State Codes can typically have the Full State name spelled out in a second column. Or a blacklist can have notes associated. But, if your design really does not need that information, then it's perfectly ok to have the single column.
I've used them in the past. One client of mine wanted to automatically block anyone trying to sign up with a phone number from a big list he had, so it was just one big blacklist.
If there is a valid need for it, then I don't see a problem. Maybe you just want a list of possibilities to display for some reason and you want to be able to dynamically change it, but have no need to link it to another table.
One case that I found sometimes is something like this:
Table countries_id, contains only one column with numeric ID for each country.
Table countries_description, contains the column with country ID, a column With language ID and a column with the localized country name.
Table company_factories, contains information for each factory of the company, including the country in which it is located.
So to maintain data coherence and language independent data in the tables the database uses this schema with tables with only one column to allow foreign keys without language dependencies.
In this case I think the existence of one column tables are justified.
Edited in response to the comment by: Quassnoi
In this schema I can define a foreign key in the table company_factories that does not require me to include the Language column in the table, but if I don't have the table countries_id, I must include the Language column in the table to define the foreign key.
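A sketch of that three-table layout in SQLite via Python's sqlite3, with made-up country ids, cities and names. The one-column countries_id table is what lets company_factories declare a foreign key on country_id alone, while the language only comes into play when you join to countries_description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE countries_id (country_id INTEGER PRIMARY KEY);   -- the one-column table
CREATE TABLE countries_description (
    country_id  INTEGER NOT NULL REFERENCES countries_id(country_id),
    language_id TEXT    NOT NULL,
    name        TEXT    NOT NULL,
    PRIMARY KEY (country_id, language_id)
);
CREATE TABLE company_factories (
    factory_id INTEGER PRIMARY KEY,
    city       TEXT NOT NULL,
    -- no language column needed: the FK targets the one-column table
    country_id INTEGER NOT NULL REFERENCES countries_id(country_id)
);
INSERT INTO countries_id VALUES (34), (49);
INSERT INTO countries_description VALUES
    (34, 'en', 'Spain'),   (34, 'es', 'España'),
    (49, 'en', 'Germany'), (49, 'es', 'Alemania');
INSERT INTO company_factories VALUES (1, 'Zaragoza', 34), (2, 'Bremen', 49);
""")

def factories(language_id):
    """List factories with the country name in the requested language."""
    return conn.execute("""
        SELECT f.city, d.name
        FROM company_factories f
        JOIN countries_description d
          ON d.country_id = f.country_id AND d.language_id = ?
        ORDER BY f.factory_id""", (language_id,)).fetchall()
```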
There would be rare cases where a single-column table makes sense. I did one database where the list of valid language codes was a single-column table used as a foreign key. There was no point in having a different key, since the code itself was the key. And there was no fixed description since the language code descriptions would vary by language for some contexts.
In general, any case where you need an authoritative list of values that do not have any additional attributes is a good candidate for a one-column table.
I use single-column tables all the time -- depending, of course, on whether the app design already uses a database. Once I've endured the design overhead of establishing a database connection, I put all mutable data into tables where possible.
I can think of two uses of single-column tables OTMH:
1) Data item exists. Often used in dropdown lists. Also used for simple legitimacy tests.
Eg. two-letter U.S. state abbreviations; Zip codes that we ship to; words legal in Scrabble; etc.
2) Sparse binary attribute, i.e., in a large table, a binary attribute that will be true for only a very few records. Instead of adding a new boolean column, I might create a separate table containing the keys of the records for which the attribute is true.
Eg. employees that have a terminal disease; banks with a 360-day year (most use 365); etc.
-Al.
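A sketch of the sparse-binary-attribute idea (use 2 above) in SQLite via Python's sqlite3, based on the 360-day-year example. The table names and days_in_year are hypothetical. A bank has the attribute exactly when its key appears in the one-column table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE bank (bank_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per bank for which the attribute is true; absent means false.
CREATE TABLE bank_360_day_year (
    bank_id INTEGER PRIMARY KEY REFERENCES bank(bank_id)
);
INSERT INTO bank VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
INSERT INTO bank_360_day_year VALUES (2);
""")

def days_in_year(bank_id):
    """Test membership in the one-column table instead of reading a boolean column."""
    (flag,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM bank_360_day_year WHERE bank_id = ?)", (bank_id,)
    ).fetchone()
    return 360 if flag else 365
```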
Mostly I've seen this in lookup type tables such as the state table you described. However, if you do this be sure to set the column as the primary key to force uniqueness. If you can't set this value as unique, then you shouldn't be using one column.
No problem as long as it contains unique values.
I would say in general, yes. Not sure why you need just one column. There are some exceptions to this that I have seen used effectively. It depends on what you're trying to achieve.
They are not really good design when you're thinking of the schema of the database, but really should only be used as utility tables.
I've seen numbers tables used effectively in the past.
The purpose of a database is to relate pieces of information to each other. How can you do that when there is no data to relate to?
Maybe this is some kind of compilation table (i.e. FirstName + LastName + Birthdate), though I'm still not sure why you would want to do that.
EDIT: I could see using this kind of table for a simple list of some kind. Is that what you are using it for?
Yes, as long as the field is the primary key, as you said it would be. Without a key, duplicate rows become hard to manage: a GUI table editor cannot tell which of two identical rows you mean, so it will refuse to update or delete just one of them.
The only use case I can conceive of is a table of words perhaps for a word game. You access the table just to verify that a string is a word: select word from words where word = ?. But there are far better data structures for holding a list of words than a relational database.
Otherwise, data is usually placed in a database to take advantage of the relationships between various attributes of the data. If your data has no attributes beyond its value, how will these relationships be developed?
So, while not illegal, in general you probably should not have a table with just one column.
All my tables have at least four tech fields: a serial primary key, creation and modification timestamps, and a soft-delete boolean. In any blacklist, you will also want to know who added the entry. So for me, the answer is no: a table with only one column would not make sense except when prototyping something.
Yes, that is perfectly fine. But an ID field couldn't hurt, right?