What mysql database tables and relationships would support a Q&A survey with conditional questions? [closed] - sql

I'm working on a fairly simple survey system right now. The database schema is going to be simple: a Survey table, in a one-to-many relation with Question table, which is in a one-to-many relation with the Answer table and with the PossibleAnswers table.
Recently the customer realised she wants the ability to show certain questions only to people who gave one particular answer to some previous question (eg. Do you buy cigarettes? would be followed by What's your favourite cigarette brand?, there's no point of asking the second question to a non-smoker).
Now I started to wonder what would be the best way to implement these conditional questions in terms of my database schema. Say question A has 2 possible answers, A and B, and question B should only appear to a user if the answer was A.
Edit: What I'm looking for is a way to store that information about requirements in a database. The handling of the data will probably be done on the application side, as my SQL skills suck ;)

Survey Database Design
Last Update: 5/3/2015
Diagram and SQL files now available at https://github.com/durrantm/survey
If you use this (top) answer or any element, please add feedback on improvements !!!
This is a real classic, done by thousands. These systems always seem 'fairly simple' to start with, but doing one well is actually pretty complex. To do this in Rails I would use the model shown in the attached diagram. I'm sure it seems way overcomplicated to some, but once you've built a few of these over the years, you realize that most of the design decisions are very classic patterns, best addressed by a dynamic, flexible data structure at the outset.
More details below:
Table details for key tables
answers
The answers table is critical as it captures the actual responses by users.
You'll notice that answers links to question_options, not questions. This is intentional.
input_types
input_types are the types of questions. Each question can only be of 1 type, e.g. all radio dials, all text field(s), etc. Use additional questions for when there are (say) 5 radio-dials and 1 check box for an "include?" option or some such combination. Label the two questions in the users view as one but internally have two questions, one for the radio-dials, one for the check box. The checkbox will have a group of 1 in this case.
option_groups
option_groups and option_choices let you build 'common' groups.
One example, in a real estate application there might be the question 'How old is the property?'.
The answers might be desired in the ranges:
1-5
6-10
10-25
25-100
100+
Then, for example, if there is a question about the adjoining property age, then the survey will want to 'reuse' the above ranges, so that same option_group and options get used.
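The reuse of one option group by several questions can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the schema from the linked repository: the column names (name, choice_name, question_text, option_group_id) and the direct link from questions to option_groups are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE option_groups (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE option_choices (
    id              INTEGER PRIMARY KEY,
    option_group_id INTEGER NOT NULL REFERENCES option_groups(id),
    choice_name     TEXT NOT NULL
);
-- Assumption: each question points at the group of choices it offers.
CREATE TABLE questions (
    id              INTEGER PRIMARY KEY,
    question_text   TEXT NOT NULL,
    option_group_id INTEGER REFERENCES option_groups(id)
);
""")

# One group of age ranges, defined once.
conn.execute("INSERT INTO option_groups (id, name) VALUES (1, 'property age ranges')")
conn.executemany(
    "INSERT INTO option_choices (option_group_id, choice_name) VALUES (1, ?)",
    [("1-5",), ("6-10",), ("10-25",), ("25-100",), ("100+",)],
)

# Two different questions share that group.
conn.executemany(
    "INSERT INTO questions (question_text, option_group_id) VALUES (?, 1)",
    [("How old is the property?",), ("How old is the adjoining property?",)],
)


def choices_for(question_id):
    """Return the choice labels offered by a question, in insertion order."""
    rows = conn.execute(
        """
        SELECT c.choice_name
        FROM questions q
        JOIN option_choices c ON c.option_group_id = q.option_group_id
        WHERE q.id = ?
        ORDER BY c.id
        """,
        (question_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Both questions return the same five ranges from choices_for, and editing the group changes them for every question that uses it.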
units_of_measure
units_of_measure is as it sounds. Whether it's inches, cups, pixels, bricks or whatever, you can define it once here.
FYI: Although generic in nature, one can create an application on top of this, and this schema is well-suited to the Ruby On Rails framework with conventions such as "id" for the primary key for each table. Also the relationships are all simple one_to_many's with no many_to_many or has_many throughs needed. I would probably add has_many :throughs and/or :delegates though to get things like survey_name from an individual answer easily without.multiple.chaining.

You could also think about complex rules, and have a string based condition field in your Questions table, accepting/parsing any of these:
A(1)=3
( (A(1)=3) and (A(2)=4) )
A(3)>2
(A(3)=1) and (A(17)!=2) and C(1)
Where A(x)=y means "Answer of question x is y" and C(x) means the condition of question x (default is true)...
The questions have an order field, and you would go through them one-by one, skipping questions where the condition is FALSE.
This should allow surveys of any complexity you want. Your GUI could automatically create these conditions in a "Simple mode" and also offer an "Advanced mode" where a user can enter the expressions directly.
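A possible evaluator for such condition strings, as a rough sketch in Python. The grammar is inferred from the four examples above and is an assumption: comparisons =, !=, > and <, the keywords and/or, parentheses, and C(x). Two further assumptions: a comparison against an unanswered question counts as false, and the conditions never refer to each other in a cycle.

```python
import operator
import re

TOKEN = re.compile(r"\s*(A\(\d+\)|C\(\d+\)|!=|=|>|<|\(|\)|and|or|\d+)")
OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def tokenize(text):
    """Split a condition string into tokens such as 'A(1)', '=', '3', 'and'."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(condition, answers, conditions):
    """Evaluate a condition string.

    answers:    {question number: numeric answer given so far}
    conditions: {question number: condition string of that question}
    An empty or missing condition is true, as described above.
    """
    if not condition:
        return True
    tokens = tokenize(condition)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of condition")
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = take()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok.startswith("C("):
            other = int(tok[2:-1])
            return evaluate(conditions.get(other, ""), answers, conditions)
        if tok.startswith("A("):
            given = answers.get(int(tok[2:-1]))
            op, wanted = take(), take()
            if op not in OPS or not wanted.isdigit():
                raise ValueError(f"bad comparison after {tok}")
            # Assumption: an unanswered question never satisfies a comparison.
            return given is not None and OPS[op](given, int(wanted))
        raise ValueError(f"unexpected token {tok!r}")

    def term():
        value = factor()
        while peek() == "and":
            take()
            rhs = factor()  # always parse the right side to consume its tokens
            value = value and rhs
        return value

    def expr():
        value = term()
        while peek() == "or":
            take()
            rhs = term()
            value = value or rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

The application would walk the questions in their order field, call evaluate with the answers collected so far, and skip any question for which it returns False.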

One way is to add a 'question requirements' table with these fields:
question_id (link to the "which brand?" question)
required_question_id (link to the "do you smoke?" question)
required_answer_id (link to the "yes" answer)
In the application you check this table before you pose a certain question.
With a separate table, it's easy to add further required answers (just add another row for the "sometimes" answer, etc.).
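A small sketch of that requirements table and the application-side check, using Python's sqlite3 module. The table and column names for questions and possible answers are assumptions, and so is the rule for combining rows: several rows for the same required question are treated as alternatives, while rows for different required questions must all be satisfied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL
);
CREATE TABLE possible_answers (
    id          INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES questions(id),
    text        TEXT NOT NULL
);
CREATE TABLE question_requirements (
    question_id          INTEGER NOT NULL REFERENCES questions(id),
    required_question_id INTEGER NOT NULL REFERENCES questions(id),
    required_answer_id   INTEGER NOT NULL REFERENCES possible_answers(id)
);

INSERT INTO questions VALUES (1, 'Do you smoke?'), (2, 'Which brand?');
INSERT INTO possible_answers VALUES (10, 1, 'yes'), (11, 1, 'no'), (12, 1, 'sometimes');
-- "Which brand?" is shown after either "yes" or "sometimes".
INSERT INTO question_requirements VALUES (2, 1, 10), (2, 1, 12);
""")


def should_ask(question_id, given):
    """Decide whether to pose a question.

    given maps question id -> id of the possible answer the user chose.
    """
    rows = conn.execute(
        "SELECT required_question_id, required_answer_id "
        "FROM question_requirements WHERE question_id = ?",
        (question_id,),
    ).fetchall()
    # Group the accepted answers by the question they belong to.
    accepted = {}
    for required_question_id, required_answer_id in rows:
        accepted.setdefault(required_question_id, set()).add(required_answer_id)
    # No rows means no requirements, so all() over nothing is True.
    return all(given.get(q) in answers for q, answers in accepted.items())
```

For example, should_ask(2, {1: 11}) is False for a user who answered "no", while should_ask(2, {1: 12}) is True for "sometimes".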

Personally, in this case, I would use the structure you described and use the database as a dumb storage mechanism. I'm a fan of putting these complex, interdependent constraints into the application layer.
I think the only way to enforce these constraints in the database, without building new tables for every question with foreign keys to others, is to use T-SQL or other vendor-specific mechanisms to build triggers that enforce them.
At the application level you have many more possibilities and the result is easier to port, so I would prefer that option.
I hope this will help you in finding a strategy for your app.

Related

Which routes to pick for REST API? [closed]

I'm developing a restful API using NodeJS. To give you a little more insight in my application:
My application has surveys. A survey contains questions which in their turn has choices.
To add a question, you need to provide the id of the survey in the body of the post. To add an option, you need to provide the id of the question.
Now for the API routes. What would be better:
Option 1
/api/departments
/api/surveys
/api/questions
/api/choices
Option 2
/api/departments
/api/departments/department_id/surveys
/api/departments/department_id/surveys/survey_id/questions
/api/departments/department_id/surveys/survey_id/questions/question_id/options
The last one seems more logical because I don't need to provide the id of the parent in the body of the post.
What is best practice to use as endpoints?
I don't think there's a "best practice" between the two; rather, it's about having the interface that makes the most sense for your application. #2 makes the most sense if you're typically going to access the surveys on a per-department basis, and also makes sense in terms of accessing questions on a per-survey basis. If you wanted to eliminate the per-department part, you'd do something that's kind of a mix of the above:
/api/departments
/api/surveys
/api/surveys/survey_id/questions
/api/surveys/survey_id/questions/question_id/options
If you DO want to go by per-department, I'd change #2 so that instead of /api/departments/surveys one would access /api/departments/department_id/surveys ...
But without knowing more about the application, it's difficult to know what the best answer is.
Do surveys contain anything besides questions? Do questions contain anything besides choices? The reason I ask is that if the answer to both is no, then I'd actually prefer something like this:
/api/departments/ # returns a list of departments
/api/departments/<survey-id>/ # returns a list of questions
/api/departments/<survey-id>/<question-id>/ # returns a list of choices
/api/departments/<survey-id>/<question-id>/<choice-id> # returns a list of options
or something to that effect. Basically, I like to keep the concept of "containers" and "data" rigid. I like to think of it like a file system.
So if the concept ends in an "s", it's a container (and I'd like the route to end with a "/" to indicate that it acts like a folder, but that's a nit).
Any access to "/" results in the element at that index, which of course can be another container. Similar to directory structure in a file system. For example, if I were to lay these out in a file system, I might come up with something like this:
+ /api/departments/
|---- human-resources/
      |---- survey-10/
            |---- choice-10
The choice depends on whether resources are owned or shared by higher-level resources; whether you want cascading delete or not. If owned (with cascading delete), choose option 2 and if shared, choose option 1.
If a survey is deleted, I guess you want to delete all questions and options with it (cascading delete). This matches well with option 2, because if you delete resource /api/departments/departmentid/surveys/surveyid, you naturally also delete all subresources /api/departments/departmentid/surveys/surveyid/questions/....
On the other hand, if you want the option to share questions among multiple surveys and share surveys among multiple departments, then option 1 is better.
Of course, you can also have a mix of option 1 and option 2, if some resource types are owned and others are shared.
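The owned-versus-shared distinction can also be shown at the storage level. The sketch below uses Python's sqlite3 module rather than the NodeJS stack from the question, and every table name in it is hypothetical: owned questions carry a foreign key with ON DELETE CASCADE, while shared questions live in their own table and are attached to surveys through a link table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE surveys (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

-- Owned: a question belongs to exactly one survey and dies with it.
CREATE TABLE owned_questions (
    id        INTEGER PRIMARY KEY,
    survey_id INTEGER NOT NULL REFERENCES surveys(id) ON DELETE CASCADE,
    text      TEXT NOT NULL
);

-- Shared: questions exist on their own and are linked to surveys.
CREATE TABLE shared_questions (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL
);
CREATE TABLE survey_shared_questions (
    survey_id   INTEGER NOT NULL REFERENCES surveys(id) ON DELETE CASCADE,
    question_id INTEGER NOT NULL REFERENCES shared_questions(id),
    PRIMARY KEY (survey_id, question_id)
);

INSERT INTO surveys VALUES (1, 'Onboarding'), (2, 'Exit interview');
INSERT INTO owned_questions VALUES (1, 1, 'How was your first week?');
INSERT INTO shared_questions VALUES (1, 'Which department are you in?');
INSERT INTO survey_shared_questions VALUES (1, 1), (2, 1);
""")


def count(table):
    """Row count of one of the fixed tables above (not for untrusted input)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Deleting survey 1 removes its owned question and its links,
# but the shared question stays available to survey 2.
conn.execute("DELETE FROM surveys WHERE id = 1")
```

After the delete, owned_questions is empty, shared_questions still holds its row, and only the link to survey 2 remains. Nested routes suit the first kind of resource and flat routes suit the second.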

Does Stack Exchange's database schema follow good practice? [closed]

This is somewhat of a meta question, but because it relates to a database design, I thought I should post it here.
I'm building a site that includes Q+A and was wondering how I should structure my SQL database, so naturally, I looked to the best of the best. However, the Stack Exchange database schema seems to defy what I've learned about creating maintainable/extensible table hierarchies.
As you can see, Stack Exchange stores all of its "Posts" in one table, except for comments, which has its own table. Post types include questions, answers, and various wiki things. This results in a lot of NULL columns in the table. For example, questions have titles, tags, and answerCounts, while answers don't, so all answer entries have NULL for all three of those columns. If more post types are added over time, this will progressively become less maintainable. And the fact that comments is the only type of post that has its own table just seems inconsistent.
What I've read states that it's generally preferred to use an object subclass hierarchy, in which there's a generic "Posts" table along with a bunch of tables for each type of post that all have one column that maps back to the corresponding entry in the "Posts" table. This keeps the number of null columns to a minimum and makes it more extensible, but slows down queries because they'll require more joins.
So why does Stack Exchange use this giant table method? Is it just the result of ages of modifications to an old database? More specifically, should I use this model for my own Q+A system or stick with an object subclass hierarchy (my Q+A/forum system will closely resemble SO's, with several types of posts including questions, answers, polls, reviews, etc.)?
This is a classic case of the so-called "object-relational impedance mismatch". Specifically, you are talking about mapping OO inheritance onto a relational database structure. There are several common ways of doing that:
A table per subclass,
A table per leaf subclass, and
A table per class hierarchy (with a discriminator)
Each of these strategies is perfectly valid. Moreover, the structures could be mixed as needed.
It looks like Stack Exchange used a table per class hierarchy approach, with PostTypeId serving as a discriminator. This approach is as valid as any other approach that they could have taken. It is also one of the simplest ones to take from the maintenance standpoint, because it lets you construct manual queries with less work.
There is another thing in the structure of the table that you did not mention: it is not normalized. Specifically, there are AnswerCount and CommentCount fields that store information that could be obtained by aggregating the table (i.e. running a SELECT COUNT(*) FROM ... WHERE ... AND other.ParentId = p.Id ...). This is a common tradeoff between normalization and speed of execution: most likely, profiling indicated that the aggregation takes a significant amount of time, so the counts were moved into the "parent" record.
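To make both points concrete, here is a simplified sketch of a single posts table with a discriminator column and a stored answer count, written with Python's sqlite3 module. It is not the real Stack Exchange schema: the snake_case column names and the two helper functions are assumptions, and only two post types are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
    id           INTEGER PRIMARY KEY,
    post_type_id INTEGER NOT NULL,           -- discriminator: 1 = question, 2 = answer
    parent_id    INTEGER REFERENCES posts(id), -- NULL for questions
    title        TEXT,                         -- NULL for answers
    body         TEXT NOT NULL,
    answer_count INTEGER                       -- denormalized; NULL for answers
);
""")


def add_question(title, body):
    cur = conn.execute(
        "INSERT INTO posts (post_type_id, title, body, answer_count) "
        "VALUES (1, ?, ?, 0)",
        (title, body),
    )
    return cur.lastrowid


def add_answer(question_id, body):
    # Insert the answer and bump the stored count in one transaction,
    # so the two can never drift apart.
    with conn:
        cur = conn.execute(
            "INSERT INTO posts (post_type_id, parent_id, body) VALUES (2, ?, ?)",
            (question_id, body),
        )
        conn.execute(
            "UPDATE posts SET answer_count = answer_count + 1 WHERE id = ?",
            (question_id,),
        )
    return cur.lastrowid


def counted_answers(question_id):
    """Normalized route: aggregate the child rows on every read."""
    return conn.execute(
        "SELECT COUNT(*) FROM posts WHERE post_type_id = 2 AND parent_id = ?",
        (question_id,),
    ).fetchone()[0]


def stored_answers(question_id):
    """Denormalized route: read the precomputed column."""
    return conn.execute(
        "SELECT answer_count FROM posts WHERE id = ?", (question_id,)
    ).fetchone()[0]
```

The stored count makes reads cheap at the cost of an extra write per answer, which is the tradeoff described above.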

Database Design For Users\Downloads [closed]

I need to design a database for something like a downloads site. I want to keep track of users and the programs each user downloaded, and also allow users to rate and comment on those programs. The things I need from this database: get the average rating for a program, get all comments for a program, and know exactly which program was downloaded by whom (I don't care how many times each program was downloaded, but I want to know for each user which programs he has downloaded). Maybe also count the number of comments for each program, and that's about it (it's a very small project for personal use that I want to keep simple).
I come up with these entities -
User(uid,uname etc)
Program(pid,pname)
And the following relationships-
UserDownloadedProgram(uid,pid,timestamp)
UserCommentedOnProgram(uid,pid,commentText,timestamp)
UserRatedProgram(uid,pid,rating)
Why I chose it this way: the relationships (user downloads, user comments and rates) are many-to-many. A user downloads many programs and a program is downloaded by many users. The same goes for comments and ratings (a user comments on many programs and a program is commented on or rated by many users). The best practice, as far as I know, is to create a third table which is one-to-many (a relationship table).
I suppose that in this design the average rating and comment retrieval are done by join queries or something similar.
I'm a total noob in database design but I try to adhere to best practices. Is this design more or less OK, or am I overlooking something?
I can definitely think of other possibilities: maybe comment and/or rating can be an entity (table) by itself, with the relationships between 3 entities. I'm not really sure what the benefits and drawbacks of that are. I know that I don't really care about the comments or the ratings; I only want to display them where appropriate and maintain them (delete when needed). So how do I know whether they should become entities themselves?
Any thoughts?
You would create new entities as dictated by the rules of normalization. There is no particular reason to make an additional (separate) table for comments because you already have one. Who made the comment and which program the comment applied to are full-fledged attributes of a comment. The foreign keys representing these relationships (which are many-to-one, from the perspective of the comment table) belong right where you've put them.
The tables you've proposed are in third normal form which is acceptable according to best practices. I would add that you seem to be tracking data on a transactional basis (i.e. recording events as and when they occur). That is a good practice too because you can always figure out whatever you want to based on detailed information.
Calculating number of downloads or number of comments is a simple matter of using SQL Aggregate Functions with filters on the foreign key(s) that apply to your query - e.g. where pid=1234 etc.
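As a rough illustration, the three relationship tables from the question and the aggregate queries might look like this in Python's sqlite3 module. The snake_case table names, the ts column and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_downloaded_program (uid INTEGER, pid INTEGER, ts TEXT);
CREATE TABLE user_commented_on_program (
    uid INTEGER, pid INTEGER, comment_text TEXT, ts TEXT
);
CREATE TABLE user_rated_program (
    uid INTEGER, pid INTEGER, rating INTEGER,
    PRIMARY KEY (uid, pid)   -- one rating per user per program
);

INSERT INTO user_downloaded_program VALUES
    (1, 1234, '2015-01-01'), (1, 1234, '2015-01-02'),
    (1, 99,   '2015-01-03'), (2, 1234, '2015-01-04');
INSERT INTO user_commented_on_program VALUES
    (1, 1234, 'Works well', '2015-01-02'),
    (2, 1234, 'Crashes on start', '2015-01-05');
INSERT INTO user_rated_program VALUES (1, 1234, 5), (2, 1234, 4);
""")


def average_rating(pid):
    """AVG over the ratings of one program (None if it has no ratings)."""
    return conn.execute(
        "SELECT AVG(rating) FROM user_rated_program WHERE pid = ?", (pid,)
    ).fetchone()[0]


def comment_count(pid):
    return conn.execute(
        "SELECT COUNT(*) FROM user_commented_on_program WHERE pid = ?", (pid,)
    ).fetchone()[0]


def programs_downloaded_by(uid):
    """Distinct programs, since repeat downloads are not of interest here."""
    rows = conn.execute(
        "SELECT DISTINCT pid FROM user_downloaded_program "
        "WHERE uid = ? ORDER BY pid",
        (uid,),
    ).fetchall()
    return [row[0] for row in rows]
```

No joins are needed for these three questions; a join to the program table only comes in when the program name has to be shown next to the numbers.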
I would make Downloads an entity with its own id. You could then have a download status, and you may have multiple downloads of the same program for one user. You may also need to associate a download with an order or something else.

Are user-defined SQL datatypes used much? [closed]

My DBA told me to use a user-defined SQL datatype to represent addresses, and then use a single column of that new type in our users table instead of multiple address columns. I've never done this before and am wondering if this is a common approach.
Also, what's the best place to get information about this - is it product-specific?
As far as I can tell, at least in the SQL Server world, UDTs aren't used very much.
The trouble with UDTs is that you can't easily update them. Once created and used in databases, they're almost set in stone.
There's no "CREATE OR ALTER (UDT)" command :-( So to change something, you have to do a lot of shuffling around: possibly copying away existing data, then dropping lots of columns from other tables, then dropping your UDT, re-creating it with the new structure, and reapplying the data and everything.
That's just too much hassle, and you know: there will be change!
Right now, in SQL Server land, UDTs are just a nice idea, but really badly implemented. I wouldn't recommend using them extensively.
Marc
There are a number of other questions on SO about how to represent addresses in a database. AFAICR, none of them suggest a user-defined type for the purpose. I would not regard it as a common approach; that is not to say it is not a reasonable approach. The main difficulties lie in deciding what methods to provide to manipulate the address data - those used for formatting the data to appear on an envelope, or in specific places on a printed form, or to update fields, worrying about the many ramifications of international addresses, and so on.
Defining user-defined types is very product specific. The ways you do it in Informix are different from the ways it is done in DB2 and Oracle, for example.
I would also rather avoid user-defined datatypes, as their definition and use will make your code dependent on a particular database.
Instead if you are using any object oriented language, create a composition relationship to define addresses for an employee (for example) and store the addresses in a separate table.
Eg. Employees table and Employee_Addresses table. One employee can have multiple addresses.
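A minimal sketch of that two-table layout, using Python's sqlite3 module. Only the Employees and Employee_Addresses table names come from the answer; the address columns (kind, street, city, postal_code, country) are assumptions, and real international addresses usually need more thought.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employees (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Plain columns instead of a user-defined address type.
CREATE TABLE employee_addresses (
    id          INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employees(id) ON DELETE CASCADE,
    kind        TEXT NOT NULL,   -- e.g. 'home' or 'work'
    street      TEXT NOT NULL,
    city        TEXT NOT NULL,
    postal_code TEXT,
    country     TEXT NOT NULL
);

INSERT INTO employees VALUES (1, 'Alice');
INSERT INTO employee_addresses
    (employee_id, kind, street, city, postal_code, country)
VALUES
    (1, 'home', '1 Main St', 'Springfield', '12345', 'US'),
    (1, 'work', '200 Office Park', 'Shelbyville', '67890', 'US');
""")


def addresses_for(employee_id):
    """Return (kind, city) pairs for one employee, ordered by kind."""
    return conn.execute(
        "SELECT kind, city FROM employee_addresses "
        "WHERE employee_id = ? ORDER BY kind",
        (employee_id,),
    ).fetchall()
```

Because the schema uses only standard column types and a foreign key, it should port between database products with little change, which is the point of avoiding the UDT here.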
user-defined SQL datatype to represent addresses
User-defined types can be quite useful, but a mailing address doesn't jump out as one of those cases (to me, at least). What is a mailing address to you? Is it something you print on an envelope to mail someone? If so, text is about as good as it's going to get. If you need to know what state someone is in for legal reasons, store that separately and it's not a problem.
Other posts here have criticized UDTs, but I think they do have some amazing uses. PostgreSQL has had full text search as a plugin based on UDTs for a long time before full-text search was actually integrated into the core product. Right now PostGIS is a very successful GIS product that is entirely a plugin based on UDTs (it has GPL license, so will never be integrated into core).

A beginner's guide to SQL database design [closed]

Do you know a good source to learn how to design SQL solutions?
Beyond the basic language syntax, I'm looking for something to help me understand:
What tables to build and how to link them
How to design for different scales (small client APP to a huge distributed website)
How to write effective / efficient / elegant SQL queries
I started with this book: Relational Database Design Clearly Explained (The Morgan Kaufmann Series in Data Management Systems) (Paperback) by Jan L. Harrington and found it very clear and helpful
and as you get up to speed this one was good too Database Systems: A Practical Approach to Design, Implementation and Management (International Computer Science Series) (Paperback)
I think SQL and database design are different (but complementary) skills.
I started out with this article
http://en.tekstenuitleg.net/articles/software/database-design-tutorial/intro.html
It's pretty concise compared to reading an entire book and it explains the basics of database design (normalization, types of relationships) very well.
Experience counts for a lot, but in terms of table design you can learn a lot from how ORMs like Hibernate and Grails operate to see why they do things. In addition:
Keep different types of data separate - don't store addresses in your order table, link to an address in a separate addresses table, for example.
I personally like having an integer or long surrogate key as the primary key on each table that holds data (not on tables that only link other tables together, e.g., m:n relationships).
I also like having a created and modified timestamp column.
Ensure that every column that you do "where column = val" in any query has an index. Maybe not the most perfect index in the world for the data type, but at least an index.
Set up your foreign keys. Also set up ON DELETE and ON UPDATE rules where relevant, to either cascade or set null, depending on your object structure (so you only need to delete once at the 'head' of your object tree, and all that object's sub-objects get removed automatically).
If you want to modularise your code, you might want to modularise your DB schema - e.g., this is the "customers" area, this is the "orders" area, and this is the "products" area, and use join/link tables between them, even if they're 1:n relations, and maybe duplicate the important information (i.e., duplicate the product name, code, price into your order_details table). Read up on normalisation.
Someone else will recommend exactly the opposite for some or all of the above :p - never one true way to do some things eh!
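The indexing advice from the list above is easy to check for yourself. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN; the orders table and the index name are made up for the example, and the exact wording of the plan text differs between SQLite versions, so the code only looks for the index name in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, total REAL)"
)


def query_plan():
    """Return SQLite's plan text for a 'WHERE customer_id = ?' lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?",
        (42,),
    ).fetchall()
    # The human-readable description is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


# Without an index the lookup has to scan the whole table.
plan_before = query_plan()

conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")

# With the index in place the planner can search it instead.
plan_after = query_plan()
```

Printing plan_before and plan_after shows the switch from a table scan to a search on idx_orders_customer_id. Other databases have their own EXPLAIN variants for the same check.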
I really liked this article..
http://www.codeproject.com/Articles/359654/important-database-designing-rules-which-I-fo
Head First SQL is a great introduction.
These are questions which, in my opinion, require different knowledge from different domains.
You just can't know in advance "which" tables to build; you have to know the problem you have to solve and design the schema accordingly.
This is a mix of database design decisions and your database vendor's custom capabilities (i.e. you should check the documentation of your (r)dbms and eventually learn some "tips & tricks" for scaling). The configuration of your dbms is also crucial for scaling (replication, data partitioning and so on).
Again, almost every rdbms comes with its own "dialect" of the SQL language, so if you want efficient queries you have to learn that particular dialect. By the way, writing elegant queries that are also efficient is quite a challenge: elegance and efficiency are frequently conflicting goals.
That said, maybe you want to read some books. Personally, I used this book in my university database course (and found it a decent one, but I've not read other books in this field, so my advice is to look around for some good books on database design).
It's been a while since I read it (so, I'm not sure how much of it is still relevant), but my recollection is that Joe Celko's SQL for Smarties book provides a lot of info on writing elegant, effective, and efficient queries.