I'm involved in a project to build a survey system. We've been hammering out the logic for a few question types, and I could use a second opinion on the best way to proceed. We work on an ASP.NET 2.0 website using VB (VS2005) with an Oracle database.
In our Oracle server, we plan to organize our data in a few tables. There's a table for our surveys, one for questions (keys determine which survey a question goes on), one for answers (again, keys determine which question an answer belongs to), and one for answer collection. Most questions only return one response, and that's pretty easy to figure out. However, when we start thinking about items that return multiple answers, it starts to get tricky.
For example, say we have a simple 3x3 matrix filled with check boxes. The rows are days: 'Monday', 'Wednesday', 'Friday'. The columns are activities like 'Biking', 'Running', 'Driving'. The user checks each activity they did on a given day, so each row can have more than one response. Another case we want to think about: what if, instead of check boxes, we have text boxes where users write in how many minutes they spent on an activity?
So far, for collecting responses, I like the idea of traversing a list of controls in the form and keeping tabs on the kinds of controls that collect data. Since the controls are created in code, usually they're given an ID of a string with a number affixed to the end to keep track of what question type it is, and what number it is.
Question #1:
Should the data returned from the user be in a single database entry with delimiters to separate each answer, or should each answer get its own entry?
Question #2:
What's the best way to identify what response goes with what answer (on the survey)?
Assuming that space and speed are unlikely to be serious limiting issues in this system, I suggest that you keep your data normalized.
So the answer to question 1 is: each answer gets its own entry, with a sequence number if necessary.
And the answer to question 2 is: by the foreign keys.
Question #1: Should the data returned from the user be in a single database entry with delimiters to separate each answer, or should each answer get its own entry?
I don't think a delimited entry is the way to go. I think you need to maintain a simple cross-reference table with the question ID and the answer (either a key if it's multiple choice, or text if you allow it).
If you are talking about an answer grid, then your cross reference table could have one more column, with the id of the "category".
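As a rough illustration of the cross-reference idea, here is a minimal sketch in Python, with an in-memory SQLite database standing in for Oracle. Every table and column name (question, category, answer, response, respondent_id, answer_text) is made up for the example and is not taken from the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE question (
        question_id INTEGER PRIMARY KEY,
        survey_id   INTEGER NOT NULL,
        prompt      TEXT NOT NULL
    );
    -- Grid rows (the days); plain questions simply have no categories.
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL REFERENCES question(question_id),
        label       TEXT NOT NULL
    );
    -- Selectable options (the activities).
    CREATE TABLE answer (
        answer_id   INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL REFERENCES question(question_id),
        label       TEXT NOT NULL
    );
    -- One row per checked box; answer_text would hold typed-in values
    -- such as minutes when the grid uses text boxes instead.
    CREATE TABLE response (
        response_id   INTEGER PRIMARY KEY,
        respondent_id INTEGER NOT NULL,
        question_id   INTEGER NOT NULL REFERENCES question(question_id),
        category_id   INTEGER REFERENCES category(category_id),
        answer_id     INTEGER REFERENCES answer(answer_id),
        answer_text   TEXT
    );
""")

conn.execute("INSERT INTO question VALUES (1, 1, 'What did you do on each day?')")
conn.executemany("INSERT INTO category VALUES (?, 1, ?)",
                 [(1, "Monday"), (2, "Wednesday"), (3, "Friday")])
conn.executemany("INSERT INTO answer VALUES (?, 1, ?)",
                 [(1, "Biking"), (2, "Running"), (3, "Driving")])

# Hypothetical respondent 42 checks Biking and Running on Monday, Driving on Friday.
checked = [(1, 1), (1, 2), (3, 3)]
conn.executemany(
    "INSERT INTO response (respondent_id, question_id, category_id, answer_id) "
    "VALUES (42, 1, ?, ?)",
    checked,
)

# The foreign keys alone identify which response goes with which answer.
rows = conn.execute("""
    SELECT c.label, a.label
    FROM response r
    JOIN category c ON c.category_id = r.category_id
    JOIN answer   a ON a.answer_id   = r.answer_id
    WHERE r.respondent_id = ? AND r.question_id = ?
    ORDER BY c.category_id, a.answer_id
""", (42, 1)).fetchall()
print(rows)
```

With this layout a row of the grid can carry any number of checked boxes, and no delimiter parsing is needed when reading the responses back.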
(NB. The question is not a duplicate for this, since I am dealing with an ORM system)
I have a table in my database to store all Contacts information. Some of the columns for each contact are fixed (e.g. Id, InsertDate and UpdateDate). In my program I would like to give the user the option to add or remove properties for each contact.
Now there are of course two alternatives here:
1. Save it all in one table and add and remove entire columns when the user needs to;
2. Create a key-value table to save each property alongside its type and connect the record to the user's id.
These alternatives are both doable. But I am wondering which one is better in terms of speed? In the program it will be a very common thing for the user to view the entire Contact list to check for updates. Plus, I am using an ORM framework (Microsoft's Entity Framework) to deal with database queries. So if the user is to add and remove columns from a table all the time, it will be a difficult task to map them to my program. But again, if alternative (1) is a significantly better option than (2), then I can reconsider the key-value option.
I have actually done both of these.
Example #1
Large, wide table with columns of data holding names, phone, address and lots of small integer values of information that tracked details of the clients.
Example #2
Many different tables separating out all of the Character Varying data fields, the small integer values etc.
Example #1 was a lot faster to code for but in terms of performance, it got pretty slow once the table filled with records. 5000 wasn't a problem. When it reached 50,000 there was a noticeable performance degradation.
Example #2 was built later in my coding experience and was built to resolve the issues found in Example #1. While it took more to get the records I was after (LEFT JOIN this and UNION that) it was MUCH faster as you could ultimately pick and choose EXACTLY what the client was after without having to search a massive wide table full of data that was not all being requested.
I would recommend Example #2 to fit your #2 in the question.
And your USER specified columns for their data set could be stored in a table just to their own (depending on how many you have I suppose) which would allow you to draw on the table specific to that USER, which would also give you unlimited ability to remove and add columns to suit that particular setup.
You could then also have another table which kept track of the custom columns in the custom column table, which would give you the ability to "recover" columns later, as in "Do you want to add this to your current column choices or to one of these columns you have deleted in the past".
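For comparison, the key-value alternative from the question could look roughly like the sketch below. It uses Python and SQLite rather than Entity Framework, and the names contact_property, value_type and load_properties are hypothetical, chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Fixed columns stay in the main table.
    CREATE TABLE contact (
        id          INTEGER PRIMARY KEY,
        insert_date TEXT NOT NULL,
        update_date TEXT NOT NULL
    );
    -- User-defined properties: one row per (contact, property).
    CREATE TABLE contact_property (
        contact_id INTEGER NOT NULL REFERENCES contact(id),
        name       TEXT NOT NULL,
        value_type TEXT NOT NULL,   -- how to interpret the stored text
        value      TEXT,
        PRIMARY KEY (contact_id, name)
    );
""")
conn.execute("INSERT INTO contact VALUES (1, '2020-01-01', '2020-01-02')")
conn.executemany(
    "INSERT INTO contact_property VALUES (?, ?, ?, ?)",
    [(1, "nickname", "text", "Sam"), (1, "age", "int", "34")],
)

# Converters for the value types this sketch knows about.
CASTS = {"text": str, "int": int}


def load_properties(conn, contact_id):
    """Return the custom properties of one contact as a typed dict."""
    rows = conn.execute(
        "SELECT name, value_type, value FROM contact_property WHERE contact_id = ?",
        (contact_id,),
    )
    return {name: CASTS[value_type](value) for name, value_type, value in rows}


props = load_properties(conn, 1)
print(props)

# Removing a property is a DELETE, not a schema change.
conn.execute("DELETE FROM contact_property WHERE contact_id = 1 AND name = 'age'")
props_after_delete = load_properties(conn, 1)
```

The mapped schema never changes here, which is the main attraction when an ORM is involved; the cost is that every value is stored as text and has to be converted on the way out.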
I need to create a table to store a user's responses to a question, and they can have up to 12 responses. What table structure would work best? I have created 2 options, but if you have a better idea I am open to suggestions.
Table 1 (Store each answer in a new row)
UserId
QuestionId
Answer Number
Answer
Table 2 (Store all answers in one row)
UserId
QuestionId
Answer1
Answer2
Answer3
Answer4
Answer5
Answer6
Answer7
Answer8
Answer9
Answer10
Answer11
Answer12
Giving each answer its own row would be better, so I would recommend going with your idea for table 1. That way, if you want to raise the limit from 12 to, say, 20, you do not need to add a new column, and you can count responses more easily.
You don't want redundancy and unnecessary/unused columns. From a proper DB design standpoint, you should definitely go with option one. It is more normalized, and it will add value if you decide to scale at any time later.
I'd recommend neither design.
All answers in one row breaks first normal form.
I'd have a Question table, a User table, and an Answer table. A User could be given many Questions; there's one Answer per Question.
The answer is option 2 will perform better, because you only need one I/O operation to retrieve all answers. I once built a data warehouse with a similar "wide" design, and it performed amazingly well.
...but typically, performance shouldn't be the only consideration.
From a database design point of view, it's better to use one row per answer.
This is because:
adding columns (to cater for more answers) involves a schema change (much harder), but adding rows does not
rows are scalable (what if someone had 1000 answers: are you going to add 1000 columns?)
queries are easier - you must actually name each answer if stored in columns, but with rows you name only the answer column and use SQL to pull everything together
Unless raw speed is your standout goal, prefer option 1 (more rows) over option 2 (more columns).
From a true performance perspective it depends (from a good database design perspective it's a no-brainer: multiple rows is the way to go).
If all your answers fit within a single page and you're seeking that row using a clustered index, it is probably going to be slightly faster with solution 2. Your tree would have fewer leaves, so the search covers a smaller dataset. You also avoid the Cartesian product that comes with a join.
Solution 1 will be a little faster if you have page splits, as long as the join column is indexed of course.
In the end, though, the minor performance increase you could get from one option over the other would probably be insignificant compared to the maintenance costs of bad design.
You should definitely store the answers as separate records.
If you store the answers in one record, you will have data (the answer number) in the field names, so that breaks the first normal form. This is a sign of a really bad database design.
With the answers in separate records it's easier to access the data. Consider for example that you want to get the last answer for each question and user. This is very easy if you have the answers as separate records, but very complicated if you have them in a single record.
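To make the "last answer for each question and user" case concrete, here is a small sketch using Python's sqlite3 module. The table name user_answer and the sample data are invented for the example; the columns follow Table 1 from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_answer (
        user_id       INTEGER NOT NULL,
        question_id   INTEGER NOT NULL,
        answer_number INTEGER NOT NULL,
        answer        TEXT NOT NULL,
        PRIMARY KEY (user_id, question_id, answer_number)
    )
""")
conn.executemany(
    "INSERT INTO user_answer VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, "red"), (1, 10, 2, "green"), (1, 10, 3, "blue"),
        (1, 11, 1, "yes"),
        (2, 10, 1, "black"), (2, 10, 2, "white"),
    ],
)

# Last answer per (user, question): join each group to its highest answer_number.
last_answers = conn.execute("""
    SELECT ua.user_id, ua.question_id, ua.answer
    FROM user_answer ua
    JOIN (
        SELECT user_id, question_id, MAX(answer_number) AS last_number
        FROM user_answer
        GROUP BY user_id, question_id
    ) latest
      ON latest.user_id = ua.user_id
     AND latest.question_id = ua.question_id
     AND latest.last_number = ua.answer_number
    ORDER BY ua.user_id, ua.question_id
""").fetchall()
print(last_answers)

# Counting responses is a plain GROUP BY as well.
answer_counts = conn.execute("""
    SELECT user_id, question_id, COUNT(*)
    FROM user_answer
    GROUP BY user_id, question_id
    ORDER BY user_id, question_id
""").fetchall()
print(answer_counts)
```

With twelve Answer1..Answer12 columns, the same two questions would need a chain of checks across all twelve columns to find the last non-empty one.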
The first option would need to store the user-id multiple times too.
I would go for the second option, especially if you can put a hard limit on it such as 12.
This also requires only a single write operation for the database.
What are these 12 things ... months?
I know it is generally better for performance to build MySQL queries that name every item you need, but on a profile page, for example, I might need every item except a few.
SELECT user_name,f_name,l_name,country,usa_state,other_state,zip_code,city,gender,birth_date,date_created,date_last_visit,
user_role,photo_url,user_status,friend_count,comment_count,forum_post_count,referral_count,referral_count_total,
setting_public_profile,setting_online,profile_purpose,profile_height,profile_body_type,profile_ethnicity, profile_occupation,profile_marital_status,profile_sex_orientation,profile_home_town,profile_religion,
profile_smoker,profile_drinker,profile_kids,profile_education,profile_income,profile_headline,profile_about_me,
profile_like_to_meet,profile_interest,profile_music,profile_television,profile_books,profile_heroes,profile_here_for,profile_counter FROM users WHERE user_id=1 AND user_role >
So without doing a bunch of tests, maybe someone with more experience can chime in with some advice?
Would this be worse?
SELECT * FROM users WHERE user_id=1 AND user_role >
I prefer to list all items because then, on that page, it is just easier to see what I have available if I need something from the DB. But if SELECT * would be faster, then I would not list them.
Note: naming all fields is of course a best practice, but in this post I will discuss only performance benefits, not design or maintenance ones.
The * syntax can be slower for the following reasons:
Not all fields are indexed and the query uses full table scan. Probably not your case: it's hardly possible that all fields you return are indexed with a single index.
Returning trailing fields from a table that contains variable-length columns can result in a slight searching overhead: to return the 20th field, the previous 19 have to be examined and their offsets calculated.
Simply more data needs to be returned (passed over the connection).
Since you need almost all fields, the last reason is probably the most important one. Say, the description TEXT field can be only 1 of 50 fields not used on the page, but can occupy 10 times as much space as all other fields together.
In this case it will be of course better to name all fields and omit the long fields you don't need.
When considering using *, you should always consider the possibility that more fields will be added to the table later.
If it's a lot more fields, you could end up retrieving and returning more data than you need.
You might have a problem with some of the new fields. For example, if you just loop through the fields and display them, you might have new fields you do not want to display. Or the data type might need some formatting first.
There is also a chance that a field will be removed from the table, for example in normalizing the table. Code that expects a particular field could break in that case.
You should always specify the columns you need, unless your programming language supports associative lists/arrays, so that columns can be retrieved by name.
If you need to retrieve them by index number, then using * could pose a huge problem later if you insert a new column anywhere in the table, as all the indices from that point on will increase by one...
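The index-shift problem can be reproduced in a few lines. The sketch below uses Python's sqlite3 module instead of MySQL, and the two table layouts are invented for the example; the helper fetch_first_user is hypothetical too.

```python
import sqlite3


def fetch_first_user(create_sql, insert_sql, params):
    """Build a throwaway users table and return its first row via SELECT *."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows allow access by index and by name
    conn.execute(create_sql)
    conn.execute(insert_sql, params)
    return conn.execute("SELECT * FROM users").fetchone()


# Original layout: city is the third column (index 2).
old_row = fetch_first_user(
    "CREATE TABLE users (user_id INTEGER, user_name TEXT, city TEXT)",
    "INSERT INTO users VALUES (?, ?, ?)",
    (1, "alice", "Oslo"),
)

# Later layout: photo_url was inserted before city.
new_row = fetch_first_user(
    "CREATE TABLE users (user_id INTEGER, user_name TEXT, photo_url TEXT, city TEXT)",
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    (1, "alice", "/img/alice.png", "Oslo"),
)

print(old_row[2], new_row[2])            # positional access now reads a different column
print(old_row["city"], new_row["city"])  # access by name is unaffected
```

Naming the columns in the SELECT list fixes the positions as well, which is why it is the safer default whenever results are read by index.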
What is the point (if any) in having a table in a database with only one row?
Note: I'm not talking about the possibility of having only one row in a table, but when a developer deliberately makes a table that is intended to always have exactly one row.
Edit:
The sales tax example is a good one.
I've just observed in some code I'm reviewing three different tables that contain three different kinds of certificates (a la SSL), each having exactly one row. I don't understand why this isn't made into one large table; I assume I'm missing something.
I've seen something like this when a developer was asked to create a configuration table to store name-value pairs of data that needs to persist without being changed often. He ended up creating a one-row table with a column for each configuration variable. I wouldn't say it's a good idea, but I can certainly see why the developer did it given his instructions. Needless to say it didn't pass review.
I've just observed in some code I'm reviewing three different tables that contain three different kinds of certificates (a la SSL), each having exactly one row. I don't understand why this isn't made into one row; I assume I'm missing something.
This doesn't sound like good design, unless there are some important details you don't know about. If there are three pieces of information that have the same constraints, the same use and the same structure, they should be stored in the same table, 99% of the time. That's a big part of what tables are for fundamentally.
For some things you only need one row - typically system configuration data. For example, "current sales tax rate". This might change in the future and so shouldn't be hardcoded, but you'll typically only ever need one at any given time. This kind of data needs to be in the database so that queries can use it in computations.
It's not necessarily a bad idea.
What if you had some global state (say, a boolean) that you wanted to store somewhere? And you wanted your stored procedures to easily access this state?
You could create a table with a primary key whose value range was limited to exactly one value.
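One way to do that, sketched here with Python's sqlite3 module, is a CHECK constraint on the key. The table name global_state and the maintenance_mode flag are hypothetical; other database systems offer similar constraints but the exact syntax may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE global_state (
        -- The key can only ever be 1, so the table can hold at most one row.
        id               INTEGER PRIMARY KEY CHECK (id = 1),
        maintenance_mode INTEGER NOT NULL CHECK (maintenance_mode IN (0, 1))
    )
""")
conn.execute("INSERT INTO global_state (id, maintenance_mode) VALUES (1, 0)")

# Any attempt to add a second row violates the CHECK constraint.
try:
    conn.execute("INSERT INTO global_state (id, maintenance_mode) VALUES (2, 1)")
    second_row_rejected = False
except sqlite3.IntegrityError:
    second_row_rejected = True

# The state itself is changed with UPDATE and read without a WHERE clause.
conn.execute("UPDATE global_state SET maintenance_mode = 1 WHERE id = 1")
maintenance_mode = conn.execute(
    "SELECT maintenance_mode FROM global_state"
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM global_state").fetchone()[0]
print(second_row_rejected, maintenance_mode, row_count)
```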
A single row is like a singleton class. Its purpose: to control or manage some other process.
A single-row table could act as a critical section or as a deterministic automaton (a kind of dispatcher based on row values).
A single row is useful in a table such as COMPANY_DESCRIPTION, to obtain consistent data about that company. This is useful for company letters and addressing.
A single row is useful to hold a current value like VAT or a date or time, and so on.
It can sometimes be useful to emulate features the database system doesn't provide. I'm thinking of sequences in MySQL, for instance.
If your database is your application, then it probably makes sense for storing configuration data that might be required by stored procedures implementing business logic.
If you have an application that could use the file system to store information, then I don't think there is an advantage to using the database over an XML or flat file, except maybe that most developers are now far more well versed in using SQL to store and retrieve data than accessing the file system.
What is the point (if any) in having a table in a database with only one row?
A relational database stores things as relations: tuples of data satisfying some relation.
Like this one: "a VAT of this many percent is in effect in my country now".
If only one tuple satisfies this relation, then yes, it will be the only one in the table.
SQL cannot store variables, but it can store a set consisting of one element: that is a one-row table.
Also, SQL is a set based language, and for some operations you need a fake set of only one row, like, to select a constant expression.
You cannot just SELECT out of nothing in Oracle, you need a FROM clause.
Oracle has a pseudotable, dual, which contains only one row and only one column.
Once, a long time ago, it used to have two rows (hence the name dual), but lost its second row somewhere on its way to version 7.
MySQL has this pseudotable too, but MySQL is able to do selects without FROM clause. Still, it's useful when you need an empty rowset: SELECT 1 FROM dual WHERE NULL
I've just observed in some code I'm reviewing three different tables that contain three different kinds of certificates (a la SSL), each having exactly one row. I don't understand why this isn't made into one large table; I assume I'm missing something.
It may be a kind of "have it all or lose" scenario, when all three certificates are needed at once:
SELECT *
FROM ssl1
CROSS JOIN
ssl2
CROSS JOIN
ssl3
If any of the certificates is missing, the whole query returns nothing.
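This all-or-nothing behaviour is easy to check. The sketch below runs the same query against an in-memory SQLite database from Python; the single certificate column and the sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("ssl1", "ssl2", "ssl3"):
    conn.execute(f"CREATE TABLE {table} (certificate TEXT NOT NULL)")

QUERY = "SELECT * FROM ssl1 CROSS JOIN ssl2 CROSS JOIN ssl3"

# Two of the three certificates are present: the product with the empty
# table is empty, so nothing comes back.
conn.execute("INSERT INTO ssl1 VALUES ('cert-a')")
conn.execute("INSERT INTO ssl2 VALUES ('cert-b')")
rows_with_one_missing = conn.execute(QUERY).fetchall()

# Once the third row exists, the query returns all three in a single row.
conn.execute("INSERT INTO ssl3 VALUES ('cert-c')")
rows_with_all_present = conn.execute(QUERY).fetchall()

print(rows_with_one_missing, rows_with_all_present)
```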
A table with a single row can be used to store application level settings that are shared across all database users. 'Maximum Allowed Users' for example.
Funny... I asked myself the same question. If you just want to store some simple value and your ONLY method of storage is an SQL server, that's pretty much what you have to do. If I have to do this, I usually end up creating a table with several columns and one row. I've seen a couple commercial products do this as well.
We have used a single-row table in the past (not often). In our case, this table was used to store system-wide configuration values that were updatable via a web interface. We could have gone the route of a simple name/value table, but the end client preferred a single row. I personally would have preferred the latter, but it really is up to preference, especially if this table will never have any sort of relationship with another table.
I really cannot figure out why this would be the best solution. It seems more efficient to just have some kind of config file containing the data that would be in the table's one row; connecting to the database and querying that one row would be more costly. However, if this is going to be some kind of config for the database logic, then it would make a little more sense, depending on the type of database you are using.
I use the totally awesome rails-settings plugin for this http://github.com/Squeegy/rails-settings/tree/master
It's really easy to set up and provides a nice syntax:
Settings.admin_password = 'supersecret'
Settings.date_format = '%m %d, %Y'
Settings.cocktails = ['Martini', 'Screwdriver', 'White Russian']
Settings.foo = 123
Want a list of all the settings?
Settings.all # returns {'admin_password' => 'supersecret', 'date_format' => '%m %d, %Y'}
Set defaults for certain settings of your app. This will cause the defined settings to return the specified value even if they are not in the database. Make a new file in config/initializers/settings.rb with the following:
Settings.defaults[:some_setting] = 'footastic'
A use for this might be to store the current version of the database.
If one were storing database versions for schema changes it would need to reside within the database itself.
I currently analyse the schema and update accordingly but am thinking of moving to versioning. Unless someone has a better idea.
I use vb.net and sql express
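A version row of this kind could drive upgrades roughly as sketched below. This uses Python and SQLite rather than VB.NET and SQL Express, and the schema_version table, the MIGRATIONS list and the customer table are all hypothetical names for the illustration.

```python
import sqlite3

# Ordered schema changes; the position in the list (1-based) is the version.
MIGRATIONS = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE customer ADD COLUMN email TEXT",
]


def current_version(conn):
    """Read the single version row, creating it at version 0 if needed."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS schema_version (
            id      INTEGER PRIMARY KEY CHECK (id = 1),
            version INTEGER NOT NULL
        )
    """)
    conn.execute("INSERT OR IGNORE INTO schema_version (id, version) VALUES (1, 0)")
    return conn.execute("SELECT version FROM schema_version").fetchone()[0]


def upgrade(conn):
    """Apply every migration newer than the stored version, in order."""
    version = current_version(conn)
    for number, statement in enumerate(MIGRATIONS, start=1):
        if number > version:
            conn.execute(statement)
            conn.execute("UPDATE schema_version SET version = ?", (number,))
    conn.commit()
    return current_version(conn)


conn = sqlite3.connect(":memory:")
first_run = upgrade(conn)    # applies both migrations
second_run = upgrade(conn)   # nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(first_run, second_run, columns)
```

Because the version lives in the same database as the schema it describes, a restored backup always carries its own version with it.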
Unless there are insert constraints on the table, or a timestamp for versioning, this sounds like a bad idea.
There was a table set up like this in a project I inherited. It was for configuration data, and the reason that was given was that it made for very simple queries:
SELECT WidgetSize FROM ConfigTable
SELECT FooLength FROM ConfigTable
Okay fine. We converted to a generalized configuration table:
ID Name IntValue StringValue TextValue
This has served our purposes well.
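For illustration, a generalized table with those columns might be used as follows. This is a sketch in Python with SQLite; the column types, the UNIQUE constraint on Name, the sample values and the get_int helper are assumptions, since the answer only lists the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ConfigTable (
        ID          INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL UNIQUE,  -- one row per setting
        IntValue    INTEGER,
        StringValue TEXT,
        TextValue   TEXT
    )
""")
conn.executemany(
    "INSERT INTO ConfigTable (Name, IntValue, StringValue) VALUES (?, ?, ?)",
    [
        ("WidgetSize", 12, None),
        ("FooLength", 40, None),
        ("SiteTitle", None, "Widgets Inc."),
    ],
)


def get_int(conn, name):
    """Look up an integer setting by name; raise KeyError if it is missing."""
    row = conn.execute(
        "SELECT IntValue FROM ConfigTable WHERE Name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return row[0]


widget_size = get_int(conn, "WidgetSize")
foo_length = get_int(conn, "FooLength")
print(widget_size, foo_length)
```

The queries are one WHERE clause longer than in the single-row version, but adding a setting becomes an INSERT instead of an ALTER TABLE.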
CREATE TABLE VERSION (VERSION_STRING VARCHAR2(20 BYTE))
?
I used a single datum in a SQLite database as a counter in a dynamic web page. That's the simplest way I can think of to make it thread-safe (or process-safe to be precise). But I am not sure whether it's a good idea.
I think the best way to deal with these scenarios is to, rather than using a database at all, use the configuration file (which is usually XML) or make your own configuration file that is read during start up of the application. It only takes a few minutes to write the code to read the file in.
The advantage here is that there is no chance of accidentally adding additional values for the same XML variable, and it's great for testing because you don't need to write a lot of code to test different inputs: just make a simple change to the text value and re-run the application.
Every year our company holds a conference/stand where participants can show their products.
We have a web application which lets the participants sign up for the conference.
They can enter information such as the name of their company, billing information, and so on.
It seems as if the requirements for what information the participants need to enter, vary from year to year.
I.e., one year the participants might need to enter the size of the stand they want, the next year this is no longer needed, and so on.
One year, you might just have to enter a total number of m^2 you want, while the next year, you might need to add the length, height and number of floors you want.
Over the years, this has caused the DB schema to become quite crazy.
We now have a lot of 'obsolete' fields and tables in our database, and it's beginning to look quite messy.
For historical reasons, we can't just reset the schema back to basics for each year.
We might need some of the data from the old conferences.
So: Does anyone have a good idea on how we can deal with this ?
The only solutions I can think of are
Version our database for each conference i.e
Store all of the 'varying' information as xml
If anyone knows of some good literature on how to handle evolving databases and deal with obsolete data, that would be great!
Much as I hate to say this, this might be a case where the Entity-Attribute-Value structure would work best.
http://en.wikipedia.org/wiki/Entity-Attribute-Value_model
Note that this is not a model to use lightly; there are significant problems with it. But this is exactly the kind of problem it is designed to solve.
I would consider using a name-value approach for all the extended data. Essentially, you define the data that stays static from year to year. This will be things like company information; the definition of an address, for example, doesn't change year after year. These will be modeled normally.
Then you would define a table that will contain a master of all the questions you have, and will be linked somehow to tell you what year those questions are valid for. This table might also indicate other attributes about the question that could let you dynamically create a GUI on top of it. Things such as regular expressions to validate the type of data etc.
Here's a really naive approach, which even after doing this would not be the end state of what I would model (I would probably have another table that correlates a year to a question, and this is what I would link the company to; this way we can reuse questions over and over).
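The schema from this answer is not reproduced here, so the following is only a hypothetical sketch of the layout described (a master question table, a table correlating years to questions, and name-value answers), written in Python with SQLite. Every table, column and function name is invented for the example.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Master list of every question ever asked, with optional validation.
    CREATE TABLE question (
        question_id      INTEGER PRIMARY KEY,
        label            TEXT NOT NULL,
        validation_regex TEXT
    );
    -- Which questions are valid in which year (lets questions be reused).
    CREATE TABLE conference_question (
        conference_year INTEGER NOT NULL,
        question_id     INTEGER NOT NULL REFERENCES question(question_id),
        PRIMARY KEY (conference_year, question_id)
    );
    CREATE TABLE registration_answer (
        company_id      INTEGER NOT NULL REFERENCES company(company_id),
        conference_year INTEGER NOT NULL,
        question_id     INTEGER NOT NULL,
        value           TEXT NOT NULL,
        PRIMARY KEY (company_id, conference_year, question_id),
        FOREIGN KEY (conference_year, question_id)
            REFERENCES conference_question(conference_year, question_id)
    );
""")
conn.execute("INSERT INTO company VALUES (1, 'Acme')")
conn.executemany("INSERT INTO question VALUES (?, ?, ?)", [
    (1, "Stand area (m^2)", r"\d+"),
    (2, "Stand length (m)", r"\d+"),
    (3, "Stand height (m)", r"\d+"),
    (4, "Number of floors", r"\d+"),
])
conn.executemany("INSERT INTO conference_question VALUES (?, ?)",
                 [(2008, 1), (2009, 2), (2009, 3), (2009, 4)])


def form_fields(conn, year):
    """Labels of the questions to show on the sign-up form for one year."""
    rows = conn.execute("""
        SELECT q.label FROM conference_question cq
        JOIN question q ON q.question_id = cq.question_id
        WHERE cq.conference_year = ? ORDER BY q.question_id
    """, (year,))
    return [label for (label,) in rows]


def save_answer(conn, company_id, year, question_id, value):
    """Store one answer after checking it belongs to that year and validates."""
    row = conn.execute("""
        SELECT q.validation_regex FROM conference_question cq
        JOIN question q ON q.question_id = cq.question_id
        WHERE cq.conference_year = ? AND cq.question_id = ?
    """, (year, question_id)).fetchone()
    if row is None:
        raise ValueError("question was not asked in that year")
    if row[0] is not None and re.fullmatch(row[0], value) is None:
        raise ValueError("value does not match the question's pattern")
    conn.execute("INSERT INTO registration_answer VALUES (?, ?, ?, ?)",
                 (company_id, year, question_id, value))


save_answer(conn, 1, 2008, 1, "25")
fields_2009 = form_fields(conn, 2009)
stored = conn.execute("SELECT value FROM registration_answer").fetchall()
print(fields_2009, stored)
```

Old answers stay queryable by year, and a new year's form is defined by inserting rows into conference_question rather than by altering the schema.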
"We now have a lot of 'obsolete' fields and tables in our database, and it's beginning to look quite messy. For historical reasons, we can't just reset the schema back to basics for each year. We might need some of the data from the old conferences."
If you might need them, they're not obsolete.
I would code the front-end generically however. This means having a system that can handle any form of stand area configuration (in the example you give), and maybe more in the future if that should occur.
If you have tables like "standarea" (area in m^2), "standsize" (length, width, height, etc) - then you would have objects in your model to match these (StandArea, StandSize) - these could both extend a common base class StandData.
One year one table gets data set, the next year another table gets the data. Your DAO will try to load each object from each table (by a parent, err, stand_uid field) and then set the StandData field in your "ConferenceApplication" object to whatever it discovered.
The other option is to just have all possible fields in a single table, and allow them to be empty.