Is a one column table good design? [closed] - sql

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Is it OK to have a table with just one column? I know it isn't technically illegal, but is it considered poor design?
EDIT:
Here are a few examples:
You have a table with the 50 valid US state codes, but you have no need to store the verbose state names.
An email blacklist.
Someone mentioned adding a key field. The way I see it, this single column WOULD be the primary key.

In terms of relational algebra this would be a unary relation, meaning "this thing exists"
Yes, it's fine to have a table defining such a relation: for instance, to define a domain.
The values of such a table should be natural primary keys of course.
A lookup table of prime numbers is what comes to my mind first.
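A minimal sketch of such a unary relation used as a domain, assuming Python's built-in sqlite3 module as a stand-in engine. The address table and the sample rows are hypothetical and only there to show a foreign key pointing at the one-column table:

```python
import sqlite3


def build_state_db():
    """Create an in-memory database with a one-column lookup table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- The single column is the natural primary key.
        CREATE TABLE us_state (code TEXT NOT NULL PRIMARY KEY);

        -- Hypothetical referencing table: state_code must exist in us_state.
        CREATE TABLE address (
            id         INTEGER PRIMARY KEY,
            city       TEXT NOT NULL,
            state_code TEXT NOT NULL REFERENCES us_state (code)
        );

        INSERT INTO us_state (code) VALUES ('CA'), ('NY'), ('TX');
        """
    )
    return conn


def is_valid_state(conn, code):
    """Membership test: does this code exist in the unary relation?"""
    row = conn.execute(
        "SELECT 1 FROM us_state WHERE code = ?", (code,)
    ).fetchone()
    return row is not None
```

With this in place, an insert into address with a state code that is not in us_state is rejected by the engine, so the one-column table acts as the authoritative list.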

Yes, it's certainly good design to design a table in such a way as to make it most efficient. "Bad RDBMS Design" is usually centered around inefficiency.
However, I have found that most cases of single column design could benefit from an additional column. For example, State Codes can typically have the Full State name spelled out in a second column. Or a blacklist can have notes associated. But, if your design really does not need that information, then it's perfectly ok to have the single column.

I've used them in the past. One client of mine wanted to auto block anyone trying to sign up with a phone number in this big list he had so it was just one big blacklist.

If there is a valid need for it, then I don't see a problem. Maybe you just want a list of possibilities to display for some reason and you want to be able to dynamically change it, but have no need to link it to another table.

One case that I found sometimes is something like this:
Table countries_id contains only one column, with a numeric ID for each country.
Table countries_description contains a column with the country ID, a column with the language ID, and a column with the localized country name.
Table company_factories contains information for each factory of the company, including the country in which it is located.
So, to keep the data coherent and language-independent, the database uses this schema, in which a one-column table allows foreign keys without language dependencies.
In this case I think the existence of one-column tables is justified.
Edited in response to the comment by: Quassnoi
In this schema I can define a foreign key in the table company_factories that does not require me to include a Language column in that table. Without the table countries_id, I would have to include a Language column in company_factories just to define the foreign key.
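The three tables described above can be sketched roughly like this, assuming Python's built-in sqlite3 module as a stand-in engine. Column names other than the table names, and the sample rows, are assumptions made for illustration:

```python
import sqlite3

COUNTRY_SCHEMA = """
-- One-column table: the language-independent identity of each country.
CREATE TABLE countries_id (country_id INTEGER PRIMARY KEY);

-- Localized names, keyed by country and language.
CREATE TABLE countries_description (
    country_id  INTEGER NOT NULL REFERENCES countries_id (country_id),
    language_id TEXT    NOT NULL,
    name        TEXT    NOT NULL,
    PRIMARY KEY (country_id, language_id)
);

-- Factories reference the country only; no language column is needed.
CREATE TABLE company_factories (
    factory_id   INTEGER PRIMARY KEY,
    factory_name TEXT    NOT NULL,
    country_id   INTEGER NOT NULL REFERENCES countries_id (country_id)
);

INSERT INTO countries_id (country_id) VALUES (1);
INSERT INTO countries_description VALUES (1, 'en', 'Germany'), (1, 'de', 'Deutschland');
INSERT INTO company_factories VALUES (10, 'Plant A', 1);
"""


def build_country_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(COUNTRY_SCHEMA)
    return conn


def factory_country_name(conn, factory_id, language_id):
    """Resolve a factory's country name in the requested language."""
    row = conn.execute(
        """
        SELECT d.name
        FROM company_factories AS f
        JOIN countries_description AS d ON d.country_id = f.country_id
        WHERE f.factory_id = ? AND d.language_id = ?
        """,
        (factory_id, language_id),
    ).fetchone()
    return row[0] if row else None
```

The language is chosen at query time, so the factory row itself stays language-neutral.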

There would be rare cases where a single-column table makes sense. I did one database where the list of valid language codes was a single-column table used as a foreign key. There was no point in having a different key, since the code itself was the key. And there was no fixed description since the language code descriptions would vary by language for some contexts.
In general, any case where you need an authoritative list of values that do not have any additional attributes is a good candidate for a one-column table.

I use single-column tables all the time -- depending, of course, on whether the app design already uses a database. Once I've endured the design overhead of establishing a database connection, I put all mutable data into tables where possible.
I can think of two uses of single-column tables OTMH:
1) Data item exists. Often used in dropdown lists. Also used for simple legitimacy tests.
Eg. two-letter U.S. state abbreviations; Zip codes that we ship to; words legal in Scrabble; etc.
2) Sparse binary attribute, ie., in a large table, a binary attribute that will be true for only a very few records. Instead of adding a new boolean column, I might create a separate table containing the keys of the records for which the attribute is true.
Eg. employees that have a terminal disease; banks with a 360-day year (most use 365); etc.
-Al.
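A rough sketch of the second use (the sparse binary attribute), assuming Python's built-in sqlite3 module as a stand-in engine. The bank table, its columns, and the sample rows are hypothetical:

```python
import sqlite3


def build_bank_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(
        """
        CREATE TABLE bank (
            bank_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL
        );

        -- One-column table: a row here means "this bank uses a 360-day year".
        CREATE TABLE bank_360_day_year (
            bank_id INTEGER PRIMARY KEY REFERENCES bank (bank_id)
        );

        INSERT INTO bank VALUES (1, 'Bank A'), (2, 'Bank B'), (3, 'Bank C');
        INSERT INTO bank_360_day_year VALUES (2);
        """
    )
    return conn


def days_in_year(conn, bank_id):
    """Presence in the side table flips the default from 365 to 360."""
    row = conn.execute(
        "SELECT 1 FROM bank_360_day_year WHERE bank_id = ?", (bank_id,)
    ).fetchone()
    return 360 if row else 365
```

The large table gets no extra column; only the few exceptional keys are stored.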

Mostly I've seen this in lookup type tables such as the state table you described. However, if you do this be sure to set the column as the primary key to force uniqueness. If you can't set this value as unique, then you shouldn't be using one column.
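To illustrate the uniqueness point, here is a small sketch assuming Python's built-in sqlite3 module as a stand-in engine. The email_blacklist table and the helper names are made up for the example:

```python
import sqlite3


def build_blacklist_db():
    conn = sqlite3.connect(":memory:")
    # The only column is also the primary key, so duplicates are rejected.
    conn.execute(
        "CREATE TABLE email_blacklist (email TEXT NOT NULL PRIMARY KEY)"
    )
    return conn


def add_to_blacklist(conn, email):
    """Insert an address; return False if it was already listed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO email_blacklist (email) VALUES (?)", (email,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def blacklist_size(conn):
    return conn.execute("SELECT COUNT(*) FROM email_blacklist").fetchone()[0]
```

A second insert of the same address fails at the database level, so the table can never hold two indistinguishable rows.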

No problem as long as it contains unique values.

I would say in general, yes. Not sure why you need just one column. There are some exceptions to this that I have seen used effectively. It depends on what you're trying to achieve.
They are not really good design when you're thinking of the schema of the database, but really should only be used as utility tables.
I've seen numbers tables used effectively in the past.

The purpose of a database is to relate pieces of information to each other. How can you do that when there is no data to relate to?
Maybe this is some kind of compilation table (i.e. FirstName + LastName + Birthdate), though I'm still not sure why you would want to do that.
EDIT: I could see using this kind of table for a simple list of some kind. Is that what you are using it for?

Yes, as long as the field is the primary key, as you said it would be. The reason is that if you insert duplicate data, those rows effectively become read-only: if you try to delete one of the duplicated rows, it will not work, because the server will not know which row to delete.

The only use case I can conceive of is a table of words perhaps for a word game. You access the table just to verify that a string is a word: select word from words where word = ?. But there are far better data structures for holding a list of words than a relational database.
Otherwise, data is usually placed in a database to take advantage of the relationships between various attributes of the data. If your data has no attributes beyond its value, how will these relationships be developed?
So, while not illegal, in general you probably should not have a table with just one column.

All my tables have at least four technical fields: a serial primary key, creation and modification timestamps, and a soft-delete boolean. In any blacklist, you will also want to know who added the entry. So for me the answer is no: a table with only one column would not make sense except when prototyping something.

Yes, that is perfectly fine. But an ID field couldn't hurt, right?

Related

SQL Server database design with foreign keys

I have the following partial database design:
All the tables are dependent on each other so the table bvd_docflow_subdocuments is dependent on the table bdd_docflow_subsets
and the table bvd_docflow_subdocuments is dependent on bvd_docflow_subsets. So I thought I could be smart and use foreign keys on every table (with ON DELETE CASCADE). However, the foreign keys pile up the further down I go in the tables.
The problem is that there is no point in the table bvd_docflow_documents having a reference to the docflow_documentset_id PK / FK. Is there a way (and maybe my design is crappy) for a table to have an FK relationship only with the table directly above it, and not with all the tables above it?
Edit:
More explanation:
The bvd_docflow_subsets table stores information about the objects used to create documents. There is a relation between that table and the bvd_docflow_subdocuments table, which stores master data about all the documents for a subset (docflow_subset_id is in both tables). This is the link between those two tables.
Going further down, there is also the table bvd_docflow_documents, which contains the actual document data. The link between bvd_docflow_documents and bvd_docflow_subdocuments is bvd_docflow_subdocument_id.
On every table I have a foreign key defined, so when data is removed from a table, all the data linked to it is also removed.
However, the bvd_docflow_documents table has all the foreign keys from the other tables (docflow_subset_id and docflow_documentset_id), and that is the problem. The only foreign key needed for the bvd_docflow_documents table is docflow_subdocument_id and no other.
Edit 2
I have changed my design further and removed information that I don't need after initial import of the data.
See the following link for the (total) database design:
https://sqldbm.com/Project/SQLServer/Share/_AUedvNutCEV2DGLJleUWA
The tables subsets, subdocuments and documents have a many-to-many relationship, so I thought a table in between those three, documents_subdocuments, was the way to go, where I define all the different keys for those tables.
I am not used to designing the database first and then building it. But there is a first time for everything, and I am trying to make a database that follows standards and uses the power of SQL Server the correct way.
I'll address the bottom-most table and ignore the rest for the most part.
But first some comments. Your schema is simply a model of a system. To provide feedback, one must understand this "system" and how it actually works in order to evaluate your model. In addition, it is important to understand your entities and your reasons for choosing them and modelling them in the specified manner. Without that understanding, all of this is guessing based on experience.
And another comment. Slapping an identity column into every table is just lazy modelling IMO. Others will disagree, but you need to also enforce all natural keys. Do you have natural keys? It is rare not to have any. Enforce those that do exist.
And one last comment. Stop the ridiculous pattern of prepending the column names with the table names. And you should really think long and hard about using very long table names. Given what you have, I sense you need a schema for your docflow stuff.
For the documents table, your current PK makes no sense. Again, you've slapped an identity column into the table. By itself, this column is a key for the table. The inclusion of any other columns does not make the key any more "unique" - that inclusion is logical nonsense. Following your pattern, you would designate the identity column as the primary key. But ...
According to your image, the documents table is related to one and only one subdocument. You added a foreign key to that table - which matches the image. You also added additional columns and foreign keys to the "higher" tables. So now a document "points" to a specific subdocument. It also points to a specific subset - which may have no relationship to the subdocument. The same thought applies to the other FK. I have a doubt that this is logically correct. So why do these columns (and related FKs) exist? Perhaps this is the result of premature optimization - which everyone knows is the root of all evil coding. Again, it is impossible to know if this is "right" or even "useful" for your model.
To answer your question "... is there a way", the answer is obviously yes. You remove the columns of which you complain. You added them - Why? Is this perhaps a problem with the tool you are using?
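As a rough illustration of why the extra columns are not needed for cascading deletes, here is a sketch assuming Python's built-in sqlite3 module as a stand-in for SQL Server. The table names come from the question, but the reduced column lists, docflow_document_id, and the sample rows are assumptions:

```python
import sqlite3


def build_docflow_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(
        """
        CREATE TABLE bvd_docflow_subsets (
            docflow_subset_id INTEGER PRIMARY KEY
        );

        -- Each table references only the table directly above it.
        CREATE TABLE bvd_docflow_subdocuments (
            docflow_subdocument_id INTEGER PRIMARY KEY,
            docflow_subset_id      INTEGER NOT NULL
                REFERENCES bvd_docflow_subsets (docflow_subset_id)
                ON DELETE CASCADE
        );

        CREATE TABLE bvd_docflow_documents (
            docflow_document_id    INTEGER PRIMARY KEY,
            docflow_subdocument_id INTEGER NOT NULL
                REFERENCES bvd_docflow_subdocuments (docflow_subdocument_id)
                ON DELETE CASCADE
        );

        INSERT INTO bvd_docflow_subsets VALUES (1), (2);
        INSERT INTO bvd_docflow_subdocuments VALUES (10, 1), (20, 2);
        INSERT INTO bvd_docflow_documents VALUES (100, 10), (101, 10), (200, 20);
        """
    )
    return conn


def delete_subset(conn, subset_id):
    """Delete one subset; the cascade walks down the chain one level at a time."""
    with conn:
        conn.execute(
            "DELETE FROM bvd_docflow_subsets WHERE docflow_subset_id = ?",
            (subset_id,),
        )


def document_ids(conn):
    rows = conn.execute(
        "SELECT docflow_document_id FROM bvd_docflow_documents "
        "ORDER BY docflow_document_id"
    )
    return [r[0] for r in rows]
```

Deleting a subset removes its subdocuments, and that in turn removes their documents, even though the documents table carries no subset column.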
And some last comments. There is nothing special about "varchar(50)". Perhaps this is a placeholder that will be updated later. It may also be another sign of laziness. And generally speaking, columns with names like "type" and "code" tend to be foreign keys to "lookup" tables - because people like to add, modify, or remove these sorts of categorization values over time. I'm also concerned about the column name overlap among the tables. "Location" exists in multiple tables, as do action_code and action_id. And a column named "id" (action_id) suggests a lookup to another table - is it? Should it be? Is there a relationship between action_id and action_code? From a distance it is impossible to answer any of these questions.
But designing a database is more art than science. Sometimes you just need to create something, populate it with some sample data, and then determine if it works for your needs. Everyone will get something wrong in the first try. That is expected; that is how you learn. The most difficult part is actually completing your first attempt.

Table architecture best practice

I have a table where I am storing configurations for a tool I have. It has a ConfigID which is just an identity field, customer name, application name, then it has 18 well known fields (wellknownfield1,wellknownfield2,...,wellknownfield18) that I know what to put in based off another table values.
Now my problem comes in. I also need custom values. Currently I have a dumb solution: columns customfieldname1, customfieldvalue1, ..., customfieldname20, customfieldvalue20, where each value column holds all the random values I need, delimited by pipes. I am using a SQL Server database. Anyone have any suggestions? Please comment if anything is unclear.
Strictly speaking, you should not put groups of values in a column. It violates the first normal form of relational data. Create a separate table called Custom Data (Config_ID, CUSTOM_NAME, CUSTOM_DATA_VALUE, CUSTOM_DATA_TYPE) and store the custom values in it.
Use another table with a foreign key. Save there all the customfieldname values you need to save. Use the ConfigID as foreign key to reference the ConfigID on main table that has the extra custom value.
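A minimal sketch of that separate table, assuming Python's built-in sqlite3 module as a stand-in for SQL Server. The custom_data column names follow the first answer; the reduced config table and the sample rows are hypothetical:

```python
import sqlite3


def build_config_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(
        """
        CREATE TABLE config (
            config_id        INTEGER PRIMARY KEY,
            customer_name    TEXT NOT NULL,
            application_name TEXT NOT NULL
        );

        -- One row per custom value instead of pipe-delimited strings.
        CREATE TABLE custom_data (
            config_id         INTEGER NOT NULL REFERENCES config (config_id),
            custom_name       TEXT    NOT NULL,
            custom_data_value TEXT    NOT NULL,
            custom_data_type  TEXT    NOT NULL,
            PRIMARY KEY (config_id, custom_name)
        );

        INSERT INTO config VALUES (1, 'Acme', 'Billing');
        INSERT INTO custom_data VALUES
            (1, 'region',  'EU', 'string'),
            (1, 'retries', '3',  'int');
        """
    )
    return conn


def custom_values(conn, config_id):
    """Return all custom name/value pairs for one configuration."""
    rows = conn.execute(
        "SELECT custom_name, custom_data_value FROM custom_data "
        "WHERE config_id = ? ORDER BY custom_name",
        (config_id,),
    )
    return dict(rows.fetchall())
```

Each configuration can then carry any number of custom values, and nothing has to be parsed out of a delimited string.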
There is a standard way to lay out database tables to make them easy to manage - called normalization. There are different levels of normalization - first normal form, second normal form, third normal form... and higher (the forms above third are somewhat esoteric, in my opinion).
Explanations of these definitions are here:
Normalization in plain English
What are 1NF, 2NF and 3NF in database design?
It can seem quite abstract - but the point is to get rid of any ambiguity or duplication in your database, and prevent problems further down the line.
As srini.venigalla points out - your table doesn't meet the criteria for first normal form - that every row should have the same number of values, exactly one per DB column. Again, it might seem an abstract rule - but it's there to prevent real world problems - like, how do I parse this column value? How do I know what the separator is? What if it doesn't have enough data points? What if there are extra columns, and what are their names? All of these problems go away if you stick to one value per column.
The same is true for second normal form and third normal form - they disallow repeated values / redundancy in your database, which prevents real world problems of getting your DB into an inconsistent state.
There is debate / trade-offs about how far to normalize your database - but making everything meet third normal form seems to be an acceptable rule of thumb for a beginner.
(this is my conclusion after having to write code workarounds for my own non-1NF and non-2NF database schema)

Database Table Design for storing yes, no, and quantity type responses to questions

I have a table that stores answers to checklist questions, where the checklists are in the format of yes, no, not applicable, or resolved.
Table: CHECKLIST_ANSWER
ATTRIBUTE_ID PK, FK
CHECKLIST_INSTANCE_ID PK, FK
TOGGLE_VALUE (1=yes, 2=No, 3=n/a, 4=was a no then it was resolved)
FAIL_REASON
ATTRIBUTE_ID is a foreign key to a table of questions, i.e. Was the part measured within some tolerance?
Now I want to model a checklist that would store quantity responses, i.e. How many incorrect dimensions were found on the drawing?
I feel confident that I can store these questions in the same table as the yes/no/na type attributes, but can I use the same answer table to store the quantity value? Should I add a new column, say QUANTITY_VALUE? Then either QUANTITY_VALUE or TOGGLE_VALUE would be null, depending on the attribute.
Table: CHECKLIST_ANSWER
ATTRIBUTE_ID PK, FK
CHECKLIST_INSTANCE_ID PK, FK
TOGGLE_VALUE (1=yes, 2=No, 3=n/a, 4=was a no then it was resolved)
QUANTITY_VALUE
FAIL_REASON
The goal of this database application is to move paper and Excel checklists online and capture them in Oracle, to provide more efficient collection of metrics and better aggregation of the inputs. Am I asking for trouble down the road by blending the two into one table? Or should I create a separate table, CHECKLIST_QTY_ANSWER?
If you have many options, you usually create a separate table with only an id and a description (or name). To connect the two tables, you add a field to the CHECKLIST_ANSWER table and define it as a foreign key that references the id (primary key) of the new table.
Hope it is clear :)
I don't see any problem with adding the new column to your existing table. I would include a check constraint that required that either TOGGLE_VALUE or QUANTITY_VALUE be null (but not both).
There's no good reason to create a second, nearly identical table, where only a single column varies. In my experience, that tends to lead to more problems than the single-table solution (it's practically an invitation to use dynamic SQL).
I definitely would not re-use the existing column (as suggested in another answer), as that would prevent the use of a foreign key on the toggle value.
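The check constraint mentioned above might look roughly like this. It is a sketch assuming Python's built-in sqlite3 module as a stand-in for Oracle; the foreign keys to the question and instance tables are left out because those tables are not shown, and the helper name is made up:

```python
import sqlite3


def build_checklist_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE checklist_answer (
            attribute_id          INTEGER NOT NULL,
            checklist_instance_id INTEGER NOT NULL,
            toggle_value          INTEGER,
            quantity_value        INTEGER,
            fail_reason           TEXT,
            PRIMARY KEY (attribute_id, checklist_instance_id),
            -- Exactly one of the two answer columns must be filled in.
            CHECK (
                (toggle_value IS NULL AND quantity_value IS NOT NULL)
                OR (toggle_value IS NOT NULL AND quantity_value IS NULL)
            )
        )
        """
    )
    return conn


def try_insert_answer(conn, attribute_id, instance_id,
                      toggle_value=None, quantity_value=None):
    """Return True if the row satisfies the constraints, else False."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO checklist_answer "
                "(attribute_id, checklist_instance_id, toggle_value, quantity_value) "
                "VALUES (?, ?, ?, ?)",
                (attribute_id, instance_id, toggle_value, quantity_value),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Rows with both columns set, or with neither set, are rejected, so every answer is unambiguously a toggle or a quantity.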
If I understand your question correctly you're looking for advice on how to store the new type of answers in your schema?
Since this is a new type of answer you'd need to denote that the format of the data is now different from your y/n/na answer type. You could do this by adding another table CheckListAnswerType and a FK in your CHECKLIST_ANSWER table.
However, your CHECKLIST_INSTANCE_ID could easily indicate that this is a type of checklist that follows a certain answer pattern. I'm not sure about the rest of your schema, but you could have a CHECKLIST_INSTANCE table that specifies its answer type...
Your TOGGLE_VALUE could follow a numeric scheme for your new answer types. With the aforementioned CheckListAnswerType, you could (and would have to) always take the answer type into account when querying the data, so that you don't pick the wrong answer type for the question context and get a Yes value while looking for the answer to How many incorrect dimensions were found on the drawing?
I would think all of that would be fine, UNTIL you start wanting to store answers of a different data-type. Then it would be time to redesign schema.
TL;DR: If you're using the same data-type for answers then you would be okay re-using the existing schema (column) while adding a way to tell the answer, or question/answer, types apart to query accurately. If you want to store other data-types in TOGGLE_VALUE, implement new schema objects to do so. Don't try and force other data-types into the current schema if you can avoid it. Also if you did this consider renaming TOGGLE_VALUE as it no longer represents a Toggle. answerValue might better fit the new design.

Best practice for a "comment" table in a relational database [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Assume you want to build a database for some web application. This database already contains many tables and you might have to extend it in the future.
Also, you want the end user to be able to comment on any kind of object in the database.
I would like to find a solution for this that is generic enough that I don't have to extend it each time I add a new table to the database.
I thought of the following:
Table Name: comment
columns:
id : the id of a comment
user_id : the id of the user making the comment
object_table_name : the table where the commented object is
object_id : the id of the commented object in the object_table_name table.
text : the text
date : the date
This table sort of solves my problem; the only thing that troubles me is that the relational aspect of it is rather weak (I can't make object_id a foreign key, for instance).
Also if some day I need to rename a table I will have to change all the concerned entries in the comment table.
What do you think of this solution? Is there a design pattern that would help me out?
Thanks.
Isn't that cleaner ?
table comment_set
    id
table comment
    id
    comment_set_id -> foreign key to comment_set
    user_id
    date
    text
existing table foo
    ...
    comment_set_id -> foreign key to comment_set
existing table bar
    ...
    comment_set_id -> foreign key to comment_set
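A rough runnable sketch of this comment_set layout, assuming Python's built-in sqlite3 module as a stand-in engine. The date and text columns are renamed created_on and body here to avoid clashing with type names, and the foo columns, helper names, and sample row are hypothetical:

```python
import sqlite3


def build_comment_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(
        """
        -- One-column table: each row is just "a thread of comments exists".
        CREATE TABLE comment_set (id INTEGER PRIMARY KEY);

        CREATE TABLE comment (
            id             INTEGER PRIMARY KEY,
            comment_set_id INTEGER NOT NULL REFERENCES comment_set (id),
            user_id        INTEGER NOT NULL,
            created_on     TEXT    NOT NULL,
            body           TEXT    NOT NULL
        );

        -- Any commentable table only needs a nullable comment_set_id.
        CREATE TABLE foo (
            id             INTEGER PRIMARY KEY,
            name           TEXT NOT NULL,
            comment_set_id INTEGER REFERENCES comment_set (id)
        );

        INSERT INTO foo (id, name) VALUES (1, 'first foo');
        """
    )
    return conn


def comment_on_foo(conn, foo_id, user_id, body):
    """Attach a comment to a foo row, creating its comment set on first use."""
    with conn:
        row = conn.execute(
            "SELECT comment_set_id FROM foo WHERE id = ?", (foo_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no foo row with id {foo_id}")
        set_id = row[0]
        if set_id is None:
            set_id = conn.execute(
                "INSERT INTO comment_set DEFAULT VALUES"
            ).lastrowid
            conn.execute(
                "UPDATE foo SET comment_set_id = ? WHERE id = ?",
                (set_id, foo_id),
            )
        conn.execute(
            "INSERT INTO comment (comment_set_id, user_id, created_on, body) "
            "VALUES (?, ?, date('now'), ?)",
            (set_id, user_id, body),
        )


def comments_for_foo(conn, foo_id):
    rows = conn.execute(
        "SELECT c.body FROM foo AS f "
        "JOIN comment AS c ON c.comment_set_id = f.comment_set_id "
        "WHERE f.id = ? ORDER BY c.id",
        (foo_id,),
    )
    return [r[0] for r in rows]
```

Every link is a real foreign key, and adding a new commentable table means adding one column to that table rather than changing the comment tables.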
You are mixing data and metadata, which is not the best design pattern. They should be separated.
However, since the comments don't seem to be very important anyway, your solution is OK. The worst thing you can end up with is losing comments on your objects.
Some databases, most notably, PostgreSQL, support COMMENT clause just for the cases like this.
Update:
If you want to comment on individual records in each table, it's OK to have such a table.
object_table_name does not have to change if you rename a table, since it's data, not metadata.
You cannot write a native SQL query that will fetch comments for the record of any table (not known by the moment of the query development), though you can build the dynamic queries to do that.
In this case, you will have to keep your data and metadata in sync (UPDATE the comment table when you RENAME the table they refer to). The first one is a DML statement (changes data), the second one is DDL (changes metadata).
Also make sure that all PRIMARY KEYs have the same types (same as object_id).
Read about EAV.
You can make your whole database like that. But then it will be hell working with that data.
Why don't you want to place a Comment attribute for each database entity which should support comments? This way you can get all the data you need in a single query, and many GUI programs for databases will provide you with full code completion in SQL, which will prevent errors that can easily occur when operating with strings. That way the code is heavily dependent on procedural code, which isn't right for database systems.
You can enumerate the table names in a separate table, so that changes in names do not affect the system in any major way. Just update the enumeration table.
Although you are distancing yourself from referential integrity, I can see another way to accomplish what you want.
I generally prefer to keep the comments with the rows to which they apply. Assuming your database efficiently stores empty VARCHAR fields, you shouldn't pay a penalty for this. There isn't really anything to "extend" when you implement this approach, the maintenance of the comment becomes part of the queries you are already using to update the rows.
The only advantage to the single-note-table approach is that it allows easy searches across notes for different kinds of database entries.
Assuming MS SQL, and if the volume is relatively small, as you seem to suggest, then Extended Properties might be worth exploring. I've used them successfully in the past and they seem to be a permanent fixture.

Normalization Help

I am refactoring an old Oracle 10g schema to try to introduce some normalization. In one of the larger tables, there is a text field that has at most, 10-15 possible values. In my mind, it seems that this field is an example of unnecessary data duplication and should be extracted to a separate table.
After examining the data, I cannot find one relevant piece of information that could be associated with that text value. Basically, if I pulled that value out and put it into its own table, it would be the only field in that table. It exists today as more of a 'flag' field. Should I create a two-column table with a surrogate key, keep it as it is, or do something entirely different? Am I doing more harm than good by trying to minimize data duplication on this field?
You might save some space by extracting the column to a separate table. This is called a lookup table. It can give you a couple of other benefits:
You can declare a foreign key constraint to the lookup table, so you can rely on the column in the main table never having any value other than the 10-15 values you want.
It's easy to query for a concise list of all permitted values, by querying the lookup table. This can be faster than using SELECT DISTINCT on the main table's column. It also returns values that are permitted, but not currently used in the main table.
If you change a value in the lookup table, it automatically applies to all rows in the main table that reference it.
However, creating a lookup table with one column is not strictly normalization. You're just replacing one value with another. The attribute in the main table either already supports a normal form, or not.
Using surrogate keys (vs. natural keys) also has nothing to do with normalization. A lot of people make this mistake.
However, if you move other attributes into the lookup table, attributes that depend only on the lookup value and therefore would create repeating groups (violating 3NF) in the main table if you left them there, then that would be normalization.
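The lookup-table benefits listed above can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in for Oracle. The work_item table, the status values, and the helper names are hypothetical. The rename relies on ON UPDATE CASCADE, which not every engine supports; where it is missing, a surrogate key or a manual update of the main table would be needed instead:

```python
import sqlite3


def build_lookup_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(
        """
        -- One-column lookup table holding the permitted values.
        CREATE TABLE status_lookup (status TEXT NOT NULL PRIMARY KEY);

        CREATE TABLE work_item (
            id     INTEGER PRIMARY KEY,
            status TEXT NOT NULL
                REFERENCES status_lookup (status) ON UPDATE CASCADE
        );

        INSERT INTO status_lookup (status) VALUES ('archived'), ('closed'), ('open');
        INSERT INTO work_item (id, status) VALUES (1, 'open'), (2, 'open'), (3, 'closed');
        """
    )
    return conn


def permitted_statuses(conn):
    """All allowed values, including ones no row currently uses."""
    rows = conn.execute("SELECT status FROM status_lookup ORDER BY status")
    return [r[0] for r in rows]


def statuses_in_use(conn):
    rows = conn.execute(
        "SELECT DISTINCT status FROM work_item ORDER BY status"
    )
    return [r[0] for r in rows]


def rename_status(conn, old, new):
    """Change a value once in the lookup table; the cascade updates the rest."""
    with conn:
        conn.execute(
            "UPDATE status_lookup SET status = ? WHERE status = ?", (new, old)
        )
```

Querying the lookup table returns the unused 'archived' value as well, which SELECT DISTINCT on the main table would miss.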
If you want normalization break it out.
I think of these types of data in DBs as the equivalent of enums in C,C++,C#. Mostly you put them in the table as documentation.
I often have an ID, Name, Description, and auditing columns for them (eg modified by, modified date, create date, create by, active.) The description field is rarely used.
Example (some might say there are more than just 2)
Gender
ID Name Audit Columns...
1 Male
2 Female
Then in your contacts you would have a GenderID column which would link to this one.
Of course you don't "need" the table. You could have external documentation somewhere that says 1=Male, 2=Female -- but I think these tables serve to document a system.
If it's really a free-entry text field that's not re-used somewhere else in the database, and there's just a single field without repeated instances, I'd probably go ahead and leave it as it is. If you're determined to break it out I'd create a 'validation' table with a surrogate key and the text value, then put the surrogate key in the base table.
Share and enjoy.
Are these 10-15 values actually meaningful, or are they really just flags? If they're meaningful pieces of text and it seems wasteful to replicate them, then sure create a lookup table. But if they're just arbitrary flag values, then your new table will be nothing more than a mapping from one arbitrary value to another, and not terribly helpful.
A completely separate question is whether all or most of the rows in your big table even have a value for this column. If not, then indeed you have a good opportunity for normalization and can create a separate table linking the primary key from your base table with the flag value.
Edit: One thing. If there's some chance that one of these "flag" values is likely to be wholesale replaced with another value at some point in the future, that would be another good reason to create a table.