What to use as ID and PK [closed] - sql

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I have a small question. I want so save information about a product in a sql database. That product has a unique 12 part number.
The product is going to be linked several times in the database. What should I use as unique id? The part number? or should I use the auto incremented id value?
What is better performance wise, and what is better in general?

You should use a "meaningless" surrogate key (an auto-increment, a globally unique ID, etc.) on your table, even though you do have a unique identifier that has meaning in the real world (your 12-digit part number).
The main reason for this is that anything that has meaning in the real world is subject to change: part numbers change when companies merge, registration numbers change on renewal, etc. On top of that, there is always a possibility of mis-typing the number, and having to correct it later.
It is very easy to change an attribute that is not your primary key when this happens: it is a simple update of an attribute. Changing a primary key becomes very hard, though, because you may have foreign keys referencing it from other tables. This reason alone is sufficient to decide in favor of surrogate keys on your tables.

I would use automatically generated BigInteger/Integer (depending on your needs) column as PRIMARY KEY and store part number as UNIQUE (this would be a candidate key). You would benefit from space that is required to store this in tables that are part of relations and more values would probably fit within a single page. If, by any chance, you'd need to change product number, this scenario would be beneficial to do so.
You would, though, need an additional JOIN even if you only need the part number from a table which holds foreign key relationship to your product table. More often than not if you want to display part number you also want some additional information from product table itself, so that doesn't hurt so much (since databases are optimized in a way to quickly perform join operations).
Be sure to create an index on column which is a FOREIGN KEY so in this case something like id_product in all tables which hold the relation to speed up matching operations.

Your choice is between a natural and surrogate primary key. There are a bunch of trade-offs here.
The good news is that performance is unlikely to be one of them - as long as you can search using an index, a natural and surrogate key are likely to perform the same.
The data you store may have a unique attribute - SKU, social security number, etc. - and it may seem logical to use this when creating your schema. However, this opens you up to a whole range of edge cases.
What if the user enters the data incorrectly and wants to change it? Changing primary and foreign keys is a terrible idea.
What if the system that generates your natural keys has a bug and generates duplicates?
What if the system that generates your natural keys is retired and you get records with incompatible data formats? Re-building your schema to move from integers to strings for your primary/foreign keys is horrible.
What if you want to integrate with multiple systems generating data, and they disagree about data formats?
I've seen all these "what-ifs" happen.
I'd go for a surrogate key, using an auto-increment or GUID.

Related

Best practice to choose primary keys in a relational DB, what is the smartest solution? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
I am designing a relational DB and I have the following doubt about what is the best practice for handling primary keys
I have some tables where the only way to have a primary key is to set a BIGINT autoincrement column named id.
Other table contains univocal data (for example I have a Country table containing an univocal country_code column) that can be used as primary key.
My question is: what is the best practice in this case? I still use a BIGINT autoincrement column named id or in this case is better to use the univocal data as primary key?
Even if a table has a good natural key, it is still generally preferable to assign a surrogate key (usually a numeric auto increment column).
First, as jarlh points out, even countries can and do change their names from time to time, which you can handle easily with a CountryID value.
Also, though, many times a natural key is composed of character data. SQL deals with numbers faster than it deals with characters, so there is a performance boost using numeric ID values.
And it's currently the standard practice in data warehousing, so developers are accustomed to seeing those SK columns.
Best practice? Probably. Standard practice? Definitely. Go with the autoincrements.
There's an entire topic in computer science known as database normalization, where they discuss things like "first, second, and third normal forms."
A staple of this is that database keys should not, themselves, carry information. They should "unambiguously identify the row" and nothing more. So, an auto-increment integer is a good thing to use as a primary key. Then, place an index ... perhaps a UNIQUE index ... on country_code.
In other applications, I have used things like uuid's ... guaranteed-unique identifier strings ... as primary keys. The database generates the uuid value automagically. Now I have something that I can transport from one database to another without ambiguity. (I have also used auto-generated uuid fields in databases that used auto-increment keys.)
So, you do have several good alternatives, but in every case the primary-key only identifies the row and is not "part of the data."

Best practices on primary key, auto-increment, and UUID in SQL databases [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 months ago.
Improve this question
We're designing a table for user entity. The only non-trivial requirement is that there should be a permanent URL to the user entity (for example their profile). There's a lot about int/long vs UUID on the web. But it is still unclear to me.
Considering the fact that the profile contains private information, it's not a good idea to have a predictable ID embedded in the URL. Am I right?
To satisfy the first I can have primary key as UUID and embed it in the URL. But there's two question. Should I be worried about the performance penalty of having UUID as primary key in anyway; indexing, inserting, selecting, joining?
Having that said, which one of the following is better (with respect to the above)?
CREATE TABLE users(
pk UUID NOT NULL,
.....
PRIMARY KEY(pk)
);
or
CREATE TABLE users(
pk INT NOT NULL AUTO_INCREMENT,
id UUID NOT NULL,
.....
PRIMARY KEY(pk),
UNIQUE(id)
);
It's a matter of choice actually and this question can raise opinion based answers from my point of view. What I always do, even if it's redundant is I create primary key on auto increment column (I call it technical key) to keep it consistent within the database, allow for "primary key" to change in case something went wrong at design phase and also allow for less space to be consumed in case that key is being pointed to by foreign key constraint in any other table and also I make the candidate key unique and not null.
Technical key is something you don't normally show to end users, unless you decide to. This can be the same for other technical columns that you're keeping only at database level for any purpose you may need like modify date, create date, version, user who changed the record and more.
In this case I would go for your second option, but slightly modified:
CREATE TABLE users(
pk INT NOT NULL AUTO_INCREMENT,
id UUID NOT NULL,
.....
PRIMARY KEY(pk),
UNIQUE(id)
);
This question is quite opinion-based so here's mine.
My take is to use the second one, a separate UUID from the PK. The thing is:
The PK is unique and not exposed to the public.
The UUID is unique and may get exposed to the public.
If, for any reason, the UUID gets compromised, you'll need to change it. Changing a PK may be expensive and has a lot of side effects. If the UUID is separate from the PK, then its change (though not trivial) has far less consequences.
I came across a nice article that explains both pros and cons of using UUID as a primary key. In the end, it suggests using both but Incremental integer for PK and UUIDs for the outside world. Never expose your PK to the outside.
One solution used in several different contexts that has worked for me
is, in short, to use both. (Please note: not a good solution — see
note about response to original post below). Internally, let the
database manage data relationships with small, efficient, numeric
sequential keys, whether int or bigint. Then add a column populated
with a UUID (perhaps as a trigger on insert). Within the scope of the
database itself, relationships can be managed using the usual PKs and
FKs.
But when a reference to the data needs to be exposed to the
outside world, even when “outside” means another internal system, they
must rely only on the UUID. This way, if you ever do have to change
your internal primary keys, you can be sure it’s scoped only to one
database. (Note: this is just plain wrong, as Chris observed)
We used this strategy at a different company for customer data, just to avoid
the “guessable” problem. (Note: avoid is different than prevent, see
below).
In another case, we would generate a “slug” of text (e.g. in
blog posts like this one) that would make the URL a little more human
friendly. If we had a duplicate, we would just append a hashed value.
Even as a “secondary primary key”, using a naive use of UUIDs in
string form is wrong: use the built-in database mechanisms as values
are stored as 8-byte integers, I would expect.
Use integers because they are efficient. Use the database
implementation of UUIDs in addition for any external reference to
obfuscate.
https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439
Using UUID as pk: The first problem is, UUID takes 9x storage than int. 2nd problem is, if you need sorting by pk more frequently, don't even think about UUID. UUID as pk doesn't affect the time complexity for where condition or others except sort.
Using int as pk: Easily guessable. Brute force attacker will love this. this is the only problem but biggest one.
Using int as pk but, keeping UUID as well: If the UUID is not pk then the time complexity will be increased for searching by UUID. even though, all the relations will be maintained by int, but, when you will search by UUID, it will take time. As the relations are on int, the 9x storage issue is solved here.
Don’t make it your database primary key: this will cause problems in the future of you want to change your database technology. And if you make it an increasing number, your competitors will know how many users you have and how fast you are adding new ones.
The rule of thumb is to keep a clear separation between:
a business value (even some UUID being a representation)
and
a technical value (like a primary key)
For example, if you'd like to use a mapping to some record by its id, such mapping is a business value so to keep the above separation, you will need a dedicated field (like UUID) instead of a technical primary key.

Database design for many-to-many relations with restrictions

I have one database with users and one with questions. What I want is to ensure that every user can answer every question only once.
I thought of a database that has all the question id's as columns and all the user id's as records, but this gets very big (and slow I guess) when the questions and the user count grow.
Is there another way to do this with better performance?
You probably want a setup like this.
Questions table (QuestionID Primary Key, QuestionText)
Users table (UserID Primary Key, Username)
Answers table (QuestionID, UserID, Date) -- plus AnswerText/Score/Etc as needed.
In the Answers table the two first columns together form a compound primary key (QuestionID, UserID) and both are foreign keys to Question(QuestionID) and Users(UserID) respectively.
The compound primary key ensures that each combination of QuestionID/UserID is only allowed once. If you want to allow users to answer the same question multiple times you could extend the ¨compound primary key to include the date (it would then be a composite key).
This is a normalized design and should be efficient enough. It's common to use a surrogate primary key (like AnswerID) instead of the compound key and use a unique constraint instead to ensure uniqueness - the use of a surrogate key is often motivated by ease of use, but it's by no means necessary.
Diagram
Below is a diagram of my own table design, quite similar to the correct Answer by jpw. I made up a few column names to give more of a flavor of the nature of the table. I used Postgres data types.
As the last paragraph of that Answer discusses, I would go with a simple single primary key on the response_ ("Answers") table rather than a compound primary key combining fkey_user_ & fkey_question_.
Unrealistic
This diagram fits the problem description in the Question. However this design is not practicable. This scenario is for a single set of questions to be put to the user, only a single survey or quiz ever. In real life in a situation like a school, opinion survey, or focus group, I expect we would put more than one questionnaire to a user. But I will ignore that to directly address the Question as worded.
Also in some scenarios we might have versions of a question, as it is tweaked and revised over time when given on successive quizzes/questionnaires.
Performance
Your Question correctly identifies this problem as a Many-To-Many relationship between a user and a question, where each user can answer many questions and each question may be answered by many users. In relational database design there is only one proper way to represent a many-to-many. That way is to add a third child table, sometimes called a "bridge table", with a foreign key linking to each of the two parent tables.
In a diagram where you draw parent tables vertically higher up the page than child tables, I personally see such a many-to-many diagram as a butterfly or bird pattern where the child bridge table is the body/thorax and the two parents are wings.
Performance is irrelevant in a sense, as this is the only correct design. Fortunately, modern relational databases are optimized for such situations. You should see good performance for many millions of records. Especially if you a sequential number as your primary key values. I tend to use UUID data type instead; their arbitrary bit values may have less efficient index performance when table size reaches the millions (but I don't know the details.

Does every table really need an auto-incrementing artificial primary key? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 12 years ago.
Almost every table in every database I've seen in my 7 years of development experience has an auto-incrementing primary key. Why is this? If I have a table of U.S. states where each state where each state must have a unique name, what's the use of an auto-incrementing primary key? Why not just use the state name as the primary key? Seems to me like an excuse to allow duplicates disguised as unique rows.
This seems plainly obvious to me, but then again, no one else seems to be arriving at and acting on the same logical conclusion as me, so I must assume there's a good chance I'm wrong.
Is there any real, practical reason we need to use auto-incrementing keys?
This question has been asked numerous times on SO and has been the subject of much debate over the years amongst (and between) developers and DBAs.
Let me start by saying that the premise of you question implies that one approach is universally superior to the other ... this is rarely the case in real life. Surrogate keys and natural keys both have their uses and challenges - and it's important to understand what they are. Whichever choice you make in your system, keep in mind there is benefit to consistency - it makes the data model easier to understand and easier to develop queries and applications for. I also want to say that I tend to prefer surrogate keys over natural keys for PKs ... but that doesn't mean that natural keys can't sometimes be useful in that role.
It is important to realize that surrogate and natural keys are NOT mutually exclusive - and in many cases they can complement each other. Keep in mind that a "key" for a database table is simply something that uniquely identifies a record (row). It's entirely possible for a single row to have multiple keys representing the different categories of constraints that make a record unique.
A primary key, on the other hand, is a particular unique key that the database will use to enforce referential integrity and to represent a foreign key in other tables. There can only be a single primary key for any table. The essential quality of a primary key is that it be 100% unique and non-NULL. A desirable quality of a primary key is that it be stable (unchanging). While mutable primary keys are possible - they cause many problems for database that are better avoided (cascading updates, RI failures, etc). If you do choose to use a surrogate primary key for your table(s) - you should also consider creating unique constraints to reflect the existence of any natural keys.
Surrogate keys are beneficial in cases where:
Natural keys are not stable (values may change over time)
Natural keys are large or unwieldy (multiple columns or long values)
Natural keys can change over time (columns added/removed over time)
By providing a short, stable, unique value for every row, we can reduce the size of the database, improve its performance, and reduce the volatility of dependent tables which store foreign keys. There's also the benefit of key polymorphism, which I'll get to later.
In some instances, using natural keys to express relationships between tables can be problematic. For instance, imagine you had a PERSON table whose natural key was {LAST_NAME, FIRST_NAME, SSN}. What happens if you have some other table GRANT_PROPOSAL in which you need to store a reference to a Proposer, Reviewer, Approver, and Authorizer. You now need 12 columns to express this information. You also need to come up with a naming convention of some kind to identify which columns belong to which kind of individual. But what if your PERSON table required 6, or 8, or 24 columns to for a natural key? This rapidly becomes unmanageable. Surrogate keys resolve such problems by divorcing the semantics (meaning) of a key from its use as an identifier.
Let's also take a look at the example you described in your question.
Should the 2-character abbreviation of a state be used as the primary key of that table.
On the surface, it looks like the abbreviation field meets the requirements of a good primary key. It's relatively short, it is easy to propagate as a foreign key, it looks stable. Unfortunately, you don't control the set of abbreviations ... the postal service does. And here's an interesting fact: in 1973 the USPS changed the abbreviation of Nebraska from NB to NE to minimize confusion with New Brunswick, Canada. The moral of the story is that natural keys are often outside of the control of the database ... and they can change over time. Even when you think they cannot. This problem is even more pronounced for more complicated data like people, or products, etc. As businesses evolve, the definitions for what makes such entities unique can change. And this can create significant problems for data modelers and application developers.
Earlier I mentioned that primary keys can support key polymorphism. What does that mean? Well, polymorphism is the ability of one type, A, to appear as and be used like another type, B. In databases, this concept refers to the ability to combine keys from different classes of entities into a single table. Let's look at an example. Imagine for a moment that you want have an audit trail in your system that identifies which entities were modified by which user on what date. It would be nice to create a table with the fields: {ENTITY_ID, USER_ID, EDIT_DATE}. Unfortunately, using natural keys, different entities have different keys. So now we need to create a separate linking table for each kind of entity ... and build our application in a manner where it understand the different kinds of entities and how their keys are shaped.
Don't get me wrong. I'm not advocating that surrogate keys should ALWAYS be used. In the real world never, ever, and always are a dangerous position to adopt. One of the biggest drawbacks of surrogate keys is that they can result in tables that have foreign keys consisting of lots of "meaningless" numbers. This can make it cumbersome to interpret the meaning of a record since you have to join or lookup records from other tables to get a complete picture. It also can make a distributed database deployment more complicated, as assigning unique incrementing numbers across servers isn't always possible (although most modern database like Oracle and SQLServer mitigate this via sequence replication).
No.
In most cases, having a surrogate INT IDENTITY key is an easy option: it can be guaranteed to be NOT NULL and 100% unique, something a lot of "natural" keys don't offer - names can change, so can SSN's and other items of information.
In the case of state abbreviations and names - if anything, I'd use the two-letter state abbreviation as a key.
A primary key must be:
unique (100% guaranteed! Not just "almost" unique)
NON NULL
A primary key should be:
stable if ever possible (not change - or at least not too frequently)
State two-letter codes definitely would offer this - that might be a candidate for a natural key. A key should also be small - an INT of 4 bytes is perfect, a two-letter CHAR(2) column just the same. I would not ever use a VARCHAR(100) field or something like that as a key - it's just too clunky, most likely will change all the time - not a good key candidate.
So while you don't have to have an auto-incrementing "artificial" (surrogate) primary key, it's often quite a good choice, since no naturally occuring data is really up to the task of being a primary key, and you want to avoid having huge primary keys with several columns - those are just too clunky and inefficient.
I think the use of the word "Primary", in the phrase "Primary" Key is in a real sense, misleading.
First, use the definition that a "key" is an attribute or set of attributes that must be unique within the table,
Then, having any key serves several often mutually inconsistent purposes.
Purpose 1. To use as joins conditions to one or many records in child tables which have a relationship to this parent table. (Explicitly or implicitly defining a Foreign Key in those child tables)
Purpose 2. (related) Ensuring that child records must have a parent record in the parent table (The child table FK must exist as Key in the parent table)
Purpose 3. To increase performance of queries that need to rapidly locate a specific record/row in the table.
Purpose 4. (Most Important from data consistency perspective!) To ensure data consistency by preventing duplicate rows which represent the same logical entity from being inserted itno the table. (This is often called a "natural" key, and should consist of table (entity) attributes which are relatively invariant.)
Clearly, any non-meaningfull, non-natural key (like a GUID or an auto-generated integer is totally incapable of satisfying Purpose 4.
But often, with many (most) tables, a totally natural key which can provide #4 will often consist of multiple attributes and be excessively wide, or so wide that using it for purposes #1, #2, or #3 will cause unacceptable performance consequencecs.
The answer is simple. Use both. Use a simple auto-Generating integral key for all Joins and FKs in other child tables, but ensure that every table that requires data consistency (very few tables don't) have an alternate natural unique key that will prevent inserts of inconsistent data rows... Plus, if you always have both, then all the objections against using a natural key (what if it changes? I have to change every place it is referenced as a FK) become moot, as you are not using it for that... You are only using it in the one table where it is a PK, to avoid inconsistent duplciate data...
The only time you can get away without both is for a completely stand alone table that participates in no relationships with other tables and has an obvious and reliable natural key.
In general, a numeric primary key will perform better than a string. You can additionaly create unique keys to prevent duplicates from creeping in. That way you get the assurance of no duplicates, but you also get the performance of numbers (vs. strings in your scenario).
In all likelyhood, the major databases have some performance optimizations for integer-based primary keys that are not present for string-based primary keys. But, that is only a reasonable guess.
Yes, in my opinion every table needs an auto incrementing integer key because it makes both JOINs and (especially) front-end programming much, much, much easier. Others feel differently, but this is over 20 years of experience speaking.
The single exception is small "code" or "lookup" tables in which I'm willing to substitute a short (4 or 5 character) TEXT code value. I do this because the I often use a lot of these in my databases and it allows me to present a meaningful display to the user without having to look up the description in the lookup table or JOIN it into a result set. Your example of a States table would fit in this category.
No, absolutely not.
Having a primary key which can't change is a good idea (UPDATE is legal for primary key columns, but in general potentially confusing and can create problems for child rows). But if your application has some other candidate which is more suitable than an auto-incrementing value, then you should probably use that instead.
Performance-wise, in general fewer columns are better, and particularly fewer indexes. If you have another column which has a unique index on it AND can never be changed by any business process, then it may be a suitable primary key.
Speaking from a MySQL (Innodb) perspective, it's also a good idea to use a "real" column as a primary key rather than an "artificial" one, as InnoDB always clusters the primary key and includes it in secondary indexes (that is how it finds the rows in them). This gives it potential to do useful optimisation with a primary key which it can't with any other unique index. MSSQL users often choose to cluster the primary key, but it can also cluster a different unique index.
EDIT:
But if it's a small database and you don't really care about performance or size too much, adding an unnecessary auto-increment column isn't that bad.
A non auto-incrementing value (e.g. UUID, or some other string generated according to your own algorithm) may be useful for distributed, sharded, or diverse systems where maintaining a consistent auto-incrementing ID is difficult (or impossible - think of a distributed system which continues to insert rows on both sides of a network partition).
I think there are two things that may explain the reason why auto-incrementing keys are sometimes used:
Space consideration; ok your state name doesn't amount to much, but the space it takes may add up. If you really want to store the state with its name as a primary key, then go ahead, but it will take more place. That may not be a problem in certain cases, and it sounds like a problem of olden days, but the habit is perhaps ingrained. And we programmers and DBA do love habits :D
Defensive consideration: i recently had the following problem; we have users in the database where the email is the key to all identification. Why not make the email the promary key? except suddenly border cases creep in where one guy must be there twice to have two different adresses, and nobody talked about it in the specs so the adress is not normalized, and there's this situation where two different emails must point to the same person and... After a while, you stop pulling your hairs out and add the damn integer id column
I'm not saying it's a bad habit, nor a good one; i'm sure good systems can be designed around reasonable primary keys, but these two points lead me to believe fear and habit are two among the culprits
It's a key component of relational databases. Having an integer relate to a state instead of having the whole state name saves a bunch of space in your database! Imagine you have a million records referencing your state table. Do you want to use 4 bytes for a number on each of those records or do you want to use a whole crapload of bytes for each state name?
Here are some practical considerations.
Most modern ORMs (rails, django, hibernate, etc.) work best when there is a single integer column as the primary key.
Additionally, having a standard naming convention (e.g. id as primary key and table_name_id for foreign keys) makes identifying keys easier.

Is a one column table good design? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
It it ok to have a table with just one column? I know it isn't technically illegal, but is it considered poor design?
EDIT:
Here are a few examples:
You have a table with the 50 valid US state codes, but you have no need to store the verbose state names.
An email blacklist.
Someone mentioned adding a key field. The way I see it, this single column WOULD be the primary key.
In terms of relational algebra this would be a unary relation, meaning "this thing exists"
Yes, it's fine to have a table defining such a relation: for instance, to define a domain.
The values of such a table should be natural primary keys of course.
A lookup table of prime numbers is what comes to my mind first.
Yes, it's certainly good design to design a table in such a way as to make it most efficient. "Bad RDBMS Design" is usually centered around inefficiency.
However, I have found that most cases of single column design could benefit from an additional column. For example, State Codes can typically have the Full State name spelled out in a second column. Or a blacklist can have notes associated. But, if your design really does not need that information, then it's perfectly ok to have the single column.
I've used them in the past. One client of mine wanted to auto block anyone trying to sign up with a phone number in this big list he had so it was just one big blacklist.
If there is a valid need for it, then I don't see a problem. Maybe you just want a list of possibilities to display for some reason and you want to be able to dynamically change it, but have no need to link it to another table.
One case that I found sometimes is something like this:
Table countries_id, contains only one column with numeric ID for each country.
Table countries_description, contains the column with country ID, a column With language ID and a column with the localized country name.
Table company_factories, contains information for each factory of the company, including the country in Wich is located.
So to maintain data coherence and language independent data in the tables the database uses this schema with tables with only one column to allow foreign keys without language dependencies.
In this case I think the existence of one column tables are justified.
Edited in response to the comment by: Quassnoi
(source: ggpht.com)
In this schema I can define a foreign key in the table company_factories that does not require me to include Language column on the table, but if I don't have the table countries_id, I must include Language column on the table to define the foreign key.
There would be rare cases where a single-column table makes sense. I did one database where the list of valid language codes was a single-column table used as a foreign key. There was no point in having a different key, since the code itself was the key. And there was no fixed description since the language code descriptions would vary by language for some contexts.
In general, any case where you need an authoritative list of values that do not have any additional attributes is a good candidate for a one-column table.
I use single-column tables all the time -- depending, of course, on whether the app design already uses a database. Once I've endured the design overhead of establishing a database connection, I put all mutable data into tables where possible.
I can think of two uses of single-column tables OTMH:
1) Data item exists. Often used in dropdown lists. Also used for simple legitimacy tests.
Eg. two-letter U.S. state abbreviations; Zip codes that we ship to; words legal in Scrabble; etc.
2) Sparse binary attribute, ie., in a large table, a binary attribute that will be true for only a very few records. Instead of adding a new boolean column, I might create a separate table containing the keys of the records for which the attribute is true.
Eg. employees that have a terminal disease; banks with a 360-day year (most use 365); etc.
-Al.
Mostly I've seen this in lookup type tables such as the state table you described. However, if you do this be sure to set the column as the primary key to force uniqueness. If you can't set this value as unique, then you shouldn't be using one column.
No problem as long as it contains unique values.
I would say in general, yes. Not sure why you need just one column. There are some exceptions to this that I have seen used effectively. It depends on what you're trying to achieve.
They are not really good design when you're thinking of the schema of the database, but really should only be used as utility tables.
I've seen numbers tables used effectively in the past.
The purpose of a database is to relate pieces of information to each other. How can you do that when there is no data to relate to?
Maybe this is some kind of compilation table (i.e. FirstName + LastName + Birthdate), though I'm still not sure why you would want to do that.
EDIT: I could see using this kind of table for a simple list of some kind. Is that what you are using it for?
Yes as long as the field is the primary key as you said it would be. The reason is because if you insert duplicate data those rows will be readonly. If you try to delete one of the rows that are duplicated. it will not work because the server will not know which row to delete.
The only use case I can conceive of is a table of words perhaps for a word game. You access the table just to verify that a string is a word: select word from words where word = ?. But there are far better data structures for holding a list of words than a relational database.
Otherwise, data in a database is usually placed in a database to take advantage of the relationships between various attributes of the data. If your data has no attributes beyond its value how will these relationship be developed?
So, while not illegal, in general you probably should not have a table with just one column.
All my tables have at least four tech fields, serial primary key, creation and modification timestamps, and soft delete boolean. In any blacklist, you will also want to know who did add the entry. So for me, answer is no, a table with only one column would not make sense except when prototyping something.
Yes that is perfectly fine. but an ID field couldn't hurt it right?