Altering existing table vs creating new table: which approach is best? [closed] - sql

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
We ran into a scenario today. We have a database table that already holds more than 1 million records, and we want to alter it to add one column. That column will have a default value of 0. If we alter the table to add a column that stays 0 for all existing records, will it affect performance?
Also, the column value stays 0 for 99.9% of rows; it changes to 1 only if some action is triggered by the user. So should I create a new table to hold those values, or alter the existing table?
I would like to know the advantages and disadvantages of both approaches.

This is an interesting question. If you use 0, then the existing data will need to be rewritten to make space for the new column. (There might be an exception if you already have bit columns and the new column is a bit.)
Rewriting the table is a one-time operation and it does take time, although on one million rows, it shouldn't take too long.
The alternative is to create a second table to store flags that are set. This could be as either columns or one row per flag. You would use left join to load data from this table.
I would be biased to having a second table, but not for performance reasons. Rather, I would like to include other information about the flag being set -- notably the date/time of when the flag is set. Also, I might want to distinguish between values that default to 0 versus those that are explicitly reset to 0.
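A minimal sketch of the second-table idea, using Python's built-in sqlite3 module as a stand-in for the real DBMS (an assumption; the question does not name one). The table and column names (records, record_flags, set_at) are hypothetical. A missing row means the flag was never set, which keeps "defaulted to 0" apart from "explicitly reset to 0".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the existing million-row table.
    CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- One row per record whose flag has ever been set; no row means "never set".
    CREATE TABLE record_flags (
        record_id INTEGER PRIMARY KEY REFERENCES records(id),
        flag      INTEGER NOT NULL CHECK (flag IN (0, 1)),
        set_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    INSERT INTO records (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO record_flags (record_id, flag) VALUES (2, 1);
""")

# LEFT JOIN keeps every record; COALESCE supplies the default 0 for unflagged rows.
rows = conn.execute("""
    SELECT r.id,
           COALESCE(f.flag, 0)  AS flag,
           f.set_at IS NOT NULL AS explicitly_set
    FROM records r
    LEFT JOIN record_flags f ON f.record_id = r.id
    ORDER BY r.id
""").fetchall()

for row in rows:
    print(row)
```

Only record 2 has a row in record_flags, so it is the only one reported as explicitly set.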

By altering the table with column which will be remain 0 for all
existing records will it affect performance.
The answer depends on nullability of your new column and on your server version:
Misconceptions around adding columns to a table by P.Randal
So if your server version (@@VERSION) is >= 2012 and your column is nullable, adding it will not affect performance even if you fill it in with the default value by using the WITH VALUES clause, because it will be a metadata-only operation.

I'm going to assume that you care mostly about "what is best for the ongoing operation of my system", rather than which option is easiest for the one-off change.
To make sure I understand your question, I'd like to restate it with a (made up) example. Please comment if I've misunderstood.
You have a table with > 1 million rows; let's pretend it's called people. The table has several foreign key relationships (e.g. countries: a person lives in a country), and is in turn related to many other tables via foreign keys (e.g. interactions: a person interacts with the system).
You have now identified an additional attribute of people. The attribute is mandatory (so you are setting a default value), and of type integer.
You say it's "only incremented when the user interacts with the system"; it would be great if you could be more explicit about that.
In (very) general terms, adding an integer column with a default value to your table will have no noticeable effect on the performance of your existing queries. Performance impact happens primarily when you include columns in where clauses and/or joins. So, select * from people where name = ? and age > ? will behave exactly as it does today.
When you add that column to a where clause, you may get a slightly better performance if the column is included in an index, because it may filter out more rows. select * from people where name = ? and age > ? and new_column = ? might reduce the number of rows to inspect, and thus improve speed.
If I've understood your question, you're considering creating a new table to hold that data instead, presumably with a foreign key relationship (person_id), and a value for that new attribute.
So, if it's true that the attribute is mandatory, and has a default value of 0, in the business domain sense, then creating a separate table makes no real sense. You want to keep all mandatory attributes for an entity in the same table.
If the attribute is not mandatory, it gets a little murkier - you can make a case for attributes that belong together, e.g. "address" to have their own tables. However, for a single attribute, it would be counter intuitive.
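As a rough illustration of the in-table option, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for your actual DBMS (an assumption; how ALTER TABLE locks or rewrites data differs between products, so this says nothing about timing on a real server). The people table and new_column follow the made-up example above; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)
conn.executemany(
    "INSERT INTO people (name, age) VALUES (?, ?)", [("Ann", 31), ("Bob", 45)]
)

# Add the mandatory attribute with its default; existing rows read back as 0.
conn.execute("ALTER TABLE people ADD COLUMN new_column INTEGER NOT NULL DEFAULT 0")

# An existing query that does not mention the new column is written exactly as before.
old_query = conn.execute(
    "SELECT name FROM people WHERE name = ? AND age > ?", ("Ann", 30)
).fetchall()

# The rare user action flips the value for a single row.
conn.execute("UPDATE people SET new_column = 1 WHERE name = ?", ("Bob",))
flags = conn.execute("SELECT name, new_column FROM people ORDER BY id").fetchall()

print(old_query)
print(flags)
```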

Related

What to use as ID and PK [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I have a small question. I want to save information about a product in a SQL database. That product has a unique 12-digit part number.
The product is going to be linked several times in the database. What should I use as unique id? The part number? or should I use the auto incremented id value?
What is better performance wise, and what is better in general?
You should use a "meaningless" surrogate key (an auto-increment, a globally unique ID, etc.) on your table, even though you do have a unique identifier that has meaning in the real world (your 12-digit part number).
The main reason for this is that anything that has meaning in the real world is subject to change: part numbers change when companies merge, registration numbers change on renewal, etc. On top of that, there is always a possibility of mis-typing the number, and having to correct it later.
It is very easy to change an attribute that is not your primary key when this happens: it is a simple update of an attribute. Changing a primary key becomes very hard, though, because you may have foreign keys referencing it from other tables. This reason alone is sufficient to decide in favor of surrogate keys on your tables.
I would use an automatically generated BigInteger/Integer column (depending on your needs) as the PRIMARY KEY and store the part number as UNIQUE (this would be a candidate key). You would save the space needed to store the key in tables that are part of relations, and more values would probably fit within a single page. If, by any chance, you needed to change a part number, this design would make that easy.
You would, though, need an additional JOIN even if you only need the part number from a table which holds foreign key relationship to your product table. More often than not if you want to display part number you also want some additional information from product table itself, so that doesn't hurt so much (since databases are optimized in a way to quickly perform join operations).
Be sure to create an index on column which is a FOREIGN KEY so in this case something like id_product in all tables which hold the relation to speed up matching operations.
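A minimal sketch of that layout, using Python's built-in sqlite3 module as a stand-in for your DBMS (an assumption; auto-increment syntax varies between products). The order_line table and the sample part numbers are hypothetical; id_product follows the naming suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE product (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        part_number TEXT NOT NULL UNIQUE                -- natural candidate key
    );
    CREATE TABLE order_line (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        id_product INTEGER NOT NULL REFERENCES product(id),
        quantity   INTEGER NOT NULL
    );
    -- Index the foreign key column to speed up joins back to product.
    CREATE INDEX idx_order_line_id_product ON order_line(id_product);
""")

conn.execute("INSERT INTO product (part_number) VALUES (?)", ("123456789012",))
product_id = conn.execute(
    "SELECT id FROM product WHERE part_number = ?", ("123456789012",)
).fetchone()[0]
conn.execute(
    "INSERT INTO order_line (id_product, quantity) VALUES (?, ?)", (product_id, 3)
)

# Correcting a mistyped part number touches one row; referencing rows are unaffected.
conn.execute(
    "UPDATE product SET part_number = ? WHERE id = ?", ("123456789013", product_id)
)

joined = conn.execute("""
    SELECT p.part_number, o.quantity
    FROM order_line o
    JOIN product p ON p.id = o.id_product
""").fetchall()
print(joined)
```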
Your choice is between a natural and surrogate primary key. There are a bunch of trade-offs here.
The good news is that performance is unlikely to be one of them - as long as you can search using an index, a natural and surrogate key are likely to perform the same.
The data you store may have a unique attribute - SKU, social security number, etc. - and it may seem logical to use this when creating your schema. However, this opens you up to a whole range of edge cases.
What if the user enters the data incorrectly and wants to change it? Changing primary and foreign keys is a terrible idea.
What if the system that generates your natural keys has a bug and generates duplicates?
What if the system that generates your natural keys is retired and you get records with incompatible data formats? Re-building your schema to move from integers to strings for your primary/foreign keys is horrible.
What if you want to integrate with multiple systems generating data, and they disagree about data formats?
I've seen all these "what-ifs" happen.
I'd go for a surrogate key, using an auto-increment or GUID.

Using one numerical database column vs several boolean columns [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
So I have a db with a table user.
The user can be a moderator, an owner, neither, or both.
I can design this with two booleans, isModerator and isOwner,
which would become two db columns.
Or I could create a column hasUserRight
with 1 for moderator
2 for owner
What is the better approach to design the db with it and why?
You are clearly talking about user roles. Your second option uses a flag value for each role; this will limit you to a certain number of roles and is not easy to understand. The first option is not normalized, adding functionality will be more work, etc.
Adding a table with roles and a userrole table will give you a more generic solution.
As long as there are only the two roles, both your solutions will work. But I agree with the others that one column per role would be easier to read and should be preferred hence.
However, problems occur when having to add a third role. If this is something that you know for sure will never happen, okay. But if it can happen, you should think of the consequences. Let's add a new role "revisor" and let's say that a revisor must be a moderator.
Solution 1: isModerator, isOwner
Add isRevisor. All written code will run as before. You can add code for isRevisor. Add a check constraint so that isRevisor cannot be set true if isModerator is false. Done.
=> database (DDL) changes only
Solution 2: hasUserRight 0=none, 1=moderator, 2=owner, 3=all=moderator+owner and a constraint hasUserRight in (0,1,2,3)
(I wouldn't recommend this, because it's not obvious what the different values mean.)
You need more values: 4=moderator+revisor, 5=all=moderator+owner+revisor (or better 3=all=moderator+owner+revisor and 5=moderator+owner?). Your code will be broken, because hasUserRight in (1,3) no longer selects all moderators. You will have to fix the code. Change the constraint to hasUserRight in (0,1,2,3,4,5).
=> code changes + database (DDL) changes
Solution 3: hasUserRight 0=none, 1=moderator, 2=owner, 3=all=moderator+owner and a table UserRight holding the values 0 to 3 along with an explanatory text.
Again, you need more values: 4=moderator+revisor, 5=all=moderator+owner+revisor (or better 3=all=moderator+owner+revisor and 5=moderator+owner?). Add them to your role table. Your code will be broken, because hasUserRight in (1,3) no longer selects all moderators. You will have to fix the code. No need to change any constraint; the foreign key only allows valid values.
=> only code changes
Solution 4: a table role and a bridge table user_role
Simply insert the new role in table role. Add entries to table user_role if you like. Done. All you need is inserts. Your dbms cannot guarantee however that each revisor is a moderator; you will have to care about this yourself.
=> no changes at all to code or database (DDL)
As you see, solutions 2 and 3 (hasUserRight) are bad. Decide on either solution 1 or 4, whichever you prefer and find more appropriate.
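Solution 4 could be sketched roughly as follows, using Python's built-in sqlite3 module as a stand-in for your DBMS (an assumption). The table is called users here instead of user to avoid clashing with a reserved word in some databases, and the sample users are invented. Note how adding the "revisor" role needs only inserts, and the existing moderator query keeps working unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE role  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Bridge table: one row per (user, role) pair.
    CREATE TABLE user_role (
        user_id INTEGER NOT NULL REFERENCES users(id),
        role_id INTEGER NOT NULL REFERENCES role(id),
        PRIMARY KEY (user_id, role_id)
    );
    INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO role  (id, name) VALUES (1, 'moderator'), (2, 'owner');
    INSERT INTO user_role (user_id, role_id) VALUES (1, 1), (1, 2), (2, 1);
""")


def users_with_role(role_name):
    """Return the names of all users holding the given role, sorted by name."""
    rows = conn.execute("""
        SELECT u.name
        FROM users u
        JOIN user_role ur ON ur.user_id = u.id
        JOIN role r       ON r.id = ur.role_id
        WHERE r.name = ?
        ORDER BY u.name
    """, (role_name,)).fetchall()
    return [name for (name,) in rows]


moderators_before = users_with_role("moderator")

# Adding the new role is pure data: no DDL and no change to the query above.
conn.execute("INSERT INTO role (id, name) VALUES (3, 'revisor')")
conn.execute("INSERT INTO user_role (user_id, role_id) VALUES (2, 3)")

moderators_after = users_with_role("moderator")
revisors = users_with_role("revisor")
print(moderators_before, moderators_after, revisors)
```

As noted above, nothing in this schema forces a revisor to also be a moderator; that rule would have to be enforced separately.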
An issue arises with your second solution. What if the user is an owner and a moderator? You would have to assign a fourth numerical value for that situation assuming that 0 would represent neither owner nor moderator. Even if this layout was well documented, it still would not be intuitive.
Your first solution is much cleaner and easier to understand.

Best practice for a "comment" table in a relational database [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Assume you want to build a database for some web application. This database already contains many tables and you might have to extend it in the future.
Also, you want the end user to be able to comment any kind of object in the database.
I would like to find a solution for this that would be generic enough so that I don't have to extend it each time I add a new table to the database.
I thought of the following:
Table Name: comment
columns:
id : the id of a comment
user_id : the id of the user making the comment
object_table_name : the table where the commented object is
object_id : the id of the commented object in the object_table_name table.
text : the text
date : the date
This table sort of solves my problem; the only thing that troubles me is that the relational aspect of it is rather weak (I can't make object_id a foreign key, for instance).
Also if some day I need to rename a table I will have to change all the concerned entries in the comment table.
What do you think of this solution? Is there a design pattern that would help me out?
Thanks.
Isn't that cleaner?
table comment_set
    id
table comment
    id
    comment_set_id -> foreign key to comment_set
    user_id
    date
    text
existing table foo
    ...
    comment_set_id -> foreign key to comment_set
existing table bar
    ...
    comment_set_id -> foreign key to comment_set
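A minimal sketch of the comment_set layout, using Python's built-in sqlite3 module as a stand-in for the actual DBMS (an assumption). Column types, the sample row in foo, and the sample comment are invented; only the table and column names come from the outline above. The point is that every reference is a real foreign key, so an orphaned comment is rejected by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE comment_set (id INTEGER PRIMARY KEY);
    CREATE TABLE comment (
        id             INTEGER PRIMARY KEY,
        comment_set_id INTEGER NOT NULL REFERENCES comment_set(id),
        user_id        INTEGER NOT NULL,
        date           TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        text           TEXT NOT NULL
    );
    -- Any commentable table just carries a nullable pointer to its comment set.
    CREATE TABLE foo (
        id             INTEGER PRIMARY KEY,
        comment_set_id INTEGER REFERENCES comment_set(id)
    );
""")

# Create a comment set, attach it to one foo row, then add a comment to it.
set_id = conn.execute("INSERT INTO comment_set DEFAULT VALUES").lastrowid
conn.execute("INSERT INTO foo (id, comment_set_id) VALUES (?, ?)", (10, set_id))
conn.execute(
    "INSERT INTO comment (comment_set_id, user_id, text) VALUES (?, ?, ?)",
    (set_id, 7, "looks good"),
)

comments_on_foo = conn.execute("""
    SELECT c.user_id, c.text
    FROM foo f
    JOIN comment c ON c.comment_set_id = f.comment_set_id
    WHERE f.id = ?
""", (10,)).fetchall()

# A comment pointing at a non-existent comment set violates the foreign key.
try:
    conn.execute(
        "INSERT INTO comment (comment_set_id, user_id, text) VALUES (999, 7, 'orphan')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(comments_on_foo, orphan_rejected)
```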
You are mixing data and metadata, which is not the best design pattern. They should be separated.
However, since the comments don't seem to be very important anyway, your solution is OK. The worst thing you can end up with is to lose comments on your objects.
Some databases, most notably PostgreSQL, support a COMMENT clause just for cases like this.
Update:
If you want to comment on individual records in each table, it's OK to have such a table.
object_table_name does not have to change if you rename a table, since it's data, not metadata.
You cannot write a native SQL query that will fetch comments for a record of any table (not known at the time the query is written), though you can build dynamic queries to do that.
In this case, you will have to keep your data and metadata in sync (UPDATE the comment table when you RENAME the table they refer to). The first one is a DML statement (changes data), the second one is DDL (changes metadata).
Also make sure that all PRIMARY KEYs have the same types (same as object_id).
Read about EAV.
You can make your whole database like that. But then it will be hell working with that data.
Why don't you want to place a Comment attribute on each database entity which should support comments? This way you can get all the data you need in a single query, and many GUI programs for databases will provide you with full code completion in SQL, which will prevent errors that can easily occur when operating with strings. With the generic table, the code becomes heavily dependent on procedural code, which isn't right for database systems.
You can enumerate the table names in a separate table, so that changes in names do not affect the system in any major way. Just update the enumeration table.
Although you are distancing yourself from referential integrity, I can see another way to accomplish what you want.
I generally prefer to keep the comments with the rows to which they apply. Assuming your database efficiently stores empty VARCHAR fields, you shouldn't pay a penalty for this. There isn't really anything to "extend" when you implement this approach, the maintenance of the comment becomes part of the queries you are already using to update the rows.
The only advantage to the single-note-table approach is that it allows easy searches across notes for different kinds of database entries.
Assuming MS SQL, and if the volume is relatively small, as you seem to suggest, then Extended Properties might be worth exploring. I've used them successfully in the past and they seem to be a permanent fixture.

Normalization Help

I am refactoring an old Oracle 10g schema to try to introduce some normalization. In one of the larger tables, there is a text field that has at most, 10-15 possible values. In my mind, it seems that this field is an example of unnecessary data duplication and should be extracted to a separate table.
After examining the data, I cannot find one relevant piece of information that could be associated with that text value. Basically, if I pulled that value out and put it into its own table, it would be the only field in that table. It exists today as more of a 'flag' field. Should I create a two-column table with a surrogate key, keep it as it is, or do something entirely different? Am I doing more harm than good by trying to minimize data duplication on this field?
You might save some space by extracting the column to a separate table. This is called a lookup table. It can give you a couple of other benefits:
You can declare a foreign key constraint to the lookup table, so you can rely on the column in the main table never having any value other than the 10-15 values you want.
It's easy to query for a concise list of all permitted values, by querying the lookup table. This can be faster than using SELECT DISTINCT on the main table's column. It also returns values that are permitted, but not currently used in the main table.
If you change a value in the lookup table, it automatically applies to all rows in the main table that reference it.
However, creating a lookup table with one column is not strictly normalization. You're just replacing one value with another. The attribute in the main table either already supports a normal form, or not.
Using surrogate keys (vs. natural keys) also has nothing to do with normalization. A lot of people make this mistake.
However, if you move other attributes into the lookup table, attributes that depend only on the lookup value and therefore would create repeating groups (violating 3NF) in the main table if you left them there, then that would be normalization.
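A minimal sketch of the one-column lookup table, using Python's built-in sqlite3 module rather than Oracle 10g (an assumption made so the example is self-contained). The names status_lookup and big_table and the three sample values are hypothetical. It shows the two benefits listed above: the foreign key rejects unknown values, and the permitted list includes values not currently in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- The lookup table's single column is its own (natural) primary key.
    CREATE TABLE status_lookup (code TEXT PRIMARY KEY);
    CREATE TABLE big_table (
        id     INTEGER PRIMARY KEY,
        status TEXT NOT NULL REFERENCES status_lookup(code)
    );
    INSERT INTO status_lookup (code) VALUES ('OPEN'), ('CLOSED'), ('ON_HOLD');
    INSERT INTO big_table (id, status) VALUES (1, 'OPEN'), (2, 'OPEN'), (3, 'CLOSED');
""")

# All permitted values, including ones no row uses yet.
permitted = [c for (c,) in conn.execute("SELECT code FROM status_lookup ORDER BY code")]

# Values actually in use; this scans the main table instead.
in_use = [c for (c,) in conn.execute(
    "SELECT DISTINCT status FROM big_table ORDER BY status"
)]

# The foreign key rejects a value that is not in the lookup table.
try:
    conn.execute("INSERT INTO big_table (id, status) VALUES (4, 'BOGUS')")
    bogus_rejected = False
except sqlite3.IntegrityError:
    bogus_rejected = True

print(permitted, in_use, bogus_rejected)
```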
If you want normalization break it out.
I think of these types of data in DBs as the equivalent of enums in C,C++,C#. Mostly you put them in the table as documentation.
I often have an ID, Name, Description, and auditing columns for them (eg modified by, modified date, create date, create by, active.) The description field is rarely used.
Example (some might say there are more than just 2)
Gender
ID  Name    Audit Columns...
1   Male
2   Female
Then in your contacts you would have a GenderID column which would link to this one.
Of course you don't "need" the table. You could have external documentation somewhere that says 1=Male, 2=Female -- but I think these tables serve to document a system.
If it's really a free-entry text field that's not re-used somewhere else in the database, and there's just a single field without repeated instances, I'd probably go ahead and leave it as it is. If you're determined to break it out I'd create a 'validation' table with a surrogate key and the text value, then put the surrogate key in the base table.
Share and enjoy.
Are these 10-15 values actually meaningful, or are they really just flags? If they're meaningful pieces of text and it seems wasteful to replicate them, then sure create a lookup table. But if they're just arbitrary flag values, then your new table will be nothing more than a mapping from one arbitrary value to another, and not terribly helpful.
A completely separate question is whether all or most of the rows in your big table even have a value for this column. If not, then indeed you have a good opportunity for normalization and can create a separate table linking the primary key from your base table with the flag value.
Edit: One thing. If there's some chance that one of these "flag" values is likely to be wholesale replaced with another value at some point in the future, that would be another good reason to create a table.

Is a one column table good design? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Is it OK to have a table with just one column? I know it isn't technically illegal, but is it considered poor design?
EDIT:
Here are a few examples:
You have a table with the 50 valid US state codes, but you have no need to store the verbose state names.
An email blacklist.
Someone mentioned adding a key field. The way I see it, this single column WOULD be the primary key.
In terms of relational algebra this would be a unary relation, meaning "this thing exists"
Yes, it's fine to have a table defining such a relation: for instance, to define a domain.
The values of such a table should be natural primary keys of course.
A lookup table of prime numbers is what comes to my mind first.
Yes, it's certainly good design to design a table in such a way as to make it most efficient. "Bad RDBMS Design" is usually centered around inefficiency.
However, I have found that most cases of single column design could benefit from an additional column. For example, State Codes can typically have the Full State name spelled out in a second column. Or a blacklist can have notes associated. But, if your design really does not need that information, then it's perfectly ok to have the single column.
I've used them in the past. One client of mine wanted to auto block anyone trying to sign up with a phone number in this big list he had so it was just one big blacklist.
If there is a valid need for it, then I don't see a problem. Maybe you just want a list of possibilities to display for some reason and you want to be able to dynamically change it, but have no need to link it to another table.
One case that I found sometimes is something like this:
Table countries_id, contains only one column with numeric ID for each country.
Table countries_description, contains a column with the country ID, a column with the language ID and a column with the localized country name.
Table company_factories, contains information for each factory of the company, including the country in which it is located.
So to maintain data coherence and language independent data in the tables the database uses this schema with tables with only one column to allow foreign keys without language dependencies.
In this case I think the existence of one column tables are justified.
Edited in response to the comment by: Quassnoi
In this schema I can define a foreign key in the table company_factories that does not require me to include Language column on the table, but if I don't have the table countries_id, I must include Language column on the table to define the foreign key.
There would be rare cases where a single-column table makes sense. I did one database where the list of valid language codes was a single-column table used as a foreign key. There was no point in having a different key, since the code itself was the key. And there was no fixed description since the language code descriptions would vary by language for some contexts.
In general, any case where you need an authoritative list of values that do not have any additional attributes is a good candidate for a one-column table.
I use single-column tables all the time -- depending, of course, on whether the app design already uses a database. Once I've endured the design overhead of establishing a database connection, I put all mutable data into tables where possible.
I can think of two uses of single-column tables OTMH:
1) Data item exists. Often used in dropdown lists. Also used for simple legitimacy tests.
Eg. two-letter U.S. state abbreviations; Zip codes that we ship to; words legal in Scrabble; etc.
2) Sparse binary attribute, ie., in a large table, a binary attribute that will be true for only a very few records. Instead of adding a new boolean column, I might create a separate table containing the keys of the records for which the attribute is true.
Eg. employees that have a terminal disease; banks with a 360-day year (most use 365); etc.
-Al.
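The sparse binary attribute case (use 2 above) might look roughly like this, sketched with Python's built-in sqlite3 module as a stand-in DBMS (an assumption). The bank and bank_360_day_year tables and their rows are made up from the example given; the one-column table holds only the keys for which the attribute is true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE bank (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One-column table: presence of a key means the attribute is true.
    CREATE TABLE bank_360_day_year (
        bank_id INTEGER PRIMARY KEY REFERENCES bank(id)
    );
    INSERT INTO bank (id, name) VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
    INSERT INTO bank_360_day_year (bank_id) VALUES (2);
""")

# EXISTS turns "is there a row?" back into a 0/1 attribute at query time.
rows = conn.execute("""
    SELECT b.name,
           EXISTS (SELECT 1 FROM bank_360_day_year x WHERE x.bank_id = b.id) AS uses_360
    FROM bank b
    ORDER BY b.id
""").fetchall()
print(rows)
```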
Mostly I've seen this in lookup type tables such as the state table you described. However, if you do this be sure to set the column as the primary key to force uniqueness. If you can't set this value as unique, then you shouldn't be using one column.
No problem as long as it contains unique values.
I would say in general, yes. Not sure why you need just one column. There are some exceptions to this that I have seen used effectively. It depends on what you're trying to achieve.
They are not really good design when you're thinking of the schema of the database, but really should only be used as utility tables.
I've seen numbers tables used effectively in the past.
The purpose of a database is to relate pieces of information to each other. How can you do that when there is no data to relate to?
Maybe this is some kind of compilation table (i.e. FirstName + LastName + Birthdate), though I'm still not sure why you would want to do that.
EDIT: I could see using this kind of table for a simple list of some kind. Is that what you are using it for?
Yes, as long as the field is the primary key, as you said it would be. The reason is that if you insert duplicate data, those rows will be read-only. If you try to delete one of the duplicated rows, it will not work, because the server will not know which row to delete.
The only use case I can conceive of is a table of words perhaps for a word game. You access the table just to verify that a string is a word: select word from words where word = ?. But there are far better data structures for holding a list of words than a relational database.
Otherwise, data in a database is usually placed in a database to take advantage of the relationships between various attributes of the data. If your data has no attributes beyond its value how will these relationship be developed?
So, while not illegal, in general you probably should not have a table with just one column.
All my tables have at least four tech fields: serial primary key, creation and modification timestamps, and soft delete boolean. In any blacklist, you will also want to know who added the entry. So for me, the answer is no, a table with only one column would not make sense except when prototyping something.
Yes, that is perfectly fine. But an ID field couldn't hurt, right?