SQL database checklist

I need to create a SQL database checklist. I have some basic points, like:
Each table must have a primary key
Normalize data to third normal form
Check identity (auto-increment) columns: the column values should be incremented properly
But can anyone help me to enhance this list?

Objects conform to a single naming convention
Create foreign key relationships
Apply appropriate index(es)
Use of schema or other mechanisms for controlling read/write access, etc
Consideration given to how long data should be kept before deletion or archive
Version control over scripts for updating the database structure
Mechanism for applications to determine version of database
Backup and recovery plans in place

First, it would help to know whether this is supposed to be a recurring checklist, or a checklist for each new instance. Also, is there a specific implementation in mind, like SQL Server or MySQL? (This is where the real checklist begins.) For example, you want to keep an eye on the transaction log if it's SQL Server...
If this is a relational DB, ER diagrams go a long way in making sure that you have your problem domain identified and analyzed. You are on the right track using third normal form where practical. I want to emphasize practical, because you also want to try to anticipate and identify which data will be used more than others. If data is highly accessed, you may want to consider indexing more than just the primary key, and/or denormalizing to 2nd normal form (uses more space, but gives better read performance). Remember that reading data and updating data are inversely related where indexing is concerned. Hope this helps.
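As a small illustration of that indexing trade-off (table and column names here are hypothetical, not from the question): a secondary index speeds up reads on a heavily accessed column, at the cost of extra work on every write that touches it.

-- Secondary index on a frequently queried column.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- A typical lookup that benefits from the index:
SELECT order_id, order_date
FROM orders
WHERE customer_id = 42;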

Related

How to create a dynamic database

I'm working with .NET Core. I want to create a database for a stock so that the user can add a new type of product with unknown features, and can also add features to an existing product.
I really need help with the design of the database.
Databases have schemas. This is a rigid structure that defines both the characteristics and constraints of the data that can be placed in it. You cannot do something like dynamically add columns, etc. without fundamentally impacting the database integrity.
In true relational databases (SQL Server, MySQL, PostgreSQL, etc.), such changes are flat out disallowed. However, some less rigid NoSQL solutions are either schema-less or have malleable schemas and will allow you to just start tracking some new data point without first altering the structure of the database. Even then, though, data integrity becomes a serious issue, and you can end up borking your entire dataset if you do this kind of stuff willy-nilly.
Long and short, there's really no "dynamic" where databases are concerned. Even in NoSQL solutions, you're largely expected to plan out your data structure before hand, and failure to do so, results in inconsistencies in the data that can negate its usefulness entirely.
Your best bet for something like the described requirement is to actually have a Features table. In the simplest form, it might just have a string column for a name and a foreign key, or simply an ID column (depending on whether it's relational or not), referencing back to the product it's associated with. You'll need a primary key as well, which could either be a composite of the name and product ID (essentially making the combination unique) or an actual identity-type column.
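A minimal sketch of what that could look like in SQL (all names are hypothetical, assuming a relational engine):

CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    Name      VARCHAR(100) NOT NULL
);

CREATE TABLE Features (
    ProductID    INT          NOT NULL,
    FeatureName  VARCHAR(100) NOT NULL,
    FeatureValue VARCHAR(255) NULL,
    PRIMARY KEY (ProductID, FeatureName),  -- composite key: one value per feature per product
    FOREIGN KEY (ProductID) REFERENCES Products (ProductID)
);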
The key with data, in general, is to generalize. Nothing is completely unique and is just usually variations of other things. Boil down your data to least common denominators to determine your actual schema. Then, where there's outliers, you can take a less-rigid strategy like described above.

Normalisation and multi-valued fields

I'm having a problem with my students using multi-valued fields in access and getting confused about normalisation as a result.
Here is what I can make out. Given a 1-to-many relationship, e.g.
Articles          Comments
--------          --------
artID{PK}         commID{PK}
text              text
                  artID{FK}
Access makes it possible to store this information into what appears to be one table, something like
Articles
--------
artID{PK}
text
comment
+ value
"value" referring to multiple comment values for the comment "column", which access actually stores as a separate table. The specifics of how the values are stored - table, its PK and FK - is completely hidden, but it is possible to query the multi-valued field, e.g. in the example above with the query
INSERT INTO Articles ( [comment].Value )
VALUES ('thank you')
WHERE artID = 1;
But the query doesn't quite reveal the underlying structure of the hidden table implementing the multi-valued field.
Given this (a disaster, in my view), my problem is how to help newcomers to database design and normalisation understand what Access is offering them, why it may not be helpful, and that it is not a reason to ignore the basics of the relational model. More specifically:
Are there better ways, besides queries as above, to reveal the structure behind multi-valued fields?
Are there good examples of where the multi-valued field is not good enough, and shows the advantage of normalising explicitly?
Are there straightforward ways to obtain the multi-select visual output of Access multi-values, but based on separate, explicit tables?
Thanks!
I cannot give you advice on using this feature, because I have never used it; however, I can give you reasons not to use it.
I want to have full control on what I'm doing. This is not the case for multi-valued fields, therefore I don't use them.
This feature is not expandable. What if you want to add a date field to your comments, for instance?
It is sometimes necessary to upsize an Access (backend) database to a "big" database (SQL Server, Oracle). These Databases don't offer such a feature. It is often the customer who decides which database has to be used. Recently I had to migrate an Access application (frontend) using an Oracle backend to a SQL-Server backend because my client decided to drop his Oracle server. Therefore it is a good idea to restrict yourself to use only common features.
For common tasks like editing lookup tables I created generic forms. My existing solutions will not work with multi-valued fields.
I have a (self-made) tool that synchronizes changes in the structure of the database on my developer’s site with the database on the client’s site. This tool cannot deal with multi-valued fields.
I have tools for the security management that can grant SELECT, INSERT, UPDATE and DELETE rights on tables or revoke them. Again, the management tool does not work with multi-valued fields.
Having a separate table for the comments allows you to quickly inspect all the comments (by opening the table). You cannot do this with multi-valued fields.
You will not see the 1 to n relation between the articles and the comments in a database diagram.
With a separate table you can choose whether you want to cascade deletes to the details table or not. If you don't, you will not be able to delete an article as long as there are comments attached to it. This can be desirable, if you want to protect the comments from being deleted inadvertently.
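For reference, a bare-bones sketch of that explicit two-table design (column names lightly adapted from the question to avoid reserved words):

CREATE TABLE Articles (
    artID   INT PRIMARY KEY,
    artText VARCHAR(255)
);

CREATE TABLE Comments (
    commID   INT PRIMARY KEY,
    commText VARCHAR(255),
    artID    INT NOT NULL,
    FOREIGN KEY (artID) REFERENCES Articles (artID)
        ON DELETE CASCADE  -- or omit to block deleting articles that still have comments
);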
It is important to realize the difference between physical and logical relationships. Today the whole internet and web services (SOAP) very much rely on a data format that is multi-value in nature.
When you represent multi-value data with a relational database (such as Access), then behind the scenes you are using a traditional (and legitimate) relation. I cannot stress enough that, as such, the use of multi-value columns in Access is in fact a LEGITIMATE relational model.
The fact that the table is not exposed does not negate this. In fact, if you represent an invoice (master record and repeating details) as an XML data cube, then we see three things:
1) you can build and represent that invoice with a relational database like Access
2) such a relational data model that is normalized can ALSO be represented as a SINGLE xml string.
3) deleting the XML record (or string) means that cascade delete of the child rows (invoice details) MUST occur.
So while it is true that Multi-Value fields were added to Access to deal with SharePoint, it is MOST important to realize that such data can be mapped to a relational database (if you could not do this, then Access could not consume that XML data using relational database tables, as ACCESS CURRENTLY DOES RIGHT NOW).
And with web technologies such as XML and SharePoint, the need to consume, manage and utilize such data is not only widespread, but is in fact a basic staple of the internet.
As more and more data becomes complex in nature, we find the requirement for multi-value data exploding in use. Anyone who has used that so-called "fad" the internet is thus relying on data that is in fact VERY OFTEN XML and multi-value (complex) in nature.
As long as the logical (not physical) relational data model is kept, then the use of multi-value columns to represent such data is possible, and this is exactly what Access is doing (it is mapping the relational data model to a complex model). Note that the complex (xml) data model does NOT necessarily have to be relational in nature. However, if you ARE going to map such data to Access then the complex multi-value model MUST CONFORM TO A RELATIONAL data model.
This is EXACTLY what is occurring in Access.
The fact that such a correct and legitimate mathematical relational model is not exposed is of little consequence here. Are we to suggest that because Excel does not expose the binary code it uses, users will never learn about computers? Or perhaps we must all program in assembler so we all correctly learn how computers work.
At the end of the day, who cares, and why does this matter? The fact that people drive automatic cars today does not toss out the concept that they are using different gears to operate that car. The idea that we shut down all of society because someone is going to drive an automatic car, or in this case use complex data, would be galactically stupid on our part.
So keep in mind that extensions to SQL do exist in Access to query the multi-value data, but as well pointed out here, those underlying tables are not exposed. However, as noted, exposing such tables would STILL REQUIRE one to not change or mess with cascade delete, since that feature is required TO MAINTAIN AN INTERSECTION OF FEATURES and a mathematically CORRECT relational model between the complex data model (xml) and that of using two related tables to represent such data.
In other words, you can use related tables to represent the complex data model IF YOU REMOVE the ability of users to play with the referential integrity options. The RI options MUST remain as set in those hidden tables, else such data will not be able to make the trip BACK to the XML or complex data model from which it was consumed.
As noted, teaching users how gasoline reacts with oxygen as a prerequisite for learning to drive a car, or forcing someone using a word processor to learn the relational model and see the underlying tables, makes little sense here.
However, the points made here in regards to such tables being exposed are legitimate concerns.
The REAL problem is that SQL Server, Oracle etc. cannot consume or represent that complex data, WHILE ACCESS CAN CONSUME such data.
As noted, the complex data ship sailed LONG ago! XML, SOAP, and the basic technologies of the internet are based on this complex data model.
In effect, that SQL Server, Oracle and most databases cannot consume this multi-value data, or represent it without users having to create and model such data in a relational fashion, is a BIG shortcoming of SQL Server etc.
Access stands alone in this ability to consume this data.
So, for anyone who used a smartphone, iPad or the web, you are using basic technologies that are built around using complex data, something that Access now allows.
It is likely that the rest of the industry will have to follow suit given that more and more data is complex in nature. If the database industry does not change, then the mainstream traditional relational database system will NOT be the resting place of such data.
A trend away from storing data in related tables is occurring at a rapid pace right now, and products like SharePoint, or even Google Docs, are proof of this concept. So Access is only reacting to market pressures, and it is likely that other database vendors will have to follow suit or simply give up on being part of the "fad" called the internet.
XML and complex data structures are STAPLE and fact of our industry right now – this is not an issue we all should run away from, but in fact embrace.
Albert D. Kallal (Access MVP)
Edmonton, Alberta Canada
kallal#msn.com
The technical discussion is interesting. I think the real problem lies in student understanding. Because it is available in Access, students will use it, and initially it will probably provide a simple solution to some design problems. The negatives will occur later when they try to use the data. Maybe a simple example demonstrating the problems would persuade some students to avoid using multi-valued fields? Maybe an example of storing the data in another, more usable format would help?
Good luck !
Peter Bullard
MS Access does a great job of simplifying database management and abstracting out a lot of complexity. This, however, makes learning DBMS concepts a bit difficult. Have you tried using other 'standard' DBMS tools like MySQL (or even SQLite)? From a learning perspective they may be better.
I know this post is old. But, it's not quite the same as every other post I've seen on this topic. This one has someone making a good case for using Multi Valued Fields...
As someone who is still trying very hard to get their head around Access, I find the discussion for and against using the Multi Valued Fields incredibly frustrating.
I'm trying to sort through it all, but if everyone is so against them, what is an alternative method? It seems that in every search result I find everyone is either telling you how to use Multi Valued Fields and Controls or telling you how horrible and what a mistake they are. Many people refer to an alternative to them, but nobody says "Here's an example". I'm here to learn about these things. And while I know that this is a simpler concept for a lot of people in these forums, I could really use some examples to take a look at.
I'm at a point where I have to decide which way to go. It would be wonderful to compare examples of using Multi Valued Fields and alternatives and using a control to select multiple values.
Or am I wrong and the functionality of a combobox where you can select multiple items is only available through Access?
I want to address the last of your questions first. There is a way of providing a visual presentation of a parent child relationship. It's called subforms. If you get help about subforms in Access, it will explain the concept.
I have used subforms in a project where I wanted to display the transaction header in a form and the transaction details in a subform. There is nothing to hinder this construct even when the data is stored in two normalized tables.
Of course, this affects the screen, not the database. That's the whole point. Normalization is relevant to storage and retrieval, not to other uses of data.

Store files and comments for a web application. Table design

I have a web application with several entities (tables). Each one has its own CRUD pages.
I'd like to add, for some of them, the ability to add comments and attach files.
I was thinking of two scenarios.
One table for all comments/files - the table would have some id for the entity and the particular record.
For each entity a separate comments/files table.
The files would be stored on disk in a directory. In the table would be the name of the file and some additional info.
In terms of application design, having one single table for all comments seems to make sense. In terms of application code, that means the same SQL will be reused for all entities. It's the 'classical' way used by most applications, extending to having the same active records and controllers handle comments and attachments for all objects.
In terms of SQL, the second solution could be useful in some databases like MySQL to get more benefit from the memory cache. Every comment/attachment added in the 1st solution would drop from the memory cache all cached queries touching the comment table. With individual tables, a comment on one entity would not invalidate queries on other entities. But you would also require more file descriptors and a bigger table cache... so to choose this solution you would need a decision based on a real-life, precise case, where you would be able to compare the benefits in database access speed. And when you add new entities you'll certainly find your each-entity-has-a-comment-table solution tedious; things could have been automated by using the 1st solution.
It's a tradeoff. With a single comments table, you get a simple, DRY (don't repeat yourself) schema, but you don't get foreign key constraints and thus no cascade deletion. Thus, if you delete an entity with comments, you must also remember to delete the comments!
If you go with multiple comment tables, you get FK constraints and cascade deletion, but you have a "wet" schema (you are repeating yourself). For example, each comment table might have a commentbody column. If you change that column definition, you have to change it in every comment table!
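A rough sketch of the two shapes being compared (entity and column names are made up for illustration):

-- Single shared table: DRY, but no real foreign key is possible,
-- because entity_id points at a different parent table per entity_type.
CREATE TABLE comments (
    comment_id   INT PRIMARY KEY,
    entity_type  VARCHAR(50)   NOT NULL,  -- e.g. 'invoice', 'customer'
    entity_id    INT           NOT NULL,
    comment_body VARCHAR(2000) NOT NULL
);

-- Per-entity table: repeats the comment columns for each entity,
-- but gets a real FK and cascade deletion for free.
CREATE TABLE invoices (
    invoice_id INT PRIMARY KEY
);

CREATE TABLE invoice_comments (
    comment_id   INT PRIMARY KEY,
    invoice_id   INT NOT NULL,
    comment_body VARCHAR(2000) NOT NULL,
    FOREIGN KEY (invoice_id) REFERENCES invoices (invoice_id) ON DELETE CASCADE
);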
One interesting solution for a DRY-er schema could involve table inheritance (see http://www.postgresql.org/docs/9.0/interactive/ddl-inherit.html) but please read section 5.8.1. Caveats, as there are some "gotchas" regarding indexing, at least in postgres.
Either way, kudos to you for thinking carefully about your database design!

Upgrade strategies for bad DB schema designs

I've shown up at a new job and discovered a database which is in dire need of some help. There are many, many things wrong with it, including:
No foreign keys...anywhere. They're faked by using ints and managing the relationship in code.
Practically every field can be NULL, even though the data isn't really optional
Naming conventions for tables and columns are practically non-existent
Varchars which are storing concatenated strings of relational information
Folks can argue, "It works", and it does. But moving forward, it's a total pain to manage all of this in code, and it opens us up to bugs IMO. Basically, the DB is being used as a flat file, since it's not doing a whole lot of work.
I want to fix this. The issues I see now are:
We have a lot of data (migration, possibly tricky)
All of the DB logic is in code (with migration comes big code changes)
I'm also tempted to do something "radical" like moving to a schema-free DB.
What are some good strategies when faced with an existing DB built upon a poorly designed schema?
Enforce Foreign Keys: If a relationship exists in the domain, then it should have a Foreign Key.
Renaming existing tables/columns is fraught with danger, especially if there are many systems accessing the Database directly. Gotchas include tasks that run only periodically; these are often missed.
Of Interest: Scott Ambler's article: Introduction To Database Refactoring
and Catalog of Database Refactorings
Views are commonly used to transition between changing data models because of the encapsulation they provide. A view looks like a table, but does not exist as a physical object in the database - you can change which column is returned for a given column alias as desired. This allows you to set up your codebase to use a view, so you can move from the old table structure to the new one without the application needing to be updated. But it means the view has to return the data in the existing format. For example - your current data model has:
SELECT t.column --a list of concatenated strings, assuming comma separated
FROM TABLE t
...so the first version of the view would be the query above, but once you created the new table that uses 3NF, the query for the view would use:
SELECT GROUP_CONCAT(t.column SEPARATOR ',')
FROM NEW_TABLE t
...and the application code would never know that anything changed.
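A concrete sketch of that swap, with hypothetical names (an old orders_flat table storing a comma-separated tags_csv column, migrated to a normalized order_tags table); the view name the application queries never changes:

-- Version 1: the view just exposes the old denormalized column.
CREATE VIEW order_tags_v AS
SELECT o.order_id, o.tags_csv AS tags
FROM orders_flat o;

-- Version 2: after the 3NF order_tags(order_id, tag) table exists,
-- redefine the same view to rebuild the old comma-separated shape.
CREATE OR REPLACE VIEW order_tags_v AS
SELECT t.order_id, GROUP_CONCAT(t.tag SEPARATOR ',') AS tags
FROM order_tags t
GROUP BY t.order_id;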
The problem with MySQL is that the view support is limited - you can't use variables within it, nor can they have subqueries.
The reality of the changes you wish to make is that you are effectively rewriting the application from the ground up. Moving logic from the codebase into the data model will drastically change how the application gets the data. Model-View-Controller (MVC) is an ideal pattern to adopt alongside changes like these, to minimize the cost of similar changes in the future.
I'd say leave it alone until you really understand it. Then make sure you don't start with one of the Things You Should Never Do.
Read Scott Ambler's book on Refactoring Databases. It covers a good many techniques for how to go about improving a database - including the transitional measures needed to allow both old and new programs to work with the changing design.
Create a completely new schema and make sure that it is fully normalized and contains any unique, check and not null constraints etc that are required and that appropriate data types are used.
Prepopulate each table that fills the parent role in a foreign key relationship with a single 'Unknown' record (see the sketch after this list).
Create an ETL (Extract Transform Load) process (I can recommend SSIS (SQL Server Integration Services) but there are plenty of others) that you can use to refill the new schema from the existing one on a regular basis. Use the 'Unknown' record as the parent of any orphaned records - there will be plenty ;). You will need to put some thought into how you will consolidate duplicate records - this will probably need to be on a case by case basis.
Use as many iterations as are necessary to refine your new schema (ensure that the ETL Process is maintained and run regularly).
Create views over the new schema that match the existing schema as closely as possible.
Incrementally modify any clients to use the new schema making temporary use of the views where necessary. You should be able to gradually turn off parts of the ETL process and eventually disable it completely.
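A sketch of how the 'Unknown' parent record and the ETL load of orphans might look (all table and schema names are hypothetical; the real work would live in the SSIS packages or equivalent):

-- Placeholder parent that orphaned child rows can be attached to.
INSERT INTO Customers (CustomerID, CustomerName)
VALUES (0, 'Unknown');

-- During each refill, orders whose customer no longer exists
-- are pointed at the 'Unknown' record instead of being dropped.
INSERT INTO Orders (OrderID, CustomerID, OrderDate)
SELECT o.OrderID,
       COALESCE(c.CustomerID, 0),
       o.OrderDate
FROM OldSchema.Orders o
LEFT JOIN Customers c ON c.CustomerID = o.CustomerID;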
First, see how tightly the code is coupled to the DB. If it is all mixed together with no DAO layer, you shouldn't think about a rewrite; but if there is a DAO layer, then it would be time to rewrite that layer and the DB along with it. If possible, build the migration tool around using the two DAOs.
But my guess is there is no DAO, so you need to find which areas of the code you are going to be changing and which parts of the DB they relate to; hopefully you can cut it up into smaller parts that can be updated as you maintain. The biggest deal is to get FKs in there and start checking for proper indexes; there is a good chance they aren't being done correctly.
I wouldn't worry too much about naming until the rest of the db is under control. As for the NULLs: if the program chokes on a value being NULL, don't let it be NULL; but if the program can handle it, I wouldn't worry about it at this point. In the future, if the code is applying a default value, move that into the DB, but that is way down the line from the sound of things.
Do something about the varchars sooner rather than later. If anything, make that the first pure background fix to the program.
The other thing to do is estimate the effort of each area's change and then add that price to the cost of new development on that section of code. That way you can fix the parts as you add new features.

ALTER TABLE without locking the table?

When doing an ALTER TABLE statement in MySQL, the whole table is read-locked (allowing concurrent reads, but prohibiting concurrent writes) for the duration of the statement. If it's a big table, INSERT or UPDATE statements could be blocked for a looooong time. Is there a way to do a "hot alter", like adding a column in such a way that the table is still updatable throughout the process?
Mostly I'm interested in a solution for MySQL but I'd be interested in other RDBMS if MySQL can't do it.
To clarify, my purpose is simply to avoid downtime when a new feature that requires an extra table column is pushed to production. Any database schema will change over time, that's just a fact of life. I don't see why we should accept that these changes must inevitably result in downtime; that's just weak.
The only other option is to do manually what many RDBMS systems do anyway...
- Create a new table
You can then copy the contents of the old table over a chunk at a time. Whilst always being cautious of any INSERT/UPDATE/DELETE on the source table. (Could be managed by a trigger. Although this would cause a slow down, it's not a lock...)
Once finished, change the name of the source table, then change the name of the new table. Preferably in a transaction.
Once finished, recompile any stored procedures, etc that use that table. The execution plans will likely no longer be valid.
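A rough sketch of that manual approach in MySQL syntax (table and column names are placeholders):

-- 1. New table with the extra column.
CREATE TABLE mytable_new LIKE mytable;
ALTER TABLE mytable_new ADD COLUMN new_col INT NULL;

-- 2. Copy rows across in chunks, repeating with advancing id ranges,
--    while triggers or application logic keep late changes in sync.
INSERT INTO mytable_new
SELECT t.*, NULL
FROM mytable t
WHERE t.id BETWEEN 1 AND 10000;

-- 3. Swap the names in a single atomic statement.
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;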
EDIT:
Some comments have been made about this limitation being a bit poor. So I thought I'd put a new perspective on it to show why it's how it is...
Adding a new field is like changing one field on every row.
Field Locks would be much harder than Row locks, never mind table locks.
You're actually changing the physical structure on the disk, every record moves.
This really is like an UPDATE on the Whole table, but with more impact...
Percona makes a tool called pt-online-schema-change that allows this to be done.
It essentially makes a copy of the table and modifies the new table. To keep the new table in sync with the original it uses triggers to update. This allows the original table to be accessed while the new table is prepared in the background.
This is similar to Dems' suggested method above, but it does so in an automated fashion.
Some of their tools have a learning curve, namely connecting to the database, but once you have that down, they are great tools to have.
Ex:
pt-online-schema-change --alter "ADD COLUMN c1 INT" D=db,t=numbers_are_friends
This question is from 2009. Now MySQL offers a solution:
Online DDL (Data Definition Language)
A feature that improves the performance, concurrency, and availability of InnoDB tables during DDL (primarily ALTER TABLE) operations. See Section 14.11, "InnoDB and Online DDL" for details.
The details vary according to the type of operation. In some cases, the table can be modified concurrently while the ALTER TABLE is in progress. The operation might be able to be performed without doing a table copy, or using a specially optimized type of table copy. Space usage is controlled by the innodb_online_alter_log_max_size configuration option.
It lets you adjust the balance between performance and concurrency during the DDL operation, by choosing whether to block access to the table entirely (LOCK=EXCLUSIVE clause), allow queries but not DML (LOCK=SHARED clause), or allow full query and DML access to the table (LOCK=NONE clause). When you omit the LOCK clause or specify LOCK=DEFAULT, MySQL allows as much concurrency as possible depending on the type of operation.
Performing changes in-place where possible, rather than creating a new copy of the table, avoids temporary increases in disk space usage and I/O overhead associated with copying the table and reconstructing secondary indexes.
see MySQL 5.6 Reference Manual -> InnoDB and Online DDL for more info.
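For example, with MySQL 5.6+ you can state the algorithm and lock level you expect, and the statement fails immediately if that level cannot be honoured (column name borrowed from the pt-online-schema-change example above):

ALTER TABLE numbers_are_friends
    ADD COLUMN c1 INT,
    ALGORITHM = INPLACE,
    LOCK = NONE;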
It seems that online DDL is also available in MariaDB:
Alternatively you can use ALTER ONLINE TABLE to ensure that your ALTER TABLE does not block concurrent operations (takes no locks). It is equivalent to LOCK=NONE.
MariaDB KB about ALTER TABLE
See Facebook's online schema change tool.
http://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932
Not for the faint of heart; but it will do the job.
I recommend Postgres if that's an option. With postgres there is essentially no downtime with the following procedures:
ALTER TABLE ADD COLUMN (if the column can be NULL)
ALTER TABLE DROP COLUMN
CREATE INDEX (must use CREATE INDEX CONCURRENTLY)
DROP INDEX
Another great feature is that most DDL statements are transactional, so you could do an entire migration within a single SQL transaction, and if something goes wrong, the entire thing gets rolled back.
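A small sketch of what that looks like in Postgres (hypothetical table and column names):

-- Transactional migration: either all of it happens or none of it does.
BEGIN;
ALTER TABLE accounts ADD COLUMN note text;       -- nullable add, no table rewrite
ALTER TABLE accounts DROP COLUMN legacy_flag;
COMMIT;

-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
-- so it is issued on its own afterwards.
CREATE INDEX CONCURRENTLY idx_accounts_note ON accounts (note);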
I wrote this a little bit ago, perhaps it can shed some more insight on the other merits.
Since you asked about other databases, here's some information about Oracle.
Adding a NULL column to an Oracle table is a very quick operation as it only updates the data dictionary. This holds an exclusive lock on the table for a very short period of time. It will, however, invalidate any dependent stored procedures, views, triggers, etc. These will get recompiled automatically.
From there, if necessary, you can create an index using the ONLINE clause. Again, only very short data dictionary locks. It'll read the whole table looking for things to index, but does not block anyone while doing this.
If you need to add a foreign key, you can do this and get Oracle to trust you that the data is correct. Otherwise it needs to read the whole table and validate all the values which can be slow (create your index first).
If you need to put a default or calculated value into every row of the new column, you'll need to run a massive update, or perhaps a little utility program that populates the new data. This can be slow, especially if the rows get a lot bigger and no longer fit in their blocks. Locking can be managed during this process. Since the old version of your application, which is still running, does not know about this column, you might need a sneaky trigger or to specify a default.
From there, you can do a switcharoo on your application servers to the new version of the code and it'll keep running. Drop your sneaky trigger.
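A hedged sketch of those Oracle steps (object names are made up; assumes a promos table with a primary key on promo_code):

-- Data-dictionary-only change: new nullable column.
ALTER TABLE orders ADD (promo_code VARCHAR2(20));

-- Build the supporting index without blocking DML.
CREATE INDEX orders_promo_ix ON orders (promo_code) ONLINE;

-- Add the FK but skip validating existing rows ("trust me").
ALTER TABLE orders ADD CONSTRAINT orders_promo_fk
    FOREIGN KEY (promo_code) REFERENCES promos (promo_code)
    ENABLE NOVALIDATE;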
Alternatively, you can use DBMS_REDEFINITION which is a black box designed to do this sort of thing.
All this is so much bother to test, etc that we just have an early Sunday morning outage whenever we release a major version.
If you cannot afford downtime for your database when doing application updates you should consider maintaining a two-node cluster for high availability. With a simple replication setup, you could do almost fully online structural changes like the one you suggest:
wait for all changes to be replicated on a passive slave
change the passive slave to be the active master
do the structural changes to the old master
replicate changes back from the new master to the old master
do the master swapping again and the new app deployment simultaneously
It is not always easy, but it works, usually with 0 downtime! The second node does not have to be only a passive one; it can be used for testing, doing statistics, or as a fallback node.
If you do not have the infrastructure, replication can be set up within a single machine (with two instances of MySQL).
Nope. If you are using MyISAM tables, to the best of my understanding they only do table locks - there are no record locks; they just try to keep everything hyperfast through simplicity. (Other MySQL table types operate differently.) In any case, you can copy the table to another table, alter it, and then switch them, updating for differences.
This is such a massive alteration that I doubt any DBMS would support it. It's considered a benefit to be able to do it with data in the table in the first place.
Temporary solution...
Another solution could be to add another table with the primary key of the original table, along with your new column.
Populate the primary keys into the new table and populate values for the new column there, then modify your queries to join this table for select operations; you also need to insert and update this column value separately.
When you are able to get downtime, you can alter the original table, modify your DML queries, and drop the new table created earlier.
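A sketch of that side-table workaround (hypothetical names):

-- Side table keyed by the original table's primary key.
CREATE TABLE orders_extra (
    order_id INT PRIMARY KEY,
    new_col  VARCHAR(50)
);

-- Reads join the side table in; rows without a value simply return NULL.
SELECT o.*, e.new_col
FROM orders o
LEFT JOIN orders_extra e ON e.order_id = o.order_id;

-- Writes to the new column go to the side table separately.
INSERT INTO orders_extra (order_id, new_col) VALUES (42, 'some value');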
Otherwise, you may go for a clustering method, replication, or the pt-online-schema-change tool from Percona.
You should definitely try pt-online-schema-change. I have been using this tool to do migrations on AWS RDS with multiple slaves and it has worked very well for me. I wrote an elaborate blog post on how to do that which might be helpful for you.
Blog: http://mrafayaleem.com/2016/02/08/live-mysql-schema-changes-with-percona/
Using the InnoDB plugin, ALTER TABLE statements which only add or drop secondary indexes can be done "quickly", i.e. without rebuilding the table.
Generally speaking however, in MySQL, any ALTER TABLE involves rebuilding the entire table which can take a very long time (i.e. if the table has a useful amount of data in it).
You really need to design your application so that ALTER TABLE statements do not need to be done regularly; you certainly don't want any ALTER TABLE done during normal running of the application unless you're prepared to wait or you're altering tiny tables.
I would recommend one of two approaches:
Design your database tables with potential changes in mind. For example, I've worked with Content Management Systems which change data fields in content regularly. Instead of building the physical database structure to match the initial CMS field requirements, it is much better to build in a flexible structure. In this case, using a blob/text field (varchar(max), for example) to hold flexible XML data (see the sketch after this list). This makes structural changes much less frequent. Structural changes can be costly, so there is a benefit to cost here as well.
Have system maintenance time. The system goes offline during changes (monthly, etc.), with the changes scheduled during the least heavily trafficked time of the day (3-5am, for example). The changes are staged prior to production rollout, so you will have a good fixed estimate of the downtime window.
2a. Have redundant servers, so that when the system has downtime, the whole site does not go down. This would allow you to "roll" your updates out in a staggered fashion, without taking the whole site down.
Options 2 and 2a may not be feasible; they tend to be only for larger sites/operations. They are valid options, however, and I have personally used all of the options presented here.
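A minimal sketch of the flexible-structure idea from the first approach (names are hypothetical):

-- Fixed, rarely-changing physical schema; the variable CMS fields
-- live inside the XML/text blob instead of dedicated columns.
CREATE TABLE cms_content (
    content_id   INT PRIMARY KEY,
    content_type VARCHAR(50) NOT NULL,
    fields_xml   VARCHAR(8000)           -- varchar(max)/TEXT in a real system
);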
If anyone is still reading this or happens to come here, this is the big benefit of using a NoSQL database system like mongodb. I had the same issue dealing with altering the table to either add columns for additional features or indexes on a large table with millions of rows and high writes. It would end up locking for a very long time so doing this on the LIVE database would frustrate our users. On small tables you can get away with it.
I hate the fact that we have to "design our tables to avoid altering them". I just don't think that works in today's website world. You can't predict how people will use your software that's why you rapidly change things based on user feedback. With mongodb, you can add "columns" at will with no downtime. You don't really even add them, you just insert data with new columns and it does it automatically.
Worth checking out: www.mongodb.com
In general, the answer is going to be "No, you're changing the structure of the table, which potentially will require a lot of updates", and I definitely agree with that. If you expect to be doing this often, then I'll offer an alternative to "dummy" columns - use VIEWs instead of tables for SELECTing data. IIRC, changing the definition of a view is relatively lightweight, and the indirection through a view is done when the query plan is compiled. The expense is that you would have to add the column to a new table and make the view JOIN in the column.
Of course this only works if you can use foreign keys to perform cascading of deletes and whatnot. The other bonus is that you can create a new table containing a combination of the data and point the view to it without disturbing client usage.
Just a thought.
The difference between Postgres and MySQL in this regard is that Postgres doesn't re-create the table, but modifies the data dictionary, which is similar to Oracle. Therefore the operation is fast, while it still requires an exclusive DDL table lock for a very short time, as stated above by others.
In MySQL the operation will copy the data to a new table while blocking transactions, which had been the main pain for MySQL DBAs prior to v5.6.
The good news is that since the MySQL 5.6 release the restriction has been mostly lifted and you can now enjoy the true power of the MySQL DB.
As SeanDowney has mentioned, pt-online-schema-change is one of the best tools to do what you have described in the question here. I recently did a lot of schema changes on a live DB and it went pretty well. You can read more about it on my blog post here: http://mrafayaleem.com/2016/02/08/live-mysql-schema-changes-with-percona/.
Dummy columns are a good idea if you can predict their type (and make them nullable). Check how your storage engine handles nulls.
MyISAM will lock everything if you even mention a table name in passing, on the phone, at the airport. It just does that...
That being said, locks aren't really that big a deal; as long as you are not trying to add a default value for the new column to every row, but let it sit as null, and your storage engine is smart enough not to go writing it, you should be ok with a lock that is only held long enough to update the metadata. If you do try to write a new value, well, you are toast.
TokuDB can add/drop columns and add indexes "hot", the table is fully available throughout the process. It is available via www.tokutek.com
Not really.
You ARE altering the underlying structure of the table, after all, and that's a bit of information that's quite important to the underlying system. You're also (likely) moving much of the data around on disk.
If you plan on doing this a lot, you're better off simply padding the table with "dummy" columns that are available for future use.