Two scenarios:
Large application - One database w/all tables
or
Large application - Multiple databases w/relevant tables
Can anyone list the advantages/disadvantages?
Unless you have very specific reasons, keep everything in one single database. I cannot think of a single advantage of splitting one single schema into multiple DBs.
A single database is a single unit of recovery, which allows for a consistent backup. It is also a single unit of failover for high availability. With multiple databases you cannot take a mutually consistent backup unless you freeze activity (often impossible). Multiple databases also make it hard to orchestrate a 'group' failover in case of failure (some DBs may fail over to a new server, while others stay behind).
Multiple databases offer advantages in multi-tenant models where each tenant can have its own database, especially if tenants may choose or opt in to version upgrades (this is impossible with a single DB). But this is a scenario with many databases having the same schema (same tables in every database), not splitting a schema across several DBs.
Scale out by data partitioning (sharding) can only be achieved by having multiple databases, but that is a different topic from splitting a database into 'parts' (each DB with a different schema). Shards have identical schema, but contain data for specific ranges.
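To make the distinction concrete, here is a minimal sketch of range-based shard routing. The shard names, the bounds, and the choice of a customer id as the partitioning key are all assumptions made for illustration; every shard is presumed to hold the same tables.

```python
import bisect

# Hypothetical layout: each shard is a separate database with the SAME schema,
# holding one contiguous range of customer ids.
# SHARD_UPPER_BOUNDS[i] is the exclusive upper bound of the range in SHARD_NAMES[i].
SHARD_UPPER_BOUNDS = [1000, 2000, 3000]
SHARD_NAMES = ["shard_a", "shard_b", "shard_c"]


def shard_for(customer_id: int) -> str:
    """Return the name of the shard database that owns customer_id."""
    if customer_id < 0:
        raise ValueError("customer_id must be non-negative")
    # bisect_right returns the index of the first bound strictly greater
    # than customer_id, which is the shard whose range contains it.
    index = bisect.bisect_right(SHARD_UPPER_BOUNDS, customer_id)
    if index == len(SHARD_NAMES):
        raise ValueError(f"no shard covers customer_id {customer_id}")
    return SHARD_NAMES[index]
```

The application would call shard_for before opening a connection. The query text itself does not change from shard to shard, which is exactly what separates sharding from splitting a schema into differently shaped databases.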
Related
I'm creating a Microsoft SQL server that initially only served one client but am now looking to have many (Up to several thousand if things go well). The entire structure will be the same for each client with only the data within each table being client specific.
I am thinking of adding ClientID to almost all tables and referencing this in all functions (basically a where ClientID = @ClientID on every statement), along with a Clients table that gains a new entry for every new client.
The alternative being a create database [Client_Name] script that is fired whenever a new client joins the server to create another client specific database and all its associated structure and procedures.
Is there any advantage performance wise to either option?
The decision on how to structure such a database should not be made only on performance issues. In fact, that is probably the least of the issues. Some things to consider:
How will you manage updates to your application? Multiple databases can make this easier or harder.
Will individual clients have customizations? This favors multiple databases.
What are the security requirements for the data? This can go either way.
What are the replication and recovery requirements for the data? This would tend to be easier with one database, but not in all scenarios.
Will concurrent usage by different clients interfere with each other?
Will clients be responsible for managing their own data or is this part of your offering?
Is any data shared among clients? How will you maintain common reference tables?
In general, performance is going to be better with a single database (think half-filled data pages occupying memory). Maintenance and development will be easier with a single database (managing multiple client databases is cumbersome). But actual requirements on the application should be driving such a decision.
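For reference, the shared-database option from the question could look roughly like the sketch below. SQLite (via Python's standard library) stands in for SQL Server, and the Orders table, its columns, and the index name are assumptions made for illustration, not part of the original design.

```python
import sqlite3

# SQLite stands in for SQL Server; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Clients (ClientID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Orders (
        OrderID  INTEGER PRIMARY KEY,
        ClientID INTEGER NOT NULL REFERENCES Clients(ClientID),
        Item     TEXT NOT NULL
    );
    -- Nearly every query filters by client, so index that column.
    CREATE INDEX IX_Orders_ClientID ON Orders (ClientID);
    """
)
conn.executemany("INSERT INTO Clients VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany(
    "INSERT INTO Orders (ClientID, Item) VALUES (?, ?)",
    [(1, "widget"), (1, "gear"), (2, "sprocket")],
)
conn.commit()


def orders_for_client(client_id: int) -> list[str]:
    """Return one client's items; the ClientID predicate is never optional."""
    rows = conn.execute(
        "SELECT Item FROM Orders WHERE ClientID = ? ORDER BY OrderID",
        (client_id,),
    )
    return [item for (item,) in rows]
```

Funnelling all data access through functions like orders_for_client is one way to avoid forgetting the filter. A single query without it would expose every client's rows, which is the security concern raised above.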
I have a scenario: my application is a SaaS-based app catering to multiple clients. Data integrity for each client is essential.
Is it better to keep my Tables
Client specific
OR
Relational Tables
For Ex: I have a mapping table with fields MapField1,MapField2. I need this kind of data for each client.
Should I have tables like MappingData_
or a Single Table with mapping to the ClientId
MappingData with Fields MapField1,MapField2,ClientId
I would have a separate database for each customer. (Multiple databases in a single SQL Server instance.)
This would allow you to design it once, with a single schema.
No dynamically named tables compromising test & development
Upgrades and maintenance can be designed and tested in one DB, then rolled out to all
A single customer's data can be backed-up, restored or dropped exceedingly simply
Bugs discovered/exploited in one DB won't compromise the integrity of other DBs
Data access (read and write) can be managed using SQL Logins (No re-inventing the wheel)
If there is a need for globally shared data, that would go in another database, with its own set of permissions for the different SQL Logins.
The use of a single database, with all users in it is my next best choice. You still have a single schema. But you don't get to partition the customers' data, you need to manage access rights and permissions yourself, and a whole host of other additional design and testing work.
I would never go near dynamically creating new tables for additional customers. A new table name means all your queries need to be updated with the new table name, and a whole host of other maintenance headaches.
I'm pretty much of the opinion that if you want to create tables dynamically during the Business As Usual use of an application/service, you've designed it badly.
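As a rough illustration of the database-per-customer approach, the sketch below provisions each customer from one schema script and rolls an upgrade out to all of them. It uses in-memory SQLite databases as stand-ins for per-customer SQL Server databases. The function names, the customer names, and the Notes column are hypothetical.

```python
import sqlite3

# The single schema, designed once and applied to every customer database.
SCHEMA = """
CREATE TABLE MappingData (MapField1 TEXT NOT NULL, MapField2 TEXT NOT NULL);
"""

# One connection per customer; ":memory:" stands in for a per-customer database.
customer_dbs: dict[str, sqlite3.Connection] = {}


def provision_customer(name: str) -> sqlite3.Connection:
    """Create a new customer database and apply the shared schema to it."""
    if name in customer_dbs:
        raise ValueError(f"customer {name!r} already exists")
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    customer_dbs[name] = conn
    return conn


def roll_out(upgrade_sql: str) -> None:
    """Apply an upgrade script, tested once, to every customer database."""
    for conn in customer_dbs.values():
        conn.executescript(upgrade_sql)


provision_customer("acme")
provision_customer("globex")
customer_dbs["acme"].execute("INSERT INTO MappingData VALUES ('a', 'b')")
customer_dbs["acme"].commit()
roll_out("ALTER TABLE MappingData ADD COLUMN Notes TEXT;")
```

Table names never vary by customer, so queries and procedures stay identical everywhere. Only the connection target changes.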
SO has a tag for the thing you're describing: "multi-tenant".
Visualize the architecture for supporting a multi-tenant database application as a spectrum. At one extreme of the spectrum is "shared nothing", which means each tenant has its own database. At the other extreme of the spectrum is "shared everything", which means tenants share tables, and each row in each table belongs to one tenant. (Each row contains a tenant identifier.)
Terminology seems to overlap, so read carefully. What one writer means by shared schema might be identical to what another writer means by shared everything.
This SO answer, also written by me, describes the differences and the tradeoffs in terms of cost, data isolation and protection, maintenance, and disaster recovery. It also links to a fairly good introductory article.
I'm planning a web project containing 4 websites built in MVC3. As the database server I'm going to use MS SQL Server.
Each of these websites will have around 40 tables. But some of the tables are shared between the websites:
Contact, Cities, Postalcodes, Countries...
How should I handle this? Should I put the tables of all four websites into one common database? Or should I create a separate database containing only the shared data?
But then I think I'd get problems with data consistency, because I think there is no way to point from one database to another (for example, linking the city table in database 1 to the building table in database 2).
Any ideas?
Thanks a lot!
What I like about splitting it out into separate databases is that if each web site has its own database, and one of those web sites gets extremely popular, it is very easy to just move their database to a different, more powerful database server and not much has to change except (a) you need to reference the central "control" data remotely (or replicate/mirror/etc), and (b) you point that web site at a different database server. Another benefit is that if two web sites have the same types of tables (e.g. Patients), you don't have to have tables like Patients_WebSite1, Patients_WebSite2, with different stored procedures that are identical except for table names (or ugly dynamic SQL procedures that paste the table name in). Separated out you can have the exact same schema and the exact same codebase without having to combine everyone's data into a single table.
If you mix the data within a single database, data consistency is easier, and the whole setup is slightly simpler, but splitting it out when you grow is a lot tougher. If you split it out into different databases, no you won't be able to enforce referential integrity using standard DRI (foreign keys). You can accomplish this in other ways if it is important (triggers, validation before insert/update, etc).
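A minimal sketch of the "validation before insert" option follows, assuming two independent databases: one for shared reference data and one for a single web site. SQLite stands in for SQL Server, and the Cities and Buildings tables and the add_building helper are hypothetical names chosen to match the question.

```python
import sqlite3

# Two independent databases: a declared foreign key cannot span them.
shared_db = sqlite3.connect(":memory:")  # hypothetical shared reference data
site_db = sqlite3.connect(":memory:")    # hypothetical database of one web site

shared_db.execute(
    "CREATE TABLE Cities (CityID INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
)
shared_db.execute("INSERT INTO Cities VALUES (1, 'Zurich')")
shared_db.commit()
site_db.execute(
    "CREATE TABLE Buildings ("
    "BuildingID INTEGER PRIMARY KEY, CityID INTEGER NOT NULL, Name TEXT NOT NULL)"
)


def add_building(name: str, city_id: int) -> None:
    """Insert a building only if its city exists in the shared database."""
    found = shared_db.execute(
        "SELECT 1 FROM Cities WHERE CityID = ?", (city_id,)
    ).fetchone()
    if found is None:
        raise ValueError(f"unknown CityID {city_id}")
    with site_db:  # commits on success, rolls back on error
        site_db.execute(
            "INSERT INTO Buildings (CityID, Name) VALUES (?, ?)", (city_id, name)
        )
```

This check is weaker than a real foreign key. It is not atomic with the insert, and nothing stops a city from being deleted later while buildings still reference it. That gap is part of the price of splitting the data.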
I have two applications using two nearly identical MySQL databases within the same cluster. Some tables must contain separate data, but others should hold identical contents (i.e. all writes and rows in db1.tbl should be accessible in db2.tbl and vice versa).
What's the proper way to go about this? Note that the applications use hardcoded table (but not database) names, so simply telling application 2 to access db1.tbl is not an option.
What you need to do is set up replication for the tables that you need. See http://dev.mysql.com/doc/refman/5.0/en/replication.html for the documentation on setting up replication in MySQL.
For databases on different mysqld processes
You should check the official manual for replicating individual tables:
http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#option_mysqld_replicate-do-table
You can set up a Master-Master relation between the two mysqld processes; just be careful to keep your primary keys unique across both.
For databases residing on the same server & mysqld service
IMHO design wise you should consider the idea of moving all your shared tables under a different DB.
This way you will avoid all the overkill of triggers for updating them.
I have a database server with few main databases, and few dozens of small ones.
These small databases are kind of intermediary/staging databases for data import from various sources into main database. Data import is a daily task. They are all quite similar in structure as the implementation of these data imports are similar, so basically they have a configuration tables, which define mapping, conversions etc, and the data tables, which contain the results of the import.
Some time ago there were only a handful of small ones, but now I have more than 20 of them, and the number will grow further with the number of supported data feeds.
I have just migrated all the server environment to SQL Server 2008, and having some time now for clean-up/refactoring, I am thinking to merge all of data-import databases into just one database, and use database schema to separate them.
Question-0: Any other ideas for the described situation?
Question-1: Shall I change from a separate database to a separate schema?
Question-2: !!!: Any tricky thing to be careful about in database schema implementation?
Edit-1: highlighted question-2 as the most 'unanswered' currently.
In your instance, I would probably merge the databases into one. I don't really see a reason to have them separated, and merging them will reduce the amount of work you have to do to support backups etc. If you were importing data from a data source once and then never using the staging tables again, I could see the reason to bring up separate databases to handle the data transformation. Since you use these tables on an ongoing basis, I would much rather keep them together so that I only have to go to one place to find the full end-to-end state of the production data and the data load states.
SQL Server 2008 is also really good at handling partitioning: if the db gets too large, or you need to separate data for security reasons, you keep the benefit of having a single db while getting most of the advantages of several smaller ones. You won't get that with multiple smaller dbs.
When we migrated we had a very similar situation and I ended up moving everything into one somewhat large importing database, as you have hinted at. We did not, however, separate them using schemas.
Because the database is the unit of referential integrity and backup, if you are bringing in large amounts of data for staging which does not need to be backed up on the same schedule, it might be easiest to keep it in a separate DB.
You can use a single DB with multiple file groups and different backups, but it will require a lot more design.
The basic factors this will depend on are: recovery model, backup objectives, usage patterns and amount of effort to design and maintain your file group design.
All the prior answers work for me, particularly your comment about selectively combining databases -- if some are very busy, very large, or process sensitive data, you might want to keep them separate, or in separate groupings. This would make it easier to configure backups/restores and disk/drive allocation (give the busy ones their own set of spindles).
Like possibly most database developers, I have dealt almost exclusively with objects in the dbo schema, but I have done some recent work with other schemas. The main gotcha I've encountered is remembering to always specify the schema when referring to any database object. Never assume that any given connection will reference an object in the schema you want it to--always be clear and precise!
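The gotcha can be reproduced in a small sketch. This one uses SQLite rather than SQL Server, treating an attached database named staging as a loose analogue of a second schema. The Config table is hypothetical, and SQL Server's own name resolution (via the user's default schema) differs in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite analogue of a second schema: an attached database named "staging".
conn.execute("ATTACH DATABASE ':memory:' AS staging")
conn.execute("CREATE TABLE main.Config (Value TEXT)")
conn.execute("CREATE TABLE staging.Config (Value TEXT)")
conn.execute("INSERT INTO main.Config VALUES ('from main')")
conn.execute("INSERT INTO staging.Config VALUES ('from staging')")
conn.commit()

# Unqualified: which Config is read depends on the engine's resolution rules.
unqualified = conn.execute("SELECT Value FROM Config").fetchone()[0]
# Qualified: the statement says exactly which object it means.
qualified = conn.execute("SELECT Value FROM staging.Config").fetchone()[0]
```

Here the unqualified query quietly reads main.Config even though the staging table may have been intended. Nothing fails, which is why the mistake is easy to miss.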
I would put all your import staging tables in one database, separate from your regular production database, as the backup needs may be very different. This database should also contain things like your configuration management for SSIS packages, any logging tables, and any import metadata tables (we keep track of every run of the imports and the status of that run, as well as a bazillion other things about the import like the filename, the normal file size, etc.). This comes in handy for researching problems and for adding checks to the processing. We use one schema per client, plus an additional schema for objects related to the importing/exporting process (logs, metadata, etc.).
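A minimal sketch of such an import metadata table follows, with SQLite standing in for SQL Server. The ImportRun table, its columns, the helper functions, and the 50% tolerance are all assumptions for illustration, not the schema described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical metadata table: one row per import run.
conn.execute(
    """
    CREATE TABLE ImportRun (
        RunID      INTEGER PRIMARY KEY,
        FeedName   TEXT NOT NULL,
        FileName   TEXT NOT NULL,
        FileSizeKB INTEGER NOT NULL,
        Status     TEXT NOT NULL
                   CHECK (Status IN ('running', 'succeeded', 'failed'))
    )
    """
)


def start_run(feed: str, file_name: str, size_kb: int) -> int:
    """Record the start of an import run and return its RunID."""
    with conn:
        cur = conn.execute(
            "INSERT INTO ImportRun (FeedName, FileName, FileSizeKB, Status) "
            "VALUES (?, ?, ?, 'running')",
            (feed, file_name, size_kb),
        )
    return cur.lastrowid


def finish_run(run_id: int, succeeded: bool) -> None:
    """Mark a run as succeeded or failed."""
    status = "succeeded" if succeeded else "failed"
    with conn:
        conn.execute(
            "UPDATE ImportRun SET Status = ? WHERE RunID = ?", (status, run_id)
        )


def size_looks_unusual(feed: str, size_kb: int, tolerance: float = 0.5) -> bool:
    """Flag a file whose size deviates from the feed's average successful
    file size by more than the given fraction."""
    avg = conn.execute(
        "SELECT AVG(FileSizeKB) FROM ImportRun "
        "WHERE FeedName = ? AND Status = 'succeeded'",
        (feed,),
    ).fetchone()[0]
    if avg is None:  # no history yet, nothing to compare against
        return False
    return abs(size_kb - avg) > tolerance * avg
```

A check like size_looks_unusual is the kind of pre-processing guard that the recorded "normal file size" makes possible.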