I'm planing a webproject, containing 4 websites build in MVC3. As a databaseserver I'm going to use the ms sql server.
Each of this websites will have something arround 40 tables. But some of the tables are shared between the websites:
Contact, Cities, Postalcodes, Countries...
How to handle this? should I put all the tables of each database into a common database (so that the database of website 1,2,3 and website 4 are in one databse together). Or should I create one database containing shared datase?
But then I think I'm getting problems with the data consitency, because I think there is no way to point from one database to an other (linking for example the citytable in database one to the buldingtable in databse 2).
Any ideas?
Thanks a lot!
What I like about splitting it out into separate databases is that if each web site has its own database, and one of those web sites gets extremely popular, it is very easy to just move their database to a different, more powerful database server and not much has to change except (a) you need to reference the central "control" data remotely (or replicate/mirror/etc), and (b) you point that web site at a different database server. Another benefit is that if two web sites have the same types of tables (e.g. Patients), you don't have to have tables like Patients_WebSite1, Patients_WebSite2, with different stored procedures that are identical except for table names (or ugly dynamic SQL procedures that paste the table name in). Separated out you can have the exact same schema and the exact same codebase without having to combine everyone's data into a single table.
If you mix the data within a single database, data consistency is easier, and the whole setup is slightly simpler, but splitting it out when you grow is a lot tougher. If you split it out into different databases, no you won't be able to enforce referential integrity using standard DRI (foreign keys). You can accomplish this in other ways if it is important (triggers, validation before insert/update, etc).
Related
What are some ways that these two databases can [INSERT INTO] Between each other or use commands or display the data in the table of the two databases ?
Not directly, but there are ways...
What you're asking about is called "cross database queries".
Each database in PostgreSQL has its own system tables and ways of keeping itself organized. Queries between two databases can break this, even for databases hosted by the same database server.
But there are ways to achieve what you want
Single database, multiple schemas
Instead of two databases, you can run one database with two schemas. The keeps the tables, views, etc separated and easier to maintain, and allows queries between the two. It also allows security and data isolation for users who are only allowed to access one of the schemas.
You're actually already using a schema in PostgreSQL called "public"; adding more schemas simply extends this.
See the documentation.
Foreign Data Wrappers
Foreign Data Wrappers (fdw) allow you to "link" a schema (or just tables, if you prefer) in another database. See the documentation for CREATE SERVER, and this seems like a pretty clear blog post on the subject.
Note that Foreign Data Wrappers will allow you to link to other databases than just PostgreSQL e.g. Oracle, SQL Server, MySQL, and lots more. See here.
I have a scenario, my application is a SAAS based app catering to multiple clients. Data Integrity to clients is very essential.
Is it better to keep my Tables
Client specific
OR
Relational Tables
For Ex: I have a mapping table with fields MapField1,MapField2. I need this kind of data for each client.
Should I have tables like MappingData_
or a Single Table with mapping to the ClientId
MappingData with Fields MapField1,MapField2,ClientId
I would have a separate database for each customer. (Multiple databases in a single SQL Server instance.)
This would allow you to design it once, with a single schema.
No dynamically named tables compromising test & development
Upgrades and maintenance can be designed and tested in one DB, then rolled out to all
A single customer's data can be backed-up, restored or dropped exceedingly simply
Bugs discovered/exploited in one DB won't comprise the integrity of other DBs
Data access (read and write) can be managed using SQL Logins (No re-inventing the wheel)
If there is a need for globally shared data, that would go in another database, with it's own set of permissions for the different SQL Logins.
The use of a single database, with all users in it is my next best choice. You still have a single schema. But you don't get to partition the customers' data, you need to manage access rights and permissions yourself, and a whole host of other additional design and testing work.
I would never go near dynamically creating new tables for additional customers. A new table name means all your queries need to be updated with the new table name, and a whole host of other maintenance head-aches.
I'm pretty much of the opinion that if you want to create tables dynamically during the Business As Usual use of an application/service, you've designed it badly.
SO has a tag for the thing you're describing: "multi-tenant".
Visualize the architecture for supporting a multi-tenant database application as a spectrum. At one extreme of the spectrum is "shared nothing", which means each tenant has its own database. At the other extreme of the spectrum is "shared everything", which means tenants share tables, and each row in each table belongs to one tenant. (Each row contains a tenant identifier.)
Terminology seems to overlap, so read carefully. What one writer means by shared schema might be identical to what another writer means by shared everything.
This SO answer, also written by me, describes the differences and the tradeoffs in terms of cost, data isolation and protection, maintenance, and disaster recovery. It also links to a fairly good introductory article.
I have a database server with few main databases, and few dozens of small ones.
These small databases are kind of intermediary/staging databases for data import from various sources into main database. Data import is a daily task. They are all quite similar in structure as the implementation of these data imports are similar, so basically they have a configuration tables, which define mapping, conversions etc, and the data tables, which contain the results of the import.
Some time ago there have been only the handful of small ones, but now I have more then 20 of them will grow further with the number of supported data feeds.
I have just migrated all the server environment to SQL Server 2008, and having some time now for clean-up/refactoring, I am thinking to merge all of data-import databases into just one database, and use database schema to separate them.
Question-0: Any other ideas for the described situation?
Question-1: Shall I change from a separate database to a separate schema?
Question-2: !!!: Any tricky thing to be careful about in database schema implementation?
Edit-1: highlighted question-2 as the most 'unanswered' currently.
In your instance, I would probably put merge the databases into one. I don't really see a reason to have them separated, and merging them will reduce the amount of work you have to do to support backups etc. If you were importing data from a data source once and then never using the staging tables again, I could see the reason to bring up separate databases to handle the data transformation. Since you use these tables on an ongoing basis, I would much rather keep them together so that I only have to go to one place to find the full end to end state of the production data and the data load states.
2008 is really good at handling database partitioning too, if the db gets too large, or you need to separate data for security reasons you get the benefit of having a single db with the advantages like having several smaller ones. You won't get that with multiple smaller dbs.
When we migrated we had a very similar situation and I ended up moving everything into one some-what large Importing database like you have hinted towards. We did not, however, separate them using schemas.
Because the database is the unit of referential integrity and backup, if you are bringing in large amounts of data for staging which does not need to be backed up on the same schedule, it might be easiest to keep it in a separate DB.
You can use a single DB with multiple file groups and different backups, but it will require a lot more design.
The basic factors this will depend on are: recovery model, backup objectives, usage patterns and amount of effort to design and maintain your file group design.
All the prior answers work for me, particularly your comment about selectively combining databases -- if some are very busy, very large, or process sensitive data, you might want to keep them separate, or in separate groupings. This would make it easier to configure backups/restores and disk/drive allocation (give the busy ones their own set of spindles).
Like possibly most database developers, I have dealt almost exclusively with objects in the dbo schema, but I have done some recent work with other schemas. The main gotcha I've encountered is remembering to always specify the schema when referring to any database object. Never assume that any given connection will reference an object in the schema you want it to--always be clear and precise!
I would put all your import staging tables in one database separate from your regular production databse as the backup needs may be very different. This database should also contains things like your configuration management for SSIS packages, any logging tables, any import metadata tables (we keep track of every run of the imports and the status of that run as well as a bazillion other things about the import like the filename, the normal file size, etc. Comes in handy for researching problems and for adding checks to the processing. We usea a schema that is by client and then an additional schema for objects realted to the importing/exporting process (logs, meta data etc.)
We have a SQL server that has a database for each client, and we have hundreds of clients. So imagine the following: database001, database002, database003, ..., database999. We want to combine all of these databases into one database.
Our thoughts are to add a siteId column, 001, 002, 003, ..., 999.
We are exploring options to make this transition as smoothly as possible. And we would LOVE to hear any ideas you have. It's proving to be a VERY challenging problem.
I've heard of a technique that would create a view that would match and then filter.
Any ideas guys?
Create a client database id for each of the client databases. You will use this id to keep the data logically separated. This is the "site id" concept, but you can use a derived key (identity field) instead of manually creating these numbers. Create a table that has database name and id, with any other metadata you need.
The next step would be to create an SSIS package that gets the ID for the database in question and adds it to the tables that have to have their data separated out logically. You then can run that same package over each database with the lookup for ID for the database in question.
After you have a unique id for the data that is unique, and have imported the data, you will have to alter your apps to fit the new schema (actually before, or you are pretty much screwed).
If you want to do this in steps, you can create views or functions in the different "databases" so the old client can still hit the client's data, even though it has been moved. This step may not be necessary if you deploy with some downtime.
The method I propose is fairly flexible and can be applied to one client at a time, depending on your client application deployment methodology.
Why do you want to do that?
You can read about Multi-Tenant Data Architecture and also listen to SO #19 (around 40-50 min) about this design.
The "site-id" solution is what's done.
Another possibility that may not work out as well (but is still appealing) is multiple schemas within a single database. You can pull common tables into a "common" schema, and leave the customer-specific stuff in customer-specific schema. In some database products, however, the each schema is -- effectively -- a separate database. In other products (Oracle, DB2, for example) you can easily write queries that work in multiple schemas.
Also note that -- as an optimization -- you may not need to add siteId column to EVERY table.
Sometimes you have a "contains" relationship. It's a master-detail FK, often defined with a cascade delete so that detail cannot exist without the parent. In this case, the children don't need siteId because they don't have an independent existence.
Your first step will be to determine if these databases even have the same structure. Even if you think they do, you need to compare them to make sure they do. Chances are there will be some that are customized or missed an upgrade cycle or two.
Now depending on the number of clients and the number of records per client, your tables may get huge. Are you sure this will not create a performance problem? At any rate you may need to take a fresh look at indexing. You may need a much more powerful set of servers and may also need to partion by client anyway for performance.
Next, yes each table will need a site id of some sort. Further, depending on your design, you may have primary keys that are now no longer unique. You may need to redefine all primary keys to include the siteid. Always index this field when you add it.
Now all your queries, stored procs, views, udfs will need to be rewritten to ensure that the siteid is part of them. PAy particular attention to any dynamic SQL. Otherwise you could be showing client A's information to client B. Clients don't tend to like that. We brought a client from a separate database into the main application one time (when they decided they didn't still want to pay for a separate server). The developer missed just one place where client_id had to be added. Unfortunately, that sent emails to every client concerning this client's proprietary information and to make matters worse, it was a nightly process that ran in the middle of the night, so it wasn't known about until the next day. (the developer was very lucky not to get fired.) The point is be very very careful when you do this and test, test, test, and test some more. Make sure to test all automated behind the scenes stuff as well as the UI stuff.
what I was explaining in Florence towards the end of last year is if you had to keep the database names and the logical layer of the database the same for the application. In that case you'd do the following:
Collapse all the data into consolidated tables into one master, consolidated database (hereafter referred to as the consolidated DB).
Those tables would have to have an identifier like SiteID.
Create the new databases with the existing names.
Create views with the old table names which use row-level security to query the tables in the consolidated DB, but using the SiteID to filter.
Set up the databases for cross-database ownership chaining so that the service accounts can't "accidentally" query the base tables in the consolidated DB. Access must happen through the views or through stored procedures and other constructs that will enforce row-level security. Now, if it's the same service account for all sites, you can avoid the cross DB ownership chaining and assign the rights on the objects in the consolidated DB.
Rewrite the stored procedures to either handle the change (since they are now referring to views and they don't know to hit the base tables and include SiteID) or use InsteadOf Triggers on the views to intercept update requests and put the appropriate site specific information into the base tables.
If the data is large you could look at using a partioned view. This would simplify your access code as all you'd have to maintain is the view; however, if the data is not large, just add a column to identify the customer.
Depending on what the data is and your security requirements the threat of cross contamination may be a show stopper.
Assuming you have considered this and deem it "safe enough". You may need/want to create VIEWS or impose some other access control to prevent customers from seeing each-other's data.
IIRC a product called "Trusted Oracle" had the ability to partition data based on such a key (about the time Oracle 7 or 8 was out). The idea was that any given query would automagically have "and sourceKey = #userSecurityKey" (or some such) appended. The feature may have been rolled into later versions of the popular commercial product.
To expand on Gregory's answer, you can also make a parent ssis that calls the package doing the actual moving within a foreach loop container.
The parent package queries a config table and puts this in an object variable. The foreach loop then uses this recordset to pass variables to the package, such as your database name and any other details the package might need.
You table could list all of your client databases and have a flag to mark when you are ready to move them. This way you are not sitting around running the ssis package on 32,767 databases. I'm hooked on the foreach loop in ssis.
Most questions about tenancy are centered around multi-tenancy database design issues. I want to know about single tenancy but multiple applications. The software I'm developing allows for a single user to create, from a single code base, multiple applications(I call them "sections"):
user could create a blog inside domain.com/application-blog1, another blog on domain.com/application-blog2.
I've already decided for a single database for everything But I am undecided whether or not I should use multiple tables for different application instances or the same table, maybe with a "sectionId" field to distinguish between them.
I'm using mysql and myisam tables. Could storing everything inside the same table lead to locking issues in the case of having many application instances running?
What's your experience on the subject?
I don't think people will typically use multiple tables in the same database. If you have multiple instances of the same application, you often have separate databases - typically only if new instances are created administratively, rather than through end-user actions. In this case, you'ld put the name of the database into a configuration file, and have the software connect to the right database.
In your case, I would go for the single-schema single-database approach, using sectionIds. This really is the same as multi-tenancy, perhaps minus the need to do access control.
You will of course have locking across concurrent transactions. However, this should never cause problems, since transactions for different sections won't operate on records in a conflicting manner (except when new sections are created - you'll probably have another table telling you what sections you have).