What strategies are available for migrating Access databases to SQL server-based applications? - sql

I'm considering undertaking a project to migrate a very large MS Access application to a new system based on SQL Server. The existing system is essentially an ERP application with a couple of dozen users, all sharing the Access database over the network. The database has around 300 tables and lots of messy VBA code. This system is beginning to break down (actually, it's amazing it has worked as long as it has).
Due to the size and complexity of the Access application, a 'big bang' approach is not really feasible. It seems sensible to rope off chunks of functionality and migrate them piecemeal to the new system. During the migration process, which I expect to take several months, there may be a need for both databases to be in operation and be able to query and modify data in both systems.
I have considered using something like the ADO.NET Entity Framework to implement a data abstraction layer to handle this, but as far as I can tell, the Entity Framework has no Access provider.
Does my approach seem reasonable? What other strategies have people used to accomplish similar goals?

You may find that the main problem is using the MS Access JET engine as the backend. I'm assuming that you do have an Access FE (frontend) with all objects except tables, and a BE (backend - tables only).
You may find that migrating the data to SQL Server, and linking the Access FE to that, would help alleviate problems immediately.
Then, if you don't want to continue to use MS Access as the FE, you could consider breaking it up into 'modules', and redesign modules one by one using a separate development platform.

We faced a similar situation a few years ago, but we knew from the beginning that we would have to switch to SQL Server one day, so all the code was written to work from an Access client against both Access and SQL Server databases.
The idea of a 'one-step' migration to SQL Server is certainly the easiest to manage on the database side, and there are many tools for that. But, depending on the way your client app talks to the database, your code might then not work properly. If, for example, your code includes a lot of SQL instructions (or generates them on the fly by, for example, adding filters to SELECT instructions), your syntax might not be SQL Server compatible: Access wildcards (* and ?, where SQL Server expects % and _), date literals (#...# delimiters) and Access-specific functions will not work on SQL Server.
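To give an idea of the kind of rewriting involved, here is a minimal Python sketch that translates a few Access idioms in a WHERE clause to their T-SQL form. The function name is made up for illustration, it only handles ISO-style #yyyy-mm-dd# dates, double-quoted strings and the * and ? wildcards after LIKE, and it is a naive regex pass rather than a real SQL parser.

```python
import re

def access_filter_to_tsql(where_clause):
    """Naively translate a few Access (Jet) SQL idioms to T-SQL.

    Hypothetical helper: covers only #yyyy-mm-dd# date literals,
    double-quoted strings, and * / ? wildcards inside the string
    literal that follows LIKE. Anything else passes through unchanged.
    """
    # Access date literal #2009-12-31# -> T-SQL string literal '2009-12-31'
    sql = re.sub(r"#(\d{4}-\d{2}-\d{2})#", r"'\1'", where_clause)
    # Access accepts "text" as a string; T-SQL expects 'text'
    sql = re.sub(r'"([^"]*)"', r"'\1'", sql)

    def fix_wildcards(match):
        # Only touch the pattern literal, not the rest of the clause
        pattern = match.group(2).replace("*", "%").replace("?", "_")
        return f"{match.group(1)}'{pattern}'"

    return re.sub(r"(\bLIKE\s+)'([^']*)'", fix_wildcards, sql, flags=re.IGNORECASE)
```

Access-specific functions (IIf, Nz, and so on) are not covered here and would each need their own mapping.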
In addition to this, and as said by @mjv, the other drawback of a one-time switch to MS SQL is that you will inherit many of the problems of the original database: wrong or inappropriate field names, inappropriate primary/foreign key policies, hidden one-to-many relations that you'd like to implement in the new database model, etc.
I'll propose here some principles and rules to implement a 'soft transition' solution, which clearly fits you best. It's not going to be easy, but it's definitely very interesting, particularly when dealing with 300 tables! Lucky you!
I assume here that you have the ability to update the client code, and that you'd prefer to keep the same client interface at all times. It is of course possible to have two different interfaces during the transition, one for each database, but this will be very confusing for the users, and a permanent source of frustration for them.
In my opinion, the best solution strongly depends on:
The original connection technology and the way data is managed in your client's code: Access linked tables, ODBC, ADODB, recordsets, local tables, form record sources, batch updating, etc.
The possibility of splitting your tables and your app into 'mostly independent' modules.
And you will not be spared the following mandatory activities:
Set up a transfer procedure from the Access database to SQL Server. You can use existing tools (the Access upsizing wizard is very poor, so do not hesitate to buy a real one, like SSW or EMS SQL Manager, which are very powerful) or build your own with Visual Basic. If your plan is to make some changes in the data definition, you'll definitely have to write some code. Keep in mind that you will run this code many, many times, so make sure that it includes all the time-saving instructions that will allow you to restart the process from the start as many times as you want. You will have to choose between 2 basic strategies when importing data:
a - DELETE existing record, then INSERT imported record
b - UPDATE existing record from imported record
If you plan to switch to new Primary\foreign key types, you'll have to keep track of old identifiers in your new database model during the transition period. Do not hesitate to switch to GUID Primary Keys at this stage, especially if the plan is to replicate data on multiple sites one of these days.
This transfer procedure will be divided into modules corresponding to the 'logical' modules defined previously, and you should be able to run any of these modules independently (keeping of course in mind that they'll probably have to be run in a specific order, where the 'customers' module has to run before the 'invoicing' module).
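As a rough illustration of a rerunnable transfer module using strategy (b), here is a minimal Python sketch. It uses two sqlite3 connections as stand-ins for the Access source and the SQL Server target; the table and column names (customers, old_id) are assumptions made up for the example. The old identifier is kept next to a new GUID key so the module can be restarted as often as needed without creating duplicates.

```python
import sqlite3
import uuid

def transfer_customers(source, target):
    """Copy customers from source to target; safe to rerun from the start.

    Strategy (b): UPDATE the record if its old identifier was already
    imported, otherwise INSERT it with a fresh GUID primary key.
    Both arguments are sqlite3 connections standing in for the real databases.
    """
    target.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        " id TEXT PRIMARY KEY,"       # new GUID key
        " old_id INTEGER UNIQUE,"     # original key, kept during the transition
        " name TEXT NOT NULL)"
    )
    count = 0
    for old_id, name in source.execute("SELECT id, name FROM customers"):
        cur = target.execute(
            "UPDATE customers SET name = ? WHERE old_id = ?", (name, old_id)
        )
        if cur.rowcount == 0:
            # First time this record is seen: give it its new identifier
            target.execute(
                "INSERT INTO customers (id, old_id, name) VALUES (?, ?, ?)",
                (str(uuid.uuid4()), old_id, name),
            )
        count += 1
    target.commit()
    return count
```

Strategy (a), DELETE then INSERT, would be simpler to write, but it would hand out a new GUID on every run unless the mapping from old to new identifiers is stored somewhere else.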
Implement in your client's code the possibility to connect to both the original MS Access database and the new MS SQL Server. Ideally, you should be able to manage both connections from within your code for displaying and validating data.
This possibility will be implemented module by module, where each module gets a 'trial period', i.e. the possibility to choose at testing time between the Access connection and the SQL connection when using the module. Once testing is complete, the module can then be run in exclusive SQL Server mode.
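One possible shape for this per-module switch, sketched in Python with made-up module names: each module is either pinned to one backend or listed as being in its trial period, in which case the tester has to say which connection to use. In a real Access client this would be VBA choosing between two connection objects; the names below are purely illustrative.

```python
# Hypothetical configuration: module names and backend labels are placeholders.
MODULE_BACKEND = {
    "customers": "sqlserver",  # tested, now runs in exclusive SQL Server mode
    "invoicing": "access",     # not migrated yet
}
TRIAL_MODULES = {"products"}   # in trial period: tester picks the backend
BACKENDS = ("access", "sqlserver")

def backend_for(module, tester_choice=None):
    """Return which database connection a module should use."""
    if module in TRIAL_MODULES:
        if tester_choice not in BACKENDS:
            raise ValueError(f"module {module!r} is in trial: choose one of {BACKENDS}")
        return tester_choice
    return MODULE_BACKEND[module]
```

Ending a trial then means removing the module from TRIAL_MODULES and pinning it to "sqlserver" in MODULE_BACKEND.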
During the transfer period, which can last a few months, you will have to manage programmatically the database constraints that exist between 'SQL Server' modules and 'Access' modules. Going back to our customers/invoicing example, the customers module will be switched to MS SQL first. Before the invoicing module can be switched, you'll have to implement programmatically the one-to-many relation between Customers and Invoices, where each of the tables will be in a different database. Such a constraint can be implemented on the Invoice form by populating the Customers combobox with the Customers recordset from the SQL Server.
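Outside of a form, the same constraint can be enforced in code before each write. The Python sketch below is only an illustration: the two sqlite3 connections stand in for the SQL Server (customers) and Access (invoices) databases, and the table and column names are assumed.

```python
import sqlite3

def insert_invoice(access_db, sql_db, customer_id, amount):
    """Insert an invoice in the old database only if its customer
    exists in the new one (a hand-written foreign key check).

    access_db and sql_db are sqlite3 connections used as stand-ins.
    """
    found = sql_db.execute(
        "SELECT 1 FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if found is None:
        raise ValueError(f"unknown customer {customer_id!r}")
    access_db.execute(
        "INSERT INTO invoices (customer_id, amount) VALUES (?, ?)",
        (customer_id, amount),
    )
    access_db.commit()
```

Note that this check is not atomic across the two databases: a customer could be deleted between the lookup and the insert, which is one more reason to keep the transition period for each pair of modules short.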
My proposal is to build your modules following your database model, always beginning with the 'one' tables of your 'one-to-many' relations: basic lists like 'Units', 'Currencies', 'Countries' should be switched first. You'll get a first 'hands-on' experience in writing data transfer code and managing a second connection in your client interface. You'll then be able to 'go up' in your database model, switching the 'products' and 'customers' tables (where units, countries and currencies are foreign keys) to the new server.
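With 300 tables, the 'one' side first ordering can be computed rather than worked out by hand: it is a topological sort of the foreign key graph. A small Python sketch follows, using the standard library's graphlib (Python 3.9+); the schema dictionary is an invented example, not the real database.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema: each table maps to the tables it references
# through foreign keys (the 'one' side of its one-to-many relations).
FOREIGN_KEYS = {
    "units": set(),
    "currencies": set(),
    "countries": set(),
    "products": {"units", "currencies"},
    "customers": {"countries", "currencies"},
    "invoices": {"customers", "products"},
}

def migration_order(foreign_keys):
    """Return the tables so that every referenced table comes first."""
    return list(TopologicalSorter(foreign_keys).static_order())

order = migration_order(FOREIGN_KEYS)
```

If the schema contains circular references, static_order() raises graphlib.CycleError, which is itself useful to know before planning the modules.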
Good luck!

I would second the suggestion to upsize the back end to SQL Server as step 1.
I would never go to the suggested Step 2, though (i.e., replacing the Access front end with something else). I would instead suggest investing the effort in fixing the flaws of the schema, and adjusting the Access app to work with the new schema.
Obviously, it is never the case that everything just works hunky-dory when you upsize -- some things that were previously quite fast will be dogs, and some things that were previously quite slow will be fast. And I've found that the problems are very often not where you anticipate they will be. You can only figure out what needs to be fixed by testing.
Basically, anything that works poorly gets re-architected, or moved entirely server-side.
Leverage the investment in the existing Access app rather than tossing all that out and starting from scratch. Access is a fine front end for a SQL Server back end as long as you don't assume it's going to work just the same way as it would with a Jet/ACE back end.

...thinking out loud... I think this may work.
It appears that the complexity of the application resides in the various VBA modules rather than in the database tables/schema themselves. A possible migration path could therefore be to first migrate the data storage to SQL Server, exactly as-is, as follows:
prevent any change to the data for a few hours
duplicate all tables to the SQL server; be sure to create the same indexes as well.
create linked tables to ODBC Source pointing to the newly created tables on SQL Server
these tables should have the very same name as the original tables (which therefore may require being renamed, say with a leading underscore, for possible reference).
Now, the application can be restarted and should be using the SQL tables rather than the Access tables. All logic should work as previously (right...), possible slowness to be expected, depending on the distance between the two machines.
All the above could be tested in about a day's work or so; the most tedious being the creation of the tables on SQL server (much of that can be automated, I'm sure). The next most tedious task is to assert that the application effectively works as previously, but with its storage on SQL.
EDIT: As suggested by a comment, I should stress that there is a [fair ?] possibility that the application would not readily work so smoothly under a SQL Server back end, and could require weeks of hard work in testing and fixing. However, and unless some of these difficulties can be anticipated because of insight into the application not expressed in the question, I propose that attempting the "as-is" migration to SQL Server should be considered; after all, it may just work with minimal effort, and if it doesn't, we'd know this very quickly. This is therefore a high-return, low-risk proposal...
The main advantage sought with this approach is that there will be a single storage during the [as the OP expects] longer period during which the old Access application will co-exist with the new application.
The drawback of this approach is that, at least at first, the schema of the original database is reproduced verbatim, i.e. including some of its known quirks and inherited legacy idiosyncrasies. These schema issues (and the underlying application logic) can be corrected in time, but this is of course less easy than if the new application starts ab initio, with its own separate storage and distinct schema.
After the storage is moved to SQL Server, the most used and/or the most independent modules of the Access application can be rewritten in the new application, and as significant portions of the original application are ported, effective usage, by select beta testers or by actual users, can start to be switched to the new application.
Possibly, some kind of screen-scraping based logic or some other system could be used to produce a hybrid application which would provide the end users with a comprehensive application that sometimes works from the new logic and sometimes from the original MS Access program.

Related

Updating SQL Schema in Multiple SQL Databases at One Time

Our issue is we have an online application with personally identifiable data. We have sold this application to multiple customers and the law in their States says that the data MUST be physically in their State. So this is why we have the identical database (not identical data) in different locations.
Right now we use RedGate SQL Compare, but as we continue to grow, doing this eight, nine, ten times for every update (be it a small stored procedure bug fix or a larger change creating a new table) is becoming more and more inefficient. Marketing is telling us five more states are on the way.
We've looked into a RedGate method, but it's more coding and troubleshooting than it's worth.
So...any ideas how to update the SCHEMA from one to many databases?
There is a function in SQL Server Management Studio that works. In SSMS, use CTRL-ALT-G. This brings up 'Registered Servers'. Under Local Server Groups you can create groups, say one for testing and one for live. You then right-click on the local group you created and choose "New Server Registration". Under the General tab you give it a name, and then in the Connection Properties tab you select just one database. Keep adding "New Server Registrations" for each database you want in the group. When done, just right-click on your group and choose New Query. Anything you put in there will run on ALL the databases in the group.
So, if all your databases are identical and you need to make an update, use Redgate to do a Compare. Choose 'Create a Deployment Script' instead of 'Deploy Using SQL Compare' and copy the SQL. Right-click on the group, choose "New Query", paste and execute.
I'm assuming this is SQL Server since you specified "RedGate SQL Compare" and not "MySQL Compare". If it's not SQL Server, ignore this.
Without having to adopt a new toolset (or even pay RedGate for something) and since the database (not the data) is identical, you could set up a Central Management Server (Microsoft documentation on that here), register each individual SQL Server instance, build your deploy script (you can still use SQL Compare for this), and then use the CMS to simultaneously push the schema changes as you need them to all of the instances or to defined groups as you like.
This would assume you're using Windows authentication for all the servers and that whoever does the deployments would have the same access across all of the servers, but it's a pretty decent solution for multi-server administration of this type in general, and it's a solid feature that's been around for a while (since SQL Server 2008).
I work for Redgate, so I'd love to promote it even more, however, let's ignore it for the moment.
If you want to automate deployments to lots of servers at once, I'd suggest you look to tooling like Azure DevOps Pipelines or AWS DeveloperTools, or even a 3rd party product like Octopus or Jenkins. The idea is simple: use any tool you like, right up to just your keyboard, to create the artifacts needed for deployment (your T-SQL scripts for SQL Server). Then, the agents in one of these flow control tools do the heavy lifting of ensuring that the script gets deployed to multiple locations. Because you can configure these agents with independent security, you don't have to have the same levels of security yourself that you'd need to control stuff through SSMS or the Central Management Server. Further, this method allows for very easy parallel execution. The only way you can do that yourself is through some pretty extensive PowerShell (or Python) work.
As much as I'd like to promote Redgate as part of this solution, it's actually not necessary (it's just better). You can generate the necessary artifacts any way you want. The important point is being able to control exactly how they get deployed, dealing with tracking the successful and failed deployments, varying levels of necessary security, all this stuff. That's exactly what tools like those I mentioned before are intended to do.
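To make the "do it yourself in Python" option concrete, here is a heavily simplified sketch of what such an agent does: run one script against many targets in parallel and record which ones succeeded or failed. SQLite files stand in for the per-state SQL Server databases, and the function names are invented; a real version would connect through a SQL Server driver with per-target credentials.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def deploy_one(db_path, script):
    """Run the script against one database; return (db_path, error or None)."""
    try:
        conn = sqlite3.connect(db_path)  # stand-in for a SQL Server connection
        try:
            conn.executescript(script)
        finally:
            conn.close()
        return db_path, None
    except sqlite3.Error as exc:
        # Record the failure instead of stopping the other deployments
        return db_path, str(exc)

def deploy_all(db_paths, script, max_workers=4):
    """Deploy to every target in parallel; map each target to its outcome."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda path: deploy_one(path, script), db_paths))
```

The returned dictionary is the minimum tracking needed to know which states are on the new schema; the pipeline tools add retries, approvals, logging and security on top of this.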
Also, yeah, this is a ton of work. Automating deployments is absolutely the way to go. However, it's not without labor. Instead of spending your time doing manual processes, prone to error, repetitive, boring and slow, you spend time, and effort, automating stuff. It's not so much that work gets eliminated, rather it gets reoriented. Then, you get all the benefits of that automation. However, you do have to maintain it, grow it, expand it, and deal with issues within it. All work.

Possibilities for external database with MS Access 2010

This question is quite general; however, I cannot find a good answer to it.
What are the possibilities for using an external database with MS Access?
I see that MySQL can be used, but I would have to set up an ODBC connection and install drivers on every machine. The issue is that I have software developed in MS Access that uses a lot of data, and it gets very slow at processing when I include a lot of data.
The software analyzes data from wind turbines, so it is used by different customers and it may contain a lot of different turbines with 50,000+ rows in each data set.
I would like these turbine data to be stored in a separate file that is pointed to by MS Access, so I include the software + whatever turbine data wanted.
As it is now, I have a lot of Access database files where the data is included in the software. It becomes impossible to keep track of, especially when I edit the source code of the software, which I do a lot these days.
Another issue is that the users may only have Access Runtime.
What are my options here? Is the best method to use the Access Link function?
Best regards, Emil.
Edit:
SQL's - Can they be combined? :
SELECT q_DataLimited.YAW001, q_DataLimited.YAW002
FROM q_DataLimited
WHERE (((q_DataLimited.YAW002)>Degree_dsp() And (q_DataLimited.YAW002)<Degree_dsp_high()));
And
SELECT Count(q_WindRose_PCU.YAW001) AS CountOfYAW0011
FROM q_WindRose_PCU;
Edit 2:
Public Degree As Long
Public Function Degree_dsp() As Long
Degree_dsp = Degree * 20
End Function
I have the degree as a counter outside the function in a form being:
For Degree = 0 To 17
DoCmd.OpenQuery "q_WindRose_PCU"
DoCmd.Close
Next Degree
Edit 3:
How to combine a query and the append of it to a table?
SELECT q_PowerBinned.Bin, Avg(q_PowerBinned.POW001) AS AvgOfPOW001, StDev(q_PowerBinned.POW001) AS StDevOfPOW001, Avg(q_PowerBinned.WSP001) AS AvgOfWSP001, StDev(q_PowerBinned.WSP001) AS StDevOfWSP001, Avg(q_PowerBinned.POW002) AS AvgOfPOW002, StDev(q_PowerBinned.POW002) AS StDevOfPOW002, Avg(q_PowerBinned.WSP002) AS AvgOfWSP002, StDev(q_PowerBinned.WSP002) AS StDevOfWSP002, Count(q_PowerBinned.Bin) AS CountOfBin
FROM q_PowerBinned
GROUP BY q_PowerBinned.Bin;
And then the append of the above to a table:
INSERT INTO t_Average_Stored ( Bin, PowAvg001, WindAvg001, PowAvg002, WindAvg002, n_samples, PowDev001, WindDev001, PowDev002, WindDev002 )
SELECT q_Average_Temp.Bin, q_Average_Temp.AvgOfPOW001, q_Average_Temp.AvgOfWSP001, q_Average_Temp.AvgOfPOW002, q_Average_Temp.AvgOfWSP002, q_Average_Temp.CountOfBin, q_Average_Temp.StDevOfPOW001, q_Average_Temp.StDevOfWSP001, q_Average_Temp.StDevOfPOW002, q_Average_Temp.StDevOfWSP002
FROM q_Average_Temp;
I see already a few suggestions in the comments, but I am going to answer the general question you posted. In short, the possibilities are endless.
MS Access, and Excel for that matter, have excellent external data tools that allow you to connect to almost any external data source and leverage regular SQL-based databases or even use OLAP cubes to do your analysis. Access itself should be powerful enough to handle the data sets you mention. Even Access 2010 should be able to handle millions of records with relative ease.
MS Access does have a significant limitation, which is the 2GB file size. Once your database reaches 2GB, everything goes out the window and you are very likely to get data corruption. This is a well known issue, but I don't think you are anywhere near these limits.
Before considering an upgrade, though, there are a few things to suggest:
Analyze the structure of your data and your database. Perhaps your tables are too big (lots of columns) and unnecessarily redundant. It may make sense to process the raw data you receive to split it into different tables that reduce the redundancy and improve performance.
Look into indexing some key fields in your tables. This is heavily dependent on the type of analysis you do and what queries are most common. Read up on indexes and how to use them and explore some options with actual datasets. You may be surprised how queries that used to take minutes to run become almost instantaneous when the right indexes are created and maintained.
Analyze your queries for performance. If I remember correctly, MS Access 2010 had a performance analyzer, which could improve your queries to make them run more efficiently.
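The effect of an index is easy to see on a toy data set. The Python sketch below uses SQLite purely as a stand-in (the table and column names are invented, and Access has its own tools for this), but the principle is the same: without an index the engine scans the whole table, with one it searches directly.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan lines for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column holds the description

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (turbine_id INTEGER, ts TEXT, power REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(t, f"2010-01-01T00:{m:02d}", 100.0 + m) for t in range(1, 6) for m in range(60)],
)

lookup = "SELECT AVG(power) FROM readings WHERE turbine_id = ?"
plan_before = query_plan(conn, lookup, (3,))   # full table scan expected
conn.execute("CREATE INDEX idx_readings_turbine ON readings (turbine_id)")
plan_after = query_plan(conn, lookup, (3,))    # should now use the index
```

Comparing plan_before and plan_after shows the switch from a scan to an index search; which fields deserve an index still depends on which filters and joins your analysis runs most often.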
If you have already looked into the items above and you decide you really need to take a step up, one fairly easy path (and inexpensive) is to install SQL Server Express, which you can download for free from Microsoft. Access was made to talk to SQL Server and the performance is many times better. You can run SQL Server Express in your personal pc and use it as a back-end for Access, or you could actually install it in a networked pc and use it as a server (behind a firewall, of course, NEVER connected to the Internet). In this setup you can access your data from several PCs.
One key thing to keep in mind once you start using Access as a front end, is that you want to push the processing to the back end, not keep it in Access. The best way to do this is to create what Access calls pass-through queries. These queries are written in the backend's native SQL language and are sent to the back end server for processing. Only the processed data comes back. If you don't do this, for example by creating the queries in the visual editor in Access instead, the raw data will be sent to Access and then Access will try to create your results. This, as you can imagine, can actually be a lot slower than your initial situation, so don't do it.
If you are not a SQL expert and need a visual editor, there is a tool that you can download from Microsoft: SQL-Server Management Studio Express. The query editor is not that different from Access and will allow you to create queries in a visual manner, but in Transact-SQL (the language of SQL Server). You can also manage your SQL Server Express with this tool and maintain your data in this manner (import, export, etc). You can create the SQL statements you need in this editor and then copy and paste into the pass-through queries in Access. The data will be available for you in the program you are familiar with, but with the power of a much bigger database engine behind the scenes.
Since I do not want to sound like a Microsoft shill, I definitely want to mention other options for external data that could be equally or even more powerful than SQL Server Express. The only reason I mentioned these is because you are already familiar with Microsoft products and the learning curve is a bit less steep. Also, most things should work together out of the box.
The first option that comes to mind is SQLite, which is a high performing database that is actually file-based. It is very small, yet very powerful and fast, and it is ideal for a locally based application like what you mention. There are also lots of graphical interfaces for SQLite and you can connect to it via ODBC from Access. Again, you want to run everything using pass-through queries and let SQLite pick up the load. SQLite is Open Source and it is free.
If you are keen on having "a real database server", then MySQL is probably the next step up. Also Open Source and free, it is very popular, which means lots of places to get support and different graphical interfaces to choose from.
Any search for Open Source Database will give you even more options to try and choose from.
One key thing to keep in mind: if you install any database server in your PC, it will become a server, and will start advertising its services in your local network or on the internet if you bring it to a local Starbucks. Be careful with that, learn how to start/stop the services in your PC, and make sure you turn them off when you are not behind a firewall. There are many exploits for different database servers and you will get quickly detected once your PC starts advertising its newly acquired abilities.
Just to close, there is no difference in the performance of Access and the runtime. Just the ability to edit the queries and so on. Whatever front end you create in Access, your users will be able to utilize in the same manner.

should i advocate migrating from access to (my)sql

We have a windows MFC app that is written against an access database on a company server. The db is not that big: 19 MB. There are at most 2-3 users accessing it at any one time. It is used in a factory environment where access speed (or lack thereof) over the intranet becomes noticeable as it is part of the manufacturing time for our widgets.
The scenario is this: as each widget is completed, it gets a record in the db.. by the end of the year, the db is larger and searching for a record takes longer and longer. The solution so far has been to manually move older records to an archival table about once a year.
We are reworking other portions of this app right now, and it would be a good time to move to another db if we are going to do it.
It is my understanding that if we were using sql, the search time would not go up as the table gets bigger because the entire .mdb does not have to be sent over the network each time. Is this correct? Does anyone have any insight about whether it could be worth it to go to the trouble (time and money) of migrating to a new db, or should I just add more functionality to the application we have now, and maybe automatically purge the older records from time to time, and add additional facilities to the app to get at the older records when needed?
Thanks for any wisdom you can share..
Since your database is small and has very few users, I could not make a solid case for migration. I would definitely set up a script to archive old records on a more frequent basis (don't archive into the same db; this would somewhat defeat the purpose).
But also make sure two things are correct as well.
INDEXES. If your queries start slowing down, make sure you have proper indexes
http://support.microsoft.com/kb/304272
Your network connection between computers is fast. Maybe upgrade to gigabit cards and router? Possibly put the db on a scsi drive (raid 10 for speed and redundancy)
Throwing advanced technology at simple problems is an expensive way to go and not always the answer!
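The archiving script suggested above could look roughly like the following Python sketch. SQLite files stand in for the live and archive .mdb files, and the widgets table with its completed_on column is an assumption made for the example; the point is only that old rows are copied to a separate file and then removed from the live one.

```python
import sqlite3

def archive_old_records(live_path, archive_path, cutoff_date):
    """Move widgets completed before cutoff_date into a separate archive file.

    Returns the number of rows removed from the live database.
    """
    conn = sqlite3.connect(live_path)
    try:
        conn.execute("ATTACH DATABASE ? AS archive", (archive_path,))
        conn.execute(
            "CREATE TABLE IF NOT EXISTS archive.widgets "
            "(id INTEGER PRIMARY KEY, serial TEXT, completed_on TEXT)"
        )
        with conn:  # copy and delete inside one transaction
            conn.execute(
                "INSERT INTO archive.widgets "
                "SELECT id, serial, completed_on FROM main.widgets "
                "WHERE completed_on < ?",
                (cutoff_date,),
            )
            moved = conn.execute(
                "DELETE FROM main.widgets WHERE completed_on < ?",
                (cutoff_date,),
            ).rowcount
        return moved
    finally:
        conn.close()
```

Run from a scheduled task, for instance monthly, this keeps the live table small without anyone having to remember to do it.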
First of all, the information that the whole table and the whole database is transferred across the network is simply incorrect. If the queries are indexed, then the search times should not go up that much over time.
As others have mentioned, spending the time and money to set up a database server, and then having someone maintain, manage and support it, is certainly a possibility here. However, keep in mind that a JET-based application simply migrated to SQL Server will in many cases run slower, and in fact SQL Server is slower than JET when no network is involved.
So, I would take some time to ascertain why things slow down so much, and also check how indexing is set up.
So, just keep in mind that it is pure folklore and myth that whole tables and the whole database are transferred over the network. This belief is ONLY DUE to most people really not having any computer training and not knowing and understanding how the JET data engine works.
I would probably move to either Microsoft SQL Server 2008 R2 Express Edition (free) or MySQL (free) if there is both funding and time to put in a data access layer. Because you will be making requests of a remote server and not operating on data at the local workstation this move is very involved from the development standpoint.
However, you should analyze whether or not it's more cost effective to perform your archival process quarterly or monthly, and just move the archive database to SQL Server 2008 R2 Express Edition. (You can install the Microsoft SQL Server Management Studio client tools on workstations and query the archival database for faster reports on historical data without rewriting your entire production application; similar solutions exist for using MySQL or other OSS/free RDBMS.)
I have clients with 300 MB databases, although they should be upsizing to SQL Server for other reasons. 19 MB is relatively small. If performance is bad enough that archiving speeds things up, then check the indexes on the tables for all your sorting and selection fields. Albert gave you a good URL there to check.
Entire MDB files do not go down the wire. Unless you are missing indexes.
Instead of shipping the DB over the network to the client and then performing queries, you could instead write a small wrapper on the server that handles requests, looks up the result in the Access DB (using SQL + the Access ODBC driver), and returns the result. This avoids the overhead of a large migration you might not need and still gets rid of the basic problem the users are experiencing.
Moving to a "proper" database solution is the best long term solution, but if your needs scale linearly and slowly over the next 30 years, it's hard to justify an expensive migration. That said, if you expect to really ramp up, or want to be more "future-proof", migrating now will likely save money/time.
"It is my understanding that if we were using sql, the search time would not go up as the table gets bigger because the entire .mdb does not have to be sent over the network each time. Is this correct?"
This general idea is true for almost all databases. The idea of a database is to separate your application from the actual data. The data resides in a database server. Your application doesn't.
"Does anyone have any insight about whether it could be worth it to go to the trouble (time and money) of migrating to a new db"
Yes. I have proposed this many times. It's expensive. It's complicated. But your MS Access database will never get better or faster.
Other database servers will (and can) get faster and more sophisticated. After all, you're not sending .MDB files through a network anymore. The limitations are reduced. You're working with standard SQL through ODBC. Any database will work at the end of ODBC. You can fire vendors to find better, faster, cheaper products. Once you stop using Access you have choices.
Either stop using Access now or plan to suffer with it forever. And remake this decision every year until the end of time.

What is the fastest way for me to take a query and turn it into a refreshable graph of the results set?

I often find myself writing one off queries to either answer someone's question or trouble shoot something and I would like to be able to quickly expose the on demand refreshable results of the query graphically so that I can share these results to others without having to go through the process of creating an SSRS report and publishing it to a reporting services server.
I have thought about using excel to do this or maybe running a local SSRS server but both of these options are still labor intensive and I cannot justify the time it would take to do these since no one has officially requested that I turn this data into a report.
The way I see it the business I work for has invested money in me creating these queries that often return potentially useful data that other people in the organization might want but since it isn't exposed in any way and I don't know that this data is something they want and they may not even realize they want this data, the potential value of the query is not realized. I want to increase the company's return on investment on all these one off queries that I and other developers write by exposing their results graphically so that they can be browsed by others and then potentially turned into more formalized SSRS reports if they provide enough value to justify the development of the report.
What is the fastest way for me to take a query and turn it into a refreshable graph of the results set?
Why don't you simply use what you may already have: Excel? You can import data via an ODBC / Oracle / SQL connection. Use Get Data, and bam, you can run the query, format it right in the spreadsheet, and provide sorting etc. All you need to supply is the database name, user name and password to connect to the db.
JonH is right regarding Excel's built in ODBC support, but I have had tons of trouble with this. In my case, the ODBC connection required the client software to be installed so that it could use the encryption methods, etc. Also, even if that were not the case, the user (I believe) would still have to manually install and set up an ODBC connection.
Now, if you just want something on your machine to run the queries and refresh them, JonH's solution is great and my caveats are probably irrelevant. But if you want other users to have access, you should consider a middle-man app (basically a PHP script, assuming a web server is an option for you) that runs a query, transforms the results into XML, and outputs it as "report-xyz.xml". You can then point anybody running a newer version of Excel to that address, and they can very easily import the data into Excel with no overhead (basically a kind of web service).
Keep in mind, I don't think you should have a web script that allows users to run arbitrary queries against your database server! You would have some admin page where you pass the query in, and a new XML file with the results gets generated. So my idea assumes that you want to run the same queries over and over without any parameters passed in. (If you do need parameters, I'd look into finding a pre-built web services bridge for your database that already has security features built in. Then you could let users make the limited changes allowed.)
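To make the middle-man idea concrete, here is a minimal sketch of such a script. It is written in Python rather than the PHP mentioned above, and it uses an in-memory SQLite database as a stand-in for the real server so that it runs anywhere. The `orders` table, the query text, and the function names are all made up for illustration. The key point is that the query is fixed in the script, so users never send SQL of their own.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical fixed report query; end users never supply SQL themselves.
REPORT_QUERY = (
    "SELECT region, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY region"
)


def rows_to_xml(cursor, root_tag="report", row_tag="row"):
    """Turn every row of an executed cursor into an XML element tree.

    Column names become element names, so they must be valid XML names
    (alias columns in the query if they contain spaces or symbols).
    """
    columns = [col[0] for col in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, record):
            ET.SubElement(row, name).text = "" if value is None else str(value)
    return root


def build_report(conn, query=REPORT_QUERY):
    """Run the fixed query and return the result set as an XML string."""
    cursor = conn.execute(query)
    return ET.tostring(rows_to_xml(cursor), encoding="unicode")


def write_report(conn, path="report-xyz.xml"):
    """Refresh the XML file that the web server hands out to Excel users."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(build_report(conn))


# Stand-in data (assumption): replace with a connection to the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10.0), ("west", 5.5), ("east", 2.5)],
)
report_xml = build_report(conn)
```

Calling `write_report(conn)` on a schedule, or from the admin page, would refresh the file; Excel users then import it from the web server's address and hit refresh whenever they want current numbers.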

Single or multiple databases

SQL Server 2008 database design problem.
I'm defining the architecture for a service where site users would manage a large volume of data on multiple websites that they own (100MB average, 1GB maximum per site). I am considering whether to split the databases up such that the core site management tables (users, payments, contact details, login details, products, etc.) are held in one database, and the data relating to the customers' own websites is held in a separate database.
I am seeing a possible gain in that I can distribute the hardware architecture to provide more meat to the heavy lifting done in the websites database leaving the site management database in a more appropriate area. But I'm also conscious of losing the ability to directly relate the sites to the customers through a Foreign key (as far as I know this can't be done cross database?).
So, the question is twofold. In general terms, should data in this sort of scenario be split out into multiple databases, or should it all be held in a single database?
If it is split into multiple, is there a recommended way to protect the integrity and security of the system at the database layer to ensure that there is a strong relationship between the two?
Thanks for your help.
This question and thus my answer may be close to the gray line of subjective, but at the least I think it would be common practice to separate out the 'admin' tables into their own db for what it sounds like you're doing. If you can tie a client to a specific server and db instance then by having separate db instances, it opens up some easy paths for adding servers to add clients. A single db would require you to monkey with various clustering approaches if you got too big.
[edit] Building in the idea early that each client gets its own DB also sets the tone for how you develop while it is still easy to make structural and organizational changes. Discovering two years from now that you need to do it will be a lot more painful. I've worked with split DBs plenty of times in the past, and it really isn't hard to deal with as long as you can establish some idea of what the context is. Here it sounds like you already have the idea that the client is the context.
Just my two cents, like I said, you could be close to subjective on this one.
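As a rough illustration of "the client is the context", the admin database could hold a small catalog that maps each client to the server and database holding its data. The sketch below is only an assumption about how that might look; the client ids, server names, and the connection string format are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbLocation:
    """Where one client's data lives."""
    server: str
    database: str


# Hypothetical catalog; in practice this would be a table in the admin database.
CLIENT_CATALOG = {
    "acme": DbLocation("sql01", "site_acme"),
    "globex": DbLocation("sql02", "site_globex"),
}


def connection_string(client_id, catalog=CLIENT_CATALOG):
    """Resolve the client context to a connection string."""
    loc = catalog.get(client_id)
    if loc is None:
        raise KeyError(f"unknown client: {client_id}")
    return f"Server={loc.server};Database={loc.database};Trusted_Connection=yes;"
```

With something like this in place, moving a client to a new server is a matter of copying its database and updating one catalog entry; the rest of the application keeps asking for the client by id.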
Single Database Pros
One database to maintain. One database to rule them all, and in the darkness - bind them...
One connection string
Can use Clustering
Separate Database per Customer Pros
Support for customization on per customer basis
Security: No chance of customers seeing each other's data
Conclusion
The separate database approach would be valid if you plan to support per-customer customization. Otherwise, I don't see the value.
You can use a link (a linked server) to connect the databases.
Your architecture is smart.
If you can't use a link, you can always replicate critical data to the website database from the users database in a read only mode.
Concerning security: the best way is to have a service layer between ASP (or another web language) and the database, so your databases will be pretty much isolated.
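Here is a minimal sketch of what such a service layer might look like, assuming it is the only code that talks to both databases. Two in-memory SQLite databases stand in for the users database and the website database, and the table, class, and method names are invented for the example. Since a foreign key cannot span the two databases, the service checks that the customer exists before it writes the site row.

```python
import sqlite3


class SiteService:
    """Only entry point the web tier uses; it owns both connections."""

    def __init__(self, core_conn, sites_conn):
        self.core = core_conn
        self.sites = sites_conn

    def create_site(self, customer_id, domain):
        # Manual stand-in for the foreign key that cannot span databases.
        found = self.core.execute(
            "SELECT 1 FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if found is None:
            raise ValueError(f"customer {customer_id} does not exist")
        with self.sites:  # commits on success, rolls back on error
            cur = self.sites.execute(
                "INSERT INTO sites (customer_id, domain) VALUES (?, ?)",
                (customer_id, domain),
            )
        return cur.lastrowid


# Stand-in for the core (users) database.
core = sqlite3.connect(":memory:")
core.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
core.execute("INSERT INTO customers (id, name) VALUES (1, 'Acme')")
core.commit()

# Stand-in for the separate website database.
sites = sqlite3.connect(":memory:")
sites.execute(
    "CREATE TABLE sites ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, domain TEXT)"
)

service = SiteService(core, sites)
site_id = service.create_site(1, "acme.example")
```

Note that the check and the insert are not one atomic operation across the two databases, so a customer deleted in between could still leave an orphaned site. A periodic consistency check, or never hard-deleting customers, would be one way to cover that gap.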
If you expect to have to split the databases across different hardware in the future because of heavy load, I'd say split it now. You can use replication to push copies of some of the tables from the main database to the site management databases. For now, you can run both databases on the same instance of SQL Server and later on, when you need to, you can move some of the databases to a separate machine as your volume grows.
Imagine we had infinitely fast computers: would you split your databases? Of course not. The only reason we split them is to make it easy to scale out at some point. You don't really have any choice here; 100MB-1000MB per client is huge.