Is it possible to Update an SQL Server Database and an Informix Database at the same time? - sql

I was wondering if it is possible to add/update/delete an SQL Server database table, as well as an Informix database table at the same time.
Both databases will have the same table (data and all), so the query would only change just based on which database it is going to. For some reason, we need the data inside both databases and kept up in real time.
Is it possible to do this with a SQL Trigger or maybe a SProc?
Any insight of how to do this, or a push in the right direction would be very much appreciated.

Doing a synchronous update, ie. a distributed transaction by using a linked server, possible for a trigger, while technically possible, I would definitely advise against it. Aaron brings the issue of how reliable XA in general is, but my point is different: availability. Your update in SQL Server will fail if it cannot connect and update in Informix. Downtime (patching, maintenance, not to mention disasters) of the Informix site will imply downtime of the SQL Server site, driving your five 9's toward nine 5's quite fast... This is why I strongly advocate decoupling the application of updates. Transactional Replication is such an example of decoupling and it supports heterogenous environments (ie. Informix client downstream to accept the changes).
You will have a delay of update visibility (state in SQL Server will be reflected in Informix after delay that can be milliseconds, seconds, minutes, even hours in a bad day). And the updates are one way, nothing flows back from Informix to SQL Server. But doing master-master replication in an heterogeneous environment is something that not even Chuck Norris would attempt, just saying.

Maintaining two different DBMS with a single transaction requires a transaction monitor such as the XA system to coordinate the transactions. There are such systems. The XA specification is typically the underlying standard. Both Microsoft's SQL Server and IBM's Informix work with such systems, and it is possible to have SQL Server and Informix controlled by the same transaction monitor. I have fewer qualms about the technical competency of such systems than the others who've answered; I share their concerns about whether it is appropriate for you.
Such systems are very heavyweight. If you want consistency, all transactions that modify the single table described in the question will need to use the same XA services (plural; likely one for insert, one for update, one for delete) to do so. Further, if the same transactions need to manage any other tables too, then you need to add and use services for those tables as well. It is this aspect that tends to make such systems difficult to manage.
Using a replication system with the potential for delay before the sites are consistent is probably better than trying for absolute synchronicity, unless there are cogent demands for such synchronicity.
If there really is a demand for absolute synchronicity, then use a transaction monitor.
Do not roll your own.
They are hard to get right. Handling all the special cases is tricky. And (under the hypothesis that you need absolute synchronicity) doing it wrong is costly but easy.

That depends on your definition of "possible". Technically, you can use a technique called "two-phase commit."
The idea is you send the data to both databases and then a "prepare commit" command which does everything necessary to commit the data except for committing it. If the prepare fails, the commit would fail too. If prepare succeeds, then commit must succeed.
Brilliant idea, doesn't work in practice. One common case is that you send the commit to both databases and one of them gets lost on the way (network outage). Happens rarely but when it happens, you have an inconsistent state and, since this step must not fail, no good way to clean up.
So my solution works like this:
You load the data into a new table which has two extra columns where you can say "server X has seen this record"
You add a job which copies all jobs for server X to server X and updates the respective column. Write the job in such a way that it can be aborted and restarted at any time (i.e. it must be able to cope with cases where data already exists on the target side).
That way, you can distribute the data to any number of servers in a consistent, fault tolerant way.

Related

Multiple application on network with same SQL database

I will have multiple computers on the same network with the same C# application running, connecting to a SQL database.
I am wondering if I need to use the service broker to ensure that if I update record A in table B on Machine 1, the change is pushed to Machine 2. I have seen applications that need to use messaging servers to accomplish this before but I was wondering why this is necessary, surely if they connect to the same database, any changes from one machine will be reflected on the other?
Thanks :)
This is mostly about consistency and latency.
If your applications always perform atomic operations on the database, and they always read whatever they need with no caching, everything will be consistent.
In practice, this is seldom the case. There's plenty of hidden opportunities for caching, like when you have an edit form - it has the values the entity had before you started the edit process, but what if someone modified those in the mean time? You'd just rewrite their changes with your data.
Solving this is a bunch of architectural decisions. Different scenarios require different approaches.
Once data is committed in the database, everyone reading it will see the same thing - but only if they actually get around to reading it, and the two reads aren't separated by another commit.
Update notifications are mostly concerned with invalidating caches, and perhaps some push-style processing (e.g. IM client might show you a popup saying you got a new message). However, SQL Server notifications are not reliable - there is no guarantee that you'll get the notification, and even less so that you'll get it in time. This means that to ensure consistency, you must not depend on the cached data, and you have to force an invalidation once in a while anyway, even if you didn't get a change notification.
Remember, even if you're actually using a database that's close enough to ACID, it's usually not the default setting (for performance and availability, mostly). You need to understand what kind of guarantees you're getting, and how to write code to handle this. Even the most perfect ACID database isn't going to help your consistency if your application introduces those inconsistencies :)

HA Database configuration that avoids split-brain issues?

I am looking for a (SQL/RDB) database setup that works something like this:
I will have 3+ databases in an active/active/active configuration
prior to doing any insert, the database will communicate with atleast a majority of the others, such that they all either insert at the same time or rollback (transaction)
this way I can write and read from any of the databases, and always get the same results (as long as the field wasn't updated very recently)
note: this is for a use case that will be very read-heavy and have few writes (and delay on the writes is an OK situation)
does anything like this exist? I see all sorts of solutions with database HA configurations, but most of them suggest writing to a primary node or having a passive backup
alternatively I could setup a custom application, and have each application talk to exactly 1 database, and achieve a similar result, but I was hoping something similar would already exist
So my questions is: does something like this exist? if not, are there any technical/architectural reasons why not?
P.S. - I will NOT be using a SAN where all databases can store/access the same data
edit: more clarifications as far as what I am looking for:
1. I have no database picked out yet, but I am more familiar with MySQL / SQL Server / Oracle, so I would have a minor inclination towards on of those
2. If a majority of the nodes are down (or a single node can't communicate with the collective), then I expect all writes from that node to fail, and accept that it may provide old/outdated information
failure / recover scenario expectations:
1. A node goes down: it will query and get updates from the other nodes when it comes back up
2. A node loses connection with the collective: it will provide potentially old data to read request, and refuse any writes
3. A node is in disagreement with the data stores in others: majority rule
4. 4. majority rule does not work: go with whomever has the latest data (although this really shouldn't happen)
5. The entries are insert/update/read only, i.e. there will be no deletes (except manually ofc), so I shouldn't need to worry about an update after a delete, however in that case I would choose to delete the record and ignore the update
6. Any other scenarios I missed?
update: I the closest I see to what I am looking for seems to be using a quorum + 2 DBs, however I would prefer if I could have 3 DBs instead, such that I can query any of them at any time (to further distribute the reads, and also to keep another copy of the data)
You need to define "very recently". In most environments with replication for inserts, all the databases will have the same data within a few seconds of an insert (and a few seconds seems pessimistic).
An alternative approach is a "read-from-one/write-to-all" approach. In this case, reads are spread through the system. Writes are then sent to all nodes by the application (or a common layer that the application uses).
Remember, though, that the issue with database replication is not how it works when it works. The issue is how it recovers when it fails and even how failures are identified. You need to decide what happens when nodes go down, how they recover lost transactions, how you decide that nodes are really synchronized. I would suggest that you peruse the documentation of the database that you are actually using and understand the replication mechanisms provided by that platform.

Should I create separate SQL Server database for each user?

I am working on Asp.Net MVC web application, back-end is SQL Server 2012.
This application will provide billing, accounting, and inventory management. The user will create an account by signup. just like http://www.quickbooks.in. Each user will create some masters and various transactions. There is no limit, user can make unlimited records in the database.
I want to keep stable database performance, after heavy data load. I am maintaining proper indexing and primary keys in it, but there would be a heavy load on the database, per user.
So, should I create a separate database for each user, or should maintain one database with UserID. Add UserID in each table and making a partition based on UserID?
I am not an expert in SQL Server, so please provide suggestions with clear specifications.
Please inform me if there is any lack of information.
A DB per user is what happens when customers need to be able pack up and leave taking the actual database with them. Think of a self hosted wordpress website. Or if there are incredible risks to one user accidentally seeing another user's data, so it's safer to rely on the servers security model than to rely on remembering to add the UserId filter to all your queries. I can't imagine a scenario like that, but who knows-- maybe if the privacy laws allowed for jail time, I would rather data partitioned by security rules rather than carefully writing WHERE clauses.
If you did do user-per-database, creating a new user will be 10x more effort. While INSERT, UPDATE and so on stay the same from version to version, with each upgrade the syntax for database, user creation, permission granting and so on will evolve enough to break those scripts each SQL version upgrade.
Also, this will multiply your migration headaches by the number of users. Let's say you have 5000 users and you need to add some new columns, change a columns data type, update a trigger, and so on. Instead of needing to run that change script 1x, you need to run it 5000 times.
Per user Dbs also probably wastes disk space. Each of those databases is going to have a transaction log, sitting idle taking up the minimum log space.
As for load, if collectively your 5000 users are doing 1 billion inserts, updates and so on per day, my intuition tells me that it's going to be faster on one database, unless there is some sort of contension issue (everyone reading and writing to the same table at the same time and the same pages of the same table). Each database has machine resources (probably threads and memory) per database doing housekeeping, so these extra DBs can't be free.
Anyhow, the best thing to do is to simulate the two architectures and use a random data generator to simulate load and see how they perform.
It's not an easy answer to give.
First, there is logical design to be considered. Then you have integrity, security, management and performance (in this very order).
A database is a logical unit of data, self contained. Ideally, you should be able to take a database, move it to another instance, probably change the connection strings and be running again.
All the constraints are database-level. No foreign keys can exist referencing some object outside the database.
So, try thinking in these terms first.
How would you reliably prevent one user messing up the other user's data? Keep in mind that it's just a matter of time before someone opens an excel sheet and fire up queries on the database bypassing your application. Row level security in SQL Server is something you don't want to deal with.
Multiple databases mean that all management tasks should be scripted out and executed on all databases. Yes, there is some overhead to it, but once you set it up it's just the matter of monitoring. If a database goes suspect, it's a single customer down, not all of them. You can even have different versions for different customes if each customer have it's own database. Additionally, if you roll an upgrade, you can do it per customer, so the inpact will be much less.
Performance is the least relevant factor here. Of course, it really depends on how many customers and how much data, but proper indexing will solve these issues. Scale-out is much easier with multiple databases.
BTW, partitioning, as you mentioned it, is never a performance booster, it's simply a management feature, allowing for faster loading and evicting of data from a table.
I'd probably put each customer in separate database, but it's up to you eventually to make a decision for yourself. Hope I've helped some with this.

Strategy for keeping separate Databases in Sync

I have a NoSQL database that we are using for data processing, as it can be used for my application faster than SQL can. I'm treating our NoSQL database almost like a cache of information, with the SQL being the authority of data, and the NoSQL store being updated with changes. Right now this is being done through our application, so when a request comes in for a change, it is made in the SQL database, and the NoSQL database. This is failing at times as sometimes the NoSQL update fails, or other situations cause the NoSQL database to get out of sync.
I could do a batch update every X minutes, however it is a lot of information in the data stores, and it would take hours to ensure that they are in sync. We have some timestamps to do a difference of what has been changed, but this is not always accurate.
I'm wondering what some recommended strategy for keeping a data store(secondary database cache) in sync with my main store are?
I know I've done with this with messaging in the past - specifically JMS with ActiveMQ. I would send the updates to a NoSQL store (Mongo) by using a queue. This way messages could accumulate in the queue and if the connection to the NoSQL store ever got severed, it could pick up where it left off.
It worked really well because ActiveMQ was really stable and simple to work with.
I've always seen this done with diffs like you mentioned. You introduce date fields all over and then keep track of the latest sync. The nice thing about this approach is that it easily allows you to replay transactions by modifying the last sync date.
One last piece of advice ... write good tools around pumping data from point A to point B (in this case SQL to NoSQL). I wrote several tools to bulk load the NoSQL store from SQL at my last job and it made life easy if anything got really out of sync. Between scripts and bulk loading processes, I could always recover.

Using Sql Server Replication

We are using Replication and seem to be having endless problems with it. It seems to shut down for unknown reasons. It needs to be shut down to remove a column and only starts back up half the time. Does anyone have any advice on how to properly use replication or some alternatives to it.
Edit:
We are using Sql Server 2005, We cannot use database mirroring as we used the other database for reporting. As far as I am aware you cannot query from a mirrored database.
If you need just couple of tables from your DB for reports, replication is more useful, but you also can set up log shipping with secondary server in STAND BY mode (especially if you need significant part of your data for reports), then you can run reports on secondary server. You just have to remember that log shipping will interfere with transaction log backups, so you have to use the same folder with log backup files for both processes.
I would think the combination of database mirroring and database snapshots will solve your issues.
First, database mirroring is very easy to setup and I have never had any problems with it (using it for the past 4+ years).
Second, creating a database snapshot on your failover server will allow you to run reports. You can setup a sql agent job to drop and re-create the snapshot on whatever acceptable interval you like.
Of course this is all dependent on if you need your reports to run on real-time data or if they can be delayed somewhat.
Here are a list of the problems that I have had to resolve to get replication working:
1) The replication sometimes lies to me and tells me this, even when its working fine.
"The server 'Bob' is not a Subscriber. (.Net SqlClient Data Provider)" I have tried to re-initialise it thinking that it was broken and it never was...
2) It can take a little while to restart itself, especially if your remote DB is on the other side of the planet, which it is in my case. If you are on a slow network connection, or it is not 100% reliable, then you can have problems. Also, the jobs which restart the process can sometimes take a while to run, which also delays things further.
3) Some changes require full re-initalisation which involves sending a new snapshot out. If you don't have your permissions quite right, and you can re-initialise manually, but it doesn't happen automatically, then this can be a another reason for problems.
We have a SQL transactional replication which runs perfectly happily. You seem to say that it is when you are making schema changes to the publisher that you get problems. Each time we do a schema change we drop the publication, subscription and the subscription database. Do the change, then re-build it all. We can do this becuase we can tolerate the time it takes to re-apply the snapshot. There are ways to apply schema changes to the publication and have them propogate to the subscriber. Take a look at sp_register_custom_scripting. We have made this work once, so I can give some more information about it if you need.
As #Jason says, you can report from a mirrored database by using a snapshot. Beware that the snapshot will take up space, and cause more work for the mirror server. Although how much space will depend on how much data is changing and how big your original database is. We do use a snapshot on a mirrored database for occasional reports because our entire database is not replicated.
log shipping http://msdn.microsoft.com/en-us/library/ms187103.aspx
What version of SQL Server are you using?
We're using replication now for a particular solution, and it seems to just work, day in, day out.
I would examine your event log's, and SQL Server logs to see if you can determine why it is shutting down, and why it doesn't start up.
Are you possibly patching the servers, or are you having network errors?
The alternatives to replication are log shipping, or database mirroring.
I personally prefer Database Mirroring, but it really depends what you're trying to do, as some of these aren't appropriate for certain situations.
We also have used SQL transactional replication. We had the same pains with updating schema, which requires dropping the publication on all servers, performing the updates, and then reinitializing replication, and hoping for the best. Sometimes it would not initialize, or a node would fall behind and we'd get little warning for it. A few times we even lost all the stored procedure execute permissions causing pretty much total failure on the websites.
We have a rather large database so reinitialization could take quite some time, meaning all updates had to be done at 2am on Sunday - not exactly when we're awake and alert and able to use all our faculties to deal with a problem that might arise.
We are ditching replication in favor of failover clustering on SQL 2008, but it can still be done all the way back to SQL 2000.
http://technet.microsoft.com/en-us/library/cc917693.aspx