Database design ideas for "master and slave" nodes - sql

We are currently in design phase of a product. The idea is that we have a master (calling it) which contains all user information including user/registration/role/licence etc. and we have a slave database (calling it) which contains main application related data.
Some columns from the master database will be used in the slave database. for e.g. userid will be used everywhere in the slave database. There will be multiple versions of salve in different tiers (depending on customers subscription). So for e.g. Some customer will have a dedicated slave databases for their application.
Also some data/tables/columns from slave will be used in master.
How do we manage this scenario so that we can have maximum referential integrity (I know it will not be possible all the time) without using linked servers. (We dont want to use linked servers because for improper design it can be abused and can effect performance as a result).
Or is this a bad idea. Just have single database design (no master/slave) with different nodes. And customer's data in different nodes depending on their subscription? The problem that I see with this is that the registration/user tables are now fragmented in different database nodes. So for e.g. userA will be in database01 and so on.
Any idea?

Related

Azure SQL Replication

I have an application that, for performance reasons, will have completely independent standalone instances in several Azure data centers. The stack of Azure IaaS and PaaS components at each data center will be exactly the same. Primarily, there will be a front end application and a database.
So let's say I have the application hosted in 4 data centers. I would like to have the data coming into each Azure SQL database replicate it's data asynchronously to all of the other 3 databases, in an eventually consistent manner. Each of these databases needs to be updatable.
Does anyone know if Active Geo-Replication can handle this scenario? I know I can do this using a VM and IaaS, but would prefer to use SQL Azure.
Thanks...
Peer-to-peer tranasaction replication supports what you're asking for, to some extent - I'm assuming that's what you're referring to when you mention setting it up in IaaS, but it seems like it would be self defeating if you're looking to it for a boost in write performance (and against their recommendations):
From https://msdn.microsoft.com/en-us/library/ms151196.aspx
Although peer-to-peer replication enables scaling out of read operations, write performance for the topology is like that for a single node. This is because ultimately all inserts, updates, and deletes are propagated to all nodes. Replication recognizes when a change has been applied to a given node and prevents changes from cycling through the nodes more than one time. We strongly recommend that write operations for each row be performed at only node, for the following reasons:
If a row is modified at more than one node, it can cause a conflict or even a lost update when the row is propagated to other nodes.
There is always some latency involved when changes are replicated. For applications that require the latest change to be seen immediately, dynamically load balancing the application across multiple nodes can be problematic.
This makes me think that you'd be better off using Active Geo Replication - you get the benefit of PaaS and not having to manage your own VMs, not having to manage TR, which gets messy, and if the application is built to deal with "eventual consistency" in the UI, you might be able to get away with slight delays in the secondaries being up to date.

Syncing Postgres Database Instances

I have a queer situation. I am managing an e-commerce site built on Django with Postgresql. It has two versions - English and Japanese. Because of a release that has brought a huge number of users, the site (specifically Postgres) is overloaded and crashing. The only safe solution which I can think of is to put these two separately on two separate servers so that En and Jp traffic gets their own dedicated server. Now, the new server is ready but during the time of domain propagation, and during half-propagated stages (new one being seen from some countries and old one from some) there will be transactions on both. Users are buying digital stuff in hundreds of numbers every minute. So, there is no way to turn the server off for a turnover.
Is there a way to sync the two databases at a later stage (because if both share a database, the new server will be pointless). The bottleneck is Postgres, and has already been tuned for maximum possible connections on this server, and kernel.shmmax is at its limit. DB pooling also will need time to setup and some downtime as well, which am not permitted to do at the moment. What I mean by sync is that once full propagation occurs, I wish to unify the DB dump files from both and make one which has all records of both synced in time. The structure is rather complex so many tables will need sync. Is this do-able ..?
Thanks in advance !

Database good system decoupling point?

We have two systems where system A sends data to system B. It is a requirement that each system can run independently of the other and neither will blow up if the other is down. The question is what is the best way for system A to communicate with system B while meeting the decoupling requirement.
System B currently has a process that polls data in a db table and processes any new rows that have been inserted.
One proposed design is for system A to just insert data into system b's db table and have system B process the new rows by the existing process. Question is does this solution meet the requirement of decoupling the two systems? Is a database considered part of a system B which might become unavailable and cause system A to blow up?
Another solution is for system A to put data into an MQ queue and have a process that would read from MQ and then insert into system B's database. But is this just extra overhead? Ultimately is an MQ queue any more fault tolerant than a db table?
Generally speaking, database sharing is a close coupling and not to be preferred except possibly for speed purposes. Not only for availability purposes, but also because system A and B will be changed and upgraded at several points in their future, and should have minimal dependencies on each other - message passing is an obvious dependency, whereas shared databases tend to bite you (or your inheritors) on the posterior when least expected. If you go the database sharing route, at least make the sharing interface explicit with dedicated tables or views.
There are four common levels of integration:
Database sharing
File sharing
Remote procedure call
Message passing
which can be applied and combined in various situations, with different availability and maintainability. You have an excellent overview at the enterprise integration patterns site.
As with any central integration infrastructure, MQ should be hosted in an environment with great availability, full failover &c. There are other queue solutions which allow you to distribute the queue coordination.
Use Queues for communication. Do not "pass" data from System A to System B through the database. You're using the database as a giant, expensive, complex message queue.
Use a message queue as a message queue.
This is not "Extra" overhead. This is the best way to decouple systems. It's called Service Oriented Architecture (SOA) and using messages is absolutely central to the design.
An MQ queue is far simpler than a DB table.
Don't compare "fault tolerance" because an RDBMS uses huge (almost unimaginable) overheads to achieve a reasonable level of assurance that your transaction finished properly. Locking. Buffering. Write Queues. Storage Management. Etc. Etc.
A reliable message queue implementation uses some backing store to keep the queue's state. The overhead is much, much less than an RDBMS. The performance is much better. And it's much, much simpler to interact with.
In SQL Server I would do this through an SSIS package or a job (depending on the number of records and the complexity of what I was moving). Other databases also have ETL solutions. I like the ETL solution becasue I can keep logs of what was changed and what errors were processed, I can send records which for some reason won't go to the other system (data structures are rarely the same between two databases) to a holding table without killing the rest of the process. I can also make changes to the data as it flows to adjust for database differences (things like lookup table values, say the completed status in db1 is 5 and it is 7 in db2 or say db2 has a required field that db1 does not and you have to add a default value to the filed if it is null). If one or the other servver is down the job running the SSIS package will fail and neither system will be affected, so it keeps the datbases decoupled as using triggers or replication would not.

Database Replication or Mirroring?

What is the difference between Replication and Mirroring in SQL server 2005?
In short, mirroring allows you to have a second server be a "hot" stand-by copy of the main server, ready to take over any moment the main server fails. So mirroring offers fail-over and reliability.
Replication, on the other hand, allows two or more servers to stay "in sync" - that means the secondary servers can answer queries and (depending on setup) actually change data (it will be merged in the sync). You can also use it for local caching, load balancing, etc.
Mirroring is a feature that creates a copy of your database at bit level. Basically you have the same, identical, database in two places. You cannot optionally leave out parts of the database. You can have only one mirror, and the 'mirror' is always offline (it cannot be modified). Mirroring works by shipping the database log as is being created to the mirror and apply (redo-ing) the log on the mirror. Mirroring is a technology for high availability and disaster recoverability.
Replication is a feature that allow 'slices' of a database to be replicated between several sites. The 'slice' can be a set of database objects (ie. tables) but it can also contain parts of a table, like only certain rows (horizontal slicing) or only certain columns to be replicated. You can have multiple replicas and the 'replicas' are available to query and even can be updated. Replication works by tracking/detecting changes (either by triggers or by scanning the log) and shipping the changes, as T-SQL statements, to the subscribers (replicas). Replication is a technology for making data available at off sites and to consolidate data to central sites. Although it is sometimes used for high availability or for disaster recoverability, it is an artificial use for a problem that mirroring and log shipping address better.
There are several types and flavours of replication (merge, transactional, peer-to-peer etc.) and they differ in how they implement change tracking or update propagation, if you want to know more details you should read the MSDN spec on the subject.
Database mirroring is used to increase database uptime and reliability.
Replication is used primarily to distribute portions of your primary database -- the publisher -- to one or more subscriber databases. This is often done to make data available (typically for read only) on remote servers so that remote clients can access the data locally (to them) rather than directly from the publisher across a slower WAN connection. Although, as the previous posts indicate, there are more complex scenarios where updates are permitted on the subscribers. It also can have the benefit of reducing the I/O load on the publisher.

Multi-tenancy with SQL/WCF/Silverlight

We're building a Silverlight application which will be offered as SaaS. The end product is a Silverlight client that connects to a WCF service. As the number of clients is potentially large, updating needs to be easy, preferably so that all instances can be updated in one go.
Not having implemented multi tenancy before, I'm looking for opinions on how to achieve
Easy upgrades
Data security
Scalability
Three different models to consider are listed on msdn
Separate databases. This is not easy to maintain as all schema changes will have to be applied to each customer's database individually. Are there other drawbacks? A pro is data separation and security. This also allows for slight modifications per customer (which might be more hassle than it's worth!)
Shared Database, Separate Schemas. A TenantID column is added to each table. Ensuring that each customer gets the correct data is potentially dangerous. Easy to maintain and scales well (?).
Shared Database, Separate Schemas. Similar to the first model, but each customer has its own set of tables in the database. Hard to restore backups for a single customer. Maintainability otherwise similar to model 1 (?).
Any recommendations on articles on the subject? Has anybody explored something similar with a Silverlight SaaS app? What do I need to consider on the client side?
Depends on the type of application and scale of data. Each one has downfalls.
1a) Separate databases + single instance of WCF/client. Keeping everything in sync will be a challenge. How do you upgrade X number of DB servers at the same time, what if one fails and is now out of sync and not compatible with the client/WCF layer?
1b) "Silos", separate DB/WCF/Client for each customer. You don't have the sync issue but you do have the overhead of managing many different instances of each layer. Also you will have to look at SQL licensing, I can't remember if separate instances of SQL are licensed separately ($$$). Even if you can install as many instances as you want, the overhead of multiple instances will not be trivial after a certain point.
3) Basically same issues as 1a/b except for licensing.
2) Best upgrade/management scenario. You are right that maintaining data isolation is a huge concern (1a technically shares this issue at a higher level). The other issue is if your application is data intensive you have to worry about data scalability. For example if every customer is expected to have tens/hundreds millions rows of data. Then you will start to run into issues and query performance for individual customers due to total customer base volumes. Clients are more forgiving for slowdowns caused by their own data volume. Being told its slow because the other 99 clients data is large is generally a no-go.
Unless you know for a fact you will be dealing with huge data volumes from the start I would probably go with #2 for now, and begin looking at clustering or moving to 1a/b setup if needed in the future.
We also have a SaaS product and we use solution #2 (Shared DB/Shared Schema with TenandId). Some things to consider for Share DB / Same schema for all:
As mention above, high volume of data for one tenant may affect performance of the other tenants if you're not careful; for starters index your tables properly/carefully and never ever do queries that force a table scan. Monitor query performance and at least plan/design to be able to partition your DB later on based some criteria that makes sense for your domain.
Data separation is very very important, you don't want to end up showing a piece of data to some tenant that belongs to other tenant. every query must have a WHERE TenandId = ... in it and you should be able to verify/enforce this during dev.
Extensibility of the schema is something that solutions 1 and 3 may give you, but you can go around it by designing a way to extend the fields that are associated with the documents/tables in your domain that make sense (ie. Metadata for tables as the msdn article mentions)
What about solutions that provide an out of the box architecture like Apprenda's SaaSGrid? They let you make database decisions at deploy and maintenance time and not at design time. It seems they actively transform and manage the data layer, as well as provide an upgrade engine.
I've similar case, but my solution is take both advantage.
Where data and how data being placed is the question from tenant. Being a tenant of course I don't want my data to be shared, I want my data isolated, secure and I can get at anytime I want.
Certain data it possibly share eg: company list. So database should be global and tenant database, just make sure to locked in operation tenant database schema, and procedure to update all tenant database at once.
Anyway SaaS model everything delivered as server / web service, so no matter where the database should come to client as service, then only render by client GUI.
Thanks
Existing answers are good. You should look deeply into the issue of upgrading and managing multiple databases. Without knowing the specific app, it might turn out easier to have multiple databases and not have to pay the extra cost of tracking the TenantID. This might not end up being the right decision, but you should certainly be wary of the dev cost of data sharing.