Same hardware cluster - hardware

If I have two databases, and views in one of them that JOIN or UNION tables from both databases, is this an issue for GCSql? According to MySQL, this feature only requires that both databases remain within the same hardware cluster.
I am not totally clear on what constitutes a hardware cluster, or how that relates to Google Cloud SQL instances.

Each Google Cloud SQL instance has a single MySQL instance at any one time. The data is replicated to multiple locations when that single MySQL instance writes it to persistent storage, which means that the instance can fail over to a new location if there is a problem.
There isn't any hardware clustering in the sense used here.

Related

What is the main difference between Active Geo Replication and Auto Failover Groups for Azure SQL DB

I would like to know the difference between Active Geo Replication and Auto Failover groups in Azure SQL DB. I read that with Auto Failover groups the secondary database is always created in a secondary region, but active geo-replication can also happen within the same region. So when should one be used over the other?
According to the Microsoft documentation, the auto-failover groups feature "is a declarative abstraction on top of the existing active geo-replication feature, designed to simplify deployment and management of geo-replicated databases at scale". BCDR (business continuity and disaster recovery) is the biggest use case: manual or automatic failover of SQL data to another region.
The auto-failover group feature imposes some limitations while adding convenience:
A listener concept lets your apps use the same endpoint for your SQL instance, while with geo-replication your app is responsible for manipulating connection strings to target the required SQL instance.
On the other hand, geo-replication supports multiple RO targets, including in the same region, while a failover group supports only two SQL instances in different regions, of which one is RW and the other is RO.
As correctly pointed out in another answer, SQL managed instances only support failover groups, via VNet peering.
There is little difference between Active Geo Replication and Auto Failover groups.
Active geo-replication is not supported by Azure SQL Managed Instance, but Auto Failover groups are supported.
Active geo-replication replicates changes by streaming the database transaction log. It is unrelated to transactional replication, which replicates changes by executing DML (INSERT, UPDATE, DELETE) commands. It seems that active geo-replication is more lightweight and efficient.
Active-geo-replication document
Auto-failover-group document

Managing data in two relational databases in a single location

Background: We currently have our data split between two relational databases (Oracle and Postgres). There is a need to run ad-hoc queries that involve tables in both databases. Currently we are doing this in one of two ways:
ETL from one database to another. This requires a lot of developer time.
Oracle foreign data wrapper on our Postgres server. This is working, but the queries run extremely slowly.
We already use Google Cloud Platform (for the project that uses the Postgres server). We are familiar with Google BigQuery (BQ).
What we want to do:
We want most of our tables from both these databases (as-is) available at a single location, so querying them is easy and fast. We are thinking of copying over the data from both DB servers into BQ, without doing any transformations.
It looks like we need to take full dumps of our tables on a periodic basis (daily) and update BQ since BQ is append-only. The recent availability of DML in BQ seems to be very limited.
We are aware that loading the tables as is to BQ is not an optimal solution and we need to denormalize for efficiency, but this is a problem we have to solve after analyzing the feasibility.
My question is whether BQ is a good solution for us, and if yes, how to efficiently keep BQ in sync with our DB data, or whether we should look at something else (like say, Redshift)?
WePay has been publishing a series of articles on how they solve these problems. Check out https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka.
They describe how they keep everything synchronized:
The flow of data starts with each microservice’s MySQL database. These
databases run in Google Cloud as CloudSQL MySQL instances with GTIDs
enabled. We’ve set up a downstream MySQL cluster specifically for
Debezium. Each CloudSQL instance replicates its data into the Debezium
cluster, which consists of two MySQL machines: a primary (active)
server and secondary (passive) server. This single Debezium cluster is
an operational trick to make it easier for us to operate Debezium.
Rather than having Debezium connect to dozens of microservice
databases directly, we can connect to just a single database. This
also isolates Debezium from impacting the production OLTP workload
that the master CloudSQL instances are handling.
And then:
The Debezium connectors feed the MySQL messages into Kafka (and add
their schemas to the Confluent schema registry), where downstream
systems can consume them. We use our Kafka connect BigQuery connector
to load the MySQL data into BigQuery using BigQuery’s streaming API.
This gives us a data warehouse in BigQuery that is usually less than
30 seconds behind the data that’s in production. Other microservices,
stream processors, and data infrastructure consume the feeds as well.
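As a rough illustration of why a change stream avoids the daily full dumps mentioned in the question, here is a minimal Python sketch of replaying ordered change events onto a copy of a table. The event shape (`op`, `key`, `row`) and the function name `apply_change_events` are assumptions made up for this example; this is not Debezium's actual message format, and the WePay pipeline streams into BigQuery rather than into an in-memory dict.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_change_events(table: Dict[int, Row], events: Iterable[Dict[str, Any]]) -> Dict[int, Row]:
    """Replay change events, in order, onto a copy of a table keyed by primary key."""
    for event in events:
        op = event["op"]
        key = event["key"]
        if op in ("insert", "update"):
            # Upsert: a later event for the same key overwrites the earlier row.
            table[key] = event["row"]
        elif op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            table.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return table


# Hypothetical events, in the order they were committed on the source database.
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "name": "alice"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "name": "bob"}},
    {"op": "update", "key": 1, "row": {"id": 1, "name": "alicia"}},
    {"op": "delete", "key": 2},
]

replica = apply_change_events({}, events)
print(replica)  # {1: {'id': 1, 'name': 'alicia'}}
```

Only the rows that changed travel downstream, so the cost of keeping the copy current scales with the change volume rather than with the table size.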

SQL Mirroring or Failover Clustering VS Azure built in infrastructure

I read in a few places that SQL Azure data is automatically replicated and that the Azure platform provides redundant copies of the data. Therefore, SQL Server high availability features such as database mirroring and failover clustering aren't needed.
Has anyone had a chance to investigate this more deeply? Are all those availability enhancements really not needed in Azure? Thanks!
To clarify, I'm talking about SQL as a service and not a VM hosted SQL.
The SQL Database service (database-as-a-service) is a multi-tenant database service, and your databases are triple-replicated within the data center, providing durable storage. The service itself, being large-scale, provides high availability (since there are many VMs running the service itself, along with replicated data). Nothing is needed in terms of mirroring or failover clusters. Having said that: If, say, your particular database became unavailable for a period of time, you'll need to consider how you'll handle that situation (perhaps sync'ing to another SQL Database, maybe even in another data center).
If you go with SQL Database (DBaaS), you'll still need to work out your backup strategy, and possibly syncing with another DC (or on-premises database server) for DR purposes.
More info on SQL Database fault tolerance is here.
The detail you want is probably in this MSDN article on Business Continuity and Azure SQL Database (see: http://msdn.microsoft.com/en-us/library/windowsazure/hh852669.aspx). At the most basic level, Azure SQL Database will keep three replicas of your database: one primary and two secondary.
While this helps with BCP / DR scenarios, you may also wish to investigate ways to back up your database so you have point-in-time restore capabilities. More information on backup / restore can be found here: http://msdn.microsoft.com/en-us/library/windowsazure/jj650016.aspx

Online and local sql database synchronization

In my system I maintain two databases: one on the LAN and one online. I want to synchronize these two databases, and I hope to do this using the Microsoft Sync Framework.
http://msdn.microsoft.com/en-us/library/ee819079.aspx
Can I sync a local and an online SQL database using this, or is there another suitable method? Thank you.
Sync Framework is designed for occasionally connected systems, e.g. a laptop that can access the corporate network every other day and update its database, but needs to work when it has no corpnet access too. The usual pairing for Sync Framework is a central DB (SQL Server) and a local embedded SQL Server Compact or SQL Express on the devices (laptops, phones, tablets, etc.).
If the databases are always connected (e.g. two DBs on two servers, with 24x7 connectivity between them, even if over the Internet), then the appropriate technology is replication, either Merge or Transactional. Theoretically, replication also works when disconnected periods are expected, but Sync Framework is much better at it, and most importantly Sync Framework is not strongly dependent on DNS names as replication is (very important for occasionally connected systems).
"Synchronizing the database" is a vague term. You have to consider whether you want a Master-Slave replication scheme or a Master-Master one (the latter being very difficult to achieve), and you have to consider what you want replicated from the database. You also need to consider whether more partners will be added later (more databases to 'synchronize'). And you have to be far more careful about schema changes from now on.

handling data between remote instances

We have an HR system that holds employee data, and many remote databases that use this data. Currently we use a mixture of copying the data across periodically to the remote databases and pulling the data across using views at runtime. I'm curious as to which option you think is best. My personal preference is to copy the data across periodically, as it removes the dependency on the master databases. However, it seems both have pros and cons.
What's the best practice for this?
Thanks
P.S. We have a mixture of SQL Server 2000, 2005 and 2008 servers.
Part of the answer will depend on what level of latency is acceptable for the other systems that use the HR data. Is a day behind OK? An hour? Or does it need to be current?
Each instance could result in a different solution.
I prefer a data pull instead of a push. The remote decides when it needs its data, and you can encapsulate all that logic on the server where it belongs. In a push, you have to keep processes on the HR server in sync with the demands of the subsystem.
I have reservations about multiple remote databases querying a source system directly. If some latency is not an issue, build a process on the HR system to snapshot the required data into some local tables (or a data warehouse?) and have all remotes query this data. At the very least, build local views against the HR source and only allow remote servers rights to those.
Are you doing this across a linked server? If so, I recommend creating synonyms on the remote that point to the HR source across the link. This will allow you to move source data locations around and only have to change your synonym definition.
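To make the pull approach concrete, here is a minimal Python sketch in which the remote copies only the rows changed since its last pull. It uses two in-memory SQLite databases as stand-ins for the HR source and a remote SQL Server instance; the `employees` table, the integer `updated_at` change counter, and the `pull_changes` function are all assumptions invented for this example, and deletes on the source are not handled.

```python
import sqlite3


def pull_changes(source: sqlite3.Connection, target: sqlite3.Connection, last_seen: int) -> int:
    """Copy rows changed after `last_seen` from source to target; return the new watermark."""
    rows = source.execute(
        "SELECT id, name, updated_at FROM employees "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # INSERT OR REPLACE acts as an upsert on the primary key.
    target.executemany(
        "INSERT OR REPLACE INTO employees (id, name, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    # If nothing changed, the watermark stays where it was.
    return rows[-1][2] if rows else last_seen


# Hypothetical HR source and remote copy, both with the same schema.
hr = sqlite3.connect(":memory:")
remote = sqlite3.connect(":memory:")
for db in (hr, remote):
    db.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )

hr.executemany("INSERT INTO employees VALUES (?, ?, ?)", [(1, "Ann", 10), (2, "Raj", 20)])
hr.commit()
watermark = pull_changes(hr, remote, 0)  # first pull copies both rows

hr.execute("UPDATE employees SET name = ?, updated_at = ? WHERE id = ?", ("Ann B.", 30, 1))
hr.commit()
watermark = pull_changes(hr, remote, watermark)  # second pull copies only the changed row

remote_rows = remote.execute(
    "SELECT id, name, updated_at FROM employees ORDER BY id"
).fetchall()
print(remote_rows)  # [(1, 'Ann B.', 30), (2, 'Raj', 20)]
```

Because the remote keeps its own watermark, each remote can pull on whatever schedule matches its latency needs without any per-remote state on the HR server.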