What database replication model do I need? - ruby-on-rails-3

I am on Rails 3 with a local Postgres database. What we want to do is replicate the entire database onto a second server in real time. We are thinking of using Octopus.
I'm confused about what model I'm looking for and how the master-slave model applies.

Postgres 9.1 and later comes with streaming replication built in (for master-slave configurations). Check out http://www.postgresql.org/docs/9.2/static/warm-standby.html#STREAMING-REPLICATION for more information on configuration and setup.
There are other third-party solutions for configuration, but I'd start there and see if that meets your needs.

Related

Can Apache Ignite update when 3rd party SQL Server database changed something directly?

Can I get some advice on whether it is possible to proceed like the steps below?
SQL Server data is loaded in Ignite Cluster
The data in SQL Server has been changed.
-> Is there any other way to reflect this changed data without reloading the data from SQL Server?
When used as a cache in front of the database, when changes are made directly to the DB without going through the Ignite Cluster, can the already loaded cache data be directly reflected in the Ignite cache?
Is it possible to set only the value to change without loading the data again?
If possible, which part should I set? Please.
I suppose the real question is - how to propagate changes applied to SQL Server first to the Apache Ignite cluster. And the short answer is - you need to do it by yourself, i.e. you need to implement some synchronization logic between the two databases. This should not be a complex task if most of the data updates come through Ignite and SQL Server-first updates are rare.
As for the general approach, you can check for the Change Data Capture (CDC) pattern implementations. There are multiple articles on how you can achieve it using external tools, for sample, CDC Between MySQL and GridGain With Debezium or this video.
It's worth mentioning that Apache Ignite is currently working on its own native implementation of CDC.
Take a look at Ignite's external storage integration, and the read/write through features. See: https://ignite.apache.org/docs/latest/persistence/external-storage
and https://ignite.apache.org/docs/latest/persistence/custom-cache-store
examples here: https://github.com/apache/ignite/tree/master/examples/src/main/java/org/apache/ignite/examples/datagrid/store

How to transfer Data from One SQL server to another with out transactional replication

I have a database connected with website, data from website is inserting in that Database, i need to transfer data from that database to another Primary Database (SQL) on another server in real time (minimum latency).
I can not use transactional replication in this case. What are the other alternates to achieve this? Can i integrate DataStreams like Apache kafka etc with SQL server?
Without more detail it's hard to give a full answer. There's what's technically possible, and there's architecturally what actually makes sense :)
Yes you can stream from RDBMS to Kafka, and from Kafka to RDBMS. You can use the Kafka Connect JDBC source and sink. There are also CDC tools (e.g. Attunity, GoldenGate, etc) that support integration with MS SQL and other RDBMS)
BUT…it depends why you want the data in the second database. Do you need an exact replica of the first? If so DB-DB replication may be a better option. Kafka's a great option if you want to process the data elsewhere and/or persist it in another store. But if you just want MS SQL-MS SQL…Kafka itself may be overkill.

Managing data in two relational databases in a single location

Background: We currently have our data split between two relational databases (Oracle and Postgres). There is a need to run ad-hoc queries that involve tables in both databases. Currently we are doing this in one of two ways:
ETL from one database to another. This requires a lot of developer
time.
Oracle foreign data wrapper on our
Postgres server. This is working, but the queries run extremely
slowly.
We already use Google Cloud Platform (for the project that uses the Postgres server). We are familiar with Google BigQuery (BQ).
What we want to do:
We want most of our tables from both these databases (as-is) available at a single location, so querying them is easy and fast. We are thinking of copying over the data from both DB servers into BQ, without doing any transformations.
It looks like we need to take full dumps of our tables on a periodic basis (daily) and update BQ since BQ is append-only. The recent availability of DML in BQ seems to be very limited.
We are aware that loading the tables as is to BQ is not an optimal solution and we need to denormalize for efficiency, but this is a problem we have to solve after analyzing the feasibility.
My question is whether BQ is a good solution for us, and if yes, how to efficiently keep BQ in sync with our DB data, or whether we should look at something else (like say, Redshift)?
WePay has been publishing a series of articles on how they solve these problems. Check out https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka.
To keep everything synchronized they:
The flow of data starts with each microservice’s MySQL database. These
databases run in Google Cloud as CloudSQL MySQL instances with GTIDs
enabled. We’ve set up a downstream MySQL cluster specifically for
Debezium. Each CloudSQL instance replicates its data into the Debezium
cluster, which consists of two MySQL machines: a primary (active)
server and secondary (passive) server. This single Debezium cluster is
an operational trick to make it easier for us to operate Debezium.
Rather than having Debezium connect to dozens of microservice
databases directly, we can connect to just a single database. This
also isolates Debezium from impacting the production OLTP workload
that the master CloudSQL instances are handling.
And then:
The Debezium connectors feed the MySQL messages into Kafka (and add
their schemas to the Confluent schema registry), where downstream
systems can consume them. We use our Kafka connect BigQuery connector
to load the MySQL data into BigQuery using BigQuery’s streaming API.
This gives us a data warehouse in BigQuery that is usually less than
30 seconds behind the data that’s in production. Other microservices,
stream processors, and data infrastructure consume the feeds as well.

Azure cloud. One database per (asp.net registered) client

Good morning,
I am using an asp.net framework with an azure client database.
I am now creating another server on Azure to host databases. On this server, for each customer registering on the website (for which 1 entry is created in my first database), I need to create a database with 8 tables - identical for each customer.
What would be the best thing to map the ASP.NET ID to a new database? Which framework would you recommend?
Thanks
Rather than running a VM where you're going to have to manage a SQL Server installation and write a bunch of code to handle a database per tenant scenario, I highly, highly, highly recommend taking a look at Azure SQL's multi-tenant sharding support. All of this code is already written for you. And it's not that you're paying for one DB per client - check out elastic pooling.
You can read the docs here.
Also note, this option will scale very well.
I have done this three different ways: a database per client where I wrote my own code to manage sharding, a single database with a separate schema per client (a huge pain in the rear), and using Azure SQL sharding support. It's not just the issue of correctly separating client data. You also need to think about querying for reporting across all client databases, and managing schema changes. Under the first two options, if you change a schema, you get to modify N client databases. Azure SQL's sharding tools will manage this for you.

wso2cep : Data Storage in addition to display

I was wondering if in addition to process and display data on dashboard in wso2cep, can I store it somewhere for a long period of time to get further information later? I have studied there are two types of tables used in wso2cep, in-memory and rdbms tables.
Which one should I choose?
There is one more option that is to switch to wso2das. Is it a good approach?
Is default database is fine for that purpose or I should move towards other supported databases like sql, orcale etc?
In-memory or RDBMS?
In-memory tables will internally use java collections structures, so it'll get destroyed once the JVM is terminated (after server restart, data won't be available). On the other hand, RDBMS tables will persist data permanently. For your scenario, I think you should proceed with RDBMS tables.
CEP or DAS?
CEP will only provide real-time analytics, where DAS provides batch analytics (with Spark SQL) in addition to real-time analytics. If you have a scenario which require batch processing, incremental processing, etc ... You can go ahead with DAS. Note that, migration form CEP to DAS is quite simple (since the artifacts are identical).
Default (H2) DB or other DB?
By default WSO2 products use embedded H2 DB as data source. However, it's recommended to use MySQL or Oracle in production environments.