How to set up replication and clustering for Druid metadata storage

I am using Derby as the metadata store. However, I'm unable to find any material that either explains Derby replication in a decent fashion or convinces me that replication doesn't exist for Derby.

Related

How to transfer data from one SQL Server to another without transactional replication

I have a database connected to a website; data from the website is inserted into that database. I need to transfer data from that database to another primary database (SQL Server) on another server in real time (with minimum latency).
I can't use transactional replication in this case. What are the other alternatives to achieve this? Can I integrate data streams like Apache Kafka with SQL Server?
Without more detail it's hard to give a full answer. There's what's technically possible, and there's architecturally what actually makes sense :)
Yes, you can stream from an RDBMS to Kafka, and from Kafka to an RDBMS, using the Kafka Connect JDBC source and sink connectors. There are also CDC tools (e.g. Attunity, GoldenGate) that support integration with MS SQL and other RDBMSs.
BUT… it depends on why you want the data in the second database. Do you need an exact replica of the first? If so, DB-to-DB replication may be a better option. Kafka's a great option if you want to process the data elsewhere and/or persist it in another store. But if you just want MS SQL to MS SQL, Kafka itself may be overkill.
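For the Kafka Connect route, a JDBC source connector polls the source database and publishes new or changed rows to a topic; a matching JdbcSinkConnector then writes that topic into the target database. A minimal sketch of the source config, assuming the Confluent JDBC connector is installed on the Connect worker; the connection URL and the table/column names (`orders`, `id`, `updated_at`) are placeholders for your own schema:

```json
{
  "name": "mssql-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:sqlserver://source-host:1433;databaseName=webdb",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "updated_at",
    "incrementing.column.name": "id",
    "table.whitelist": "orders",
    "topic.prefix": "webdb-"
  }
}
```

The `timestamp+incrementing` mode is what gives you near-real-time capture here: the connector re-polls on the timestamp column and uses the incrementing id to avoid re-reading rows. A sink connector on the other side would use `io.confluent.connect.jdbc.JdbcSinkConnector` pointed at the target database with the `webdb-orders` topic.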

What type of google database for a deals based website?

I am trying to find out what makes the most sense for my type of database structure.
Here's a breakdown of what it is and what I intend to do.
A deals-based website, using strong consistency, that will need to update existing linked foreign keys to new parents in a scenario where an alias such as 'Coke' is not linked up to its actual data, 'Coca-Cola'.
I will be tracking price over time for these products, and it should be able to handle large amounts of data with few performance issues over time.
I initially began with Google's Bigtable but quickly realised that, without a relational component, it would fail on any cascading updates.
I don't want to spend a lot of time researching and learning all of these different types only to realise later that it isn't what I wanted. The most important aspects for me are the cascading update and ensuring it can handle a vertically large data structure for the price-over-time trends.
Additionally, because this is from scratch, I am more interested in price and scalability than in compatibility with anything existing.
Check this out, it might help: https://cloud.google.com/sql/
From the documentation:
"Cloud SQL is a fully-managed database service that makes it easy to
set up, maintain, manage, and administer your relational PostgreSQL
and MySQL databases in the cloud. Cloud SQL offers high
performance, scalability, and convenience. Hosted on Google Cloud
Platform, Cloud SQL provides a database infrastructure for
applications running anywhere."
Google's Cloud SQL service provides a fully managed relational database service. It supports PostgreSQL and MySQL.
Google also provides Cloud Spanner, a fully managed, distributed relational database service that is better suited for mission-critical systems.

Best Practice for Purging data from db2 sql service

What is the best practice on Bluemix for purging data from the DB2 storage service? Say we want to purge a large amount of data, say, a million entries of a particular communication sent to customers?
You may look into this tutorial, which describes the data purge algorithm for DB2:
http://www.ibm.com/developerworks/data/library/techarticle/dm-1501data-purge-db2/index.html
However, as SQL Database is a fully managed service, you will not be able to follow the exact instructions as described. For example, you will not be able to tune db cfg and dbm cfg for optimal performance. Also note that you will not have shell access, so you may have to enter SQL commands individually through a client like Data Studio.
On the other hand, if you are using the DB2 on Cloud service, you would be able to follow the above instructions.
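The core of that purge approach is deleting in bounded batches so each transaction stays short and the log doesn't balloon on a million-row purge. A runnable sketch using SQLite as a stand-in (table and column names are hypothetical; on DB2 you would use `FETCH FIRST n ROWS ONLY` instead of `LIMIT`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comms (id INTEGER PRIMARY KEY, customer_id INT, sent TEXT)")
conn.executemany(
    "INSERT INTO comms (customer_id, sent) VALUES (?, ?)",
    [(i % 100, "2020-06-15") for i in range(10_000)])
conn.commit()

BATCH = 1_000
purged = 0
while True:
    # Delete one bounded batch per transaction: locks are held briefly
    # and a failed run can simply be resumed from where it stopped.
    cur = conn.execute(
        "DELETE FROM comms WHERE id IN "
        "(SELECT id FROM comms WHERE sent < '2021-01-01' LIMIT ?)",
        (BATCH,))
    conn.commit()
    purged += cur.rowcount
    if cur.rowcount < BATCH:
        break
print(purged)  # 10000
```

Since the managed service only gives you a SQL client rather than a shell, the same loop can live in any small driver program (or be issued as repeated statements from Data Studio) instead of the tutorial's shell script.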

What database replication model do I need?

I am on Rails 3 with a local Postgres database. What we want to do is replicate the entire database onto a second server in real time. We are thinking of using Octopus.
I'm confused about what model I'm looking for and how the master-slave model applies.
Postgres 9.1 and later come with streaming replication built in (for master-slave configurations). Check out http://www.postgresql.org/docs/9.2/static/warm-standby.html#STREAMING-REPLICATION for more information on configuration and setup.
There are other third-party solutions for configuration, but I'd start there and see if that meets your needs.
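The moving parts for built-in streaming replication are small. A minimal sketch of a 9.2-era setup, where `primary.example.com`, `repuser`, and the data directory path are placeholders (later Postgres versions rename `wal_level = hot_standby` to `replica` and fold recovery.conf into postgresql.conf):

```
# postgresql.conf on the primary
wal_level = hot_standby
max_wal_senders = 3

# pg_hba.conf on the primary: allow the standby to connect for replication
host  replication  repuser  192.168.0.2/32  md5

# On the standby: seed the data directory from the primary
pg_basebackup -h primary.example.com -U repuser \
    -D /var/lib/postgresql/data -X stream

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=repuser password=secret'
```

The standby then tails the primary's WAL stream continuously, which gives you the whole-database, near-real-time copy the question asks about without any application-level gem like Octopus in the write path.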

GAE Cloud SQL and High Replication Datastore

With HRD and BigTable, you are forced to deal with eventual consistency for all queries that are not ancestor queries. Your code has to be robust enough to cope with the fact that the results may be stale.
With Google's launching of Cloud SQL, they put in a disclaimer: ( https://developers.google.com/cloud-sql/faq#hrapps )
"We recommend that you use Google Cloud SQL with
High Replication App Engine applications. While you can use
Google Cloud SQL with applications that
do not use high replication, doing so might impact performance."
What does this mean? Does this mean that there are the same eventual consistency issues using SQL with HRD? There is no concept of entity groups in SQL, however could this mean that particular SQL queries in particular circumstances deliver stale results?
This would mean that Google's implementation of the SQL atomic transactional contract would be broken and SQL would not function as users of relational databases would expect.
If this is not the case, what are the concerns for having a master/slave or HRD model with SQL and why would Google give you the option of choosing a model with poorer performance?
(from forum)
The Cloud SQL and Datastore systems are independent. You can use one or both as you see fit for your app.
We recommend using HRD apps because that type of app will be colocated with Cloud SQL. Master-slave apps are served from a different set of datacenters where Cloud SQL does not have a presence. It will work, but it will be slower.
Quotes from documentation:
"Google Cloud SQL is, simply put, a MySQL instance that lives in the cloud. It has all the capabilities and functionality of MySQL"
To answer your question, the high replication vs. master-slave options for a relational DB have nothing to do with consistency but with other factors, such as latency at peak loads and availability for writes during planned maintenance. For a high replication datastore the latency stays low even if load spikes, and writes remain available even while maintenance is planned. Check the comparison at http://code.google.com/appengine/docs/java/datastore/hr/
As for the second part of the question, why Google would offer a master-slave option that is not foolproof: so that people who don't need complete uptime and want to try out GAE can use it.