Hortonworks schema registry cluster mode - schema

I'm using Hortonworks Schema Registry with NiFi and things are working fine. I have installed the registry on a single node, and I'm worried about what will happen to my NiFi flows if that machine goes down. I have seen in the Hortonworks Schema Registry architecture that we can use MySQL, PostgreSQL, or in-memory storage for storing schemas. AFAIK none of them is a distributed system. Is there any way to achieve a cluster mode for high availability?

Sure, you can do active-active or active-passive replication for MySQL and Postgres, but that is left up to you to implement. Hortonworks will likely forward you to the respective documentation for each tool, which is why the documentation doesn't guide you towards these design decisions itself. You should be aware of the drawbacks of having a SPoF.
The Schema Registry itself is just a web app, so you could put it behind your favorite reverse proxy, or run it within a container orchestrator (HDP 3.x, for example, has Docker support).
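To make the reverse-proxy idea concrete, here is a minimal Python sketch of the health-check-and-route logic a proxy would perform in front of several identical registry instances. The host names, the port, and the helper names (`http_probe`, `first_healthy`) are assumptions for illustration only, and the sketch assumes every instance points at the same replicated database.

```python
from typing import Callable, Iterable, Optional
from urllib.error import URLError
from urllib.request import urlopen

# Hypothetical registry instances (assumed names and port), all backed by
# the same replicated MySQL/Postgres database.
REGISTRY_URLS = [
    "http://registry-1.example.com:9090",
    "http://registry-2.example.com:9090",
]


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the instance answers an HTTP request successfully."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


def first_healthy(
    urls: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first URL whose probe succeeds, or None if all are down."""
    for url in urls:
        if probe(url):
            return url
    return None
```

Calling `first_healthy(REGISTRY_URLS)` returns the first instance that responds, or `None` when none does. In practice a real reverse proxy or load balancer would do this for you; the sketch only shows the routing decision.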

Related

Keep ActiveMQ running when losing connection to database

I have an instance of ActiveMQ 5.16.4 running that is using MySQL as a persistent data storage. Recently the MySQL server had some issues, and ActiveMQ lost its connection to MySQL. That caused multiple Spring microservices to throw errors because ActiveMQ wasn't working.
Is it possible to have master/slave ActiveMQ running where master and slave use separate persistence storage?
I have done some research and found "pure master slave", but it says that it is deprecated, not recommended, and will be removed in 5.8. It recommends shared storage instead, which I am trying to avoid (because my problem is what happens if the storage itself is down).
What are my options to keep running ActiveMQ if it loses connection to database?
If you're using ActiveMQ "Classic" (i.e. 5.x) then your only option is to use shared storage between the master and the slave. This could be a shared file system or a relational database. This, of course, is a single point of failure.
However, there are both file system and database technologies that can mitigate this risk. For example you could use a replicated file system (e.g. Ceph or GlusterFS) or a replicated database (e.g. MySQL).
You might also consider using ActiveMQ Artemis (i.e. the next-generation broker from ActiveMQ) which supports replication natively.
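On the client side, a shared-storage master/slave pair is usually reached through ActiveMQ's failover transport, so that clients reconnect to whichever broker currently holds the lock. The Python sketch below only builds such a connection URI; the broker host names and the `failover_uri` helper are hypothetical, and it does nothing to remove the shared storage as a single point of failure.

```python
from typing import Sequence


def failover_uri(brokers: Sequence[str], randomize: bool = False) -> str:
    """Build an ActiveMQ failover transport URI from host:port pairs.

    randomize=False makes clients try the brokers in the order given,
    which is the usual choice for a master/slave pair.
    """
    if not brokers:
        raise ValueError("at least one broker address is required")
    hosts = ",".join(f"tcp://{broker}" for broker in brokers)
    flag = "true" if randomize else "false"
    return f"failover:({hosts})?randomize={flag}"


if __name__ == "__main__":
    # Hypothetical broker addresses, for illustration only.
    print(failover_uri(["mq-a.example.com:61616", "mq-b.example.com:61616"]))
```

The resulting string is what you would hand to the client's connection factory as its broker URL.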

Roll back Gcloud Redis upgrade

I'd like to upgrade the Redis Memorystore instance in our gcloud project because 5.x (at least on GitHub) appears to have reached its end of life. It's being used for simple key-value pairs, so I don't expect any surprises during the upgrade to 6.x. However, management is nervous and wants a way to roll back the upgrade if there are issues. Is there a way to do this? The documentation appears to say that rollback is not possible. I plan to do the usual backup and then upgrade. The instance is just Basic Tier.
To upgrade the Redis Memorystore instance, follow the best practices from the public documentation:
We recommend exporting your instance data before running a version upgrade operation.
Note that upgrading an instance is irreversible. You cannot downgrade the Redis version of a Memorystore for a Redis instance.
For Standard Tier instances, to increase the speed and reliability of your version upgrade operation, upgrade your instance during
periods of low instance traffic. To learn how to monitor instance
traffic, see Monitoring Redis instances.
The documentation also recommends enabling RDB snapshots:
Memorystore for Redis is primarily used as an in-memory cache. When
using Memorystore as a cache, your application can either tolerate
loss of cache data or can very easily repopulate the cache from a
persistent store.
However, there are some use cases where downtime for a Memorystore
instance, or a complete loss of instance data, can cause long
application downtimes. We recommend using the Standard Tier as the
primary mechanism for high availability. Additionally, enabling RDB
snapshots on Standard Tier instances provides extra protection from
failures that can cause cache flushes. The Standard Tier provides a
highly available instance with multiple replicas, and enables fast
recovery using automatic failover if the primary fails.
In some scenarios you may also want to ensure data can be recovered
from snapshot backups in the case of catastrophic failure of Standard
Tier instances. In these scenarios, automated backups and the ability
to restore data from RDB snapshots can provide additional protection
from data loss. With RDB snapshots enabled, if needed, a recovery is
made from the latest RDB snapshot.
For more information, you can refer to the documentation related to version upgrade behavior.
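Since a downgrade is not possible, the closest thing to a rollback is to export the data first and, if needed, import that export into a separately created instance on the old version. The sketch below assembles the export and upgrade commands as argument lists without running them. The instance name, region, and bucket path are placeholders, and the exact `gcloud redis instances` flags are written from memory, so check them against `gcloud redis instances --help` before use.

```python
import shlex
from typing import List


def upgrade_plan(
    instance: str,
    region: str,
    export_uri: str,
    target_version: str = "redis_6_x",
) -> List[List[str]]:
    """Return the gcloud commands to run in order: export first, then upgrade."""
    return [
        # Step 1: export an RDB file to Cloud Storage as the fallback copy.
        ["gcloud", "redis", "instances", "export", export_uri, instance,
         f"--region={region}"],
        # Step 2: the irreversible in-place version upgrade.
        ["gcloud", "redis", "instances", "upgrade", instance,
         f"--redis-version={target_version}", f"--region={region}"],
    ]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    for cmd in upgrade_plan("my-cache", "us-central1",
                            "gs://my-bucket/pre-upgrade.rdb"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run; pass each list to `subprocess.run(cmd, check=True)` only after reviewing the output.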

Which Amazon Web Services *native* offering is closest to Apache Kudu?

I am looking for a native offering, such as any of the RDS solutions, ElastiCache, or Amazon Redshift, not something that I would have to host myself.
From the Apache Kudu site (https://kudu.apache.org/):
Kudu provides a combination of fast inserts/updates and efficient columnar
scans to enable multiple real-time analytic workloads across a single storage
layer. As a new complement to HDFS and Apache HBase, Kudu gives architects the
flexibility to address a wider variety of use cases without exotic workarounds.
As I understand it, Kudu is a columnar distributed storage engine for tabular data that allows for fast scans and ad-hoc analytical queries but ALSO allows for random updates and inserts. Every table has a primary key that you can use to find and update single records...
Second answer after question was revised.
The answer is Amazon EMR running Apache Kudu.
Amazon EMR is Amazon's service for Hadoop. Apache Kudu is a package that you install on Hadoop along with many others to process "Big Data".
If you are looking for a managed service for only Apache Kudu, then there is nothing. Apache Kudu is an open source tool that sits on top of Hadoop and is a companion to Apache Impala. On AWS both require Amazon EMR running Hadoop version 2.x or greater.

PostgreSQL 9.6 sharding

I have a database topology as in the picture attached below, and PostgreSQL 9.6 is installed on all the machines. It is a master-slave architecture with sharding. I have successfully configured master-slave replication and automatic failover using repmgr/repmgrd, but I am confused about how to achieve sharding in this scenario. I have tried the Citus extension, but Citus does not support sharding for the table structure I have; here is the link. Can anyone suggest how I can achieve sharding in this scenario?
Database topology:

minio: What is the cluster architecture of minio.io object storage server?

I have searched minio.io for hours, but it doesn't provide any good information about clustering. Does it have rings, and are instances connected? Or is minio just for a single isolated machine, so that to run a cluster we have to run many isolated instances of it and have our app choose which instance to write to?
If yes:
When I write a file to a bucket, does minio replicate it between multiple servers?
Is it like Amazon S3 or OpenStack Swift, which support storing multiple copies of an object on different servers (and not just on multiple disks in the same machine)?
Here is the document for distributed minio: https://docs.minio.io/docs/distributed-minio-quickstart-guide
From what I can tell, minio does not support clustering with automatic replication across multiple servers, balancing, etcetera.
However, the minio documentation does say how you can set up one minio server to mirror another one:
https://gitlab.gioxa.com/opensource/minio/blob/1983925dcfc88d4140b40fc807414fe14d5391bd/docs/setup-replication-between-two-sites-running-minio.md
Minio has also introduced continuous availability and active-active bucket replication. Check out their active-active replication guide.