Multiple microservices and Redis - one database vs. a node per application in the cloud

I would like to know what is the best practice for using Redis in the cloud (Google Memorystore in my case, Standard Tier) for multiple microservices/applications. From what I have researched so far, the following options are available:
Use a single cluster and database, scaled horizontally, for all the microservices. This seems most cost-effective, as I will use exactly the number of nodes the whole system needs. Data isolation is impacted here, but I can reduce the impact, e.g. by prefixing the keys with the microservice name (see the sketch after this list).
Use separate clusters and databases for each microservice. In this case the isolation is better and scaling one cluster impacts a single microservice only, but this doesn't seem cost-effective, as many nodes may be underloaded (e.g. if microservice M1 utilizes 50% of a node's capacity and microservice M2 utilizes 40%, then under option 1 both microservices would be served by a single node).
In theory I could use multiple databases to isolate data within a single cluster, but as far as I have read this is not supported by Redis in cluster mode (and using multiple databases on a single node causes performance issues).
I am leaning towards option 1, but perhaps I am missing something?
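For illustration, here is a minimal sketch of the key-prefixing approach from option 1, using the Jedis Java client (the wrapper class, prefix scheme, and service names are hypothetical):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

// Minimal sketch: one shared Redis instance, with every key namespaced
// by the owning microservice. Connection details are placeholders.
public class PrefixedRedisClient {
    private final JedisPool pool;
    private final String prefix; // e.g. "orders:" for the orders microservice

    public PrefixedRedisClient(JedisPool pool, String serviceName) {
        this.pool = pool;
        this.prefix = serviceName + ":";
    }

    public void set(String key, String value) {
        try (Jedis jedis = pool.getResource()) {
            jedis.set(prefix + key, value); // key lands in this service's namespace
        }
    }

    public String get(String key) {
        try (Jedis jedis = pool.getResource()) {
            return jedis.get(prefix + key);
        }
    }
}
```

With this, new PrefixedRedisClient(pool, "orders").set("42", "pending") writes the key orders:42, so services stay out of each other's keyspace by convention only; nothing enforces it server-side in this sketch.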

Not sure about best practices; I will tell you my experience.
In general I would go with option #2.
Each microservice gets its own Redis instance or cluster.
The Redis clusters then follow the lifecycle of their own microservice; for example, they might get respawned when you redeploy or restart a service.
You might pay a bit more, but you gain in resiliency and save yourself maintenance hassle.

Related

Load balanced instances of Moqui using the same DB instance

Is this configuration in Moqui possible? Everything I've seen on the subject of multiple instances (e.g. this question and the framework doc pages) involves per-instance databases, rather than a common shared data set.
We need the same data available in each application instance (and a consistent cache) so that we can load balance end-users across multiple instances. We will be supporting users world-wide, so we may potentially need to create application instances closer to the user's actual location in order to reduce latency; we also want to ensure we can make best use of elastic horizontal scaling in cloud-based deployments.
Multi-tenant and the newer multi-instance variation on that are the opposite of what you're looking for. They are for large numbers of small instances, not a single large distributed instance with multiple application server instances running against the same database.
For clustering support Moqui uses Hazelcast by default, though that is done through a series of interfaces that can be implemented with other distributed computing tools. Here is the component needed to run a multi-server cluster with Hazelcast:
https://github.com/moqui/moqui-hazelcast
The most important aspects of clustering are cache invalidation for the entity (database) caches and web session replication. It also supports other tools for distributing workload and data as mentioned in the readme.
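As a rough illustration of the cache-invalidation side, here is a minimal sketch of topic-based invalidation with Hazelcast 4+ (the topic name and eviction logic are hypothetical; Moqui's real implementation lives in the moqui-hazelcast component linked above):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.topic.ITopic;

// Minimal sketch: each cluster node subscribes to an invalidation topic and
// evicts the named entry from its local entity cache when a message arrives.
public class CacheInvalidationSketch {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        ITopic<String> invalidations = hz.getTopic("entity-cache-invalidate");

        // Every node listens; in a real system this would evict from the cache.
        invalidations.addMessageListener(message ->
                System.out.println("evict cache entry: " + message.getMessageObject()));

        // A node that writes to the database publishes the affected cache key.
        invalidations.publish("Product#10042");
    }
}
```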
For distribution across multiple data centers or geographical regions there are much bigger issues. Moqui Framework is primarily for transactional applications like accounting, inventory management, etc. that need strict transactional consistency. Big-data or NoSQL-style eventual consistency and similar approaches do not do well with ERP and other transactional applications, because there is no way to use locks and the like in the database to protect against double spending of funds, double reservation or issuance of inventory, and so on.
Consider the challenge of distributed relational transactional databases, i.e. multi-master database clusters. With multi-master setups a transaction must propagate to and commit on all master nodes before it can be considered committed. This has performance impacts even if all master nodes are on the same local network, and an unreasonable performance impact if the master nodes are in different data centers or geographical regions.
The main solution to this is geographical sharding at the application level, usually mirroring the structure of a large business with geographic divisions. Moqui has some tool-level support for this sort of thing, using Entity Sync or other tools to feed data from geographic regions to a central server (or cluster) where reporting, etc. can be done. There is no OOTB Entity Sync or other configuration for this sort of deployment; it's not something there has been demand for yet. It only makes sense for extremely large global corporations, not a market where Moqui has any use to my knowledge.
If you're looking at doing something like ecommerce and need the ecommerce sites distributed more widely, the problem is easier than coordinating inventory or accounting across multiple global entities. For that, just have separate ecommerce instances in different data centers feeding order and other data to the Moqui ERP instance, much like any typical external ecommerce application.

Does Redis support strong consistency

I am looking at porting a Java application to .NET. The application currently uses EhCache quite heavily and insists on supporting strong consistency (http://ehcache.org/documentation/get-started/consistency-options).
I would like to use Redis in place of EhCache, but does Redis support strong consistency, or just eventual consistency?
I've seen talk of Redis Cluster, but I guess that is still a little way off release.
Or am I looking at this wrong? If a Redis instance sat on a different server altogether and served two frontend servers, how big could it get before we'd need to look at a master/slave style setup?
A single instance of Redis is consistent. There are options for consistency across many instances. antirez (the Redis developer) recently wrote a blog post, Redis data model and eventual consistency, and recommended Twemproxy for sharding Redis, which would give you consistency over many instances.
I don't know EhCache, so I can't comment on whether Redis is a suitable replacement. One potential problem with Twemproxy (given that you're porting to .NET) is that it seems to only run on Linux.
How big can a single Redis instance get? Depends on how much RAM you have.
How quickly will it get this big? Depends on how your data looks.
That said, in my experience Redis stores data quite efficiently. One app I have holds info for 200k users, 20k articles, all relationships between objects, weekly leaderboards, stats, etc. (330k keys in total) in 400 MB of RAM.
Redis is easy to use and fun to work with. Try it out and see if it meets your needs. If you do decide to use it and might one day want to shard, shard your data from the beginning.
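One concrete way to "shard from the beginning" is to pick shard-friendly key names on day one. The sketch below uses the {user:...} hash-tag convention that Redis Cluster honors natively (and that Twemproxy can be configured to honor via its hash_tag option); the key schema itself is hypothetical:

```java
// Minimal sketch: key names whose hash tag keeps all of one user's keys on
// the same shard, so multi-key operations keep working after sharding.
public class KeyNaming {
    static String profileKey(String userId)   { return "{user:" + userId + "}:profile"; }
    static String followersKey(String userId) { return "{user:" + userId + "}:followers"; }

    public static void main(String[] args) {
        // Both keys contain the same {user:1001} tag, so they hash to the
        // same slot and land on the same shard.
        System.out.println(profileKey("1001"));
        System.out.println(followersKey("1001"));
    }
}
```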
Redis is not strongly consistent out of the box. You will probably need to apply third-party solutions to make it so. Here is a quote from the docs:
Write safety
Redis Cluster uses asynchronous replication between nodes, and last failover wins implicit merge function. This means that the last elected master dataset eventually replaces all the other replicas. There is always a window of time when it is possible to lose writes during partitions. However these windows are very different in the case of a client that is connected to the majority of masters, and a client that is connected to the minority of masters.
Usually you need synchronous replication to achieve strong consistency in a distributed, partitioned system.
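Redis does offer the WAIT command as a middle ground: it blocks until a write has been acknowledged by a given number of replicas, which narrows (but does not close) the loss window described in the quote. A minimal sketch, assuming a Jedis version that exposes WAIT as waitReplicas and a placeholder hostname:

```java
import redis.clients.jedis.Jedis;

// Minimal sketch: trade write latency for a smaller loss window with WAIT.
// This still does not make Redis strongly consistent.
public class SaferWrite {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("redis-master.example.com", 6379)) {
            jedis.set("balance:42", "100");
            // Block up to 1000 ms for at least 1 replica to acknowledge the write.
            long acked = jedis.waitReplicas(1, 1000);
            if (acked < 1) {
                System.out.println("write may exist only on the master");
            }
        }
    }
}
```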

Are services, such as redis or activemq, also highly available?

I wonder whether every service can also be made highly available.
I want to use the Redis and ActiveMQ services and I want to avoid a single point of failure. I also need to keep writing data to the Redis and ActiveMQ servers.
I found many articles about MySQL high availability, but only a few about other database solutions, so my question is whether there is a common high-availability solution suite for many products.
High availability is one of the principles in the CAP theorem, and many NoSQL database systems favor availability at the expense of data consistency. Replication is often used to achieve high availability for reads, but writes might depend on the type of replication being used. Take a look at the current Redis replication docs or the upcoming Redis cluster presentation for more information on this stuff.
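For Redis specifically, the usual high-availability setup is a replicated master with Redis Sentinel handling failover; clients then discover the current master through the sentinels. A minimal sketch with Jedis (sentinel addresses and the master name are placeholders):

```java
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

// Minimal sketch: a client that keeps writing through Sentinel-managed
// master failover. The pool always resolves the current master.
public class SentinelClient {
    public static void main(String[] args) {
        Set<String> sentinels = Set.of(
                "10.0.0.1:26379", "10.0.0.2:26379", "10.0.0.3:26379");
        try (JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels);
             Jedis jedis = pool.getResource()) {
            // Even after a replica is promoted, getResource() returns
            // connections to whichever node is currently the master.
            jedis.set("heartbeat", "ok");
        }
    }
}
```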

Distributed Database Computing - Is it really possible within the RDBMS paradigm?

I am asking this in the context of NoSQL - which achieves scalability and performance without being expensive.
So, if I needed to achieve massively parallel distributed computing across databases ...
What are the various methodologies available today (within the RDBMS paradigm) to achieve distributed computing with high-scalability?
Does database clustering & mirroring contribute in any way towards distributed computing?
I guess you are asking about the scalability of RDBMS databases. NoSQL databases (based on Amazon Dynamo, BigTable, etc.) are a whole other topic; I am talking about HBase, Cassandra, etc. There are also commercial products like Oracle Coherence, which is more of a distributed cache and key-value store, to put it crudely.
Going back to RDBMS:
Sharding
To scale an RDBMS one can do custom sharding. Sharding is a technique where you have multiple tables on possibly multiple hosts, and then decide in a certain fashion to assign certain rows to certain tables. For example, you can say that rows 1-1M go to table1, rows 1M-2M go to table2, and so on. But this is a difficult process from an administration point of view. A lot of large-scale websites scale by relying on sharding. Other techniques worth mentioning are partitioning, MySQL federation, and MySQL Cluster.
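A minimal sketch of that range-based assignment (the JDBC URLs and shard sizes are placeholders; a real router would also handle resharding and cross-shard queries):

```java
import java.util.List;

// Minimal sketch: route a row id to the shard that owns its range,
// e.g. rows 1-1M on shard 0, rows 1M-2M on shard 1.
public class RangeShardRouter {
    private static final long ROWS_PER_SHARD = 1_000_000L;
    private static final List<String> SHARD_URLS = List.of(
            "jdbc:mysql://shard0.example.com/app",
            "jdbc:mysql://shard1.example.com/app");

    static String shardFor(long rowId) {
        int index = (int) ((rowId - 1) / ROWS_PER_SHARD);
        return SHARD_URLS.get(index); // throws if the id is beyond the last shard
    }

    public static void main(String[] args) {
        System.out.println(shardFor(42L));        // -> shard0
        System.out.println(shardFor(1_500_000L)); // -> shard1
    }
}
```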
MPP databases
Then there are MPP databases, true RDBMSs that do the distribution and scaling for you. Teradata is the most successful of these companies; I believe they used Postgres core code at some point. A significant number of Fortune 500 companies and a lot of the airlines use Teradata. But it's ridiculously expensive. There are newer companies like Greenplum, Vertica, and Netezza.
Unless you're a very big company with extreme scalability requirements, you can scale your DB horizontally while keeping ACID guarantees by building a cluster of identical RDBMS instances and synchronizing them with JTA transactions.
Take a look at this Java/JDBC-based article; the JEPLayer framework is used there, but you can use straight JDBC and JTA code.
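A minimal sketch of that idea with plain JTA (it assumes a container or standalone transaction manager exposing UserTransaction, plus two XA DataSources bound under the JNDI names below; all names are placeholders):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import javax.transaction.UserTransaction;

// Minimal sketch: apply the same write to two identical RDBMS nodes inside
// one JTA transaction, so both commit or neither does (two-phase commit).
public class DualWrite {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        UserTransaction tx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
        DataSource dsA = (DataSource) ctx.lookup("java:comp/env/jdbc/nodeA");
        DataSource dsB = (DataSource) ctx.lookup("java:comp/env/jdbc/nodeB");

        tx.begin();
        try {
            try (Connection a = dsA.getConnection(); Connection b = dsB.getConnection()) {
                for (Connection c : new Connection[] { a, b }) {
                    try (PreparedStatement ps = c.prepareStatement(
                            "UPDATE account SET balance = balance - ? WHERE id = ?")) {
                        ps.setInt(1, 100);
                        ps.setLong(2, 42L);
                        ps.executeUpdate();
                    }
                }
            }
            tx.commit(); // both nodes commit together
        } catch (Exception e) {
            tx.rollback(); // neither node keeps the write
            throw e;
        }
    }
}
```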
Within the RDBMS paradigm: Sharding.
Outside the RDBMS paradigm: Key-value stores.
My pick (I come from an RDBMS background): key-value stores of the tabular type - HBase.
Within the RDBMS paradigm, sharding will not get you far.
Use the RDBMS paradigm to design your model, to get your project up and running.
Use tabular key-value stores to SCALE OUT.
Sharding:
A good way to think about sharding is to see it as user-account-oriented DB design. All the schema entities touched by a user account are kept on one host. The assignment of a user to a host happens when the user creates an account: the least loaded host gets that user (sketched below). When that user signs on after account creation, he gets connected to the host that has his data. Each host holds a set of user accounts.
The problem with this approach is that if a host gets hosed, a fraction of users will be blacked out. The solution is to have a replicated standby host that becomes the primary when the primary host encounters problems.
Also, it's a fairly rigid setup, suited to processes where the design does not change dramatically. From the user standpoint, I've noticed that web sites with a sharded DB backend are not as quick to "turn on a dime" to create different business models on their platform.
Contrast this with web sites that have truly distributed key-value stores. These businesses can host any range of services; their platform is just that - a platform. It's not relational, and it does have an API interface, but it just seems to work.
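A minimal sketch of that account-to-host assignment (host names are placeholders; a real directory would live in a replicated store rather than in memory):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: assign each new account to the least-loaded host, then
// route every later sign-in to the host that holds that user's data.
public class UserShardDirectory {
    private final Map<String, Integer> accountsPerHost = new HashMap<>();
    private final Map<String, String> userToHost = new HashMap<>();

    public UserShardDirectory(String... hosts) {
        for (String host : hosts) accountsPerHost.put(host, 0);
    }

    // Called once, at account creation.
    public String assign(String userId) {
        String host = accountsPerHost.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .orElseThrow()
                .getKey();
        accountsPerHost.merge(host, 1, Integer::sum);
        userToHost.put(userId, host);
        return host;
    }

    // Called on every sign-in.
    public String hostFor(String userId) {
        return userToHost.get(userId);
    }
}
```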

Is there a way to shard and replicate neo4j data?

I am considering the option of neo4j for some of the new projects I am working for. For the given data needs (inherently graph based) neo4j fits well and a quick prototype is giving good response time for me. What I want to understand is how to scale a neo4j deployment. Specifically:
How do I shard my data across neo4j deployments? Since neo4j is deployed on a single machine, there is a limit to how much data I can store on a single machine, and hence I would like to know how to distribute it. Clearly, if I split it on users, then relationships between disconnected users (across shards) cannot be maintained.
How do I replicate the neo4j data? I am potentially thinking of putting up a SQL-like setup with masters used for writes and slaves used for reads, so that we can scale up both our potential readers and writers, but also have a real-time backup of our data. I understand that all the neo4j data is stored in a filesystem, which is not inherently replicable. Is there a way I can do it here? Perhaps something akin to a MySQL binlog?
Sharding is, as of now, not handled by Neo4j itself but by the domain, much as you describe (see the sketch below). Neo4j 2.0 is going to target that problem.
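A minimal sketch of such domain-level sharding, using today's Neo4j Java driver purely for illustration (URIs and credentials are placeholders, and relationships that cross shards would still need handling by hand):

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import static org.neo4j.driver.Values.parameters;

// Minimal sketch: the application, not Neo4j, decides which instance owns
// a given user, here by hashing the user id over the available shards.
public class UserShardedGraph {
    private final Driver[] shards;

    public UserShardedGraph(String... uris) {
        shards = new Driver[uris.length];
        for (int i = 0; i < uris.length; i++) {
            shards[i] = GraphDatabase.driver(uris[i], AuthTokens.basic("neo4j", "secret"));
        }
    }

    private Driver shardFor(String userId) {
        return shards[Math.floorMod(userId.hashCode(), shards.length)];
    }

    public void createUser(String userId) {
        try (Session session = shardFor(userId).session()) {
            session.run("CREATE (:User {id: $id})", parameters("id", userId));
        }
    }
}
```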
For replication, Online Backup is working, and real High Availability with master failover is in the works, using ZooKeeper to track the cluster nodes and elect new masters, etc.
Any more details on your app sharding requirements? What domain etc?