Why does Redis, a datastore, have Pub/Sub features? My first thought is that it's the wrong layer to implement such a thing. But maybe I need to think outside the box.
Redis is described as a data structure server. It provides multiple features, such as memcache-style caching, queues, and pub/sub. This is very useful for a cloud app/web stack, where three components, RabbitMQ (queuing), XMPP (pub/sub), and Memcache, can currently be replaced with Redis. Its queuing is not as feature-rich as RabbitMQ's, though.
That would be true if it were about feeds for end users to subscribe to. Actually, it's closer to the concept of events or database triggers: a process that knows the internals of the datastore keeps a connection open and does something when a change happens.
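To make the trigger analogy concrete, here is a minimal in-process sketch (plain Python, not a Redis client) of a store that announces each write to whoever is listening. The channel naming loosely mirrors Redis keyspace notifications, where a change to key `config` in database 0 is announced on `__keyspace@0__:config` with the command name as the message; in real Redis that feature is off by default and has to be enabled through the `notify-keyspace-events` setting. The `NotifyingStore` class and the `config` key are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Listener = Callable[[str, str], None]  # (channel, event)


class NotifyingStore:
    """Toy key-value store that announces every write to its listeners.

    Channel names loosely mirror Redis keyspace notifications
    ("__keyspace@0__:<key>", message = command name). In-process only.
    """

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, channel: str, listener: Listener) -> None:
        self._listeners[channel].append(listener)

    def _notify(self, key: str, event: str) -> None:
        channel = f"__keyspace@0__:{key}"
        for listener in self._listeners.get(channel, []):
            listener(channel, event)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._notify(key, "set")

    def delete(self, key: str) -> None:
        if key in self._data:  # like DEL: no event when nothing was deleted
            del self._data[key]
            self._notify(key, "del")


store = NotifyingStore()
events = []
store.subscribe("__keyspace@0__:config", lambda channel, event: events.append((channel, event)))
store.set("config", "v2")
store.delete("config")
print(events)  # [('__keyspace@0__:config', 'set'), ('__keyspace@0__:config', 'del')]
```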
I have a use case in which Microservice A periodically does some heavy computation and stores the result in a cache (Redis), something like a k8s CronJob.
Microservice B depends on the cache written by A (B only reads it and never modifies it).
But it looks like the database is being shared here. Is this a good design?
(This AWS doc shows two different services using the same Redis.)
The contents of Redis should be treated as ephemeral, not permanent. It's a cache. There is nothing wrong with your design as long as your microservices, especially Microservice B, behave gracefully if they do not find what they expect in Redis.
This is actually a very common practice in projects using Redis (for example, it's exactly how you set up Redis to act as a message broker: one end writes the message and the other reads it).
Databases are meant to be shared, especially nowadays, when a program can consist of hundreds of micro-parts.
You shouldn't have any issues related to Redis, BUT you HAVE to implement a fallback mechanism in Microservice B to handle the case in which no value is found, for example waiting for a timeout and then reading again, or falling back to some default value.
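A minimal sketch of such a fallback for Microservice B, assuming the cache lookup is any function that returns the value, or `None` when the key is missing (a plain `dict.get` below; with redis-py, `Redis.get` on a client created with `decode_responses=True` behaves that way). The helper name `read_with_fallback`, the key, and the retry numbers are illustrative assumptions.

```python
import time
from typing import Callable, Optional


def read_with_fallback(
    get: Callable[[str], Optional[str]],
    key: str,
    default: str,
    retries: int = 3,
    delay_s: float = 0.1,
) -> str:
    """Read a value that Microservice A is expected to have written.

    Tries `get` up to `retries` times, sleeping `delay_s` between attempts,
    and returns `default` instead of failing if the key never shows up.
    """
    for attempt in range(retries):
        value = get(key)
        if value is not None:
            return value
        if attempt < retries - 1:  # no pointless sleep after the last try
            time.sleep(delay_s)
    return default


cache = {}  # stands in for Redis; A has not written its result yet
print(read_with_fallback(cache.get, "report:latest", "stale-report", delay_s=0.01))  # stale-report
cache["report:latest"] = "fresh-report"  # A's cron job has now run
print(read_with_fallback(cache.get, "report:latest", "stale-report"))  # fresh-report
```

Connection errors are deliberately not caught here; whether B should treat an unreachable Redis the same way as a missing key is a separate design decision.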
Say I have about 1000 VMs running different services built with different technologies, such as Python, .NET, and Java, and different middleware, such as RabbitMQ, Redis, etc.
How can I dynamically handle the interactions between the services and provide scalability?
For example, say Service A pushes data to RabbitMQ, and the data is then processed by Service B, which fetches additional data from Service C. In the end I have a decentralized system that pulls data from somewhere and pushes it somewhere else... a total mess! Scale it up to 2000 microservices, omg XD.
The moment I change one thing, a lot of other systems are affected.
Do you know of something like an ESB, where I can couple two services together with a message-transform adapter in the middle and change dependencies at runtime? For example, so that the stream no longer ends in Service F but in G instead?
I think microservices are a good idea because they can be stateless, can scale, and can easily be deployed as containers. But I don't know a good tool/program for managing the data flow. RabbitMQ doesn't support enough enterprise integration patterns. Do you have any advice?
How can I dynamically handle the interactions?
See whether an existing EIP (Enterprise Integration Patterns) pattern solves the problem of implementing the logistics.
Depending on how your design shapes up, you may need to use Distributed Lock Management.
Or maybe your application is simple enough to use a Consul K/V store as a semaphore and a simple Mosquitto topic-based bus.
Provide scalability
What is the solution you are trying to scale? AMQP, Consul, and "microservices" in themselves are very scalable and distributed.
However, to scale your thought process and DevOps, you need to find a way to see things as patterns that help you split the problem and tackle the complexity.
Do you know something maybe like an ESB where I can couple two services together with a message transform adapter in the middle of it and I can change dependencies at runtime?
Read up on EIP. ESBs are just one of the many ways you can solve your problem. RTFM, & get some perspective.
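For the "stream no longer ends in F but in G" case, the relevant EIP building blocks are a Message Translator plus some routing indirection between producer and consumer. The sketch below is a hypothetical, in-process illustration of that idea, not an ESB: producers only know the `Route`, so its destination can be swapped at runtime. The `Route` class and field names such as `orderId` are made up for the example.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]
Handler = Callable[[Message], None]
Translator = Callable[[Message], Message]


class Route:
    """Producer -> translator -> current destination.

    Producers only hold a reference to the Route, so the destination can be
    changed at runtime (e.g. from Service F to Service G) without touching them.
    """

    def __init__(self, translate: Translator, destination: Handler) -> None:
        self._translate = translate
        self._destination = destination

    def rewire(self, destination: Handler) -> None:
        self._destination = destination

    def send(self, message: Message) -> None:
        # Message Translator step: adapt the producer's format to the consumer's.
        self._destination(self._translate(message))


def to_consumer_format(message: Message) -> Message:
    return {"id": message["orderId"], "total": message["amount"]}


received_f: list = []  # stands in for Service F
received_g: list = []  # stands in for Service G
route = Route(to_consumer_format, received_f.append)
route.send({"orderId": 1, "amount": 9.5})
route.rewire(received_g.append)  # the stream now ends in G instead of F
route.send({"orderId": 2, "amount": 3.0})
print(received_f)  # [{'id': 1, 'total': 9.5}]
print(received_g)  # [{'id': 2, 'total': 3.0}]
```

In a distributed setup the same indirection would typically live in broker configuration (exchanges, bindings, routing keys) or a routing service rather than in process memory.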
But I don't know a good tool/program for managing the data flow.
Ask yourself whether your problem is related to distributed workflow management, or whether a data pipeline is what you are really looking for.
Look at Spark, Storm, Luigi, and Airflow. They each serve a different purpose, but you will know what to do with them if you manage to read up on everything else in this post ;)
Currently I'm working on a distributed test execution and reporting system. I'm planning to use Redis PUB/SUB as a message queue and message distribution system.
I'm new to Redis, so I'm trying to read as many docs as I can and play around with it. One of the most important topics is high availability. As I said, I'm not an expert, but I'm aware of the possible options: Sentinel, replication, clustering, etc.
What's not clear to me is how the Pub/Sub feature and the HA options relate to each other. What's the best practice for building a reliable messaging system with Redis? By reliable, I mean that if my Redis message broker goes down, there should be some kind of backup node (a slave?) that can take over this role.
Is there a purely server-side solution? Or do I need to create a smart wrapper around the Redis client to handle this? Will a Sentinel-driven setup help me?
Doing pub/sub in Redis with failover means thinking about additional factors on the client side. A key point to understand is that subscriptions are per-connection. If you are subscribed to a channel on a node and that node fails, you will need to handle reconnecting and resubscribing. Because subscriptions are made at the connection level, they are not something that can be replicated.
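A minimal sketch of the client-side wrapper this implies: it remembers which channels it subscribed to and, when reading fails with a connection error, reconnects (for example, to whatever node Sentinel now reports as master) and replays the subscriptions. `ResubscribingSubscriber`, `FakeNode`, and `FakeConnection` are hypothetical; the fakes stand in for Redis nodes so the example runs without a server.

```python
from typing import Callable, List, Optional, Protocol, Set, Tuple


class Connection(Protocol):
    def subscribe(self, channel: str) -> None: ...
    def get_message(self) -> Optional[Tuple[str, str]]: ...


class ResubscribingSubscriber:
    """Remembers its channels and replays SUBSCRIBE after a reconnect,
    because subscriptions live on the connection and are not replicated."""

    def __init__(self, connect: Callable[[], Connection], max_attempts: int = 5) -> None:
        self._connect = connect  # e.g. resolve the current master via Sentinel
        self._max_attempts = max_attempts
        self._channels: Set[str] = set()
        self._conn = connect()

    def subscribe(self, channel: str) -> None:
        self._channels.add(channel)
        self._conn.subscribe(channel)

    def _reconnect(self) -> None:
        last_error: Optional[Exception] = None
        for _ in range(self._max_attempts):
            try:
                conn = self._connect()
                for channel in sorted(self._channels):
                    conn.subscribe(channel)  # the new node knows nothing about us
                self._conn = conn
                return
            except ConnectionError as exc:
                last_error = exc
        raise ConnectionError("could not reconnect and resubscribe") from last_error

    def get_message(self) -> Optional[Tuple[str, str]]:
        try:
            return self._conn.get_message()
        except ConnectionError:
            self._reconnect()  # anything published during the outage is gone
            return self._conn.get_message()


# --- Demo-only fakes standing in for Redis nodes ---
class FakeNode:
    def __init__(self) -> None:
        self.alive = True
        self.connections: List["FakeConnection"] = []

    def publish(self, channel: str, message: str) -> int:
        receivers = [c for c in self.connections if channel in c.channels]
        for c in receivers:
            c.inbox.append((channel, message))
        return len(receivers)


class FakeConnection:
    def __init__(self, node: FakeNode) -> None:
        if not node.alive:
            raise ConnectionError("node down")
        self.node = node
        self.channels: Set[str] = set()
        self.inbox: List[Tuple[str, str]] = []
        node.connections.append(self)

    def subscribe(self, channel: str) -> None:
        self.channels.add(channel)

    def get_message(self) -> Optional[Tuple[str, str]]:
        if not self.node.alive:
            raise ConnectionError("node down")
        return self.inbox.pop(0) if self.inbox else None


master, replica = FakeNode(), FakeNode()
current = {"node": master}
sub = ResubscribingSubscriber(lambda: FakeConnection(current["node"]))
sub.subscribe("test-results")
master.publish("test-results", "run-1 passed")
print(sub.get_message())  # ('test-results', 'run-1 passed')
master.alive = False  # failover: the replica is promoted
current["node"] = replica
print(sub.get_message())  # None: reconnected and resubscribed, nothing new yet
replica.publish("test-results", "run-2 failed")
print(sub.get_message())  # ('test-results', 'run-2 failed')
```

Note the gap: anything published while the subscriber was disconnected is lost, which is inherent to pub/sub. A production version would also add backoff between reconnect attempts.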
For details on how this works, what you can expect to see, and ways around it, see a post I made earlier this year: https://objectrocket.com/blog/how-to/reliable-pubsub-and-blocking-commands-during-redis-failovers
You can lower the risk surface by subscribing to slaves and publishing to the master, but you would then need non-promotable slaves to subscribe to, and you would still need to handle losing a slave: there is just as much chance of losing a given slave as of losing the master.
IMO, Pub/Sub is not a good choice; maybe Disque (from antirez, the author of Redis) fits better:
Disque, an in-memory, distributed job queue
I was just wondering: why would you use something like RabbitMQ instead of a persistent store, especially a document store like MongoDB? Aren't they kind of the same? What's the benefit of something like RabbitMQ over a database?
Would anyone who used something like RabbitMQ elaborate on the benefits?
RabbitMQ is message broker software, a.k.a. a queue, not a NoSQL database!
While the trend goes towards storing more and more data in scaled-up queues, as well as processing data in real time and thus obliterating the need for additional data storage, queues are not to be confused with databases:
most queues don't persist data indefinitely.
the data in queues is not available on demand by the use of queries, but accessed via an automatically triggered consumer mechanism.
the architectural intention behind queues differs tremendously from that of databases. Their purpose in a system's architecture is not data storage, but system integration and data distribution. For more information on queue architecture, please check out this article from the Kafka team.
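The difference between the last two points can be sketched with Python's standard `queue.Queue` standing in for a broker (this is not RabbitMQ's API): rows in a table can be queried whenever you like, while queued messages are pushed to a consumer callback and are gone once consumed. The `orders_table` data and event names are invented for illustration.

```python
import queue
import threading
from typing import Callable, List, Optional

STOP = None  # sentinel used only by this demo to shut the consumer down


def start_consumer(q: "queue.Queue[Optional[str]]", handle: Callable[[str], None]) -> threading.Thread:
    def loop() -> None:
        while True:
            msg = q.get()  # blocks until something is delivered
            if msg is STOP:
                break
            handle(msg)  # triggered per message, not by a query

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t


# Database-style access: data stays put and is queried on demand.
orders_table = [{"id": 1, "status": "new"}, {"id": 2, "status": "paid"}]
paid = [row for row in orders_table if row["status"] == "paid"]

# Queue-style access: messages flow to a consumer and are gone afterwards.
q: "queue.Queue[Optional[str]]" = queue.Queue()
processed: List[str] = []
worker = start_consumer(q, processed.append)
for event in ["order-1-created", "order-2-paid"]:
    q.put(event)
q.put(STOP)
worker.join()
print(paid)       # [{'id': 2, 'status': 'paid'}]
print(processed)  # ['order-1-created', 'order-2-paid']
print(q.empty())  # True: nothing is left to "query"
```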
I have a question that is bugging me quite heavily. What is the Redis pub/sub feature actually used for? I can only think of inter-process communication over TCP (either locally or distributed), however not much else.
Can someone please prove me wrong?
It's an easy way to plug into an event stream, generally between processes or machines. For instance, a user creates something and an event is published. One process handles updating the database from the event, another updates user stats, another global stats, another the text search database, etc. They're all loosely coupled by subscribing to the channel. You can add new processes for testing updates and monitoring the system. It differs a little from a message queue in that messages are not stored until they're processed, but Redis has other structures for those sorts of jobs.
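A small in-process sketch of those semantics (not redis-py): each current subscriber receives every message, and a message published when nobody is listening is simply dropped. Like Redis's `PUBLISH`, which returns the number of clients that received the message, `publish` returns the receiver count here. The `Broker` class and the `user-events` channel are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class Broker:
    """In-process stand-in for pub/sub: fan-out to current subscribers, no storage."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        handlers = list(self._subs.get(channel, []))
        for handler in handlers:
            handler(message)  # delivered immediately; nothing is queued
        return len(handlers)  # receiver count, like Redis PUBLISH


broker = Broker()
dropped = broker.publish("user-events", "signup:42")  # no subscribers yet: message lost
db, stats, search = [], [], []
broker.subscribe("user-events", db.append)      # process updating the database
broker.subscribe("user-events", stats.append)   # process updating user stats
broker.subscribe("user-events", search.append)  # process updating the search index
delivered = broker.publish("user-events", "signup:43")
print(dropped, delivered)  # 0 3
print(db, stats, search)   # ['signup:43'] ['signup:43'] ['signup:43']
```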
A real use case from my experience:
Let's say you have a web application deployed on 4 different servers (nodes, virtual machines), typically in your virtual private cloud.
The web application maintains an in-memory Java map as a cache for static data that occasionally changes.
Now, every time the data changes in your database, you need all your servers to update their own in-memory caches. This is the problem.
One way is to keep all the static data in Redis or another cache on a separate server and refresh that cache on a schedule. But then, to access static content that only occasionally changes, you need a scheduler and a separate cache server such as Redis or Memcached, and each server points to this external cache.
Using Redis pub/sub here:
All servers subscribe to a Redis channel. Whenever data is updated, added, or deleted, a message describing the change is published to that channel, and Redis delivers it to all subscribers. On receiving the message object and its update type (ADD, REMOVED, UPDATED), each server updates its in-memory static data map.
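A sketch of the per-server handler, assuming (hypothetically) that change messages are JSON objects with `type`, `key`, and `value` fields. It is written in Python rather than Java to keep the examples in one language, and it simulates delivery with a loop; with redis-py, each server would instead call `client.pubsub()`, subscribe to the channel, and pass the `data` of every item from `pubsub.listen()` whose `type` is `"message"` to `apply_change`.

```python
import json
from typing import Dict


def apply_change(cache: Dict[str, str], raw_message: str) -> None:
    """Apply one change notification to this server's in-memory static-data map.

    Assumed (hypothetical) message format:
    {"type": "ADD" | "UPDATED" | "REMOVED", "key": ..., "value": ...}
    """
    change = json.loads(raw_message)
    kind, key = change["type"], change["key"]
    if kind in ("ADD", "UPDATED"):
        cache[key] = change["value"]
    elif kind == "REMOVED":
        cache.pop(key, None)  # already gone is fine
    else:
        raise ValueError(f"unknown change type: {kind}")


servers = [{"US": "United States"} for _ in range(4)]  # four nodes, same starting cache
messages = [
    json.dumps({"type": "ADD", "key": "DE", "value": "Germany"}),
    json.dumps({"type": "UPDATED", "key": "US", "value": "USA"}),
    json.dumps({"type": "REMOVED", "key": "DE"}),
]
for raw in messages:       # in production, each server's subscriber would
    for cache in servers:  # receive these from the Redis channel
        apply_change(cache, raw)
print(servers[0])  # {'US': 'USA'}
```

Because pub/sub does not store messages, a server that is restarting or disconnected misses updates; reloading the full map from the database after (re)subscribing is one way to cover that gap.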