Maybe I am missing something, this seems too simple. Is it possible to make redis durable by having a master redis node duplicate data to a slave redis node?
My situation: I have a REST endpoint which, upon receiving a request from a client, sticks the payload in a Redis queue and then returns a success (HTTP 200) to the client. If that queue goes down before the message is processed and before fsync occurred, I've lost that payload and no one knows about it.
I was wondering if I could instead simply write to two Redis queues (in different zones), one the master and one the slave. When I write to the 'master', Redis would then automatically write the same element to the slave queue, and only then would the endpoint return an HTTP 200 to the client.
Is this possible? Redis would (i) need a way to write to a slave and (ii) have a synchronous API or awaitable API which will only return once there is confirmation the payload has been written to both the master and slave. The key here is that redis allows the caller to know that the slave has received the event.
If the client doesn't get a HTTP 200 they know they should try sending it again. Feel like there are caveats I'm not seeing.
Thanks
Is this possible?
Short answer. NO, it's NOT possible.
Redis would (i) need a way to write to a slave
Redis can replicate the data to a slave. However, replication is asynchronous, which means Redis returns the response to the client before the data has been written to the slave.
(ii) have a synchronous API or awaitable API which will only return once there is confirmation the payload has been written to both the master and slave.
Since version 3.0, Redis supports the WAIT command, which blocks the client until that client's write operations have been replicated to the given number of slaves (or until a timeout expires).
This might mitigate the problem: at least you can ensure that the write operation has been replicated to several nodes. However, you might still lose data, because the slave might also go down before it persists the data to disk.
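A rough sketch of how WAIT could gate the HTTP 200, assuming a redis-py style client that offers `pipeline`, `rpush`, `wait` and `execute`. The function name, queue name and timeout are illustrative and not from the original post. The push and the WAIT go through one non-transactional pipeline so that they share a connection, since WAIT only covers writes sent on the same connection.

```python
def enqueue_replicated(client, queue, payload, replicas=1, timeout_ms=100):
    """Push payload, then block until `replicas` replicas acknowledge it.

    `client` is assumed to behave like redis-py's `redis.Redis`.
    Returns True only if enough replicas acknowledged within the timeout.
    """
    # WAIT covers writes issued on the same connection, so both commands
    # are sent through one non-transactional pipeline.
    pipe = client.pipeline(transaction=False)
    pipe.rpush(queue, payload)
    pipe.wait(replicas, timeout_ms)
    _queue_length, acked = pipe.execute()
    return acked >= replicas
```

The endpoint would return HTTP 200 only when this returns True and an error otherwise, so the client retries. The payload may already be on the master when the function returns False, so a retry can produce a duplicate, and an acknowledgement only means the replica received the write, not that it reached disk.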
Is the message order of pubsub messages in a redis cluster in any way guaranteed?
We are using a Redis cluster (v3.2.8) with 5 master nodes, each with one slave connected & we noticed that we sometimes get pubsub messages in wrong order when publishing to one specific master for one specific channel and being subscribed to slave nodes for that channel.
I could not find any statements related to pubsub message order in cluster on redis.io nor on the redis-github repo.
First of all, if you are using PUBLISH, then it is blocking and returns only after messages have been delivered, so yes the order is guaranteed.
There are 2 problematic cases that I see: Pipelining and Client disconnection.
Pipelining
From the documentation
While the client sends commands using pipelining, the server will be forced to queue the replies, using memory.
So, if a queue is used, the order should be guaranteed.
Client disconnection
I can't find it in the documentation, but if the client is not connected or subscribed when the message is published, then it won't receive anything. So in this case, there is no guarantee.
If you need to persist messages, you should use a list instead.
We have some Redis keys with a given TTL that we would like to subscribe to and take action upon once the TTL expires (a la job scheduler).
This works well in a single-host environment: when you subscribe in ServiceStack, using its Redis client, to '__keyevent@0__:expired', that service will pick it up and take action. That's fantastic...
... until you have a high-availability topology set up, with more than one API instance in that cluster. Then every single host appears to be picking up on that message and potentially doing things with it.
I know keyspace notifications don't work exactly the same as traditional pub/sub or messaging-layer events, but is there a way to perform some kind of acknowledgement on these kinds of events, so that, at the end of the day, only one host will carry on with the task?
Otherwise, is there a way to delay a message publishing?
Thanks!
As described in https://redis.io/topics/notifications:
Every node of a Redis cluster generates events about its own subset of the keyspace as described above. However, unlike regular Pub/Sub communication in a cluster, events' notifications are not broadcasted to all nodes. Put differently, keyspace events are node-specific. This means that to receive all keyspace events of a cluster, clients need to subscribe to each of the nodes.
So the client should create a separate connection to each node to receive Redis keyspace notifications.
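A minimal sketch of subscribing on every node, assuming one redis-py style client per node (each offering `pubsub()`), and assuming expired-key events are enabled on the servers (for example `notify-keyspace-events Ex`). The function name and the way the node clients are obtained are hypothetical.

```python
EXPIRED_CHANNEL = "__keyevent@0__:expired"  # expired-key events for database 0


def subscribe_to_all_nodes(node_clients, channel=EXPIRED_CHANNEL):
    """Open one subscription per cluster node and return the pubsub objects.

    `node_clients` is assumed to be an iterable of redis-py style clients,
    each connected directly to one node of the cluster.
    """
    subscriptions = []
    for client in node_clients:
        pubsub = client.pubsub()   # dedicated pub/sub connection to this node
        pubsub.subscribe(channel)
        subscriptions.append(pubsub)
    return subscriptions
```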
My understanding of your question: You need an event based unicast notification whenever a key is expired.
This solution will be helpful to you if the above assumption is correct. It's a somewhat crude solution, but it works!
Solution:
You need to put the expired keys in a Redis list/queue (perhaps using a service or thread). A blocking B*POP operation from the client instances on this list/queue will then give you what you want!
How does it work?
Let's assume a single background thread continuously pushes the expired keys into a Redis list/queue. The cluster of API instances will be calling blocking pop on this list/queue.
Since each item taken from a Redis list by a blocking pop is consumed by only one client, only one API instance will get the notification of the expired key!
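The idea could be sketched roughly as follows, assuming a redis-py style client (`rpush`, `blpop`) and pubsub message dicts with `type` and `data` fields. The list name `expired-keys` and both function names are made up for illustration.

```python
def forward_expired_keys(messages, client, queue="expired-keys"):
    """Push the key name from each expired-key notification onto a list.

    `messages` is assumed to be an iterable of redis-py style pubsub
    message dicts, e.g. what `pubsub.listen()` yields.
    """
    forwarded = 0
    for message in messages:
        if message.get("type") not in ("message", "pmessage"):
            continue  # skip subscribe/unsubscribe confirmations
        client.rpush(queue, message["data"])
        forwarded += 1
    return forwarded


def next_expired_key(client, queue="expired-keys", timeout=5):
    """Block until a key is available; each key goes to only one caller."""
    item = client.blpop(queue, timeout=timeout)
    if item is None:
        return None  # timed out with nothing to do
    _list_name, key = item
    return key
```

Only one forwarder should run at a time (the single background thread the answer assumes); otherwise every forwarder would push its own copy of each key. Each API instance would call `next_expired_key` in a loop.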
Ref:
List pop operation: https://redis.io/commands/lpop
Similar problem with pub/sub: Competing Consumer on Redis Pub/Sub supported?
I'm reading here, and I see a warning stating that PUB/SUB subscribers in Redis should not issue other commands:
A client subscribed to one or more channels should not issue commands, although it can subscribe and unsubscribe to and from other channels.
I have two questions:
Why does this limitation exist?
For the scope of the paragraph, what's a client? A whole process? A Redis connection? A complete Redis instance? Or is it a bad idea in general to issue commands and subscribe to channels, and the admonition goes for every and any scope I can think of?
A client, in this case, is an instance of a connection to Redis. An application could well have multiple clients, each with different responsibilities or as a way to provide higher degrees of parallelism to the application.
What they are suggesting here, however, is that you use an individual client (think 'connection') to handle your incoming subscription messages and to react to those messages as its sole responsibility. The reason it's recommended not to make calls with this connection is because while it is waiting on incoming messages from subscribed channels, the client is in a blocked state.
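Trying to make another call on that client won't work while it is in this state: with the RESP2 protocol, a subscribed connection is dedicated to receiving pushed messages and accepts only subscribe/unsubscribe commands, PING and QUIT.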
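A small sketch of the "one connection per responsibility" idea, assuming a redis-py style pubsub object for listening and a separate regular client for commands. The function and parameter names are illustrative only.

```python
def count_messages(subscriber, commands, channel, counter_key):
    """Listen on one connection and issue commands on another.

    `subscriber` is assumed to be a redis-py style pubsub object and
    `commands` a separate regular client.
    """
    subscriber.subscribe(channel)
    handled = 0
    for message in subscriber.listen():
        if message["type"] != "message":
            continue  # ignore subscribe/unsubscribe confirmations
        # Regular commands go through the second connection, never the
        # subscribed one.
        commands.incr(counter_key)
        handled += 1
    return handled
```

Against a real server the loop keeps running for as long as the subscription is active, so in practice this would live in its own thread or process.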
Can someone please explain what is going on behind the scenes in a RabbitMQ cluster with multiple nodes and queues in mirrored fashion when publishing to a slave node?
From what I read, it seems that all actions other than publishes go only to the master, and the master then broadcasts the effect of the actions to the slaves (this is from the documentation). From my understanding, this means a consumer will always consume messages from the master queue. Also, if I send a request to a slave to consume a message, that slave will make an extra hop to the master to fetch that message.
But what happens when I publish to a slave node? Will this node do the same thing of sending first the message to the master?
It seems there are so many extra hops when dealing with slaves that you could get better performance by connecting only to the master. But how do you handle master failure? One of the slaves will then be elected master, so how do you know where to connect?
Asking all of this because we are using a RabbitMQ cluster with HAProxy in front, so we can decouple the cluster structure from our apps. This way, whenever a node goes down, HAProxy will redirect to living nodes. But we have problems when we kill one of the rabbit nodes. The connection to rabbit is permanent, so if it fails, you have to recreate it. Also, you have to resend the messages in these cases, otherwise you will lose them.
Even with all of this, messages can still be lost, because they may be in transit when I kill a node (in some buffers, somewhere on the network, etc.). So you have to use transactions or publisher confirms, which guarantee the delivery after all the mirrors have been filled up with the message. But here is another issue: you may have duplicate messages, because the broker might have sent a confirmation that never reached the producer (due to network failures, etc.). Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner.
Is there a way of avoiding this? Or do I have to decide between losing a couple of messages and duplicating some messages?
Can someone please explain what is going on behind the scenes in a RabbitMQ cluster with multiple nodes and queues in mirrored fashion when publishing to a slave node?
This blog outlines exactly what happens.
But what happens when I publish to a slave node? Will this node do the same thing of sending first the message to the master?
The message will be redirected to the master Queue - that is, the node on which the Queue was created.
But how do you handle master failure? Then one of the slaves will be elected master, so you have to know where to connect to?
Again, this is covered here. Essentially, you need a separate service that polls RabbitMQ and determines whether nodes are alive or not. RabbitMQ provides a management API for this. Your publishing and consuming applications need to refer to this service, either directly or through a mutual data-store, in order to determine the correct node to publish to or consume from.
The connection to rabbit is permanent, so if it fails, you have to recreate it. Also, you have to resend the messages in this cases, otherwise you will lose them.
You need to subscribe to connection-interrupted events to react to severed connections. You will need to build in some level of redundancy on the client in order to ensure that messages are not lost. I suggest, as above, that you introduce a service specifically designed to interrogate RabbitMQ. Your client can attempt to publish a message to the last known active connection, and should this fail, the client might ask the monitor service for an up-to-date listing of the RabbitMQ cluster. Assuming that there is at least one active node, the client may then establish a connection to it and publish the message successfully.
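The retry flow described above might look something like this sketch. `connect` and `list_live_nodes` are hypothetical callables standing in for your client library and the monitoring service; they are not RabbitMQ APIs, and connection failures are assumed to surface as `ConnectionError`.

```python
def publish_with_failover(message, last_node, connect, list_live_nodes):
    """Try the last known node first, then nodes the monitor reports alive.

    `connect(node)` is a hypothetical callable returning an object with a
    `publish(message)` method; `list_live_nodes()` stands in for the
    monitoring service. Returns the node that accepted the message.
    """
    try:
        connect(last_node).publish(message)
        return last_node
    except ConnectionError:
        pass  # fall through and ask the monitor for live nodes

    for node in list_live_nodes():
        if node == last_node:
            continue  # already failed above
        try:
            connect(node).publish(message)
            return node
        except ConnectionError:
            continue  # try the next live node

    raise ConnectionError("no live node accepted the message")
```

The caller would remember the returned node as the new "last known" node for subsequent publishes.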
Even with all of this, messages can still be lost, because they may be in transit when I kill a node
There are certain edge-cases that you can't cover with redundancy, and neither can RabbitMQ. For example, when a message lands in a Queue, the HA policy invokes a background process to copy the message to a backup node. During this process there is potential for the message to be lost before it is persisted to the backup node. Should the active node immediately fail, the message will be lost for good. There is nothing that can be done about this. Unfortunately, when we get down to the level of actual bytes travelling across the wire, there's a limit to the amount of safeguards that we can build.
Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner.
You can handle this a number of ways. For example, setting the message-ttl to a relatively low value will ensure that duplicated messages don't remain on the Queue for extended periods of time. You can also tag each message with a unique reference, and check that reference at the consumer level. Of course, this would require storing a cache of processed messages to compare incoming messages against; the idea being that if a previously processed message arrives, its tag will have been cached by the consumer, and the message can be ignored.
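As a sketch of the tag-and-cache approach, assuming each message carries a unique ID set by the producer: the class and function names below are illustrative, and the cache is in-memory and per-consumer, so several consumer processes would need a shared store instead.

```python
from collections import OrderedDict


class SeenMessages:
    """Bounded cache of processed message IDs (oldest IDs are evicted first)."""

    def __init__(self, max_size=10000):
        self._ids = OrderedDict()
        self._max_size = max_size

    def __contains__(self, message_id):
        return message_id in self._ids

    def add(self, message_id):
        self._ids[message_id] = True
        if len(self._ids) > self._max_size:
            self._ids.popitem(last=False)  # drop the oldest ID


def handle_once(message_id, body, seen, process):
    """Run `process(body)` unless this message ID was already handled."""
    if message_id in seen:
        return False  # duplicate: ignore it
    process(body)
    # Recorded only after processing succeeds, so a failed attempt can be
    # retried when the message is redelivered.
    seen.add(message_id)
    return True
```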
One thing that I'd stress with AMQP and Queue-based solutions in general is that your infrastructure provides the tools, but not the entire solution. You have to bridge those gaps based on your business needs. Often, the best solution is derived through trial and error. I hope my suggestions are of use. I blog about a number of RabbitMQ design solutions, including the issues you mentioned, here if you're interested.
In a web application, if I need to write an event to a queue, I would make a connection to redis to write the event.
Now if I want another backend process (say a daemon or cron job) to process or react to the publishing of the event in Redis, do I need a persistent connection?
Little confused on how this pub/sub process works in a web application.
Basically in Redis there are two different messaging models:
Fire and Forget / One to Many: Pub/Sub. At the time a message is PUBLISH-ed all the subscribers will receive it, but this message is then lost forever. If a client was not subscribed there is no way it can get it back.
Persisting Queues / One to One: Lists, possibly used with blocking commands such as BLPOP. With lists you have a producer pushing into a list, and one or many consumers waiting for elements, but one message will reach only one of the waiting clients. With lists you have persistence, and messages will wait for a client to pop them instead of disappearing. So even if no one is listening there is a backlog (as big as your available memory, or you can limit the backlog using LTRIM).
I hope this is clear. I suggest studying the following commands to understand more about Redis and messaging semantics:
LPUSH/RPUSH, RPOP/LPOP, BRPOP/BLPOP
PUBLISH, SUBSCRIBE, PSUBSCRIBE
Documentation for these commands is available at redis.io.
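To make the list-based model concrete, here is a minimal sketch assuming a redis-py style client (`lpush`, `ltrim`, `brpop`). The function names and the `max_backlog` parameter are illustrative.

```python
def enqueue_event(client, queue, event, max_backlog=None):
    """Producer side: push an event, optionally capping the backlog."""
    client.lpush(queue, event)
    if max_backlog is not None:
        # Keep only the newest `max_backlog` entries (indexes 0..max_backlog-1);
        # older entries at the tail are discarded.
        client.ltrim(queue, 0, max_backlog - 1)


def dequeue_event(client, queue, timeout=5):
    """Consumer side: block until an event arrives; None on timeout."""
    item = client.brpop(queue, timeout=timeout)
    return None if item is None else item[1]
```

Pushing at the head with LPUSH and popping from the tail with BRPOP gives first-in, first-out order, and each event is delivered to exactly one of the waiting consumers.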
I'm not totally sure, but I believe that yes, pub/sub requires a persistent connection.
For an alternative I would take a peek at resque and how it handles that. Instead of using pub/sub it simply adds an item to a list in redis, and then whatever daemon or cron job you have can use the lpop command to get the first one.
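A cron-style job along those lines needs no persistent connection: it can connect, drain whatever has accumulated, and exit. A rough sketch, assuming a redis-py style client with `lpop` and a producer that appends with RPUSH; `drain_queue`, `handle` and `max_items` are made-up names, and this is not how resque itself is implemented.

```python
def drain_queue(client, queue, handle, max_items=100):
    """Pop and handle items until the list is empty or max_items is reached."""
    processed = 0
    while processed < max_items:
        item = client.lpop(queue)
        if item is None:
            break  # list is empty, nothing left for this run
        handle(item)
        processed += 1
    return processed
```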
Sorry for only giving a pseudo answer and then a plug.