lock redis key when two processes are accessing the same key - redis

Application 1 set a value in Redis.
And we have two instance of application 2 which are running and we would like only one instance should read this value from Redis (please note application 2 takes around 30 sec to 1 min process data )
Can Instance-1 application 2 acquire lock redis key which is created by application 1 , so that instance-2 of application 2 will not read and do the same operation ?

No, there's no concept of record lock in Redis. If you need to achieve some sort of locking you have to use another set of data structures to mimic that behavior. For example
List: You can use a list and then POP the item from the list or...
Redis Stream: Using Redis Stream with ConsumerGroup so that each consumer in your Group only sees a portion of the whole data the needs to be processed and it guarantees you that, when an item is delivered to a consumer, it is not going to be delivered to another one.

Related

Storing time intervals efficiently in redis

I am trying to track server uptimes using redis.
So the approach I have chosen is as follows:
server xyz will keep on sending my service ping indicating that it was alive and working in the last 30 seconds.
My service will store a list of all time intervals during which the server was active. This will be done by storing a list of {startTime, endTime} in redis, with key as name of the server (xyz)
Depending on a user query, I will use this list to generate server uptime metrics. Like % downtime in between times (T1, T2)
Example:
assume that the time is T currently.
at T+30, server sends a ping.
xyz:["{start:T end:T+30}"]
at T+60, server sends another ping
xyz:["{start:T end:T+30}", "{start:T+30 end:T+60}"]
and so on for all pings.
This works fine , but an issue is that over a large time period this list will get a lot of elements. To avoid this currently, on a ping, I pop the last element of the list, check if it can be merged with the latest time interval. If it can be merged, I coalesce and push a single time interval into the list. if not then 2 time intervals are pushed.
So with this my list becomes like this after step 2 : xyz:["{start:T end:T+60}"]
Some problems I see with this approach is:
the merging is being done in my service, and not redis.
incase my service is distributed, The list ordering might get corrupted due to multiple readers and writers.
Is there a more efficient/elegant way to handle this , like maybe handling merging of time intervals in redis itself ?

How to define TTL for redis streams?

I have two micro services and I need to implement reliable notifications between them. I thought about using redis streams -
serviceA will send a request to serviceB with an identifier X.
Once serviceB is done doing the work serviceA asked for, it'll create/add to a stream (the stream is specific for X) a new item to let it know it's done.
ServiceA can send multiple requests, each request may have a different identifier. So it'll block for new elements in different streams.
My question is how can I delete streams which are no longer needed, depending on their age. For example I'd like to have streams that were created over a day ago deleted. Is this possible?
If it's not, I'd love to hear any ideas you have as to how not to have unneeded streams in redis.
Thanks
There's no straight forward way to delete older entries based on the TTL/age. You can use a combination of XTRIM/XDEL with other commands to trim the stream.
Let's see how we can use XTRIM
XTRIM stream MAXLEN ~ SIZE
XTRIM trims the stream to a given number of items, evicting older items (items with lower IDs) if needed.
You generate the stream size every day or periodically based on your delete policy and store it somewhere using XLEN command
Run a periodic job that would call XTRIM as
XTRIM x-stream MAXLEN ~ (NEW_SIZE - PREVIOUS_SIZE)
For example, yesterday stream size was 500 now it's 600 then we need to delete 500 entries so we can just run
XTRIM x-stream MAXLEN ~ 100
You can use different policies for deletion for example daily, weekly, twice a week, etc.
XDEL stream ID [ID...]
Removes the specified entries from a stream, and returns the number of entries deleted, that may be different from the number of IDs passed to the command in case certain IDs do not exist.
So what you can do is whenever Service B consumes the event than the service itself can delete the stream entry as service B knows the stream ID, but this will not work as soon as you start using the consumer group. So I would say use Redis set or Redis map to track the acknowledge stream ids and run a periodic sweep job to clean up the stream.
For example
Service A sends a stream item with ID1 to service B
Service B acknowledges the stream item after consuming the items in the map
ack_stream = { ID1: true }, you can track other data e.g. count in case of the consumer group.
A sweep job would run at periodically like 1 AM daily that reads all the elements of ack_stream and filters out all items that require deletion. Now you can call XDEL commands in batch with the set of stream ids.

Apache Pulsar topic replication with increase in cluster size

I want to understand how the namespace/topic replication works in Apache Pulsar and what affect does the change in cluster size have on the replication factor of the existing and new namespaces/topics.
Consider the following scenario:
I am starting with a single node with the following broker configuration:
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=1
# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=1
# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=1
After a few months I decide to increase the cluster size to two with the following configuration for the new broker:
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=2
# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=2
# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=2
In the above scenario what will be the behaviour of the cluster:
Does this change the replication factor(RF) of the existing topics?
Do newly created topics have the old RF or the new specified RF?
How does the namespace/topic(Managed Ledger) -> Broker ownership work?
Please note that the two broker nodes have different configurations at this point.
TIA
What you are changing is the default replication settings (ensemble, write, ack). You shouldn't be using different defaults on different brokers, because then you'll get inconsistent behavior depending on which broker the client connects to.
The replication settings are controlled at namespace level. If you don't explicitly set them, you get the default settings. However, you can change the settings on individual namespaces using the CLI or the REST interface. If you start with settings of (1 ensemble, 1 write, 1 ack) on the namespace and then change to (2 ensemble, 2 write, 2 ack), then the following happens:
All new topics in the namespace use the new settings, storing 2 copies of each message
All new messages published to existing topics in the namespace use the new settings, storing 2 copies. Messages that are already stored in existing topics are not changed. They still have only 1 copy.
An important point to note is that the number of brokers doesn't affect the message replication. In Pulsar, the broker just handles the serving (producing/consuming) of the message. Brokers are stateless and can be scaled horizontally. The messages are stored on Bookkeeper nodes (bookies). The replication settings (ensemble, write, ack) refer to Bookkeeper nodes, not brokers. Here is an diagram from the Pulsar website that illustrates this:
So, to move from a setting of (1 ensemble, 1 write, 1 ack) to (2 ensemble, 2 write, 2 ack), you need to add a Bookkeeper node to your cluster (assuming you start with just 1), not another broker.

Redis - any way to trigger an event when a value is no longer being actively written to?

I have a use case where I'm streaming and processing live data into an Elasticache Redis cluster. In essence, I want to kick off an event when all events of a certain type have completed (i.e. the size of a value is no longer growing over the course of 60 seconds).
For example:
foo [event1]
foo [event1, event2]
foo [event1, event2]
foo [event1, event2] -> triggers some event if this key/value is constant for 60 seconds.
Is this at all possible?
I would suggest that as part of all "changing" commands also set a key with a 60-second ttl. You can then subscribe to the expiration of that key using redis keyspace notifications.

Autoincrement in Redis

I'm starting to use Redis, and I've run into the following problem.
I have a bunch of objects, let's say Messages in my system. Each time a new User connects, I do the following:
INCR some global variable, let's say g_message_id, and save INCR's return value (the current value of g_message_id).
LPUSH the new message (including the id and the actual message) into a list.
Other clients use the value of g_message_id to check if there are any new messages to get.
Problem is, one client could INCR the g_message_id, but not have time to LPUSH the message before another client tries to read it, assuming that there is a new message.
In other words, I'm looking for a way to do the equivalent of adding rows in SQL, and having an auto-incremented index to work with.
Notes:
I can't use the list indexes, since I often have to delete parts of the list, making it invalid.
My situation in reality is a bit more complex, this is a simpler version.
Current solution:
The best solution I've come up with and what I plan to do is use WATCH and Transactions to try and perform an "autoincrement" myself.
But this is such a common use-case in Redis that I'm surprised there is not existing answer for it, so I'm worried I'm doing something wrong.
If I'm reading correctly, you are using g_message_id both as an id sequence and as a flag to indicate new message(s) are available. One option is to split this into two variables: one to assign message identifiers and the other as a flag to signal to clients that a new message is available.
Clients can then compare the current / prior value of g_new_message_flag to know when new messages are available:
> INCR g_message_id
(integer) 123
# construct the message with id=123 in code
> MULTI
OK
> INCR g_new_message_flag
QUEUED
> LPUSH g_msg_queue "{\"id\": 123, \"msg\": \"hey\"}"
QUEUED
> EXEC
Possible alternative, if your clients can support it: you might want to look into the
Redis publish/subscribe commands, e.g. cients could publish notifications of new messages and subscribe to one or more message channels to receive notifications. You could keep the g_msg_queue to maintain a backlog of N messages for new clients, if necessary.
Update based on comment: If you want each client to detect there are available messages, pop all that are available, and zero out the list, one option is to use a transaction to read the list:
# assuming the message queue contains "123", "456", "789"..
# a client detects there are new messages, then runs this:
> WATCH g_msg_queue
OK
> LRANGE g_msg_queue 0 100000
QUEUED
> DEL g_msg_queue
QUEUED
> EXEC
1) 1) "789"
2) "456"
3) "123"
2) (integer) 1
Update 2: Given the new information, here's what I would do:
Have your writer clients use RPUSH to append new messages to the list. This lets the reader clients start at 0 and iterate forward over the list to get new messages.
Readers need to only remember the index of the last message they fetched from the list.
Readers watch g_new_message_flag to know when to fetch from the list.
Each reader client will then use "LRANGE list index limit" to fetch the new messages. Suppose a reader client has seen a total of 5 messages, it would run "LRANGE g_msg_queue 5 15" to get the next 10 messages. Suppose 3 are returned, so it remembers the index 8. You can make the limit as large as you want, and can walk through the list in small batches.
The reaper client should set a WATCH on the list and delete it inside a transaction, aborting if any client is concurrently reading from it.
When a reader client tries LRANGE and gets 0 messages it can assume the list has been truncated and reset its index to 0.
Do you really need unique sequential IDs? You can use UUIDs for uniqueness and timestamps to check for new messages. If you keep the clocks on all your servers properly synchronized then timestamps with a one second resolution should work just fine.
If you really do need unique sequential IDs then you'll probably have to set up a Flickr style ticket server to properly manage the central list of IDs. This would, essentially, move your g_message_id into a database with proper transaction handling.
You can simulate auto-incrementing a unique key for new rows. Simply use DBSIZE to get the current number of rows, then in your code, increment that number by 1, and use that number as the key for the new row. It's simple and atomic.