Is there any option to use redis.expire more elastically? - redis

I got a quick simple question,
Assume that if server receives 10 messages from user within 10 minutes, server sends a push email.
At first I thought it very simple using redis,
incr("foo"), expire("foo",60*10)
and in Java, handle the occurrence count like below
if(jedis.get("foo")>=10){sendEmail();jedis.del("foo");}
but imagine if user send one message at first minute and send 8 messages at 10th minute.
and the key expires, and user again send 3 messages in the next minute.
redis key will be created again with value 3 which will not trigger sendEmail() even though user send 11 messages in 2 minutes actually.
we're gonna use Redis and we don't want to put receive time values to redis.
is there any solution ?

So, there's 2 ways of solving this-- one to optimize on space and the other to optimize on speed (though really the speed difference should be marginal).
Optimizing for Space:
Keep up to 9 different counters; foo1 ... foo9. Basically, we'll keep one counter for each of the possible up to 9 different messages before we email the user, and let each one expire as it hits the 10 minute mark. This will work like a circular queue. Now do this (in Python for simplicity, assuming we have a connection to Redis called r):
new_created = False
for i in xrange(1,10):
var_name = 'foo%d' % i
if not (new_created or r.exists(var_name)):
r.set(var_name, 0)
r.expire(var_name, 600)
new_created = True
if not r.exists(var_name): continue
r.incr(var_name, 1)
if r.get(var_name) >= 10:
send_email(user)
r.del(var_name)
If you go with this approach, put the above logic in a Lua script instead of the example Python, and it should be quite fast. Since you'll at most be storing 9 counters per user, it'll also be quite space efficient.
Optimizing for speed:
Keep one Redis Sortet Set per user. Every time a user sends a message, add to his sorted set with a key equal to the timestamp and an arbitrary value. Then just do a ZCOUNT(now, now - 10 minutes) and send an email if that's greater than 10. Then ZREMRANGEBYSCORE(now - 10 minutes, inf). I know you said you didn't want to keep timestamps in Redis, but IMO this is a better solution, and you're going to have to hold some variant on timestamps somewhere no matter what.
Personally I'd go with the latter approach because the space differences are probably not that big, and the code can be done quickly in pure Redis, but up to you.

Related

Storing time intervals efficiently in redis

I am trying to track server uptimes using redis.
So the approach I have chosen is as follows:
server xyz will keep on sending my service ping indicating that it was alive and working in the last 30 seconds.
My service will store a list of all time intervals during which the server was active. This will be done by storing a list of {startTime, endTime} in redis, with key as name of the server (xyz)
Depending on a user query, I will use this list to generate server uptime metrics. Like % downtime in between times (T1, T2)
Example:
assume that the time is T currently.
at T+30, server sends a ping.
xyz:["{start:T end:T+30}"]
at T+60, server sends another ping
xyz:["{start:T end:T+30}", "{start:T+30 end:T+60}"]
and so on for all pings.
This works fine , but an issue is that over a large time period this list will get a lot of elements. To avoid this currently, on a ping, I pop the last element of the list, check if it can be merged with the latest time interval. If it can be merged, I coalesce and push a single time interval into the list. if not then 2 time intervals are pushed.
So with this my list becomes like this after step 2 : xyz:["{start:T end:T+60}"]
Some problems I see with this approach is:
the merging is being done in my service, and not redis.
incase my service is distributed, The list ordering might get corrupted due to multiple readers and writers.
Is there a more efficient/elegant way to handle this , like maybe handling merging of time intervals in redis itself ?

StackExchange.Redis - events from last N minutes

I am quite struggling with task to keep track about user interactions with articles for past N minutes.
Client that I have to use to access Redis instance is StackExchange.Redis.
Example:
User likes Article#111.
When API makes request, I have to know exact number of times Article#111 was liked for the past N minutes.
For now, let's say that N=10.
Any guidance in solving this is appreciated :)
you can use sorted set for that.
you can add to a key like article:<id>:<interactionType> (interactionType if you have multiple interactions) and value is <userId>
in order to get the no. of times article 1 was like in the past N minutes, you can do
ZCOUNT article:1:likes <last-N-minutes-linux-timestamp> <current-time-stamp>

REDIS usecase using large keys with small values

I have a use-case for using redis that is a little bit different.
In my MySQL I have an entity, let's call it HumanEntity. this HumanEntity has many to many relations.
HumanEntity.Urls - Many URLs per HumanEntity.
HumanEntity.UserNames - Many UserNames per HumanEntity.
HumanEntity.Phones ...
HumanEntity.Emails ...
in a normal one hour, the application creates hundreds of these many values.
The use-case is that, the application receives an HTTP call (100 per one second) with a HumanEntity value (Url or UserName or Phone or Email).
I need to scan my MySQL (1,000,000 records) and return back the HumanEntity.Id(integer) .
Since its ok to have some latency in the data integrity I thought about REDIS.
Can I store the values as a Redis key and the and the HumanEntity.Id(integer) as the value.
My API needs to return back the HumanEntity.Id(integer).
does it make sense to have such long key and such short value? The URL, for example, maybe 1500 bytes and the value can be 1 byte.
What is the best redis method to implement that?
Thanks
If the values are not unique then you may have some problem. Phones, emails or usernames maybe unique for user but i am not sure about url or any other property stored in your database. You may overwrite the value of an identifier with another user's.
If you don't have any problem like that; you may proceed with string types, Time complexity of GET and SET is O(1) - that's the best you may get.
In some cases such as checking whether the user used any coupon, you may use long(let's say 64 chars) user id as key, and 1 as value and use EXISTS to determine it. So it's valid to use long key and short value.

bitcoin block solving, all nonces used but no hit

I'm trying to understand how bitcoin block solving attempts works.
I see a nonce is a 32-bit number, so around 4 billion values to try.
Also, I saw a famous mining pool having 500 Ph/s power at hand. And I found there one particular block solved in 40 minutes.
So, that is (40 x 3600) x (500 x 10^15) = 7.2 x 10^22 hashes calculated
on that pool, to solve one block.
That means the nonces has been "cycled" 16763 billion times during those 40 minutes.
So I'm wondering what are those 16763 billion more things done after each nonce cycle? ("1 cycle of nonces" is going from 0 to 4294967295) ?
I see that we can change the timestamp at a certain proportion, and the merkel root hash also.
Aren't merkel hashes and timestamps more strict to calculate and use than nonces?
Those 16763 billion things are changes of the timestamp and merkel only? Can we have as much unique merkel hashes re-generated and timestamps changes as needed?
Can you give me examples? sorry if my view is a bit biased, I'm starting with this.
Apparently, I've found that when the nonces have cycled (overflow), an extraNonce value is incremented, and that requires the Merkel hash to be recalculated based on that extraNonce value.
a link here

Autoincrement in Redis

I'm starting to use Redis, and I've run into the following problem.
I have a bunch of objects, let's say Messages in my system. Each time a new User connects, I do the following:
INCR some global variable, let's say g_message_id, and save INCR's return value (the current value of g_message_id).
LPUSH the new message (including the id and the actual message) into a list.
Other clients use the value of g_message_id to check if there are any new messages to get.
Problem is, one client could INCR the g_message_id, but not have time to LPUSH the message before another client tries to read it, assuming that there is a new message.
In other words, I'm looking for a way to do the equivalent of adding rows in SQL, and having an auto-incremented index to work with.
Notes:
I can't use the list indexes, since I often have to delete parts of the list, making it invalid.
My situation in reality is a bit more complex, this is a simpler version.
Current solution:
The best solution I've come up with and what I plan to do is use WATCH and Transactions to try and perform an "autoincrement" myself.
But this is such a common use-case in Redis that I'm surprised there is not existing answer for it, so I'm worried I'm doing something wrong.
If I'm reading correctly, you are using g_message_id both as an id sequence and as a flag to indicate new message(s) are available. One option is to split this into two variables: one to assign message identifiers and the other as a flag to signal to clients that a new message is available.
Clients can then compare the current / prior value of g_new_message_flag to know when new messages are available:
> INCR g_message_id
(integer) 123
# construct the message with id=123 in code
> MULTI
OK
> INCR g_new_message_flag
QUEUED
> LPUSH g_msg_queue "{\"id\": 123, \"msg\": \"hey\"}"
QUEUED
> EXEC
Possible alternative, if your clients can support it: you might want to look into the
Redis publish/subscribe commands, e.g. cients could publish notifications of new messages and subscribe to one or more message channels to receive notifications. You could keep the g_msg_queue to maintain a backlog of N messages for new clients, if necessary.
Update based on comment: If you want each client to detect there are available messages, pop all that are available, and zero out the list, one option is to use a transaction to read the list:
# assuming the message queue contains "123", "456", "789"..
# a client detects there are new messages, then runs this:
> WATCH g_msg_queue
OK
> LRANGE g_msg_queue 0 100000
QUEUED
> DEL g_msg_queue
QUEUED
> EXEC
1) 1) "789"
2) "456"
3) "123"
2) (integer) 1
Update 2: Given the new information, here's what I would do:
Have your writer clients use RPUSH to append new messages to the list. This lets the reader clients start at 0 and iterate forward over the list to get new messages.
Readers need to only remember the index of the last message they fetched from the list.
Readers watch g_new_message_flag to know when to fetch from the list.
Each reader client will then use "LRANGE list index limit" to fetch the new messages. Suppose a reader client has seen a total of 5 messages, it would run "LRANGE g_msg_queue 5 15" to get the next 10 messages. Suppose 3 are returned, so it remembers the index 8. You can make the limit as large as you want, and can walk through the list in small batches.
The reaper client should set a WATCH on the list and delete it inside a transaction, aborting if any client is concurrently reading from it.
When a reader client tries LRANGE and gets 0 messages it can assume the list has been truncated and reset its index to 0.
Do you really need unique sequential IDs? You can use UUIDs for uniqueness and timestamps to check for new messages. If you keep the clocks on all your servers properly synchronized then timestamps with a one second resolution should work just fine.
If you really do need unique sequential IDs then you'll probably have to set up a Flickr style ticket server to properly manage the central list of IDs. This would, essentially, move your g_message_id into a database with proper transaction handling.
You can simulate auto-incrementing a unique key for new rows. Simply use DBSIZE to get the current number of rows, then in your code, increment that number by 1, and use that number as the key for the new row. It's simple and atomic.