StackExchange.Redis - events from last N minutes - redis

I'm struggling with the task of keeping track of user interactions with articles over the past N minutes.
The client I have to use to access the Redis instance is StackExchange.Redis.
Example:
User likes Article#111.
When the API receives a request, I need to know the exact number of times Article#111 was liked in the past N minutes.
For now, let's say that N=10.
Any guidance in solving this is appreciated :)

You can use a sorted set for that.
Add to a key like article:<id>:<interactionType> (the interactionType part is for when you track multiple interaction types), with the member set to <userId> and the score set to the Unix timestamp of the interaction.
To get the number of times article 1 was liked in the past N minutes, you can do:
ZCOUNT article:1:likes <unix-timestamp-N-minutes-ago> <current-unix-timestamp>
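The ZADD/ZCOUNT pattern above can be modeled without a live Redis instance. Here is a pure-Python sketch of the same logic; the LikeTracker class and its sorted timestamp lists are illustrative stand-ins for the sorted set, not real client calls:

```python
import bisect
import time

class LikeTracker:
    """Minimal in-memory model of the ZADD/ZCOUNT pattern:
    one sorted list of timestamps per article (score = like time)."""

    def __init__(self):
        self.likes = {}  # article_id -> sorted list of Unix timestamps

    def add_like(self, article_id, ts=None):
        # ZADD article:<id>:likes <ts> <member>
        ts = time.time() if ts is None else ts
        bisect.insort(self.likes.setdefault(article_id, []), ts)

    def count_recent(self, article_id, window_seconds, now=None):
        # ZCOUNT article:<id>:likes <now - window> <now>
        now = time.time() if now is None else now
        stamps = self.likes.get(article_id, [])
        lo = bisect.bisect_left(stamps, now - window_seconds)
        hi = bisect.bisect_right(stamps, now)
        return hi - lo

tracker = LikeTracker()
tracker.add_like(111, ts=100)   # an old like, outside the window
tracker.add_like(111, ts=650)
tracker.add_like(111, ts=700)
print(tracker.count_recent(111, window_seconds=600, now=720))  # 2
```

One caveat: with member = userId, a repeat like from the same user updates the existing entry's score rather than adding a new one, so use a unique member per like if every like should count. In StackExchange.Redis, the corresponding calls are roughly SortedSetAdd and SortedSetLength with a min/max score range.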

Related

Check the current position in a Redis list of some list element

I have a simple job queue on Redis where new jobs are pushed with RPUSH and consumed with BLPOP. The jobs are stringified JSON objects that have an id field among other things (the json string is parsed by the workers).
Each job takes some time to do, so there can be a meaningful wait time. I'd like to be able to find a job's current position in the queue, so that I can give an update to whatever is waiting on that job. That is, be able to do something like "your current position is 300... 250... 200... 100... 10... your job is now being processed".
It can be assumed that the list may grow long but never too long, i.e. possibly 1000 entries but not 1 million.
After looking through the docs a bit, it seems like this is maybe easier said than done. A possible naive solution seems to be to just loop through the list until the element is found. Are there any performance issues with calling LINDEX a couple hundred times at a time like that?
Would appreciate any suggestions on other ways this can be done (or confirmation that LINDEX is the only way). The whole structure (even the usage of a list, or addition of some helper map/list) can be changed if needed, only requirement is that it run on Redis.
You can use a sorted set and a counter to solve the problem more elegantly.
Push a job
Call INCR counter to get a new sequence number.
Use that number as the job's score, and call ZADD jobs counter job-name.
Pop a job
Call BZPOPMIN jobs to get the first unprocessed job.
Get job position
Call ZRANK jobs job-name to get the rank of the job, i.e. its current position in the queue.
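The three steps above can be sketched as an in-memory model; a plain dict stands in for the sorted set, and BZPOPMIN's blocking behavior is omitted for simplicity:

```python
import itertools

class JobQueue:
    """In-memory model of the INCR + ZADD / BZPOPMIN / ZRANK pattern."""

    def __init__(self):
        self.counter = itertools.count(1)  # stands in for INCR counter
        self.jobs = {}  # job_name -> score

    def push(self, job_name):
        # ZADD jobs <counter> <job_name>
        self.jobs[job_name] = next(self.counter)

    def pop(self):
        # BZPOPMIN jobs (non-blocking sketch: take the lowest score)
        if not self.jobs:
            return None
        job = min(self.jobs, key=self.jobs.get)
        del self.jobs[job]
        return job

    def position(self, job_name):
        # ZRANK jobs <job_name>: number of jobs with a smaller score
        score = self.jobs.get(job_name)
        if score is None:
            return None
        return sum(1 for s in self.jobs.values() if s < score)

q = JobQueue()
for name in ("a", "b", "c"):
    q.push(name)
print(q.position("c"))  # 2: two jobs ahead of it
print(q.pop())          # "a"
print(q.position("c"))  # 1, after the first job is consumed
```

Because ZRANK is O(log N), polling a job's position stays cheap even as the queue grows, unlike scanning a list with repeated LINDEX calls.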

Implementing bursts or spikes detection in a counter using Redis

I want to use Redis to keep track of certain numbers. Basically, they're counters. Is there a way to use Redis to track the rate at which these counters increase?
For example, let's say a counter is being incremented at a rate of 10 per minute for the most of the time but suddenly it's being incremented at a rate of 40 per minute. How can I detect that?
You cannot do that directly, but you can do it with a sorted set, for example, plus a bit of client-side or Lua-based processing.
Let's say you use a sorted set where, for each time window, you increment the member for that window's timestamp:
ZINCRBY mykey 1 timestamp
Then you have a simple counter per timestamp.
When you want to analyze it, pull the counters back (e.g. ZRANGE mykey 0 -1 WITHSCORES) and do some processing on the differences between windows to detect anomalies. There are many ways to do it; here's a link with a few pointers: https://stats.stackexchange.com/questions/152644/what-algorithm-should-i-use-to-detect-anomalies-on-time-series
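As one concrete illustration of processing those per-timestamp counters client-side, here is a deliberately simple Python sketch; detect_bursts and its threshold heuristic (flag a minute exceeding a multiple of the trailing average) are illustrative choices, not a recommendation of a specific algorithm:

```python
def detect_bursts(counts, window=5, factor=3.0):
    """counts: dict of minute_timestamp -> increments in that minute
    (the per-window counters that ZINCRBY accumulates).
    Flags any minute whose count exceeds `factor` times the mean
    of the previous `window` minutes."""
    bursts = []
    stamps = sorted(counts)
    for i, ts in enumerate(stamps):
        prev = stamps[max(0, i - window):i]
        if not prev:
            continue  # no baseline yet for the first window
        baseline = sum(counts[t] for t in prev) / len(prev)
        if baseline > 0 and counts[ts] > factor * baseline:
            bursts.append(ts)
    return bursts

# steady ~10/min, then a spike of 40 at minute 6
series = {1: 10, 2: 9, 3: 11, 4: 10, 5: 10, 6: 40}
print(detect_bursts(series))  # [6]
```

For production use, the anomaly-detection thread linked above covers more robust techniques (moving z-scores, seasonal decomposition, etc.).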

Is there any option to use redis.expire more elastically?

I have a quick, simple question.
Assume that if the server receives 10 messages from a user within 10 minutes, the server sends a push email.
At first I thought it was very simple using Redis:
incr("foo"), expire("foo",60*10)
and in Java, handling the occurrence count like below:
if(jedis.get("foo")>=10){sendEmail();jedis.del("foo");}
But imagine the user sends one message in the first minute and 8 messages in the 10th minute.
Then the key expires, and the user sends 3 more messages in the next minute.
The Redis key will be created again with value 3, which will not trigger sendEmail() even though the user actually sent 11 messages within 2 minutes.
We're going to use Redis, and we'd prefer not to store receive-time values in Redis.
Is there any solution?
So, there are two ways of solving this: one optimizes for space and the other for speed (though really the speed difference should be marginal).
Optimizing for Space:
Keep up to 9 different counters, foo1 ... foo9: one for each of the up to 9 messages a user can send before we email them, and let each one expire as it hits the 10-minute mark. This works like a circular queue. Now do this (in Python for simplicity, assuming we have a connection to Redis called r):
new_created = False
for i in range(1, 10):
    var_name = 'foo%d' % i
    if not (new_created or r.exists(var_name)):
        r.set(var_name, 0)
        r.expire(var_name, 600)
        new_created = True
    if not r.exists(var_name):
        continue
    r.incr(var_name, 1)
    if int(r.get(var_name)) >= 10:
        send_email(user)
        r.delete(var_name)
If you go with this approach, put the above logic in a Lua script instead of the example Python, and it should be quite fast. Since you'll at most be storing 9 counters per user, it'll also be quite space efficient.
Optimizing for speed:
Keep one Redis sorted set per user. Every time a user sends a message, add an entry to their sorted set with a score equal to the timestamp and an arbitrary unique member. Then just do a ZCOUNT(now - 10 minutes, now) and send an email if the result reaches 10. Then ZREMRANGEBYSCORE(-inf, now - 10 minutes) to trim entries that have aged out of the window. I know you said you didn't want to keep timestamps in Redis, but IMO this is a better solution, and you're going to have to hold some variant of timestamps somewhere no matter what.
Personally I'd go with the latter approach because the space differences are probably not that big, and the code can be done quickly in pure Redis, but up to you.
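The sorted-set approach can be sketched in Python; the sorted list below is an in-memory stand-in for the per-user zset, and the WINDOW/LIMIT constants match the 10-minutes/10-messages numbers from the question:

```python
import bisect

class MessageWindow:
    """Model of the sliding-window sorted set: one timestamp per
    message, count the last 10 minutes, trim what has aged out."""

    WINDOW = 600  # seconds
    LIMIT = 10

    def __init__(self):
        self.stamps = []  # sorted list, stands in for the user's zset

    def record(self, ts):
        # ZADD user:<id>:msgs <ts> <unique-member>
        bisect.insort(self.stamps, ts)
        # ZREMRANGEBYSCORE user:<id>:msgs -inf <ts - WINDOW>
        cutoff = bisect.bisect_right(self.stamps, ts - self.WINDOW)
        del self.stamps[:cutoff]
        # everything left is inside the window, so len() is the ZCOUNT
        return len(self.stamps) >= self.LIMIT

w = MessageWindow()
# 1 message at minute 1, 8 at minute 10, 3 at minute 11 (in seconds)
times = [60] + [600] * 8 + [660] * 3
fired = [w.record(t) for t in times]
print(fired.index(True))  # 10: the 11th message trips the limit
```

Note how this handles the asker's failure case: the message from minute 1 ages out, but the 8 messages from minute 10 are still inside the window when the minute-11 messages arrive, so the 11th overall message triggers the alert.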

Counting/Querying events with multiple keys and dates in Redis

I'm trying to figure out how to handle my data structure within Redis. What I am trying to accomplish is to count events with two parameters, and then query Redis for that data by date. Here's an example: events come in with two different parameters, let's call them site and event type, along with the time the event occurred. From there, I need to be able to query Redis for how many events occurred over a span of dates, grouped together by site and event type.
Here's a brief example data set:
Oct 3, 2012:
site A / event A
site A / event A
Site B / event A
Oct 4, 2012:
site B / event B
site A / event A
Site B / event A
... and so on.
In my query I would like to know the total number of events over the date span, which will be a span of five weeks. In the example above, this would be something like:
site A / event A ==> 3 events
site B / event A ==> 2 events
site B / event B ==> 1 event
I have looked at using Redis' Sorted Set feature, Hashes, and so on. It seems Sorted Set is the best way to do it, but querying the data with Redis' ZUNIONSTORE command seems like not such a great fit because these events will span five weeks. That makes for at least 35 arguments to the ZUNIONSTORE command.
Any hints, thoughts, ideas, etc?
Thanks so much for your help.
Contrary to a typical RDBMS or MongoDB, Redis has no rich query language you can use. With such stores, you accumulate the raw data in the store, and then you can use a query to calculate statistics. Redis is not adapted to this model.
With Redis, you are supposed to calculate your statistics on-the-fly and store them directly instead of the raw data.
For instance, supposing we are only interested in statistics over a range of weeks, I would structure the data as follows:
because all the criteria are discrete, simple hash objects can be used instead of zsets
one hash object per week
in each hash object, one counter per couple site,event. Optionally, one counter per site, and/or one counter per event.
So when an event occurs, I would pipeline the following commands to Redis:
hincrby W32 site_A:event_A 1
hincrby W32 site_A:* 1
hincrby W32 *:event_A 1
Please note there is no need to initialize those counters. HINCRBY will create them (and the hash object) if they do not exist.
To retrieve the statistics for one week:
hgetall W32
In the statistics, you have the counters per site/event, per site only, per event only.
To retrieve the statistics for several weeks, pipeline the following commands:
hgetall W32
hgetall W33
hgetall W34
hgetall W35
hgetall W36
and perform the aggregation on client-side (quite simple if the language supports associative arrays such as map, dictionary, etc ...).
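The client-side aggregation step might look like this in Python; the field names follow the site_A:event_A convention used above, and the int() conversion reflects that Redis returns hash values as strings:

```python
from collections import Counter

def aggregate(weeks):
    """weeks: list of dicts, one per HGETALL W<n> reply,
    mapping a 'site:event' field to its count. Sums them client-side."""
    total = Counter()
    for week in weeks:
        for field, count in week.items():
            total[field] += int(count)
    return total

# example replies for two of the pipelined HGETALL calls
w32 = {"site_A:event_A": "2", "site_B:event_A": "1"}
w33 = {"site_A:event_A": "1", "site_B:event_B": "1", "site_B:event_A": "1"}
totals = aggregate([w32, w33])
print(totals["site_A:event_A"])  # 3
print(totals["site_B:event_A"])  # 2
```

These totals match the example data set in the question: site A / event A ==> 3 events, site B / event A ==> 2 events, site B / event B ==> 1 event.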

How to calculate blocks of free time using start and end time?

I have a Ruby on Rails application that uses MySQL and I need to calculate blocks of free (available) time given a table that has rows of start and end datetimes. This needs to be done for a range of dates, so for example, I would need to look for which times are free between May 1 and May 7. I can query the table with the times that are NOT available and use that to remove periods of time between May 1 and May 7. Times in the database are stored at a fidelity of 15 minutes on the quarter hour, meaning all times end at 00, 15, 30 or 45 minutes. There is never a time like 11:16 or 10:01, so no rounding is necessary.
I've thought about creating a hash that has time represented in 15 minute increments and defaulting all of the values to "available" (1), then iterating over an ordered resultset of rows and flipping the values in the hash to 0 for the times that come back from the database. I'm not sure if this is the most efficient way of doing this, and I'm a little concerned about the memory utilization and computational intensity of that approach. This calculation won't happen all the time, but it needs to scale to happening at least a couple hundred times a day. It seems like I would also need to reprocess the entire hash to find the blocks of time that are free after this which seems pretty inefficient.
Any ideas on a better way to do this?
Thanks.
I've done this a couple of ways. First, my assumption is that your table shows appointments, and now you want to get a list of un-booked time, right?
So, the first way I did this was like yours, just a hash of unused times. It's slow and limited and a little wasteful, since I have to re-calculate the hash every time someone needs to know the times that are available.
The next way I did this was to borrow an idea from the data warehouse people. I build an attribute table of all the time slots I'm interested in. If you build this kind of table, you may want to put more information in there besides the slot times: things like whether it's a weekend, which hour of the day it's in, whether it's during regular business hours, whether it's on a holiday, that sort of thing. Then I do a LEFT JOIN of all slots between my start and end times against appointments, keeping the rows where the appointment side is NULL. Something like:
SELECT slots.*
FROM slots
LEFT JOIN appointments
  ON appointments.slot_time = slots.slot_time  -- join condition depends on your schema
WHERE ...
  AND appointments.id IS NULL
That keeps me from having to re-create the hash every time, and it's using the database to do the set operations, something the database is optimized to do.
Also, if you make your slots table a little rich, you can start doing all sorts of queries about not only the available slots you may be after, but also on the kinds of times that tend to get booked, or the kinds of times that tend to always be available, or other interesting questions you might want to answer some day. At the very least, you should keep track of the fields that tell you whether a slot should be one that is being filled or not (like for business hours).
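Once the free slots come back from that query, merging consecutive 15-minute slots into contiguous blocks is straightforward. A sketch (in Python for illustration, though the app is Rails; the free_blocks helper is a hypothetical name):

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=15)

def free_blocks(free_slots):
    """free_slots: sorted list of slot-start datetimes that came back
    from the LEFT JOIN (i.e. slots with no matching appointment).
    Merges consecutive 15-minute slots into (start, end) blocks."""
    blocks = []
    for start in free_slots:
        if blocks and blocks[-1][1] == start:
            blocks[-1][1] = start + SLOT  # extend the current block
        else:
            blocks.append([start, start + SLOT])
    return [(s, e) for s, e in blocks]

day = datetime(2024, 5, 1)
# 9:00, 9:15, 9:30 free (one block), then 11:00 free (another block)
slots = [day + timedelta(minutes=m) for m in (540, 555, 570, 660)]
for start, end in free_blocks(slots):
    print(start.time(), "-", end.time())
```

Because the data is guaranteed to be on quarter-hour boundaries, exact equality between one slot's end and the next slot's start is enough to detect adjacency; no rounding logic is needed.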
Why not have a flag in the row that indicates availability? As time is allocated, flip the flag for every date/time in the appropriate range. For example, May 2, 12pm to 1pm would be marked as not available.
Then it's a simple matter of querying the date range for every row that has the availability flag set to true.