Redis: How do I count the elements in a stream in a certain range?

Redis: How do I count the elements in a stream in a certain range? - redis

Bussiness Objective
I'm creating a dashboard that will depend on some time-series and I'll use Redis to implement it. I'm new to using Redis and I'm trying to use Redis-Streams to count the elements in a stream.
XADD conversation:9:chat_messages * id 2583 user_type Bot
XADD conversation:9:chat_messages * id 732016 user_type User
XADD conversation:9:chat_messages * id 732017 user_type Staff
XRANGE conversation:9:chat_messages - +
I'm aware that I can get the total count of the elements using the XLEN command like this:
XLEN conversation:9:chat_messages
but I want to also know the elements in a period, for example:
XLEN conversation:9:chat_messages 1579551316273 1579551321872
I know I can use LUA to count those elements but I want some REALLY fast way to achieve this and I know that using Redis markup will be the fastest way.
Is there any way to achieve this with a straight forward Redis command? Or do I have to write a Lua script to do this?
Additional information
I'm limited by AWS' ElastiCache to use the only Redis 5.0.6, I cannot install other modules such as the RedisTimeSeries module. I'd like to use that module but it's not possible at the moment.

While the Redis Stream data structure doesn't support this, you can use a Sorted Set alongside it for keeping track of message ranges.
Basically, for each message ID you get from XADD - e.g. "1579551316273-0" - you need to do a ZADD conversation:9:ids 0 1579551316273-0. Then, you can use ZLEXCOUNT to get the "length" of a range.

Sorry, there is no commands-way to achieve this.
Your best option with Redis Streams would be to use a Lua script. You will get O(N) with N being the number of elements being counted, instead of O(log N) if a command existed.
local T = redis.call('XRANGE', KEYS[1], ARGV[1], ARGV[2])
local count = 0
for _ in pairs(T) do count = count + 1 end
return count
Note the difference between O(N) and O(log(N)) is significant for a large N, but for a chat application, if tracked by conversation, this won't make that big of a difference if chats have hundreds or even thousands of entries, once you account total command time including Round Trip Time which takes most of the time. The Lua script above removes network-payload and client-processing time.
You can switch to sorted sets if you really want O(log N) and you don't need consumer groups and other stream features. See How to store in Redis sorted set with server-side timestamp as score? if you want to use Redis server timestamp atomically.
Then you can use ZCOUNT which is O(log(N)).
If you do need Stream features, then you would need to keep the sorted set as a secondary index.

Related

How do I get records in a Redis stream by position instead of ID?

With XREAD, I can get count from a specific ID (or first or last). With XRANGE, I can get a range from key to key (or from first - or to last +). Same in reverse with XREVRANGE.
How do I get by position? E.g. "give me the last 10 records in order"? XREVRANGE can do it with XREVRANGE stream + - COUNT 10, although it will be backwards in order, and XRANGE will give me the first 10 with XRANGE stream - + COUNT 10, but how do I:
get the last X in order?
get an arbitrary X in the middle, e.g. the 10 from position 100-109 (inclusive), when the stream has 5000 records in it?

Redis Streams currently (v6.0) do not provide an API for accessing their records via position/offset. The Stream's main data structure (a radix tree) can't provide this type of functionality efficiently.
This is doable with an additional index, for example a Sorted Set of the ids. A similar discussion is at https://groups.google.com/g/redis-db/c/bT00XGMq4DM/m/lWDzFa3zBQAJ
The real question is, as #sonus21 had asked, what's the use case.

It is an existing API that allows users to retrieve records from index
x to index y. It has a memory backend (just an array of records) and a
file one, want to put Redis in it as well.
We can just use Redis List to get the items randomly and at any offset using LRANGE
Attaching consumers to Redis LIST is easy but it does not come with an ack mechanism like a stream, you can build an ack mechanism in your code but that could be tricky. There could be many ways to build an ack mechanism in Redis, see one of them here Rqueue.
Ack mechanism is not always required but if you just want to consume from the LIST, you can use BLPOP/LPOP commands to consume elements from the list, that could be your simple consumer, but using BLPOP/LPOP commands would remove entries from the LIST so your offset would be dynamic, it will not always start from the 0. Instead of using BLPOP/LPOP, if you use LRANGE and track the offset in a simple key like my-consumer-offset, using this offset you can build your consumer that would always get the next element from the list based on the current offset.
Using LIST and STREAM to get random offset and streaming feature. Most important part would be you should use Consumer group so that streams are not trimmed. During push operation you should add element to STREAM as well as LIST. Once elements are inserted your consumer can work without any issue in a consumer group, for random offset you use LIST to get elements. Stream can grow as it'd be in append only mode, so you would need to periodically trim stream. See my other answer for deleting older entries

How to limit count of items in Redis sorted sets

In my case I upload a lot of records to Redis sorted set, but I need to store only 10 highest scored items. I don't have ability to influence on the data which is uploaded (to sort and to limit it before uploading).
Currently I just execute
ZREMRANGEBYRANK key 0 -11
after uploading finish, but such approach looks not very optimal (it's slow and it will be better if Redis could handle that).
So does Redis provide something out of the box to limit count of items in sorted sets?

Nopes, redis does not provide any such functionality apart from ZREMRANGEBYRANK .
There is a similar problem about keeping a redis list of constant size, say 10 elements only when elements are being pushed from left using LPUSH.
The solution lies in optimizing the pruning operation.
Truncate your sorted set, once a while, not everytime
Methods:
Run a ZREMRANGEBYRANK with 1/5 probability everytime, using a random integer.
Use redis pipeline or Lua scripting to achieve this , this would even save the two network calls happening at almost every 5th call.
This is optimal enough for the purpose mentioned.
Algorithm example:
ZADD key member1 score1
random_int = some random number between 1-5
if random_int == 1: # trim sorted set with 1/5 chance
ZREMRANGEBYRANK key 0 -11

Implementing bursts or spikes detection in a counter using Redis

I wanna use Redis to keep track of certain numbers. Basically, they're counters. Is there a way to use Redis to sort of track the rate at which these counters increase?
For example, let's say a counter is being incremented at a rate of 10 per minute for the most of the time but suddenly it's being incremented at a rate of 40 per minute. How can I detect that?

You cannot do that directly, but you can do that with a sorted set for example, with a bit of client side, or Lua based processing.
Let's say that you use a sorted set, for each time window you increment the value:
ZINCRBY mykey timestamp 1
Then you have a simple counter per timestamp.
When you want to analyze it, you can take a range by time with ZRANGE or ZREVRANGE, getting the scores by using WITHSCORES, and do some processing on the differences for detecting anomalies. There are many ways to do it, here's a link with a few pointers: https://stats.stackexchange.com/questions/152644/what-algorithm-should-i-use-to-detect-anomalies-on-time-series

What Redis data type fit the most for following example

I have following scenario:
Fetch array of numbers (from REDIS) conditionally
For each number do some async stuff (fetch something from DB based on number)
For each thing in result set from DB do another async stuff
Periodically repeat 1. 2. 3. because new numbers will be constantly added to REDIS structure.Those numbers represent unix timestamp in milliseconds so out of the box those numbers will always be sorted in time of addition
Conditionally means fetch those unix timestamp from REDIS that are less or equal to current unix timestamp in milliseconds(Date.now())
Question is what REDIS data type fit the most for this use case having in mind that this code will be scaled up to N instances, so N instances will share access to single REDIS instance. To equally share the load each instance will read for example first(oldest) 5 numbers from REDIS. Numbers are unique (adding same number should fail silently) so REDIS SET seems like a good choice but reading M first elements from REDIS set seems impossible.
To prevent two different instance of the code to read same numbers REDIS read operation should be atomic, it should read the numbers and delete them. If any async operation fail on specific number (steps 2. and 3.), numbers should be added again to REDIS to be handled again. They should be re-added back to the head not to the end to be handled again as soon as possible. As far as i know SADD would push it to the tail.
SMEMBERS key would read everything, it looks like a hammer to me. I would need to include some application logic to get first five than to check what is less or equal to Date.now() and then to delete those and to wrap somehow everything in single transaction. Besides that set cardinality can be huge.
SSCAN sounds interesting but i don't have any clue how it works in "scaled" environment like described above. Besides that, per REDIS docs: The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. Like described above collection will be changed frequently

A more appropriate data structure would be the Sorted Set - members have a float score that is very suitable for storing a timestamp and you can perform range searches (i.e. anything less or equal a given value).
The relevant starting points are the ZADD, ZRANGEBYSCORE and ZREMRANGEBYSCORE commands.
To ensure the atomicity when reading and removing members, you can choose between the the following options: Redis transactions, Redis Lua script and in the next version (v4) a Redis module.
Transactions
Using transactions simply means doing the following code running on your instances:
MULTI
ZRANGEBYSCORE <keyname> -inf <now-timestamp>
ZREMRANGEBYSCORE <keyname> -inf <now-timestamp>
EXEC
Where <keyname> is your key's name and <now-timestamp> is the current time.
Lua script
A Lua script can be cached and runs embedded in the server, so in some cases it is a preferable approach. It is definitely the best approach for short snippets of atomic logic if you need flow control (remember that a MULTI transaction returns the values only after execution). Such a script would look as follows:
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
To run this, first cache it using SCRIPT LOAD and then call it with EVALSHA like so:
EVALSHA <script-sha> 1 <key-name> <now-timestamp>
Where <script-sha> is the sha1 of the script returned by SCRIPT LOAD.
Redis modules
In the near future, once v4 is GA you'll be able to write and use modules. Once this becomes a reality, you'll be able to use this module we've made that provides the ZPOP command and could be extended to cover this use case as well.

Redis: fan out news feeds in list or sorted set?

I'm caching fan-out news feeds with Redis in the following way:
each feed activity is a key/value, like activity:id where the value is a JSON string of the data.
each news feed is currently a list, the key is feed:user:user_id and the list contains the keys of the relevant activities.
to retrieve a news feed I use for example: 'sort feed:user:user_id by nosort get * limit 0 40'
I'm considering changing the feed to a sorted set where the score is the activity's timestamp, this way the feed is always sorted by time.
I read http://arindam.quora.com/Redis-sorted-sets-and-lists-Pertaining-to-Newsfeed which recommend using lists because of the time complexity of sorted sets, but by keep using lists I have to take care of the insert order,
inserting a past story requires to iterate through the list and finding the right index to push to. (which can cause new problems in distributed environments).
should I keep using lists or go for sorted sets?
is there a way to retrieve the news feed instantly from a sorted set, (like with the sort ... get * command for a list) or does it have to be zrange and then iterating through the results and getting each value?

Yes, sorted sets are very fast and powerful. They seem a much better match for your requirements than SORT operations. The time complexity is often misunderstood. O(log(N)) is very fast, and scales just fine. We use it for tens of millions of members in one sorted set. Retrieval and insertion is sub-millisecond.
Use ZRANGEBYSCORE key min max WITHSCORES [LIMIT offset count] to get your results.
Depending on how you store the timestamps as 'scores', ZREVRANGEBYSCORE might be better.
A small remark about the timestamps: Sorted set SCORES which don't need a decimal part should be using 15 digits or less. So the SCORE has to stay in the range -999999999999999 to 999999999999999. Note: These limits exist because Redis server actually stores the score (float) as a redis-string representation internally.
I therefore recommend this format, converted to Zulu Time: -20140313122802 for second-precision. You may add 1 digit for 100ms-precision, but no more if you want no loss in precision. It's still a float64 by the way, so loss of precision could be fine in some scenarios, but your case fits in the 'perfect precision' range, so that's what I recommend.
If your data expires within 10 years, you can also skip the three first digits (CCY of CCYY), to achieve .0001 second precision.
I suggest negative scores here, so you can use the simpler ZRANGEBYSCORE instead of the REV one. You can use -inf as the start score (minus infinity) and LIMIT 0 100 to get the top 100 results.
Two sorted set members (or 'keys' but that's ambiguous since the sorted set is also a key in itself) may share a score, that's no problem, the results within an identical score are alphabetical.
Hope this helps, TW
Edit after chat
The OP wanted to collect data (using a ZSET) from different keys (GET/SET or HGET/HSET keys). JOIN can do that for you, ZRANGEBYSCORE can't.
The preferred way of doing this, is a simple Lua script. The Lua script is executed on the server. In the example below I use EVAL for simplicity, in production you would use SCRIPT EXISTS, SCRIPT LOAD and EVALSHA. Most client libraries have some bookkeeping logic built-in, so you don't upload the script each time.
Here's an example.lua:
local r={}
local zkey=KEYS[1]
local a=redis.call('zrangebyscore', zkey, KEYS[2], KEYS[3], 'withscores', 'limit', 0, KEYS[4])
for i=1,#a,2 do
r[i]=a[i+1]
r[i+1]=redis.call('get', a[i])
end
return r
You use it like this (raw example, not coded for performance):
redis-cli -p 14322 set activity:1 act1JSON
redis-cli -p 14322 set activity:2 act2JSON
redis-cli -p 14322 zadd feed 1 activity:1
redis-cli -p 14322 zadd feed 2 activity:2
redis-cli -p 14322 eval '$(cat example.lua)' 4 feed '-inf' '+inf' 100
Result:
1) "1"
2) "act1JSON"
3) "2"
4) "act2JSON"

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas