How to keep the objects generated in recent 5 minutes using redis? - redis

I don't want to use keys * command because it is O(N).
Is it possible to keep the newest Objects in redis?

Not using KEYS is definitely the way to go. Use a Sorted Set to store key names when you create them, with the score set to the time of creation. You can then fetch key names by their creation time with ZRANGEBYSCORE, and don't forget to trim the older entries from the Sorted Set using ZREMRANGEBYSCORE.
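A minimal sketch of that index, assuming `r` is a redis-py client created with `decode_responses=True`. The index name `recent:keys`, the helper names, and the optional `now` argument (there to make the functions easy to test) are illustrative assumptions, not part of the original answer.

```python
import time

INDEX_KEY = "recent:keys"   # hypothetical name for the sorted-set index
WINDOW_SECONDS = 5 * 60     # "recent" means the last 5 minutes


def store_object(r, key, value, now=None):
    """Write the object and record its key in the index, scored by creation time."""
    now = time.time() if now is None else now
    r.set(key, value)
    r.zadd(INDEX_KEY, {key: now})


def recent_keys(r, now=None):
    """Trim index entries that fell out of the window, then return the rest."""
    now = time.time() if now is None else now
    cutoff = now - WINDOW_SECONDS
    # ZREMRANGEBYSCORE: drop every index entry created at or before the cutoff.
    r.zremrangebyscore(INDEX_KEY, "-inf", cutoff)
    # ZRANGEBYSCORE: remaining key names, oldest first.
    return r.zrangebyscore(INDEX_KEY, cutoff, "+inf")
```

Note that trimming only removes names from the index. The objects themselves stay in Redis unless they are deleted or given a TTL when they are written.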

Related

Use set or just create keys in redis to check existence?

I can think of two ways of checking existence using redis:
Use the whole database as a 'set': just SET a key and check existence by GETing it (or by using EXISTS, as mentioned in the comment by @Sergio Tulentsev)
Use SADD to add all members to a single key and check existence with SISMEMBER
Which one is better? Will it be a problem, compared to the same amount of keys in a single set, if I choose the first method and the number of keys in a database gets larger?
In fact, besides these two methods, you can also use the HASH data structure with the HEXISTS command (I'll call this the third solution).
All these solutions are fast enough, and it's NOT a problem if you have a large SET, HASH, or keyspace.
So, which one should we use? It depends on lots of things...
Does the key have a value?
Keys in both the first and the third solution can have a value, while members in the second solution CANNOT.
So if there's no value for each key, I'd prefer the second solution, i.e. the SET solution. Otherwise, you have to use the first or third solution.
Does the value have structure?
If the value is NOT a raw string but a data structure, e.g. a LIST or a SET, you have to use the first solution, since a HASH's values can only be raw strings.
Do you need to do set operations?
If you need to do intersection, union or diff operations on multiple data sets, you should use the second solution. Redis has built-in commands for these operations, although they might be slow commands.
Memory efficiency consideration
Redis uses a more memory-efficient encoding for small SETs and HASHes. So when you have lots of small data sets, the second and the third solution can save a lot of memory.
UPDATE
Do you need to set TTL for these keys?
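As @dizzyf points out in the comments, if you need to set a TTL for these keys, you have to use the first solution, because the items of a HASH or SET do NOT have an expiration property. You can only set a TTL on the entire HASH or SET, NOT on its elements. (Redis 7.4 and later add per-field expiration for hashes through commands such as HEXPIRE; SET members still cannot expire individually.)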
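The three solutions compared in this answer can be sketched side by side. This assumes `r` is a redis-py client; the helper names are made up for illustration, and only the first solution accepts a per-entry TTL.

```python
# Solution 1: the keyspace itself acts as the set (supports a per-key TTL).
def add_as_key(r, name, ttl_seconds=None):
    r.set(name, 1, ex=ttl_seconds)   # ex=None means "no expiry"

def exists_as_key(r, name):
    return r.exists(name) > 0        # EXISTS returns how many of the names exist


# Solution 2: one Redis SET holds all members (no values, no per-member TTL).
def add_to_set(r, set_key, member):
    r.sadd(set_key, member)

def exists_in_set(r, set_key, member):
    return bool(r.sismember(set_key, member))


# Solution 3: one Redis HASH holds field -> string value.
def add_to_hash(r, hash_key, field, value):
    r.hset(hash_key, field, value)

def exists_in_hash(r, hash_key, field):
    return bool(r.hexists(hash_key, field))
```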

Find all keys expiring within next X hours

Is there a way to fetch all keys that are about to expire within the next X hours?
I see that the scan method only seems to do pattern matching, and I can't seem to find any other commands that let me do this.
Redis does not provide this capability (yet). You can, however, keep a Sorted Set where the elements are the key names and the scores are their expiry timestamp - this will allow you to query (ZRANGEBYSCORE) as you wish, at the price of maintaining that data structure.
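A rough sketch of that bookkeeping, assuming `r` is a redis-py client created with `decode_responses=True`. The index name `expiry:index` and both helper names are hypothetical, and the `now` argument exists only to make the functions testable.

```python
import time

EXPIRY_INDEX = "expiry:index"  # hypothetical name for the sorted-set index


def set_with_expiry(r, key, value, ttl_seconds, now=None):
    """Write a key with a TTL and record its absolute expiry time in the index."""
    now = time.time() if now is None else now
    r.set(key, value, ex=ttl_seconds)
    r.zadd(EXPIRY_INDEX, {key: now + ttl_seconds})


def expiring_within(r, hours, now=None):
    """Return key names whose recorded expiry falls within the next `hours` hours."""
    now = time.time() if now is None else now
    # Entries whose expiry time has passed refer to keys Redis has already expired.
    r.zremrangebyscore(EXPIRY_INDEX, "-inf", now)
    return r.zrangebyscore(EXPIRY_INDEX, now, now + hours * 3600)
```

The price mentioned above shows up here: any code path that changes a key's lifetime (EXPIRE, PERSIST, DEL, or overwriting the key) has to update the index as well, otherwise the two drift apart.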
AFAIK this is not possible without a full scan of the keys. There is no command or group of commands which can provide that information.
KEYS combined with TTL or PTTL may be the only option, but it requires a full scan. Pipelining the TTL calls will improve performance.
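For illustration, here is one way the full-scan approach could look, assuming `r` is a redis-py client. It swaps KEYS for SCAN (through redis-py's `scan_iter`) so the server is not blocked by one long command, but it is still O(N) in the number of keys. The function name and `batch_size` parameter are assumptions for the sketch.

```python
def keys_expiring_within(r, seconds, match="*", batch_size=500):
    """Walk the whole keyspace and return keys whose TTL is at most `seconds`."""
    expiring = set()   # SCAN may return the same key more than once
    batch = []

    def flush():
        # One round trip per batch: queue a TTL call for every key, then execute.
        pipe = r.pipeline(transaction=False)
        for key in batch:
            pipe.ttl(key)
        for key, ttl in zip(batch, pipe.execute()):
            # TTL is -1 for keys without an expiry and -2 for keys that vanished.
            if 0 <= ttl <= seconds:
                expiring.add(key)
        batch.clear()

    for key in r.scan_iter(match=match, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            flush()
    if batch:
        flush()
    return sorted(expiring)
```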

How to clear the values of a key in Redis HyperLogLog

I'm using Redis implementation of HyperLogLog to count distinct values for given keys.
The keys are based on an hourly window. After the calendar hour changes, I want to reset the count of incoming values. I don't see any direct API for 'clearing' the values through Jedis.
SET cannot be used here because it would corrupt the hash. Is there a way to correctly "reset" the count for a given key?
Use the DEL command to delete the key, which will effectively reset the count.
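A small sketch of the reset, written with redis-py method names rather than Jedis and assuming `r` is a redis-py client. The helper names are illustrative only.

```python
def record_value(r, hll_key, value):
    """PFADD: register a value in the HyperLogLog stored at hll_key."""
    r.pfadd(hll_key, value)


def distinct_count(r, hll_key):
    """PFCOUNT: approximate number of distinct values seen (0 if the key is missing)."""
    return r.pfcount(hll_key)


def reset_count(r, hll_key):
    """DEL: remove the key; the next PFADD creates a fresh, empty HyperLogLog."""
    r.delete(hll_key)
```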

Finding Redis data by last update

I'm new to Redis and I want to use the following scheme:
key: EMPLOYEE_*ID*
value: *EMPLOYEE DATA*
I was thinking of adding a timestamp to the end of the key, but I'm not sure if that'll even help. Basically I want to be able to get a list of the employees who are the most stale, i.e. the ones updated least recently. What's the best way to accomplish this in Redis?
Keep another key with the data about employees (key names) and each update's timestamp - the best candidate for that is a Sorted Set. To maintain that key's data integrity, you'll have to update it with the pertinent changes whenever you update one of the employees' keys.
With that data structure in place, you can easily get employees' key names ordered by update time with the ZRANGE command. The lowest scores, i.e. the stalest employees, come first, and reversing the order gives the most recently updated ones.
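As a sketch of that idea, assuming `r` is a redis-py client created with `decode_responses=True`: the index name `employees:updated` and the helper names are hypothetical, and the `EMPLOYEE_<id>` key format follows the scheme in the question.

```python
import time

UPDATED_INDEX = "employees:updated"  # hypothetical name for the sorted-set index


def save_employee(r, employee_id, data, now=None):
    """Write the employee record and stamp its key with the update time."""
    now = time.time() if now is None else now
    key = f"EMPLOYEE_{employee_id}"
    r.set(key, data)
    # ZADD on an existing member simply overwrites its score.
    r.zadd(UPDATED_INDEX, {key: now})
    return key


def stalest(r, count):
    """Key names with the oldest update timestamps (ZRANGE sorts ascending)."""
    if count <= 0:
        return []
    return r.zrange(UPDATED_INDEX, 0, count - 1)


def freshest(r, count):
    """Key names with the newest update timestamps."""
    if count <= 0:
        return []
    return r.zrange(UPDATED_INDEX, 0, count - 1, desc=True)
```

The SET and ZADD here are two separate commands. If the record and the index must never disagree, they could be sent together in a MULTI/EXEC transaction.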
Have you tried filtering by expiration time? You could set the same expiration on all keys and refresh it each time a key is updated. Then, with a Lua script, you could iterate through the keys and filter by remaining time to live. The keys with the smallest remaining time are the ones that have gone longest without an update.
This would work under some assumptions, depending on how your system works. The approach is also O(N) with respect to the number of employees, so although it saves the space of an extra index, it will not scale well as the number of entries and the frequency of scans grow.

Using Redis to get recent visitors at website

I need a list of recent visitors to a website (authorized users who opened a page in the last N minutes). To do this I implemented code that handles all page calls and sends a pair (user_id, timestamp) to storage. I don't want to update a database table each time, so I want to use a cache. I could store a Python dictionary in the cache as a single object, then fetch and update it, but that is not very efficient. I looked at the Redis data structures: a Hash looks good (user_id -> timestamp), but it looks like I can't use Redis efficiently to fetch all uids within a range of timestamps, so I would need to fetch all keys and values, iterate over the keys, and check the related values. It also looks like there is no command in Redis to remove multiple keys from a hash. Is it possible to handle such a data structure using Redis built-in structures? Thanks!
Instead of Hashes, look into using Sorted Sets.
Keep your ids as the set's members and use the timestamp (epoch) as the score. Retrieve recent visitors based on timestamp with Z[REV]RANGEBYSCORE and delete old visitors with ZREMRANGEBYSCORE.
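A possible shape for this, assuming `r` is a redis-py client created with `decode_responses=True`. The key name `visitors:recent` and the helper names are invented for the sketch, and `now` is a parameter only so the functions can be tested with fixed times.

```python
import time

VISITORS_KEY = "visitors:recent"  # hypothetical name for the sorted set


def record_visit(r, user_id, now=None):
    """Store or refresh the user's last-seen timestamp (one member per user)."""
    now = time.time() if now is None else now
    r.zadd(VISITORS_KEY, {str(user_id): now})


def recent_visitors(r, minutes, now=None):
    """User ids seen in the last `minutes` minutes, most recent first."""
    now = time.time() if now is None else now
    cutoff = now - minutes * 60
    # ZREVRANGEBYSCORE takes max before min.
    return r.zrevrangebyscore(VISITORS_KEY, "+inf", cutoff)


def prune_visitors(r, max_age_minutes, now=None):
    """Remove users last seen before the cutoff; returns how many were removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    # A leading "(" makes the bound exclusive, so a visit exactly at the cutoff stays.
    return r.zremrangebyscore(VISITORS_KEY, "-inf", f"({cutoff}")
```

Because each user is a single member, a repeat visit just moves that member's score forward, so the set never grows beyond the number of distinct users between prunes.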