I'm using the Redis implementation of HyperLogLog to count distinct values for given keys.
The keys are based on an hourly window. After the calendar hour changes, I want to reset the count of incoming values, but I don't see any direct API for 'clearing' the values through Jedis.
SET cannot be used here because it would corrupt the stored HyperLogLog. Is there a way to correctly "reset" the count for a given key?
Use the DEL command to delete the key; that effectively resets the count, and the next PFADD will recreate the key as an empty HyperLogLog.
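A minimal sketch of that reset using redis-py (in Jedis the equivalent call is jedis.del(key)); the key name hll:visitors:2024-01-01-13 is purely illustrative:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Hypothetical hourly HyperLogLog key -- the name is just for illustration.
key = "hll:visitors:2024-01-01-13"

# Count distinct values as they arrive.
r.pfadd(key, "user:1", "user:2", "user:1")
print(r.pfcount(key))  # -> 2

# When the hour rolls over, DEL the key to reset the count;
# the next PFADD recreates it as an empty HyperLogLog.
r.delete(key)
print(r.pfcount(key))  # -> 0
```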
Related
I have incoming data which I have to aggregate for some time, and when the key expires I need to process the data.
I have tried using Redis keyspace notifications, but the expired event only gives me the key name, not the value.
Is there a better way to handle this scenario?
Instead of setting an expiry, aggregate the data into a list or set, depending on your use case, and put a timestamp in the key itself. For example, if you want to aggregate data for one hour, your keys can be mydata:2018-26-06-1300, mydata:2018-26-06-1400, mydata:2018-26-06-1500, and so on.
Then you simply run a cron job every hour, read all the values from that hour's key, and delete the key when you are done.
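A rough sketch of that pattern with redis-py; the mydata prefix comes from the answer above, while the list encoding and the process_previous_hour helper are assumptions:

```python
import time
import redis

r = redis.Redis(decode_responses=True)

def hourly_key(prefix="mydata"):
    # One key per calendar hour, e.g. mydata:2018-06-26-13
    return f"{prefix}:{time.strftime('%Y-%m-%d-%H')}"

def record(value):
    # Writers append incoming data to the current hour's list.
    r.rpush(hourly_key(), value)

def process_previous_hour():
    # Run this from an hourly cron job.
    prev = time.strftime("%Y-%m-%d-%H", time.localtime(time.time() - 3600))
    key = f"mydata:{prev}"
    values = r.lrange(key, 0, -1)   # everything aggregated in that hour
    # ... process values here ...
    r.delete(key)                   # clean up once processed
    return values
```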
I'm new to Redis and I want to use the following scheme:
key: EMPLOYEE_*ID*
value: *EMPLOYEE DATA*
I was thinking of adding a timestamp to the end of the key, but I'm not sure that would even help. Basically I want to be able to get a list of the most stale employees, i.e. those that have gone the longest without being updated. What's the best way to accomplish this in Redis?
Keep another key with data about the employees (their key names) and their update timestamps - the best candidate for that is a Sorted Set. To maintain that key's data integrity, you'll have to update it whenever you update one of the employees' keys.
With that data structure in place, you can easily get the key names of the stalest employees with the ZRANGE command (or the most recently updated ones with ZREVRANGE).
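Something along those lines, sketched with redis-py; the index key name employees:updated and the hash layout of the employee data are assumptions:

```python
import time
import redis

r = redis.Redis(decode_responses=True)

def update_employee(emp_id, data):
    # Write the employee data and record the update time in one pipeline,
    # so the index stays consistent with the data key.
    pipe = r.pipeline()
    pipe.hset(f"EMPLOYEE_{emp_id}", mapping=data)
    pipe.zadd("employees:updated", {f"EMPLOYEE_{emp_id}": time.time()})
    pipe.execute()

def stalest_employees(count=10):
    # Lowest scores = oldest update timestamps = most stale.
    return r.zrange("employees:updated", 0, count - 1, withscores=True)
```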
Have you tried filtering by expiration time? You could set the same expiration on all keys and refresh the expiration each time a key is updated. Then, with a Lua script, you could iterate through the keys and filter by remaining expiration time: those with a smaller remaining TTL are the ones that have not been updated recently.
This works only under certain assumptions and depends on how your system behaves. The approach is also O(N) with respect to the number of employees, so while it saves space, it will not scale well with the number of entries or the frequency of scans.
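For illustration, here is a client-side Python approximation of that idea using SCAN and TTL instead of a server-side Lua script; the EMPLOYEE_* key pattern and the one-day expiry are assumptions:

```python
import redis

r = redis.Redis(decode_responses=True)
EXPIRY = 24 * 3600  # the same TTL is applied on every update

def touch_employee(emp_id, data):
    r.hset(f"EMPLOYEE_{emp_id}", mapping=data)
    r.expire(f"EMPLOYEE_{emp_id}", EXPIRY)  # refresh the TTL on every update

def stale_employees(older_than=3600):
    # O(N): scan all employee keys and keep those whose remaining TTL
    # shows they were last updated more than `older_than` seconds ago.
    stale = []
    for key in r.scan_iter(match="EMPLOYEE_*"):
        ttl = r.ttl(key)
        if 0 <= ttl < EXPIRY - older_than:
            stale.append(key)
    return stale
```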
I don't want to use the KEYS * command because it is O(N).
Is it possible to keep track of the newest objects in Redis?
Not using KEYS is definitely the way to go. Use a Sorted Set to store key names when you create them, and set the score to the creation time. You can then fetch key names by their creation time with ZRANGEBYSCORE, and don't forget to trim the older entries from it using ZREMRANGEBYSCORE.
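A small sketch of that bookkeeping with redis-py, assuming an index Sorted Set named objects:by-creation:

```python
import time
import redis

r = redis.Redis(decode_responses=True)
INDEX = "objects:by-creation"   # assumed name for the index Sorted Set

def create_object(key, value):
    pipe = r.pipeline()
    pipe.set(key, value)
    pipe.zadd(INDEX, {key: time.time()})   # score = creation time
    pipe.execute()

def newest_keys(seconds=600):
    # Key names of objects created within the last `seconds` seconds.
    return r.zrangebyscore(INDEX, time.time() - seconds, "+inf")

def trim_older_than(seconds=3600):
    # Drop index entries for objects older than `seconds`.
    return r.zremrangebyscore(INDEX, "-inf", time.time() - seconds)
```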
I need to have a list of recent visitors to a website (authorized users that opened a page in the last N minutes). To do that, I implemented code that handles all page calls and sends a pair (user_id, timestamp) to the storage. I don't want to update a database table on each call, so I want to use a cache for it. I can store a Python dictionary in the cache as one object and fetch and update it, but that is not very efficient.

I tried looking at the Redis data structures. A Hash looks good at first (user_id -> timestamp), but it seems I can't use Redis efficiently to fetch all uids within a range of timestamps: I would have to fetch all keys and values, iterate over the keys, and check the related values. It also looks like there is no Redis command to remove multiple keys from a hash at once. Is it possible to handle such a data structure using Redis built-in structures? Thanks!
Instead of Hashes, look into using Sorted Sets.
Keep your user ids as the set's members and use the timestamp (epoch) as the score. Retrieve recent visitors by timestamp with Z[REV]RANGEBYSCORE and delete old visitors with ZREMRANGEBYSCORE.
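A possible sketch with redis-py, assuming a single Sorted Set named visitors:recent:

```python
import time
import redis

r = redis.Redis(decode_responses=True)
KEY = "visitors:recent"   # assumed key name

def record_visit(user_id):
    # One member per user; a repeat visit simply bumps the score.
    r.zadd(KEY, {str(user_id): time.time()})

def recent_visitors(minutes=15):
    cutoff = time.time() - minutes * 60
    return r.zrangebyscore(KEY, cutoff, "+inf")

def purge_old(minutes=15):
    cutoff = time.time() - minutes * 60
    return r.zremrangebyscore(KEY, "-inf", cutoff)
```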
I need to insert 6925144 unique keys into Redis, where each key holds a hash of data.
I use the Ruby script that was published on the main page.
The whole insertion takes ~3 min. DBSIZE after the insertion is 1277553, but I expected it to be 6925144.
I am not sure whether Redis is missing some records; maybe DBSIZE is calculated differently for hashes, or maybe 1277553 is just some natural limit.
What is the best and easiest way to check the consistency of the insertion?
If you are sure about the number of records there should be, the difference in DBSIZE is most likely real.
Keep an eye on the logs when doing the insert.
Do you have duplicates in the dataset? Do you need a merge function?
Does the ID get correctly encoded into a unique string or 32/64-bit int?
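One way to check, sketched in Python rather than the original Ruby (the tab-separated key/field/value file format is an assumption): count the distinct keys in the source file while loading and compare that number with DBSIZE afterwards. If the distinct count is below 6925144, the source contains duplicate keys and later HSETs simply merged into existing hashes rather than creating new keys.

```python
import redis

r = redis.Redis(decode_responses=True)

def load_and_verify(path):
    seen = set()                         # distinct key names in the source file
    pipe = r.pipeline(transaction=False)
    with open(path) as f:
        for i, line in enumerate(f, 1):
            key, field, value = line.rstrip("\n").split("\t")  # assumed format
            seen.add(key)
            pipe.hset(key, field, value)
            if i % 10_000 == 0:          # flush in batches to limit memory use
                pipe.execute()
    pipe.execute()
    print(f"distinct keys in file: {len(seen)}")
    print(f"DBSIZE after load:     {r.dbsize()}")
```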