How to set TTL on Rocks DB properly?

How to set TTL on Rocks DB properly? - ttl

I am trying to use Rocks DB with TTL. The way I initialise rocks db is as below:
options.setCreateIfMissing(true).setWriteBufferSize(8 * SizeUnit.KB)
.setMaxWriteBufferNumber(3) .setCompressionType(CompressionType.LZ4_COMPRESSION).setKeepLogFileNum(1);
db = TtlDB.open(options, this.dbpath, 10, false);
I have set TTL to 10 seconds. But, the key value pairs are not being deleted after 10 seconds. Whats happening here?

That's by design:
This API should be used to open the db when key-values inserted are meant to be removed from the db in a non-strict 'ttl' amount of time therefore, this guarantees that key-values inserted will remain in the db for at least ttl amount of time and the db will make efforts to remove the key-values as soon as possible after ttl seconds of their insertion
-- from the RocksDB Wiki-page on TTL.
That means values are only removed during compaction, and staleness is not checked during reads.
One of the good things about RocksDB is that their source is quite readable. The files you would want to look at are the header and source for TtlDb. In the header you will find the compaction which removes stale values (the compaction's Filter-contract is documented well in its header). In the TtlDb source you verify for yourself that Get does not do any checks whether or not the value is stale. It just strips the timestamp (which just gets appended to the value on insert).

Related

Redis Snapshot-Configuration: How are multiple changes on the same key counted?

in Redis you can configure the creation of snapshots, e.g. "save 60 10" would save the database after 60 seconds if at least 10 keys were changed.
If the SAME key was changed 10 times, would a snapshot be saved? Or does this refer to 10 unique/different keys that have to be changed?
Thank you!

The documented config doesn't say anything about "if at least 10 keys were changed". It says the snapshot will happen if "the given number of write operations against the DB occurred". Simple commands like SET and DEL count as one write operation. More complicated commands like HMSET and ZINTERSTORE might count as more than one write operation depending on the number of values they affect. Nothing takes into account the number of unique keys that were written to since the last snapshot.

REDIS usecase using large keys with small values

I have a use-case for using redis that is a little bit different.
In my MySQL I have an entity, let's call it HumanEntity. this HumanEntity has many to many relations.
HumanEntity.Urls - Many URLs per HumanEntity.
HumanEntity.UserNames - Many UserNames per HumanEntity.
HumanEntity.Phones ...
HumanEntity.Emails ...
in a normal one hour, the application creates hundreds of these many values.
The use-case is that, the application receives an HTTP call (100 per one second) with a HumanEntity value (Url or UserName or Phone or Email).
I need to scan my MySQL (1,000,000 records) and return back the HumanEntity.Id(integer) .
Since its ok to have some latency in the data integrity I thought about REDIS.
Can I store the values as a Redis key and the and the HumanEntity.Id(integer) as the value.
My API needs to return back the HumanEntity.Id(integer).
does it make sense to have such long key and such short value? The URL, for example, maybe 1500 bytes and the value can be 1 byte.
What is the best redis method to implement that?
Thanks

If the values are not unique then you may have some problem. Phones, emails or usernames maybe unique for user but i am not sure about url or any other property stored in your database. You may overwrite the value of an identifier with another user's.
If you don't have any problem like that; you may proceed with string types, Time complexity of GET and SET is O(1) - that's the best you may get.
In some cases such as checking whether the user used any coupon, you may use long(let's say 64 chars) user id as key, and 1 as value and use EXISTS to determine it. So it's valid to use long key and short value.

Redis: clean up members of ZSET

I'm currently studying Redis, and have the following case:
So what I have is a sorted set by google place id, for which all posts are sorted from recent to older.
The first page that is requested fetches posts < current timestamp.
When a cursor is sent to the backend, this cursor is a simple timestamp that indicates from where to fetch the next posts from the ZSET.
The query to retrieve posts by place id would be:
ZREVRANGEBYSCORE <gplaceId> <cur_timestamp> -INF WITHSCORES LIMIT <offset:timestamp as from where to fetch> <count:number of posts>
My question is what is the recommended way to clean up members of the ZSET.
Since I want to use Redis as a cache, I would prefer to limit the number of posts per place for example up until 50. When places get new posts when already 50 posts have been added to the set, I want to drop the last post from the set.
Of course I realise that I could make a manual check on every insert and perform operations as such, but I wonder if Redis has a recommended way of performing this cleanup. Alternatively I could create a scheduler for this, but I would prefer not having to do this.

Unfortunately Redis sorted set do not come with an out of the box feature for this. If sorted sets allowed a max size attribute with a configurable eviction strategy - you could have avoided some extra work.
See this related question:
How to specify Redis Sorted Set a fixed size?
In absence of such a feature, the two approaches you mentioned are right.
You can replace inserts with a transaction : insert, check size, delete if > 50
A thread that checks size of the set and trims it periodically

What Redis data type fit the most for following example

I have following scenario:
Fetch array of numbers (from REDIS) conditionally
For each number do some async stuff (fetch something from DB based on number)
For each thing in result set from DB do another async stuff
Periodically repeat 1. 2. 3. because new numbers will be constantly added to REDIS structure.Those numbers represent unix timestamp in milliseconds so out of the box those numbers will always be sorted in time of addition
Conditionally means fetch those unix timestamp from REDIS that are less or equal to current unix timestamp in milliseconds(Date.now())
Question is what REDIS data type fit the most for this use case having in mind that this code will be scaled up to N instances, so N instances will share access to single REDIS instance. To equally share the load each instance will read for example first(oldest) 5 numbers from REDIS. Numbers are unique (adding same number should fail silently) so REDIS SET seems like a good choice but reading M first elements from REDIS set seems impossible.
To prevent two different instance of the code to read same numbers REDIS read operation should be atomic, it should read the numbers and delete them. If any async operation fail on specific number (steps 2. and 3.), numbers should be added again to REDIS to be handled again. They should be re-added back to the head not to the end to be handled again as soon as possible. As far as i know SADD would push it to the tail.
SMEMBERS key would read everything, it looks like a hammer to me. I would need to include some application logic to get first five than to check what is less or equal to Date.now() and then to delete those and to wrap somehow everything in single transaction. Besides that set cardinality can be huge.
SSCAN sounds interesting but i don't have any clue how it works in "scaled" environment like described above. Besides that, per REDIS docs: The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. Like described above collection will be changed frequently

A more appropriate data structure would be the Sorted Set - members have a float score that is very suitable for storing a timestamp and you can perform range searches (i.e. anything less or equal a given value).
The relevant starting points are the ZADD, ZRANGEBYSCORE and ZREMRANGEBYSCORE commands.
To ensure the atomicity when reading and removing members, you can choose between the the following options: Redis transactions, Redis Lua script and in the next version (v4) a Redis module.
Transactions
Using transactions simply means doing the following code running on your instances:
MULTI
ZRANGEBYSCORE <keyname> -inf <now-timestamp>
ZREMRANGEBYSCORE <keyname> -inf <now-timestamp>
EXEC
Where <keyname> is your key's name and <now-timestamp> is the current time.
Lua script
A Lua script can be cached and runs embedded in the server, so in some cases it is a preferable approach. It is definitely the best approach for short snippets of atomic logic if you need flow control (remember that a MULTI transaction returns the values only after execution). Such a script would look as follows:
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
To run this, first cache it using SCRIPT LOAD and then call it with EVALSHA like so:
EVALSHA <script-sha> 1 <key-name> <now-timestamp>
Where <script-sha> is the sha1 of the script returned by SCRIPT LOAD.
Redis modules
In the near future, once v4 is GA you'll be able to write and use modules. Once this becomes a reality, you'll be able to use this module we've made that provides the ZPOP command and could be extended to cover this use case as well.

how to get next 1000 records the fastest way

I'm using Azure Table Storage.
Let's say i have a Partition in my Table with 10,000 records, and I would like to get records number 1000 to 1999. And next time i would like to get records number 4000 to 4999 etc.
What is the fastest way of doing that?
All I can find till now are two options, which I don't like very much:
1. run a query which returns all 10,000 records, and filter out what I want when I get all 10,000 records.
2. Run a query whichs returns 1000 records at a time, and use a continuation token to get the next 1000 records.
Is it possible to get a continuation token without downloading all corresponding records? It would be great if i can get Continuation Token 1, than get Continuation token 2, and with CT2 get records 2000 to 2999.

Theoretically you should be able to use continuation tokens without downloading the actual data for the first 1000 recors by closing the connection you have after the first request. And I mean closing it at TCP level. And before you read all data. Then open a new connection and use continuation token there. Two WebRequests will not do it since the HTTP implementation will likely use keep alive wchich means all your data is going to be read in the background even though you don't read it in your code. Actually you can configure your HTTP requests to not use keep alive.
However, another way is naturally if you know the RowKey and can search on that but I assume you don't know which row keys will be in each 1000 entity batch.
Last I would ask why you have this problem in the first place. And what your access pattern is. If inserts are common and getting these records is rare I wouldn't bother making it more efficient. if this is like a paging problem i would probably get all data on the first request and cache it (in the cloud). if inserts are rare but you need to run this query often I would consider making the insertion of data have one partion for every 1000 entities and rebalance as needed (due to sorting) as entities are inserted.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas