Redis Timeseries purge data by label - redis

I want to cache timeseries data stored in mysql.
For cache stategy I use next logic: split all data to a blocks by one hour, then when user request data I try to range from it from redis, then for all blocks that do not contain in result load a block from mysql db and push it to redis timeseries, if no data exist by this block - push single empty value at the end of block.
Problem: My algorithm implies that if at least one value selected for block therefore all block were chached. The problem occurs when redis reached memory limit and purge data.
Question: Can I specify purge strategy that guarantee if one value of block were purged then all data from block were purged?
I didn't find any solution.

Related

How to know the configured size of the Chronicle Map?

We use Chronicle Map as a persisted storage. As we have new data arriving all the time, we continue to put new data into the map. Thus we cannot predict the correct value for net.openhft.chronicle.map.ChronicleMapBuilder#entries(long). Chronicle 3 will not break when we put more data than expected, but will degrade performance. So we would like to recreate this map with new configuration from time to time.
Now it the real question: given a Chronicle Map file, how can we know which configuration was used for that file? Then we can compare it with actual amount of data (source of this knowledge is irrelevant here) and recreate a map if needed.
entries() is a high-level config, that is not stored internally. What is stored internally is the number of segments, expected number of entries per segment, and the number of "chunks" allocated in the segment's entry space. They are configured via ChronicleMapBuilder.actualSegments(), entriesPerSegment() and actualChunksPerSegmentTier() respectively. However, there is no way at the moment to query the last two numbers from the created ChronicleMap, so it doesn't help much. (You can query the number of segments via ChronicleMap.segments().)
You can contribute to Chronicle-Map by adding getters to ChronicleMap to expose those configurations. Or, you need to store the number of entries separately, e. g. in a file along with the ChronicleMap persisted file.

Best practice to store List<T> in StackExchange.Redis

I am trying to find best practice(efficient) way of storing set of List objects against ReportingDate key.
List could be serailised as Xml/DataContract or ProtoBuf....
And given some of the data could be big (for that slice of key):
I was wondering if there is any of getting data from redis cache in IEnum/streamed fashion? Atm we using ProtoBuf.NET to have file based cache. And we retrieve data into mem in streamed fashion (we also have an option of selecting what props/fields we want in that T object as ProtoBuf allows us to do it)
Is there any way can force (after some inactivity) certain part of the data to be offloaded from mem and back into file if it is not being used. But load it up again if it is called
Tnx
It sounds like you want a sorted set - see https://redis.io/topics/data-types#sorted-sets. You would use the date as the value, perhaps in epoch time (since it needs to be a number). SE.Redis supports all the operations you would expect to get ranges of values (either positional ranges - the first 20 records, etc; or absolute ranges bases on the value - all items between two dates expressed in the same unit). Look at the methods starting " SortedSet...".
The value can be binary, so protobuf-net is fine (you would serialize the value for each date separately). Just pass a byte[] as the value. You need to handle serialization separately to the redis library.
As for swapping data out: no. Redis has date-based expiration, but doesn't have hot and cold storage. It is either there, or it isn't. You could use scheduled tasks to purge or move data based on date ranges, again using any of the Z* (redis) or SortedSet* (SE.Redis) methods.
For the complete list of Z* operations, see: https://redis.io/commands#sorted_set. They should all be available in SE.Redis.

How to synchronise multiple writer on redis?

I have multiple writers overwriting the same key in redis. How do I guarantee that only the chosen one write last?
Can I perform write synchronisation in Redis withour synchronise the writers first?
Background:
In my system a unique dispatcher send works to do to various workers. Each worker then write the result in Redis overwrite the same key. I need to be sure that only the last worker that receive work from the dispatcher writes in Redis. 
Use an ordered set (ZSET): add your entry with a score equal to the unix timestamp, then delete all but the top rank.
A Redis Ordered set is a set, where each entry also has a score. The set is ordered according to the score, and the position of an element in the ordered set is called Rank.
In order:
Remove all the entries with score equal or less then the one you are adding(zremrangebyscore). Since you are adding to a set, in case your value is duplicate your new entry would be ignored, you want instead to keep the entry with highest rank. 
Add your value to the zset (zadd)
delete by rank all the entries but the one with HIGHEST rank (zremrangebyrank)
You should do it inside a transaction (pipeline)
Example in python:
# timestamp contains the time when the dispatcher sent a message to this worker
key = "key_zset:%s"%id
pipeline = self._redis_connection.db.pipeline(transaction=True)
pipeline.zremrangebyscore(key, 0, t)  # Avoid duplicate Scores and identical data
pipeline.zadd(key, t, "value")
pipeline.zremrangebyrank(key, 0, -2)
pipeline.execute(raise_on_error=True)
If I were you, I would use redlock.
Before you write to that key, you acquire the lock for it, then update it and then release the lock.
I use Node.js so it would look something like this, not actually correct code but you get the idea.
Promise.all(startPromises)
.bind(this)
.then(acquireLock)
.then(withLock)
.then(releaseLock)
.catch(handleErr)
function acquireLock(key) {
return redis.rl.lock(`locks:${key}`, 3000)
}
function withLock(lock) {
this.lock = lock
// do stuff here after get the lock
}
function releaseLock() {
this.lock.unlock()
}
You can use redis pipeline with Transaction.
Redis is single threaded server. Server will execute commands syncronously. When Pipeline with transaction is used, server will execute all commands in pipeline atomically.
Transactions
MULTI, EXEC, DISCARD and WATCH are the foundation of transactions in Redis. They allow the execution of a group of commands in a single step, with two important guarantees:
All the commands in a transaction are serialized and executed sequentially. It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction. This guarantees that the commands are executed as a single isolated operation.
A simple example in python
with redis_client.pipeline(transaction=True) as pipe:
val = int(pipe.get("mykey"))
val = val*val%10
pipe.set("mykey",val)
pipe.execute()

What Redis data type fit the most for following example

I have following scenario:
Fetch array of numbers (from REDIS) conditionally
For each number do some async stuff (fetch something from DB based on number)
For each thing in result set from DB do another async stuff
Periodically repeat 1. 2. 3. because new numbers will be constantly added to REDIS structure.Those numbers represent unix timestamp in milliseconds so out of the box those numbers will always be sorted in time of addition
Conditionally means fetch those unix timestamp from REDIS that are less or equal to current unix timestamp in milliseconds(Date.now())
Question is what REDIS data type fit the most for this use case having in mind that this code will be scaled up to N instances, so N instances will share access to single REDIS instance. To equally share the load each instance will read for example first(oldest) 5 numbers from REDIS. Numbers are unique (adding same number should fail silently) so REDIS SET seems like a good choice but reading M first elements from REDIS set seems impossible.
To prevent two different instance of the code to read same numbers REDIS read operation should be atomic, it should read the numbers and delete them. If any async operation fail on specific number (steps 2. and 3.), numbers should be added again to REDIS to be handled again. They should be re-added back to the head not to the end to be handled again as soon as possible. As far as i know SADD would push it to the tail.
SMEMBERS key would read everything, it looks like a hammer to me. I would need to include some application logic to get first five than to check what is less or equal to Date.now() and then to delete those and to wrap somehow everything in single transaction. Besides that set cardinality can be huge.
SSCAN sounds interesting but i don't have any clue how it works in "scaled" environment like described above. Besides that, per REDIS docs: The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. Like described above collection will be changed frequently
A more appropriate data structure would be the Sorted Set - members have a float score that is very suitable for storing a timestamp and you can perform range searches (i.e. anything less or equal a given value).
The relevant starting points are the ZADD, ZRANGEBYSCORE and ZREMRANGEBYSCORE commands.
To ensure the atomicity when reading and removing members, you can choose between the the following options: Redis transactions, Redis Lua script and in the next version (v4) a Redis module.
Transactions
Using transactions simply means doing the following code running on your instances:
MULTI
ZRANGEBYSCORE <keyname> -inf <now-timestamp>
ZREMRANGEBYSCORE <keyname> -inf <now-timestamp>
EXEC
Where <keyname> is your key's name and <now-timestamp> is the current time.
Lua script
A Lua script can be cached and runs embedded in the server, so in some cases it is a preferable approach. It is definitely the best approach for short snippets of atomic logic if you need flow control (remember that a MULTI transaction returns the values only after execution). Such a script would look as follows:
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
To run this, first cache it using SCRIPT LOAD and then call it with EVALSHA like so:
EVALSHA <script-sha> 1 <key-name> <now-timestamp>
Where <script-sha> is the sha1 of the script returned by SCRIPT LOAD.
Redis modules
In the near future, once v4 is GA you'll be able to write and use modules. Once this becomes a reality, you'll be able to use this module we've made that provides the ZPOP command and could be extended to cover this use case as well.

Target Based commit point while updating into table

One of my mappings is running for a really long time (2 hours).From the session log i can see the statment "Time out based commit poin" which is tking most of the time and Busy percentage for the SQL tranfsormation is very high(which is taking time,I ran the SQL query manually in DB,its working fine ).So, basically there is a router which splits the record between insert and update.And the update stream is taking long.It has a SQL transforamtion,Update statrtergy and aggregator.I added an sorter before aggregator but no luck.
Also changed comit interval ,Lins Sequential Buffer lenght and Maximum memory allowed by checking some of the other blogs.Could you please help me with this.
If possible try to avoid the transformations which are creating cache because in future if the input records increase. Cache size will also increase and decrease the throughput
1) Aggregator : Try to use the Aggregation in SQL override itself
2) Sorter : Try to do the same in the SQL Override itself
Generally SQL transformation is slow for huge data loads, because for each input record an SQL session is invoked and a connection is established to database and the row is fetched. Say for example there are 1 million records, 1 million SQL sessions are initiated in the backend and the database is called.
What the SQL transformation doing ? Is it just generating a Surrogate key or its fetching a value from a table based on derived value from the stream
For fetching a value from a table based on derived value from the stream:
Try to use lookup
For generating Surrogate key, Use Oracle Sequence instead
Let me know if its purpose is any thing other than that
Also do the below checks
Sort the session log on thread and just make a note of start and end times of
the following
1) lookup caches creation (time between Query issued --> First row returned --> Cache creation completed)
2) Reader thread first row return time
Regards,
Raj