I have a pipeline like this:
env.addSource(kafkaConsumer)
.keyBy { value -> value.f0 }
.window(EventTimeSessionWindows.withGap(Time.minutes(2)))
.reduce(::reduceRecord)
.addSink(kafkaProducer)
I want to expire keyed data with a TTL.
Some blog posts point out that I need a ValueStateDescriptor for that.
I made one like this:
val desc = ValueStateDescriptor("val state", MyKey::class.java)
desc.enableTimeToLive(ttlConfig)
But how do I apply this descriptor to my pipeline so that it actually performs the TTL expiry?
The pipeline you've described doesn't use any keyed state that would benefit from setting state TTL. The only keyed state in your pipeline is the contents of the session windows, and that state is being purged as soon as possible -- as the sessions close. (Furthermore, since you are using a reduce function, that state consists of just one value per key.)
For the most part, expiring state is only relevant for state you explicitly create, in which case you will have ready access to the state descriptor and can configure it to use State TTL. Flink SQL does create state on your behalf that might not automatically expire, in which case you will need to use Idle State Retention Time to configure it. The CEP library also creates state on your behalf, and in this case you should ensure that your patterns either eventually match or timeout.
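If you do later add an operator with keyed state you create yourself (for example a KeyedProcessFunction), that is where the descriptor would be applied: configure the TTL on the descriptor in open() and obtain the state handle from the runtime context. A rough sketch using the PyFlink API, which mirrors the Kotlin/Java builder; the string-typed state and the two-minute TTL are illustrative assumptions, not taken from your job:

from pyflink.common.time import Time
from pyflink.common.typeinfo import Types
from pyflink.datastream.functions import KeyedProcessFunction
from pyflink.datastream.state import StateTtlConfig, ValueStateDescriptor

class MyStatefulFunction(KeyedProcessFunction):

    def open(self, runtime_context):
        # TTL is configured on the descriptor; the descriptor is then used
        # to obtain the per-key state handle from the runtime context.
        ttl_config = StateTtlConfig \
            .new_builder(Time.minutes(2)) \
            .set_update_type(StateTtlConfig.UpdateType.OnCreateAndWrite) \
            .set_state_visibility(StateTtlConfig.StateVisibility.NeverReturnExpired) \
            .build()
        descriptor = ValueStateDescriptor("val state", Types.STRING())
        descriptor.enable_time_to_live(ttl_config)
        self.val_state = runtime_context.get_state(descriptor)

    def process_element(self, value, ctx):
        # Entries older than the TTL read back as None and are eventually purged.
        previous = self.val_state.value()
        self.val_state.update(value)
        yield value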
Related
I am playing with Redis Streams and it is good so far.
I am trying to understand if there is any way for me to expire old events based on time or some other criterion.
I know that we can remove events by event ID. But I do not want to remember / store the event IDs, which is difficult. Instead I am looking for a way to trim old events, e.g. keep only the last 10K events or something like that.
This is possible as of Redis 6.2.
If you use the default event IDs (by passing * as the ID to XADD), they will begin with the UNIX timestamp in milliseconds of when the event was inserted, followed by a dash.
Then you can use XTRIM $stream_name MINID $timestamp to remove all events with an ID lower than '$timestamp', which is equivalent to all events older than the timestamp.
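From a client it looks roughly like this (a sketch with redis-py, assuming a recent version where xtrim accepts a minid argument; the stream name my_stream and the 24-hour retention are just placeholders):

import time

import redis

r = redis.Redis()

# Auto-generated IDs start with the insertion time in milliseconds, so a
# MINID trim at "now minus 24 hours" removes everything older than that.
cutoff_ms = int((time.time() - 24 * 60 * 60) * 1000)
r.xtrim("my_stream", minid=cutoff_ms)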
So far, there's no way to expire events by time. Instead, the only expire strategy is to expire events by keeping the latest N events. You can use the XTRIM command to evict old events.
Should I do that every time? Can the stream be configured to retain the last N events?
If you want to always keep the latest N events, you can call the XADD command with the MAXLEN option to get a capped stream. With the ~ option you also get better performance, at the cost of trimming being approximate rather than exact. Check the docs for details.
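For example, with redis-py (a sketch; the stream name and the 10 000-entry cap are placeholders):

import redis

r = redis.Redis()

# MAXLEN keeps the stream capped at roughly 10 000 entries.
# approximate=True corresponds to the ~ option: trimming happens in whole
# macro-nodes, which is cheaper but may briefly leave a few extra entries.
r.xadd("my_stream", {"event": "payload"}, maxlen=10000, approximate=True)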
UPDATE
Since Redis 6.2, XTRIM supports a new trimming strategy: MINID. With this strategy, Redis will evict entries whose ids are lower than the given threshold.
So if you use a timestamp as the entry ID (the default auto-generated IDs use the Unix timestamp in milliseconds as their first part), you can use this strategy to expire events based on time, i.e. remove events older than a given timestamp.
I have a use case where I'm streaming and processing live data into an Elasticache Redis cluster. In essence, I want to kick off an event when all events of a certain type have completed (i.e. the size of a value is no longer growing over the course of 60 seconds).
For example:
foo [event1]
foo [event1, event2]
foo [event1, event2]
foo [event1, event2] -> triggers some event if this key/value is constant for 60 seconds.
Is this at all possible?
I would suggest that, as part of every command that changes the value, you also set a marker key with a 60-second TTL. You can then subscribe to the expiration of that key using Redis keyspace notifications.
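A rough sketch of that idea with redis-py (the key names foo and quiet:foo, database 0, and storing the events in a list are all assumptions; it also requires notify-keyspace-events "Ex" to be enabled in redis.conf):

import redis

r = redis.Redis()

# Writer side: every change to "foo" also refreshes a marker key with a
# 60-second TTL, so the marker only expires once "foo" stops changing.
def record_event(event):
    r.rpush("foo", event)
    r.set("quiet:foo", 1, ex=60)

# Listener side: react to the marker key expiring.
p = r.pubsub()
p.subscribe("__keyevent@0__:expired")
for msg in p.listen():
    if msg["type"] == "message" and msg["data"] == b"quiet:foo":
        print("foo has been stable for 60 seconds")  # kick off your event here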
I am currently working on a project with Infinispan 8.1.3. I want to make sure that the node that created an entry remains the owner of that entry at all times in distribution mode. Is there any option to meet this requirement? I heard about the LOCAL_MODE flag, but it only stores the entry locally. I don't know whether, if that node goes down, the local cache entry will be shared with another node. Thanks.
Don't use flags unless you know exactly what you're doing. Flag.CACHE_MODE_LOCAL means that you won't execute any RPC when doing that operation, but if the key does not route to this node, a write will be a no-op and a read will return null.
It's not possible to tie the entry to the node exclusively - what would you do if this node crashes?
However, if the cluster is stable enough, there's the Key Affinity Service that will give you a key that belongs to this node. See next chapter about grouping, too, it might fit your use case.
EDIT: Instead of moving data to the executing node, you can move the execution towards the data. With the Grouping API you can find the node that owns the group's data and run the task there:
// Find the primary owner of the group's data
Address owningNode = cache.getAdvancedCache().getDistributionManager()
        .getCacheTopology().getDistributionInfo(group).primary();
// Run the task only on that node
ClusterExecutor executor = cache.getCacheManager().executor()
        .filterTargets(Collections.singleton(owningNode));
executor.submit(...)
I have multiple writers overwriting the same key in Redis. How do I guarantee that only the chosen one writes last?
Can I perform write synchronisation in Redis without synchronising the writers first?
Background:
In my system a unique dispatcher sends work to various workers. Each worker then writes its result to Redis, overwriting the same key. I need to be sure that only the last worker that received work from the dispatcher ends up writing to Redis.
Use an ordered set (ZSET): add your entry with a score equal to the Unix timestamp, then delete all but the top-ranked entry.
A Redis Ordered set is a set, where each entry also has a score. The set is ordered according to the score, and the position of an element in the ordered set is called Rank.
In order:
1. Remove all the entries with a score equal to or lower than the one you are adding (ZREMRANGEBYSCORE). Since you are adding to a set, a duplicate value would otherwise be ignored, and what you want instead is to keep the entry with the highest rank.
2. Add your value to the zset (ZADD).
3. Delete by rank all the entries except the one with the HIGHEST rank (ZREMRANGEBYRANK).
You should do it inside a transaction (pipeline)
Example in python:
# timestamp: when the dispatcher sent this piece of work to this worker
key = "key_zset:%s" % id
pipeline = self._redis_connection.db.pipeline(transaction=True)
# Drop equal or older scores first, so a re-added duplicate value keeps the newer score
pipeline.zremrangebyscore(key, 0, timestamp)
pipeline.zadd(key, {"value": timestamp})  # redis-py 3+ takes a {member: score} mapping
pipeline.zremrangebyrank(key, 0, -2)      # keep only the highest-ranked entry
pipeline.execute(raise_on_error=True)
If I were you, I would use redlock.
Before you write to that key, you acquire the lock for it, then update it and then release the lock.
I use Node.js, so it would look something like this; it's not actually correct code, but you get the idea.
Promise.all(startPromises)
.bind(this)
.then(acquireLock)
.then(withLock)
.then(releaseLock)
.catch(handleErr)
function acquireLock(key) {
return redis.rl.lock(`locks:${key}`, 3000)
}
function withLock(lock) {
this.lock = lock
// do stuff here after get the lock
}
function releaseLock() {
this.lock.unlock()
}
You can use a Redis pipeline with a transaction.
Redis is a single-threaded server and executes commands sequentially. When a pipeline with a transaction is used, the server executes all the commands in the pipeline atomically.
Transactions
MULTI, EXEC, DISCARD and WATCH are the foundation of transactions in Redis. They allow the execution of a group of commands in a single step, with two important guarantees:
All the commands in a transaction are serialized and executed sequentially. It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction. This guarantees that the commands are executed as a single isolated operation.
A simple read-modify-write example in Python (redis-py), using WATCH so the read and the write stay consistent:
with redis_client.pipeline(transaction=True) as pipe:
    pipe.watch("mykey")          # immediate mode: this read returns the real value
    val = int(pipe.get("mykey"))
    val = val * val % 10
    pipe.multi()                 # start buffering the transactional part
    pipe.set("mykey", val)
    pipe.execute()               # raises WatchError if "mykey" changed in the meantime
127.0.0.1:6379> keys *
1) "trending_showrooms"
2) "trending_hashtags"
3) "trending_mints"
127.0.0.1:6379> sort trending_mints by *->id DESC LIMIT 0 12
1) "mint_14216"
2) "mint_14159"
3) "mint_14158"
4) "mint_14153"
5) "mint_14151"
6) "mint_14146"
The keys have expired, but they are still members of the set. I need the expired keys to be removed from the set automatically in Redis.
You can't set a TTL on individual members within the SET.
This blog post dives a bit deeper into the issue and provides a few workarounds.
https://quickleft.com/blog/how-to-create-and-expire-list-items-in-redis/
Hope that helps.
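For what it's worth, one common workaround (a sketch of the general idea, not necessarily what the linked post does, and the helper names are made up) is to replace the plain set with a sorted set whose scores are expiry timestamps, purging lazily on read:

import time

import redis

r = redis.Redis()

def add_trending(member, ttl_seconds):
    # Score each member with its absolute expiry time.
    r.zadd("trending_mints", {member: time.time() + ttl_seconds})

def get_trending():
    # Drop everything whose expiry time has passed, then return the rest.
    r.zremrangebyscore("trending_mints", "-inf", time.time())
    return r.zrange("trending_mints", 0, -1)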
Please read this page entirely: https://redis.io/topics/notifications
Summing up, you must have a sentinel program listening to PUB/SUB messages, and you must alter the redis.conf file to enable keyevent expire notifications:
in redis.conf:
notify-keyspace-events Ex
In order to enable the feature a non-empty string is used, composed of
multiple characters, where every character has a special meaning
according to the following table
E Keyevent events, published with __keyevent@<db>__ prefix.
x Expired events (events generated every time a key expires)
Then the sentinel program must listen to the channel __keyevent@0__:expired, if your database is 0. Change the database number if you are using any other than zero.
When you subscribe to that channel and receive the key that expired, you simply issue SREM trending_mints <key> to remove it from the set.
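A minimal subscriber sketch with redis-py, assuming database 0 and that notify-keyspace-events "Ex" is already configured as above:

import redis

r = redis.Redis(db=0)

p = r.pubsub()
p.subscribe("__keyevent@0__:expired")

for msg in p.listen():
    if msg["type"] != "message":
        continue                   # skip subscribe confirmations
    expired_key = msg["data"]      # the name of the key that just expired
    r.srem("trending_mints", expired_key)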
IMPORTANT
The expired events are generated when a key is accessed and is found to be expired by one of the above systems, as a result there are no guarantees that the Redis server will be able to generate the expired event at the time the key time to live reaches the value of zero.

If no command targets the key constantly, and there are many keys with a TTL associated, there can be a significant delay between the time the key time to live drops to zero, and the time the expired event is generated.

Basically expired events are generated when the Redis server deletes the key and not when the time to live theoretically reaches the value of zero.
So keys will be deleted due to expiration, but the notification is not guaranteed to occur in the moment TTL reaches zero.
ALSO, if your sentinel program misses the PUB/SUB message, well... that's it, you won't be notified another time! (this is also on the link above)