Side effects of Spring Data Redis partial updates on key expiration - redis

I'm using Redis 5.0 with Spring Data Redis 1.8.7 and Jedis connector. I'm setting a TTL for my keys and my use case requires me to update properties of the objects I'm persisting without renewing the TTL, so I'm using partial updates to achieve that.
I've noticed however that when I do a partial update the phantom copy created by Spring Data because of the TTL lags behind and does not include the updates. Another difference in behavior I've seen is that even though the phantom copy goes away when my key expires, it is not deleted when I manually delete a key if I've done any updates but it is deleted when no updates have been performed.
I'm not using key expiration notifications or secondary indexes at the moment, but I might in the future and I wonder if this is expected behavior when combining these two features and what side effects can occur because of their combined use.
Thanks!

Related

Auto Syncing for Keys in Apache Geode

I have an Apache Geode setup, connected with external Postgres datasource. I have a scenario where I define an expiration time for a key. Let's say after T time the key is going to expire. Is there a way so that the keys which are going to expire can make a call to an external datasource and update the value incase the value has been changed? I want a kind of automatic syncing for my keys which are there in Apache Geode. Is there any interface which i can implement and get the desired behavior?
I am not sure I fully understand your question. Are you saying that the values in the cache may possibly be more recent than what is currently stored in the database?
Whether you are using Look-Aside Caching, Inline Caching, or even Near Caching, Apache Geode combined with Spring would take care of ensuring the cache and database are kept in sync, to some extent depending on the caching pattern.
With Look-Aside Caching, if used properly, the database (i.e. primary System of Record (SOR), e.g. Postgres in your case) should always be the most current. (Look-Aside) Caching is secondary.
With Synchronous Inline Caching (using a CacheLoader/CacheWriter combination for Read/Write-Through) and in particular, with emphasis on CacheWriter, during updates (e.g. Region.put(key, value) cache operations), the DB is written to first, before the entry is stored (or overwritten) in the cache. If the DB write fails, then the cache entry is not written or updated. This is true each time a value for a key is updated. If the key has not be updated recently, then the database should reflect the most recent value. Once again, the database should always be the most current.
With Asynchronous Inline Caching (using AEQ + Listener, for Write-Behind), the updates for a cache entry are queued and asynchronously written to the DB. If an entry is updated, then Geode can guarantee that the value is eventually written to the underlying DB regardless of whether the key expires at some time later or not. You can persist and replay the queue in case of system failures, conflate events, and so on. In this case, the cache and DB are eventually consistent and it is assumed that you are aware of this, and this is acceptable for your application use case.
Of course, all of these caching patterns and scenarios I described above assume nothing else is modifying the SOR/database. If another external application or process is also modifying the database, separate from your Geode-based application, then it would be possible for Geode to become out-of-sync with the database and you would need to take steps to identify this situation. This is rather an issue for reads, not writes. Of course, you further need to make sure that stale cache entries does not subsequently overwrite the database on an update. This is easy enough to handle with optimistic locking. You could even trigger a cache entry remove on an DB update failure to have the cache refreshed on read.
Anyway, all of this is to say, if you applied 1 of the caching patterns above correctly, the value in the cache should already be reflected in the DB (or will be in the Async, Write-Behind Caching UC), even if the entry eventually expires.
Make sense?

Can i have redis master in one microservice domain and redis slave used by a different microservice as a way of sharing data?

I have three microservice that needs to communicate between each other. Microservice-1 is incharge of the data and the database(he writes and read to it). I will add a redis cache store for Microservice-1 to cache data there. I want to put a redis-slave for the other 2 microservices to reduce communication with the actual microservice, if the data is already in the cache store. Since all updates to the data, has to go thru the Microservice-1 and he will always update the cache, redis replication will make sure the other two microservices will get it too. Ofcourse, if the data is not in cache, it will call the Microservice-1 for the data, which will update the cache.
Am i missing something, with this approach ?
This will definitely work in the "sunny day" case.
But sometimes there are storms, and in storms there's a chance of losing cache coherency (i.e. the DB and Redis disagree on the data).
For example, lets say that you have Microservice-1 update the DB and then update Redis. What happens if there's a crash between updating the DB and updating Redis?
On the other hand, what if you reverse the ordering (update Redis and then the DB)? Now Redis could be updated and not the DB.
Neither of these in insurmountable, but absent a means of having a transaction which ensures that 0 or 2 of Redis and the DB are updated, there will always be a time window where the change is in one but not the other. In that situation, it's probably worth embracing eventual consistency (e.g. periodically scan the DB and update redis with recently updated records).
As an elaboration on that, a Command Query Responsibility Segregation with Event Sourcing (CQRS/ES) approach may prove useful: Microservice-1 gets split into two services, one which takes commands (requests to update) and another which handles queries. Instead of updating a row in a DB, the command service now appends (in a typical DB, an INSERT) an event which describes what changed. The query service can subscribe to those events and update Redis. Other microservices can also subscribe to the stream of events and update their own views (which can be remixed in any way they want) of Microservice-1's state.

Changed data for a key in Redis: how to figure out what changed it

The web application I am working with is written in Django and is using Redis to cache some data from Elasticsearch. Yesterday everything was working fine, but today it started to give me an error. I looked over the structure of the data redis is storing for the key and some of the deep inner values for keys were changed to lists instead of dicts (that they are supposed to be). So, overnight redis data was modified by someone or something. Now I need a way to figure out which code changed it. If I launch the app after doing redic-cli flushdb and start using it, navigating here and there everything is working fine, and I can't find any apparent wrong code this way. The data for redis is set only in one place in the app code and it is set correctly. I looked into redis.log but it does not say which key it modified and when but this data could be crucial here.
So, I need to find out which code mistakenly modified the key. It could be separately run code by someone, could be some hidden specific side of the app (I doubt it is the case), or some bug within the redis itself. Maybe I would need to introduce some kind of additional observer that would run each time keys are changed writing when and which key was modified in redis. I am stuck and not sure what I could do here. Any suggestions would be greatly appreciated.
A few things you may try with Redis:
MONITOR is a debugging command that streams back every command processed by the Redis server. You may then see what command is modifying your key, from what client connection.
Redis Keyspace Notifications allow you to subscribe to Pub/Sub channels in order to receive events affecting the Redis data set in some way. You can subscribe to the key of interest.
CLIENT LIST command returns information and statistics about the client connections server in a mostly human readable format.
As you are suspicious from another code or someone modifying your data, you may want to use Redis 6 with ACLs, to control what clients can do what operations on what keys.

Redis Persistence Partial

I have multiple keys in redis most of which are insignificant and can be lost in case my redis server goes down.
However I have one or two keys, which I cannot afford to lose.
Hence I would like that whenever the server restarts, redis reads only these few keys from its persistent storage, and keep on persisting these as and when they change.
Does redis have this feature? If yes what command makes a key persisted to file and how to differ b/w persisted and unpersisted keys.
If no, What approach can I use such that I need not make my own persistent file before writing to Redis
Limitations(If the answer is no)
I do not want to change client code that enters in redis.
I cannot add more servers to redis(if any such solution exists, would like to know about it though).
EDIT
Another reason I would not want to save most keys as persistence because it is huge data, hundreds of records per second- Most of which expires in 10 minutes.

Propagating data from Redis slave to a SQL database

I'm using Redis for storing simple key, value pairs; where, value is also of string type. In my Redis cluster, I've a master and two slaves. I want to propagate any of the changes to the data from one of the slaves to any other store (actually, oracle database). How can I do that reliably? The sink database only needs to be eventually consistent. Some delay is allowed.
Strategies I can think of:
a) Read the AOF file written by the slave machine and propagate the changes. (Requires parsing the AOF file and getting notified of every change to the file.)
b) Use rpoplpush. The reliable queue pattern provided. But, how to make the slave insert to that queue whenever it gets some set event from the master?
Any other possibility?
This is a very common problem faced by Redis developers. In a nutshell, it is the fact that:
Want to know all changes sinse last
Keep this change data atomic
I believe that any decision one way or another will be around these issues. So, yes AOF is one of best choises in this case, but here is not any production ready instruments for that. Yes, it is not very complex solution in case of one server but then using master/slave or cluster it can be very complex.
Using Keyspace notifications
Look's like Keyspace Notifications feature may be alternative. Keyspace notifications is a feature available since 2.8.0 and available in Redis cluster too. From original documentation:
Keyspace notifications allows clients to subscribe to Pub/Sub channels in order to receive events affecting the Redis data set in some way.Examples of the events that is possible to receive are the following:
All the commands affecting a given key.
All the keys receiving an LPUSH operation.
All the keys expiring in the database 0.
Events are delivered using the normal Pub/Sub layer of Redis, so clients implementing Pub/Sub are able to use this feature without modifications.
Because Redis Pub/Sub is fire and forget currently there is no way to use this feature if you application demands reliable notification of events, that is, if your Pub/Sub client disconnects, and reconnects later, all the events delivered during the time the client was disconnected are lost. This can be improved by duplicating the employees who serve this Pub/Sub channel:
The group of N workers subscribe to notification and put data to SET based "sync" list. This allow us control overhead and do not write same data to our sync list.
The other group of workers pop record with SPOP and write it other store.
Using manual update list
The other way is using special "sync" SET based list with every write operation (as i understand SET/HSET in your case). Something like:
MULTI
SET myKey value
SADD myKey
EXEC
Each time you modify your key you add key name to SET. So in other process or worker you can SPOP that key, read value and update source.
Also you can use RPOPLPUSH/LPOPRPUSH besides of SPOP in some kind of in progress list to protect your key would missed if worker failed. In this case each worker first RPOPLPUSH/LPOPRPUSH from sync set to in progress set, push data to storage and remove key from in progress set.