Changed data for a key in Redis: how to figure out what changed it


The web application I am working with is written in Django and uses Redis to cache some data from Elasticsearch. Yesterday everything was working fine, but today it started to give me an error. I looked at the structure of the data Redis is storing for the key, and some of the deeply nested values had been changed to lists instead of the dicts they are supposed to be. So, overnight, the data in Redis was modified by someone or something, and now I need a way to figure out which code changed it. If I launch the app after running redis-cli flushdb and start using it, navigating here and there, everything works fine, and I can't find any obviously wrong code this way. The data is written to Redis in only one place in the app code, and it is written correctly. I looked into redis.log, but it does not say which key was modified or when, and that information could be crucial here.
So, I need to find out which code mistakenly modified the key. It could be code run separately by someone, some hidden corner of the app (I doubt that is the case), or a bug in Redis itself. Maybe I need to introduce some kind of observer that runs each time a key is changed and records when and which key was modified. I am stuck and not sure what to do here. Any suggestions would be greatly appreciated.

A few things you may try with Redis:
MONITOR is a debugging command that streams back every command processed by the Redis server. You can then see which command is modifying your key, and from which client connection.
Redis keyspace notifications let you subscribe to Pub/Sub channels to receive events affecting the Redis data set. You can subscribe to the channel for the key of interest. They are disabled by default and have to be enabled through the notify-keyspace-events setting.
The CLIENT LIST command returns information and statistics about the client connections to the server in a mostly human-readable format, which helps you map a client address seen in MONITOR to an actual process.
Since you suspect that other code, or someone else, is modifying your data, you may want to use Redis 6 with ACLs to control which clients can run which operations on which keys.
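If you capture MONITOR output to a file for a while (for example with redis-cli monitor > monitor.log), a short script can pick out the writes to the suspicious key and the client address each one came from. This is a minimal sketch, assuming the usual MONITOR line format shown in the comment; `WRITE_COMMANDS` is a hypothetical, deliberately incomplete list you would extend for your data types. Keep in mind that MONITOR has a noticeable performance cost, so it is best run only while investigating.

```python
import re
import shlex
from collections import Counter

# One line of MONITOR output looks like:
# 1339518083.107412 [0 127.0.0.1:60866] "set" "mykey" "value"
MONITOR_LINE = re.compile(
    r'^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<client>[^\]]+)\] (?P<args>.*)$'
)

# Hypothetical, incomplete set of commands that modify a key; extend as needed.
WRITE_COMMANDS = {
    "set", "setex", "psetex", "mset", "getset", "append", "del", "unlink",
    "expire", "rename", "hset", "hmset", "hdel", "json.set",
}


def parse_monitor_line(line):
    """Return (timestamp, db, client, args) or None for lines that don't match."""
    match = MONITOR_LINE.match(line.strip())
    if match is None:
        return None  # e.g. the initial "OK" printed by redis-cli monitor
    try:
        args = shlex.split(match.group("args"))  # undo the "..." quoting
    except ValueError:
        return None  # unbalanced quotes in a truncated line
    return float(match.group("ts")), int(match.group("db")), match.group("client"), args


def writers_of_key(lines, key):
    """Count (client address, command) pairs for write commands mentioning `key`."""
    writers = Counter()
    for line in lines:
        parsed = parse_monitor_line(line)
        if parsed is None or not parsed[3]:
            continue
        _, _, client, args = parsed
        command = args[0].lower()
        # Heuristic: any argument equal to the key counts, which may over-match.
        if command in WRITE_COMMANDS and key in args[1:]:
            writers[(client, command)] += 1
    return writers
```

For example, writers_of_key(open("monitor.log"), "mykey").most_common() lists client addresses and commands by frequency ("mykey" being a placeholder for your cache key); an address can then be matched against the addr field of CLIENT LIST.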

Related

Redis search does not return results consistently

I am reading events from Kafka and depositing them into Redis. Then we read the events using Python, and if we don't find events, we drop and re-create the index.
However, I noticed that at times, even after re-creating the index, I still don't find events.
I have a couple of questions:
[Q1] Is re-indexing a good approach when we are continuously getting a huge flow of events?
[Q2] I also noticed that during Redis search I sometimes get events, and at another time the same query returns no results. Can this be related to dropping and re-creating the index?
[Q3] Is there a better, standard approach to ensure JSON documents are deposited and retrieved consistently?
[Q4] Is there an explanation for why everything sometimes seems to work fine continuously for several hours and then does not work at all for a few hours?
I would appreciate alternative approaches for this simple use case, as I am fairly new to Redis.

Here are some points. I hope they are helpful.
[Q1]
Re-indexing can keep your data searchable, but with a continuous flow of events it is usually unnecessary: a RediSearch index created with FT.CREATE over a key prefix indexes new and updated keys automatically as they are written. Dropping and re-creating the index forces a background re-scan of every existing key, so you need to make sure your Redis instance can handle the extra load; otherwise Redis can become very slow while re-indexing and may not keep up with the flow of events. In that case you may need to scale your Redis instance, or rely on this incremental indexing instead of rebuilding.
[Q2]
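There could be many reasons why search results vary, but dropping and re-creating the index is a likely one. When an index is re-created, RediSearch indexes the existing keys in the background, and queries that run before that scan finishes only see the documents indexed so far (FT.INFO reports the progress in its indexing and percent_indexed fields). A few other things could also be happening:
There could be an issue with the query or with how the index schema is defined, so matching documents are not returned.
The data in the database could be changing (keys being overwritten, expired, or evicted), so documents drop out of the results.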
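One way to avoid querying a half-built index is to wait until RediSearch reports that its background scan has finished. This is a minimal sketch, assuming redis-py with search support (client.ft(name).info() returning FT.INFO as a dict); `wait_for_index` and `indexing_finished` are hypothetical helper names, while `indexing` and `percent_indexed` are the FT.INFO fields mentioned above.

```python
import time


def _as_text(value):
    """FT.INFO values may be bytes, str or numbers depending on client settings."""
    return value.decode() if isinstance(value, bytes) else str(value)


def indexing_finished(info):
    """Given FT.INFO parsed into a dict, report whether the background scan is done."""
    info = {_as_text(key): value for key, value in info.items()}
    still_indexing = _as_text(info.get("indexing", "0")) not in ("0", "0.0")
    percent = float(_as_text(info.get("percent_indexed", "1")))
    return not still_indexing and percent >= 1.0


def wait_for_index(client, index_name, timeout=30.0, poll=0.5):
    """Poll FT.INFO until the index has caught up; returns False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if indexing_finished(client.ft(index_name).info()):
            return True
        time.sleep(poll)
    return False
```

Calling wait_for_index right after FT.CREATE, before serving queries, keeps searches from silently returning partial results during the rebuild.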
[Q3]
There is no single standard approach to ensure JSON is deposited and retrieved consistently. However, some best practices you may consider include using a library that handles serialization and deserialization of JSON data, validating input and output data, and using an editor that highlights errors in JSON syntax. You can also use a locking mechanism to ensure that only one process writes to the Redis instance at a time, or use a queue to buffer writes to Redis. Different developers may have different preferences on the best way to handle JSON in Redis. One option is to store documents with the RedisJSON module's JSON.SET and JSON.GET commands, through a client library that supports them.
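The "validate input and output" advice can be as simple as running every event through the same check before it is written and after it is read. This is a minimal sketch using only the standard json module; the required field names (id, type, payload) and the helper names are hypothetical and should match your own event schema.

```python
import json


class InvalidEvent(ValueError):
    """Raised when an event is missing fields the reader relies on."""


def encode_event(event, required=("id", "type", "payload")):
    """Validate an event dict and serialize it deterministically before writing."""
    if not isinstance(event, dict):
        raise InvalidEvent(f"expected a dict, got {type(event).__name__}")
    missing = [field for field in required if field not in event]
    if missing:
        raise InvalidEvent(f"missing fields: {missing}")
    # sort_keys makes the stored string stable; compact separators save memory.
    return json.dumps(event, sort_keys=True, separators=(",", ":"))


def decode_event(raw, required=("id", "type", "payload")):
    """Parse a stored event (bytes or str) and re-run the same validation."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    event = json.loads(raw)
    encode_event(event, required)  # same checks on the way out
    return event
```

With redis-py this would be used as r.set(key, encode_event(event)) on write and decode_event(r.get(key)) on read, so a malformed event fails loudly at the producer instead of silently disappearing from search results.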
[Q4]
Redis behavior can vary depending on the specific configuration and usage pattern, so there is no single explanation for this, but it could be due to factors such as load on the Redis server, network conditions, or other applications using the same Redis instance. If the server is overloaded, it may not handle requests properly or may stop responding. If everything works fine for a few hours and then suddenly stops working, check the Redis logs for errors that occurred around that time, or try restarting Redis.
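Rather than restarting blindly, it can help to record a few INFO counters periodically and compare a period when searches work with one when they don't; a rising evicted_keys, for instance, means Redis is dropping keys because it reached its maxmemory limit. This is a minimal sketch for text saved from redis-cli INFO (redis-py's client.info() already returns a parsed dict); the `WATCHED` selection is a hypothetical starting point.

```python
def parse_info(text):
    """Parse raw `redis-cli INFO` output into a flat dict of field -> string value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "# Section" headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


# Counters worth comparing between a "working" and a "not working" period.
WATCHED = (
    "connected_clients", "blocked_clients", "used_memory_human",
    "evicted_keys", "expired_keys", "rejected_connections",
)


def snapshot(text):
    """Keep only the watched counters (None if absent) so snapshots can be diffed."""
    info = parse_info(text)
    return {name: info.get(name) for name in WATCHED}
```

Taking a snapshot every few minutes and logging it next to the application's search failures makes it much easier to see whether the outages line up with evictions, expirations, or connection pressure.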

Redis Cluster: is a proxy or cluster-supporting library necessary to interact with a cluster?

So, I'm designing a distributed system with multiple Redis instances to break up a large volume of streaming writes, but I'm finding it difficult to get a clear picture of how things work.
From what I've read, it seems that a properly configured cluster will automatically shard and redirect requests made on the 'wrong' instance (say key 'A' maps to instance 1 but is set on instance 2; it will be redirected to instance 1). Am I correct in assuming this?
If so, what advantages does an extra proxy and/or a client library with cluster support give me over simply connecting to one Redis instance and letting it do all the work of figuring out where the SETs and GETs should go?

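Not quite: Redis Cluster does not forward requests itself. A node that does not own the key's hash slot replies with a MOVED error naming the node that does, and it is up to the client to follow that redirection. Cluster support in the client means the client learns where the data is stored and remembers it, so the next time it reads or writes a key it goes straight to the correct instance, which improves performance.
It's like calling directory enquiries every time you want to call a business versus just knowing the business's number.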
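The "learn and remember" idea can be sketched in a few lines. Redis Cluster maps each key to one of 16384 slots using CRC16 of the key (or only of a non-empty {hash tag} inside it), and a MOVED reply tells the client which node owns a slot. The `SlotCache` class below is a hypothetical toy, not any library's API; real cluster-aware clients usually load the whole slot map up front with CLUSTER SLOTS (or CLUSTER SHARDS on Redis 7) and refresh it when they see MOVED.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant (poly 0x1021, init 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to one of the 16384 slots, honouring non-empty {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # hash only the text between the braces
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


def parse_moved(error: str) -> tuple[int, str, int]:
    """'MOVED 3999 127.0.0.1:6381' -> (3999, '127.0.0.1', 6381)."""
    _, slot, address = error.split()
    host, _, port = address.rpartition(":")
    return int(slot), host, int(port)


class SlotCache:
    """Toy client-side routing table remembering the owner of each slot seen."""

    def __init__(self, seed_node: tuple[str, int]):
        self.seed_node = seed_node
        self.owners: dict[int, tuple[str, int]] = {}

    def node_for(self, key: bytes) -> tuple[str, int]:
        # Unknown slots go to the seed node, which may answer with MOVED.
        return self.owners.get(key_hash_slot(key), self.seed_node)

    def learn(self, moved_error: str) -> None:
        slot, host, port = parse_moved(moved_error)
        self.owners[slot] = (host, port)
```

After one redirection for a slot, every later request for keys in that slot goes straight to the right node, which is the "knowing the number" half of the analogy.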

Is it possible to get list of keys changed in redis server?

I'm getting over 10000 updates in 60 seconds on my Redis server, and this triggers a background save, which consumes resources.
I want to track the changed keys so that I can debug my app and find which method is causing so many changes.
Is there a way to get the updated keys?

While MONITOR is perfectly valid, it includes everything that gets sent to Redis, which means you have to filter out read requests, pings, and so on.
Instead, I recommend that you check the keyspace notifications documentation and configure your database with the AK flags. By subscribing to the __keyspace@*__:* pattern, you'll be notified about every change to keys.
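To find out which keys account for the 10000 updates, the notifications can simply be counted. This is a minimal sketch, assuming the redis-py package (imported only inside `watch`, so the helpers work without it) and that your deployment allows CONFIG SET; `watch`, `count_changes` and `key_from_channel` are hypothetical names. Each message's channel is __keyspace@<db>__:<key> and its data is the event name, such as set or expire.

```python
from collections import Counter


def key_from_channel(channel):
    """'__keyspace@0__:user:42' -> (0, 'user:42'); None for other channels."""
    if isinstance(channel, bytes):
        channel = channel.decode("utf-8", "replace")
    prefix = "__keyspace@"
    if not channel.startswith(prefix):
        return None
    db, sep, key = channel[len(prefix):].partition("__:")
    if not sep or not db.isdigit():
        return None
    return int(db), key


def count_changes(messages, limit):
    """Count (db, key, event) from redis-py pubsub messages, stopping after `limit` events."""
    counts = Counter()
    seen = 0
    for message in messages:
        if message.get("type") != "pmessage":
            continue  # skip the initial 'psubscribe' confirmation
        parsed = key_from_channel(message["channel"])
        if parsed is None:
            continue
        event = message["data"]
        if isinstance(event, bytes):
            event = event.decode()
        counts[(*parsed, event)] += 1
        seen += 1
        if seen >= limit:
            break
    return counts


def watch(host="localhost", port=6379, limit=10_000):
    """Wire the counter to a live server (sketch; requires redis-py)."""
    import redis

    client = redis.Redis(host=host, port=port)
    # K = publish on the __keyspace@<db>__ channels, A = all event classes.
    client.config_set("notify-keyspace-events", "KA")
    pubsub = client.pubsub()
    pubsub.psubscribe("__keyspace@*__:*")
    return count_changes(pubsub.listen(), limit)
```

watch().most_common(20) then shows the hottest keys and events, which usually points straight at the code path responsible.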

As far as I know, it's only possible by using the MONITOR command and working it out from the output.

Redis cache in a clustered web farm? Sync between two member nodes?

OK, so what I have are 2 web servers running inside a Windows NLB clustered environment. The servers are identical in every respect and, as you'd expect in an NLB clustered environment, everybody hits the cluster name and not the individual members. We also have affinity turned off on the members in the cluster.
But what I'm trying to do is turn on caching for a few large files (MP3s). It's easy enough to spin up a Redis node on one particular member and hit it; everything works as you'd expect. I can pull the data from the cache and serve it up as needed.
Now, add the NLB into the picture. With an NLB in play, you may not hit the same web server each time: your first hit might go to member 01 and the second to 02. So I'd need a way to sync between the two servers, so that no matter which cluster member you hit, you get the same data.
I don't need to worry about one cache being out of date; the only thing I'm storing in there is read-only data from an internal web service.
I've only got 2 servers, and it looks like Redis Cluster needs 3, so I guess that's out.
Is this the best approach? Or is there something better?
Reasons for Redis: we want the cache to be in-memory only, with no writes to the database. I thought this would be a good fit, but I need to make sure the data is available on both servers.

It's not possible to run Redis multi-master (writing on both). That said, its replication is blazing fast (check the SLAVEOF command, called REPLICAOF since Redis 5).
But why do you need it on the same server? Access it as a service, so that every node accesses the same data. If the main server goes down, a replica can be promoted to master; with Redis Sentinel monitoring them, this failover happens automatically.
One observation: Redis uses the disk asynchronously, through periodic snapshots and, if enabled, an append-only file that is rewritten from time to time as it grows.
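Accessing Redis as a service means both NLB members point their Redis client at the same host, so whichever member handles a request reads the same cached data and no member-to-member sync is needed. This is a minimal cache-aside sketch; `get_or_fetch`, the TTL, and the fetch callback (standing in for the call to the internal web service) are hypothetical. Any object with get(key) and set(key, value, ex=seconds) works, and a redis-py client has that shape.

```python
def get_or_fetch(cache, key, fetch, ttl_seconds=3600):
    """Cache-aside lookup against a cache shared by all web servers.

    `cache` needs get(key) and set(key, value, ex=seconds), e.g. a redis-py client.
    `fetch` loads the value from the backing service on a cache miss.
    """
    value = cache.get(key)
    if value is None:
        value = fetch(key)  # only the first server to miss pays this cost
        cache.set(key, value, ex=ttl_seconds)
    return value
```

On each web server this would be called as get_or_fetch(redis.Redis(host="cache-host"), "mp3:intro", load_from_service), where "cache-host", the key, and load_from_service are placeholders for your own names.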

SyncFramework 2.1 updates & deletes do not seem to apply properly

I'm synchronizing SQL Server 2008 with about 6 SQL Server 2008 Express clients (all R2, I believe), using the SyncOrchestrator, specifically using http://code.msdn.microsoft.com/windowsdesktop/Database-SyncSQL-Server-e97d1208 as a base with slight modifications. To my knowledge this means all connections are peers or nodes.
I have 2 scopes: one is download-only and the other is upload-only. The download-only scope is riddled with identity columns, primarily because I didn't know any better and still couldn't wrap my head around introducing Guids as the PK on the client side. It doesn't really matter, as all clients should have exact replicas of about 8 tables, and these machines never modify this data, they only read it.
The upload-only scope uses Guids, since fortunately I can control that portion of the database, and there would be no way 10 clients all using the same identity seed could sync back to the server properly. Both scopes use the default provisioning with bulk inserts and the whole nine yards, so there shouldn't be anything I'm doing on the provisioning end to screw this up.
I initially set everything up without using PerformPostRestoreFixup, and the initial database was manually synchronized with insert statements from the host. This seemed fine, but no updates or deletes ever seemed to be applied. You can safely ignore this (it's only here for historical accuracy and to prove my ineptness), as I then used VS2010 Database Projects to rebuild the database down to schema only and synchronized. I then followed the steps outlined here (http://social.microsoft.com/Forums/br/syncdevdiscussions/thread/9ac6d1a1-1565-4b82-a8d8-3d4a9ff5d07b) (sync, back up, restore, call PerformPostRestoreFixup, sync on x clients), and on my dev box, where I'm setting all this up, I could see updates and deletes just fine. It's when I deploy this to the x clients that I don't see a mirror of the database as I think I should.
The initial sync complains and tries to synchronize all records again; I believe this is expected. In the ApplyChangeFailed event on the client, I set every conflict type other than DbConflictType.ErrorsOccurred to ApplyAction.RetryWithForceWrite. This may be a source of problems, as I initially thought this was needed to force the change down to the client. I want the server to always win in this scenario, but in the trace I always see the phrase "Local wins" during the bulk insert/update calls. It's possible I'm seeing the error before the re-apply happens, but the trace is awkward to read.
The only problem I seem to be having is with the download-only scope. The initial client database is about a week old now, and if I use the PerformPostRestoreFixup steps, I don't see any of the updates that have been applied between then and now, as I think I should. It's as if SyncFx almost prefers a blank database on the client side to kick off the initial sync; after that, all the updates seem to apply just fine, with no ApplyChangeFailed events firing.
If anyone has seen this before or has a clue where to look, I would greatly appreciate it. My brain is fried from trying to work out what's going on. My last-ditch effort will be to deploy blank databases to all the clients and have them start the sync. I've had no issues with this on the dev side, but I can only test one other client to know whether that will make any difference. Apart from that, I don't know what to do other than keep doing manual syncs, which would defeat the purpose entirely. I thought PerformPostRestoreFixup would alleviate the issue entirely, but I seem to have the same problems with or without it, or perhaps I'm not looking at the right thing.
Thanks

I wanted to report my findings and close the entry.
When I deployed a previously configured client database, I'd often get ApplyChangeFailed events, as in this log entry:
"[05:30:41 PM] - ApplyChange Failed: TableName: , Stage: ApplyingInserts, ConflictType: LocalInsertRemoteInsert, Action: RetryWithForceWrite"
This is what I thought would be expected, as it tried to reinsert data that was already there. During RetryWithForceWrite this should have been turned into an update statement, but I found the data was not being updated with what was sent down.
Once I started each client with a completely blank database and provisioned it locally, all of these errors went away. It's as if every client expects some unique ID that only it sets. I'm also using x64 builds rather than x86, which may or may not have any bearing on the results. I wish I could determine exactly what happened, but it seems that when in doubt, and whenever possible, starting from absolute zero and letting sync fill in the data is your safest option.
