Possible Redis data corruption bug

I have seen a data problem in Redis and I am wondering whether my diagnosis is correct. Essentially, when I'm doing a lot of writing to a server and reading using a Jedis client, I am seeing timeouts followed by incorrect data being returned by get() operations: the data makes sense, but it's for a different key.
Here is what I think is happening:
1) The master is put under a lot of write load.
2) The slave does a periodic bgsave.
3) The slave tries to catch up with the master, but it has fallen too far behind, so it does a full resync.
4) To serve the full resync, the master does a bgsave of a 10GB+ data set while handling lots of reads and writes.
5) A Jedis client get() call times out before the data comes back from the server.
6) The next get() call on the same client reads the reply to the call that timed out (the reply arrives in the socket buffer after the timeout but before the next call).
7) From then on, every get() call returns the data intended for the previous one.
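To make the suspected failure mode concrete, here is a toy model in TypeScript. The classes are hypothetical and are not Jedis internals; the model only assumes that replies on one connection are read strictly in order and that a timed-out read leaves its request outstanding, so the late reply is picked up by the next call.

```typescript
// Toy model of the suspected failure (hypothetical classes, not Jedis internals).
// Replies are consumed strictly in order, and a timed-out read leaves its request
// outstanding on the same connection.
type Store = Map<string, string>;

class ToyConnection {
  serverIsSlow = false; // e.g. the master is busy with a big bgsave
  private readonly store: Store;
  private pending: string[] = []; // requests the server has not answered yet
  private buffer: string[] = []; // replies waiting in the client's socket buffer

  constructor(store: Store) {
    this.store = store;
  }

  // The server answers every outstanding request, in FIFO order.
  serverCatchesUp(): void {
    for (const key of this.pending) this.buffer.push(this.store.get(key) ?? "(nil)");
    this.pending = [];
  }

  // Naive get(): send the request, then read the next reply in the buffer.
  // An empty buffer stands in for a read timeout; the request is NOT cancelled.
  get(key: string): string {
    this.pending.push(key);
    if (!this.serverIsSlow) this.serverCatchesUp();
    const reply = this.buffer.shift();
    if (reply === undefined) throw new Error(`read timed out waiting for ${key}`);
    return reply;
  }
}

const conn = new ToyConnection(new Map([["a", "A"], ["b", "B"], ["c", "C"]]));
conn.get("a"); // "A": normal operation
conn.serverIsSlow = true;
try { conn.get("b"); } catch { /* the client sees a timeout */ }
conn.serverIsSlow = false;
conn.get("c"); // "B": the late reply to get("b")
conn.get("a"); // "C": every reply is now off by one
```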
My solution, which seems to work, is to close and reopen the connection every time a timeout exception is thrown.
Does this seem like a plausible explanation for what I am seeing?

What you are describing would not be a Redis bug but a Jedis one, as the offset reads would be happening in the client.
In this case, reconnecting on timeout is a reasonable workaround and should work. I'd also recommend reporting it as a bug to Jedis.
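A minimal sketch of that workaround, assuming a hypothetical `Connection` interface (the real Jedis API is Java and looks different): any timeout throws the whole connection away, so a late reply can never be read by a later command. The error is still rethrown, because the caller cannot know whether the command ran.

```typescript
// Hypothetical minimal connection interface; the real Jedis API is different.
interface Connection {
  get(key: string): string;
  close(): void;
}

class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

function isTimeout(err: unknown): boolean {
  return err instanceof Error && err.name === "TimeoutError";
}

// Wraps a connection factory: after any timeout the connection is discarded,
// so a reply that arrives late can never be read by the next command.
class ReconnectingClient {
  private readonly connect: () => Connection;
  private conn: Connection;

  constructor(connect: () => Connection) {
    this.connect = connect;
    this.conn = connect();
  }

  get(key: string): string {
    try {
      return this.conn.get(key);
    } catch (err) {
      if (isTimeout(err)) {
        this.conn.close(); // drop the socket that may hold a stale reply
        this.conn = this.connect(); // start clean for the next command
      }
      throw err; // the caller still decides what to do about this key
    }
  }
}
```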

Related

In a Cloudflare Worker, why does the faster stream wait for the slower one when using tee() to fetch into R2?

I want to fetch an asset into R2 and, at the same time, return the response to the client, so the body is streamed into R2 and to the client simultaneously.
Related code fragment:
const originResponse = await fetch(request);
const originResponseBody = originResponse.body!.tee();
ctx.waitUntil(
  env.BUCKET.put(objectName, originResponseBody[0], {
    httpMetadata: originResponse.headers,
  })
)
return new Response(originResponseBody[1], originResponse);
I tested downloading a 1GB asset over a slower and a faster internet connection.
In theory, the outcome (success or failure) of the R2 put should be the same in both cases, because it is independent of the client's connection speed.
However, when I tested both scenarios, the R2 write succeeded with the fast connection and failed with the slower one. That means the 30-second ctx.waitUntil() timeout was exceeded with the slower connection: the R2 put always "failed" when the client download took more than 30 seconds.
It seems like the R2 put (the reading of that stream) is backpressured to the speed of the slower consumer, namely the client download.
Is this because otherwise the Worker would have to buffer the chunks the faster consumer has already read until the slower one catches up?
Am I missing something? Could someone confirm or clarify this? Also, could you recommend a working solution for this use case of downloading large files?
EDIT:
The Cloudflare Workers implementation of tee() is explained here: https://community.cloudflare.com/t/why-the-faster-stream-waits-the-slower-one-when-using-the-tee-operator-to-fetch-to-r2/467416
It explains the behavior I observed.
However, a stable solution for the problem is still missing.

Cloudflare Workers limits the flow of a tee to the slower stream because otherwise it would have to buffer data in memory.
For example, say you have a 1GB file, and the client connection can accept 1MB/s while R2 can accept 100MB/s. After 10 seconds, the client will have received only 10MB. If we allowed the faster stream to go as fast as it could, then it would have accepted all 1GB. However, that leaves 990MB of data which has already been received from the origin and still needs to be sent to the client. That data would have to be stored in memory. But a Worker has a memory limit of 128MB, so your Worker would be terminated for exceeding its memory limit. That wouldn't be great either!
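The arithmetic in this example can be checked with a few lines of TypeScript. This is only a back-of-the-envelope model, assuming constant transfer rates and decimal units (1GB = 1000MB, which is what makes the 990MB figure come out).

```typescript
// Back-of-the-envelope model of an unthrottled tee(): data the fast branch has
// already pulled from the origin, but the slow branch has not consumed yet,
// must be held in memory. Units are decimal megabytes and seconds.
function bufferedMB(totalMB: number, fastMBps: number, slowMBps: number, seconds: number): number {
  const readByFastBranch = Math.min(totalMB, fastMBps * seconds);
  const sentToSlowBranch = Math.min(totalMB, slowMBps * seconds);
  return readByFastBranch - sentToSlowBranch;
}

const WORKER_MEMORY_LIMIT_MB = 128;

// 1GB file, R2 accepting 100MB/s, client accepting 1MB/s, after 10 seconds:
const backlogMB = bufferedMB(1000, 100, 1, 10); // 990
console.log(`${backlogMB}MB buffered vs a ${WORKER_MEMORY_LIMIT_MB}MB limit`);
```

With these rates the backlog grows by 99MB every second, so it would pass the 128MB limit in under 2 seconds.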
With that said, you are running into a bug in the Workers Runtime, which we noticed recently: waitUntil()'s 30-second timeout is intended to start after the response has finished. However, in your case, the 30-second timeout is inadvertently starting when the response starts, i.e. right after headers are sent. This is an unintended side effect of an optimization I made: when Workers detects that you are simply passing through a response body unmodified, it delegates pumping the stream to a different system so that the Worker itself doesn't need to remain in memory. However, this inadvertently means that the waitUntil() timeout kicks in earlier than expected.
This is something we intend to fix. As a temporary workaround, you could write your Worker to use streaming APIs such that it reads each chunk from the tee branch and then writes it to the client connection in JavaScript. This will trick the runtime into thinking that you are not simply passing the bytes through, but trying to perform some modifications on them in JavaScript. This forces it to consider your Worker "in-use" until the entire stream completes, and the 30-second waitUntil() timeout will only begin at that point. (Unfortunately, this workaround is somewhat inefficient in terms of CPU usage, since JavaScript is constantly being invoked.)
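A sketch of that temporary workaround, using only standard Web Streams APIs (`TransformStream`, `getReader()`, `getWriter()`), which Workers supports. The function name `pumpThroughJs` is hypothetical, and this illustrates the technique described above rather than an official Cloudflare fix: each chunk of the client branch is read and written in JavaScript instead of the branch being handed straight to the `Response`.

```typescript
// Copies a stream chunk by chunk in JavaScript, so the runtime sees the Worker
// actively producing the body instead of passing the bytes through untouched.
function pumpThroughJs(source: ReadableStream<Uint8Array>): ReadableStream<Uint8Array> {
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
  const reader = source.getReader();
  const writer = writable.getWriter();

  const pump = async (): Promise<void> => {
    try {
      for (;;) {
        const result = await reader.read();
        if (result.done) break;
        // Resolves only once the chunk is handed on, so a slow client slows this loop.
        await writer.write(result.value);
      }
      await writer.close();
    } catch (err) {
      // Client disconnected or origin failed: stop both sides.
      await Promise.allSettled([writer.abort(err), reader.cancel(err)]);
    }
  };
  void pump(); // runs in the background while the Response streams `readable`

  return readable;
}

// Hypothetical wiring in the Worker from the question (Env, ExecutionContext and
// the BUCKET binding come from that Worker and are not declared here):
//   const [toBucket, toClient] = originResponse.body!.tee();
//   ctx.waitUntil(env.BUCKET.put(objectName, toBucket, { httpMetadata: originResponse.headers }));
//   return new Response(pumpThroughJs(toClient), originResponse);
```

As the answer notes, this keeps JavaScript running for every chunk, so it costs more CPU than the pass-through version.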

How synchronous is Galera Cluster?

Actually, I have a couple of questions here.
1) When I run an INSERT from my application using MySQL Connector, it is answered by one of the master nodes. Does that master node wait until the insert has been applied on all the nodes before replying to the client? If it does wait for all the nodes, then how does wsrep_sst_method=xtrabackup help: will it make the node reply to the client immediately, or will it make no difference? Maybe I have misunderstood this variable.
2) What about reads? I guess a read is just answered by one of the master nodes, and only when wsrep_sync_wait is set does it wait for a reply from all the nodes.
Thanks

"How synchronous"? Synchronous enough, but with one exception: "Critical read".
The "fix" is during reading, not writing.
When writing, the heavyweight checking is done during COMMIT. At this point, all other nodes are contacted to see whether "this transaction will eventually commit successfully". That is, the other nodes say "yes" but don't actually finish applying the write before answering, so a subsequent SELECT on those nodes may not see its results. The guarantee here is that the cluster is in a consistent state and will stay that way, even if any one node dies.
"Critical read" is, for example, when a user posts something, then immediately reads the database and expects to see the posting. But if the read (SELECT) hits a different node, the "almost" synchronous nature of Galera means the data may not yet have been committed on the reading node. The data is there, and will be successfully written to disk, but maybe not yet. The workaround is to use wsrep_sync_wait when reading to ensure that replication has caught up before the SELECT. No special action is needed when writing.
(I don't see the relevance of wsrep_sst_method=xtrabackup. That relates to recovering from a dead node.)
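To make the "critical read" concrete, here is a toy model in TypeScript. It is a simplification under stated assumptions, not Galera's real certification and apply machinery: at COMMIT every node accepts (certifies) the write, but only the originating node applies it immediately, and the others apply it later. The `syncWait` flag mimics the effect of wsrep_sync_wait, which in MySQL is set per session, for example with `SET SESSION wsrep_sync_wait = 1;` before a critical SELECT.

```typescript
// Toy model (hypothetical; not how Galera is implemented internally).
class ToyNode {
  readonly applied = new Map<string, string>(); // what a SELECT on this node can see
  private applyQueue: Array<[string, string]> = []; // certified, not yet applied

  certify(key: string, value: string): void {
    this.applyQueue.push([key, value]);
  }

  applyPending(): void {
    for (const [k, v] of this.applyQueue) this.applied.set(k, v);
    this.applyQueue = [];
  }

  // syncWait mimics wsrep_sync_wait: catch up on replication before reading.
  select(key: string, syncWait: boolean): string | undefined {
    if (syncWait) this.applyPending();
    return this.applied.get(key);
  }
}

class ToyCluster {
  readonly nodes: ToyNode[];

  constructor(size: number) {
    this.nodes = Array.from({ length: size }, () => new ToyNode());
  }

  // COMMIT returns once every node has certified the write; only the node
  // that received the write applies it straight away.
  commit(origin: number, key: string, value: string): void {
    this.nodes.forEach((n) => n.certify(key, value));
    this.nodes[origin].applyPending();
  }
}

const cluster = new ToyCluster(3);
cluster.commit(0, "post:1", "hello"); // the user posts via node 0
cluster.nodes[1].select("post:1", false); // undefined: the critical read misses
cluster.nodes[1].select("post:1", true); // "hello": caught up before the SELECT
```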

Web application hangs after multiple requests

The application is using Apache Server as a web server and Tomcat as an application server.
Operations/requests can be triggered from the UI that take a long time to return, because the server does processing such as fetching data from the database and performing calculations on that data. The time depends on the amount of data in the database and the time span of the data being processed; depending on the parameters, it can be anywhere from 2 minutes to between 30 minutes and an hour.
Apart from these, there are other calls which fetch a small amount of data from the database and return immediately.
Now, when several (say 4 or 5) of these long, heavy calls are running on the server and I make a call that should be small and return immediately, this call also hangs: it never reaches my controller.
I am unable to find a way to debug this issue or find a resolution. Please let me know if you know how to proceed with this issue.
I am using Spring, and Hibernate with c3p0 connection pooling.

So I figured out what was wrong with the application, and thought I would share it in case someone somewhere faces the same issue. It turns out nothing was wrong with the application server or the web server; technically speaking, it was the browser's fault.
I found out that a browser can only have a limited number of concurrent open connections to a domain. In the latest version of Chrome at the time of writing, the limit is 6. Browsers impose limits like this to avoid overloading servers.
In my application, the HTTP calls do not return until the calculations are completed, so several calls accumulate concurrently. As a result, the browser stops sending further calls once 6 are in flight, and the application feels unresponsive. You can read about the maximum number of concurrent connections per browser on Stack Overflow.
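A toy model of the per-host limit makes the effect visible. It is an illustration, not browser source code, and the limit is just a parameter: every request is issued at t = 0 in order, and each must wait for a free connection slot. With six 30-minute calculations in flight, a quick call queued behind them cannot start until one of them finishes.

```typescript
// Toy model of a per-host connection limit: requests are issued at t = 0 in
// order, and each one starts only when one of `limit` connection slots is free.
// Durations and times are in seconds.
function startTimes(durations: number[], limit: number): number[] {
  const slotFreeAt: number[] = new Array<number>(limit).fill(0);
  return durations.map((duration) => {
    // Use the slot that becomes free earliest.
    let best = 0;
    for (let i = 1; i < limit; i++) {
      if (slotFreeAt[i] < slotFreeAt[best]) best = i;
    }
    const start = slotFreeAt[best];
    slotFreeAt[best] = start + duration;
    return start;
  });
}

// Six 30-minute calculations, then one quick lookup, with Chrome's limit of 6.
const starts = startTimes([1800, 1800, 1800, 1800, 1800, 1800, 1], 6);
console.log(starts[6]); // 1800: the quick call waits for a long one to finish
```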
A possible solution I have thought of is polling, or even better, long polling. I would have used WebSockets, but that would require a lot of changes.
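A minimal sketch of the polling idea, assuming a hypothetical design in which the server runs the calculation as a background job and exposes a cheap status check (the `/jobs` paths below are invented for illustration). The browser then only makes short requests, so no connection slot is held for 30+ minutes.

```typescript
// Sketch of the polling approach (endpoint paths and job ids are hypothetical).
type JobStatus<T> = { done: false } | { done: true; result: T };

async function pollUntilDone<T>(
  checkStatus: () => Promise<JobStatus<T>>,
  intervalMs: number,
  maxAttempts: number,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await checkStatus(); // a short request that returns at once
    if (status.done) return status.result;
    await new Promise((resolve) => setTimeout(resolve, intervalMs)); // no connection held meanwhile
  }
  throw new Error(`job still running after ${maxAttempts} status checks`);
}

// Hypothetical browser wiring: start the job, then poll its status every 5 s.
//   const { id } = await (await fetch("/jobs", { method: "POST" })).json();
//   const report = await pollUntilDone(
//     async () => (await fetch(`/jobs/${id}`)).json(), 5000, 720);
```

With long polling, the status endpoint would instead hold each request open until the job changes state or a server-side timeout expires; the client loop stays essentially the same.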

How to deal with a read() timeout in a Redis client?

Assume that my client sends an INCR command to the Redis server, but the response packet is lost, so my client's read() times out. The client cannot tell whether the server has performed the INCR.
What should it do next: resend the INCR, or continue with the next command? If the client resends the INCR but Redis had already carried out the first one, the key will be incremented twice, which is not what we want.

This is not a problem specific to Redis: it also applies to any other data store (including transactional ones). There is no solution to this problem: you can only hope to minimize the issue.
For instance, some people tend to set very aggressive timeout values, thinking that Redis is supposed to be a soft real-time data store. Redis is fast, but you also need to consider the network and the system itself. Network-related problems may generate high latencies. If the system starts swapping, it will very seriously impact Redis response times.
I tend to think that setting a timeout under 2 seconds is nonsense on any Unix/Linux system, and if a network is involved, I am much more comfortable with 10 seconds. People set very low values because they want to avoid blocking their application: this is a mistake. Rather than setting very low timeouts and keeping the application synchronous, they should design the application to be asynchronous and set sensible timeouts.
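In a language where I/O is already asynchronous, "sensible timeouts without blocking" can look like the sketch below. `withTimeout` and `TimeoutError` are hypothetical helpers, not part of any Redis client library; the 10-second default follows the suggestion above for setups involving a network. Note that it only stops waiting: the command may still run on the server.

```typescript
// Hypothetical helper, not part of any Redis client library.
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

// Waits for `work` for at most `ms` milliseconds without blocking anything
// else; 10 s default per the suggestion above for networked setups.
function withTimeout<T>(work: Promise<T>, ms = 10000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(`no reply within ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  // A timeout does not cancel `work` on the server side.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

After a `TimeoutError`, the connection itself should still be closed and reopened rather than reused, as the next paragraph explains.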
After a timeout, a client should never "continue" with the next command. It should close the connection, and try to open a new one. If a reply (or a query) has been lost, it is unlikely that the client and the server can resynchronize. It is safer to close the connection.
Should you try to issue the INCR again after the reconnection? It is really up to you. But if a read timeout has just been triggered, there is a good chance the reconnection will time out as well. Redis being single-threaded, when it is slow for one connection, it is slow for all connections simultaneously.
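One possible way to encode that decision, offered as an assumption rather than anything Redis prescribes: after a timeout and a reconnection, automatically retry only read-only commands, report everything else (such as INCR) to the application, and back off before reconnecting, since a slow Redis is slow for every connection at once. The command list and delay values are illustrative.

```typescript
// Illustrative policy only: which commands are retried after a timeout is an
// application decision. Read-only commands are always safe to send twice.
const READ_ONLY = new Set(["GET", "MGET", "EXISTS", "TTL", "STRLEN", "HGET", "LRANGE"]);

type AfterTimeout = "retry" | "report";

function afterTimeout(command: string): AfterTimeout {
  // INCR, LPUSH, etc. may already have been applied: let the caller decide.
  return READ_ONLY.has(command.toUpperCase()) ? "retry" : "report";
}

// Exponential backoff before reconnecting (base and cap are arbitrary choices).
function reconnectDelayMs(attempt: number, baseMs = 500, capMs = 10000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```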

How to find an unclosed connection? "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding"

I've had this problem before and found that basically I've got a connection that I'm not closing quickly enough (leaving connections open and waiting for garbage collection isn't really a best practice).
Now I'm getting it again, but I can't seem to find where I'm leaving my connections open. By the time I see the error, the database has cleared out the old connections, so I can't see the last command of each locked-up connection (which was very helpful the last time I had this issue).
Any idea how I could instrument my code or database to track what's going on so I can find my offending piece of code?

The error you are providing doesn't really point to a connection that is left open; it is more likely that there is a query that is taking longer than the application expects.
You can increase the time the application waits for a response, and you can use SQL to find which queries are the most taxing.
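On the application side, a small timing wrapper can show which queries are slow enough to run into the timeout. `timed` is a hypothetical, driver-independent helper, and `db.query` in the example comment is a stand-in for your own data-access call; database-side tools will give more detail.

```typescript
// Hypothetical helper: time a query and remember it if it was slow. It works
// with any async call, so it is not tied to a particular database driver.
interface SlowQuery {
  label: string;
  ms: number;
}

async function timed<T>(
  label: string,
  run: () => Promise<T>,
  thresholdMs: number,
  slowLog: SlowQuery[],
): Promise<T> {
  const start = Date.now();
  try {
    return await run();
  } finally {
    const ms = Date.now() - start;
    // Recorded even when the query throws, e.g. because it hit the timeout.
    if (ms >= thresholdMs) slowLog.push({ label, ms });
  }
}

// Example (db.query is a stand-in for your own data-access call):
//   const slow: SlowQuery[] = [];
//   const rows = await timed("orders by customer", () => db.query(sql), 1000, slow);
//   console.table(slow);
```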

Hopefully you have one data access layer class, instead of a whole bunch of classes each creating its own connection, right? What language are you using? If you're using C#, the biggest cause of this problem is DataReaders being returned to the upper layers. Most likely some client class is not closing the DataReader it received from your DAL class, leaving the connection open/locked for who knows how long. Track down the DataReaders you're returning and make sure your client classes are closing/disposing of them properly.
I'd also start thinking about redesigning your data access layer by implementing the Disposable pattern and possibly returning POCOs instead of Data (...Tables, ...Sets, ...Readers) objects.
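The same two ideas can be sketched outside C#. Below is a TypeScript illustration with hypothetical `Conn` and `TrackingPool` types (not ADO.NET): the pool records the stack trace at which each connection was acquired, which addresses the original "instrument my code" question, and `withConnection` plays the role of the Disposable pattern by always releasing the connection in `finally` and returning plain data instead of a live reader.

```typescript
// Hypothetical connection type; a real driver's API will differ.
interface Conn {
  query(sql: string): Promise<Record<string, unknown>[]>;
}

// Remembers where each open connection was acquired, so a leak report points
// at the offending code instead of just saying "pool exhausted".
class TrackingPool {
  private readonly open = new Map<Conn, string>(); // connection -> acquisition stack
  private readonly makeConn: () => Conn;

  constructor(makeConn: () => Conn) {
    this.makeConn = makeConn;
  }

  acquire(): Conn {
    const conn = this.makeConn();
    this.open.set(conn, new Error("acquired here").stack ?? "(no stack)");
    return conn;
  }

  release(conn: Conn): void {
    this.open.delete(conn);
  }

  // Call periodically (or when a timeout happens) to see who still holds connections.
  leakReport(): string[] {
    return Array.from(this.open.values());
  }
}

// Always releases the connection and returns plain rows, never a live cursor.
async function withConnection<T>(pool: TrackingPool, work: (c: Conn) => Promise<T>): Promise<T> {
  const conn = pool.acquire();
  try {
    return await work(conn);
  } finally {
    pool.release(conn);
  }
}
```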
