How to deal with a read() timeout in a Redis client?

Assume that my client sends an INCR command to the Redis server, but the response packet is lost, so my client's read() times out and the client cannot tell whether the server performed the INCR operation.
What should the client do next: resend INCR, or continue with the next command? If the client resends INCR but Redis had already carried it out on the server side, the key will be incremented twice, which is not what we want.

This is not a problem specific to Redis: it also applies to any other data store (including transactional ones). There is no solution to this problem: you can only hope to minimize the issue.
For instance, some people tend to put very aggressive values for their timeout, thinking that Redis is supposed to be a soft real-time data store. Redis is fast, but you also need to consider the network and the system itself. Network-related problems may generate high latencies. If the system starts swapping, it will very seriously impact Redis response times.
I tend to think that a timeout under 2 seconds is nonsense on any Unix/Linux system, and if a network is involved, I am much more comfortable with 10 seconds. People choose very low values because they want to keep their application from blocking: it is a mistake. Rather than setting very low timeouts and keeping the application synchronous, they should design the application to be asynchronous and set sensible timeouts.
After a timeout, a client should never "continue" with the next command. It should close the connection, and try to open a new one. If a reply (or a query) has been lost, it is unlikely that the client and the server can resynchronize. It is safer to close the connection.
Should you try to issue the INCR again after the reconnection? It is really up to you. But if a read timeout has just been triggered, there is a good chance the reconnection will time out as well. Redis being single-threaded, when it is slow for one connection, it is slow for all connections simultaneously.
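
A minimal TypeScript sketch of the close-and-reconnect policy described above. The `Connection` interface, the `open` factory and the `TimeoutAwareClient` class are hypothetical names made up for illustration; they are not the API of any real Redis client. The only point is that a timed-out connection is discarded and that the command is not resent automatically, so the caller decides what to do about an INCR whose outcome is unknown.

```typescript
// Hypothetical transport interface for illustration only.
interface Connection {
  // Sends one command and waits for its reply; returns null if the read timed out.
  request(command: string): string | null;
  close(): void;
}

type Outcome =
  | { status: "ok"; reply: string }
  | { status: "unknown" }; // timed out: the server may or may not have run the command

class TimeoutAwareClient {
  private conn: Connection;
  public reconnects = 0;

  constructor(private readonly open: () => Connection) {
    this.conn = open();
  }

  execute(command: string): Outcome {
    const reply = this.conn.request(command);
    if (reply !== null) {
      return { status: "ok", reply };
    }
    // After a timeout the reply stream can no longer be trusted, so drop the
    // connection instead of sending the next command on it.
    this.conn.close();
    this.conn = this.open();
    this.reconnects += 1;
    // The command is deliberately not resent here: for INCR, a blind retry
    // could increment the key twice.
    return { status: "unknown" };
  }
}
```

Whether to retry after an "unknown" outcome stays an application-level decision, exactly as in the answer.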

Related

In a Cloudflare Worker, why does the faster stream wait for the slower one when using the tee() operator to fetch to R2?

I want to fetch an asset into R2 and at the same time return the response to the client.
In other words, stream into R2 and to the client simultaneously.
Related code fragment:
    const originResponse = await fetch(request);
    // Split the body into two branches: [0] goes to R2, [1] goes to the client.
    const originResponseBody = originResponse.body!.tee();
    ctx.waitUntil(
      env.BUCKET.put(objectName, originResponseBody[0], {
        httpMetadata: originResponse.headers,
      })
    );
    return new Response(originResponseBody[1], originResponse);
I tested the download of a 1 GB asset over both a slower and a faster internet connection.
In theory the outcome (success or not) of the put to R2 should be the same in both cases, because it is independent of the client's internet connection speed.
However, when I tested both scenarios, the R2 write succeeded with the fast connection and failed with the slower connection. That means that the ctx.waitUntil 30-second timeout was exceeded in the case of the slower connection. The R2 put always "failed" when the client download took more than 30 seconds.
It seems that the R2 put (the reading of that stream) is backpressured to the speed of the slower consumer, namely the client download.
Is this because otherwise the worker would have to enqueue the already read parts from the faster consumer?
Am I missing something? Could someone confirm this or clarify this? Also, could you recommend a working solution for this use-case of downloading larger files?
EDIT:
The Cloudflare worker implementation of the tee operation is clarified here: https://community.cloudflare.com/t/why-the-faster-stream-waits-the-slower-one-when-using-the-tee-operator-to-fetch-to-r2/467416
It explains the behavior I observed.
However, a stable solution for the problem is still missing.
Cloudflare Workers limits the flow of a tee to the slower stream because otherwise it would have to buffer data in memory.
For example, say you have a 1GB file, the client connection can accept 1MB/s while R2 can accept 100MB/s. After 10 seconds, the client will have only received 10MB. If we allowed the faster stream to go as fast as it could, then it would have accepted all 1GB. However, that leaves 990MB of data which has already been received from the origin and needs to be sent to the client. That data would have to be stored in memory. But, a Worker has a memory limit of 128MB. So, your Worker would be terminated for exceeding its memory limit. That wouldn't be great either!
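
The arithmetic in this example can be written out as a small TypeScript sketch. The function name `bufferedMB` is made up for illustration, and 1 GB is taken as 1000 MB to match the round numbers in the text.

```typescript
// Data the runtime would have to hold in memory if the fast tee branch were
// allowed to run ahead of the slow one. Sizes in MB, rates in MB/s.
function bufferedMB(
  fileMB: number,
  fastRate: number,
  slowRate: number,
  seconds: number
): number {
  const readByFast = Math.min(fileMB, fastRate * seconds);
  const readBySlow = Math.min(fileMB, slowRate * seconds);
  return readByFast - readBySlow;
}

const WORKER_MEMORY_LIMIT_MB = 128; // limit quoted in the answer

// 1 GB file, R2 at 100 MB/s, client at 1 MB/s, after 10 seconds.
const backlog = bufferedMB(1000, 100, 1, 10);
console.log(backlog, backlog > WORKER_MEMORY_LIMIT_MB); // 990 true
```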
With that said, you are running into a bug in the Workers Runtime, which we noticed recently: waitUntil()'s 30-second timeout is intended to start after the response has finished. However, in your case, the 30-second timeout is inadvertently starting when the response starts, i.e. right after headers are sent. This is an unintended side effect of an optimization I made: when Workers detects that you are simply passing through a response body unmodified, it delegates pumping the stream to a different system so that the Worker itself doesn't need to remain in memory. However, this inadvertently means that the waitUntil() timeout kicks in earlier than expected.
This is something we intend to fix. As a temporary work-around, you could write your worker to use streaming APIs such that it reads each chunk from the tee branch and then writes it to the client connection in JavaScript. This will trick the runtime into thinking that you are not simply passing the bytes through, but trying to perform some modifications on them in JavaScript. This forces it to consider your worker "in-use" until the entire stream completes, and the 30-second waitUntil() timeout will only begin at that point. (Unfortunately this work-around is somewhat inefficient in terms of CPU usage since JavaScript is constantly being invoked.)
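
A rough TypeScript sketch of what such a work-around might look like. The helper name `pumpInJs` is hypothetical, and the code only assumes the standard `ReadableStream` and `TransformStream` classes; it has not been verified against the Workers runtime, so whether it actually changes when the waitUntil() timeout starts rests entirely on the behavior described above.

```typescript
// Copies a stream chunk by chunk in JavaScript instead of handing the
// original stream straight to the Response.
function pumpInJs(source: ReadableStream<Uint8Array>): ReadableStream<Uint8Array> {
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();

  (async () => {
    const reader = source.getReader();
    const writer = writable.getWriter();
    try {
      for (;;) {
        const result = await reader.read();
        if (result.done) break;
        // Awaiting the write propagates backpressure from the consumer.
        await writer.write(result.value);
      }
      await writer.close();
    } catch (err) {
      // Tear down both sides if either the source or the consumer fails.
      reader.cancel(err).catch(() => {});
      writer.abort(err).catch(() => {});
    }
  })();

  return readable;
}
```

With the code from the question, this would presumably be used as `new Response(pumpInJs(originResponseBody[1]), originResponse)`.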

In what types of workloads does multi-threaded I/O in Redis 6 make a difference?

My basic understanding is that all operations in Redis are single threaded. In Redis 6 there is multi-threaded I/O. I'm just curious what advantage this has if all the I/O threads still need to wait on the single thread that does all the querying? I was hoping someone could provide some example work loads that would illustrate the advantages or disadvantages.
"My basic understanding is that all operations in Redis are single threaded."
No. Even before Redis 6, there were some background threads, e.g. for background saving and for unlinking keys asynchronously.
"I'm just curious what advantage this has if all the I/O threads still need to wait on the single thread that does all the querying?"
Before Redis 6, Redis processed each request in 4 serial steps, all in a single thread:
1. reading the request from the socket
2. parsing it
3. processing it
4. writing the response to the socket
Until it finishes these 4 steps, Redis cannot process other requests, even if some requests are ready to be read (step 1). Writing the response to the socket (step 4) is normally slow, so if the write is done in another I/O thread (configuration: io-threads), Redis can process more requests and be faster.
You can also set Redis to run steps 1 and 2 in another I/O thread (configuration: io-threads-do-reads). However, the Redis team says that this normally doesn't help much ("Usually threading reads doesn't help much." -- quoted from redis.conf).
NOTE: since step 3 is always running in a single thread, Redis operations are still guaranteed to be atomic.
"someone could provide some example work loads that would illustrate the advantages or disadvantages."
"If you want to test the Redis speedup using redis-benchmark, make sure you also run the benchmark itself in threaded mode, using the --threads option to match the number of Redis threads, otherwise you'll not be able to notice the improvements." -- quoted from redis.conf
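
For reference, a sketch of how the two settings named above might look in redis.conf. The directive names come from this answer; the values (4 threads, reads enabled) are example assumptions, not recommendations, and should be sized to your own machine.

```conf
# redis.conf (Redis 6): example values only
io-threads 4
io-threads-do-reads yes
# When benchmarking, match the thread count on the client side, e.g.:
#   redis-benchmark --threads 4
```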

IBM MQ Multi-Instance Queues

My company uses IBM MQ's Multi-Instance Queues right now. We would like to replicate those queues to a different data center over the WAN for disaster recovery purposes. I'm skeptical it will work, simply because of all the message traffic: even a slight delay will cause the queues to fail.
What is the technical reason why this will not work?
Are you talking about storage replication? If so, are you planning to use synchronous or asynchronous replication?
Async will not cause any delay on the replicating end, but there will be some delay before the receiving end gets the data, depending on network distance. Your storage team should be able to tell you how many seconds the async replication delay could be.
With sync, the data is sent over the network by the storage array on the replicating end, and a confirmation comes back over the network before the storage array reports to the OS that the write was successful. To be usable, the two arrays have to be within 6 ms of each other. This type of replication adds a delay to each write equal to the network latency in ms.
MQ applications can batch messages into single units of work to improve performance when sync replication is in place, but persistent message performance will still be slowed down.
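
To get a feel for the numbers, here is a back-of-the-envelope TypeScript sketch. The function names are made up, the 1 ms local write time is an assumption, and the 6 ms network delay is just the limit mentioned above. The batching model assumes one replicated write per unit of work, which is a simplification rather than a description of how MQ actually logs.

```typescript
// Upper bound on strictly serialized writes per second when every write
// waits for the remote array's confirmation.
function maxSerialWritesPerSecond(localWriteMs: number, networkMs: number): number {
  return 1000 / (localWriteMs + networkMs);
}

// Replication delay attributed to each message when `batchSize` messages
// share one unit of work (assumed here to cost one replicated write).
function delayPerMessageMs(
  localWriteMs: number,
  networkMs: number,
  batchSize: number
): number {
  return (localWriteMs + networkMs) / batchSize;
}

// Assumed 1 ms local write plus 6 ms network delay.
console.log(maxSerialWritesPerSecond(1, 6)); // roughly 143 writes per second
console.log(delayPerMessageMs(1, 6, 10));    // 10 messages per unit of work
```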
How do you define "slight delay" in your statement?
Async replication will cause a delay, and the RPO will not be zero. Your storage team can advise on the RPO value. If that is not acceptable, async replication is not an option for you.
Although it is a pragmatic choice from a cost and distance standpoint, it could cause duplicate or missing transactions.
For sync replication, the distance between data centers is limited (apart from the performance hit on the primary DC). Check with your storage team on the distance limit.

Possible redis data corruption bug

I have seen a data problem happening in redis and I am wondering if my diagnosis is correct. Essentially when I'm doing a lot of writing to a server and reading using a Jedis client, I am seeing timeouts followed by incorrect data being returned by get() operations - the data makes sense but it's for a different key.
Here is what I think is happening:
1. Master is put under a lot of write load
2. Slave does a periodic bgsave
3. Slave tries to catch up to the master but it's gotten too far behind so it does a full re-sync
4. To serve the full re-sync, master does a bgsave of a 10GB+ data set while handling lots of reads and writes
5. Jedis client get() call times out before the data comes back from the server
6. The next get() call done on the same client reads the data that has been written in response to the one that timed out (since it actually arrives in the socket buffer after the timeout but before the next call)
7. From now on, every get() call returns the data intended for the previous one
My solution, which seems to work, is to close and reopen the connection every time a timeout exception is thrown.
Does this seem like a plausible explanation for what I am seeing?
What you are describing would not be a Redis bug but a Jedis one as the offset reads would be happening in the client.
In this case a workaround to reconnect on timeout would be reasonable and should work. I'd also recommend submitting it as a bug to Jedis.
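
The off-by-one reads described in the question can be reproduced with a toy TypeScript model. `ToyConnection` is a made-up class, not Jedis code: it only models replies arriving in order in a socket buffer, with a timed-out read leaving its reply behind.

```typescript
class ToyConnection {
  private socketBuffer: string[] = []; // unread replies, oldest first

  constructor(private readonly data: { [key: string]: string }) {}

  get(key: string, readTimesOut: boolean): string | null {
    // The server answers eventually, so the reply always reaches the buffer.
    this.socketBuffer.push(this.data[key]);
    if (readTimesOut) {
      return null; // the caller gives up, but the reply stays queued
    }
    // A read returns the oldest unread reply, which may belong to an earlier call.
    const reply = this.socketBuffer.shift();
    return reply === undefined ? null : reply;
  }
}

const data: { [key: string]: string } = { a: "value-a", b: "value-b", c: "value-c" };

// Reusing the connection after a timeout: every later reply is shifted by one.
const reused = new ToyConnection(data);
reused.get("a", true);                     // times out
const wrong = reused.get("b", false);      // "value-a"
const stillWrong = reused.get("c", false); // "value-b"

// Closing and reopening on timeout discards the stale reply.
let conn = new ToyConnection(data);
if (conn.get("a", true) === null) {
  conn = new ToyConnection(data);
}
const right = conn.get("b", false);        // "value-b"
```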

Low performance ActiveMQ

I am performance testing a piece of my code that works with ActiveMQ.
I use virtual topics in it. When I send about 1000 concurrent requests to enqueue my messages, it takes ages to enqueue them all, and sometimes it just hangs partway through and resumes after some time.
I am using the JDBC message store; I know some of the performance impact might be because of that.
Is this performance hit mainly due to virtual topics? On the ActiveMQ website they specify very high topic performance (under ideal conditions, of course).
P.S.: 1 message takes almost 13-15 milliseconds to be enqueued and dequeued, which is far higher than the performance ActiveMQ claims to have:
http://activemq.apache.org/performance.html
The performance hit is mainly because of the JDBC message store. Virtual Topics do not differ much in performance compared to durable subscriptions.
Please use LevelDB or KahaDB if you want performance. The JDBC store is mainly there for compatibility with setups that already use fail-over secured databases with backups etc. and want to use them for messages as well. You won't come even close to the numbers on the performance page with plain JDBC.
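
As a rough sketch, switching to KahaDB usually means replacing the JDBC persistence adapter inside the broker element of activemq.xml with something like the fragment below. The directory path is an example assumption; adjust it to your installation.

```xml
<!-- activemq.xml, inside the <broker> element; the directory is an example -->
<persistenceAdapter>
  <kahaDB directory="${activemq.data}/kahadb"/>
</persistenceAdapter>
```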