Every now and then we received a large set of timeouts (around peak time for website traffic) with lots of logs in the following form:
Timeout performing GET (5000ms)
next: GET ObjectPageView.120.633.0
inst: 21
qu: 0
qs: 0
aw: False
bw: SpinningDown
rs: ReadAsync
ws: Idle
in: 0
last-in: 0
cur-in: 0
sync-ops: 456703
async-ops: 1
conn-sec: 72340.11
mc: 1/1/0
mgr: 10 of 10 available
IOCP: (Busy=0 Free=1800 Min=600 Max=1800)
WORKER: (Busy=720 Free=1080 Min=600 Max=1800)
v: 2.6.90.64945
What do sync-ops and conn-sec stand for? The rest of the numbers seem fine, but these seem high and I'm not entirely sure what they are describing.
These are statistic about the current connection:
"sync-ops" a count of synchronous operations (as opposed to async-ops for asynchronous operations) performed on the current connection.
"conn-sec" is the current duration of the current connection (from connected until now)
Related
What does "bw: SpinningDown" mean in this error -
Timeout performing GET (5000ms), next: GET foo!bar!baz, inst: 5, qu: 0, qs: 0, aw: False, bw: SpinningDown, ....
Does it mean that the Redis server instance is spinning down, or something else?
It means something else actually. The abbreviation bw stands for Backlog-Writer, which contains the status of what the backlog is doing in Redis.
For this particular status: SpinningDown, you actually left out the important bits that relate to it.
There are 4 values being tracked for workers being Busy, Free, Min and Max.
Let's take these hypothetical values: Busy=250,Free=750,Min=200,Max=1000
In this case there are 50 more existing (busy) threads than the minimum.
The cost of spinning up a new thread is high, especially if you hit the .NET-provided global thread pool limit. In which case only 1 new thread is created every 500ms due to throttling.
So once the Backlog is done processing an item, instead of just exiting the thread, it will keep it in a waiting state (SpinningDown) for 5 seconds. If during that time there still is more Backlog to process, the same thread will process another item from the Backlog.
If no Backlog item needed to be processed in those 5 seconds, the thread will be exited, which will eventually lead to a decrease in Busy (existing) threads.
This only happens for threads above the Min count of course, as those will be kept alive even if there is no work to do.
.NET Framework 4.7 and StackExchange.Redis version=2.5.43.
I am only seeing this error when the cache is on the server but not when running locally with the cache running in a container on my machine.
StackExchange.Redis.RedisTimeoutException:
'Timeout performing EXISTS (10000ms),
next: PSETEX MyKey-f79c9cad-c265-e611-80d8-005056b35bfa,
inst: 190,
qu: 39996,
qs: 176,
aw: True,
bw: Flushing,
rs: DequeueResult,
ws: Flushing,
in: 0,
in-pipe: 0,
out-pipe: 528424,
serverEndpoint: MyServer:6380,
mc: 1/1/0,
mgr: 9 of 10 available,
clientName: MyClient(SE.Redis-v2.5.43.42402),
IOCP: (Busy=0,Free=1000,Min=12,Max=1000),
WORKER: (Busy=1,Free=2046,Min=12,Max=2047),
v: 2.5.43.42402 (Please take a look at this article for some common client-side issues
that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)'
I have tried increasing the SyncTimeout configuration to 10000 but no difference.
The out-pipe value looks quite high but I am not sure what this is an indication of, or if it is a red herring.
Any ideas what could cause this timeout?
Thanks
As pointed out by #slorello the high "qu" and "qs" seems to be the issue where some big payloads blocking the pipe.
Breaking the big payloads down into smaller payloads seems to stopped the timeout. I will also investigate ConnectionMultiplexer pooling as recommended here in point 10
I got this timeout exception suddenly when I try to persist a range of data, it was working before and I didn't do any changes:
Timeout performing HMSET {key}, inst: 0, mgr: ExecuteSelect, err:
never, queue: 2, qu: 1, qs: 1, qc: 0, wr: 1, wq: 1, in: 0, ar: 0,
clientName: {machine-name}, serverEndpoint:
Unspecified/localhost:6379, keyHashSlot: 2689, IOCP:
(Busy=0,Free=1000,Min=4,Max=1000), WORKER:
(Busy=0,Free=2047,Min=4,Max=2047), Local-CPU: 100% (Please take a look
at this article for some common client-side issues that can cause
timeouts:
https://github.com/StackExchange/StackExchange.Redis/tree/master/Docs/Timeouts.md)
I'm using Redis on windows.
In your timeout error message, I see Local-CPU: 100%. This is the CPU on your client that is calling into Redis server. You might want to look into what is causing the high CPU load on your client.
This article describes why high CPU usage can lead to client-side timeouts. https://gist.github.com/JonCole/db0e90bedeb3fc4823c2#high-cpu-usage
So, I battled with this issue for a few days and almost gave up. Like #Amr Reda said, breaking a large sets into smaller ones might work but that's not optimal.
In my case, I was trying to move 27,000 records into redis and i kept encountering the issue.
To resolve the issue, increase the SyncTimeout value in your redis connection string. It's set by default to 1000ms ie 1second. Large datasets typically take longer to add.
I found out what causing the issue, as I was trying to bulk inserting into hash. What I did is that I chunked the inserted list into smaller ones.
Quick suggestions that worked in my case, using a console .net project with very high concurrency using multithread (around 30.000).
In the program.cs, I added some ThreadPool settings:
int newWorkerThreadsPerCore = 50, newIOCPPerCore = 100;
ThreadPool.SetMinThreads(newWorkerThreadsPerCore, newIOCPPerCore);
Also, I had to change everything from:
var redisValue = dbCache.StringGet("SOMETHING");
To:
var redisValue = dbCache.StringGetAsync("SOMETHING").Result;
Even if you might think they look almost the same (considering you always end up waiting for a result), if you use the non-async version and one single thread receives a redis timeout, it will make all the other 29.999 threads waiting for redis to timeout too, while the async one will only cause a timeout in that only single thread.
We are receiving following timeout exception while retrieving data from Redis cache.
'Timeout performing GET inst: 2, mgr: Inactive, err: never, queue: 3, qu: 0, qs: 3, qc: 0, wr: 0, wq: 0, in: 18955,
IOCP: (Busy=4,Free=996,Min=2,Max=1000), WORKER: (Busy=0,Free=1023,Min=2,Max=1023),
Please note: Every timeout exception has different above values. queue is sometimes 2,1,3 and qs also varies with the queue value.
Also, IN: values keeps changing like 18955, 65536, 36829 etc.
Even IOCP changes like
IOCP: (Busy=6,Free=994,Min=2,Max=1000), WORKER: (Busy=0,Free=1023,Min=2,Max=1023).
Please note:
There are many similar questions in stack overflow and tried all of them. But, no luck.
We recently updated nuget package to the latest stable version (v1.2.1) of StackExchange.Redis library,
This exception seems to be occuring at the same place everytime even though there are various places where we are using redis cache. This has been found with the help of stack trace.
Also, we never faced this issue earlier like we are using the same solution from last 3 years and never encountered this issue. This exception has been occurring from last 3 months frequently atleast 3-4 times daily.
It looks like you are experiencing threadpool throttling (from the Busy and Min numbers in your error message). You will need to increase the MIN values for IOCP and Worker pool threads.
https://gist.github.com/JonCole/e65411214030f0d823cb#file-threadpool-md has more information.
I have a job that periodically does some work involving ServerXmlHttpRquest to perform an HTTP POST. The job runs every 60 seconds.
And normally it runs without issue. But there's about a 1 in 50,000 chance (every two or three months) that it will hang:
IXMLHttpRequest http = new ServerXmlHttpRequest();
http.open("POST", deleteUrl, false, "", "");
http.send(stuffToDelete); <---hang
When it hangs, not even the Task Scheduler (with the option enabled to kill the job if it takes longer than 3 minutes to run) can end the task. I have to connect to the remote customer's network, get on the server, and use Task Manager to kill the process.
And then its good for another month or three.
Eventually i started using Task Manager to create a process dump,
so i could analyze where the hang is. After five crash dumps (over the last 11 months or so) i get a consistent picture:
ntdll.dll!_NtWaitForMultipleObjects#20()
KERNELBASE.dll!_WaitForMultipleObjectsEx#20()
user32.dll!MsgWaitForMultipleObjectsEx()
user32.dll!_MsgWaitForMultipleObjects#20()
urlmon.dll!CTransaction::CompleteOperation(int fNested) Line 2496
urlmon.dll!CTransaction::StartEx(IUri * pIUri, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4453 C++
urlmon.dll!CTransaction::Start(const wchar_t * pwzURL, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4515 C++
msxml3.dll!URLMONRequest::send()
msxml3.dll!XMLHttp::send()
Contoso.exe!FrobImporter.TFrobImporter.DeleteFrobs Line 971
Contoso.exe!FrobImporter.TFrobImporter.ImportCore Line 1583
Contoso.exe!FrobImporter.TFrobImporter.RunImport Line 1070
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.HandleFrobImport Line 433
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.CoreExecute Line 71
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.Execute Line 84
Contoso.exe!Contoso.Contoso Line 167
kernel32.dll!#BaseThreadInitThunk#12()
ntdll.dll!__RtlUserThreadStart()
ntdll.dll!__RtlUserThreadStart#8()
So i do a ServerXmlHttpRequest.send, and it never returns. It will sit there for days (causing the system to miss financial transactions, until come Sunday night i get a call that it's broken).
It is of no help unless someone knows how to debug code, but the registers in the stalled thread at the time of the dump are:
EAX 00000030
EBX 00000000
ECX 00000000
EDX 00000000
ESI 002CAC08
EDI 00000001
EIP 732A08A7
ESP 0018F684
EBP 0018F6C8
EFL 00000000
Windows Server 2012 R2
Microsoft IIS/8.5
Default timeouts of ServerXmlHttpRequest
You can use serverXmlHttpRequest.setTimeouts(...) to configure the four classes of timeouts:
resolveTimeout: The value is applied to mapping host names (such as "www.microsoft.com") to IP addresses; the default value is infinite, meaning no timeout.
connectTimeout: A long integer. The value is applied to establishing a communication socket with the target server, with a default timeout value of 60 seconds.
sendTimeout: The value applies to sending an individual packet of request data (if any) on the communication socket to the target server. A large request sent to a server will normally be broken up into multiple packets; the send timeout applies to sending each packet individually. The default value is 30 seconds.
receiveTimeout: The value applies to receiving a packet of response data from the target server. Large responses will be broken up into multiple packets; the receive timeout applies to fetching each packet of data off the socket. The default value is 30 seconds.
The KB305053 (a server that decides to keep the connection open will cause serverXmlHttpRequest to wait for the connection to close) seems like it plausibly could be the issue. But the 30 second default timeout would have taken care of that.
Possible workaround - Add myself to a Job
The Windows Task Scheduler is unable to terminate the task; even though the option is enabled to do do.
I will look into using the Windows Job API to add my self process to a job, and use SetInformationJobObject to set a time limit on my process:
CreateJobObject
AssignProcessToJobObject
SetInformationJobObject
to limit my process to three minutes of execution time:
PerProcessUserTimeLimit
If LimitFlags specifies
JOB_OBJECT_LIMIT_PROCESS_TIME, this member is the per-process
user-mode execution time limit, in 100-nanosecond ticks. Otherwise,
this member is ignored.
The system periodically checks to determine
whether each process associated with the job has accumulated more
user-mode time than the set limit. If it has, the process is
terminated.
If the job is nested, the effective limit is the most
restrictive limit in the job chain.
Although since Task Scheduler uses Job objects to also limit a task's time, i'm not hopeful that the Job Object can limit a job either.
Edit: Job objects cannot limit a process by process time - only user time. And with a process idle waiting for an object, it will not accumulate any user time - certainly not three minutes worth.
Bonus Reading
How can a ServerXMLHTTP GET request hang? (GET, not POST)
KB305053: ServerXMLHTTP Stops Responding When You Send a POST Request (which says the timeout should expire; where mine does not)
MS Forums: oHttp.Send - Hangs (HEAD, not POST)
MS Forums: ASP to test SOAP WebService using MSXML2.ServerXMLHTTP Send hangs
CC to MS Support Forums
Consider switching to a newer, supported API.
msxml6.dll using MSXML2.ServerXMLHTTP.6.0
winhttpcom.dll using WinHttp.WinHttpRequest.5.1.
The msxml3.dll library is no longer supported and is only kept around for compatibility reasons. Plus, there were a number of security and stability improvements included with msxml4.dll (and newer) that you are missing out on.