RabbitMQ API returning incorrect queue statistics - rabbitmq

I'm working with RabbitMQ instances hosted at CloudAMQP. I'm calling the management API to get detailed queue statistics. About 1 in 10 calls to the API return invalid numbers.
The endpoint is /api/queues/[vhost]/[queue]?msg_rates_age=600&msg_rates_incr=30. I'm looking for average message rates in 30-second increments over a 10-minute span. Usually that returns valid data for the stats I'm interested in, for example:
{
  "messages": 16,
  "consumers": 30,
  "message_stats": {
    "ack_details": {
      "avg_rate": 441
    },
    "publish_details": {
      "avg_rate": 441
    }
  }
}
But sometimes I get incorrect results for one or both "avg_rate" values, often 714676 or higher. If I then wait 15 seconds and call the same API again, the numbers go back down to normal. There's no way the average over 10 minutes jumps by a factor of more than 1,600 and then comes back down seconds later.
I haven't been able to reproduce the issue with a local install, only in production, where the queue is always very busy. The data displayed on the admin web page always looks correct. Is there some other way to get the same stats as accurately as the UI does?
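Since the rates are derived from sampled message counts, one option is to request the samples and compute (and sanity-check) the average rate yourself instead of trusting a single avg_rate value. A minimal Python sketch, assuming each *_details object in the response carries a "samples" list whose entries have a cumulative "sample" count and a "timestamp" in milliseconds; verify those field names against your broker's actual response. The outlier threshold is an arbitrary, hypothetical choice.

```python
from typing import Mapping, Optional, Sequence


def avg_rate_from_samples(samples: Sequence[Mapping[str, float]]) -> Optional[float]:
    """Average messages/second between the oldest and newest sample.

    Assumes entries shaped like {"sample": <cumulative count>, "timestamp": <ms>}.
    """
    if len(samples) < 2:
        return None
    ordered = sorted(samples, key=lambda s: s["timestamp"])
    first, last = ordered[0], ordered[-1]
    elapsed_s = (last["timestamp"] - first["timestamp"]) / 1000.0
    if elapsed_s <= 0:
        return None
    return (last["sample"] - first["sample"]) / elapsed_s


def is_plausible(rate: float, previous_rate: Optional[float], max_factor: float = 10.0) -> bool:
    """Reject a reading that jumps more than max_factor above the previous one
    (max_factor is a hypothetical tuning knob, not a RabbitMQ setting)."""
    if previous_rate is None or previous_rate <= 0:
        return True
    return rate <= previous_rate * max_factor
```

A reading like 714676 after a previous 441 would fail is_plausible, so the poller can simply discard it and retry on the next interval.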

Related

Stress test hits HTTP 400 errors with 600 threads

I'm stress testing two endpoints of my API: /api/register and /api/verify_tac/.
The request body for /api/register is:
{
  "provider_id": "lifecare.com.my",
  "user_id": ${random},
  "secure_word": "Aa123456",
  "id_type": "0",
  "id_number": "${id_number}",
  "full_name": "test",
  "gender": "F",
  "dob": "2009/11/11",
  "phone_number": ${random},
  "nationality": "MY"
}
where ${random} and ${id_number} come from a CSV Data Set Config.
The request body for /api/verify_tac/ is:
{
  "temp_token": "${temp_token}",
  "tac": "123456"
}
${temp_token} is extracted from the /api/register response body.
I ran six tests, each with a 60-second ramp-up period. The first five completed without a single error:
100 users: all requests succeeded.
200 users: all requests succeeded.
300 users: all requests succeeded.
400 users: all requests succeeded.
500 users: all requests succeeded.
600 users: most /api/register responses have an empty body, so /api/verify_tac/ returns an error. The body of a failing /api/verify_tac/ request is:
{
  "temp_token": "NotFound",
  "tac": "123456"
}
Why does test 6 return errors when the other five do not? They all use the same parameters.
Does this mean my API is overloaded with requests, or are my test parameters wrong?
If the response body is empty at 600 users, my expectation is that your application simply gets overloaded and cannot handle 600 users.
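The six runs do not actually share the same load profile: with the ramp-up fixed at 60 seconds, every extra 100 users adds roughly 1.7 new threads per second, so test 6 starts threads six times as fast as test 1. A minimal Python sketch of that arithmetic, assuming JMeter spreads thread starts evenly across the ramp-up period (the helper name is hypothetical):

```python
def thread_start_rate(users: int, ramp_up_s: float) -> float:
    """Average number of new threads started per second during ramp-up,
    assuming starts are spread evenly over the ramp-up period."""
    return users / ramp_up_s


for users in (100, 200, 300, 400, 500, 600):
    rate = thread_start_rate(users, 60)
    print(f"{users} users over 60 s -> {rate:.2f} new threads per second")
```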
You can add a listener such as Simple Data Writer, configured to log errors only and to save request and response data.
This way you will be able to see request and response details for failing requests. If you untick the Errors box, JMeter will store request and response details for all requests, so you will be able to see the response message, headers, body, etc. of the previous request and determine the failure reason.
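If the results are saved in JMeter's XML format with response data enabled, a short script can list the failing samples and group them. A minimal Python sketch, assuming each sample is an httpSample element with lb (label), rc (response code) and s (success) attributes and an optional responseData child; check these names against your own results file before relying on it:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def failed_samples(jtl_xml: str):
    """Return (label, response code, response body) for every failed sample.

    Assumes the XML results layout <httpSample lb="..." rc="..." s="true|false">
    with a <responseData> child when response data saving is switched on.
    """
    root = ET.fromstring(jtl_xml)
    failures = []
    for sample in root.iter("httpSample"):
        if sample.get("s") == "false":
            body = sample.findtext("responseData", default="")
            failures.append((sample.get("lb"), sample.get("rc"), body))
    return failures


def failures_by_label(failures):
    """Count failures per (label, response code) pair."""
    return Counter((label, code) for label, code, _ in failures)
```

To read a file, pass its contents, e.g. failed_samples(open("errors.jtl", encoding="utf-8").read()); the file name is hypothetical.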
Also it would be good to:
Monitor essential resource usage (CPU, RAM, disk, network, swap, etc.) on the application-under-test side, for example using the JMeter PerfMon Plugin
Check your application logs for any suspicious entries
Re-run your test under a .NET profiler such as YourKit; this way you will be able to see the most "expensive" functions, identify where the application spends most of its time, and find the root cause of the problem

Azure Function Apps - maintain max batch size with maxDequeueCount

I have the following host.json file:
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "maxPollingInterval": "00:00:02",
      "visibilityTimeout": "00:00:30",
      "batchSize": 16,
      "maxDequeueCount": 3,
      "newBatchThreshold": 8
    }
  }
}
I would expect that with this setup there could never be more than batchSize + newBatchThreshold instances running. But I realized that when messages are dequeued they run instantly instead of just being added to the back of the queue. This means you can end up with a very large number of instances, causing a lot of 429 (Too Many Requests) errors. Is there any way to configure the function app to just add the dequeued messages to the back of the queue?
It was not related to maxDequeueCount. The problem was that the app ran on a Consumption plan, where you can't control the number of instances. After changing to a Standard plan it worked as expected.
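The answer fits the documented behaviour of the queue trigger: batchSize + newBatchThreshold caps the messages a queue-triggered function processes concurrently on each instance, not across the whole app, so on a Consumption plan the total grows with however many instances the platform scales out to. A minimal Python sketch of that arithmetic, using the values from the host.json above (the instance counts are hypothetical):

```python
def max_concurrent_messages(instances: int, batch_size: int = 16, new_batch_threshold: int = 8) -> int:
    """Upper bound on messages one queue-triggered function processes at once.

    batch_size + new_batch_threshold is a per-instance limit, so the app-wide
    bound scales linearly with the number of instances.
    """
    return instances * (batch_size + new_batch_threshold)


for n in (1, 5, 20):
    print(f"{n} instance(s): up to {max_concurrent_messages(n)} concurrent messages")
```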

API throttle: X-RateLimit-Remaining does not reset every minute

I have an API built with Laravel 5.2, and I'm using the throttle middleware for rate limiting. In my routes file I have set the following:
Route::group(['middleware' => ['throttle:60,1'], 'prefix' => 'api/v1'], function () {
    // API routes go here
});
As I understand Laravel's throttling, the above sets the limit to 60 requests per minute. I have an application that queries the route every 10 seconds, so there are 6 requests per minute, which is well within the limit.
The problem is that my requests work until I have made 60 of them, regardless of how much time has passed. After the 60th request, the API responds with 429 Too Many Requests and the header Retry-After: 60. As I understand it, X-RateLimit-Remaining should reset every minute, but it never seems to reset until it reaches 0. Once it is zero, it waits 60 seconds and then resets.
Am I doing anything wrong here?
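For comparison, the behaviour the question expects is a fixed-window counter: the window opens on the first request and the count starts over once the decay period has passed. This is a generic Python model, not Laravel's RateLimiter implementation; under throttle:60,1 with one request every 10 seconds, such a limiter never reports fewer than 54 remaining.

```python
from typing import Optional


class FixedWindowLimiter:
    """Generic fixed-window limiter (a model, not Laravel's code)."""

    def __init__(self, max_attempts: int, decay_seconds: float):
        self.max_attempts = max_attempts
        self.decay_seconds = decay_seconds
        self.window_start: Optional[float] = None
        self.count = 0

    def hit(self, now: float) -> bool:
        """Record a request at time `now` (seconds); False means "429"."""
        # Start a fresh window on the first hit or once the old one expired.
        if self.window_start is None or now - self.window_start >= self.decay_seconds:
            self.window_start = now
            self.count = 0
        if self.count >= self.max_attempts:
            return False
        self.count += 1
        return True

    def remaining(self) -> int:
        """Attempts left in the current window, as of the last hit."""
        return self.max_attempts - self.count


limiter = FixedWindowLimiter(max_attempts=60, decay_seconds=60)
for t in range(0, 120, 10):  # one request every 10 seconds for two minutes
    limiter.hit(t)
    print(t, limiter.remaining())
```

If the real header keeps counting down across minutes, the counter's expiry is evidently not being applied the way this model assumes.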

ServerXmlHttpRequest hanging sometimes when doing a POST

I have a job that periodically does some work involving ServerXmlHttpRequest to perform an HTTP POST. The job runs every 60 seconds.
Normally it runs without issue. But there's about a 1 in 50,000 chance (every two or three months) that it will hang:
IXMLHttpRequest http = new ServerXmlHttpRequest();
http.open("POST", deleteUrl, false, "", "");
http.send(stuffToDelete); // <--- hangs here
When it hangs, not even the Task Scheduler (with the option enabled to kill the job if it takes longer than 3 minutes to run) can end the task. I have to connect to the remote customer's network, get on the server, and use Task Manager to kill the process.
And then it's good for another month or three.
Eventually I started using Task Manager to create a process dump, so I could analyze where the hang is. After five crash dumps (over the last 11 months or so) I get a consistent picture:
ntdll.dll!_NtWaitForMultipleObjects@20()
KERNELBASE.dll!_WaitForMultipleObjectsEx@20()
user32.dll!MsgWaitForMultipleObjectsEx()
user32.dll!_MsgWaitForMultipleObjects@20()
urlmon.dll!CTransaction::CompleteOperation(int fNested) Line 2496
urlmon.dll!CTransaction::StartEx(IUri * pIUri, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4453 C++
urlmon.dll!CTransaction::Start(const wchar_t * pwzURL, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4515 C++
msxml3.dll!URLMONRequest::send()
msxml3.dll!XMLHttp::send()
Contoso.exe!FrobImporter.TFrobImporter.DeleteFrobs Line 971
Contoso.exe!FrobImporter.TFrobImporter.ImportCore Line 1583
Contoso.exe!FrobImporter.TFrobImporter.RunImport Line 1070
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.HandleFrobImport Line 433
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.CoreExecute Line 71
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.Execute Line 84
Contoso.exe!Contoso.Contoso Line 167
kernel32.dll!@BaseThreadInitThunk@12()
ntdll.dll!__RtlUserThreadStart()
ntdll.dll!__RtlUserThreadStart@8()
So I call ServerXmlHttpRequest.send, and it never returns. It will sit there for days (causing the system to miss financial transactions), until on Sunday night I get a call that it's broken.
This is of no help unless someone knows how to debug at this level, but the registers of the stalled thread at the time of the dump are:
EAX 00000030
EBX 00000000
ECX 00000000
EDX 00000000
ESI 002CAC08
EDI 00000001
EIP 732A08A7
ESP 0018F684
EBP 0018F6C8
EFL 00000000
Windows Server 2012 R2
Microsoft IIS/8.5
Default timeouts of ServerXmlHttpRequest
You can use serverXmlHttpRequest.setTimeouts(...) to configure the four classes of timeouts:
resolveTimeout: The value is applied to mapping host names (such as "www.microsoft.com") to IP addresses; the default value is infinite, meaning no timeout.
connectTimeout: A long integer. The value is applied to establishing a communication socket with the target server, with a default timeout value of 60 seconds.
sendTimeout: The value applies to sending an individual packet of request data (if any) on the communication socket to the target server. A large request sent to a server will normally be broken up into multiple packets; the send timeout applies to sending each packet individually. The default value is 30 seconds.
receiveTimeout: The value applies to receiving a packet of response data from the target server. Large responses will be broken up into multiple packets; the receive timeout applies to fetching each packet of data off the socket. The default value is 30 seconds.
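Because sendTimeout and receiveTimeout apply per packet, they bound the gap between packets rather than the total duration of a request, and resolveTimeout is infinite by default. A minimal Python model of that per-packet rule (a sketch of the documented semantics above, not MSXML code; the names are hypothetical):

```python
from typing import Iterable, Optional

# Defaults listed above, in seconds; None means no timeout (infinite).
DEFAULT_TIMEOUTS = {"resolve": None, "connect": 60, "send": 30, "receive": 30}


def receive_would_time_out(packet_gaps_s: Iterable[float],
                           receive_timeout_s: Optional[float] = DEFAULT_TIMEOUTS["receive"]) -> bool:
    """A per-packet timeout fires only if a single gap between packets
    exceeds the limit, never because of the total elapsed time."""
    if receive_timeout_s is None:
        return False
    return any(gap > receive_timeout_s for gap in packet_gaps_s)


gaps = [20] * 30  # 30 packets, each arriving 20 s after the previous one
print(sum(gaps), "s total, timed out:", receive_would_time_out(gaps))
```

In this model, a response trickling in as 30 packets spaced 20 seconds apart takes 600 seconds in total without ever tripping the 30-second receive timeout.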
KB305053 (a server that decides to keep the connection open will cause ServerXMLHTTP to wait for the connection to close) seems like it could plausibly be the issue. But the 30-second default receive timeout should have taken care of that.
Possible workaround - Add myself to a Job
The Windows Task Scheduler is unable to terminate the task, even though the option to do so is enabled.
I will look into using the Windows Job API to add my own process to a job, and use SetInformationJobObject to set a time limit on my process:
CreateJobObject
AssignProcessToJobObject
SetInformationJobObject
to limit my process to three minutes of execution time:
PerProcessUserTimeLimit
If LimitFlags specifies JOB_OBJECT_LIMIT_PROCESS_TIME, this member is the per-process user-mode execution time limit, in 100-nanosecond ticks. Otherwise, this member is ignored.
The system periodically checks to determine whether each process associated with the job has accumulated more user-mode time than the set limit. If it has, the process is terminated.
If the job is nested, the effective limit is the most restrictive limit in the job chain.
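Since PerProcessUserTimeLimit is expressed in 100-nanosecond ticks, three minutes works out to 180 × 10,000,000 = 1,800,000,000 ticks. A minimal Python helper for the conversion (the function name is hypothetical; the Win32 field itself is a LARGE_INTEGER):

```python
TICKS_PER_SECOND = 10_000_000  # one tick = 100 nanoseconds


def to_job_time_ticks(seconds: float) -> int:
    """Convert seconds to the 100-ns units used by PerProcessUserTimeLimit."""
    return round(seconds * TICKS_PER_SECOND)


three_minutes = to_job_time_ticks(3 * 60)
print(three_minutes)
```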
Although, since Task Scheduler also uses Job objects to limit a task's time, I'm not hopeful that a Job object can limit my process either.
Edit: Job objects cannot limit a process by elapsed (wall-clock) time, only by user-mode CPU time. And a process that sits idle waiting on an object accumulates no user time, certainly not three minutes' worth.
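The point in the edit is easy to check with any CPU-time counter. A minimal Python sketch, using the standard-library time.process_time() as a stand-in for the user-time accounting a Job object performs (it reports user plus system CPU time of the current process, so it is an analogue, not the Job API itself): a thread that is only blocked accrues almost nothing.

```python
import time


def cpu_seconds_while_waiting(wait_s: float) -> float:
    """CPU time (user + system) this process consumes while it just blocks."""
    start = time.process_time()
    time.sleep(wait_s)  # blocks without doing CPU work, much like the hung wait
    return time.process_time() - start


used = cpu_seconds_while_waiting(0.5)
print(f"blocked for 0.5 s of wall-clock time, used {used:.4f} s of CPU time")
```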
Bonus Reading
How can a ServerXMLHTTP GET request hang? (GET, not POST)
KB305053: ServerXMLHTTP Stops Responding When You Send a POST Request (which says the timeout should expire, whereas mine does not)
MS Forums: oHttp.Send - Hangs (HEAD, not POST)
MS Forums: ASP to test SOAP WebService using MSXML2.ServerXMLHTTP Send hangs
CC to MS Support Forums
Consider switching to a newer, supported API.
msxml6.dll using MSXML2.ServerXMLHTTP.6.0
winhttpcom.dll using WinHttp.WinHttpRequest.5.1.
The msxml3.dll library is no longer supported and is only kept around for compatibility reasons. Plus, there were a number of security and stability improvements included with msxml4.dll (and newer) that you are missing out on.

Error returned by Nuance DragonMobile text-to-speech when maximum number of transactions is reached

I'm about to release my iOS app, which uses the Nuance Dragon Mobile SDK. I'm signed up for the "Silver" plan, which allows me 20 transactions per day.
Does anyone know what error Nuance returns when the limit is exceeded? I'm concerned because I am filtering out:
error.code == 5 // Because this fires whenever I interrupt running speech
error.code == 1 // Because after interrupting speech, the first time I restart, it cuts off
// before finished, so I automatically start again, so as not to trouble the user to do so
I figure that if Nuance returns an error other than these, I'll let it pass through and can alert the user that they've reached their daily limit.
I think the following lists the possible errors:
extern NSString * const SKSpeechErrorDomain;
enum {
SKServerConnectionError = 1,
SKServerRetryError = 2,
SKRecognizerError = 3,
SKVocalizerError = 4,
SKCancelledError = 5,
};
It seems likely to me that it's the SKServerConnectionError that would be fired. In that case, I need to come up with a different strategy. If I could figure out what's going on with the restart issue I wouldn't have to filter out that error. Plus, when I automatically restart these false starts, I'm probably racking up my transaction count, which is unfortunate.
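One possible strategy, sketched under stated assumptions: keep suppressing cancellations, but treat a connection error as the "cut-off restart" glitch only when it directly follows an interruption and only for a limited number of silent restarts, so a repeated code 1 (which might be the daily-limit error) still reaches the user and restarts stop consuming transactions. A minimal Python model of that decision; the codes mirror the enum quoted above, while the retry cap and function name are hypothetical and this is not Nuance SDK code:

```python
from enum import IntEnum


class SKError(IntEnum):
    # Mirrors the SKSpeechErrorDomain codes quoted above.
    SERVER_CONNECTION = 1
    SERVER_RETRY = 2
    RECOGNIZER = 3
    VOCALIZER = 4
    CANCELLED = 5


MAX_AUTO_RESTARTS = 1  # hypothetical cap on silent restarts after an interruption


def should_suppress(code: int, just_interrupted: bool, auto_restarts: int) -> bool:
    """Return True if the error should be hidden from the user."""
    if code == SKError.CANCELLED:
        # Expected whenever running speech is interrupted.
        return True
    if code == SKError.SERVER_CONNECTION:
        # Only the first error right after an interruption is treated as the glitch.
        return just_interrupted and auto_restarts < MAX_AUTO_RESTARTS
    return False
```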
Does anybody have experience with this aspect of the Nuance SDK for iOS?