Why is this monit config reminder syntax not resulting in repeated alerts? - monit

I have setup a monit config to check that a jenkins build node is connected (its VPN connection is still up) by checking for its VPN IP address on a server that is inside the network already. It seems to work at least once when the computer is not connected. But it only seems to trigger once in a blue moon and not repeatedly like I want it to.
check host JenkinsMacOSXNode with address 192.168.237.10
if failed icmp type echo
count 5 with timeout 5 seconds
2 times within 3 cycles
then alert with reminder on 3 cycles
alert admin#ourdomain.com
Is the above syntax correct for having an alert sent repeatedly when an expected computer is not pingable?
In case the next question is how often is the cycle set to, the /etc/monit/monitrc indicates set daemon 120 so each cycle should be every 2 minutes
Is there a better way to accomplish checking for a computer that should be connected via VPN to the network and alert if it is not?

Try setting the alert with reminder definition before the test:
set alert admin#ourdomain.com with reminder on 3 cycles
check host JenkinsMacOSXNode with address 192.168.237.10
if failed icmp type echo
count 5 with timeout 5 seconds
2 times within 3 cycles
then alert

Related

Catchpoint pause vs. waitForNoRequest - What's the difference?

I have a test that was alerting because it was taking extra time for an asset to load. We changed from waitForNoRequest to a pause (at Catchpoint's suggestion). That did not seem to have the expected effect of waiting for things to load. We increased the pause from 3000 to 12000 and that helped to allow the page to load and stop the alert. We noticed some more alerts, so I tried to increase the pause to something like 45000 and it would not allow me to pause for that long.
So the main question here is - what functionality does both of these different features provide? What do I gain by pausing instead of waiting, if anything?
Here's the test, data changed to protect company specific info. Step 3 is where we had some failures and we switched between pause and wait.
// Step - 1
open("https://website.com/")
waitForNoRequest("2000")
click("//*[#id=\"userid\"]")
type("//*[#id=\"userid\"]", "${username}")
setStepName("Step1-Login-")
// Step - 2
clickMouseAndWait("//*[#id=\"continue\"]")
waitForVisible("//*[#id=\"challenge-password\"]")
click("//*[#id=\"challenge-password\"]")
type("//*[#id=\"challenge-password\"]", "${password}")
setStepName("Step2-Login-creds")
// Step - 3
clickMouseAndWait("//*[#id=\"signIn\"]")
setStepName("Step3-dashboard")
waitForTitle("Dashboard")
waitForNoRequest("3000")
click("//*[#id=\"account-header-wrapper\"]")
waitForVisible("//*[#id=\"logout-link\"]")
click("//*[#id=\"logout-link\"]")
// Step - 4
clickAndWait("//*[text()=\"Sign Out\"]")
waitForTitle("Login - ")
verifyTextPresent("You have been logged out.")
setStepName("Step5-Logout")
Rachana here, I’m a member of the Technical Service Team here at Catchpoint, I’ll be happy to answer your questions.
Please find the differences below between waitForNoRequest and Pause commands:
Pause
Purpose: This command pauses the script execution for a specified amount of time, whether there are HTTP/s requests downloading or not. Time value is provided in milliseconds, it can range between 100 to 30,000 ms.
Explanation: This command is used when the agent needs to wait for a set amount of time and this is not impacted by the way the requests are loaded before proceeding to the next step or command. Only a parameter is required for this action.
WaitForNoRequest
Purpose: This commands waits for a specified amount of time, when there was no HTTP/s requests downloading. The wait time parameter can range between 1,000 to 5,000 ms.
Explanation: The only parameter for this action is a wait time. The agent will wait for that specified amount of time before moving onto the next step/command. Which will, in return, allow necessary requests more time to load after document complete.
For instance when you add waitforNoRequest(5000), initially agent waits 5000 ms after doc complete for any network activity. During that period if there is any network activity, then the agent waits another 5000 ms for the next network activity to end and the process goes on until no other request loads within the specified timeframe(5000 ms).
A pause command with 12000 ms, gives exactly 12 seconds to load the page. After 12 seconds the script execution will continue to next command no matter the page is loaded or not.
Since waitForNoRequest has a max time value of 5000 ms, you can tell the agent to wait for a gap of 5 seconds when there is no network activity. In this case, the page did not have any network activity for 3 seconds and hence proceeded to the next action. The page was not loaded completely and the script failed.
I tried to increase the pause to something like 45000 and it would not allow me to pause for that long.
We allow a maximum of 30 seconds pause time hence 45 seconds will not work.
Please reach out to our support team and we’ll be glad to connect you with our scripting SMEs and help you with any scripting needs you might have.

Modem config match

I have two similar modems, when I insert the SIM in the first modem it connects automatically to the network. But if I insert the same SIM in the second modem, it doesn't connect to the network.
I launched the command: AT&V to read the profile of each modem. I compared the settings and they are all the same except for the following:
+CGDCONT: (1,"IP","cmnet","0.0.0.0",0,0)
+CGDCONT: (1,"IP","internet","0.0.0.0",0,0)
----------------------
+CIND: 0,3,1,0,0,0,1,0
+CIND: 0,0,0,0,0,0,0,0
----------------------
+CGATT: 1
+CGATT: 0
----------------------
+COPS: 1,0,""
+COPS: 0,2,""
----------------------
Q1: Could one of these settings cause the problem?
Q2: Is there a way to save/restore a modem config?
NB. The first setting of each pair is of the working modem.
Looks like the APN of the second modem is different from the first one. The APN of second modem is "internet", while the first one is "cmnet". This can cause the problem (first one is attached while the second one did not: +CGATT 1 vs 0), if the network does not support "internet" APN.
You can set same APN for the second modem as the first one to have a try. i.e.
AT+CGDCONT =1,"IP","cmnet"
But, APN difference is only one of the possible reasons. For analyzing the actual reason of attach failure, logs are needed.

ServerXmlHttpRequest hanging sometimes when doing a POST

I have a job that periodically does some work involving ServerXmlHttpRquest to perform an HTTP POST. The job runs every 60 seconds.
And normally it runs without issue. But there's about a 1 in 50,000 chance (every two or three months) that it will hang:
IXMLHttpRequest http = new ServerXmlHttpRequest();
http.open("POST", deleteUrl, false, "", "");
http.send(stuffToDelete); <---hang
When it hangs, not even the Task Scheduler (with the option enabled to kill the job if it takes longer than 3 minutes to run) can end the task. I have to connect to the remote customer's network, get on the server, and use Task Manager to kill the process.
And then its good for another month or three.
Eventually i started using Task Manager to create a process dump,
so i could analyze where the hang is. After five crash dumps (over the last 11 months or so) i get a consistent picture:
ntdll.dll!_NtWaitForMultipleObjects#20()
KERNELBASE.dll!_WaitForMultipleObjectsEx#20()
user32.dll!MsgWaitForMultipleObjectsEx()
user32.dll!_MsgWaitForMultipleObjects#20()
urlmon.dll!CTransaction::CompleteOperation(int fNested) Line 2496
urlmon.dll!CTransaction::StartEx(IUri * pIUri, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4453 C++
urlmon.dll!CTransaction::Start(const wchar_t * pwzURL, IInternetProtocolSink * pOInetProtSink, IInternetBindInfo * pOInetBindInfo, unsigned long grfOptions, unsigned long dwReserved) Line 4515 C++
msxml3.dll!URLMONRequest::send()
msxml3.dll!XMLHttp::send()
Contoso.exe!FrobImporter.TFrobImporter.DeleteFrobs Line 971
Contoso.exe!FrobImporter.TFrobImporter.ImportCore Line 1583
Contoso.exe!FrobImporter.TFrobImporter.RunImport Line 1070
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.HandleFrobImport Line 433
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.CoreExecute Line 71
Contoso.exe!CommandLineProcessor.TCommandLineProcessor.Execute Line 84
Contoso.exe!Contoso.Contoso Line 167
kernel32.dll!#BaseThreadInitThunk#12()
ntdll.dll!__RtlUserThreadStart()
ntdll.dll!__RtlUserThreadStart#8()
So i do a ServerXmlHttpRequest.send, and it never returns. It will sit there for days (causing the system to miss financial transactions, until come Sunday night i get a call that it's broken).
It is of no help unless someone knows how to debug code, but the registers in the stalled thread at the time of the dump are:
EAX 00000030
EBX 00000000
ECX 00000000
EDX 00000000
ESI 002CAC08
EDI 00000001
EIP 732A08A7
ESP 0018F684
EBP 0018F6C8
EFL 00000000
Windows Server 2012 R2
Microsoft IIS/8.5
Default timeouts of ServerXmlHttpRequest
You can use serverXmlHttpRequest.setTimeouts(...) to configure the four classes of timeouts:
resolveTimeout: The value is applied to mapping host names (such as "www.microsoft.com") to IP addresses; the default value is infinite, meaning no timeout.
connectTimeout: A long integer. The value is applied to establishing a communication socket with the target server, with a default timeout value of 60 seconds.
sendTimeout: The value applies to sending an individual packet of request data (if any) on the communication socket to the target server. A large request sent to a server will normally be broken up into multiple packets; the send timeout applies to sending each packet individually. The default value is 30 seconds.
receiveTimeout: The value applies to receiving a packet of response data from the target server. Large responses will be broken up into multiple packets; the receive timeout applies to fetching each packet of data off the socket. The default value is 30 seconds.
The KB305053 (a server that decides to keep the connection open will cause serverXmlHttpRequest to wait for the connection to close) seems like it plausibly could be the issue. But the 30 second default timeout would have taken care of that.
Possible workaround - Add myself to a Job
The Windows Task Scheduler is unable to terminate the task; even though the option is enabled to do do.
I will look into using the Windows Job API to add my self process to a job, and use SetInformationJobObject to set a time limit on my process:
CreateJobObject
AssignProcessToJobObject
SetInformationJobObject
to limit my process to three minutes of execution time:
PerProcessUserTimeLimit
If LimitFlags specifies
JOB_OBJECT_LIMIT_PROCESS_TIME, this member is the per-process
user-mode execution time limit, in 100-nanosecond ticks. Otherwise,
this member is ignored.
The system periodically checks to determine
whether each process associated with the job has accumulated more
user-mode time than the set limit. If it has, the process is
terminated.
If the job is nested, the effective limit is the most
restrictive limit in the job chain.
Although since Task Scheduler uses Job objects to also limit a task's time, i'm not hopeful that the Job Object can limit a job either.
Edit: Job objects cannot limit a process by process time - only user time. And with a process idle waiting for an object, it will not accumulate any user time - certainly not three minutes worth.
Bonus Reading
How can a ServerXMLHTTP GET request hang? (GET, not POST)
KB305053: ServerXMLHTTP Stops Responding When You Send a POST Request (which says the timeout should expire; where mine does not)
MS Forums: oHttp.Send - Hangs (HEAD, not POST)
MS Forums: ASP to test SOAP WebService using MSXML2.ServerXMLHTTP Send hangs
CC to MS Support Forums
Consider switching to a newer, supported API.
msxml6.dll using MSXML2.ServerXMLHTTP.6.0
winhttpcom.dll using WinHttp.WinHttpRequest.5.1.
The msxml3.dll library is no longer supported and is only kept around for compatibility reasons. Plus, there were a number of security and stability improvements included with msxml4.dll (and newer) that you are missing out on.

Mono error when load testing

During load testing (using Load UI) of a new .Net web api using Mono hosted on a medium sized Amazon server I'm receiving the following results (in chronological order over the course of about ten minutes)
5 connections per second for 60 seconds
No errors
50 connections per second for 60 seconds
No errors
100 connections per second for 60 seconds
Received 3 errors, appearing later during the run
2014-02-07 00:12:10Z Error HttpResponseExtensions Error occured while Processing Request: [IOException] Write failure Write failure|The socket has been shut down
2014-02-07 00:12:10Z Info HttpResponseExtensions Failed to write error to response: {0} Cannot be changed after headers are sent.
5 connections per second for 60 seconds
No errors
100 connections per second for 30 seconds
No errors
100 connections per second for 60 seconds
Received 1 error same as above, appearing later during the run
100 connections per second for 45 seconds
No errors
Doing some research on this, this error seems to be a standard one received when a client closed the connection. As this is only occurring during the heavier load tests, I am wondering if it is just getting to the upper limits of what the server instance can support? If not any suggestions on hunting down the source of the errors?

How to get priority of current job?

In beanstalkd
telnet localhost 11300
USING foo
put 0 100 120 5
hello
INSERTED 1
How can I know what is the priority of this job when I reserve it? And can I release it by making the new priority equals to current priority +100?
Beanstalkd doesn't return the priority with the data - but you could easily add it as metadata in your own message body. for example, with Json as a message wrapper:
{'priority':100,'timestamp':1302642381,'job':'download http://example.com/'}
The next message that will be reserved will be the next available entry from the selected tubes, according to priority and time - subject to any delay that you had requested when you originally sent the message to the queue.
Addition: You can get the priority of a beanstalk job (as well as a number of other pieces of information, such as how many times it has previously been reserved), but it's an additional call - to the stats-job command. Called with the jobId, it returns about a dozen different pieces of information. See the protocol document, and your libraries docs.