I'm trying to configure Nagios service notifications this way:
Send an email notification after the service has been in the warning state for 30 minutes
Send an email notification after the service has been in the critical state for 15 minutes
I can't get it working at all. I can't handle the situation where a warning lasts a short time and is followed by a critical (or vice versa). Nagios seems to add the time spent in these states together, which is not what I want.
Do you have any ideas on how to get it working as expected?
You can do two things depending on what you want, but be aware that Nagios is not time-based; it counts checks.
Use max_check_attempts and check_interval:
You can set the maximum number of checks before it sends an alert (the timing follows from the check interval). Example: check every 5 minutes with max_check_attempts set to 4; the alert is sent at the 15th minute if 4 checks in a row came back critical or warning.
Use escalation:
You can use Nagios escalations, which are defined in the documentation here:
http://nagios.sourceforge.net/docs/nagioscore/4/en/objectdefinitions.html#serviceescalation
What you can do is set a check period and say that between checks 1-5 it is a warning, and between checks 6-10 it is a critical if it has not been fixed in the meantime.
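The arithmetic behind the first option can be sketched in a few lines of Python. This is only an illustration, not Nagios code: the function names are made up, and it assumes that the first failing check counts as attempt 1 and that every recheck follows at a constant interval (5 minutes in the example above; in a real setup the recheck spacing may come from a separate retry interval setting).

```python
import math


def minutes_until_notification(max_check_attempts, interval_min):
    """Minutes between the first failed check and the notification.

    Assumption: the first failing check is attempt 1, and each later
    attempt is scheduled interval_min minutes after the previous one.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * interval_min


def attempts_for_delay(target_minutes, interval_min):
    """Smallest max_check_attempts that waits at least target_minutes."""
    return math.ceil(target_minutes / interval_min) + 1


# The example from the answer: 4 attempts, 5 minutes apart.
print(minutes_until_notification(4, 5))
# Attempts needed for the 30-minute and 15-minute goals in the question.
print(attempts_for_delay(30, 5), attempts_for_delay(15, 5))
```

Note that this counts consecutive non-OK checks as a whole. It does not give separate timers for warning and critical, which is the part the question is struggling with.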
I have an alarm set on Lambda iterator age. When it exceeds X over 5 minutes, it goes into alarm and I send an e-mail to a certain group.
The problem I have is how, or where, to set up a notification for when the alarm is resolved.
We have occasional blips which last only a few minutes more than the alarm itself (e.g. 6-7 minutes) and yes, I could extend and not trigger the alarm, but I'd love to get "Alarm is now resolved" e-mail rather than having people dropping everything and jumping on the problem.
I don't see that option. I tried copying the same alarm and setting it "inverted".
But now this one is "always on" and "Red", which is not what I want.
And I have the proper alarm, which is currently "not in alarm" and works as intended.
So, what are my options here? Do I need a composite alarm somehow?
In case anyone arrives here with the same problem (as of late 2021):
The AWS UI is not the greatest, so my initial thought was that "Add notification" meant adding an existing configuration, a sort of "apply".
It turns out that this is the place where you add another notification, or several, e.g. one for OK and one for the Insufficient data state.
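For anyone who would rather script this than click through the console, here is a rough Python sketch of the same idea. It only builds the parameter dictionary; the function name, alarm name, threshold and topic ARNs are placeholders I made up. The parameter names are the ones boto3's CloudWatch `put_metric_alarm` accepts, where `OKActions` is the counterpart of the extra "OK" notification in the UI.

```python
def build_alarm_params(function_name, threshold_ms, alarm_topic_arn, ok_topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    All argument values are hypothetical; substitute your own function
    name, threshold ("X" in the question) and SNS topic ARNs.
    """
    return {
        "AlarmName": f"{function_name}-iterator-age",  # made-up naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "IteratorAge",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 300,  # 5 minutes, as in the question
        "EvaluationPeriods": 1,
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [alarm_topic_arn],  # notified when entering ALARM
        "OKActions": [ok_topic_arn],        # notified when returning to OK
    }


params = build_alarm_params(
    "my-stream-consumer",
    60000,
    "arn:aws:sns:us-east-1:123456789012:alarm-topic",
    "arn:aws:sns:us-east-1:123456789012:resolved-topic",
)
# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Both actions can point at the same SNS topic if the same group should get the "resolved" e-mail.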
Just in case anyone ends up with the same problem.
Background
I have an application that sends HTTP requests to external servers. The application communicates with other services that enforce a strict rate limit policy, for example 5 calls per second. Any call above the allowed rate gets a 429 error code.
The application is deployed in the cloud and runs as multiple instances. The tasks come from a shared queue.
The allowed rate limit is synced between instances using the Redis rate limit pattern.
My current implementation
Assuming that the rate limit is 5 per second: I split time into multiple "windows". Each window has a maximum rate of 5. Before each call I check whether the counter is less than 5. If yes, I fire the request. If not, I wait for the next window (after a second).
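To make the windowing concrete, here is a minimal single-process sketch of that counter in Python. It is not my real code: the class name is made up, and a plain dictionary stands in for the shared Redis counter, so it shows the logic only and none of the network round trips.

```python
import time


class FixedWindowLimiter:
    """In-memory stand-in for a per-window counter kept in Redis."""

    def __init__(self, limit=5, window_seconds=1.0, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.counters = {}  # window index -> number of calls counted in it

    def try_acquire(self):
        """Return True if a call may be fired in the current window."""
        window = int(self.clock() // self.window_seconds)
        # Forget past windows (the shared store would expire these keys).
        self.counters = {w: c for w, c in self.counters.items() if w >= window}
        count = self.counters.get(window, 0) + 1  # the increment step
        self.counters[window] = count
        return count <= self.limit


# A fake clock keeps the demonstration deterministic.
now = [100.0]
limiter = FixedWindowLimiter(limit=5, window_seconds=1.0, clock=lambda: now[0])
first_window = [limiter.try_acquire() for _ in range(6)]  # sixth call is refused
now[0] = 101.0
next_window = limiter.try_acquire()  # a new window starts counting from zero
```

When `try_acquire()` returns False, the caller waits for the next window. The window is chosen at the moment the check starts, so any delay between the check and the actual request is not accounted for.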
The problem
In order to sync the application instances through Redis, I need two Redis calls: INCR and EXPIRE. Let's say that each call can take around 250ms to return, so the check takes ~500ms. As a result, in some cases you will be checking an old window, because by the time you get the answer the current second has already changed. If the next second then gets another 5 quick calls, the server will respond with 429.
Question
As you can see, this pattern does not really ensure that the rate of my application stays at or below 5 calls per second.
How would you recommend doing it right?
So I'm trying to understand what practical problems queues solve. From reading what I could find on Google, I get the high-level idea:
Push message to Queue for processing at a later time
So I'm looking at an architecture from Company A and they have different use cases for Job Queueing like for example
chat messages
file conversion
searching
heavy SQL queries
Why process it at a later time?
Here's my best guess...
Let's say I have an application that can process 10 "things" at a time.
My application then maxes out its processing capacity.
An 11th request comes in, so the app puts it in the queue for later processing.
Assuming this is a valid Use Case, wouldn't adding more servers to process more "things" make sense? Is it because it's more costly to add more servers than employ a Queue and sacrifice response time a little bit?
Given my Use Case examples, what other problems would Queues solve for them?
Have you ever lined up at a bank when it is busy? You would have waited in a queue.
"But," you could say, "wouldn't adding more staff to process more customers make sense? Is it because it's more costly to add more staff than employ a Queue and sacrifice response time a little bit?"
That would be correct. It can be quite costly to staff a bank based on the peak number of customers who would arrive each day. It is cheaper to staff below this level and have some customers wait in a queue.
Also, the number of customers each day is not 100% predictable. A queue allows excess demand to wait without breaking the system.
Queues enable decoupling.
For example, imagine an online store where customers purchase an item. They select the item, provide a credit card number and click 'Purchase'. If the credit card is declined, the online store can immediately prompt them to re-enter the number. This interaction has to take place immediately while the customer is still online.
However, there is no need to have the customer wait while an invoice is generated, a record is added to the accounting system and inventory is pulled off the shelf. This can be decoupled from the ordering process. A good way to do it is to push the order into a queue, which can be handled by the next system.
If that 'next system' happens to be offline at the moment, there is no reason to cancel the whole sale. The transaction can be processed when the 'next system' comes back online. This is much better than failing the whole process just because one component (which is not required immediately) has a failure.
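Here is a small Python sketch of that decoupling, using only the standard library. The names (`place_order`, `fulfilment_worker`) are invented for illustration, and an in-process `queue.Queue` stands in for whatever real message queue you would use.

```python
import queue
import threading

orders = queue.Queue()
processed = []


def place_order(order_id):
    """Runs while the customer waits: just enqueue the order and return."""
    orders.put(order_id)
    return "accepted"


def fulfilment_worker():
    """The 'next system': drains the queue whenever it is running."""
    while True:
        order_id = orders.get()
        if order_id is None:  # sentinel value: stop the worker
            orders.task_done()
            break
        processed.append(f"invoice for order {order_id}")
        orders.task_done()


# Orders are accepted even though the worker has not been started yet.
for oid in (1, 2, 3):
    place_order(oid)

# Later the 'next system' comes online and catches up on the backlog.
worker = threading.Thread(target=fulfilment_worker)
worker.start()
orders.put(None)
worker.join()
print(processed)
```

The customer-facing call returns as soon as the order is queued; nothing is lost while the worker is offline, it is simply processed later.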
Bottom line: Queues are excellent. They enable better handling of failures. They make things more resilient (just wait a few minutes and try again!). They should be used whenever the process is compatible with a queuing architecture.
Let's do scenarios
Scenario 1 without queue:
you request an endpoint /blabla/do-eveything/
this request does:
download an image from very slow FTP
e.g 1.5 sec (can error, retry ? add +X sec)
attach the image to an email
send an email (3 sec)
e.g 1 sec (can error, retry ? add +X sec)
confirmation received > store confirmation to a third company tracking stuff
e.g 1.5 (can error, retry ? add +X sec)
when tracking confirm, update your data from another third company for big data purpose
e.g 2 sec (can error, retry ? add +X sec)
... you get the idea
return the response e.g. 11 sec later (this is too slow), or more, or time out when everything has failed
The end user says the internet was faster 20 years ago and wonders whether to change their internet connection or their 16 threads
Scenario 2 queue everything you can:
you request an endpoint /blabla/do-eveything/
this request does:
Queue job "DO_EVERYTHING"
e.g 0.02 sec
Return the response in less than 0.250 sec
The end user says this website/app is so fast that they can keep their 56K internet connection
In a queue/event system, a failed job can be retried later without affecting the end user
you can pause a job, or add an unlimited number of tasks/steps after the original message
better fault tolerance
Working with queues allows a better micro/nano service architecture and better testing, because you can test a single job instead of a full controller that does everything...
Yes, it is maybe more work and more thinking, but in the end there is no need to think about work during the holidays
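The "retry later" part can be sketched like this in Python. It is a toy, single-process version under my own assumptions: `run_jobs` and the job names are hypothetical, a `deque` plays the role of the queue, and a real system would add delays between attempts and persist the jobs.

```python
from collections import deque


def run_jobs(jobs, max_attempts=3):
    """Run queued jobs; a failing job goes to the back of the queue for a retry.

    jobs: iterable of (name, callable) pairs.
    Returns two lists of names: (done, failed).
    """
    pending = deque((name, fn, 1) for name, fn in jobs)
    done, failed = [], []
    while pending:
        name, fn, attempt = pending.popleft()
        try:
            fn()
            done.append(name)
        except Exception:
            if attempt < max_attempts:
                pending.append((name, fn, attempt + 1))  # try again later
            else:
                failed.append(name)  # give up after max_attempts tries
    return done, failed


ftp_calls = {"n": 0}


def download_image():
    """Simulated slow FTP download that fails on its first two attempts."""
    ftp_calls["n"] += 1
    if ftp_calls["n"] < 3:
        raise RuntimeError("FTP timeout")


def store_tracking():
    """Simulated third-party call that is down the whole time."""
    raise RuntimeError("tracking service unavailable")


done, failed = run_jobs([
    ("send_email", lambda: None),
    ("download_image", download_image),
    ("store_tracking", store_tracking),
])
```

The end user already got the response when the job was queued; the retries and the final failure are handled in the background.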
Brief
I am working with an auto-renewable subscription-based app. When the app receives the latest receipt from Apple, it stores the expires_date_ms key in NSUserDefaults. Thirty days after that date, the app checks with Apple to see if the subscription is still active. The app can be considered an offline app, but it must connect to the internet once every 30 days in order to check the subscription status. This time comparison will be used to tell the user he/she must connect.
Problem
I am using the code below to compare the current time with the expires_date_ms:
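// Both values are in milliseconds since the Unix epoch.
NSTimeInterval expDateMS = [[productInfo objectForKey:@"expires_date_ms"] doubleValue];
NSTimeInterval currentDateMS = [[NSDate date] timeIntervalSince1970] * 1000;
// The subscription counts as expired once the device time passes the expiry date.
if (currentDateMS > expDateMS) {
    subExpired = YES;
}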
This is fine and works well, but from what I can tell there's a loophole that can be exploited: if the user sets the device's clock back an hour/month/decade, the time comparison becomes unreliable because [NSDate date] uses the device's current time (please correct me if I'm wrong).
Question
Is there any way of retrieving a device-independent time in milliseconds? One that can be measured accurately and reliably regardless of the device clock?
While Kevin and H2CO3 are completely correct, there are other solutions for the purposes of checking a subscription (which I would hope does not need millisecond accuracy....)
First, watch for UIApplicationSignificantTimeChangeNotification so that you are notified when the time changes suddenly. This will even be delivered to you if you were suspended (though I don't believe you will receive it if you were terminated). It is posted when there is a carrier time update, and I believe it is also posted when there is a manual time update (verify this). It is also posted at local midnight and at DST changes. The point is that it's posted pretty often when the time suddenly changes.
Keep track of what time it was when you go into the background. Keep track of what time it is when you come back into the foreground. If time moves radically backwards (more than a day or two), kindly suggest that you would like access to the network to check things. Whenever you check-in with your server, it should tell you what time it thinks it is. You can use that to synchronize the system.
You can similarly keep track of your actual runtime. If it gets wildly out of sync with apparent runtime, then again, request access to the network to sync things up.
I'm certain that attackers would be able to sneak 35 days or whatever out of this system rather than 30, but anyone willing to work that hard will just crack your software and take the check out entirely. The focus here is the uncommitted attacker who is just messing with their clock. And that you can catch pretty well.
You should test this carefully, and be very hesitant to accuse the user of anything. Just connecting to your server should always be enough to get a legitimate user working again.
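The background/foreground bookkeeping described above can be sketched as follows. This is Python rather than iOS code, purely to show the logic; the class and method names are made up, the one-day tolerance is just the "day or two" mentioned above, and a real app would persist the last seen time instead of keeping it in memory.

```python
import time

ONE_DAY = 24 * 60 * 60


class ClockWatch:
    """Remembers the latest time seen and flags large backward jumps."""

    def __init__(self, tolerance_seconds=ONE_DAY, wall_clock=time.time):
        self.tolerance_seconds = tolerance_seconds
        self.wall_clock = wall_clock
        self.last_seen = None

    def entered_background(self):
        self.last_seen = self.wall_clock()

    def entered_foreground(self):
        """Return True if the app should ask for a network check."""
        now = self.wall_clock()
        if self.last_seen is None:
            self.last_seen = now
            return False
        suspicious = self.last_seen - now > self.tolerance_seconds
        # Keep the latest time ever seen, so rolling the clock back stays flagged.
        self.last_seen = max(self.last_seen, now)
        return suspicious

    def synced_with_server(self, server_time):
        """A successful check-in resets the reference to the server's time."""
        self.last_seen = server_time


# Demonstration with a fake clock.
fake_now = [1_000_000.0]
watch = ClockWatch(wall_clock=lambda: fake_now[0])
watch.entered_background()
fake_now[0] += 3600                      # one hour passes normally
normal = watch.entered_foreground()
fake_now[0] -= 30 * ONE_DAY              # user sets the clock back a month
rolled_back = watch.entered_foreground()
watch.synced_with_server(fake_now[0])    # connecting to the server clears it
after_sync = watch.entered_foreground()
```

Nothing here accuses the user of anything; it only decides when to ask for a network check.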
You need to connect to, and retrieve information from, a reliable, official time server and use that time data in your app. For example, here's a world time server with an easy-to-use API.
Here are three options I can think of:
clock_gettime(CLOCK_MONOTONIC) gets the current system uptime. This is relatively unreliable because if the user reboots, this is reset. You could save the last value used and at launch use the last saved value as an offset, but the problem with this is that the time that the device was shut off for won't be calculated.
mach_absolute_time() counts the number of CPU ticks since the last reboot. It can be fetched easily through CACurrentMediaTime. Note that this can be reset simply by rebooting the device, so if detecting clock changes is very important, I'm not sure you would want to go this way.
Network Time Protocol (NTP) is a networking protocol for synchronizing the clocks of computer systems. In practice, using NTP just means querying a time server. An iOS library for NTP can be found here.
So the first two methods do not require connectivity, while the third does. However, the third method is the only foolproof one.
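To illustrate how a monotonic clock helps within a single boot, here is a small Python sketch (not iOS code). `time.monotonic` plays the role that the uptime-style clocks above play on the device; the class name and the 60-second tolerance are my own assumptions. It has the same limitation as described above: it tells you nothing across a reboot.

```python
import time


class DriftDetector:
    """Compares wall-clock elapsed time with monotonic elapsed time."""

    def __init__(self, wall_clock=time.time, monotonic=time.monotonic):
        self.wall_clock = wall_clock
        self.monotonic = monotonic
        self.wall_start = wall_clock()
        self.mono_start = monotonic()

    def drift_seconds(self):
        """Wall-clock elapsed minus monotonic elapsed since construction."""
        wall_elapsed = self.wall_clock() - self.wall_start
        mono_elapsed = self.monotonic() - self.mono_start
        return wall_elapsed - mono_elapsed

    def clock_was_changed(self, tolerance_seconds=60.0):
        """True if the two clocks disagree by more than the tolerance."""
        return abs(self.drift_seconds()) > tolerance_seconds


# Demonstration with fake clocks.
wall = [1000.0]
mono = [50.0]
detector = DriftDetector(wall_clock=lambda: wall[0], monotonic=lambda: mono[0])
wall[0] += 10
mono[0] += 10
untouched = detector.clock_was_changed()   # both clocks advanced together
wall[0] -= 3600                            # user sets the clock back an hour
tampered = detector.clock_was_changed()
```

A large disagreement is a reason to go and ask a trusted server for the time, not proof of anything by itself.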
There is no such thing as a non-mutable device clock that persists across reboots. The only way to get a trustworthy time is to contact a remote server that you trust and ask what its time is.
I am creating a BREW app that requests the user's position.
If the phone cannot acquire the position, I would like to display an error.
How long should I wait for my callback to be called before I determine that the phone is not likely to get a GPS fix?
When a cold start is required, the receiver has to download a full set of Ephemeris data, which is broadcast from the GPS satellite over a 30 second cycle and re-transmitted every 30 seconds.
So I would say that 60-90 seconds (two or three Ephemeris cycles) would be a suitable time to wait before declaring failure.
http://www.navigadget.com/index.php/gps-knowledge/ttff-time-to-first-fix
Note that if a device requires an almanac download, the startup time can be much longer (on the order of 12.5 to 15 minutes). This is referred to as a Factory TTFF (Time to First Fix).
I might go with an increment (say 20 or 30 seconds): after each one, notify the user that you have not yet established a link and give them the option to stop trying. Keep at it until they stop you, or until a set number of iterations has passed (say 5 - 10 iterations).
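Roughly, that loop could look like the Python sketch below. It is a blocking illustration of the idea rather than BREW code (which is callback-based); `wait_for_fix`, `has_fix` and `keep_trying` are names I made up, and the 30-second step and 10 rounds are just the numbers suggested above.

```python
import time


def wait_for_fix(has_fix, keep_trying, step_seconds=30, max_rounds=10,
                 sleep=time.sleep):
    """Poll for a position fix, checking in with the user after each round.

    has_fix: callable returning True once a position is available.
    keep_trying: callable(elapsed_seconds) -> bool; asks the user whether
        to continue after an unsuccessful round.
    Returns True if a fix arrived, False if the user or the limit stopped it.
    """
    for round_number in range(1, max_rounds + 1):
        sleep(step_seconds)
        if has_fix():
            return True
        if not keep_trying(round_number * step_seconds):
            return False
    return False


# Demonstration: the fix arrives on the third poll, the user always continues.
polls = {"n": 0}


def fake_has_fix():
    polls["n"] += 1
    return polls["n"] >= 3


prompts = []
got_fix = wait_for_fix(
    fake_has_fix,
    lambda elapsed: prompts.append(elapsed) or True,  # record prompt, keep going
    sleep=lambda seconds: None,                       # skip real waiting here
)
```

The user sees a "still trying" prompt after each unsuccessful round and can cancel at any of them.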
45-90 seconds.
For more information, see the GPS Time To First Fix article at Wikipedia.
But you can never know whether the user actually has a view of the satellites; maybe they are still indoors when they start your program. So the approach suggested by Matthew Vines is much better than a constant delay.
Cellphone-specifically, I've had a Motorola phone that had a GPS receiver, but was horrendously bad at it - it could take it around 5 minutes to get a fix where my standalone Bluetooth receiver would manage in less than a minute.
Why are you declaring failure after a fixed timeout anyway? Why not, after a reasonable time has passed (say, a minute), display a message to the tune of "GPS fix still not available; but I'm still trying" with a possibility to cancel at any time if the user is fed up? What do you expect the user to do with the failure message you're proposing to give him?