Is it possible to read out or receive webhook events when the health state of a backend changes?
We would like to post the events to an emergency Slack channel so that we can remediate the situation whenever a backend is unhealthy. We could set up a different monitoring solution, such as Rancher's health checks or Grafana alerts, but it seems it would be less trouble and more reliable to obtain our operations alerts directly from our Traefik instance.
To my knowledge, that is not feasible right now.
The closest you can get is to use a metrics system like Prometheus and set up an alert for too many failing health checks.
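As a rough illustration of the alerting side, here is a minimal Python sketch that turns failure counts into a Slack message. It assumes you already have per-backend failure counts from your metrics system (how you query them is not shown), and the function names, the threshold, and the webhook URL are all hypothetical. The only external detail relied on is that a Slack incoming webhook accepts a JSON body with a `text` field.

```python
import json
import urllib.request


def unhealthy_backends(failures, threshold=3):
    """Return the names of backends whose failure count meets the threshold."""
    return sorted(name for name, count in failures.items() if count >= threshold)


def slack_payload(backends):
    """Build the JSON body for a Slack incoming webhook."""
    text = "Unhealthy backends: " + ", ".join(backends)
    return json.dumps({"text": text}).encode("utf-8")


def post_to_slack(webhook_url, backends):
    """POST the alert. webhook_url is a placeholder you must supply yourself."""
    request = urllib.request.Request(
        webhook_url,
        data=slack_payload(backends),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

If you already run Prometheus, its Alertmanager can deliver alerts to Slack on its own, so a script like this is only worth it when you want custom logic in between.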
I have a working monolith application (deployed in a container), for which I want to add notifications feature as a separate microservice.
I'm planning for the monolith to emit events to a message bus (RabbitMQ) where they will be received by the new service, which will send the notification to the user. In order to compose a notification, it will need other information about the user from the monolith, so it will call the monolith's REST API to obtain it.
The problem is that access to the monolith's API requires authentication in the form of a token. I was thinking of:
using the secret from the monolith to issue a never-expiring token - I don't think this is a great idea from the security perspective, and I also know that the keys sometimes rotate, in which case the token would eventually become invalid anyway
using the message bus to retrieve the information - this does not seem a good idea either as the asynchrony would make it very complicated
providing all the info the notification service needs in the event - this would make them more coupled together, and moreover, I plan to also send notifications based on the state on the monolith not triggered by an event
removing the authentication from the monolith and implementing it differently (not sure how yet)
My question is, what are some of the good ways this kind of problem can be solved, and also, having just started learning about microservices, is what I am trying to do right in the first place?
When dealing with internal security you should always consider the deployment and how the APIs are exposed to the outside world. An API gateway might be used to simply make it impossible to access internal APIs from outside. In that case, a fixed token might be good enough to ensure that the client is authorized.
In general, though, I would suggest looking into OAuth2 or a JWT-based solution as it helps to validate the identities of the calling system as well as their access grants.
As for your architecture doubts, you need to consider the following scenarios when building out the solution:
The remote call can fail at any time for unknown reasons, so you shouldn't acknowledge the notification event until you're certain that the notification has been processed successfully.
As you've mentioned RabbitMQ, you should aim to keep the notification queue as small as possible. To that end, a cache that contains the user details might help speed things along (and help you reduce the chance of failure due to the external system not being available).
If your application sends a lot of notifications to potentially millions of different users, you could consider having a read-only database replica of the users which is accessible to the notification service, and read directly from the database cluster in batches. This reduces the load on the monolith and shifts it to the database layer.
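The first two points can be sketched roughly as follows. This is a broker-agnostic illustration, not a definitive implementation: `NotificationWorker`, `fetch_user`, and `send` are hypothetical names, the event is assumed to be a dict with `user_id` and `message` keys, and the cache is a plain dict with no expiry.

```python
class NotificationWorker:
    def __init__(self, fetch_user, send):
        self._fetch_user = fetch_user  # hypothetical call to the monolith's REST API
        self._send = send              # hypothetical delivery function (email, push, ...)
        self._cache = {}               # user_id -> user details, never expires in this sketch

    def _user(self, user_id):
        # Only call the monolith when the user is not cached yet.
        if user_id not in self._cache:
            self._cache[user_id] = self._fetch_user(user_id)
        return self._cache[user_id]

    def handle(self, event):
        """Return True to acknowledge the event, False to leave it for redelivery."""
        try:
            user = self._user(event["user_id"])
            self._send(user, event["message"])
        except Exception:
            # The remote call or the delivery failed, so do not acknowledge.
            return False
        return True
```

With a real RabbitMQ client you would acknowledge the delivery only when `handle` returns `True` and negatively acknowledge or requeue it otherwise. A production cache would also need an expiry so stale user details do not linger.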
Hello I am trying to create a simple push-notification system similar to this common use case:
1. The user gets a chest and can either watch an ad to skip the wait time or wait one hour for the chest to open. The app sends an upstream request which sets up a downstream push notification that shall be delivered in one hour to let the user know the chest is ready.
2a. The user then waits an hour, gets a push notification (outside of the app) to open their chest and they do!
or
2b. They wait 20 minutes then decide to watch the ad. The app sends an upstream request which cancels the pending push notification which would have otherwise been delivered in 40 minutes.
Okay awesome so that is the problem and I am having a hard time understanding how to do this. I have looked over the documentation for each of these programs but they seem designed for downstream push notifications. It just seems odd there is no built-in support for this use case. It seems like such a common use case.
So far I have found 3 solutions that will integrate into my cross-platform Unity setup and provide services for free or super-cheap:
Amazon Simple Notification Service (SNS)
Google Firebase Cloud Messaging (FCM)
OneSignal
Amazon seems to group clients into "Topics", so I guess I would essentially be setting up a one-device topic. I can subscribe to and unsubscribe from topics, but it doesn't seem to support a topic with a 60 minute delay.
2a. Create a topic: https://docs.aws.amazon.com/sns/latest/dg/sns-tutorial-create-topic.html (it would just include the current device)
2b. Subscribe to it
2c. Send a message to it https://docs.aws.amazon.com/sns/latest/dg/sns-tutorial-publish-message-with-attributes.html
So basically I can add attributes to my message but it would seem I need to implement the server-side code to read a delay attribute then somehow queue a message for delay. Maybe I am missing something?
For Firebase I pretty much see the same thing as Amazon. There are topics https://firebase.google.com/docs/cloud-messaging/android/topic-messaging and a means to send upstream messages https://firebase.google.com/docs/cloud-messaging/android/send-with-console but with the messages I don't see any way to get the time delay here https://firebase.google.com/docs/cloud-messaging/unity/topic-messaging. I see conditions towards the bottom of that article but I don't know if they are meant for this use case.
OneSignal has the easiest to scroll-through API. I'll refer to some strings that you can CTRL-F by using the format ("Create Notif") because everything is on this one page: https://documentation.onesignal.com/reference
So basically I can ("Send to Specific Devices") which I guess would be the sending device, then I can ("Schedule notification for future delivery.") using the send_after parameter. And finally, if need be, I can ("Cancel notification"). So this appears to be everything I need. I'm currently looking at this option and trying to figure out how to actually get this working.
So there is my progress over the last few hours researching each of these options. I am hoping you can help me better understand how I may be misunderstanding the above options as this seems to me a very common use-case. Perhaps I am just not googling the question correctly. Any help appreciated.
Whenever there's a likelihood that you'll need to cancel a significant percent of the notifications you send, you should use local notifications. That way you can easily schedule and cancel them locally without making any network requests. Also, this solution works for offline devices which is great for games (played on planes, etc...)
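To make the schedule-then-cancel flow concrete, here is a platform-neutral Python sketch of the bookkeeping involved. It is only a model: `LocalNotificationScheduler` and its methods are hypothetical, times are plain seconds, and in a real game you would call the local notification API of your platform or engine instead.

```python
class LocalNotificationScheduler:
    def __init__(self):
        self._pending = {}  # notification_id -> (fire_at, message)

    def schedule(self, notification_id, fire_at, message):
        """Register a notification to fire at the given time (seconds)."""
        self._pending[notification_id] = (fire_at, message)

    def cancel(self, notification_id):
        """Return True if a pending notification was cancelled."""
        return self._pending.pop(notification_id, None) is not None

    def pop_due(self, now):
        """Remove and return the messages whose fire time has passed, oldest first."""
        due = sorted(
            (fire_at, nid, message)
            for nid, (fire_at, message) in self._pending.items()
            if fire_at <= now
        )
        for _, nid, _ in due:
            del self._pending[nid]
        return [message for _, _, message in due]
```

In the chest example, the app would call `schedule("chest", 3600, "Your chest is ready")` when the chest is earned and `cancel("chest")` if the user watches the ad after 20 minutes, all without a network request.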
If the API gateway (the single entry point to the system) fails, none of the services can be accessed. Is there any HA (High Availability) design to handle API gateway failure?
1) Depending on where your project is hosted, you can choose a second region as part of your disaster recovery plan. Whenever something fails in one region, you can immediately switch to the other region just by changing the endpoint.
2) You can use a service like Route 53 to split your traffic between two regions or two API gateways. That way at least part of your traffic keeps flowing even if one API gateway fails.
3) Always keep CloudWatch alarms in place to get notified about any failures in your system.
4) It is very unlikely that an API gateway will fail. It is AWS, my friend.
"node_saini" has a great response and it's correct. I tried to comment but don't have the reputation to do so yet... the comment would say:
5) Configure your timeout to fail ASAP based on baselines and implement retries with exponential backoff on 5xx errors to alleviate any small percentage of failures which may occur.
With all applications, temporary failures are expected but permanent failures after retry can be a sign of a real problem brewing.
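A minimal Python sketch of point 5, assuming the request is wrapped in a callable that returns an HTTP status code. The function name, the default attempt count, and the base delay are illustrative choices, not values from any particular SDK.

```python
import time


def call_with_retries(request, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call request() until it returns a non-5xx status or attempts run out.

    Assumes max_attempts is at least 1. Returns the last status seen.
    """
    for attempt in range(max_attempts):
        status = request()
        if status < 500:
            return status
        if attempt < max_attempts - 1:
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return status
```

Passing `sleep` as a parameter keeps the function easy to test. In production you would typically also add random jitter to the delay so that many clients do not retry in lockstep.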
I have a service that handles messages that persist data to an external system. If (a.k.a. when) the writing of this data to the external system fails, our normal monitoring strategy will alert system admins of the failure.
I would like to also notify the user who submitted the message that there is a delay in processing their request.
What is the best way to accomplish this scenario? I've looked into IManageMessageFailures, but it seems that it will bypass the SLR functionality.
Starting with version 5.1, NServiceBus has the ability to use Reactive Extensions to observe when a message is sent to an error queue. From there, you can log, email, or do whatever best meets your needs.
http://docs.particular.net/nservicebus/subscribing-to-push-based-error-notifications
Why don't you try and separate the two concerns?
Manage the 3rd party interaction in a saga, and if it fails, send a failure notification message (you can use timeout to cater for no proper reply).
We're trying to track down some performance issues with an application and would like to figure out how many outgoing requests are being triggered by each page load. I can find lots of counters for showing incoming WCF connections, but nothing tracking how many are going out. Any ideas short of retrofitting all of the pages with custom counters?
You could set up tracing and use the Trace View Tool or you could go lower level and use Wireshark to monitor the outgoing network activity.