(GitOps Argo CD Notifications) How can I get an alert only for changes in GitHub? - amazon-eks

I use Argo CD Notifications and receive the alerts in Slack.
The problem is that the deploy alert fires in all of these cases:
- when a GitHub build and manifest update happen (the only case I want)
- when pods change because of scale-in/scale-out
- when a pod is recreated for some other reason
I want to be alerted only when a change in GitHub is detected.
What should I do?
The trigger I'm using:
triggers:
  trigger.on-deployed: |
    - description: Application is synced and healthy. Triggered once per commit.
      oncePer: app.status.syncResult.revision
      send:
      - app-deployed
      when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy'
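One thing that may be worth checking, as a sketch assuming a recent Argo CD release with the bundled notifications catalog (verify against the catalog shipped with your version): the catalog's on-deployed trigger sets oncePer to app.status.operationState.syncResult.revision, whereas the trigger above uses app.status.syncResult.revision, and the Application status has no top-level syncResult field. Since oncePer is what deduplicates the notification per synced revision, an expression that does not resolve to the revision may let the alert fire again each time the app returns to Healthy after pods are scaled or recreated. Only oncePer is changed below; the when condition is kept from the question.

```yaml
triggers:
  # oncePer deduplicates the alert per synced Git revision; in the Application
  # status, syncResult lives under operationState, not directly under status.
  trigger.on-deployed: |
    - description: Application is synced and healthy. Triggered once per commit.
      oncePer: app.status.operationState.syncResult.revision
      send:
      - app-deployed
      when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy'
```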

Related

How can I be alerted if a Fargate RunTask triggered by EventBridge fails

We have a very bursty load and use EventBridge to trigger tasks. Sometimes this fails silently: there are no failed invocations in the EventBridge rule, and CloudTrail shows that RunTask is executed, but there is no corresponding CreateLogStream (or, for that matter, any log in CloudWatch), so it seems that starting the task fails. I don't see any error anywhere.
In the AWS console, I can see that the task stopped with the reason "ResourceInitializationError: failed to configure ENI: failed to setup regular eni: netplugin failed with no error message".
Is there a way to detect this or automatically retry?

S3 - Kubernetes probe

I have the following situation:
The application uses S3 to store data in Amazon and is deployed as a pod in Kubernetes. Sometimes a developer misconfigures the S3 access credentials (e.g. user/password) and the application fails to connect to S3, but the pod still starts normally and replaces the previous pod version that worked fine (since all readiness and liveness probes pass). I thought of adding an S3 check to the readiness probe that executes a HeadBucketRequest against S3: if it succeeds, the application is able to connect to S3. The problem is that these requests cost money, and I really need them only on start of the pod.
Are there any best practices for this?
If, to quote you, you "... really need them [the probes] only on start of the pod", then look into adding a startup probe.
Besides their usual purpose (pods that take a long time to start), startup probes make it possible to verify a condition only at pod startup time: once a startup probe succeeds, it is not run again for that container, and the liveness and readiness probes take over.
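A minimal sketch of such a startup probe, assuming the image contains the AWS CLI and that the bucket name is passed in a hypothetical S3_BUCKET environment variable; the pod name, container name, bucket and thresholds are illustrative. aws s3api head-bucket exits with a non-zero status when the bucket cannot be reached with the configured credentials, so the probe fails; once it succeeds, no further S3 requests are made by the probe for that container.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-app                # hypothetical name
spec:
  containers:
  - name: app                 # hypothetical name
    image: IMAGE_NAME
    env:
    - name: S3_BUCKET         # hypothetical variable holding the bucket name
      value: my-bucket        # hypothetical bucket
    startupProbe:
      exec:
        # Runs HeadBucket with the container's credentials; a non-zero exit fails the probe.
        command:
        - /bin/sh
        - -c
        - aws s3api head-bucket --bucket "$S3_BUCKET"
      periodSeconds: 10       # retry every 10 s while the container is starting
      failureThreshold: 6     # after 6 failures (~60 s) the container is restarted
```

In a Deployment the same startupProbe block goes into the pod template; while the probe keeps failing, the new pod never becomes Ready, so the rolling update does not complete.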
Readiness and liveness probes are for checking the health of a pod or container while it is running. Your scenario is a bit unusual, and readiness and liveness probes won't work for it, because they fire at a regular interval and each of those S3 requests costs money.
In this case you could use a lifecycle hook:
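containers:
- name: app   # hypothetical name; every container needs one
  image: IMAGE_NAME
  lifecycle:
    postStart:
      exec:
        # script.sh must be on the container's PATH (or use an absolute path)
        command: ["/bin/sh", "-c", "script.sh"]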
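This runs the hook when the container starts. You can keep the shell script inside the image or in the pod.
Inside the script you can write the check logic: if S3 returns a 200 response, exit successfully and the container carries on starting. If the hook fails, Kubernetes kills the container, which is then restarted according to the pod's restartPolicy.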
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/

How to send notification e-mail when release doesn't succeed in Azure DevOps?

I am trying to send a notification from Azure DevOps when a test fails in the release pipeline. If the test fails, the release pipeline ends with the status Partially succeeded.
I can't find an option in Azure DevOps to send a notification when this pipeline fails.
Question: How to send notification e-mail when release doesn't succeed in Azure DevOps?
Create a new release notification subscription for "A deployment is completed". Add a new filter clause such that:
Deployment Status = Partially succeeded
or Deployment Status = Failed
If you want a test failure to count as a failure rather than a partial success, you will likely need to uncheck the "Continue on error" option under "Control Options" of the test task in your release pipeline.
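If the release were defined as a YAML pipeline instead of a classic release pipeline, the counterpart of the "Continue on error" checkbox is the step-level continueOnError property. A sketch, assuming the tests run with the VSTest@2 task; the test assembly pattern is a placeholder:

```yaml
steps:
- task: VSTest@2
  inputs:
    testAssemblyVer2: '**\*Tests.dll'   # placeholder pattern for the test assemblies
  # false (the default): a failing test fails the job instead of
  # leaving it "partially succeeded" (SucceededWithIssues)
  continueOnError: false
```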

Why is "await Publish<T>" hanging / not completing / not finishing

The following piece of code has been working for some time and it has suddenly stopped returning:
await availableChangedPublishEndpoint
    .Publish<IAvailableStockChanged>(
        AvailableStockCounter.ConvertSkuQtyToAvailableStockChangedEvent(
            newAvailable,
            absMessage.Warehouse)
    );
There is nothing clever in ConvertSkuQtyToAvailableStockChangedEvent - it just maps one simple class to another.
We added logs before and after this code, and it definitely stops at this point. Other systems are publishing fine, and other messages are being sent from this application (e.g. its logs are also sent via RabbitMQ). We have redeployed and upgraded to the latest MassTransit version. The messages do appear to be published, possibly multiple times, but this Publish call never returns.
We had a broken RabbitMQ node and a clean service restart on one node fixed it. I appreciate there might be other reasons for this behaviour, but this was our problem.
systemctl restart rabbitmq-server
Looking further into RabbitMQ, we saw that some of the empty queues bound to this exchange were not synchronized (see below), and our attempts to synchronize them failed.
We also couldn't delete some of these unsynchronized queues.
We believe an unexpected shutdown of one of the nodes caused this problem, but it left most queues and exchanges completely OK.

NServiceBus - Service Pulse - Can't connect to ServiceControl (http://localhost:33333/api/)

I am using NServiceBus 5. My messages are sending / receiving correctly, but I'm having trouble with Service Pulse. I have configured the auditing using the default endpoint names.
When I navigate to Service Pulse (http://localhost:9090/) I get the following error.
Can't connect to ServiceControl (http://localhost:33333/api/)
Looking at my services I see that Particular ServiceControl is not started. When I attempt to start it, it starts and immediately stops.
I have checked the logs at:
%LOCALAPPDATA%\Particular\ServiceControl\logs
and
%WINDIR%\System32\config\systemprofile\AppData\Local\Particular\ServiceControl\logs
But apart from the errors about the missing queues from yesterday (see below), there is nothing. When I attempt to restart the service now, I get no errors.
Does anyone know what I should do to get ServicePulse working correctly?
I deleted all my private queues yesterday, thinking they would be recreated automatically. Now I realise that only the endpoint queues are recreated, so I have recreated some manually.
Right now along with my endpoint queues I have:
audit
auditqueue
error
error.log
particular.servicecontrol
particular.servicecontrol.errors
particular.servicecontrol.retries
particular.servicecontrol.timeouts
particular.servicecontrol.timeoutsdispatcher
--- EDIT ---
I ended up just uninstalling and reinstalling ServiceControl, which fixed the problem:
ServiceControl --uninstall
ServiceControl --install
Try running ServiceControl --install in an administrator console; it will create the queues:
C:\Program Files (x86)\Particular Software\ServiceControl> .\ServiceControl.exe --install
If that doesn't work, you need to add these queues manually or reinstall ServiceControl:
particular.servicecontrol
particular.servicecontrol.errors
particular.servicecontrol.timeouts
particular.servicecontrol.timeoutsdispatcher