Detect and set up alerts on clusters being dropped from Envoy (reverse proxy)

We are using Envoy as a reverse proxy and have a few static/dynamic clusters. I need a way to monitor all the static clusters (all are critical) and create alerts whenever any of them is not reachable. The alerts will help the team take timely action.
I am new to Envoy and exploring its features. It would be helpful if someone could answer or point me to the right resource.
Thanks.

As far as I know, this is not possible out of the box with Envoy, but you can use something like Prometheus and Alertmanager to monitor your clusters and create alerts.
If you have the admin interface set up (https://www.envoyproxy.io/docs/envoy/v1.21.1/operations/admin), you can query /stats/prometheus to get metrics.
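For reference, enabling the admin interface only needs a small admin block in the Envoy bootstrap config. A minimal sketch (the listen address and port below are example values, not from the question):

# Minimal admin section of an Envoy bootstrap config.
# The address and port are example values; pick ones appropriate for your deployment.
admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

Prometheus can then scrape http://<envoy-host>:9901/stats/prometheus.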
The following metrics can be interesting in your case:
envoy_cluster_update_failure{envoy_cluster_name="my-cluster"}: increases when the cluster is not reachable
envoy_cluster_update_success{envoy_cluster_name="my-cluster"}: increases when the cluster is reachable
I am not an expert in Prometheus/Alertmanager, but something like:
increase(envoy_cluster_update_failure{envoy_cluster_name="my-cluster"}[1m]) > 0
should trigger an alert when the cluster my-cluster becomes unreachable.
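For completeness, a minimal sketch of a Prometheus alerting rule built around that expression; the group name, alert name, and severity label are placeholders:

# Hypothetical alerting rule file loaded via rule_files in prometheus.yml.
groups:
  - name: envoy-clusters
    rules:
      - alert: EnvoyClusterUpdateFailure
        expr: 'increase(envoy_cluster_update_failure{envoy_cluster_name="my-cluster"}[1m]) > 0'
        labels:
          severity: critical
        annotations:
          summary: "Envoy cluster {{ $labels.envoy_cluster_name }} reported update failures in the last minute"

Alertmanager can then route alerts carrying that severity label to your paging or chat integration.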

Related

CoTurn Data Usage Stats on Multi User System

We want to track each user's TURN usage separately. I inspected the TURN REST API; as far as I understand, it is only used to authorize a user that already exists in the Coturn DB. This is the part I don't fully understand. I can use an ICE server list that includes different username/credential pairs, but I must have created these username/credential pairs in the Coturn DB beforehand. Am I right? If I am, I have no idea how to do this. The flow I have in mind: detect the user's request to use TURN from the frontend -> generate credentials as in CoTURN: How to use TURN REST API? (which I have already achieved) -> if this is a new user, my backend should somehow reach my EC2 instance and run the "turnadmin" create-user command -> then allow the WebRTC connection -> then track the usage of that specific user and send it back to my backend somehow.
Is this a valid scenario? If not, how should it be done? Is there another way to manage multiple users and their data usage? Any help would be appreciated.
As far as I understand, to get the stats data we must use a Redis database. I tried to use it and could see the traffic data (with psubscribe turn/realm/*/user/*/allocation/*/traffic), but the other subscribe events never fired (psubscribe turn/realm/*/user/*/allocation/*/traffic/peer or psubscribe turn/realm/*/user/*/allocation/*/total_traffic, even after the allocation is deleted). So I tried to get past traffic data from the Redis DB, but I couldn't find out how. In Redis, the KEYS * command returns only "status" events.
Even if I get this traffic data, I don't see how to use it with multiple users. Currently our project has a single user (in Coturn terms) and the other users use TURN through this one user.
By the way, we tried to track the usage on the peer connection object created from the RTCPeerConnection interface. I noticed that the incoming byte counts are lower than the Redis output, so I think some traffic is missed there and I should calculate it on the TURN side.

Is there a way to programmatically get all IgniteQueue & IgniteCache proxies for the caches & queues created on the whole Ignite cluster?

I am currently running Ignite 2.5 and wondering if there is a way to programmatically get all IgniteQueue and IgniteCache proxies for the caches and queues created across the whole Ignite cluster, or their configurations. For caches I think I can get the configuration from IgniteConfiguration if the cache is a configured one, or from the IgniteCache proxy. Can queues be configured as well, and how do I get their configuration?
I see, for example, Ignite#cacheNames(), which I think returns all cache names, including the ones created internally for queues? I am going to try it, but I want to make sure I don't rely on something that is not documented or intended for this purpose.
The intention is to recreate the queues/caches programmatically if they are no longer present in the cluster.
Thanks
UPDATE 1:
Thanks #alex-k for confirming there is no public API to get queue configurations the way there is for caches; it would be nice to have this support.
You can use Ignite.cacheNames() to get cache names, and Ignite.configuration().getCacheConfiguration() to get the configs.
There are no public APIs to get all queue names.
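A rough sketch of the recreate-if-missing idea using only those public APIs, assuming the caches were declared in this node's IgniteConfiguration:

import java.util.Collection;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheRecreator {
    public static void main(String[] args) {
        // Assumes an Ignite node has already been started in this JVM
        // with the desired cache configurations in its IgniteConfiguration.
        Ignite ignite = Ignition.ignite();

        // All cache names currently present in the cluster, including
        // caches created internally for data structures such as queues.
        Collection<String> existing = ignite.cacheNames();

        // Cache configurations known to this node's IgniteConfiguration.
        CacheConfiguration<?, ?>[] configured = ignite.configuration().getCacheConfiguration();

        if (configured != null) {
            for (CacheConfiguration<?, ?> cfg : configured) {
                if (!existing.contains(cfg.getName())) {
                    // Recreate a configured cache that is missing from the cluster.
                    ignite.getOrCreateCache(cfg);
                }
            }
        }
    }
}

Queues cannot be enumerated this way; you would need to track queue names yourself (for example, in a cache) and recreate them with Ignite#queue(...).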

How to scale out Apache Atlas

There is no information in the Atlas documentation on how to scale it.
Apache Atlas is backed by Cassandra or HBase, which can scale out, but I don't know how the Apache Atlas engine itself (the REST web service and request processor) can scale out.
I can install multiple instances of it on different machines and put a load balancer in front to fan out requests. But would this model help? Does it do any kind of locking or DB transactions that would make this model not work?
Does anyone know how Apache Atlas scales out?
Thanks.
Apache Atlas runs Kafka as the message queue under the covers, and in my experience, the way the Kafka queue is designed (a consumer group for which you should ONLY have ONE consumer) is the choke point.
Not only that, but when you look at the code, the consumer has a 1-second broker poll time hard-coded into it. Put these two together, and it means that if the consumer can't process the messages from the various producers (Hive, Spark, etc.) within that second, the broker disengages the ONLY consumer and waits for a non-existent consumer to pick up the messages...
I need to design something similar, but this is as far as I have got...
Hope that helps somewhat...
Please refer to this page. http://atlas.apache.org/#/HighAvailability
Atlas does not support actual horizontal scale-out.
All the requests are handled by the 'Active instance'. The 'Passive instances' just forward all requests to the 'Active instance'.

Is it possible to read the health check state out of Traefik?

Is it possible to read out the health state of a backend, or to receive webhook events when it changes?
We would like to post the events to an emergency Slack channel so that we can remediate the situation whenever a backend is unhealthy. We could set up a different monitoring solution, such as Rancher's health checks or Grafana alerts, but it seems it would be less trouble and more reliable to obtain our operations alerts from our Traefik instance directly.
That is not feasible right now to my knowledge.
The closest you can get is to use a metrics system like Prometheus and set up a proper alert for too many failing health checks.
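A rough sketch of such a rule, assuming Traefik's Prometheus metrics endpoint is enabled. Note that the per-server health gauge is named traefik_backend_server_up in Traefik v1 and traefik_service_server_up in v2, so verify the exact metric and label names against your own /metrics output:

# Hypothetical Prometheus alerting rule; metric and label names depend on the Traefik version.
groups:
  - name: traefik-health
    rules:
      - alert: TraefikBackendServerDown
        expr: 'traefik_service_server_up == 0'
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Traefik reports backend server {{ $labels.url }} as down"

Alertmanager's Slack receiver could then forward these alerts to the emergency channel mentioned above.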

ActiveMQ: get the number of messages consumed/produced per second

Is there any way in ActiveMQ to get the number of messages consumed/produced per second or per minute at the broker end?
I have tried the JMeter configuration from http://activemq.apache.org/jmeter-performance-tests.html, but there are hardly any performance metrics I can gather.
Thanks
If you want to write this yourself, you should use JMX on your broker. The Broker MBean has "TotalEnqueueCount" and "TotalDequeueCount" attributes. You can poll those values at specific intervals and calculate yourself how many messages per second/minute/hour your broker is being produced to or consumed from.
You'll need to make sure you have JMX set up on the broker side, of course. See here for more details on that: http://activemq.apache.org/jmx.html
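To illustrate, a minimal polling sketch; the JMX service URL and the brokerName=localhost object name below are assumptions, so adjust them to match your broker:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerRateSampler {
    public static void main(String[] args) throws Exception {
        // Default local JMX URL and broker name; adjust for your environment.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        ObjectName broker = new ObjectName(
                "org.apache.activemq:type=Broker,brokerName=localhost");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            long prevEnqueue = (Long) mbs.getAttribute(broker, "TotalEnqueueCount");
            long prevDequeue = (Long) mbs.getAttribute(broker, "TotalDequeueCount");

            while (true) {
                Thread.sleep(60_000); // sample once per minute

                long enqueue = (Long) mbs.getAttribute(broker, "TotalEnqueueCount");
                long dequeue = (Long) mbs.getAttribute(broker, "TotalDequeueCount");

                // Difference between samples gives the per-minute produce/consume counts.
                System.out.printf("produced/min=%d consumed/min=%d%n",
                        enqueue - prevEnqueue, dequeue - prevDequeue);

                prevEnqueue = enqueue;
                prevDequeue = dequeue;
            }
        }
    }
}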
To simply view total enqueue/dequeue stats, use jconsole or the web console.
If you need to process them further (to calculate rates, etc.), you should do one of the following:
access the stats programmatically using the Java JMX APIs and gather/process them over time
use a third-party monitoring tool (Cacti and Splunk can also help with this)
another option is to use a Camel DataSet to simulate data routing and gather stats