The AWS documentation states that CloudWatch publishes metrics every minute. Is it possible to get the metrics checked every 10 seconds, or at least more often than once a minute? If an instance goes down, do I have to wait a full minute to know that it is down before I can spin up a new one in its place?
I presume that you are referring to Amazon EC2 metrics that are collected by Amazon CloudWatch.
No, you cannot configure these metrics to be collected more often. By default, Amazon EC2 metrics are collected every five minutes. You can activate detailed monitoring to obtain the metrics every one minute.
However, Elastic Load Balancing health checks can check the health of an instance more often (the check interval is configurable), and the load balancer will only send traffic to instances that are responding correctly to health checks.
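A minimal sketch of how the health-check interval translates into detection time, assuming an Application Load Balancer target group managed with boto3. The function only builds the keyword arguments for `elbv2.create_target_group`; the target group name, VPC ID and `/health` path are hypothetical placeholders, and the 5-second interval is an illustrative choice. The detection estimate (interval × unhealthy threshold) is approximate.

```python
# Sketch: health-check settings for an ALB target group, as keyword arguments
# for boto3's elbv2 create_target_group. Names/IDs below are hypothetical.

def target_group_health_check_params(interval_s: int = 5,
                                     timeout_s: int = 4,
                                     unhealthy_threshold: int = 2,
                                     healthy_threshold: int = 2) -> dict:
    # A check must time out before the next one is due.
    if timeout_s >= interval_s:
        raise ValueError("timeout must be shorter than the interval")
    return {
        "Name": "my-app-targets",           # hypothetical
        "Protocol": "HTTP",
        "Port": 80,
        "VpcId": "vpc-0123456789abcdef0",   # hypothetical
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": "/health",       # hypothetical endpoint
        "HealthCheckIntervalSeconds": interval_s,
        "HealthCheckTimeoutSeconds": timeout_s,
        "HealthyThresholdCount": healthy_threshold,
        "UnhealthyThresholdCount": unhealthy_threshold,
    }


def approx_detection_time_s(params: dict) -> int:
    # Approximate: a target is marked unhealthy after UnhealthyThresholdCount
    # consecutive failed checks, one check per interval.
    return params["HealthCheckIntervalSeconds"] * params["UnhealthyThresholdCount"]


params = target_group_health_check_params()
print(approx_detection_time_s(params))  # 10
# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("elbv2").create_target_group(**params)
```

Compared with waiting for one-minute CloudWatch datapoints, this kind of setting flags a failed instance in roughly ten seconds.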
Amazon EC2 Auto Scaling can be configured to use Elastic Load Balancing health checks to determine the health of instances. If an instance is identified as unhealthy, Auto Scaling will automatically replace the instance. However, this can take several minutes to be identified and have a new instance operational. Thus, it is recommended to always be running a minimum of two instances.
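A sketch of the Auto Scaling side, assuming boto3: the keyword arguments for `autoscaling.create_auto_scaling_group` that tell the group to use the load balancer's health checks and to keep at least two instances. The group name, launch template name, subnet IDs and target group ARN are hypothetical placeholders.

```python
# Sketch: keyword arguments for boto3's autoscaling create_auto_scaling_group,
# using ELB health checks. All names, subnets and ARNs are hypothetical.

def asg_params(min_size: int = 2, max_size: int = 4) -> dict:
    if not 1 <= min_size <= max_size:
        raise ValueError("need 1 <= min_size <= max_size")
    return {
        "AutoScalingGroupName": "my-app-asg",
        "LaunchTemplate": {"LaunchTemplateName": "my-app-template",
                           "Version": "$Latest"},
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": min_size,
        # Two subnets, ideally in different Availability Zones.
        "VPCZoneIdentifier": "subnet-aaaa1111,subnet-bbbb2222",
        "TargetGroupARNs": [
            "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
            "targetgroup/my-app-targets/0123456789abcdef"
        ],
        # Replace instances that fail the load balancer's health checks,
        # not only those that fail EC2 status checks.
        "HealthCheckType": "ELB",
        # Seconds a new instance gets to boot before health checks count.
        "HealthCheckGracePeriod": 300,
    }


params = asg_params()
print(params["HealthCheckType"], params["MinSize"])  # ELB 2
```

The grace period is the main reason replacement takes minutes rather than seconds: a new instance has to boot and pass its checks before it takes traffic.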
I need to know whether there are CloudWatch metrics I can alert on for RDS Performance Insights, i.e. trigger an alarm whenever there is high load or there are waits in SQL Server.
You may need to read "Overview of Monitoring Amazon RDS":
Amazon RDS automatically sends metrics to CloudWatch every minute for each active database. You are not charged additionally for Amazon RDS metrics in CloudWatch.
You can watch a single Amazon RDS metric over a specific time period, and perform one or more actions based on the value of the metric relative to a threshold you set.
You can create an alarm in the RDS console and select the metric you are interested in.
Amazon RDS Performance Insights recently released a feature that sends key performance metrics from Performance Insights to Amazon CloudWatch. Using this feature, you can set alerts on these metrics.
When Performance Insights is enabled, it automatically sends the following three metrics to CloudWatch:
DBLoad
DBLoadCPU
DBLoadNonCPU
https://aws.amazon.com/blogs/database/set-alarms-on-performance-insights-metrics-using-amazon-cloudwatch/
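A sketch of such an alarm, assuming boto3 and assuming the Performance Insights metrics appear in the `AWS/RDS` namespace with a `DBInstanceIdentifier` dimension. The function builds keyword arguments for `cloudwatch.put_metric_alarm`. The instance name, SNS topic ARN and vCPU count are hypothetical, and using the vCPU count as the threshold is an assumed rule of thumb (more average active sessions than vCPUs suggests sessions are queuing). For waits specifically, `DBLoadNonCPU` can be passed as the metric name instead.

```python
# Sketch: keyword arguments for boto3's cloudwatch put_metric_alarm on a
# Performance Insights metric. Instance name, ARN and vCPUs are hypothetical.

PI_METRICS = {"DBLoad", "DBLoadCPU", "DBLoadNonCPU"}


def pi_alarm_params(db_instance_id: str, vcpus: int, sns_topic_arn: str,
                    metric_name: str = "DBLoad") -> dict:
    if metric_name not in PI_METRICS:
        raise ValueError(f"unexpected metric: {metric_name}")
    return {
        "AlarmName": f"{db_instance_id}-high-{metric_name}",
        "Namespace": "AWS/RDS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 60,             # one-minute datapoints
        "EvaluationPeriods": 5,   # condition must hold for five minutes
        # Assumption: more average active sessions than vCPUs means queuing.
        "Threshold": float(vcpus),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


params = pi_alarm_params("my-sqlserver-db", 4,
                         "arn:aws:sns:us-east-1:123456789012:db-alerts",
                         metric_name="DBLoadNonCPU")
# With boto3: boto3.client("cloudwatch").put_metric_alarm(**params)
```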
I want to build a service running on AWS that would fetch metrics from another service A, also running on AWS, do some processing, and then post the results to a different service B, which runs on a different public cloud and computes overall resource usage. The APIs for the existing services A and B are already defined, and are beyond my control.
My principal concern is that the volume of data I will fetch and post may be high, and I may have to do some computation on this data before posting the results. If the service is to run on a fixed periodic schedule, and I need to make it resilient, how should it be deployed?
EC2 VM.
Lambda.
Additionally:
How do I make the service resilient / highly available?
How do I scale it with higher data volumes? One thought is to partition the keyspace of the data by tenant, etc., and perform the computation in concurrent compute instances that are independent and non-overlapping.
If I store the data in transit for intermediate processing, how can I make the data in transit resilient?
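The tenant-partitioning idea above can be sketched as a stable hash from tenant to partition, so that each worker owns a non-overlapping slice of the keyspace. The `tenant` field name and the partition count are hypothetical; a stable digest is used because Python's built-in `hash()` for strings is randomized per process, which would let two workers disagree on ownership.

```python
# Sketch: deterministic tenant -> partition mapping for independent workers.
# The "tenant" record field is a hypothetical name.
import hashlib
from collections import defaultdict


def partition_for(tenant_id: str, num_partitions: int) -> int:
    # Stable across processes and machines, unlike built-in hash().
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def split_by_partition(records, num_partitions: int) -> dict:
    # Group records so worker i only ever processes buckets[i].
    buckets = defaultdict(list)
    for record in records:
        buckets[partition_for(record["tenant"], num_partitions)].append(record)
    return dict(buckets)


records = [{"tenant": f"tenant-{i % 7}", "value": i} for i in range(100)]
buckets = split_by_partition(records, num_partitions=4)
```

Because every record for a tenant lands in the same bucket, per-tenant aggregates can be computed without coordination between workers.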
These questions are from an AWS infrastructure perspective because I have very little prior knowledge of AWS.
Sample numbers
Input data from service A: 10,000 records per minute, each record about 1 KiB, so roughly 10 MiB per minute.
Processing latency in the service: at most 0.1 seconds per record.
Data posted to service B: About 2 MiB per minute. The connections are over WAN.
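A back-of-the-envelope check on these sample numbers, assuming records can be processed independently and every record takes the full 0.1 s: one sequential worker handles at most 600 records per minute, so keeping up with 10,000 records per minute needs at least 17 workers running in parallel. This is a lower bound that ignores I/O, retries and uneven tenant sizes. Integer milliseconds are used to avoid floating-point rounding.

```python
# Sketch: minimum parallelism implied by the sample numbers above.

def records_per_worker_per_minute(ms_per_record: int) -> int:
    return 60_000 // ms_per_record


def min_parallel_workers(records_per_minute: int, ms_per_record: int) -> int:
    total_ms = records_per_minute * ms_per_record   # work per minute of input
    return -(-total_ms // 60_000)                   # ceiling division


def kib_per_second(kib_per_minute: int) -> float:
    return kib_per_minute / 60


print(records_per_worker_per_minute(100))     # 600
print(min_parallel_workers(10_000, 100))      # 17
print(round(kib_per_second(10_000), 1))       # 166.7  (input, ~10 MiB/min)
print(round(kib_per_second(2 * 1024), 1))     # 34.1   (output, ~2 MiB/min)
```

The bandwidth figures are modest; the binding constraint in these numbers is per-record processing time, which is what pushes toward partitioned, concurrent workers.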
I am planning to migrate to GCP, and my front end needs some level of reliability, but I am also cost-constrained and cannot afford to double my instances.
Is it possible to have one running instance behind a load balancer that tests its health and, in case of failure, wakes up a sleeping copy of this instance?
That would provide a kind of automatic fail-over with an interruption of service of 1 to 2 minutes, which is acceptable for my business.
Sounds like you want a Managed Instance Group with autoscaling (minNumReplicas=1, maxNumReplicas=1) and autohealing. Managed Instance Groups can automatically identify and recreate unhealthy instances.
You can apply HTTP health checks to Managed Instance Groups to monitor and verify that servers are running properly on the instances in that group. If a health check determines that a service has failed on an instance, the group automatically recreates that instance.
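A sketch of the request bodies this involves, expressed as Python dicts shaped like the Compute Engine v1 REST resources `healthChecks` and `instanceGroupManagers`. It swaps in a fixed `targetSize` of 1 instead of an autoscaler with min = max = 1; both keep exactly one instance. The names, port, path and template URL are hypothetical, and the check timings are illustrative. The rough detection time is interval × unhealthy threshold.

```python
# Sketch: request bodies for an HTTP health check and a single-instance
# managed instance group with autohealing. All names/URLs are hypothetical.

def health_check_body(name: str, port: int = 80, path: str = "/") -> dict:
    return {
        "name": name,
        "type": "HTTP",
        "httpHealthCheck": {"port": port, "requestPath": path},
        "checkIntervalSec": 10,
        "timeoutSec": 5,
        "healthyThreshold": 2,
        "unhealthyThreshold": 3,
    }


def mig_body(name: str, template_url: str, health_check_url: str) -> dict:
    return {
        "name": name,
        "baseInstanceName": name,
        "instanceTemplate": template_url,
        "targetSize": 1,  # exactly one running instance
        "autoHealingPolicies": [
            # Seconds to let a recreated instance boot before it is checked.
            {"healthCheck": health_check_url, "initialDelaySec": 300}
        ],
    }


hc = health_check_body("frontend-hc", path="/healthz")
mig = mig_body("frontend",
               "projects/my-project/global/instanceTemplates/frontend-tmpl",
               "projects/my-project/global/healthChecks/frontend-hc")
approx_detection_s = hc["checkIntervalSec"] * hc["unhealthyThreshold"]  # 30
```

Detection plus recreation and boot time is what determines the interruption; with settings in this range, the 1 to 2 minutes mentioned in the question is a plausible target, depending on how long the instance takes to start.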
We're thinking about moving to the Elastic Load Balancer on Amazon. However, it turns out that since we use more than one domain name, we would have to rename some of our applications to limit to a single ELB. Another issue is that we currently use free level one certificates, whereas moving to ELB would require moving up to level 2, although that's not a huge deal. Also, we don't have a lot of volume at this point, and don't really need load balancing to relieve traffic. Finally, in the case of a failure of an Amazon instance, which seems to be quite rare (we have not experienced one in several years), we can quickly be up and running again by creating another instance and restoring.
On the other hand, from everything I have read, people are generally happy with it and recommend it, due to its ease of setup and the value it brings.
Given the above, is it worth it?
since we use more than one domain name, we would have to rename some of our applications to limit to a single ELB
What makes you say this? There's nothing preventing you from launching multiple ELBs if you really want to. And if your application already manages multiple domains properly, then there's no reason a single ELB can't handle that either. We currently have one ELB fronting an application on a bunch of EC2 instances that 11 different domains all point to.
Another issue is we currently use free level one certificates, whereas moving to ELB would require moving up to level 2, although that's not a huge deal.
Not sure what you mean by "level one" and "level 2". If you're using a self-signed SSL certificate, then you'll need to switch to a certificate signed by a third-party Certificate Authority, which will indeed cost you some money. Amazon supports all manner of certificates, including simple certs, EV certs, SAN certs, etc. You'll find more information on ELB and SSL certs in the AWS documentation.
Also, in the case of a failure of an amazon instance, which seems to be quite rare (have not experienced in several years), we can quickly be up and running by creating another instance and restoring.
Consider yourself lucky. We've had Amazon instances fail from time to time, and we also regularly get notifications from Amazon that instances need to be rebooted in order to migrate them off of faulty/old hardware.
If you really don't care about being down for a while and feel like you don't need the capacity that a load balancer and multiple app servers provide, then there's no reason for you to move to an ELB. However, if you want the reliability of multiple app servers, then moving to an ELB is indeed a good idea.
And if you anticipate your traffic level growing then you might want to consider using Amazon's Auto Scaling tools. Using Auto Scaling you basically tell Amazon the minimum number of application servers you want running behind an ELB, and some parameters to indicate when they should automatically launch additional instances if/when load increases.
Our Amazon account rep actually recommended to us that if we had even a single instance that we wanted to minimize downtime of (like a monitoring server, etc) that we should create an Auto Scaling group with a limit of exactly 1 instance in it. That way if the instance ever does die for any reason whatsoever, Amazon will automatically spin up a new replacement instance.
I agree with Bruce; I just wanted to add my two cents about Auto Scaling (ASG) and "Amazon will automatically spin up a new replacement instance".
This is a really good way to get a robust hosting solution, but it takes some effort to create a CloudFormation template and a bash install script, called from the template, that installs all the server software and deploys your application code.
So if you have 2 instances in an ASG with Min/Max = 2 and one of the instances crashes, the ASG will automatically recreate it with all software installed and your code deployed, ready to go.
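A minimal sketch of that setup, generating a CloudFormation template as JSON from Python. It declares an `AWS::EC2::LaunchTemplate` whose `UserData` runs a bootstrap script, and an `AWS::AutoScaling::AutoScalingGroup` with Min = Max = 2. The AMI ID, subnet IDs, instance type and the script contents (installing Apache with `yum`) are hypothetical placeholders for your own install-and-deploy script.

```python
# Sketch: CloudFormation template with a bootstrap script and a fixed-size ASG.
# AMI, subnets, instance type and the script body are hypothetical.
import json

USER_DATA = """#!/bin/bash
set -euo pipefail
# Hypothetical bootstrap: install server software and deploy the app.
yum install -y httpd
systemctl enable --now httpd
"""


def build_template(ami_id: str, subnet_ids: list, size: int = 2) -> dict:
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "AppLaunchTemplate": {
                "Type": "AWS::EC2::LaunchTemplate",
                "Properties": {
                    "LaunchTemplateData": {
                        "ImageId": ami_id,
                        "InstanceType": "t3.micro",
                        # Runs on first boot of every instance the ASG launches.
                        "UserData": {"Fn::Base64": USER_DATA},
                    }
                },
            },
            "AppGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "LaunchTemplate": {
                        "LaunchTemplateId": {"Ref": "AppLaunchTemplate"},
                        "Version": {"Fn::GetAtt": ["AppLaunchTemplate",
                                                   "LatestVersionNumber"]},
                    },
                    # Min == Max: crashed instances are replaced, never added.
                    "MinSize": str(size),
                    "MaxSize": str(size),
                    "VPCZoneIdentifier": subnet_ids,
                },
            },
        },
    }


template = build_template("ami-0123456789abcdef0",
                          ["subnet-aaaa1111", "subnet-bbbb2222"])
template_json = json.dumps(template, indent=2)  # deploy with CloudFormation
```

Because the bootstrap script runs on every launch, a replacement instance comes up in the same state as the one it replaces.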
Also, if you need to handle periodic traffic spikes automatically, you can change the ASG to (Min=2, Max=5) and create 2 CloudWatch alarms:
1. if CPU usage is 90% or higher for 5 or 10 minutes
2. if CPU usage is 30% or lower for 5 or 10 minutes
Then assign alarm 1 to a policy that adds 1 instance, and assign alarm 2 to a policy that removes 1 instance again; because Min=2, the group never shrinks below the original two instances.
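A sketch of those two alarms and their policies, assuming boto3. It builds keyword arguments for `autoscaling.put_scaling_policy` (simple scaling, ±1 instance) and `cloudwatch.put_metric_alarm` (average `CPUUtilization` across the group over 10 minutes). The group name is hypothetical, and the policy ARNs are placeholders for the `PolicyARN` values that `put_scaling_policy` returns.

```python
# Sketch: CPU-based scale-up/scale-down for an ASG with Min=2, Max=5.
# Group name and ARNs are hypothetical placeholders.

ASG_NAME = "my-app-asg"


def scaling_policy_params(name: str, adjustment: int) -> dict:
    # Keyword arguments for boto3 autoscaling put_scaling_policy.
    return {
        "AutoScalingGroupName": ASG_NAME,
        "PolicyName": name,
        "PolicyType": "SimpleScaling",
        "AdjustmentType": "ChangeInCapacity",
        "ScalingAdjustment": adjustment,
        "Cooldown": 300,  # wait 5 minutes before scaling again
    }


def cpu_alarm_params(name: str, threshold: float, operator: str,
                     policy_arn: str, minutes: int = 10) -> dict:
    # Keyword arguments for boto3 cloudwatch put_metric_alarm.
    if minutes % 5:
        raise ValueError("minutes must be a multiple of the 5-minute period")
    return {
        "AlarmName": name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": ASG_NAME}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": minutes // 5,
        "Threshold": threshold,
        "ComparisonOperator": operator,
        "AlarmActions": [policy_arn],
    }


scale_up = scaling_policy_params("scale-up", 1)
scale_down = scaling_policy_params("scale-down", -1)
high_cpu = cpu_alarm_params("cpu-high", 90.0, "GreaterThanOrEqualToThreshold",
                            "PLACEHOLDER_SCALE_UP_POLICY_ARN")
low_cpu = cpu_alarm_params("cpu-low", 30.0, "LessThanOrEqualToThreshold",
                           "PLACEHOLDER_SCALE_DOWN_POLICY_ARN")
```

The ASG enforces the Min/Max bounds itself, so the scale-down policy cannot take the group below two instances and the scale-up policy cannot push it above five.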
I'm thinking of using Google Compute Engine to run a LOT of instances in a target pool behind a network load balancer. Each of those instances will end up processing many large data streams in real time, so at full scale and peak times there might be multiple terabytes per second going through.
Question:
Is there a quota or limit to the data you can push through those load balancers? Is there a limit of instances you can have in a target pool? (the documentation does not seem to specify this)
It seems that load balancers have a dedicated IP address. Does that mean a load balancer is a single machine?
There's no limit on the amount of data that you can push through a LB. As for instances, there are default limits on CPUs and on persistent or SSD disks. You can see those quotas in the Developers Console under 'Compute' > 'Compute Engine' > 'Quotas', and you can always request a quota increase. You can have as many instances as you need in a target pool. Take a look at the Compute Engine Autoscaler, which will help you spin up machines as your service needs them. Traffic sent to the single IP address of your LB is distributed across your multiple instances.