How to check what is causing high CPU usage on an EC2 instance (Apache)

We have an EC2 instance of type m5.xlarge, which has 4 vCPUs, but CPU usage hits 100% on weekends even though DB connections are normal.
How can we debug what is causing the high CPU usage? We checked the cron jobs running on the server as well, but everything is normal.
This also causes an increase in DB connections on RDS, even when there are few actual site users.
Please help us find the solution.
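Since nothing beyond cron has been checked, a few standard Linux commands can show which process (an Apache worker, PHP, or something else entirely) is actually burning the CPU; the Apache process name (`apache2` vs `httpd`) depends on the distribution:

```shell
# Point-in-time snapshot of the busiest processes (GNU ps on Linux).
ps aux --sort=-%cpu | head -n 6

# Per-thread view of the oldest Apache worker (name may be httpd or apache2):
#   top -H -p "$(pgrep -o -x apache2)"

# With mod_status enabled, the Apache scoreboard shows what each worker is
# doing - often this is what reveals a scraper or bot hitting the site
# heavily on weekends:
#   apachectl fullstatus
```

If Apache workers dominate the list, correlating access-log timestamps with the CPU spikes usually points at the offending requests, which would also explain the matching rise in RDS connections.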

Related

Why is my Redis server down twice a day?

I installed Redis on an EC2 instance in standalone mode. My Redis holds more than 6 million keys; after crossing that size, the server goes down frequently, about twice a day. Why, and how can I overcome this issue?
Thank you in advance for your help!
It's quite possible that you are running out of memory and the OOM killer is terminating the process.
Try setting a memory limit:
config set maxmemory <80% of your instance memory size>
and check your eviction policy so you know how a full memory will be handled.
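For example, on a hypothetical instance with 8 GB of RAM, the persistent equivalent in `redis.conf` would look something like this (6400mb is just 80% of that assumed size; pick the policy that matches your workload):

```
maxmemory 6400mb
# evict least-recently-used keys instead of letting the OS OOM-kill redis;
# allkeys-lru is a common choice when redis is a pure cache
maxmemory-policy allkeys-lru
```

Note that `CONFIG SET` changes are lost on restart, so writing the limit into redis.conf (or running `CONFIG REWRITE`) keeps it across restarts.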

How to debug aws fargate task running out of memory?

I'm running a Fargate task with CPU 2048 and memory 8192. After running for some time, the task is stopped with the error:
container was stopped as it ran out of memory.
The thing is, the task does not fail every time. If I run the same task 10 times, it fails 5 times and works 5 times. However, if I take an EC2 machine with 2 vCPUs and 4 GB of memory and run the same container, it runs successfully (in fact, the memory usage on the EC2 instance is very low).
Can somebody please guide me on how to figure out the memory issue when running a Fargate task?
Thanks
A good way to start would be enabling memory metrics from Container Insights for your Fargate tasks and then correlating the memory usage graph with your application logs.
The difference between running on EC2 vs. Fargate could be due to the fact that when you run a container on ECS Fargate, it runs on AWS's internal EC2 instances. A noisy-neighbour situation could arise there, although the chances are pretty low.
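One other thing worth checking in the task definition: if a container sets its own hard `memory` limit below the task-level 8192, Fargate OOM-kills it at that smaller limit even though the task as a whole has headroom. A sketch of the relevant fields (the family and container names are made up; the keys are the standard ECS container-definition keys):

```
{
  "family": "my-task",
  "cpu": "2048",
  "memory": "8192",
  "containerDefinitions": [
    {
      "name": "app",
      "memoryReservation": 4096,
      "memory": 8192
    }
  ]
}
```

`memoryReservation` is a soft limit used for scheduling, while the container-level `memory` is the hard cap at which the container is killed. Intermittent failures on the same workload often mean the process occasionally spikes past a hard limit, so memory metrics collected over several runs are more telling than a single run.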

spring-data-redis cluster recovery issue

We're running a 7-node redis cluster, with all nodes as masters (no slave replication). We're using this as an in-memory cache, so we've commented out all saves in redis.conf, and we've got the following other non-defaults in redis.conf:
maxmemory 30gb
maxmemory-policy allkeys-lru
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
cluster-require-full-coverage no
The client for this cluster is a spring-boot rest api application, using spring-data-redis with jedis as the driver. We mainly use the spring caching annotations.
We had an issue the other day where one of the masters went down for a while. With a single master down in a 7-node cluster we noted a marked increase in the average response time for API calls involving Redis, which I would expect.
When the down master was brought back online and re-joined the cluster, we had a massive spike in response time. Via New Relic I can see that the app started making a ton of Redis cluster calls (New Relic doesn't tell me which cluster subcommand was being used). Our normal average response time is around 5 ms; during this time it went up to 800 ms, and we had a few slow sample transactions that took more than 70 seconds.
On all app JVMs I saw the number of active threads jump from the normal 8-9 up to around 300 during this time. We have configured the Tomcat HTTP thread pool to allow 400 threads max. After about 3 minutes the problem cleared itself up, but I now have people questioning the stability of the caching solution we chose. New Relic doesn't give any insight into where the additional time on the long requests was being spent (it's apparently in an area that New Relic doesn't instrument).
I've made some attempts to reproduce this by running jmeter load tests against a development environment, and while I see some moderate response-time spikes when re-attaching a redis-cluster master, I don't see anything near what we saw in production. I've also run across https://github.com/xetorthio/jedis/issues/1108, but I'm not gaining any useful insight from that. I tried reducing spring.redis.cluster.max-redirects from the default 5 to 0, which didn't seem to have much effect on my load-test results. I'm also not sure how appropriate a change that is for my use case.
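For reference, the knobs involved here live in `application.properties` (Spring Boot with spring-data-redis and jedis). The values below are illustrative, not recommendations, and the pool property prefix differs between Spring Boot 1.x (`spring.redis.pool.*`) and 2.x (`spring.redis.jedis.pool.*`):

```
spring.redis.cluster.nodes=node1:6379,node2:6379
spring.redis.cluster.max-redirects=5
# a short socket timeout (ms) makes calls fail fast instead of letting
# request threads pile up waiting on a recovering node
spring.redis.timeout=500
spring.redis.pool.max-active=16
spring.redis.pool.max-wait=250
```

The jump from 8-9 active threads to ~300 is consistent with request threads blocking on slow Redis calls, which is why a bounded timeout and pool wait are the usual first things to look at in this situation.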

Sudden bad disk I/O utilization on SQL Server, even on a weekend with few transactions

On Friday our MS SQL Server 2012 suddenly showed 100% disk I/O utilization in New Relic (Performance monitor). We had made no updates whatsoever, and Windows Update showed nothing that had happened.
The load on the server was low, because Fridays have low traffic on our website. The disk I/O utilization has stayed high even over the weekend.
The server is a VMware machine with 16 processors and 36 GB of memory. The disks are located on a SAN.
We have about 5 MB of reads per second and very few writes on the database server.
The server does about 500 I/O operations per second.
The CPU is at 25%.
The database is stored in 12 files on a separate drive on the server.
No long-running tasks are running.
The server is defragmented and all the indexes on the database have been rebuilt.
Perfmon on the SQL Server shows a disk queue peak of 5.
Our server guys say that the SAN is running smoothly, but my guess is that something happened on that Friday which keeps our SQL Server waiting for file operations.
Any ideas?
There are lots of reasons this could happen.
Most obvious, backup schedule?
If your traffic was so low on Friday that IIS shut down your Application Pool after the idle timeout, the next hit to that Website will trigger reloading of all data which is cached on application startup.
Since your database server is virtualised, the I/O to that server may also be virtualised (as opposed to directly connecting the SQL Server virtual machine to the storage on the physical host). In this case, your database server's performance may be limited by other machines on the same host saturating the link to the SAN.
The reason turned out to be another SQL Server on the same host that was accessing the disk a lot.
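For future diagnosis, SQL Server's own file-level I/O statistics can show whether the stalls are concentrated in particular database files or spread evenly across all of them (uniform stalls everywhere point at shared storage, as turned out to be the case here). A query sketch against the standard DMVs:

```
-- cumulative I/O stalls per database file since the last SQL Server restart
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY vfs.io_stall_read_ms DESC;
```

Because the counters are cumulative, sampling the query twice and diffing the stall columns gives the stall rate during the problem window rather than the lifetime average.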

WebLogic Performance Tuning

We have a problem with WebLogic 10.3.2. We installed a standard domain with default parameters. In this domain we have only one managed server, running a single web application.
After installation we faced performance problems. Sometimes a user waits 1-2 minutes for the application to respond (for example, the user clicks a button and it takes 1-2 minutes to refresh the GUI, and it's not a complicated task).
To overcome these performance problems, we defined parameters like:
Configuration -> Server Start -> Arguments
-Xms4g -Xmx6g -Dweblogic.threadpool.MinPoolSize=100 -Dweblogic.threadpool.MaxPoolSize=500
We also changed the data source connection pool parameters of the application on the WebLogic side as below:
Initial Capacity: 50
Maximum Capacity: 250
Capacity Increment: 10
Statement Cache Type: LRU
Statement Cache Size: 50
We run WebLogic on servers with 32 GB of RAM and 16 CPUs. 25% of the server machine's resources are dedicated to WebLogic, but we still have the performance problem.
Our target is to service 300-400 concurrent users while avoiding the 1-2 minute wait on each application request.
Could defining a work manager solve the performance issue?
Is my data source or managed bean definition incorrect?
Can anyone help me?
Thanks for your replies
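On the work manager question: a work manager alone won't make slow backend calls faster, but it can bound how many threads one application can consume. A hedged sketch of what one looks like in `weblogic.xml` (the names `AppWorkManager` and `AppMaxThreads` are made up; the elements are the standard WebLogic 10.3 deployment-descriptor ones):

```
<weblogic-web-app xmlns="http://xmlns.oracle.com/weblogic/weblogic-web-app">
  <work-manager>
    <name>AppWorkManager</name>
    <!-- cap the threads this webapp may use concurrently -->
    <max-threads-constraint>
      <name>AppMaxThreads</name>
      <count>100</count>
    </max-threads-constraint>
  </work-manager>
  <!-- dispatch this webapp's requests through the work manager above -->
  <wl-dispatch-policy>AppWorkManager</wl-dispatch-policy>
</weblogic-web-app>
```

Before tuning thread counts, it is worth confirming where the 1-2 minutes actually go (a thread dump during a slow request, or GC logs given the 6 GB heap), since a work manager only helps if the bottleneck is thread starvation.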