Google Cloud VM: CPU usage stats not showing anymore - virtual-machine

I used to have a graph with my VM's CPU usage.
Something like this:
screenshot.
Now it isn't there anymore. Any idea how I bring it back?
Thanks

Related

CoTURN Usage Statistics

I am still a bit new to the WebRTC world and trying to find my way through. I have successfully set up CoTURN and have been able to route calls behind a firewall with it. Now I am wondering whether it is possible to inspect, and possibly visualize, CoTURN usage statistics. I would love to know how many users are using the server at any given time, how much bandwidth and CPU it consumes, and so on. I saw details on how to optimize bandwidth and CPU usage in the official docs, but I haven't found any info on actually monitoring the usage. Any help would be highly appreciated.
If you want to monitor standard usage statistics like CPU usage, load, bandwidth, etc., you can focus on what's available for your infrastructure. For example in AWS you could have CloudWatch, or in generic Linux deployments export the usage stats with Prometheus and have them presented with Grafana.
For coturn/TURN-specific statistics, coturn can store some metrics in Redis; this is described in https://github.com/coturn/coturn/blob/master/turndb/schema.stats.redis
Total traffic information is also reported when the allocation is deleted. The keys are
"turn/user/<username>/allocation/<id>/total_traffic" or "turn/user/<username>/allocation/<id>/total_traffic/peer".
Applications interested in the total amount of traffic per allocation can subscribe to these events as:
psubscribe turn/realm/*/user/*/allocation/*/total_traffic
psubscribe turn/realm/*/user/*/allocation/*/total_traffic/peer
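As a minimal sketch of what a subscriber could do with these events, the Python below splits a channel name into its parts. It assumes the channel layout implied by the two psubscribe patterns above, and that realm, username and allocation id contain no "/" characters. The function name parse_traffic_channel and the sample channel values are made up for illustration.

```python
import re

# Layout assumed from the psubscribe patterns quoted above:
# turn/realm/<realm>/user/<username>/allocation/<id>/total_traffic[/peer]
_TRAFFIC_CHANNEL = re.compile(
    r"^turn/realm/(?P<realm>[^/]+)/user/(?P<user>[^/]+)"
    r"/allocation/(?P<allocation>[^/]+)/total_traffic(?P<peer>/peer)?$"
)


def parse_traffic_channel(channel):
    """Return the realm, user, allocation id and peer flag, or None if no match."""
    match = _TRAFFIC_CHANNEL.match(channel)
    if match is None:
        return None
    return {
        "realm": match.group("realm"),
        "user": match.group("user"),
        "allocation": match.group("allocation"),
        # True when the event is the ".../total_traffic/peer" variant.
        "peer": match.group("peer") is not None,
    }


if __name__ == "__main__":
    # Hypothetical channel name, for illustration only.
    print(parse_traffic_channel(
        "turn/realm/example.org/user/alice/allocation/001/total_traffic/peer"
    ))
```

With a Redis client such as redis-py, the channel names would come from the messages delivered after calling psubscribe on a pubsub object; that part is left out here because it needs a running Redis server.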

Google Compute Engine VM constantly crashes

On the Compute Engine VM in us-west-1b, I run 16 vCPUs near 99% usage. After a few hours, the VM automatically crashes. This is not a one-time incident, and I have to manually restart the VM.
There are a few instances of CPU usage suddenly dropping to around 30%, then bouncing back to 99%.
There are no logs for the VM at the time of the crash. Is there any other way to get the error logs?
How do I prevent VMs from crashing?
CPU usage graph
This could be your process manager signalling that your processes are out of resources. You might want to look into kernel tuning, where you can raise the limits on the number of active processes on your VM/OS and on the resources they may use. Alternatively, you can try a bigger machine with more physical resources. In short, your machine is running short on resources, so the process manager shuts down processes in order to keep the OS up. SSH is one of those processes. Once you reset the machine, everything comes back to normal.
How the process manager or kernel decides to kill a process varies. It may simply be that a process has stayed up for a long time and consumed too many resources. Also note that the OS images used to create VMs on GCP are custom-hardened by Google to limit the malicious capabilities of processes running on such machines.
One of the best ways to tackle this is:
increase the resources of your VM
then go back to code and find out if there's something that is leaking in the process or memory
if all else fails, you might want to do some kernel tuning so that your processes have higher priority than other system processes. This is a bad idea, though, since you could end up with a zombie VM.
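Before changing any limits, it can help to see what the current ones are. The following is a minimal sketch using Python's standard resource module (Unix only); it reports the per-process limits that apply to the process running the script, which may differ from the limits of the workload that is crashing. The helper name describe_limit is illustrative.

```python
import resource


def describe_limit(name):
    """Format the soft and hard values of one RLIMIT_* constant."""
    soft, hard = resource.getrlimit(getattr(resource, name))

    def fmt(value):
        # RLIM_INFINITY means no limit is enforced.
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


if __name__ == "__main__":
    # Process count, open files, and address space limits.
    for limit_name in ("RLIMIT_NPROC", "RLIMIT_NOFILE", "RLIMIT_AS"):
        print(describe_limit(limit_name))
```

Running the script as the same user and from the same service environment as the workload gives the most relevant numbers.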

Dashboard shows close to 100% CPU load, but actual usage is different

During peak hours of our service, I notice the CPU load goes up to 100%. However, when I SSH into the machine and use top or htop, I don't ever see the CPU usage go above 25%. This instance is a dedicated load balancer running HAProxy.
Here is a screenshot of the Dashboard: https://gyazo.com/010715208f81ec97c1bc9b78123fe4d4
Here is a screenshot of top in the instance: https://gyazo.com/afa7098331f7a3f66c018041b9be686d
During these peak times, I noticed some latency. It is not caused by the database or my other server instances; I checked their loads and they were far below the threshold. I was wondering whether Compute Engine is throttling the instance because it mistakenly thinks it is at 100% CPU usage?
Has anyone had a similar experience?

How to monitor Elasticache metrics for Redis like resources used

I want to monitor the metrics for Redis like memory.
Can anyone tell me how to find these metrics?
Assuming this is done through the AWS console, for memory usage you can use the BytesUsedForCache metric. For other metrics refer to http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/CacheMetrics.Redis.html
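If you would rather read the same metric from a script than from the console, the sketch below builds the parameters for a CloudWatch GetMetricStatistics request. It is only an illustration: the function name bytes_used_query, the 5-minute period, the one-hour window and the cluster id are assumptions, and the call itself is left out because it needs AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def bytes_used_query(cluster_id, hours=1, now=None):
    """Build GetMetricStatistics parameters for BytesUsedForCache."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElastiCache",
        "MetricName": "BytesUsedForCache",
        # cluster_id is whatever your cache cluster is called (assumption).
        "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # seconds per datapoint (assumed granularity)
        "Statistics": ["Average", "Maximum"],
    }


if __name__ == "__main__":
    # "my-redis-001" is a placeholder cluster id.
    print(bytes_used_query("my-redis-001"))
```

With boto3 installed and credentials configured, the dictionary could be passed as keyword arguments to the get_metric_statistics method of a CloudWatch client.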

S3cmd sync returns "killed"

I am trying to sync a few large buckets on amazon S3.
When I run my S3cmd sync --recursive command I get a response saying "killed".
What does this refer to? Is there a limit on the number of files that can be synced in S3?
After reading around it looks like the program has memory consumption issues. In particular this can cause the OOM killer (out of memory killer) to take down the process and prevent the system from getting bogged down. A quick look at dmesg after the process is killed will generally show if this is the case or not.
With that in mind I would ensure you're on the latest release, which notes memory consumption issues being solved in the release notes.
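The dmesg check mentioned above can be scripted. This is a rough sketch that scans kernel log text for OOM-killer entries; it assumes the common "Killed process <pid> (<name>)" wording, which can vary between kernel versions, and the sample line in the demo is made up for illustration rather than taken from a real log.

```python
import re

# Matches the usual kernel wording, e.g. "Killed process 4242 (s3cmd)".
_KILLED = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def find_oom_kills(dmesg_text):
    """Return a list of (pid, process name) pairs killed by the OOM killer."""
    kills = []
    for line in dmesg_text.splitlines():
        match = _KILLED.search(line)
        if match:
            kills.append((int(match.group("pid")), match.group("name")))
    return kills


if __name__ == "__main__":
    # Illustrative sample, not real output.
    sample = (
        "[12345.678901] Out of memory: Killed process 4242 (s3cmd) "
        "total-vm:1048576kB, anon-rss:524288kB"
    )
    print(find_oom_kills(sample))
```

To use it on a live system, feed it the output of dmesg (for example captured with subprocess.run); reading the kernel log may require root on some distributions.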
Old question, but before you add more physical memory or increase the VM's memory, try just adding more swap.
I did this on 4 servers (Ubuntu and CentOS) with low RAM (700MB total, only 15MB available) and it is working fine now.
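To see how much RAM and swap a machine actually has before and after adding swap, the contents of /proc/meminfo can be summarised. The sketch below parses that file's "Key: value kB" layout; the function names are illustrative, and the MemAvailable field only exists on reasonably recent Linux kernels.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(text):
    """Pick out the RAM and swap fields; missing fields are reported as None."""
    info = parse_meminfo(text)
    wanted = ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree")
    return {key: info.get(key) for key in wanted}


if __name__ == "__main__":
    # Linux only: /proc/meminfo does not exist on other systems.
    with open("/proc/meminfo") as handle:
        print(memory_summary(handle.read()))
```

A SwapTotal of 0 means no swap is configured, which matches the situation where the OOM killer steps in as soon as RAM runs out.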