how to handle memory leaks in amazon web services t1.micro? - amazon-s3

I have a t1.micro instance in Amazon Web Services hosting a virtual image (specifically a formhub image), and sometimes I get an error about memory that cannot be allocated; I fix it by rebooting the instance. Any clues?
Is it possible to reboot the instance automatically every day?

The micro instances are quite constrained, with only 600 MB or so of RAM. You may solve the problem by moving up to a small or medium instance, or even one of the new T2 instances - even the smallest one has 1 GB of RAM.
If this is not an option for you, you can add a cron job to restart the instance at a particular time of day.
ssh in to the instance and type the command:
sudo crontab -e
Enter a line like:
0 5 * * * /sbin/reboot
to restart the system at 5am each day. This is for an Ubuntu system; the reboot command may live elsewhere in other distributions - run which reboot to check its path.
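If an unconditional nightly reboot feels too blunt, a rough variation (not from the answer above; the script path and threshold are hypothetical) is to have cron run a small script that reboots only when free memory drops below a threshold:
#!/bin/bash
# /usr/local/bin/reboot-if-low-mem.sh (hypothetical path)
# Reboot only when free memory falls below a threshold, e.g. 50 MB.
THRESHOLD_MB=50
FREE_MB=$(awk '/^MemFree:/ {print int($2/1024)}' /proc/meminfo)
if [ "$FREE_MB" -lt "$THRESHOLD_MB" ]; then
    /sbin/reboot
fi
with a matching crontab entry that checks every 15 minutes:
*/15 * * * * /usr/local/bin/reboot-if-low-mem.sh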

Related

Google-Compute-Engine Virtual Machine Instance: Unable to login/SSH the VM instance after adding a disk

GCP VM instance: OS: Ubuntu (18.04 bionic). Disk size: 10 GB. Later I added another disk of 10 GB.
While working on the GCP VM instance, I kept running into 'no disk space left' errors. So I created another 10 GB disk and attached it to the GCP VM instance as described in https://cloud.google.com/compute/docs/disks/add-persistent-disk?&_ga=2.217520662.-1058595688.1590395241#formatting
I then logged out of the GCP VM instance and stopped it.
Later, when I restarted the GCP VM instance, I was unable to connect. I tried the SSH option available in the GCP console, PuTTY, WinSCP and telnet, but I can no longer connect.
My understanding is that some services might have stopped on the GCP VM instance. Is there a way to check whether the services are running on a GCP VM instance? If yes, then how?
If you think there is some other issue preventing connection to the GCP VM instance, please let me know.
There may be several reasons:
Firewall rules - check them to be sure nothing blocks SSH traffic to your machine.
Have a look at the serial console output - you can do it via the console GUI or with gcloud compute instances get-serial-port-output instance_name --zone=my_zone.
If your drive gets full you may not be able to log in at all.
Adding another persistent disk won't help if the first one is full.
You can increase its size though - also via the console, or with gcloud compute disks resize example-disk-1 --size=11GB - this adds 1 GB more, and if disk space is the problem it should allow you to log in again.
If you're still not able to log in, try enabling interaction with the serial console (gcloud compute instances add-metadata instance-name --metadata serial-port-enable=TRUE) and connect to it with gcloud compute connect-to-serial-port instance-name, since this is the most foolproof method if everything else fails.
If you're able to connect via the serial console, check whether the SSH service is listening:
sudo service ssh status - if it isn't, start it with sudo service ssh start and watch for any errors.
A similar case was also discussed here.
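For reference, the diagnostic steps above can be run from any workstation with the gcloud CLI installed; the instance name, disk name and zone below are placeholders:
# 1. check what the instance logged during boot
gcloud compute instances get-serial-port-output my-instance --zone=us-central1-a
# 2. grow the (possibly full) boot disk by 1 GB
gcloud compute disks resize my-boot-disk --size=11GB --zone=us-central1-a
# 3. enable the serial console and open an interactive session
gcloud compute instances add-metadata my-instance --zone=us-central1-a --metadata serial-port-enable=TRUE
gcloud compute connect-to-serial-port my-instance --zone=us-central1-a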

GCP VM consistently shutting down without warning

Been using a GCP preemptible VM for a few months without problems, but in the last 4 weeks my instances have consistently shut off anywhere from 10 minutes to 20 minutes into operation.
I'll be in the middle of training, and my notebook will suddenly disconnect. The terminal will show this error:
jupyter@fastai-instance:~$ Connection to 104.154.142.171 closed by remote host.
Connection to 104.154.142.171 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
I then check the status of my VM, to see that it has shutdown.
I searched the terminal traceback and found this thread, which seemed promising: ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]
When I ran sudo gcloud compute config-ssh, my VM ran for much longer than usual before shutting down, yet it shut down in the same way after about an hour. Since then, it's been back to the same behavior.
I know preemptible instances can be shut down when the platform needs resources, but my understanding is that this comes with some kind of warning. I've checked the status of GCP's servers after shutdowns and they appear to be fine. This also happens the same way every time I turn my VM on, which seems too frequent for preempting.
I am not sure where to look for any clues – has anyone else had a problem like this? What's especially puzzling to me is, if it is in fact an SSH problem, why would that cause the VM itself to shut down, rather than just break the connection?
Thanks very much for any help!
Did you try setting a shutdown script that prints something to a file, so you can validate the state of the VM when it goes down?
Try this as the shutdown script:
#!/bin/bash
curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted" -H "Metadata-Flavor: Google" > /tmp/preempted.log
If there is TRUE in the file, it's because the VM has been preempted.
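For completeness, a shutdown script can be attached to an existing instance with something like the following (the instance name and script path are placeholders), and the log can be read once the VM is up again:
gcloud compute instances add-metadata my-instance --metadata-from-file shutdown-script=/path/to/shutdown.sh
# after the VM comes back up:
cat /tmp/preempted.log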
If a VM stops and you have an active SSH connection to that VM (via gcloud compute ssh), then it's normal that you receive an error. Since the VM goes down, all connections are closed, including your SSH connection (you cannot connect to a stopped instance). The VM termination causes the SSH error, not the other way around.
When using preemptible instances, Google can reclaim the instance whenever it's needed. Note that (from the docs about preemptible instances limitations) :
Compute Engine might terminate preemptible instances at any time due to system events. The probability that Compute Engine will terminate a preemptible instance for a system event is generally low, but might vary from day to day and from zone to zone depending on current conditions.
It means that one day your instance may run for 24 hours without being terminated, but another day your instance may be stopped 30 minutes after being started if Compute Engine needs to reclaim some resources.
A comment on the "continuously shutting down" part:
(I have experienced this as well)
Keep in mind that Google prefers to shut down RECENTLY STARTED preemptible instances, over ones started earlier.
The link below (and supplied earlier) has the statement:
Generally, Compute Engine avoids preempting too many instances from a single customer and preempts new instances over older instances whenever possible.
This would generally mean that, yes, I suppose, if you are preempted, and boot up again, it is quite likely that you are going to be preempted again and again until the load in the zone reduces.
I'm surprised that Google doesn't simply prevent you from starting the preemptible VM again for a while (30-60 minutes, say). How much CPU is being wasted bouncing VMs up and down while we cross our fingers?
P.S. There is a dirty trick to get around the frustration - have two VMs identically configured, except for preemptibility, but only one underlying boot disk. If you are having a bad day with preempts, simply 'move' the boot disk to the non-preemptible VM, boot it, and carry on. It's a couple of simple gcloud commands to achieve this (sketched below), easily scripted and very fast. Don't tell Google I told ya....
https://cloud.google.com/compute/docs/instances/preemptible#limitations
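A rough sketch of that disk-swap trick, assuming a preemptible instance named fastai-preempt, an identically configured non-preemptible instance fastai-ondemand, and a shared boot disk fastai-boot, all in the same zone (every name here is hypothetical):
# stop the preemptible VM and free its boot disk
gcloud compute instances stop fastai-preempt --zone=us-central1-a
gcloud compute instances detach-disk fastai-preempt --disk=fastai-boot --zone=us-central1-a
# attach the same disk as the boot disk of the on-demand VM and start it
gcloud compute instances attach-disk fastai-ondemand --disk=fastai-boot --boot --zone=us-central1-a
gcloud compute instances start fastai-ondemand --zone=us-central1-a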

ESXi 5.1 revert to snapshot daily or every night

I'm trying to revert a virtual machine to the previous snapshot every day or night.
Unfortunately, I haven't found any way to do this the way I want it.
Here are some things I tried and that didn't fit :
- snapshot.action=autoRevert --> The VM has to HALT, REBOOT doesn't work the same. I don't want to power on my VM manually.
- snapshot.action=autoRevert on a running snapshot. I tried this, thinking it might work and resolve the first issue. But when I HALT my VM, the snapshot is reverted but the VM is placed in a suspended state...
- PowerCLI script : I don't want to have a Windows machine running just for this little thing.
- NonPersistent disk : same thing as the first issue : VM needs to HALT, not REBOOT.
How can I simply do this? I thought I could just do those things and place a cron on my Linux VM to reboot every night.
In the past I've set up scripts that revert VMs to specific snapshots via the SSH server on my ESXi host. Once sshd is enabled, you can remotely run vim-cmd over SSH. This was on ESXi 4.x, but I assume the same can be done in newer versions.
The catch was that I had to enable the so-called "Tech Support Mode" to get sshd running, as documented in the VMware KB: kb.vmware.com/kb/1017910
The procedure I used was to first look up the ID of the VM in question by running:
vim-cmd vmsvc/getallvms
Then, you can view your VM's snapshot tree by passing its ID to this command (this example uses the VM with ID 80):
vim-cmd vmsvc/get.snapshotinfo 80
Finally, you can use an SSH client to remotely revert the VM to an arbitrary snapshot by passing the VM and snapshot IDs to 'snapshot.revert':
ssh root@YOUR_VMWARE_HOST vim-cmd vmsvc/snapshot.revert VM_ID 0 SNAPSHOT_ID
One other thing to note is that you can set up public key authentication between the ESXi server and the machine running your scripts so that the latter won't have to use a password.
The only annoyance with that approach was that I didn't immediately see a way to preserve the authorized_keys file on the ESXi server between reboots - if the ESXi server has to be rebooted, you'll have to rebuild its authorized_keys file before public key auth will work again.
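To cover the nightly part of the original question, a crontab entry on the Linux VM could invoke the revert over SSH, reusing the placeholders from above (vim-cmd vmsvc/power.on can bring the VM back up if the snapshot was taken while powered off); this is just a sketch:
0 3 * * * ssh root@YOUR_VMWARE_HOST 'vim-cmd vmsvc/snapshot.revert VM_ID 0 SNAPSHOT_ID && vim-cmd vmsvc/power.on VM_ID'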

mass-restarting httpd on lots of EC2 instances

I am running a variable number of EC2 instances (CentOS 64) that contain an apache web server that caches a bunch of code in production mode.
Now every time I make some changes to the code (generally on a weekly basis) I have to log into each one of the instances, do a "su", then "service httpd restart".
Is there a way to automate this, so that I can run a single command on one of the instances and it connects to all the others and restarts httpd? It's getting really time consuming, especially when the application has spawned some 20-30 instances on its own (happens on some days when we get high traffic).
Thanks!
Dancer's shell, dsh, is made specifically for this. No 'scripting' required. As @tix3 suggests, you should probably also configure sudo on those machines (edit /etc/sudoers using visudo) so that it accepts your restart command without a password.
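As an illustration, with a dsh group file listing the web servers and a deploy user allowed to restart httpd without a password (the group name, user and paths are hypothetical), the whole fleet restarts in one go:
# ~/.dsh/group/webservers contains one hostname per line
dsh -M -c -g webservers -- sudo /sbin/service httpd restart
# and in /etc/sudoers on each instance (edited with visudo):
deploy ALL=(ALL) NOPASSWD: /sbin/service httpd restart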

Is it recommended to run redis using Supervisor

Is it a good practice to run redis in production with Supervisor?
I've googled around, but haven't seen many examples of doing so. If not, what is the proper way of running redis in production?
I personally just use Monit on Redis in production. If Redis crashes, Monit will restart it, but more importantly Monit can monitor (and alert when a threshold is reached) the amount of RAM that Redis currently uses (which is the biggest issue).
The configuration could look something like this (if maxmemory was set to 1 GB in Redis):
check process redis
  with pidfile /var/run/redis.pid
  start program = "/etc/init.d/redis-server start"
  stop program = "/etc/init.d/redis-server stop"
  if 10 restarts within 10 cycles then timeout
  if failed host 127.0.0.1 port 6379 then restart
  if memory is greater than 1GB for 2 cycles then alert
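Assuming that stanza is saved somewhere Monit includes configuration from (for example /etc/monit/conf.d/redis; the path varies by distro), it can be picked up and verified with:
sudo monit reload
sudo monit status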
Well, it depends. If I were to run redis under daemon control I would use runit (a rough run script is sketched at the end of this answer). I do use Monit, but only for monitoring - I like to see the green light.
However, to exploit redis's true power, you don't run redis under daemon control, especially not a master. If a master goes down, you will have to promote a slave to master. Quite simply, I just shoot the node in the head and have a Chef recipe bring up a new node.
But then again... it also depends on how often you snapshot. I do not snapshot, thus no need for daemon control.
People use redis for brute-force speed. That means not writing to disk and keeping all data in RAM. If a node goes down and you don't snapshot, data is lost.
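Since runit was mentioned above, a minimal run script could look like the following (the paths are assumptions, and redis must stay in the foreground, i.e. daemonize no):
#!/bin/sh
# /etc/sv/redis/run (assumed path for the runit service directory)
exec redis-server /etc/redis/redis.conf --daemonize no
# make it executable and link it into the service directory so runsvdir supervises it:
chmod +x /etc/sv/redis/run
ln -s /etc/sv/redis /etc/service/redis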