ESXi 5.1 revert to snapshot daily or every night - automation

I'm trying to revert a virtual machine to the previous snapshot every day or night.
Unfortunately, I haven't found any way to do this the way I want it.
Here are some things I tried that didn't fit:
- snapshot.action=autoRevert --> the VM has to HALT; a REBOOT doesn't trigger the revert, and I don't want to power my VM back on manually.
- snapshot.action=autoRevert on a snapshot taken while running. I tried this, thinking it might work and resolve the first issue. But when I HALT my VM, the snapshot is reverted and the VM is left in a suspended state...
- PowerCLI script: I don't want to keep a Windows machine running just for this little thing.
- Non-persistent disk: same problem as the first issue: the VM needs to HALT, not REBOOT.
How can I do this simply? I thought I could just set one of these up and put a cron job on my Linux VM to reboot it every night.

In the past I've set up scripts that revert VMs to specific snapshots via the SSH server on my ESXi host. Once sshd is enabled, you can remotely run vim-cmd over SSH. This was on ESXi 4.x, but I assume the same can be done in newer versions.
The catch was that I had to enable the so-called "Tech Support Mode" to get sshd running, as documented in the VMware KB: kb.vmware.com/kb/1017910
The procedure I used was to first look up the ID of the VM in question by running:
vim-cmd vmsvc/getallvms
Then, you can view your VM's snapshot tree by passing its ID to this command (this example uses the VM with ID 80):
vim-cmd vmsvc/get.snapshotinfo 80
Finally, you can use an SSH client to remotely revert the VM to an arbitrary snapshot by passing the VM and snapshot IDs to 'snapshot.revert':
ssh root@YOUR_VMWARE_HOST vim-cmd vmsvc/snapshot.revert VM_ID 0 SNAPSHOT_ID
One other thing to note is that you can set up public key authentication between the ESXi server and the machine running your scripts so that the latter won't have to use a password.
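Setting that up is quick - a minimal sketch, run from the scripting machine (the authorized_keys path is an assumption based on ESXi 5.x; older releases may keep it elsewhere):
ssh-keygen -t rsa -f ~/.ssh/esxi_key -N ""
# Append the public key to root's authorized_keys on the ESXi host
# (path assumed for ESXi 5.x; adjust for your release):
cat ~/.ssh/esxi_key.pub | ssh root@YOUR_VMWARE_HOST 'cat >> /etc/ssh/keys-root/authorized_keys'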
The only annoyance with that approach was that I didn't immediately see a way to preserve the authorized_keys file on the ESXi server between reboots - if the ESXi server has to be rebooted, you'll have to rebuild its authorized_keys file before public key auth will work again.
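To tie this back to the original question, here is a minimal sketch of a nightly cron job (the VM ID, snapshot ID and hostname are placeholders; the snapshot.revert argument order follows the example above, so verify it with vim-cmd on your ESXi build):
#!/bin/sh
# revert-vm.sh - revert a VM to a snapshot, then power it back on
ESXI_HOST="YOUR_VMWARE_HOST"
VM_ID=80          # from: vim-cmd vmsvc/getallvms
SNAPSHOT_ID=1     # from: vim-cmd vmsvc/get.snapshotinfo 80
ssh "root@$ESXI_HOST" "vim-cmd vmsvc/snapshot.revert $VM_ID 0 $SNAPSHOT_ID && vim-cmd vmsvc/power.on $VM_ID"
Then schedule it from any always-on Linux box, e.g. in crontab:
0 3 * * * /usr/local/bin/revert-vm.sh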

Related

Ansible playbook stops after losing the SSH connection (even for a few seconds) to the VM it is running on?

My Ansible playbook consists of several tasks and I run it on a virtual machine. I log in to the VM over SSH and run the playbook there. If my SSH window gets closed during the execution of any task (my internet connection is not stable or reliable), the execution of the playbook stops because the SSH session is already gone.
The playbook takes around an hour to run, and sometimes even losing internet connectivity for a few seconds drops the SSH terminal's connection and thus stops the entire playbook. Any idea how to make the Ansible run more resilient to this problem?
Thanks in advance!
If you need to run a long job on an external system and it matters that the job completes, it is an extremely bad idea to run that job in the foreground.
It is not important that the task is Ansible or that the connection is SSH. In every such case you would just "push" the command to the remote host and send it to the background with something like nohup, if available (see the sketch below). The problem is, of course, the tree of processes: your connection creates a process on the remote system, and that process creates the job you want to run. If the connection gets lost, all subprocesses will be killed automatically by the OS.
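A minimal sketch, assuming the playbook lives on the remote VM (site.yml is a hypothetical playbook name):
nohup ansible-playbook site.yml > playbook.log 2>&1 &
# nohup shields the job from the terminal's hangup signal, the redirects
# capture its output, and the trailing & sends it to the background.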
So - under Windows - maybe use RDP to open a session that stays available even after the connection is lost, or use something like Cygwin and nohup via SSH to detach your process from the SSH session so it survives the hang-up.
Or - when you need to run a playbook on that system regularly - install for example an AWX container and use that. There are many options, depending on your requirements, resources and administrative constraints.

Hyper-V Anti Affinity

I'm trying to set up anti-affinity in a clustered Hyper-V setup, but I am struggling to get any VMs to stay apart. It seems that the anti-affinity is simply not honored.
Setup:
3 x Hyper-V servers (server1, server2, server3)
3 x VMs (web_test_1, web_test_2, web_test_3)
Attempt 1:
I ran the below script on server1:
$WEBAntiAffinity = New-Object System.Collections.Specialized.StringCollection
$WEBAntiAffinity.Add("WEB Servers")
(Get-ClusterGroup -Name WEB_TEST_1).AntiAffinityClassNames = $WEBAntiAffinity
(Get-ClusterGroup -Name WEB_TEST_2).AntiAffinityClassNames = $WEBAntiAffinity
(Get-ClusterGroup -Name WEB_TEST_3).AntiAffinityClassNames = $WEBAntiAffinity
Get-ClusterGroup | Select-Object -Property Name,AntiAffinityClassNames
All three VMs were powered off before I ran the above and all created on server1.
When powering them on, they all powered on and stayed on server1.
Attempt 2:
I ran the same script above, on the additional servers (server2 and server3).
I powered off the VMs and powered them back on again, they again all remained on server1.
Attempt 3:
After running the script on all the servers, I restarted the servers one by one. The VMs moved between nodes as normal during the reboots, but once all had been rebooted I stopped all the VMs, moved them to server1 and then started them again.
My assumption was that two of them would move before powering on, but that didn't happen: they all started on server1.
Anyone know what I'm doing wrong here? Am I missing some prerequisites? There aren't many examples online that I've been able to find.
This is not called out specifically in Microsoft's documentation, but to make anti-affinity rules work in Hyper-V you also need System Center Virtual Machine Manager (SCVMM). SCVMM reads the anti-affinity rules (and other rules such as affinity, priority, etc.) and performs the migrations that enforce them.
This is equivalent to the VMware stack, where ESXi is the hypervisor, the VM configuration contains the rules, and a management layer such as vCloud Director or vSphere actually applies the rules.

Google-Compute-Engine Virtual Machine Instance: Unable to login/SSH the VM instance after adding a disk

GCP VM instance: OS: Ubuntu (18.04 Bionic), disk size: 10 GB. Later I added another disk of 10 GB.
While working on the GCP VM instance, I ran into a 'no space left on device' issue. So I created another 10 GB disk and attached it to the VM instance as described in https://cloud.google.com/compute/docs/disks/add-persistent-disk?&_ga=2.217520662.-1058595688.1590395241#formatting
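For reference, the formatting and mounting steps from that guide look roughly like this (the device name /dev/sdb and the mount point are assumptions):
sudo mkfs.ext4 -m 0 /dev/sdb
sudo mkdir -p /mnt/disks/data
sudo mount -o discard,defaults /dev/sdb /mnt/disks/data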
Then I exited the GCP VM instance and stopped it.
Later, when I restarted the GCP VM instance, I was unable to connect. I tried the SSH connection option available in the GCP console, PuTTY, WinSCP and telnet, but I cannot connect any more.
My understanding is that some services might have stopped on the GCP VM instance. Is there a way to check whether the services are running on a GCP VM instance, and if so, how?
If you think there is some other reason I cannot connect to the GCP VM instance, please let me know.
There may be several reasons:
Firewall rules - check them to be sure nothing blocks SSH traffic to your machine.
Have a look at the serial console output - you can do this via the Console GUI or with gcloud compute instances get-serial-port-output instance_name --zone=my_zone.
If your drive gets full, you may not be able to log in at all.
Adding another persistent disk won't help if the first one is full.
You can, however, increase its size - also via the Console or with gcloud compute disks resize example-disk-1 --size=11GB - this adds 1 GB more, and if disk space is the problem it should let you log in again.
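A minimal sketch of that recovery path (the disk name and zone are placeholders, and the filesystem commands assume the root disk is /dev/sda with a single partition):
gcloud compute disks resize example-disk-1 --size=11GB --zone=us-central1-a
# On many images (Ubuntu 18.04 included) cloud-init grows the root partition
# and filesystem automatically on the next boot. If it doesn't:
sudo growpart /dev/sda 1
sudo resize2fs /dev/sda1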
If you're still not able to log in, try enabling interaction with the serial console (gcloud compute instances add-metadata instance-name --metadata serial-port-enable=TRUE) and connect to it with gcloud compute connect-to-serial-port instance-name - this is the most foolproof method if everything else fails.
If you're able to connect via the serial console, check whether the SSH service is listening:
sudo service ssh status - if it isn't, start it with sudo service ssh start and watch for any errors.
A similar case was also discussed here.

Will processes running on a VMware ESXi host shell via SSH continue if the SSH session is disconnected?

I've got a VMware ESXi server that I connected to via SSH to run a process. (I was running "vmkfstools --punchzero myVirtualDrive.vmdk" to reclaim space on a virtual disk). The SSH client connection was disconnected when the process was only 70% complete (with several hours of estimated time remaining).
Was the process terminated when the SSH client dropped its connection to the ESXi host?
Is there any way to tell if the process is still running?
I did a quick ps|grep to find it, but didn't see anything that looked like vmkfstools.
On most standard linux machines, I use "screen" to avoid SSH client disconnection issues, but since the ESXi shell is purposely a very stripped down environment, I just ran the command straight away.
Thoughts?
The command was killed when you disconnected.
What you can do to prevent this in the future is to run the command with nohup and send it to the background; it will then keep running even when your shell "hangs up" (see the sketch below).
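A minimal sketch, assuming the stripped-down ESXi busybox shell provides nohup (the log path is arbitrary):
nohup vmkfstools --punchzero myVirtualDrive.vmdk > /tmp/punchzero.log 2>&1 &
# nohup ignores the hangup signal, output is captured in the log file,
# and the trailing & puts the job in the background.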
What I usually do when I expect a command to run for a long time (sketched after these steps):
I ssh to a Linux machine that runs 24/7
start screen
ssh to the ESXi host
start the command
This way I don't have to worry about disconnects, I can just detach screen and go home after work and reattach screen the next morning to check the status of the command.
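A sketch of that workflow (hostnames are placeholders):
ssh user@always-on-linux-box     # a machine that runs 24/7
screen -S esxi-job               # start a named screen session
ssh root@esxi-host               # from inside screen, hop to the ESXi host
# ...start the long-running command, then detach with Ctrl-A d.
screen -r esxi-job               # later: reattach and check on it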
This was exactly what I was looking for, thanks all. I was trying to unmap unused blocks on LUNs with a VMFS 5 filesystem. I ran esxcli storage vmfs unmap -l [LUN-NAME] and was wondering what would happen if I closed the remote SSH connection while that command was still running.

How to fix the ZooKeeper error for HBase

The main OS is Windows 7 64-bit. I used VMware Player to create two CentOS 5.6 VMs; the network connection is bridged. I installed HBase on both CentOS systems, one as master, the other as slave. When I enter the shell and run status 'details', I get errors.
The error from the master is:
zookeeper.ZKConfig: no valid quorum servers found in zoo.cfg
ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: An error is preventing HBase from connecting to ZooKeeper
And the error from the slave is:
ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
Please give me some suggestions.
Thanks a lot!
Check whether these lines are in your .bashrc; if not, add them and restart all HBase services (don't forget to run the exports manually in your current shell as well). That did it for me with a pseudo-distributed installation. My problem (and maybe yours as well) was that HBase wasn't detecting its configuration.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HBASE_CONF_DIR=/etc/hbase/conf
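To pick up the new environment and bounce HBase, something like this (a sketch; HBASE_HOME is assumed to point at your HBase install):
source ~/.bashrc
$HBASE_HOME/bin/stop-hbase.sh
$HBASE_HOME/bin/start-hbase.sh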
I see this very often on my machine. I don't have a failsafe cure, but I end up running stop-all.sh and deleting every place where Hadoop and DFS (it's a DFS failure) store their temp files. It seems to happen after my computer goes to sleep while DFS is running. Roughly, the reset looks like the sketch below.
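(The temp paths assume the defaults of a pseudo-distributed Hadoop 1.x setup, and the rm is destructive - it wipes HDFS/HBase state:)
stop-all.sh
rm -rf /tmp/hadoop-$USER /tmp/hbase-$USER   # default hadoop.tmp.dir / hbase.tmp.dir locations
start-all.sh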
I am going to experiment with single-user mode to avoid this. I don't need distribution while developing.