Hyper-V Anti-Affinity

I'm trying to set up anti-affinity in a clustered Hyper-V environment but am struggling to get any VMs to stay apart. It seems that the anti-affinity rules are simply not honored.
Setup:
3 x Hyper-V servers (server1, server2, server3)
3 x VMs (web_test_1, web_test_2, web_test_3)
Attempt 1:
I ran the below script on server1:
# Put all three web VMs into the same anti-affinity class so the cluster
# tries to keep them on different nodes.
$WEBAntiAffinity = New-Object System.Collections.Specialized.StringCollection
$WEBAntiAffinity.Add("WEB Servers")
(Get-ClusterGroup -Name WEB_TEST_1).AntiAffinityClassNames = $WEBAntiAffinity
(Get-ClusterGroup -Name WEB_TEST_2).AntiAffinityClassNames = $WEBAntiAffinity
(Get-ClusterGroup -Name WEB_TEST_3).AntiAffinityClassNames = $WEBAntiAffinity
# Confirm the class name was applied to each group.
Get-ClusterGroup | Select-Object -Property Name, AntiAffinityClassNames
All three VMs were powered off before I ran the above, and all of them had been created on server1.
When I powered them on, they all started on server1 and stayed there.
Attempt 2:
I ran the same script on the other servers (server2 and server3).
I powered the VMs off and back on again; they all remained on server1.
Attempt 3:
After running the script on all the servers, I restarted the servers one by one. The VMs moved between nodes as normal during the reboots, but once everything was back up I stopped all the VMs, moved them to server1, and started them again.
My assumption was that two of them would move to other nodes before powering on, but that didn't happen; they all started on server1.
Does anyone know what I'm doing wrong here? Am I missing some prerequisites? I haven't been able to find many examples of this online.

This is not called out specifically in Microsoft's documentation, but to make anti-affinity rules work in Hyper-V you also need System Center Virtual Machine Manager (SCVMM). SCVMM reads the anti-affinity rules (and other placement rules such as affinity, priority, etc.) and performs the migrations needed to apply them.
This is analogous to the VMware stack, where ESXi is the hypervisor, the VM configuration contains the rules, and vCloud Director or vCenter actually applies them.
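Before bringing SCVMM into the picture, it is worth checking whether the base cluster honours the class names when it has to pick a node itself, e.g. during a drain; as far as I know, the cluster consults AntiAffinityClassNames when it chooses a failover node, not when a group is merely started. A minimal sketch using only the FailoverClusters cmdlets, with the node and VM names from the question:
# Confirm every web VM carries the anti-affinity class name.
Get-ClusterGroup | Where-Object Name -like 'WEB_TEST_*' |
    Select-Object Name, OwnerNode, AntiAffinityClassNames
# Drain server1; the cluster should spread its VMs across the remaining
# nodes and avoid stacking groups that share a class name where it can.
Suspend-ClusterNode -Name server1 -Drain -Wait
# See where everything landed, then bring the node back into the cluster.
Get-ClusterGroup | Select-Object Name, OwnerNode
Resume-ClusterNode -Name server1
If the groups spread out on a drain but pile back onto one node when simply powered on, that matches the behaviour described in the question and is exactly the gap SCVMM's placement engine fills.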

Related

DC/OS has three roles (master, slave, slave_public); why can't they be put on one host?

I have just been investigating DC/OS, and I see that it has three roles: master, slave, and slave_public. I want to deploy a cluster that can host the master, slave, or slave_public roles together on one host, but currently I can't do that.
I want to know why it was designed so that they can't be put on one host. If it is possible after all, could I get some suggestions on how?
If I can't do this, I'll stop using DC/OS and use Mesos and Marathon instead.
Does anyone else have the same need? I look forward to your replies.
This is by design, and work is actually being done to enforce that a machine is installed with only one role, because things break with more than one.
If you're trying to demo or experiment with DC/OS and you only have one machine, you can use virtual machines or Docker to partition that one machine into multiple machines which you can install DC/OS on. dcos-vagrant and dcos-docker can help you there.
As far as installing goes, though, the configuration for each of the three roles is incompatible with the others. The "master" role causes a whole bunch of pieces of software to be started / installed on a host (Mesos-DNS, Mesos master, Marathon, Exhibitor, ZooKeeper, 3dt, adminrouter, rexray, spartan, and navstar among others) which listen on various ports. The "slave" role causes a machine to have a mesos-agent (Mesos renamed mesos-slave to mesos-agent, hence the disconnect) configured and started on the host. The mesos-agent is configured to hand control of most ports greater than 1024 to tasks which are launched by Mesos frameworks on the agent. Several of those ports are used by services which run on masters, resulting in odd conflicts and hard-to-fix bad behavior.
In the case of running the "slave" and "slave_public" roles on the same host, the two conflict more directly, because both of them cause a mesos-agent to be run on the host with slightly different configuration. Both mesos-agents (the one configured with the "slave" role and the one with the "slave_public" role) are configured to listen on port 5051. Only one of them can bind it, so you end up with one of the agents being non-functional.
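The port clash itself is easy to reproduce outside DC/OS; the sketch below (illustrative only, nothing Mesos-specific about it) binds the same port twice the way the two agents would try to, and the second bind fails:
# Illustrative only: two listeners cannot share one port, which is the root
# of the slave / slave_public conflict (both mesos-agents want 5051).
$first  = [System.Net.Sockets.TcpListener]::new([System.Net.IPAddress]::Loopback, 5051)
$second = [System.Net.Sockets.TcpListener]::new([System.Net.IPAddress]::Loopback, 5051)
$first.Start()
try     { $second.Start() }
catch   { "Second listener failed: $($_.Exception.Message)" }
finally { $first.Stop() }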
DC/OS only supports running a node as either a master or an agent (slave). You are correct that Mesos does not have this limitation, but DC/OS is more than just Mesos/Marathon. To enable all the additional features of DC/OS, there are various components built around Mesos and Marathon. At times these components behave differently depending on whether they are running on a master or an agent, and at other times the components that exist on a master may or may not exist on an agent, or vice versa. So running a master and an agent on the same node would lead to conflicts/issues.
If you are looking to run a small development setup before scaling the solution out to a bigger distributed system DC/OS Vagrant might be a good starting point.

Windows Server 2008 VM - network services failing

I would really appreciate another perspective on an issue we have been experiencing.
The environment:
We have a small subset of VMs (5 Windows Server 2008 R2 VMs) hosted on a Windows Server 2012 cluster of 8 physical hosts, which supports hundreds of VMs across various OS versions (2008/2012, etc.).
The issue:
Servers within the subset of VMs experience widespread network SERVICE failures. The failure presents itself as a loss of connectivity for a large number of network-related services operating on the VMs (including certain critical network-dependent applications).
The impacts:
Server remains online.
Inability to RDP to the servers via Domain Accounts (Local accounts are fine).
Windows event logs associated with Netlogon Failure: Event ID 5719 - This computer was not able to set up a secure session with a domain controller in domain DOWNERGROUP due to the following:
The RPC server is unavailable. This may lead to authentication problems.
Windows event logs associated with Group Policy failure:
Event ID 1054:The processing of Group Policy failed. Windows could not
obtain the name of a domain controller. This could be caused by a name
resolution failure. Verify your Domain Name System (DNS) is configured
and working correctly
Widespread agent failure (AV, monitoring, application) - lack of connectivity to centralised management servers.
The resolution(s): stopping an agent service. Strangely, it's not limited to a specific agent: if we stop agent A, the server comes back to life, but equally, if we instead stop agent B, the server comes back to life with agent A still running. Restarting the VM also resolves the issue.
Note that these events do not appear on other VMs hosted on the same host at the time of the outage. Also note that the guest is located on the same host prior to, during, and after the outage.
We have investigated the suspicion that there may be an issue with dynamic port range allocation, with the server possibly getting into a bottleneck state. We have implemented the "MaxUserPort" and "TcpTimedWaitDelay" registry parameters and set them to 65k and 30 respectively.
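For reference, this is roughly how we set and verified those two values from PowerShell; a sketch assuming the usual (legacy) Tcpip\Parameters location and taking "65k" to mean the documented maximum of 65534 (a reboot is needed for the change to take effect):
# Legacy TCP tuning values live under Tcpip\Parameters (reboot to apply).
$tcpip = 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters'
New-ItemProperty -Path $tcpip -Name MaxUserPort       -PropertyType DWord -Value 65534 -Force
New-ItemProperty -Path $tcpip -Name TcpTimedWaitDelay -PropertyType DWord -Value 30    -Force
# Verify, and also check the dynamic port range the stack is actually using.
Get-ItemProperty -Path $tcpip | Select-Object MaxUserPort, TcpTimedWaitDelay
netsh int ipv4 show dynamicport tcp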
Also note that when an outage occurs, it does not always occur on the same VMs in the group; often it is 2, 3, 4, or all of the servers.
I'm really just asking whether anyone recognises these symptoms and can relate them to possible causes for our situation.
Any help/discussion would be appreciated.
Well, this turned out to be an interesting resolution.
We discovered that one of our server agents, while not actually showing open ports in netstat, had over 40,000 handles, growing linearly over time.
We had to enable the "Handles" column in Task Manager to be able to see this info.
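If it helps anyone else, a quick way to watch for the same pattern without Task Manager is to sort processes by handle count from PowerShell (just a sketch; whichever process is leaking on your box will be the one whose number keeps climbing):
# Show the ten processes holding the most handles; a count that grows
# steadily over time for the same process points at a handle leak.
Get-Process |
    Sort-Object Handles -Descending |
    Select-Object -First 10 Name, Id, Handles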
This was the miracle post...
http://blogs.technet.com/b/kimberj/archive/2012/07/06/sever-quot-hangs-quot-and-ephemeral-port-exhaustion-issues.aspx

Do I need to run the WebLogic node manager on a single machine that has multiple WebLogic instances?

Foreword: I'm using Java 6u45, WebLogic 10.3.6, and Ubuntu Desktop 14.04 64-bit.
I just started as a student assistant at one of my state's IT offices. On my first day I was tasked with testing WebLogic on Ubuntu (Windows isn't case sensitive, which caused issues later because WebLogic is...). I started messing around with clustering, and now my setup is as follows:
1 Ubuntu machine
1 domain
6 servers: Admin server, wls1-4, and wlsmaster (wlsmaster was supposed to be what wls1 and wls2 reported to within the cluster because I set the cluster to be unicast, but that's a secondary question for now).
2 clusters: cluster1 and cluster2. wls1, wls2, and wlsmaster are on cluster1. wls3 and 4 are on cluster2.
Given my setup, do I even need to use the node manager, since I'm only using one physical machine? Secondary question: if I want to use unicast, how do I set the master? $state uses unicast for what few WebLogic servers we have, so I was told to check that out.
A few things:
No, you don't necessarily have to use a node manager, but it will make your life easier. When you log into the WebLogic admin console and attempt to start one of your servers, e.g. wls1-4, the admin server will attempt to talk to the node manager to start the servers. Without the node manager you will have to start each server individually using the startManagedWebLogic.sh script, and if you need to bring servers up and down often it will be very annoying.
With regard to unicast, it is pretty easy to set up (we just leave all the default values alone). Here is the pertinent info from the Oracle docs:
"Each of the Managed Servers in a WebLogic Server cluster has a name. For unicast clusters, WebLogic Server reads these Managed Server names and then sorts them into an ordered list by alphanumeric name. The first 10 Managed Servers in the list (up to 10 Managed Servers) become the first unicast clustering group. The second set of 10 Managed Servers (if applicable) becomes the second group, and so on until all Managed Servers in the cluster are organized into groups of 10 Managed Servers or less. The first Managed Server for each group becomes the group leader for the other (up to) nine Managed Servers in the group."
So you will want to name your master servers in such a way that they are the first alphanumerically in the cluster. That said, for your use case I doubt you need those master servers as all. Just have 2 clusters, one with wls1-2 and one with wls3-4.

ESXi 5.1 revert to snapshot daily or every night

I'm trying to revert a virtual machine to its previous snapshot every day or night.
Unfortunately, I haven't found any way to do this the way I want.
Here are some things I tried that didn't fit:
- snapshot.action=autoRevert --> The VM has to HALT; a REBOOT doesn't work the same way. I don't want to have to power on my VM manually.
- snapshot.action=autoRevert on a running snapshot. I tried this, thinking it might work and resolve the first issue, but when I HALT my VM the snapshot is reverted and the VM is placed in a suspended state...
- PowerCLI script: I don't want to have a Windows machine running just for this little thing.
- Non-persistent disk: same thing as the first issue: the VM needs to HALT, not REBOOT.
How can I do this simply? I thought I could just do one of those things and place a cron job on my Linux VM to reboot every night.
In the past I've set up scripts that revert VMs to specific snapshots via the SSH server on my ESXi host. Once sshd is enabled, you can remotely run vim-cmd over SSH. This was on ESXi 4.x, but I assume the same can be done in newer versions.
The catch was that I had to enable the so-called "Tech Support Mode" to get sshd running, as documented in the VMware KB: kb.vmware.com/kb/1017910
The procedure I used was to first look up the ID of the VM in question by running:
vim-cmd vmsvc/getallvms
Then, you can view your VM's snapshot tree by passing its ID to this command (this example uses the VM with ID 80):
vim-cmd vmsvc/get.snapshotinfo 80
Finally, you can use an SSH client to remotely revert the VM to an arbitrary snapshot by passing the VM and snapshot IDs to 'snapshot.revert':
ssh root@YOUR_VMWARE_HOST vim-cmd vmsvc/snapshot.revert VM_ID 0 SNAPSHOT_ID
One other thing to note is that you can set up public key authentication between the ESXi server and the machine running your scripts so that the latter won't have to use a password.
The only annoyance with that approach was that I didn't immediately see a way to preserve the authorized_keys file on the ESXi server between reboots - if the ESXi server has to be rebooted, you'll have to rebuild its authorized_keys file before public key auth will work again.

Does anyone know of a free solution to perform failover for VMware ESXi?

I would like to set up a free/custom solution to perform failover for VMware ESXi.
The setup is as follows:
2x Physical servers each with independent storage.
For each physical server there are 2x Win2k8 Enterprise servers.
In case a physical server completely fails, we want the other one (for convenience's sake we can assign it a slave role) to resume operation.
For this to occur, we need to somehow do continuous replication of the virtual servers and, in the case of the primary server failing, have the standby take over the IP, start the virtual machines, and continue operation.
I am new to VMware ESXi myself, but I am trying to research alternative solutions to the expensive VMware licensing for failover.
Thanks.
Take a look at Veeam Backup & Replication.