Galera cluster into Google Cloud Platform - sql

We have a galera cluster with 3 nodes, on 3 different physical machines but all located in the same datacenter.
From what I understand, it was originally deployed this way to increase availability and reliability.
Each node runs on a VM with 12 cores and 4 GB of RAM.
We have been asked to migrate this to Google Cloud Platform to get rid of the ops tasks.
I could create 3 Compute Engine instances and deploy the Galera cluster on them, but I have difficulty seeing the added value compared to Cloud SQL instances with replication and backups. I am not very familiar with scaling heavy-load systems.
The database hosted on these nodes is fairly critical and must provide maximum availability and reliability.
What strategy should I adopt to migrate this architecture to GCP?

Related

Apache Tomcat Crashes In Google Compute Engine f1-micro

I am running Apache Guacamole on a Google Cloud Compute Engine f1-micro with CentOS 7 because it is free.
Guacamole runs fine for some time (an hour or so), then unexpectedly crashes. I get the ERR_CONNECTION_REFUSED error in Chrome, and when running htop I can see that all of the Tomcat processes have stopped. To get it running again I just have to restart Tomcat.
I see a message saying "Instance "guac" is overutilized. Consider switching to the machine type: g1-small (1 vCPU, 1.7 GB memory)" in the Compute Engine console.
I have tried limiting the memory allocation to tomcat, but that didn't seem to work.
Any suggestions?
I think the ERR_CONNECTION_REFUSED error is most likely caused by the VM running short of resources: to keep the OS up, the kernel (for memory, the Linux out-of-memory killer) terminates some processes. SSH can be one of those processes too, and once you reboot the VM, everything resumes normal operation.
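Before resizing, it can help to confirm that the VM really is running out of memory. A minimal Python sketch, assuming a Linux kernel recent enough to report `MemAvailable` in /proc/meminfo: it parses the file's `Key: value kB` format and reports the fraction of memory the kernel estimates is still available. The sample values and the 10% warning threshold are made-up illustrations, not measurements from the instance in the question.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def available_fraction(text: str) -> float:
    """Fraction of total memory the kernel estimates is available for new work."""
    info = parse_meminfo(text)
    return info["MemAvailable"] / info["MemTotal"]


# Hypothetical sample, roughly the size of an f1-micro (not real data).
SAMPLE = """MemTotal:         600000 kB
MemFree:           20000 kB
MemAvailable:      30000 kB
SwapTotal:             0 kB
"""

if __name__ == "__main__":
    frac = available_fraction(SAMPLE)
    print(f"available: {frac:.1%}")
    if frac < 0.10:  # arbitrary illustrative threshold
        print("warning: memory nearly exhausted; the OOM killer may terminate processes")
```

On the VM itself you would pass the real file contents, e.g. `available_fraction(open("/proc/meminfo").read())`.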
Regarding the over-utilization notification recommending g1-small (1 vCPU, 1.7 GB memory), note that f1-micro is a shared-core micro machine type with 0.2 vCPU and 0.60 GB of memory, backed by a shared physical core, and is only suited to running small, non-resource-intensive applications.
Depending on your Tomcat configuration, also note that:
- Connecting to a database is a resource-intensive process.
- If you create Tomcat from Google Cloud Marketplace, the default VM is n1-standard-1 (1 vCPU, 3.75 GB memory), so upgrading to g1-small (1 vCPU, 1.7 GB memory) should be a reasonable fit in your case.
Why was the g1-small machine type recommended? Compute Engine uses the same CPU utilization numbers reported on the Compute Engine dashboard to determine what recommendations to make. These numbers are based on the average utilization of your instances over 60-second intervals, so they do not capture short CPU usage spikes.
As a result, applications with short usage spikes might need to run on a larger machine type than the one recommended by Google to accommodate these spikes.
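The effect of 60-second averaging can be shown with a small worked example. This sketch uses a hypothetical per-second CPU trace (not real dashboard data) and averages it over consecutive 60-second windows, as the text above describes for the dashboard numbers: a 5-second burst to 100% barely moves the averages.

```python
from statistics import mean


def window_averages(samples, window=60):
    """Average per-second CPU samples over consecutive fixed-size windows."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical trace: two minutes at 20% CPU with a 5-second burst to 100%.
cpu = [20.0] * 120
for t in range(30, 35):
    cpu[t] = 100.0

averages = window_averages(cpu)
print("peak per-second:", max(cpu))                         # peak per-second: 100.0
print("60 s averages:", [round(a, 1) for a in averages])    # 60 s averages: [26.7, 20.0]
```

The per-second peak is 100%, yet no 60-second average exceeds 27%, which is why a machine sized from the averages alone can still be overwhelmed by bursts.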
In summary, I suggest upgrading as recommended. Also note that rightsizing recommendations warn when a VM is underutilized or overutilized; in this case it recommends a larger machine type because of overutilization. Keep in mind that this is only a recommendation based on the available data.

How to setup HPC cluster on one server

I'm working on an application that needs to be tested on an HPC cluster.
I'm thinking about using xCAT as a resource manager.
I don't have many hardware resources: just one HP desktop and a MacBook laptop.
My question: is it possible to set up a virtual cluster (using VirtualBox or KVM) on a single physical machine?
Thanks,
The short answer here is yes, depending on how much memory and disk you have available on your one machine. I've done this numerous times on a MacBook Pro with 8 GB of RAM.
The long answer is that there is absolutely nothing magical about an HPC cluster. All you need to test basic parallel applications in a simulated cluster environment are two or more VMs which meet these criteria:
1. Same OS, as identical as possible.
2. Passwordless authentication (SSH key-based auth).
3. Same software stack in the same location on all nodes (see #4, or use rsync).
4. At least one shared filesystem, e.g. an NFS-mounted $HOME.
5. Shared network with name resolution configured (correct /etc/hosts on all nodes).
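The name-resolution requirement above is easy to get subtly wrong when editing each VM by hand. A minimal sketch, assuming hypothetical node names and addresses on a VirtualBox host-only network: it renders one /etc/hosts body that you then copy unchanged to every VM, so all nodes resolve each other identically.

```python
def hosts_file(nodes: dict[str, str]) -> str:
    """Build one /etc/hosts body to copy unchanged onto every node."""
    lines = ["127.0.0.1 localhost"]
    for name, ip in sorted(nodes.items()):
        lines.append(f"{ip} {name}")
    return "\n".join(lines) + "\n"


# Hypothetical host-only network addresses and node names.
NODES = {"head": "192.168.56.10", "node1": "192.168.56.11", "node2": "192.168.56.12"}

if __name__ == "__main__":
    print(hosts_file(NODES), end="")
```

For passwordless authentication, the usual approach is to generate a key once with `ssh-keygen` and install it on each node with `ssh-copy-id`.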
None of this requires job schedulers, provisioning tools, or any complex networking. There are many NFS how-tos to help you set up one node to share $HOME with the others; this might be the most complicated part. VirtualBox does a good job of setting up local networking.
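For the shared $HOME, the NFS configuration comes down to one line on the server and one line on each client. The sketch below only renders those two lines; the server name `head`, the subnet, and the export options (`rw,sync,no_subtree_check`) are assumptions you would adapt to your own VMs.

```python
def nfs_config(server: str, export_dir: str, subnet: str) -> tuple[str, str]:
    """Return (exports_line, fstab_line) for sharing one directory over NFS.

    exports_line goes into /etc/exports on the NFS server;
    fstab_line goes into /etc/fstab on every client node.
    """
    exports_line = f"{export_dir} {subnet}(rw,sync,no_subtree_check)"
    fstab_line = f"{server}:{export_dir} {export_dir} nfs defaults 0 0"
    return exports_line, fstab_line


if __name__ == "__main__":
    exports, fstab = nfs_config("head", "/home", "192.168.56.0/24")
    print("/etc/exports:", exports)
    print("/etc/fstab:  ", fstab)
```

After editing /etc/exports on the server, `exportfs -ra` reloads the export table; on the clients, `mount -a` mounts everything listed in /etc/fstab.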
On top of this you can layer a job scheduler like SLURM (highly recommended), provisioning tools like Warewulf or xCAT, parallel filesystems across the VMs (BeeGFS is easy to set up and a great introduction), etc. I have simulated a full-featured stateless cluster on my MacBook Pro a number of times using tools from this list and VirtualBox VMs. It's a great way to learn about setting up an HPC cluster.
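If you go on to add SLURM, the part of slurm.conf that describes the virtual nodes is small. This is a hedged sketch that renders only the `NodeName` and `PartitionName` lines for hypothetical node names and CPU counts; a working slurm.conf also needs other settings (for example `ClusterName` and `SlurmctldHost`), so check the option names against the documentation for your Slurm version.

```python
def slurm_node_lines(nodes: dict[str, int], partition: str = "debug") -> str:
    """Emit NodeName and PartitionName lines for a small slurm.conf fragment."""
    lines = [
        f"NodeName={name} CPUs={cpus} State=UNKNOWN"
        for name, cpus in sorted(nodes.items())
    ]
    members = ",".join(sorted(nodes))
    lines.append(
        f"PartitionName={partition} Nodes={members} Default=YES MaxTime=INFINITE State=UP"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical compute VMs with 2 vCPUs each.
    print(slurm_node_lines({"node1": 2, "node2": 2}))
```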

I want to learn about virtualization

As a complete beginner, I only know how to create VMs and install an OS on them using Oracle VirtualBox. All the VMs I create depend on the hardware resources (CPU, RAM, etc.) of a single machine, so if that machine goes down, the VMs go down too. I need to know how VMs can be created using resources from different physical machines (manually or dynamically) so that no VM fails.
For example, there are 4 physical machines, each with 8 cores and 16 GB RAM. I want to create three VMs, each with 8 cores and 16 GB RAM, drawing on different physical machines, so that if one physical machine goes down, no VM goes down.
You can look up clustering solutions (e.g. VMware clusters, or Hyper-V failover clusters). In this model, if a physical host goes down, then the virtualization platform will power up the VMs on other hosts.
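The clustering model can be illustrated with a toy simulation. It is not VMware's or Hyper-V's actual placement logic, only a sketch of the idea: each VM still runs on exactly one host, and when that host fails, the VM is restarted (briefly going down) on another host that has spare capacity. It uses the sizes from the question: four 8-core/16 GB hosts and three 8-core/16 GB VMs.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    cores: int
    ram_gb: int


@dataclass
class Host:
    name: str
    cores: int
    ram_gb: int
    alive: bool = True
    vms: list[VM] = field(default_factory=list)

    def fits(self, vm: VM) -> bool:
        """True if this host has enough unused cores and RAM for vm."""
        used_cores = sum(v.cores for v in self.vms)
        used_ram = sum(v.ram_gb for v in self.vms)
        return used_cores + vm.cores <= self.cores and used_ram + vm.ram_gb <= self.ram_gb


def fail_host(hosts: list[Host], failed: str) -> list[str]:
    """Mark a host dead and restart its VMs on surviving hosts with spare capacity.

    Returns the names of VMs that could not be restarted anywhere.
    """
    victim = next(h for h in hosts if h.name == failed)
    victim.alive = False
    stranded = []
    for vm in victim.vms:
        target = next((h for h in hosts if h.alive and h.fits(vm)), None)
        if target is None:
            stranded.append(vm.name)
        else:
            target.vms.append(vm)  # restarted here: a short outage, not zero downtime
    victim.vms = []
    return stranded


def make_cluster() -> list[Host]:
    """Four 8-core/16 GB hosts; vm1..vm3 (8 cores/16 GB each) on h1..h3, h4 spare."""
    hosts = [Host(f"h{i}", cores=8, ram_gb=16) for i in range(1, 5)]
    for i in range(3):
        hosts[i].vms.append(VM(f"vm{i + 1}", cores=8, ram_gb=16))
    return hosts


cluster = make_cluster()
print(fail_host(cluster, "h2"))  # [] : vm2 restarts on the spare host h4
print(fail_host(cluster, "h4"))  # ['vm2'] : no surviving host has room left
```

The second failure shows why these clusters need spare capacity: with one host's worth of headroom, the cluster survives one host failure, but not a second one.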
If you're looking for zero downtime, VMware has a feature called Fault Tolerance, in which a shadow copy of a VM runs on a different host and is continuously synchronized with the primary copy. If the primary host goes down, the shadow copy can take over with zero downtime (i.e. you don't have to boot the shadow copy because it's already running). This feature, while cool, has a lot of real-world limitations in how it interoperates with other VMware features. For example, as of vSphere 6.0, you cannot perform various kinds of migrations on such VMs. I believe it also requires a more expensive license.
These solutions generally require some shared resources between the physical hosts (most notably storage). Otherwise they will not work (or at the very least, performance will greatly suffer).

Migrate 100+ virtual machines from on-prem to azure

Apologies if this is the wrong platform for this question.
If I want to migrate 100 VMs to Azure VMs, what do I need to consider, and how can I migrate them?
This is not a comprehensive answer but some things to consider are:
- Start with a thorough inventory of the VMs to migrate. Issues to watch out for include:
  - Any unsupported OS versions, including 32-bit.
  - Large numbers of attached drives.
  - Disk drives larger than 1 TB.
  - Gen 2 VHDs.
  - Application and network interdependencies that need to be maintained.
  - Specific performance requirements (e.g. any VMs that would need Azure Premium Storage, SSD drives, etc.).
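The inventory checklist above can be turned into a simple filter over your VM list. A sketch with a hypothetical record type: the 32-bit, larger-than-1 TB (treated here as 1024 GB) and Gen 2 checks come from the list, while the `max_disks=8` cut-off for "large numbers of attached drives" is an arbitrary placeholder. Azure's limits have changed since this answer was written, so check the current documentation before relying on any threshold.

```python
from dataclasses import dataclass


@dataclass
class VmRecord:
    name: str
    os_bits: int
    disk_sizes_gb: list[int]
    hyperv_generation: int = 1  # 1 or 2; only meaningful for Hyper-V sources


def migration_flags(vm: VmRecord, max_disks: int = 8) -> list[str]:
    """Return the checklist issues that apply to one VM (empty list = none found)."""
    flags = []
    if vm.os_bits == 32:
        flags.append("32-bit OS")
    if len(vm.disk_sizes_gb) > max_disks:  # hypothetical threshold
        flags.append(f"{len(vm.disk_sizes_gb)} attached disks")
    if any(size > 1024 for size in vm.disk_sizes_gb):
        flags.append("disk larger than 1 TB")
    if vm.hyperv_generation == 2:
        flags.append("Gen 2 VHD")
    return flags


# Hypothetical inventory entries.
inventory = [
    VmRecord("web01", os_bits=64, disk_sizes_gb=[127]),
    VmRecord("legacy01", os_bits=32, disk_sizes_gb=[40, 2048], hyperv_generation=2),
]
for vm in inventory:
    print(vm.name, migration_flags(vm) or "no issues found")
```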
In developing a migration strategy some important considerations are:
- How much downtime can you tolerate? To minimize downtime, look at solutions like Azure Site Recovery, which supports rapid switchover. If downtime is more flexible, more offline migration tools and scripts are available.
- Decide whether to move to the newer Azure Resource Manager deployment model or the classic Service Management model. See https://azure.microsoft.com/en-us/documentation/articles/resource-group-overview/.
- Which machines to move first (pick the simplest, with the fewest dependencies).
- Consider cases where it may be easier to migrate the data or application to a new VM rather than migrate the VM itself.
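Ordering the move by dependencies can be sketched with Python's standard `graphlib` module (Python 3.9+). This assumes a VM should move only after the VMs it depends on, which is one possible policy, not the only one; the VM names are hypothetical. Each wave contains only VMs whose dependencies have already moved, so the first wave holds the VMs with no dependencies at all.

```python
from graphlib import TopologicalSorter


def migration_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group VMs into waves; a VM appears only after everything it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        wave = sorted(sorter.get_ready())
        waves.append(wave)
        sorter.done(*wave)
    return waves


# Hypothetical dependency map: each VM -> the VMs it depends on.
deps = {
    "web": {"app"},
    "app": {"db"},
    "db": set(),
    "fileserver": set(),
}
print(migration_waves(deps))  # [['db', 'fileserver'], ['app'], ['web']]
```

If `prepare()` raises `graphlib.CycleError`, the VMs in the cycle depend on each other and are candidates for migrating together in a single step.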
A good forum for specific migration questions is the Microsoft Azure Site Recovery forum.
Adding to sendmarsh's reply:
The things you will have to consider are:
The source virtualization environment and its version, i.e. VMware or Hyper-V.
OS version, RAM size, OS disk size, number of disks, capacity of each disk, disk format, number of processor cores, number of NICs, processor architecture, network configuration such as IP addresses, and the generation type if the environment is Hyper-V.
I may have missed a few things, such as checking whether VMware Tools is installed. Some configurations are not supported; for example, iSCSI disks are not supported. Microsoft does not support all machine naming conventions, so choose names carefully, as they might affect things later.
The full list of prerequisites is at:
https://azure.microsoft.com/en-us/documentation/articles/site-recovery-best-practices/#azure-virtual-machine-requirements
Update: using PowerShell to automate the migration would make your life easier.

Is a google compute virtual machine highly available?

So I have a cloud virtual machine on Google Compute Engine; does this mean it is highly available by nature? If the VM runs on a single piece of hardware in GCE and that hardware breaks, the VM could go down. Is the VM running on some kind of RAID, but for servers, so that if one machine goes down another machine picks up and continues running the VM? Thanks.
The machine itself is not highly available. However, Google takes several steps to increase reliability:
Storage is replicated and independent of the physical machine the VM is running on (obviously not for local SSD). This means that even if the physical machine catches on fire, only the "runtime" state is lost but the attached disks are fine.
VMs can live-migrate. This is a setting you can control. If enabled, the VM is migrated to a different physical machine during maintenance events. Live migration can cause brief performance degradation while memory etc. is synced to the other host, but the machine is not shut down or restarted.
Even when the physical host suddenly dies, you can set your instance to restart automatically on a new machine. If you plan to use this mode, make sure your instance is able to cleanly boot to serving state without manual intervention.
If you need high availability, the best approach is to spread your instances across zones in the same region and use a network or HTTP(S) load balancer. These automatically stop sending traffic to an instance when it becomes unhealthy. Also see this short YouTube video on Google's network architecture for more info.
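The load-balancer behavior described above (stop sending traffic to an unhealthy instance) can be illustrated with a toy round-robin balancer. This is not how Google's load balancers are implemented; here health is set by hand, whereas a real load balancer relies on the health checks you configure. The instance names are hypothetical, one per zone in a region.

```python
from itertools import cycle


class HealthAwareBalancer:
    """Toy round-robin balancer that skips backends marked unhealthy."""

    def __init__(self, backends):
        self.healthy = {b: True for b in backends}
        self._ring = cycle(backends)
        self._size = len(backends)

    def mark(self, backend, healthy):
        """Record the result of a (here: simulated) health check."""
        self.healthy[backend] = healthy

    def pick(self):
        """Return the next healthy backend, trying each backend at most once."""
        for _ in range(self._size):
            backend = next(self._ring)
            if self.healthy[backend]:
                return backend
        raise RuntimeError("no healthy backends")


lb = HealthAwareBalancer(["vm-us-central1-a", "vm-us-central1-b", "vm-us-central1-c"])
lb.mark("vm-us-central1-b", False)  # e.g. the zone-b instance fails its health check
picks = [lb.pick() for _ in range(4)]
print(picks)  # ['vm-us-central1-a', 'vm-us-central1-c', 'vm-us-central1-a', 'vm-us-central1-c']
```

Because the instances sit in different zones, losing one zone only removes one backend from rotation; the others keep serving.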
For high availability of your application data, there are highly available options like Datastore for database-like usage and Cloud Storage for file-oriented data. Keep in mind that Cloud SQL also runs on a single instance/physical machine, which means you have to set up replicas to get high availability. However, you can also do that with your favorite DB system on plain Compute Engine instances if you are willing to maintain them yourself.