How to migrate an instance in OpenStack from one availability zone to another availability zone through a Heat template - migration

Migrate an instance from one availability zone to another availability zone through a Heat template.
What could be the solution for migrating the instance to a different availability zone through a Heat stack without losing the instance state?
I created an instance "VM1" in availability zone AZ1 through a Heat template.
Now I want to move instance "VM1" from AZ1 to AZ2.
Since the instance was created by a Heat stack, changing the availability_zone property of the OS::Nova::Server resource will cause Heat to recreate the instance.

Related

Ceph Object Gateway: what is the best backup strategy?

I have a Ceph cluster managed by Rook with a single RGW store on top of it. We are trying to figure out the best backup strategy for this store. We are considering the following options: using rclone to back up objects via the S3 interface, using s3fs-fuse (we haven't tested it yet, but s3fs-fuse is known not to be reliable enough), and using NFS-Ganesha to re-export the RGW store as an NFS share.
We are going to have quite a lot of RGW users and quite a lot of buckets, so all three solutions do not scale well for us. Another possibility is to perform snapshots of RADOS pools backing the RGW store and to backup these snapshots, but the RTO will be much higher in that case. Another problem with snapshots is that it does not seem possible to perform them consistently across all RGW-backing pools. We never delete objects from the RGW store, so this problem does not seem to be that big if we start snapshotting from the metadata pool - all the data it refers to will remain in place even if we create a snapshot on the data pool a bit later. It won’t be super consistent but it should not be broken either. It’s not entirely clear how to restore single objects in a timely manner using this snapshotting scheme (to be honest, it’s not entirely clear how to restore using this scheme at all), but it seems to be worth trying.
What other options do we have? Am I missing something?
We're planning to implement Ceph in 2021.
We don't expect a large number of users and buckets, initially.
While waiting for https://tracker.ceph.com/projects/ceph/wiki/Rgw_-_Snapshots, I successfully tested a solution that protects the object store by taking advantage of a multisite configuration plus sync policies (https://docs.ceph.com/en/latest/radosgw/multisite-sync-policy/) in the "Octopus" version.
Assuming you have all zones in the Prod site Zone Sync'd to the DRS:
create a Zone in the DRS, e.g. "backupZone", not Zone Sync'd from or to any of the other Prod or DRS zones;
the endpoints for this backupZone are on 2 or more DRS cluster nodes;
using rclone (https://rclone.org/s3/), write a bash script: for each "bucket" in the DRS zones, create a version-enabled "bucket"-p in the backupZone and schedule a sync, e.g. twice a day, from "bucket" to "bucket"-p (a sketch of such a script follows these steps);
protect access to the backupZone endpoints so that no ordinary user (or integration) can access them; they should be accessible only from the other nodes in the cluster (obviously) and from the server running the rclone-based script;
when there is a failure, just recover all the objects from the *-p buckets, once again using rclone, to the original buckets or to a filesystem.
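The answer describes this as a bash script; purely as an illustration, here is a minimal Python sketch of the same loop that shells out to rclone for the sync step. The endpoint URLs, rclone remote names, and the "-p" suffix are assumptions taken from the steps above, boto3 and rclone are assumed to be installed and configured with credentials for both zones, and you would schedule the script (e.g. twice a day) with cron.

import subprocess

import boto3  # assumes S3 credentials for both zones are configured

# Hypothetical endpoints and rclone remote names; adjust to your deployment.
DRS_ENDPOINT = "https://drs.example.com"
BACKUP_ENDPOINT = "https://backupzone.example.com"
SRC_REMOTE = "drs"         # rclone remote for the DRS zone
DST_REMOTE = "backupzone"  # rclone remote for the backupZone

src = boto3.client("s3", endpoint_url=DRS_ENDPOINT)
dst = boto3.client("s3", endpoint_url=BACKUP_ENDPOINT)

for bucket in src.list_buckets()["Buckets"]:
    name = bucket["Name"]
    protected = f"{name}-p"

    # Create the protected bucket (ignore "already exists") and enable versioning.
    try:
        dst.create_bucket(Bucket=protected)
    except dst.exceptions.BucketAlreadyOwnedByYou:
        pass
    dst.put_bucket_versioning(
        Bucket=protected, VersioningConfiguration={"Status": "Enabled"}
    )

    # Sync the latest state of every object into the protected bucket.
    subprocess.run(
        ["rclone", "sync", f"{SRC_REMOTE}:{name}", f"{DST_REMOTE}:{protected}"],
        check=True,
    )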
This protects against the following failures:
Infra:
Bucket or pool failure;
Pervasive object corruption;
Loss of a site.
Human error:
Deletion of versions or objects;
Removal of buckets;
Elimination of entire pools.
Notes:
Only the latest version of each object is sync'd to the protected (*-p) bucket, but if the script runs several times you have the latest states of the objects through time;
when an object is deleted in the prod bucket, rclone just flags the object with a DeleteMarker upon sync;
this does not scale! As the number of buckets increases, the time to sync becomes untenable.

AWS S3: change storage class without replicating the object

My bucket has a replication rule to backup the object into another region/bucket.
Now I want to change the storage class of the source object (Standard -> Infrequent Access), but it seems this change, applied through the CopyObjectRequest API (Java client), triggers the replication. This is unfortunate because cross-region replication has a cost.
So at the moment the "journey" is the following:
the object is stored in the Standard class in the source bucket;
I change the storage class to IA;
the object gets replicated into the other region (Standard class);
after 1 day it is moved to Glacier.
As you can see, this is a total waste of money, because the replication ends up moving the very same object into Glacier again.
How can I avoid this scenario?
Use a lifecycle policy in the source bucket to convert the current object version to the desired storage class. This should migrate the current object without changing its version-id, and should not trigger a replication event.
Otherwise, you'd need to create objects with the desired storage class from the beginning. There isn't a way for a user action to change an object's storage class without creating a new object version, so the seemingly redundant replication event can't otherwise be avoided -- because you are creating a new object version.
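As a rough sketch only (the bucket name, the empty prefix, and the 30-day transition age are placeholders, and boto3 is assumed), such a lifecycle rule could be created like this:

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name and transition age; adjust to your setup.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-source-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "move-current-versions-to-ia",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Status": "Enabled",
                "Transitions": [
                    # S3 requires objects to be at least 30 days old before
                    # transitioning to STANDARD_IA.
                    {"Days": 30, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    },
)

Because the transition is performed by the lifecycle engine rather than by a CopyObject call, it does not write a new object version, which is why, as noted above, it should not trigger the replication event.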

On AIX 6.1, batchman process does not correctly recognize the time zone of the local workstation in Workload Scheduler

On AIX 6.1, the batchman process does not correctly recognize the time zone of the local machine that is set to GMT, even if the correct time zone is set in the IBM Workload Scheduler CPU definition. You see the following messages in the stdlist log:
"10:29:39 24.11.2015|BATCHMAN:AWSBHT126I Time in CPU TZ (America/Chicago): 2015/11/24 04:29 10:29:39
24.11.2015|BATCHMAN:AWSBHT127I Time in system TZ (America/Chicago): 2015/11/24 10:29 10:29:39
24.11.2015|BATCHMAN:+ 10:29:39 24.11.2015|BATCHMAN:+ AWSBHT128I Local time zone time differs from workstation time zone time by 360 minutes."
Batchman does not recognize the correct time zone because AIX 6.1 uses ICU (International Components for Unicode) libraries to manage the system time zone, and these ICU libraries conflict with the IBM Workload Scheduler libraries.
I have an idea: export the TZ environment variable in the old POSIX format, for example CST6CDT, before starting IBM Workload Scheduler. This is the POSIX naming convention rather than the Olson naming convention (for example, America/Chicago). It avoids the new default time zone management through the ICU libraries in AIX 6.1 by switching back to the old POSIX behavior (as in AIX 5.x).
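Not part of the original suggestion, but a quick way to convince yourself that the POSIX-style value behaves like the Olson name before you put TZ=CST6CDT into the environment of the Workload Scheduler startup script (Python on any Unix-like system, including AIX, is assumed):

import os
import time

# Compare the Olson name with the old POSIX format for the same zone.
# CST6CDT is the POSIX equivalent of America/Chicago used above.
for tz in ("America/Chicago", "CST6CDT"):
    os.environ["TZ"] = tz
    time.tzset()  # re-read TZ; available on Unix-like systems only
    print(tz, "->", time.strftime("%Y/%m/%d %H:%M %Z"))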

Can VMs on Google Compute detect when they've been migrated?

Is it possible to notify an application running on a Google Compute VM when the VM migrates to different hardware?
I'm a developer for an application (HMMER) that makes heavy use of vector instructions (SSE/AVX/AVX-512). The version I'm working on probes its hardware at startup to determine which vector instructions are available and picks the best set.
We've been looking at running our program on Google Compute and other cloud engines, and one concern is that, if a VM migrates from one physical machine to another while running our program, the new machine might support different instructions, causing our program to either crash or execute more slowly than it could.
Is there a way to notify applications running on a Google Compute VM when the VM migrates? The only relevant information I've found is that you can set a VM to perform a shutdown/reboot sequence when it migrates, which would kill any currently-executing programs but would at least let the user know that they needed to restart the program.
We ensure that your VM instances never live migrate between physical machines in a way that would cause your programs to crash the way you describe.
However, for your use case you probably want to specify a minimum CPU platform version. You can use this to ensure that e.g. your instance has the new Skylake AVX instructions available. See the documentation on Specifying the Minimum CPU Platform for further details.
As per the Live Migration docs:
Live migration does not change any attributes or properties of the VM
itself. The live migration process just transfers a running VM from
one host machine to another. All VM properties and attributes remain
unchanged, including things like internal and external IP addresses,
instance metadata, block storage data and volumes, OS and application
state, network settings, network connections, and so on.
Google does provide a few controls to set the instance availability policies, which also let you control aspects of live migration. They also mention what you can look for to determine when a live migration has taken place.
Live migrate
By default, standard instances are set to live migrate, where Google
Compute Engine automatically migrates your instance away from an
infrastructure maintenance event, and your instance remains running
during the migration. Your instance might experience a short period of
decreased performance, although generally most instances should not
notice any difference. This is ideal for instances that require
constant uptime, and can tolerate a short period of decreased
performance.
When Google Compute Engine migrates your instance, it reports a system
event that is published to the list of zone operations. You can review
this event by performing a gcloud compute operations list --zones ZONE
request or by viewing the list of operations in the Google Cloud
Platform Console, or through an API request. The event will appear
with the following text:
compute.instances.migrateOnHostMaintenance
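This is not from the answer itself, but if you prefer to check for that operation programmatically instead of through gcloud or the Console, a rough sketch with the google-api-python-client package (Application Default Credentials assumed; the project and zone below are placeholders) could look like this:

from googleapiclient import discovery  # pip install google-api-python-client

compute = discovery.build("compute", "v1")  # uses Application Default Credentials

# Placeholder project and zone; replace with your own values.
request = compute.zoneOperations().list(
    project="my-project",
    zone="us-central1-a",
    filter='operationType="compute.instances.migrateOnHostMaintenance"',
)
while request is not None:
    response = request.execute()
    for operation in response.get("items", []):
        print(operation["targetLink"], operation["insertTime"], operation["status"])
    request = compute.zoneOperations().list_next(request, response)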
In addition, you can detect directly on the VM when a maintenance event is about to happen.
Getting Live Migration Notices
The metadata server provides information about an instance's
scheduling options and settings, through the scheduling/
directory and the maintenance-event attribute. You can use these
attributes to learn about a virtual machine instance's scheduling
options, and use this metadata to notify you when a maintenance event
is about to happen through the maintenance-event attribute. By
default, all virtual machine instances are set to live migrate so the
metadata server will receive maintenance event notices before a VM
instance is live migrated. If you opted to have your VM instance
terminated during maintenance, then Compute Engine will automatically
terminate and optionally restart your VM instance if the
automaticRestart attribute is set. To learn more about maintenance
events and instance behavior during the events, read about scheduling
options and settings.
You can learn when a maintenance event will happen by querying the
maintenance-event attribute periodically. The value of this
attribute will change 60 seconds before a maintenance event starts,
giving your application code a way to trigger any tasks you want to
perform prior to a maintenance event, such as backing up data or
updating logs. Compute Engine also offers a sample Python script
to demonstrate how to check for maintenance event notices.
You can use the maintenance-event attribute with the waiting for
updates feature to notify your scripts and applications when a
maintenance event is about to start and end. This lets you automate
any actions that you might want to run before or after the event. The
following Python sample provides an example of how you might implement
these two features together.
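The following is not Google's published sample, just a minimal sketch of the same idea using the requests library: it blocks on the metadata server's wait-for-change mechanism and reports when the maintenance-event value changes (for example from NONE to MIGRATE_ON_HOST_MAINTENANCE). It only works when run on a Compute Engine VM, where metadata.google.internal is reachable.

import requests

METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/maintenance-event")
HEADERS = {"Metadata-Flavor": "Google"}

def watch_maintenance_events():
    last_etag = "0"
    while True:
        # wait_for_change blocks until the value changes or timeout_sec elapses.
        resp = requests.get(
            METADATA_URL,
            params={"wait_for_change": "true",
                    "last_etag": last_etag,
                    "timeout_sec": "90"},
            headers=HEADERS,
            timeout=120,
        )
        if resp.status_code == 503:
            continue  # metadata server briefly unavailable; retry
        resp.raise_for_status()
        last_etag = resp.headers["etag"]
        event = resp.text
        if event != "NONE":
            # e.g. MIGRATE_ON_HOST_MAINTENANCE or TERMINATE_ON_HOST_MAINTENANCE;
            # checkpoint state or re-probe CPU features after the event here.
            print("Maintenance event:", event)

if __name__ == "__main__":
    watch_maintenance_events()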
You can also choose to terminate and optionally restart your instance.
Terminate and (optionally) restart
If you do not want your instance to live migrate, you can choose to
terminate and optionally restart your instance. With this option,
Google Compute Engine will signal your instance to shut down, wait for
a short period of time for your instance to shut down cleanly,
terminate the instance, and restart it away from the maintenance
event. This option is ideal for instances that demand constant,
maximum performance, and your overall application is built to handle
instance failures or reboots.
Look at the Setting availability policies section for more details on how to configure this.
If you use an instance with a GPU or a preemptible instance, be aware that live migration is not supported:
Live migration and GPUs
Instances with GPUs attached cannot be live migrated. They must be set
to terminate and optionally restart. Compute Engine offers a 60 minute
notice before a VM instance with a GPU attached is terminated. To
learn more about these maintenance event notices, read Getting live
migration notices.
To learn more about handling host maintenance with GPUs, read
Handling host maintenance on the GPUs documentation.
Live migration for preemptible instances
You cannot configure a preemptible instance to live migrate. The
maintenance behavior for preemptible instances is always set to
TERMINATE by default, and you cannot change this option. It is also
not possible to set the automatic restart option for preemptible
instances.
As Ramesh mentioned, you can specify the minimum CPU platform to ensure your instance is only migrated to hardware that has at least the minimum CPU platform you specified. At a high level, it looks like this:
In summary, when you specify a minimum CPU platform:
Compute Engine always uses the minimum CPU platform where available.
If the minimum CPU platform is not available, or the minimum CPU platform is older than the zone default and a newer CPU platform is available for the same price, Compute Engine uses the newer platform.
If the minimum CPU platform is not available in the specified zone and there are no newer platforms available without extra cost, the server returns a 400 error indicating that the CPU is unavailable.

Windows registry setting for time offset from GMT is not getting updated

Windows Server has a registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation where the field ActiveTimeBias returns the offset in minutes from GMT for the machine you are running on.
We have a web application that needs to present a user with their local time on an HTML page that we generate. We do this by having them set a preference for their time zone and then fetching the above value so that we can compare server time and client time and do the correct calculation.
On servers that we have built, this works great. We are deploying our application to a tier 1 cloud provider who provides a Windows Server AMI that we configure and use. What we have found is that when you use the clock control panel on a local server, the registry entries for TimeZoneInformation are correct. When we do the same on this virtual machine, the entries are correct with the exception of ActiveTimeBias.
Microsoft, in their usual fashion, cautions against diddling with the individual values.
Question for the community: has anyone else encountered this problem, and if so, how did you fix it?
One usually doesn't program directly against these registry keys. For example, in .NET, we have the TimeZoneInfo class. It uses the same data, but in a much more developer-friendly way.
Regarding your particular question, the ActiveTimeBias value holds the current offset from UTC for the time zone that your server is set to. If your server is in a time zone that follows daylight saving rules, then this will change twice a year. If you manually update your time zone and refresh the registry, you will see it change.
The best advice I can offer is that since the timezone of the server is likely to be unimportant to anyone, you should set it to UTC.
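In that spirit, a minimal sketch (not from the answer, and using Python's standard zoneinfo module rather than .NET's TimeZoneInfo) of rendering a user's local time from their stored preference instead of reading ActiveTimeBias:

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; on Windows, the tzdata package may be needed

# Hypothetical stored preference for the user.
user_tz_preference = "America/Chicago"

# Keep the server clock and all stored timestamps in UTC...
now_utc = datetime.now(timezone.utc)

# ...and convert to the user's zone only when rendering the page.
local = now_utc.astimezone(ZoneInfo(user_tz_preference))
print(local.strftime("%Y-%m-%d %H:%M %Z"))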
See also: Daylight saving time and time zone best practices