I used CreateImageRequest to take a snapshot of a running EC2 instance. When I log into the EC2 console I see the following:
AMI - An image that I can launch
Volume - I believe that this is the disk image?
Snapshot - Another entry related to the snapshot?
Can anyone explain the difference in usage of each of these? For example, is there any way to create a 'snapshot' without also having an associated 'AMI', and in that case how do I launch an EBS-backed copy of this snapshot?
Finally, is there a simple API to delete an AMI and all associated data (snapshot, volume and AMI)? It turns out that our scripts only store the AMI identifier, not the rest of the data, so that alone seems to be only enough information to deregister the image.
The AMI represents the launchable machine configuration - it does NOT actually contain any of the machine's data, just references to it. An AMI can get its disk image either from S3 or (in your case) an EBS snapshot.
The EBS volume is associated with a running instance. It's basically a read-write disk image. When you terminate the instance, the volume is automatically destroyed (note that this may take a few minutes).
The snapshot is a frozen image of the EBS volume at the point in time when you created the AMI. Snapshots can be associated with AMIs, but not all snapshots are part of an AMI - you can create them manually too.
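For example, a standalone snapshot can be taken straight from a volume without ever touching an AMI. A minimal boto3 sketch (the volume ID is a placeholder):
# Minimal boto3 sketch: snapshot a volume without creating an AMI
# (the volume ID is a placeholder)
import boto3

ec2 = boto3.client('ec2')
snap = ec2.create_snapshot(VolumeId='vol-XXXX', Description='manual snapshot, no AMI')
print(snap['SnapshotId'])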
More information on EBS-backed AMIs can be found in the user's guide. It is important to have a good grasp of these concepts, so I would recommend giving the entire guide a thorough read before going any further.
If you want to delete all data associated with an AMI, you will have to use the DescribeImageAttribute API call on the AMI's blockDeviceMapping attribute to find the snapshot ID; then delete the AMI and snapshot, in that order.
This small PowerShell script takes the AMI ID (stored in a variable), grabs the snapshot IDs of the given AMI by storing them in an array, and finally performs the required clean-up (unregister the AMI & remove the snapshots).
# Unregister an AMI and clean up its snapshots
$amiName = 'ami-XXXX' # replace this with the AMI ID you need to clean up
$myImage = Get-EC2Image $amiName
$count = $myImage[0].BlockDeviceMapping.Count

# Loop and store the snapshot ID(s) in an array
$mySnaps = @()
for ($i = 0; $i -lt $count; $i++)
{
    $snapId = $myImage[0].BlockDeviceMapping[$i].Ebs | foreach {$_.SnapshotId}
    $mySnaps += $snapId
}

# Perform the clean-up
Write-Host "Unregistering" $amiName
Unregister-EC2Image $amiName

foreach ($item in $mySnaps)
{
    Write-Host 'Removing' $item
    Remove-EC2Snapshot $item
}

Clear-Variable mySnaps
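For completeness, here is a rough boto3 sketch of the same clean-up flow for anyone not using PowerShell: read the snapshot IDs from the image's block device mappings, deregister the AMI, then delete the snapshots. The AMI ID and region are placeholders.
# Minimal boto3 sketch: deregister an AMI and delete its backing snapshots
# (replace the AMI ID and region with your own values)
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')
ami_id = 'ami-XXXX'  # placeholder, same as in the PowerShell example

image = ec2.describe_images(ImageIds=[ami_id])['Images'][0]
snapshot_ids = [bdm['Ebs']['SnapshotId']
                for bdm in image.get('BlockDeviceMappings', [])
                if 'Ebs' in bdm]

ec2.deregister_image(ImageId=ami_id)   # deregister first, then remove the snapshots
for snap_id in snapshot_ids:
    print('Removing', snap_id)
    ec2.delete_snapshot(SnapshotId=snap_id)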
I need to automate a task related to Azure Databricks. We have configured a job in Azure Databricks, and suddenly my service principal secret expired and my notebook failed. The next day I created a new secret, unmounted and re-mounted the storage, and the job worked. I know how to create a new secret, and how to get an alert before it expires using a Logic App and then change it manually.
But how do I manage the unmount --> mount step automatically? The service principal can be used in multiple projects.
This is how I am mounting, and the mount_point is used across notebooks.
Here is sample code for unmounting and mounting in a notebook.
Check whether the path is already mounted; if it is not mounted yet, mount the path with the given configuration.
def mount_blob_storage_from_sas(dbutils, storage_account_name, container_name, mount_path, sas_token, unmount_if_exists=True):
    if [item.mountPoint for item in dbutils.fs.mounts()].count(mount_path) > 0:
        if unmount_if_exists:
            print('Mount point already taken - unmounting: ' + mount_path)
            dbutils.fs.unmount(mount_path)
        else:
            print('Mount point already taken - ignoring: ' + mount_path)
            return
    print('Mounting external storage in: ' + mount_path)
    dbutils.fs.mount(
        source="wasbs://{0}@{1}.blob.core.windows.net".format(container_name, storage_account_name),
        mount_point=mount_path,
        extra_configs={"fs.azure.sas.{0}.{1}.blob.core.windows.net".format(container_name, storage_account_name): sas_token})
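As a usage sketch, a scheduled notebook could simply call the helper on start-up; the secret scope, account, container and mount path names below are made-up placeholders:
# Example call (all names below are placeholders for your own values)
sas_token = dbutils.secrets.get(scope="my-scope", key="my-sas-token")
mount_blob_storage_from_sas(dbutils,
                            storage_account_name="mystorageaccount",
                            container_name="mycontainer",
                            mount_path="/mnt/mycontainer",
                            sas_token=sas_token,
                            unmount_if_exists=True)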
Create a job to run the notebook at a given time.
Provide the notebook path and cluster when creating the job.
Schedule a time to trigger the notebook under Add Schedule.
Select the Scheduled option to provide the trigger time.
The job will trigger at the given time and run the notebook.
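If you would rather not click through the UI each time, the same schedule can also be created through the Databricks Jobs API. The following is only a rough sketch assuming the Jobs API 2.1; the workspace URL, token, notebook path, cluster ID and cron expression are placeholders:
# Rough sketch: create a scheduled notebook job via the Databricks Jobs API 2.1
# (workspace URL, token, notebook path, cluster ID and cron expression are placeholders)
import requests

host = "https://<your-workspace>.azuredatabricks.net"
token = "<personal-access-token>"

payload = {
    "name": "remount-blob-storage",
    "tasks": [{
        "task_key": "remount",
        "notebook_task": {"notebook_path": "/Repos/ops/remount_notebook"},
        "existing_cluster_id": "<cluster-id>"
    }],
    "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",  # every day at 06:00
        "timezone_id": "UTC"
    }
}

resp = requests.post(f"{host}/api/2.1/jobs/create",
                     headers={"Authorization": f"Bearer {token}"},
                     json=payload)
resp.raise_for_status()
print(resp.json())  # returns the new job_id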
I need to mount 2 separate directories as volumes to a newly created container.
So far I've only found a way to mount one volume, since there seems to be no way to add a file share volume via withNewAzureFileShareVolume more than once.
Here's my code:
ContainerGroup containerGroup = azure.containerGroups()
    .define(containerName)
    .withRegion(Region.US_EAST)
    .withExistingResourceGroup("myResourceGroup")
    .withLinux()
    .withPrivateImageRegistry("registry")
    .withNewAzureFileShareVolume("aci-share", shareName)
    .defineContainerInstance(containerName)
        .withImage("image")
        .withVolumeMountSetting("aci-share", "/usr/local/dir1/")
        .withVolumeMountSetting("aci-share-2", "/usr/local/dir2/")
        .attach()
    .withDnsPrefix(team)
    .create();
A new storage account gets created with a single file share and I get the following error:
Volumes 'aci-share-2' in container 'team44783530d' are not found
I created my first notebook instance on Amazon SageMaker.
Next I opened the Jupyter notebook and used the SageMaker example rl_deepracer_coach_robomaker.ipynb from the Reinforcement Learning section. The question is addressed principally to those who are familiar with this notebook.
There you can launch a training process and a RoboMaker simulation application to start the learning process for an autonomous car.
When a simulation job is launched, one can access the log file, which is visualised by default in a CloudWatch console. Some of the information that appears in the log file can be modified in the script deepracer_env.py in the /src/robomaker/environments subdirectory.
I would like to "bypass" the CloudWatch console, saving log information such as the episode, total reward, number of steps, coordinates of the car, steering and throttle, etc. in a dataframe or CSV file to be written somewhere on S3 at the end of the simulation.
Something similar has been done in the main notebook rl_deepracer_coach_robomaker.ipynb to plot the metrics for a training job, namely the training reward per episode. There one can see that
csv_file_name = "worker_0.simple_rl_graph.main_level.main_level.agent_0.csv"
is read from S3, but I simply cannot find where this CSV is generated so that I can mimic the process.
You can create a csv file in the /opt/ml/output/intermediate/ folder, and the file will be saved in the following directory:
s3://<s3_bucket>/<s3_prefix>/output/intermediate/<csv_file_name>
However, it is not clear to me where exactly you will create such a file. The DeepRacer notebook uses two machines, one for training (the SageMaker instance) and one for simulations (the RoboMaker instance). The above method will only work on the SageMaker instance, but much of what you would like to log (such as the total reward in an episode) actually lives on the RoboMaker instance. For RoboMaker instances the intermediate-folder feature doesn't exist, so you'll have to save the file to S3 yourself using the boto library. Here is an example of doing that: https://qiita.com/hengsokvisal/items/329924dd9e3f65dd48e7
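As a minimal sketch of that approach (the local file path, bucket and key below are placeholders; your environment script is assumed to have written the CSV already):
# Minimal sketch: upload a locally written CSV to S3 from the RoboMaker side
# (bucket name, prefix and file name are placeholders)
import boto3

s3 = boto3.client('s3')
local_path = '/tmp/episode_metrics.csv'   # file your environment script writes
bucket = '<s3_bucket>'
key = '<s3_prefix>/simulation_logs/episode_metrics.csv'

s3.upload_file(local_path, bucket, key)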
There is a way to download the CloudWatch logs to a file. That way you can just save the logs and parse them yourself. Assuming you are executing from a notebook cell:
STREAM_NAME= <your stream name as given by RoboMaker CloudWatch logs>
task = !aws logs create-export-task --task-name "copy_deepracer_logs" --log-group-name "/aws/robomaker/SimulationJobs" --log-stream-name-prefix $STREAM_NAME --destination "<s3_bucket>" --destination-prefix "<s3_prefix>" --from <unix timestamp in milliseconds> --to <unix timestamp in milliseconds>
task_id = json.loads(''.join(task))['taskId']
The export is an asynchronous call, so give it a few minutes to finish. If you can print the task_id, the export task was created successfully.
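If you prefer to wait for the export programmatically instead of guessing, a small boto3 polling sketch (task_id is the value obtained above):
# Poll the export task until CloudWatch Logs reports it finished
# (task_id comes from the create-export-task call above)
import time
import boto3

logs = boto3.client('logs')
while True:
    tasks = logs.describe_export_tasks(taskId=task_id)['exportTasks']
    status = tasks[0]['status']['code']   # e.g. PENDING, RUNNING, COMPLETED
    print('Export status:', status)
    if status in ('COMPLETED', 'CANCELLED', 'FAILED'):
        break
    time.sleep(30)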
In my WPF application I'm trying to use off-line map functionality. Right now my feature service is configured for data sync, and I'm able to create a data replica on the server and download a local copy of the geodatabase.
_gdbSyncTask = await GeodatabaseSyncTask.CreateAsync(_featureServiceUri);
Envelope extent = new Envelope(xmin, ymin, xmax, ymax, new SpatialReference(wkidStart));
GenerateGeodatabaseParameters generateParams = await _gdbSyncTask.CreateDefaultGenerateGeodatabaseParametersAsync(extent);
_generateGdbJob = _gdbSyncTask.GenerateGeodatabase(generateParams, _gdbPath);
_generateGdbJob.JobChanged += GenerateGdbJobChanged;
_generateGdbJob.ProgressChanged += ((object sender, EventArgs e) =>
{
    UpdateProgressBar();
});
_generateGdbJob.Start();
After the initial synchronization, I'm able to work with the map successfully in off-line mode. This includes operations like adding new geometries or editing existing polygons inside the local DB.
However, when I try to synchronize the changes back to the server, I get no results.
To synchronize the local database back to the server, I'm using the following code:
SyncGeodatabaseParameters parameters = new SyncGeodatabaseParameters()
{
    GeodatabaseSyncDirection = SyncDirection.Bidirectional,
    RollbackOnFailure = false
};

Geodatabase gdb = await Geodatabase.OpenAsync(this.GetGdbPath());
foreach (GeodatabaseFeatureTable table in gdb.GeodatabaseFeatureTables)
{
    long id = table.ServiceLayerId;
    SyncLayerOption option = new SyncLayerOption(id);
    option.SyncDirection = SyncDirection.Bidirectional;
    parameters.LayerOptions.Add(option);
}

_gdbSyncTask = await GeodatabaseSyncTask.CreateAsync(_featureServiceUri);
SyncGeodatabaseJob job = _gdbSyncTask.SyncGeodatabase(parameters, gdb);
job.JobChanged += SyncJob_JobChanged;
job.ProgressChanged += SyncJob_ProgressChanged;
job.Start();
Everything goes well. The synchronization ends with the status “Succeeded”. The messages logged by the SyncGeodatabaseJob are shown in the screenshot below:
However, when I open the edited feature layer from the server inside the web map client, I cannot find any of my local changes. In the server database I can also see that no new records were created during the synchronization.
The interesting thing is that when I open the “Replica” data in the web client I can see the following information:
Replica Server Gen: 2
Creation Date: 2018/02/07 10:49:54 UTC
Last Sync Date: 2018/02/07 10:49:54 UTC
The “Last Sync Date” is equal to the replica “Creation Date”. However, in the replica log in ArcMap I can see the following information:
Can anyone tell me how I should interpret the situation described above? Am I missing some steps in my code? Or maybe some configuration is missing on the server? It looks like the data modifications are successfully pushed back to the replica on the server, but after that the replica is not synchronized with the server database (should that work automatically?).
I'm new to ArcGIS development, so any help will be appreciated.
Thanks for all the answers. It turned out that versioning is enabled on the server database, and the offline, versioned changes were not reconciled to the server.
After running a reconcile/post script (http://desktop.arcgis.com/en/arcmap/10.3/manage-data/geodatabases/automate-reconcile-post-after-sync.htm) the off-line changes became visible to other system users.
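For reference, the reconcile/post step from the linked article can be scripted with arcpy. This is only a rough sketch under the assumption of an enterprise geodatabase admin connection; the connection file, version names and log path are placeholders:
# Rough arcpy sketch of a reconcile/post after sync
# (connection file, version names and log path are placeholders)
import arcpy

sde_connection = r"C:\connections\mygdb.sde"   # admin connection to the enterprise geodatabase

arcpy.ReconcileVersions_management(
    sde_connection,
    "ALL_VERSIONS",                          # reconcile every version
    "sde.DEFAULT",                           # target version
    arcpy.ListVersions(sde_connection),      # versions to reconcile
    "LOCK_ACQUIRED",
    "NO_ABORT",
    "BY_OBJECT",
    "FAVOR_TARGET_VERSION",
    "POST",                                  # post the reconciled edits to the target version
    "KEEP_VERSION",
    r"C:\logs\reconcile_log.txt")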
The code looks OK at a quick glance, so I would assume there is something going on in the setup.
What do you get back from the sync operation after it has completed? Note that you can just use await syncJob.GetResultsAsync to start the job and wait for the results.
How is the feature service set up on the server? Please refer to https://enterprise.arcgis.com/en/server/latest/publish-services/linux/prepare-data-for-offline-use.htm for the different ways to set these things up.
I'm trying to test S3 backup/restore functionality.
What I did:
Installed DSE + OpsCenter on Amazon
Scheduled hourly backups of all keyspaces (60 MB total size); got 10 backups by the morning
Terminated the instances and created a new one
Tried to get my data back. No luck: OpsCenter can't connect to my S3 bucket
It has been taking >10 minutes now...
What am I doing wrong?
UPD:
Finally, I got a response:
I believe that this may be OPSC-5915 (sorry no public bug tracker) which is fixed in the upcoming 5.2.0 release.
The summary is that the API calls will still work as expected but the UI is not pushing the destination information to the API endpoint correctly.
You can confirm that this is the error you are experiencing as follows:
1) Go to /etc/opscenter/clusters/<cluster_name>.conf (or a similar location, depending on whether you've done a tarball install, etc.)
2) Find the destination ID that matches your bucket, it'll look something like b699738d9bd8409c82e664b543f24030
3) Confirm the clustername in your opsc URLs, it'll look something like localhost:8888/my_cluster
4) Manually hit the API to retrieve your backup list
curl localhost:8888/<clustername>/backups?amount=6\&last_seen=\&list_all=1\&destination=<destination ID>
It'll look like this
curl localhost:8888/dse/backups?amount=6\&last_seen=\&list_all=1\&destination=b699738d9bd8409c82e664b543f24030
5) You should get back JSON; confirm that your backup is listed:
{"opscenter_adhoc_2014-12-17-20-22-57-UTC": {"keyspaces": {"OpsCenter":...
If you see your backup in the JSON, then opsc sees your backup and this is indeed OPSC-5915, so that's at least confirmed.
If this is your case, we can work around it by manually hitting the restore API (this is admittedly a bit more involved).
http://docs.datastax.com/en/opscenter/5.1/api/docs/backups.html#backups
It'll look a bit like this:
BACKUP='opscenter_4a269167-96c1-40c7-84b7-b070c6bcd0cd_2012-06-07-18-00-00-UTC'
curl -X POST \
  http://192.168.1.1:8888/Test_Cluster/backups/restore/$BACKUP \
  -d '{
    "destination": "fe85800f3f4043a88fbe76fc45b22b19",
    "keyspaces": {
        "Keyspace1": {
            "column-families": ["users", "dates"],
            "truncate": true
        },
        "OpsCenter": {
            "truncate": false
        }
    }
}'
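If you would rather drive the restore endpoint from Python than from curl, here is a small requests sketch using the same placeholder host, cluster, backup tag and destination ID:
# Sketch: call the OpsCenter restore API with requests
# (host, cluster name, backup tag and destination ID are placeholders, as in the curl example)
import requests

host = "http://192.168.1.1:8888"
cluster = "Test_Cluster"
backup = "opscenter_4a269167-96c1-40c7-84b7-b070c6bcd0cd_2012-06-07-18-00-00-UTC"

payload = {
    "destination": "fe85800f3f4043a88fbe76fc45b22b19",
    "keyspaces": {
        "Keyspace1": {"column-families": ["users", "dates"], "truncate": True},
        "OpsCenter": {"truncate": False}
    }
}

resp = requests.post(f"{host}/{cluster}/backups/restore/{backup}", json=payload)
print(resp.status_code, resp.text)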