I want to copy an EBS snapshot to my S3 bucket, but I cannot find a way to do it after trying and researching.
I would be grateful for any information that could get me started on a solution.
There is an answer within the AWS forums, but it's rather roundabout:
Create a temporary EBS volume from the snapshot. (Snapshots: Actions: Create volume)
Create a temporary EC2 Linux instance and install the AWS CLI.
Attach the volume to the instance and mount. (EBS Volumes: Actions: Attach volume - must be in the same availability zone as the instance)
Find the name of the attached snapshot volume with lsblk - e.g. /dev/xvdj
Copy the volume contents to your system - e.g. sudo bash -c "dd if=/dev/xvdj bs=8M | gzip > /home/ubuntu/volbk.gz"
Copy your .gz file to S3 - aws s3 cp ~/volbk.gz s3://my-bucket-name
Check that the file arrived in your S3 bucket, then unmount the snapshot volume.
Terminate the instance.
Delete the temporary EBS volume created from the snapshot.
from here, with my additions (Nov 2 answer):
https://forums.aws.amazon.com/thread.jspa?messageID=151285
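The dd, gzip and upload steps above can also be sketched as a single pipeline that streams the compressed image straight to S3, so the instance needs no local disk space for the .gz file. This is a minimal sketch, not the forum answer's exact method: the function name backup_volume_to_s3 is hypothetical, and the device and bucket names are the placeholders used above. It relies on aws s3 cp accepting - as the source to read from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream a block device, gzip-compressed, to an S3 object.
# Usage: backup_volume_to_s3 /dev/xvdj s3://my-bucket-name/volbk.gz
backup_volume_to_s3() {
  local device="${1:-}" dest="${2:-}"
  if [ -z "$device" ] || [ -z "$dest" ]; then
    echo "usage: backup_volume_to_s3 <device> <s3-uri>" >&2
    return 2
  fi
  (
    # Under bash, pipefail makes the pipeline fail when dd or gzip fails,
    # not only when the final upload fails.
    if [ -n "${BASH_VERSION:-}" ]; then set -o pipefail; fi
    # "-" tells aws s3 cp to read the object body from standard input.
    sudo dd if="$device" bs=8M | gzip | aws s3 cp - "$dest"
  )
}
```

For very large streams (the CLI documentation mentions more than 50 GB), aws s3 cp also takes an --expected-size option so that it can choose suitable part sizes.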
copy-snapshot is the AWS CLI command that copies an EBS snapshot, either within the same region or from one region to another. The copy is still an EBS snapshot (which AWS stores in Amazon S3 behind the scenes); the command does not place it in one of your own S3 buckets.
This example command copies a snapshot with an arbitrary ID from one region to another.
aws --region us-east-1 ec2 copy-snapshot --source-region us-west-2 --source-snapshot-id snap-066877671789bd71b --description "This is my copied snapshot."
For more information, see https://docs.aws.amazon.com/cli/latest/reference/ec2/copy-snapshot.html
With Amazon EBS, you can create point-in-time snapshots of volumes, which can be stored in Amazon S3. After you've created a snapshot and it has finished copying to Amazon S3, you can copy it from one AWS region to another, or within the same region. The snapshot copy ID is different than the ID of the original snapshot.
EBS snapshots are stored in Amazon S3. However, you will not find your snapshots in any of your S3 buckets. AWS uses the S3 infrastructure to store your EBS snapshots, but you cannot access them while they reside in S3.
You can copy an EBS snapshot using either the EC2 console or the command line.
i) Copy an EBS snapshot using the console:
Open the EC2 console -> choose Snapshots in the navigation pane -> choose Copy from the Actions list -> in the Copy Snapshot dialog box, provide the necessary details (destination region, description, encryption, etc.) and select Copy.
ii) Copy an EBS snapshot using the command line:
Run the following command in the AWS CLI:
aws --region <destination region> ec2 copy-snapshot --source-region <source region> --source-snapshot-id <snap-0xyz9999999> --description "<description>"
I have an S3 bucket that I use from EC2. I want to remove multiple files from an S3 folder. The command reports the files as deleted, but they are still there.
Command:
aws s3 rm s3://mybucket/path1/publish/test/dummyfile_*.dat
I got the message below:
delete: s3://mybucket/path1/publish/test/dummyfile_*.dat
But the files are still present.
Can anyone please help?
"Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all Regions."
from https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#CoreConcepts
If you make a copy of a S3 object to an EC2 instance, you simply made a copy of it.
You can use aws s3 sync to synchronize S3 objects (files) between S3 and your EC2 instance, see https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
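A likely cause of the behaviour in the question above is that aws s3 rm does not expand shell-style wildcards in the S3 path: the * is treated as part of a literal key name, and S3 reports success even when that key does not exist. The usual approach is a recursive delete filtered with --exclude and --include. The sketch below assumes the bucket and prefix from the question; the function name s3_rm_matching is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete objects under an S3 prefix whose keys match a pattern.
# Usage: s3_rm_matching s3://mybucket/path1/publish/test/ 'dummyfile_*.dat' --dryrun
s3_rm_matching() {
  local prefix="${1:-}" pattern="${2:-}"
  if [ -z "$prefix" ] || [ -z "$pattern" ]; then
    echo "usage: s3_rm_matching <s3-prefix> <pattern> [extra aws options]" >&2
    return 2
  fi
  shift 2
  # Filters are applied in order: --exclude "*" drops every key first,
  # then --include re-adds only the keys that match the pattern.
  aws s3 rm "$prefix" --recursive --exclude "*" --include "$pattern" "$@"
}
```

Passing --dryrun first prints the objects that would be deleted without removing anything, which is a cheap way to check the pattern.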
I need to backup our Google Storage buckets. Versioning is not enough.
I was thinking about:
backup to s3 - is there an automated bucket sync from GS->S3 or out-of-the-box solution for scheduled transfers between buckets?
backup to another GS bucket - in the same gc project, a coldline bucket "replica" with read-only privs to most users and some automated process to replicate/sync the data?
any other ideas?
thanks:)
You could use gsutil rsync to do this:
gsutil -m rsync -rd gs://your-bucket s3://your-bucket
(similarly for syncing between GCS buckets).
You would need to set up a cron job or something similar to cause this to run periodically.
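As a rough sketch of the cron approach, the rsync command can be wrapped in a small script that logs start and end times and returns a non-zero status on failure. The script path, log path and schedule in the comments are assumptions chosen for illustration, and sync_bucket is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, e.g. saved as /usr/local/bin/gcs-backup.sh.
# Example crontab entry (daily at 02:00; paths are placeholders):
#   0 2 * * * /usr/local/bin/gcs-backup.sh gs://your-bucket s3://your-bucket >> "$HOME/gcs-backup.log" 2>&1
sync_bucket() {
  local src="${1:-}" dst="${2:-}"
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) starting sync $src -> $dst"
  # -m runs operations in parallel, -r recurses, and -d (as in the answer)
  # deletes destination objects that no longer exist in the source.
  if gsutil -m rsync -rd "$src" "$dst"; then
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) sync finished"
  else
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) sync failed" >&2
    return 1
  fi
}

# Run only when both arguments are supplied, so sourcing this file is harmless.
if [ "$#" -eq 2 ]; then
  sync_bucket "$1" "$2"
fi
```

Because -d mirrors deletions, a backup that should retain objects removed from the source would drop that flag.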
As mentioned in a comment, GCS Transfer is what you are looking for, at least for the part: "backup to another GS bucket".
From the doc:
Transfer data to your Cloud Storage buckets from Amazon Simple Storage
Service (S3), HTTP/HTTPS servers, or other buckets. You can schedule
one-time or daily transfers, and you can filter files based on name
prefix and when they were changed.
I have created an AWS S3 bucket and uploaded many images to it, but now I want to move all the images to another AWS S3 bucket.
Can we copy buckets directly, or link to another AWS server?
Please provide suggestions.
You can use the cp (copy) command of the AWS Command-Line Interface (CLI) S3 module to copy files from bucket to bucket:
aws s3 cp s3://mybucket/file.jpg s3://anotherbucket/file.jpg
See the cp command documentation.
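Since the question asks about all images rather than a single file, a whole-bucket copy may be closer to what is needed. The sketch below uses aws s3 sync between the two buckets; the function name copy_bucket is hypothetical, and the bucket names are the placeholders from the answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every object from one bucket to another.
# Usage: copy_bucket mybucket anotherbucket --dryrun
copy_bucket() {
  local src="${1:-}" dst="${2:-}"
  if [ -z "$src" ] || [ -z "$dst" ]; then
    echo "usage: copy_bucket <source-bucket> <destination-bucket> [extra aws options]" >&2
    return 2
  fi
  shift 2
  # sync copies only objects that are new or updated relative to the
  # destination, so it can be re-run after an interruption.
  aws s3 sync "s3://$src" "s3://$dst" "$@"
}
```

To move rather than copy, aws s3 mv s3://mybucket s3://anotherbucket --recursive removes each object from the source after it has been copied.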
I would like to transfer data from a table in BigQuery, into another one in Redshift.
My planned data flow is as follows:
BigQuery -> Google Cloud Storage -> Amazon S3 -> Redshift
I know about Google Cloud Storage Transfer Service, but I'm not sure it can help me. From Google Cloud documentation:
Cloud Storage Transfer Service
This page describes Cloud Storage Transfer Service, which you can use
to quickly import online data into Google Cloud Storage.
I understand that this service can be used to import data into Google Cloud Storage and not to export from it.
Is there a way I can export data from Google Cloud Storage to Amazon S3?
You can use gsutil to copy data from a Google Cloud Storage bucket to an Amazon bucket, using a command such as:
gsutil -m rsync -rd gs://your-gcs-bucket s3://your-s3-bucket
Note that the -d option above will cause gsutil rsync to delete objects from your S3 bucket that aren't present in your GCS bucket (in addition to adding new objects). You can leave off that option if you just want to add new objects from your GCS to your S3 bucket.
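Because -d deletes objects, it can be worth previewing the operation first. gsutil rsync has a -n (dry run) flag that reports what would be copied and deleted without changing either bucket. A minimal sketch, with preview_sync as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show what "gsutil rsync -rd" would do, without doing it.
# Usage: preview_sync gs://your-gcs-bucket s3://your-s3-bucket
preview_sync() {
  local src="${1:-}" dst="${2:-}"
  if [ -z "$src" ] || [ -z "$dst" ]; then
    echo "usage: preview_sync <source-url> <destination-url>" >&2
    return 2
  fi
  # -n = dry run: list the copies and deletions, but perform none of them.
  gsutil -m rsync -n -rd "$src" "$dst"
}
```

Once the listed deletions look right, the same command without -n performs the actual sync.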
Go to any instance or Cloud Shell in GCP.
First of all, configure your AWS credentials in your GCP environment:
aws configure
If the command is not recognized, install the AWS CLI by following this guide: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
For details on aws configure, see:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
Then run gsutil:
gsutil -m rsync -rd gs://storagename s3://bucketname
16 GB of data transferred in a few minutes.
Using Rclone (https://rclone.org/).
Rclone is a command line program to sync files and directories to and from
Google Drive
Amazon S3
Openstack Swift / Rackspace cloud files / Memset Memstore
Dropbox
Google Cloud Storage
Amazon Drive
Microsoft OneDrive
Hubic
Backblaze B2
Yandex Disk
SFTP
The local filesystem
Using the gsutil tool we can do a wide range of bucket and object management tasks, including:
Creating and deleting buckets.
Uploading, downloading, and deleting objects.
Listing buckets and objects.
Moving, copying, and renaming objects.
We can copy data from a Google Cloud Storage bucket to an Amazon S3 bucket using the gsutil rsync and gsutil cp operations.
gsutil rsync collects all metadata from the bucket and syncs the data to S3:
gsutil -m rsync -r gs://your-gcs-bucket s3://your-s3-bucket
gsutil cp copies the files one by one; as the transfer rate is good, it copies approximately 1 GB in 1 minute. The -r option is needed to copy the bucket contents recursively:
gsutil cp -r gs://<gcs-bucket>/* s3://<s3-bucket-name>
If you have a large number of files with a high data volume, use the Bash script below and run it in the background, in multiple parallel sessions if needed, using the screen command on an Amazon or GCP instance with AWS credentials configured and GCP auth verified.
Before running the script, list all the files and redirect the listing to a file; the script reads that file as its input:
gsutil ls gs://<gcs-bucket> > file_list_part.out
Bash script:
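#!/bin/bash
# Copy every object listed in file_list_part.out to the S3 bucket, one at a time.
echo "start processing"
input="file_list_part.out"
while IFS= read -r line
do
    # Skip blank lines in the listing.
    [ -z "$line" ] && continue
    # Timestamp for the log line (the original script never set $now).
    now=$(date '+%Y-%m-%d %H:%M:%S')
    echo "command :: gsutil cp ${line} s3://<bucket-name> :: $now"
    # Call gsutil directly instead of via eval, so object names
    # containing spaces or shell metacharacters are handled safely.
    if ! gsutil cp "$line" "s3://<bucket-name>"; then
        echo "Error copying file: $line"
        exit 1
    fi
    echo "Copy completed successfully"
done < "$input"
echo "completed processing"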
Execute the Bash script and write the output to a log file, so you can check which files completed and which failed.
bash file_copy.sh > /root/logs/file_copy.log 2>&1
I needed to transfer 2 TB of data from a Google Cloud Storage bucket to an Amazon S3 bucket.
For the task, I created a Google Compute Engine instance with 8 vCPUs (30 GB).
Allow login using SSH on the Compute Engine instance.
Once logged in, create an empty .boto configuration file and add the AWS credential information to it. I added the AWS credentials by following the mentioned link.
Then run the command:
gsutil -m rsync -rd gs://your-gcs-bucket s3://your-s3-bucket
The data transfer rate is ~1GB/s.
Hope this helps.
(Do not forget to terminate the compute instance once the job is done)
For large numbers of large files (100 MB+) you might get issues with broken pipes and other annoyances, probably due to the multipart upload requirement (as Pathead mentioned).
In that case you're left with simply downloading all the files to your machine and uploading them back. Depending on your connection and the amount of data, it might be more effective to create a VM instance, to take advantage of its high-speed connection and to run the transfer in the background on a machine other than yours.
Create a VM (make sure the service account has access to your buckets), connect via SSH, install the AWS CLI (apt install awscli) and configure access to S3 (aws configure).
Run these two lines, or put them in a bash script if you have many buckets to copy.
gsutil -m cp -r "gs://$1" ./
aws s3 cp --recursive "./$1" "s3://$1"
(It's better to use rsync in general, but cp was faster for me)
Tools like gsutil and aws s3 cp are not optimized for transfers between clouds, so they can perform poorly for large files.
Skyplane is a much faster alternative for transferring data between clouds (up to 110x for large files). You can transfer data with the command:
skyplane cp -r s3://aws-bucket-name/ gcs://google-bucket-name/
(disclaimer: I am a contributor)
I'm running Hive over EMR
and need to copy some files to all EMR instances.
One way, as I understand it, is to copy the files to the local file system on each node; the other is to copy the files to HDFS. However, I haven't found a simple way to copy straight from S3 to HDFS.
What is the best way to go about this?
The best way to do this is to use Hadoop's distcp command. Example (on one of the cluster nodes):
% ${HADOOP_HOME}/bin/hadoop distcp s3n://mybucket/myfile /root/myfile
This would copy a file called myfile from an S3 bucket named mybucket to /root/myfile in HDFS. Note that this example assumes you are using the S3 file system in "native" mode; this means that Hadoop sees each object in S3 as a file. If you use S3 in block mode instead, you would replace s3n with s3 in the example above. For more info about the differences between native S3 and block mode, as well as an elaboration on the example above, see http://wiki.apache.org/hadoop/AmazonS3.
I found that distcp is a very powerful tool. In addition to being able to use it to copy a large amount of files in and out of S3, you can also perform fast cluster-to-cluster copies with large data sets. Instead of pushing all the data through a single node, distcp uses multiple nodes in parallel to perform the transfer. This makes distcp considerably faster when transferring large amounts of data, compared to the alternative of copying everything to the local file system as an intermediary.
Amazon now has its own wrapper implemented over distcp, namely s3distcp.
S3DistCp is an extension of DistCp that is optimized to work with
Amazon Web Services (AWS), particularly Amazon Simple Storage Service
(Amazon S3). You use S3DistCp by adding it as a step in a job flow.
Using S3DistCp, you can efficiently copy large amounts of data from
Amazon S3 into HDFS where it can be processed by subsequent steps in
your Amazon Elastic MapReduce (Amazon EMR) job flow. You can also use
S3DistCp to copy data between Amazon S3 buckets or from HDFS to Amazon
S3
Example: Copy log files from Amazon S3 to HDFS
The following example illustrates how to copy log files stored in an Amazon S3 bucket into HDFS. In this example the --srcPattern option is used to limit the data copied to the daemon logs.
elastic-mapreduce --jobflow j-3GY8JC4179IOJ --jar \
s3://us-east-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar \
--args '--src,s3://myawsbucket/logs/j-3GY8JC4179IOJ/node/,\
--dest,hdfs:///output,\
--srcPattern,.*daemons.*-hadoop-.*'
Note that according to Amazon, at http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/FileSystemConfig.html ("Amazon Elastic MapReduce - File System Configuration"), the S3 Block FileSystem is deprecated and its URI prefix is now s3bfs://. They specifically discourage using it since "it can trigger a race condition that might cause your job flow to fail".
According to the same page, HDFS is now a 'first-class' file system under S3, although it is ephemeral (it goes away when the Hadoop job ends).