Fastest way of transferring an unstructured collection of S3 files to EC2

If we are downloading an entire virtual directory, TransferManager offers downloadDirectory(...), which returns a MultipleFileDownload. However, it does not support downloading an arbitrary list of S3 objects. What is the fastest way to download such a large list of files?

If you just want to copy S3 files to a local directory on your EC2 instance, you can run the AWS CLI on the instance as follows:
aws s3 cp s3://<bucketname> <local directory> --recursive --include "*"
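The command above copies a whole bucket or prefix. For an arbitrary list of keys, one possible approach is to run one aws s3 cp per key and let xargs run several copies in parallel. This is a minimal sketch, not a benchmarked solution: download_key_list is a hypothetical helper name, the key list is assumed to be a text file with one key per line, and the default of 8 parallel jobs is an arbitrary starting point.

```shell
# Sketch: download an arbitrary list of S3 keys in parallel.
# download_key_list is a hypothetical helper, not part of the AWS CLI.
# Usage: download_key_list <bucket> <key-list-file> <dest-dir> [parallel-jobs]
download_key_list() {
    local bucket="$1"
    local list="$2"
    local dest="$3"
    local jobs="${4:-8}"   # assumed default; tune for your instance type

    mkdir -p "$dest"

    # Skip blank lines, turn newlines into NULs so keys with spaces survive,
    # then let xargs run up to $jobs "aws s3 cp" processes at a time.
    grep -v '^$' "$list" | tr '\n' '\0' |
        xargs -0 -P "$jobs" -I{} aws s3 cp "s3://$bucket/{}" "$dest/{}"
}

# Example (hypothetical names):
#   download_key_list my-bucket keys.txt /data/downloads 16
```

Starting one CLI process per object has some overhead, so for very many small objects it may be worth comparing against a single recursive copy with --exclude "*" and one --include per pattern.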

Related

move files to s3 in EC2

I have an S3 bucket that I access from EC2. I want to remove multiple files from S3 folders. The command reports the files as deleted, but they are still there.
Command:
aws s3 rm s3://mybucket/path1/publish/test/dummyfile_*.dat
I got the message below:
delete: s3://mybucket/path1/publish/test/dummyfile_*.dat
But the files are still present. Can anyone please help?

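"Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all Regions."
from https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#CoreConcepts
Note that this quote describes S3's older consistency model. Since December 2020, S3 provides strong read-after-write consistency for PUTs and DELETEs, so a completed delete should be visible immediately.
If you copy an S3 object to an EC2 instance, you have simply made a copy of it; deleting the object in S3 does not remove the copy on the instance.
You can use aws s3 sync to synchronize S3 objects (files) between S3 and your EC2 instance, see https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html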
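The output in the question points to another likely cause: the aws s3 commands do not expand UNIX-style wildcards in the S3 path, so the command targets a single key literally named dummyfile_*.dat. Pattern matching is done with --recursive plus the --exclude and --include filters instead. A minimal sketch, where remove_matching is a hypothetical wrapper name and the bucket path is the one from the question:

```shell
# Sketch: delete only the objects under a prefix that match a pattern.
# remove_matching is a hypothetical helper, not part of the AWS CLI.
# Usage: remove_matching <s3-prefix> <pattern> [extra aws s3 rm flags...]
remove_matching() {
    local prefix="$1"
    local pattern="$2"
    shift 2

    # Filters apply in order: first exclude everything under the prefix,
    # then re-include only the keys that match the pattern.
    aws s3 rm "$prefix" --recursive --exclude "*" --include "$pattern" "$@"
}

# Preview first, then rerun without --dryrun:
#   remove_matching s3://mybucket/path1/publish/test/ "dummyfile_*.dat" --dryrun
```

Running with --dryrun first lists what would be deleted without removing anything.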

sync local folder structure to s3 root structure

In my pipeline I am trying to sync my local folder (that is, my repository folder) to S3. I can run aws s3 sync . s3:// but this of course gives an error, since no bucket is specified. Still, that is basically what I want: the folder structure I have locally is exactly the structure I want in S3.
So locally:
bucket1/file1.txt
bucket1/file2.txt
bucket1/subbucket1/file3.txt
needs to go to the root of my S3 account, with each top-level folder mapped to a bucket. How can I do this?
By the way, sync might be overkill, since I only want to copy (and overwrite) into the S3 folders, starting from the root. I am not (yet) interested in deleting anything.

The AWS Command-Line Interface (CLI) aws s3 sync command requires a bucket name.
Therefore, you will either need to write a script that extracts the bucket name and inserts it into the aws s3 sync command, or you'll need to write your own program to use in place of the AWS CLI.
If you have a limited number of buckets and they don't change that often, you could just write a script that repeatedly calls the AWS CLI, such as:
aws s3 sync bucket1/ s3://bucket1/
aws s3 sync bucket2/ s3://bucket2/
etc.
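The "script that extracts the bucket name" could look roughly like the sketch below. It assumes every top-level directory is named exactly like an existing bucket; sync_dirs_to_buckets is a hypothetical helper name.

```shell
# Sketch: sync each top-level directory to the bucket of the same name.
# sync_dirs_to_buckets is a hypothetical helper, not part of the AWS CLI.
# Usage: sync_dirs_to_buckets [root-dir]   (defaults to the current directory)
sync_dirs_to_buckets() {
    local root="${1:-.}"
    local dir bucket

    for dir in "$root"/*/; do
        # If nothing matches, the glob stays literal; skip it.
        [ -d "$dir" ] || continue
        bucket="$(basename "$dir")"
        # Assumption: a bucket with this name already exists.
        aws s3 sync "$dir" "s3://$bucket/"
    done
}

# Example: run from the repository root
#   sync_dirs_to_buckets .
```

Files that sit directly in the root directory are ignored, since they have no directory to map to a bucket.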

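If somebody comes to the same question:
#!/usr/bin/env bash
# ENVIRONMENT must be set; bucket names are "$ENVIRONMENT-<root directory>".
# Read find's output NUL-delimited so file names with spaces are handled.
find . -type f -print0 | while IFS= read -r -d '' file; do
    newFilename="${file#./}"
    dirName="$ENVIRONMENT-$(dirname "$newFilename")"
    # keep only the first part of the path (the root directory)
    dirName="${dirName%%/*}"
    echo "bucket: $dirName"
    if aws s3api head-bucket --bucket "$dirName" 2>/dev/null; then
        echo "bucket already exists"
    else
        # defensive check: after the strip above, dirName should not contain a slash
        if [[ $dirName == *"/"* ]]; then
            echo "$dirName"
            echo "This bucket is a subfolder and will not be created"
        else
            aws s3 mb "s3://$dirName"
        fi
    fi
    aws s3 cp "$newFilename" "s3://$ENVIRONMENT-$newFilename"
done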
The script retrieves all the files it can find, then takes the root directory of each file (relative to the current folder) and checks whether that directory exists as a bucket. If not, the bucket is created. Then every file is copied.
Since I do not know whether a root directory already exists as a bucket, the script has to check for it explicitly.
I couldn't use sync because the bucket might not exist yet.
If you do know that a bucket exists for each root directory, I would use sync: a one-liner versus a ten-liner.
Anyway, that was it for me!

Download s3 bucket files on user's local using aws cli

How can I download files from a Simple Storage Service (S3) bucket directly to a user's local machine?

You can use the aws s3 commands of the AWS CLI to copy a file from S3.
The following cp command copies a single object to a specified local file:
aws s3 cp s3://mybucket/test.txt test2.txt
Make sure to use quotes (") if the key contains spaces:
aws s3 cp "s3://mybucket/test with space.txt" "./test with space.txt"

Compress file on S3

I have a 17.7GB file on S3. It was generated as the output of a Hive query, and it isn't compressed.
I know that compressed with gzip it will be about 2.2GB. How can I download this file locally as quickly as possible when transfer is the bottleneck (250kB/s)?
I've not found any straightforward way to compress the file on S3, or enable compression on transfer in s3cmd, boto, or related tools.

S3 does not support stream compression, nor is it possible to compress an uploaded file remotely.
If this is a one-time process, I suggest downloading the file to an EC2 machine in the same region, compressing it there, and then uploading it to your destination.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html
If you need this more frequently, see:
Serving gzipped CSS and JavaScript from Amazon CloudFront via S3
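On the EC2 machine, the download, compress, and upload steps can be chained without writing the 17.7GB file to disk, because aws s3 cp accepts - as a source or destination to stream through stdout and stdin. A minimal sketch under that assumption; gzip_s3_object is a hypothetical helper name and the object paths are placeholders:

```shell
# Sketch: stream an S3 object through gzip and upload the compressed copy.
# gzip_s3_object is a hypothetical helper, not part of the AWS CLI.
# Usage: gzip_s3_object <source-s3-uri> <destination-s3-uri>
gzip_s3_object() {
    local src="$1"
    local dst="$2"

    # "aws s3 cp <src> -" writes the object to stdout;
    # "aws s3 cp - <dst>" uploads whatever arrives on stdin.
    aws s3 cp "$src" - | gzip -c | aws s3 cp - "$dst"
}

# Example (placeholder paths):
#   gzip_s3_object s3://mybucket/output/result.txt s3://mybucket/output/result.txt.gz
```

After that, only the compressed object has to travel over the slow link.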

Late answer, but I found this works well.
aws s3 sync s3://your-pics .
find . -name "*.jpg" -print -exec gzip {} \;
aws s3 sync . s3://your-pics --content-encoding gzip --dryrun
This downloads all files in the S3 bucket to the machine (or EC2 instance), compresses the image files, and uploads them back to the S3 bucket. Note that gzip renames each file to <name>.jpg.gz, so the upload creates new keys rather than overwriting the originals.
Verify the output before removing the --dryrun flag.

There are now pre-built apps in Lambda that you could use to compress images and files in S3 buckets. So just create a new Lambda function and select a pre-built app of your choice and complete the configuration.
Step 1 - Create a new Lambda function
Step 2 - Search for prebuilt app
Step 3 - Select the app that suits your need and complete the configuration process by providing the S3 bucket names.

bulk upload video files from URL to Amazon S3

After some googling, it appears there is no API or tool to upload files from a URL directly to S3 without downloading them first.
I could probably download the files locally first and then upload them to S3. Is there a good tool (Mac) that lets me batch upload all files in a given directory?
Or are there any PHP scripts I could install on a shared hosting account to download one file at a time and then upload it to S3?

The AWS Command Line Interface (CLI) can upload files to Amazon S3, e.g.:
aws s3 cp file s3://my-bucket/file
aws s3 cp . s3://my-bucket/path --recursive
aws s3 sync . s3://my-bucket/path
The sync command is probably best for your use case, since it synchronizes local files with remote files and copies only new or changed files. Use cp instead when you want to copy specific files.
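If saving the videos to disk first is the concern, the download and upload can also be piped together, since aws s3 cp can read the object body from stdin when the source is given as -. The data still passes through the machine running the script, it just is not stored there. A minimal sketch: upload_urls is a hypothetical helper name, the URL list is assumed to be a text file with one URL per line, and the object name is taken from the last path segment of each URL.

```shell
# Sketch: stream each URL in a list straight into S3 without a local copy.
# upload_urls is a hypothetical helper, not part of the AWS CLI.
# Usage: upload_urls <url-list-file> <s3-destination-prefix>
upload_urls() {
    local list="$1"
    local dest="$2"
    local url name

    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue          # skip blank lines
        name="${url##*/}"                  # last path segment
        name="${name%%\?*}"                # drop any query string
        # curl gets /dev/null as stdin so it cannot consume the URL list.
        curl -fsSL "$url" < /dev/null | aws s3 cp - "$dest/$name"
    done < "$list"
}

# Example (placeholder names):
#   upload_urls urls.txt s3://my-bucket/videos
```

Running this on an EC2 instance rather than a home connection keeps the traffic off the slower link.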
