Upload folder with subfolders using S3 and the AWS console - amazon-s3

When I try to upload a folder with subfolders to S3 through the AWS console, only the files are uploaded, not the subfolders.
You also can't select a folder. It always requires opening the folder first before you can select anything.
Is this even possible?

I suggest using the AWS CLI, since this is very easy to do from the command line:
aws s3 cp SOURCE_DIR s3://DEST_BUCKET/ --recursive
Or you can use sync:
aws s3 sync SOURCE_DIR s3://DEST_BUCKET/
Remember that you have to install the AWS CLI and configure it with your Access Key ID and Secret Access Key:
pip install --upgrade --user awscli
aws configure

You don't need Enhanced Uploader (which I believe does not exist anymore) or any third-party software (that always has a risk that someone will steal your private data or access keys from the S3 bucket or even from all AWS resources).
Since the new AWS S3 web upload manager supports drag and drop for files and folders, just log in to https://console.aws.amazon.com/s3/home and start the upload process as usual, then drag the folder from your desktop directly onto the S3 page.

The Amazon S3 Console now supports uploading entire folder hierarchies. Enable the Enhanced Uploader in the Upload dialog and then add one or more folders to the upload queue.
http://console.aws.amazon.com/s3

Normally I use the Enhanced Uploader available via the AWS management console. However, since that requires Java it can cause problems. I found s3cmd to be a great command-line replacement. Here's how I used it:
s3cmd --configure # enter access keys, enable HTTPS, etc.
s3cmd sync <path-to-folder> s3://<path-to-s3-bucket>/

Execute something similar to the following command:
aws s3 cp local_folder_name s3://s3_bucket_name/local_folder_name/ --recursive

I had trouble finding the Enhanced Uploader tool for uploading a folder and its subfolders to S3. But rather than finding a tool, I was able to upload the folders along with their subfolders by simply dragging and dropping them into the S3 bucket.
Note: This drag and drop feature doesn't work in Safari. I've tested it in Chrome and it works just fine.
After you drag and drop the files and folders, the upload screen opens so you can upload the content.

Solution 1:
var AWS = require('aws-sdk');
var path = require("path");
var fs = require('fs');
const uploadDir = function(s3Path, bucketName) {
    let s3 = new AWS.S3({
        accessKeyId: process.env.S3_ACCESS_KEY,
        secretAccessKey: process.env.S3_SECRET_KEY
    });
    function walkSync(currentDirPath, callback) {
        fs.readdirSync(currentDirPath).forEach(function (name) {
            var filePath = path.join(currentDirPath, name);
            var stat = fs.statSync(filePath);
            if (stat.isFile()) {
                callback(filePath, stat);
            } else if (stat.isDirectory()) {
                walkSync(filePath, callback);
            }
        });
    }
    walkSync(s3Path, function(filePath, stat) {
        let bucketPath = filePath.substring(s3Path.length+1);
        let params = {Bucket: bucketName, Key: bucketPath, Body: fs.readFileSync(filePath) };
        s3.putObject(params, function(err, data) {
            if (err) {
                console.log(err)
            } else {
                console.log('Successfully uploaded '+ bucketPath +' to ' + bucketName);
            }
        });
    });
};
uploadDir("path to your folder", "your bucket name");
Solution 2:
aws s3 cp SOURCE_DIR s3://DEST_BUCKET/ --recursive

Custom endpoint
If you have a custom endpoint implemented by your IT department, try this:
aws s3 cp <local-dir> s3://bucket-name/<destination-folder>/ --recursive --endpoint-url https://<s3-custom-endpoint.lan>

It's worth mentioning that if you are simply using S3 for backups, you should just zip the folder and then upload that. This will save you upload time and costs.
If you are not sure how to do efficient zipping from the terminal, have a look here for OSX.
And $ zip -r archive_name.zip folder_to_compress for Windows.
Alternatively, a client such as 7-Zip would be sufficient for Windows users.
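If you would rather script the zipping step than run it by hand, here is a minimal Python sketch using only the standard library. The function name zip_folder and the example paths are my own placeholders, not something from the answer above; the upload itself is left to whichever method you prefer.

```python
import os
import shutil


def zip_folder(folder, archive_base):
    """Zip `folder` (including subfolders) into `archive_base`.zip.

    Returns the path of the archive that was written.
    """
    folder = os.path.abspath(folder)
    parent = os.path.dirname(folder)
    name = os.path.basename(folder)
    # root_dir/base_dir keep the top-level folder name inside the archive,
    # similar to what `zip -r archive_name.zip folder_to_compress` produces.
    return shutil.make_archive(archive_base, "zip", root_dir=parent, base_dir=name)
```

For example, zip_folder("folder_to_compress", "archive_name") should write archive_name.zip, which you can then upload as a single object.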

I do not see Python answers here.
You can script folder upload using Python/boto3.
Here's how to recursively get all file names from directory tree:
import os
def recursive_glob(treeroot, extention):
    results = [os.path.join(dirpath, f)
               for dirpath, dirnames, files in os.walk(treeroot)
               for f in files if f.endswith(extention)]
    return results
Here's how to upload a file to S3 using Python/boto (this snippet uses the legacy boto 2 library, not boto3):
from boto.s3.key import Key
k = Key(bucket)
k.key = s3_key_name
k.set_contents_from_file(file_handle, cb=progress, num_cb=20, reduced_redundancy=use_rr)
I used these ideas to write Directory-Uploader-For-S3
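For anyone on boto3 rather than the older boto, the same two ideas (walk the tree, upload each file) might be combined roughly like this. This is only a sketch: iter_uploads and upload_tree are names I made up, and s3_client is assumed to be something like boto3.client("s3"), passed in by the caller so the function itself does not depend on how credentials are configured.

```python
import os


def iter_uploads(treeroot, prefix="", extension=""):
    """Yield (local_path, s3_key) for every matching file under treeroot."""
    for dirpath, _dirnames, files in os.walk(treeroot):
        for name in files:
            if not name.endswith(extension):
                continue
            local_path = os.path.join(dirpath, name)
            relative = os.path.relpath(local_path, treeroot)
            # S3 keys use forward slashes regardless of the local OS separator.
            yield local_path, prefix + relative.replace(os.sep, "/")


def upload_tree(s3_client, treeroot, bucket, prefix="", extension=""):
    """Upload the tree and return the list of keys that were written.

    s3_client is assumed to expose boto3's upload_file(Filename, Bucket, Key).
    """
    uploaded = []
    for local_path, key in iter_uploads(treeroot, prefix, extension):
        s3_client.upload_file(local_path, bucket, key)
        uploaded.append(key)
    return uploaded
```

Usage would be along the lines of upload_tree(boto3.client("s3"), "path/to/folder", "your-bucket-name", prefix="backup/"). The subfolder structure ends up encoded in the object keys, which is how the console shows "folders".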

I ended up here when trying to figure this out. With the version that's up there right now you can drag and drop a folder into it and it works, even though it doesn't allow you to select a folder when you open the upload dialogue.

You can drag and drop those folders. Drag and drop functionality is supported only for the Chrome and Firefox browsers.
Please refer to this link:
https://docs.aws.amazon.com/AmazonS3/latest/user-guide/upload-objects.html

You can use TransferManager from the AWS SDK for Java to upload multiple files, directories, etc.
More info:
https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-s3-transfermanager.html

You can upload files by dragging and dropping or by pointing and clicking. To upload folders, you must drag and drop them. Drag and drop functionality is supported only for the Chrome and Firefox browsers

Drag and drop is only usable for a relatively small set of files. If you need to upload thousands of them in one go, then the CLI is the way to go. I managed to upload 2,000,00+ files using 1 command...

Related

Is it possible to trigger lambda by changing the file of local s3 manually in serverless framework?

I used serverless-s3-local to trigger AWS Lambda locally with the Serverless Framework.
It works when a function creates or updates a file in the local S3 folder, but when I add a file or change a file's contents in the local S3 folder manually, the Lambda is not triggered.
Is there any good way to solve it?
Thanks for using serverless-s3-local. I'm the author of serverless-s3-local.
How did you add a file or change its contents? Did you use the AWS command as follows?
$ AWS_ACCESS_KEY_ID=S3RVER AWS_SECRET_ACCESS_KEY=S3RVER aws --endpoint http://localhost:8000 s3 cp ./face.jpg s3://local-bucket/incoming/face.jpg
{
"ETag": "\"6fa1ab0763e315d8b1a0e82aea14a9d0\""
}
If you don't use the aws command and instead apply these operations directly to the files in the directory, the modifications aren't detected by S3rver, which is the local S3 emulator. The resize_image example may be useful for you.

Could we use AWS Glue just to copy a file from one S3 folder to another S3 folder?

I need to copy a zipped file from one AWS S3 folder to another and would like to make that a scheduled AWS Glue job. I cannot find an example for such a simple task. Please help if you know the answer. Maybe the answer is in AWS Lambda or other AWS tools.
Thank you very much!
You can do this, and there may be a reason to use AWS Glue: if you have chained Glue jobs and glue_job_#2 is triggered on the successful completion of glue_job_#1.
The simple Python script below moves a file from one S3 folder (source) to another folder (target) using the boto3 library, and optionally deletes the original copy in source directory.
import boto3
bucketname = "my-unique-bucket-name"
s3 = boto3.resource('s3')
my_bucket = s3.Bucket(bucketname)
source = "path/to/folder1"
target = "path/to/folder2"
for obj in my_bucket.objects.filter(Prefix=source):
    source_filename = (obj.key).split('/')[-1]
    copy_source = {
        'Bucket': bucketname,
        'Key': obj.key
    }
    target_filename = "{}/{}".format(target, source_filename)
    s3.meta.client.copy(copy_source, bucketname, target_filename)
    # Uncomment the line below if you wish to delete the original source file
    # s3.Object(bucketname, obj.key).delete()
Reference: Boto3 Docs on S3 Client Copy
Note: I would use f-strings for generating the target_filename, but f-strings are only supported in >= Python3.6 and I believe the default AWS Glue Python interpreter is still 2.7.
Reference: PEP on f-strings
I think you can do it with Glue, but wouldn't it be easier to use the CLI?
You can do the following:
aws s3 sync s3://bucket_1 s3://bucket_2
You could do this with Glue but it's not the right tool for the job.
Far simpler would be to have a Lambda function triggered by an S3 object-created event. There's even a tutorial in the AWS docs on doing (almost) this exact thing.
http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
We ended up using Databricks to do everything.
Glue is not ready. It returns error messages that make no sense. We created tickets and waited five days, but still got no reply.
The S3 API lets you do a COPY command (really a PUT with a header to indicate source URL) to copy objects within or between buckets. It's used to fake rename()s regularly but you could initiate the call yourself, from anything.
There is no need to D/L any data; within the same S3 region the copy has a bandwidth of about 6-10 MB/s.
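As a rough illustration of initiating that copy yourself, here is a sketch in Python. It assumes s3_client is a boto3 S3 client (boto3.client("s3")); copy_within_s3 is a name I made up. Note that, as far as I know, a single copy_object call only works for objects up to 5 GB, and larger objects need a multipart copy.

```python
def copy_within_s3(s3_client, src_bucket, src_key, dst_bucket, dst_key,
                   delete_source=False):
    """Server-side copy of one object; optionally delete the source ("rename")."""
    # The data never passes through the machine running this code:
    # S3 performs the copy internally.
    s3_client.copy_object(
        Bucket=dst_bucket,
        Key=dst_key,
        CopySource={"Bucket": src_bucket, "Key": src_key},
    )
    if delete_source:
        # Copy followed by delete is how a rename/move is emulated on S3.
        s3_client.delete_object(Bucket=src_bucket, Key=src_key)
```

A call such as copy_within_s3(client, "my-bucket", "path/to/folder1/data.zip", "my-bucket", "path/to/folder2/data.zip") could then be scheduled from wherever is convenient.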
AWS CLI cp command can do this.
You can do that by downloading your zip file from s3 to tmp/ directory and then re-uploading the same to s3.
s3 = boto3.resource('s3')
Download file to local spark directory tmp:
s3.Bucket(bucket_name).download_file(DATA_DIR+file,'tmp/'+file)
Upload file from local spark directory tmp:
s3.meta.client.upload_file('tmp/'+file,bucket_name,TARGET_DIR+file)
Now you can write a Python shell job in Glue to do it. Just set Type to Python Shell in the Glue job creation wizard. You can run a normal Python script in it.
Nothing special is required. I believe AWS Data Pipeline is the best option. Just use the command line option. Scheduled runs are also possible. I have already tried it and it worked.

Amazon S3 console: download multiple files at once

When I log in to my S3 console I am unable to download multiple selected files (the WebUI allows downloads only when one file is selected):
https://console.aws.amazon.com/s3
Is this something that can be changed in the user policy or is it a limitation of Amazon?
It is not possible through the AWS Console web user interface.
But it's a very simple task if you install AWS CLI.
You can check the installation and configuration steps on Installing in the AWS Command Line Interface
After that you go to the command line:
aws s3 cp --recursive s3://<bucket>/<folder> <local_folder>
This will copy all the files from given S3 path to your given local path.
Selecting a bunch of files and clicking Actions->Open opened each in a browser tab, and they immediately started to download (6 at a time).
If you use the AWS CLI, you can use the --exclude flag along with the --include and --recursive flags to accomplish this:
aws s3 cp s3://path/to/bucket/ . --recursive --exclude "*" --include "things_you_want"
Eg.
--exclude "*" --include "*.txt"
will download all files with .txt extension. More details - https://docs.aws.amazon.com/cli/latest/reference/s3/
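The order of the filters matters: as I understand the CLI documentation, everything is included by default and filters that appear later on the command line take precedence. A small Python sketch that approximates that rule (is_selected is my own helper, not part of the AWS CLI, and it ignores details such as patterns being relative to the source directory):

```python
from fnmatch import fnmatchcase


def is_selected(path, filters):
    """Approximate --exclude/--include handling for a single path.

    filters is an ordered list of ("exclude" | "include", pattern) pairs,
    in the same order the flags would appear on the command line.
    """
    selected = True  # with no filters, every file is included
    for kind, pattern in filters:
        if fnmatchcase(path, pattern):
            # A later matching filter overrides any earlier decision.
            selected = (kind == "include")
    return selected
```

With filters = [("exclude", "*"), ("include", "*.txt")], notes.txt is selected and image.jpg is not. Swapping the two filters excludes everything, because the final "*" exclude wins.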
I believe it is a limitation of the AWS console web interface, having tried (and failed) to do this myself.
Alternatively, perhaps use a 3rd party S3 browser client such as http://s3browser.com/
If you have Visual Studio with the AWS Explorer extension installed, you can also browse to Amazon S3 (step 1), select your bucket (step 2), select all the files you want to download (step 3) and right click to download them all (step 4).
The S3 service has no meaningful limits on simultaneous downloads (easily several hundred downloads at a time are possible) and there is no policy setting related to this... but the S3 console only allows you to select one file for downloading at a time.
Once the download starts, you can start another and another, as many as your browser will let you attempt simultaneously.
In case someone is still looking for an S3 browser and downloader, I have just tried FileZilla Pro (it's a paid version). It worked great.
I created a connection to S3 with Access key and secret key set up via IAM. Connection was instant and downloading of all folders and files was fast.
Using AWS CLI, I ran all the downloads in the background using "&" and then waited on all the pids to complete. It was amazingly fast. Apparently the "aws s3 cp" knows to limit the number of concurrent connections because it only ran 100 at a time.
aws --profile $awsProfile s3 cp "$s3path" "$tofile" &
pids[${npids}]=$! ## save the spawned pid
let "npids=npids+1"
followed by
echo "waiting on $npids downloads"
for pid in ${pids[*]}; do
    echo $pid
    wait $pid
done
I downloaded 1500+ files (72,000 bytes) in about a minute
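The same "start everything, then wait for all of it" pattern can be sketched in Python with a thread pool instead of background shell jobs. This assumes s3_client is a boto3 S3 client (boto3.client("s3")); download_many and max_workers=8 are my own choices, not anything the CLI uses.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def download_many(s3_client, bucket, keys, dest_dir, max_workers=8):
    """Download the given keys concurrently into dest_dir.

    Files are saved under their base name, so keys that share a base name
    would overwrite each other.
    """
    os.makedirs(dest_dir, exist_ok=True)

    def fetch(key):
        target = os.path.join(dest_dir, os.path.basename(key))
        s3_client.download_file(bucket, key, target)
        return target

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the input order and re-raises a download error, if any,
        # when its result is reached.
        return list(pool.map(fetch, keys))
```

The pool size plays the role of the concurrency limit, and leaving the with block is the equivalent of waiting on all the pids.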
I wrote a simple shell script to download NOT JUST all files but also all versions of every file from a specific folder under AWS s3 bucket. Here it is & you may find it useful
# Script generates the version info file for all the
# content under a particular bucket and then parses
# the file to grab the versionId for each of the versions
# and finally generates a fully qualified http url for
# the different versioned files and use that to download
# the content.
s3region="s3.ap-south-1.amazonaws.com"
bucket="your_bucket_name"
# note the location has no forward slash at beginning or at end
location="data/that/you/want/to/download"
# file names were like ABB-quarterly-results.csv, AVANTIFEED--quarterly-results.csv
fileNamePattern="-quarterly-results.csv"
# AWS CLI command to get version info
content="$(aws s3api list-object-versions --bucket $bucket --prefix "$location/")"
#save the file locally, if you want
echo "$content" >> version-info.json
versions=$(echo "$content" | grep -ir VersionId | awk -F ":" '{gsub(/"/, "", $3);gsub(/,/, "", $3);gsub(/ /, "", $3);print $3 }')
for version in $versions
do
    # NOTE: $fileId and $i are never assigned in this script as posted;
    # set fileId to the file name prefix (e.g. ABB) before running.
    echo "############### $fileId ###################"
    #echo $version
    url="https://$s3region/$bucket/$location/$fileId$fileNamePattern?versionId=$version"
    echo $url
    content="$(curl -s "$url")"
    echo "$content" >> $fileId$fileNamePattern-$version.csv
    echo "############### $i ###################"
done
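Since list-object-versions returns JSON, parsing it as JSON may be more robust than grep and awk, and it also gives you the key that belongs to each version. A sketch in Python, where versions_from_listing and version_url are names I made up and the URL layout simply mirrors the one the script above builds:

```python
import json
from urllib.parse import quote


def versions_from_listing(listing_json):
    """Return (key, version_id) pairs from `aws s3api list-object-versions` output."""
    listing = json.loads(listing_json)
    # "Versions" can be missing when nothing matches the prefix.
    # Entries under "DeleteMarkers" are deliberately ignored here.
    return [(v["Key"], v["VersionId"]) for v in listing.get("Versions", [])]


def version_url(endpoint, bucket, key, version_id):
    """Build a URL of the same shape as the shell script uses."""
    return "https://{}/{}/{}?versionId={}".format(
        endpoint, bucket, quote(key), quote(version_id, safe="")
    )
```

You could feed it the saved version-info.json content and loop over the pairs to download each version with curl or any HTTP client.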
Also you could use the --include "filename" many times in a single command with each time including a different filename within the double quotes, e.g.
aws s3 mycommand --include "file1" --include "file2"
It will save your time rather than repeating the command to download one file at a time.
Also if you are running Windows(tm), WinSCP now allows drag and drop of a selection of multiple files. Including sub-folders.
Many enterprise workstations will have WinSCP installed for editing files on servers by means of SSH.
I am not affiliated, I simply think this was really worth doing.
In my case Aur's didn't work and if you're looking for a quick solution to download all files in a folder just using the browser, you can try entering this snippet in your dev console:
(function() {
    const rows = Array.from(document.querySelectorAll('.fix-width-table tbody tr'));
    const downloadButton = document.querySelector('[data-e2e-id="button-download"]');
    const timeBetweenClicks = 500;
    function downloadFiles(remaining) {
        if (!remaining.length) {
            return
        }
        const row = remaining[0];
        row.click();
        downloadButton.click();
        setTimeout(() => {
            downloadFiles(remaining.slice(1));
        }, timeBetweenClicks)
    }
    downloadFiles(rows)
}())
I did this by creating a shell script that uses the AWS CLI (e.g. example.sh):
#!/bin/bash
aws s3 cp s3://s3-bucket-path/example1.pdf LocalPath/Download/example1.pdf
aws s3 cp s3://s3-bucket-path/example2.pdf LocalPath/Download/example2.pdf
Give executable rights to example.sh (e.g. chmod +x example.sh),
then run your shell script: ./example.sh
I think the simplest way to download or upload files is to use the aws s3 sync command. You can also use it to sync two S3 buckets.
aws s3 sync <LocalPath> <S3Uri> or <S3Uri> <LocalPath> or <S3Uri> <S3Uri>
# Download file(s)
aws s3 sync s3://<bucket_name>/<file_or_directory_path> .
# Upload file(s)
aws s3 sync . s3://<bucket_name>/<file_or_directory_path>
# Sync two buckets
aws s3 sync s3://<1st_s3_path> s3://<2nd_s3_path>
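What makes sync different from cp is that it skips files that already look up to date. As I understand the documented default behaviour, a local file is uploaded when it is missing remotely, when the sizes differ, or when the local copy is newer. A tiny Python sketch of that decision (needs_upload is my own helper and only approximates what the CLI does):

```python
def needs_upload(local_size, local_mtime, remote):
    """Decide whether a local file should be uploaded during a sync.

    remote is None when the object does not exist yet, otherwise a
    (size, last_modified_timestamp) tuple for the existing object.
    """
    if remote is None:
        return True
    remote_size, remote_mtime = remote
    # A different size or a newer local timestamp means the copy is stale.
    return local_size != remote_size or local_mtime > remote_mtime
```

So an unchanged file, with the same size and an older or equal timestamp, is skipped on the second run, which is why repeated syncs are usually much faster than repeated recursive copies.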
What I usually do is mount the s3 bucket (with s3fs) in a linux machine and zip the files I need into one, then I just download that file from any pc/browser.
# mount bucket in file system
/usr/bin/s3fs s3-bucket -o use_cache=/tmp -o allow_other -o uid=1000 -o mp_umask=002 -o multireq_max=5 /mnt/local-s3-bucket-mount
# zip files into one
cd /mnt/local-s3-bucket-mount
zip all-processed-files.zip *.jpg
import os
import boto3
import json
s3 = boto3.resource('s3', aws_access_key_id="AKIAxxxxxxxxxxxxJWB",
                    aws_secret_access_key="LV0+vsaxxxxxxxxxxxxxxxxxxxxxry0/LjxZkN")
my_bucket = s3.Bucket('s3testing')
# download file into current directory
for s3_object in my_bucket.objects.all():
    # Need to split s3_object.key into path and file name, else it will give error file not found.
    path, filename = os.path.split(s3_object.key)
    my_bucket.download_file(s3_object.key, filename)
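The loop above flattens everything into the current directory, so two objects with the same file name in different "folders" would overwrite each other. A variation that recreates the folder structure locally might look like this. It is only a sketch: download_bucket is a name I made up, and bucket is assumed to be a boto3 Bucket resource such as s3.Bucket('s3testing').

```python
import os


def download_bucket(bucket, dest_dir):
    """Download every object in the bucket, recreating prefixes as directories."""
    saved = []
    for s3_object in bucket.objects.all():
        key = s3_object.key
        if key.endswith("/"):
            # Zero-byte "folder" placeholder objects have nothing to download.
            continue
        local_path = os.path.join(dest_dir, *key.split("/"))
        # Create the parent directories first, otherwise download_file
        # fails with a file-not-found error.
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        bucket.download_file(key, local_path)
        saved.append(local_path)
    return saved
```

Calling download_bucket(my_bucket, "downloads") would then mirror the bucket under a local downloads directory. It also avoids hard-coding keys if the resource is created from your configured credentials.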

gcloud-node leaves behind files in /tmp?

I have been using gcloud-node to upload files to GC Storage, but I noticed that I'm getting a multitude of these remnants in my /tmp directory:
/tmp/3a1a48fa-3d83-4996-8e88-32bc01c36e86
/tmp/3a1a48fa-3d83-4996-8e88-32bc01c36e86/.config
/tmp/3a1a48fa-3d83-4996-8e88-32bc01c36e86/.config/configstore
/tmp/3a1a48fa-3d83-4996-8e88-32bc01c36e86/.config/configstore/gcs-resumable-upload.json
...
Is this normal? Is it just for large upload files?
Is there a way to configure which directory these files go to (if they must be created)?
Thanks!
This is used when a resumable upload is started. You can opt-out of resumable uploads by setting { resumable: false } as an option to either bucket.upload or file.createWriteStream.
The library used to write the file is configstore. ~/.config is default, but you can override this by setting XDG_CONFIG_HOME.

how to copy file from amazon server to s3 bucket

I am working with an S3 bucket. I need to copy an image from my Amazon server to the S3 bucket. Any idea how I can do it? I saw some sample code but I don't know how to use it.
if (S3::copyObject($sourceBucket, $sourceFile, $destinationBucket, $destinationFile, S3::ACL_PRIVATE)) {
    echo "Copied file";
} else {
    echo "Failed to copy file";
}
It seems that this code only copies between buckets, not from the server?
Thanks for the help.
Copy between S3 Buckets
AWS released a command line interface for copying between buckets.
http://aws.amazon.com/cli/
$ aws s3 sync s3://mybucket-src s3://mybucket-target --exclude "*.tmp"
..
This will copy from the source bucket to the target bucket.
I have not tested this. As far as I know, the CLI performs bucket-to-bucket copies server-side using the S3 copy operation, so the files should not need to be downloaded to your system and uploaded again.
See the documentation here: S3 CLI Documentation
I've used s3cmd for several years, and it's been very reliable. If you're using Ubuntu it's available with:
apt-get install s3cmd
You can also use one of the SDKs to develop your own tool.
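For the original question (copying an image from the server's disk to a bucket), a small tool built on the Python SDK could look roughly like this. It is a sketch under assumptions: s3_client is a boto3 S3 client (boto3.client("s3")), upload_image is a made-up name, and the private ACL mirrors the S3::ACL_PRIVATE used in the question.

```python
import mimetypes
import os


def upload_image(s3_client, local_path, bucket, key=None):
    """Upload one local file and return (key, extra_args) that were used."""
    key = key or os.path.basename(local_path)
    # Guess the Content-Type from the file extension so browsers display
    # the image instead of downloading it.
    content_type, _encoding = mimetypes.guess_type(local_path)
    extra_args = {"ACL": "private"}
    if content_type:
        extra_args["ContentType"] = content_type
    s3_client.upload_file(local_path, bucket, key, ExtraArgs=extra_args)
    return key, extra_args
```

For example, upload_image(client, "/var/www/images/face.jpg", "my-bucket") would store the file under the key face.jpg. The path and bucket name here are placeholders.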