I used the serverless-s3-local to trigger aws lambda locally with serverless framework.
Now it worked when I created or updated a file by function in local s3 folder, but when I added a file or changed the context of the file in local s3 folder manually, it didn’t trigger the lambda.
Is there any good way to solve it?
Thanks for using serverlss-s3-local. I'm the author of serverless-s3-local.
How did you add a file or change the context of the file? Did you use the AWS command as following?
$ AWS_ACCESS_KEY_ID=S3RVER AWS_SECRET_ACCESS_KEY=S3RVER aws --endpoint http://localhost:8000 s3 cp ./face.jpg s3://local-bucket/incoming/face.jpg
{
"ETag": "\"6fa1ab0763e315d8b1a0e82aea14a9d0\""
}
If you don't use the aws command and apply these operations to the files directory, these modifications aren't detected by S3rver which is the local S3 emurator. resize_image example may be useful for you.
I need to copy a zipped file from one AWS S3 folder to another and would like to make that a scheduled AWS Glue job. I cannot find an example for such a simple task. Please help if you know the answer. May be the answer is in AWS Lambda, or other AWS tools.
Thank you very much!
You can do this, and there may be a reason to use AWS Glue: if you have chained Glue jobs and glue_job_#2 is triggered on the successful completion of glue_job_#1.
The simple Python script below moves a file from one S3 folder (source) to another folder (target) using the boto3 library, and optionally deletes the original copy in source directory.
import boto3
bucketname = "my-unique-bucket-name"
s3 = boto3.resource('s3')
my_bucket = s3.Bucket(bucketname)
source = "path/to/folder1"
target = "path/to/folder2"
for obj in my_bucket.objects.filter(Prefix=source):
source_filename = (obj.key).split('/')[-1]
copy_source = {
'Bucket': bucketname,
'Key': obj.key
}
target_filename = "{}/{}".format(target, source_filename)
s3.meta.client.copy(copy_source, bucketname, target_filename)
# Uncomment the line below if you wish the delete the original source file
# s3.Object(bucketname, obj.key).delete()
Reference: Boto3 Docs on S3 Client Copy
Note: I would use f-strings for generating the target_filename, but f-strings are only supported in >= Python3.6 and I believe the default AWS Glue Python interpreter is still 2.7.
Reference: PEP on f-strings
I think you can do it with Glue, but wouldn't it be easier to use the CLI?
You can do the following:
aws s3 sync s3://bucket_1 s3://bucket_2
You could do this with Glue but it's not the right tool for the job.
Far simpler would be to have a Lambda job triggered by a S3 created-object event. There's even a tutorial on AWS Docs on doing (almost) this exact thing.
http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
We ended up using Databricks to do everything.
Glue is not ready. It returns error messages that make no sense. We created tickets and waited for five days still no reply.
the S3 API lets you do a COPY command (really a PUT with a header to indicate source URL) to copy objects within or between buckets. It's used to fake rename()s regularly but you could initiate the call yourself, from anything.
There is no need to D/L any data; within the same S3 region the copy has a bandwidth of about 6-10 MB/s.
AWS CLI cp command can do this.
You can do that by downloading your zip file from s3 to tmp/ directory and then re-uploading the same to s3.
s3 = boto3.resource('s3')
Download file to local spark directory tmp:
s3.Bucket(bucket_name).download_file(DATA_DIR+file,'tmp/'+file)
Upload file from local spark directory tmp:
s3.meta.client.upload_file('tmp/'+file,bucket_name,TARGET_DIR+file)
Now you can write python shell job in glue to do it. Just select Type in Glue job Creation wizard to Python Shell. You can run normal python script in it.
Nothing required. I believe aws data pipeline is a best options. Just use command line option. Scheduled run also possible. I already tried. Successfully worked.
When I log to my S3 console I am unable to download multiple selected files (the WebUI allows downloads only when one file is selected):
https://console.aws.amazon.com/s3
Is this something that can be changed in the user policy or is it a limitation of Amazon?
It is not possible through the AWS Console web user interface.
But it's a very simple task if you install AWS CLI.
You can check the installation and configuration steps on Installing in the AWS Command Line Interface
After that you go to the command line:
aws s3 cp --recursive s3://<bucket>/<folder> <local_folder>
This will copy all the files from given S3 path to your given local path.
Selecting a bunch of files and clicking Actions->Open opened each in a browser tab, and they immediately started to download (6 at a time).
If you use AWS CLI, you can use the exclude along with --include and --recursive flags to accomplish this
aws s3 cp s3://path/to/bucket/ . --recursive --exclude "*" --include "things_you_want"
Eg.
--exclude "*" --include "*.txt"
will download all files with .txt extension. More details - https://docs.aws.amazon.com/cli/latest/reference/s3/
I believe it is a limitation of the AWS console web interface, having tried (and failed) to do this myself.
Alternatively, perhaps use a 3rd party S3 browser client such as http://s3browser.com/
If you have Visual Studio with the AWS Explorer extension installed, you can also browse to Amazon S3 (step 1), select your bucket (step 2), select al the files you want to download (step 3) and right click to download them all (step 4).
The S3 service has no meaningful limits on simultaneous downloads (easily several hundred downloads at a time are possible) and there is no policy setting related to this... but the S3 console only allows you to select one file for downloading at a time.
Once the download starts, you can start another and another, as many as your browser will let you attempt simultaneously.
In case someone is still looking for an S3 browser and downloader I have just tried Fillezilla Pro (it's a paid version). It worked great.
I created a connection to S3 with Access key and secret key set up via IAM. Connection was instant and downloading of all folders and files was fast.
Using AWS CLI, I ran all the downloads in the background using "&" and then waited on all the pids to complete. It was amazingly fast. Apparently the "aws s3 cp" knows to limit the number of concurrent connections because it only ran 100 at a time.
aws --profile $awsProfile s3 cp "$s3path" "$tofile" &
pids[${npids}]=$! ## save the spawned pid
let "npids=npids+1"
followed by
echo "waiting on $npids downloads"
for pid in ${pids[*]}; do
echo $pid
wait $pid
done
I downloaded 1500+ files (72,000 bytes) in about a minute
I wrote a simple shell script to download NOT JUST all files but also all versions of every file from a specific folder under AWS s3 bucket. Here it is & you may find it useful
# Script generates the version info file for all the
# content under a particular bucket and then parses
# the file to grab the versionId for each of the versions
# and finally generates a fully qualified http url for
# the different versioned files and use that to download
# the content.
s3region="s3.ap-south-1.amazonaws.com"
bucket="your_bucket_name"
# note the location has no forward slash at beginning or at end
location="data/that/you/want/to/download"
# file names were like ABB-quarterly-results.csv, AVANTIFEED--quarterly-results.csv
fileNamePattern="-quarterly-results.csv"
# AWS CLI command to get version info
content="$(aws s3api list-object-versions --bucket $bucket --prefix "$location/")"
#save the file locally, if you want
echo "$content" >> version-info.json
versions=$(echo "$content" | grep -ir VersionId | awk -F ":" '{gsub(/"/, "", $3);gsub(/,/, "", $3);gsub(/ /, "", $3);print $3 }')
for version in $versions
do
echo ############### $fileId ###################
#echo $version
url="https://$s3region/$bucket/$location/$fileId$fileNamePattern?versionId=$version"
echo $url
content="$(curl -s "$url")"
echo "$content" >> $fileId$fileNamePattern-$version.csv
echo ############### $i ###################
done
Also you could use the --include "filename" many times in a single command with each time including a different filename within the double quotes, e.g.
aws s3 mycommand --include "file1" --include "file2"
It will save your time rather than repeating the command to download one file at a time.
Also if you are running Windows(tm), WinSCP now allows drag and drop of a selection of multiple files. Including sub-folders.
Many enterprise workstations will have WinSCP installed for editing files on servers by means of SSH.
I am not affiliated, I simply think this was really worth doing.
In my case Aur's didn't work and if you're looking for a quick solution to download all files in a folder just using the browser, you can try entering this snippet in your dev console:
(function() {
const rows = Array.from(document.querySelectorAll('.fix-width-table tbody tr'));
const downloadButton = document.querySelector('[data-e2e-id="button-download"]');
const timeBetweenClicks = 500;
function downloadFiles(remaining) {
if (!remaining.length) {
return
}
const row = remaining[0];
row.click();
downloadButton.click();
setTimeout(() => {
downloadFiles(remaining.slice(1));
}, timeBetweenClicks)
}
downloadFiles(rows)
}())
I have done, by creating shell script using aws cli (i.e : example.sh)
#!/bin/bash
aws s3 cp s3://s3-bucket-path/example1.pdf LocalPath/Download/example1.pdf
aws s3 cp s3://s3-bucket-path/example2.pdf LocalPath/Download/example2.pdf
give executable rights to example.sh (i.e sudo chmod 777 example.sh)
then run your shell script ./example.sh
I think simplest way to download or upload files is to use aws s3 sync command. You can also use it to sync two s3 buckets in same time.
aws s3 sync <LocalPath> <S3Uri> or <S3Uri> <LocalPath> or <S3Uri> <S3Uri>
# Download file(s)
aws s3 sync s3://<bucket_name>/<file_or_directory_path> .
# Upload file(s)
aws s3 sync . s3://<bucket_name>/<file_or_directory_path>
# Sync two buckets
aws s3 sync s3://<1st_s3_path> s3://<2nd_s3_path>
What I usually do is mount the s3 bucket (with s3fs) in a linux machine and zip the files I need into one, then I just download that file from any pc/browser.
# mount bucket in file system
/usr/bin/s3fs s3-bucket -o use_cache=/tmp -o allow_other -o uid=1000 -o mp_umask=002 -o multireq_max=5 /mnt/local-s3-bucket-mount
# zip files into one
cd /mnt/local-s3-bucket-mount
zip all-processed-files.zip *.jpg
import os
import boto3
import json
s3 = boto3.resource('s3', aws_access_key_id="AKIAxxxxxxxxxxxxJWB",
aws_secret_access_key="LV0+vsaxxxxxxxxxxxxxxxxxxxxxry0/LjxZkN")
my_bucket = s3.Bucket('s3testing')
# download file into current directory
for s3_object in my_bucket.objects.all():
# Need to split s3_object.key into path and file name, else it will give error file not found.
path, filename = os.path.split(s3_object.key)
my_bucket.download_file(s3_object.key, filename)
I have been using gcloud-node to upload files to GC Storage, but I noticed that I'm getting a multitude of these remnants in my /tmp directory:
/tmp/3a1a48fa-3d83-4996-8e88-32bc01c36e86
/tmp/3a1a48fa-3d83-4996-8e88-32bc01c36e86/.config
/tmp/3a1a48fa-3d83-4996-8e88-32bc01c36e86/.config/configstore
/tmp/3a1a48fa-3d83-4996-8e88-32bc01c36e86/.config/configstore/gcs-resumable-upload.json
...
Is this normal? Is it just for large upload files?
Is there a way to configure which directory these files goes to (if they must be created?)
Thanks!
This is used when a resumable upload is started. You can opt-out of resumable uploads by setting { resumable: false } as an option to either bucket.upload or file.createWriteStream.
The library used to write the file is configstore. ~/.config is default, but you can override this by setting XDG_CONFIG_HOME.
i am working with s3 bucket. i need to copy an image from my amazon server to s3 bucket. any idea how can i do it? i saw some sample codes but i dont know how to use it.
if (S3::copyObject($sourceBucket, $sourceFile, $destinationBucket, $destinationFile, S3::ACL_PRIVATE)) {
echo "Copied file";
} else {
echo "Failed to copy file";
}
it seems that this code is used only to bucket but not for the server?
thanks for help.
Copy between S3 Buckets
AWS released a command line interface for copying between buckets.
http://aws.amazon.com/cli/
$ aws s3 sync s3://mybucket-src s3://mybucket-target --exclude *.tmp
..
This will copy from one target bucket to another bucket.
I have no tested this, but I believe that this will operate in series, by downloading the files to your system and then uploading to the bucket.
See the documentation here : S3 CLI Documentation
I've used s3cmd for several years, and it's been very reliable. If you're using Ubuntu it's available with:
apt-get install s3cmd
You can also use one of the SDKs to develop your own tool.