AWS CLI create-function for Lambda with S3 suffixes returns Unknown options

I have the below batch script for creating my Lambda function via the AWS CLI:
rem -----------------------------------------
rem create or update the lambda function
aws lambda create-function ^
--function-name %LAMBDA_FUNCTION_NAME% ^
--runtime python3.9 ^
--role %LAMBDA_ROLE_ARN% ^
--handler %LAMBDA_HANDLER% ^
--zip-file fileb://%LAMBDA_ZIP_FILE% ^
--profile %AWS_PROFILE% ^
--region %REGION% ^
--timeout 180 ^
--memory-size 1024 ^
--layers %LAMBDA_ARN_LAYER% ^
--environment Variables={PYTHONPATH=python/lib}
echo Deployed the AWS Lambda function %LAMBDA_FUNCTION_NAME% in region %REGION%
rem -----------------------------------------
rem add S3 trigger
aws lambda create-event-source-mapping ^
--function-name %LAMBDA_FUNCTION_NAME% ^
--event-source-arn arn:aws:s3:::%S3_BUCKET_NAME% ^
--batch-size 1 ^
--starting-position "LATEST" ^
--profile %AWS_PROFILE% ^
--region %REGION% ^
--event-source-request-parameters Events=s3:ObjectCreated:* Filter='{"Key": {"Suffix": [".MF4",".MFC",".MFE",".MFM"]}}'
However, the second command (the create-event-source-mapping call) fails with the following error:
Unknown options: --event-source-request-parameters, Filter='{Key:, {Suffix:, [.MF4,.MFC,.MFE,.MFM]}}', Events=s3:ObjectCreated:*
In what way is my syntax wrong? I want to use my S3 bucket as the trigger whenever a file with one of the file extensions listed is uploaded.

S3 event notification doesn't need an event source mapping. You should use the following CLI command instead to use your S3 bucket as the trigger:
aws s3api put-bucket-notification-configuration --bucket S3_BUCKET_NAME ...
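For this specific case, a rough sketch of what that could look like, reusing the variables from the batch script in the question (the statement id and the LAMBDA_FUNCTION_ARN placeholder below are assumptions). S3 must also be granted permission to invoke the function, hence the add-permission call:
rem allow S3 to invoke the Lambda function
aws lambda add-permission ^
--function-name %LAMBDA_FUNCTION_NAME% ^
--statement-id s3-invoke ^
--action lambda:InvokeFunction ^
--principal s3.amazonaws.com ^
--source-arn arn:aws:s3:::%S3_BUCKET_NAME% ^
--profile %AWS_PROFILE% ^
--region %REGION%
rem attach the suffix-filtered notification configuration
aws s3api put-bucket-notification-configuration ^
--bucket %S3_BUCKET_NAME% ^
--notification-configuration file://notification.json ^
--profile %AWS_PROFILE% ^
--region %REGION%
Here notification.json would look something like the following. S3 allows at most one suffix rule per notification entry, so each extension gets its own entry (only two of the four are shown):
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "LAMBDA_FUNCTION_ARN",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".MF4"}]}}
    },
    {
      "LambdaFunctionArn": "LAMBDA_FUNCTION_ARN",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".MFC"}]}}
    }
  ]
}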

Related

AWS CLI for S3 Select

I have the following code, which is used to run a SQL query on a keyfile located in an S3 bucket. This runs perfectly. My question is: I do not wish to have the output written to an output file. Could I see the output on the screen instead (my preference #1)? If not, is there a way to append to the output file rather than overwrite it (my preference #2)? I am using the AWS CLI binaries to run this query. If there is another way, I am happy to try it (as long as it is within bash).
aws s3api select-object-content \
--bucket "project2" \
--key keyfile1 \
--expression "SELECT * FROM s3object s where Lower(s._1) = 'email#search.com'" \
--expression-type 'SQL' \
--input-serialization '{"CSV": {"FieldDelimiter": ":"}, "CompressionType": "GZIP"}' \
--output-serialization '{"CSV": {"FieldDelimiter": ":"}}' "OutputFile"
Of course, you can use the AWS CLI to do this, since stdout is just a special file in Linux.
aws s3api select-object-content \
--bucket "project2" \
--key keyfile1 \
--expression "SELECT * FROM s3object s where Lower(s._1) = 'email#search.com'" \
--expression-type 'SQL' \
--input-serialization '{"CSV": {"FieldDelimiter": ":"}, "CompressionType": "GZIP"}' \
--output-serialization '{"CSV": {"FieldDelimiter": ":"}}' /dev/stdout
Note the /dev/stdout in the end.
The AWS CLI does not offer such options.
However, you are welcome to instead call it via an AWS SDK of your choice.
For example, in the boto3 Python SDK, there is a select_object_content() function that returns the data as a stream. You can then read, manipulate, print or save it however you wish.
I think it opens /dev/stdout twice, causing chaos.

aws-cli fails to work with one particular S3 bucket on one particular machine

I'm trying to remove the existing objects (i.e. empty the bucket) and then copy new ones into an AWS S3 bucket:
aws s3 rm s3://BUCKET_NAME --region us-east-2 --recursive
aws s3 cp ./ s3://BUCKET_NAME/ --region us-east-2 --recursive
The first command fails with the following error:
An error occurred (InvalidRequest) when calling the ListObjects
operation: You are attempting to operate on a bucket in a region that
requires Signature Version 4. You can fix this issue by explicitly
providing the correct region location using the --region argument, the
AWS_DEFAULT_REGION environment variable, or the region variable in the
AWS CLI configuration file. You can get the bucket's location by
running "aws s3api get-bucket-location --bucket BUCKET". Completed 1
part(s) with ... file(s) remaining
Well, the error message is self-explanatory, but the problem is that I've already applied the suggested solution (I've added the --region argument) and I'm completely sure that it is the correct region (I obtained the region the same way the error message suggests).
Now, to make things even more interesting, the error happens in a GitLab CI environment (let's just say some server). But just before this error occurs, the exact same command runs successfully against other buckets. It's worth mentioning that those other buckets are in different regions.
Now, to top it all off, I can execute the command on my personal computer with the same credentials as on the CI server! So to summarize:
server$ aws s3 rm s3://OTHER_BUCKET --region us-west-2 --recursive <== works
server$ aws s3 rm s3://BUCKET_NAME --region us-east-2 --recursive <== fails
my_pc$ aws s3 rm s3://BUCKET_NAME --region us-east-2 --recursive <== works
Does anyone have any pointers what might the problem be?
For anyone else who might be facing the same problem: make sure your AWS CLI is up to date!
server$ aws --version
aws-cli/1.10.52 Python/2.7.14 Linux/4.13.9-coreos botocore/1.4.42
my_pc$ aws --version
aws-cli/1.14.58 Python/3.6.5 Linux/4.13.0-38-generic botocore/1.9.11
Once I updated the server's AWS CLI tool, everything worked. Now my server reports:
server$ aws --version
aws-cli/1.14.49 Python/2.7.14 Linux/4.13.5-coreos-r2 botocore/1.9.2
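For reference, if the CLI was installed via pip (an assumption here), the update itself is just:
pip install --upgrade awscli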

Resuming interrupted s3 download with awscli

I was downloading a file using awscli:
$ aws s3 cp s3://mybucket/myfile myfile
But the download was interrupted (computer went to sleep). How can I continue the download? S3 supports the Range header, but awscli s3 cp doesn't let me specify it.
The file is not publicly accessible so I can't use curl to specify the header manually.
There is a "hidden" command in the awscli tool which allows lower level access to S3: s3api.† It is less user friendly (no s3:// URLs and no progress bar) but it does support the range specifier on get-object:
--range (string) Downloads the specified range bytes of an object. For
more information about the HTTP range header, go to
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.
Here's how to continue the download:
$ size=$(stat -f%z myfile) # assumes OS X. Change for your OS
$ aws s3api get-object \
--bucket mybucket \
--key myfile \
--range "bytes=$size-" \
/dev/fd/3 3>>myfile
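If you are on GNU/Linux rather than OS X, the equivalent size check would be something along the lines of:
$ size=$(stat -c%s myfile)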
You can use pv for a rudimentary progress bar:
$ aws s3api get-object \
--bucket mybucket \
--key myfile \
--range "bytes=$size-" \
/dev/fd/3 3>&1 >&2 | pv >> myfile
(The reason for this unnamed pipe rigmarole is that s3api writes a debug message to stdout at the end of the operation, which would pollute your file. This solution rebinds stdout to stderr and routes the regular file contents through descriptor 3, an alias for the original stdout, into the pipe. The version without pv could technically write to stderr (/dev/fd/2 and 2>), but if an error occurs, s3api writes to stderr, which would then get appended to your file. Thus it is safer to use a dedicated descriptor there as well.)
† In git speak, s3 is porcelain, and s3api is plumbing.
Use s3cmd; it has a --continue option built in. Example:
# Start a download
> s3cmd get s3://yourbucket/yourfile ./
download: 's3://yourbucket/yourfile' -> './yourfile' [1 of 1]
123456789 of 987654321 12.5% in 235s 0.5 MB/s
[ctrl-c] interrupt
# Pick up where you left off
> s3cmd --continue get s3://yourbucket/yourfile ./
Note that s3cmd is not multithreaded, whereas awscli is, so awscli is faster. A currently maintained fork of s3cmd, called s4cmd, appears to provide the multithreaded capabilities while maintaining the usability features of s3cmd:
https://github.com/bloomreach/s4cmd

Travis AWS S3 SDK set cache header for particular file

My Travis script uploads contents to an S3 bucket as follows:
deploy:
  provider: script
  skip_cleanup: true
  script: "~/.local/bin/aws s3 sync dist s3://mybucket --region=eu-west-1 --delete"
before_deploy:
  - npm run build
  - pip install --user awscli
I also want to set a no-cache header on a particular file in that bucket (i.e. sw.js). Is that currently possible in the SDK?
I am afraid this is not possible with a single s3 sync command. However, you can try running two commands using the exclude and include options: one to sync everything except sw.js, and another just for sw.js.
script: ~/.local/bin/aws s3 sync dist s3://mybucket --include "*" --exclude "sw.js" --region eu-west-1 --delete ; ~/.local/bin/aws s3 sync dist s3://mybucket --exclude "*" --include "sw.js" --region eu-west-1 --delete --cache-control "no-cache" --metadata-directive REPLACE
Note: --metadata-directive REPLACE option is necessary for non-multipart copies.

Filter S3 list-objects results to find a key matching a pattern

I would like to use the AWS CLI to query the contents of a bucket and see if a particular file exists, but the bucket contains thousands of files. How can I filter the results to only show key names that match a pattern? For example:
aws s3api list-objects --bucket myBucketName --query "Contents[?Key==*mySearchPattern*]"
The --query argument uses JMESPath expressions. JMESPath has a built-in function, contains, that lets you search for a string pattern.
This should give the desired results:
aws s3api list-objects --bucket myBucketName --query "Contents[?contains(Key, `mySearchPattern`)]"
(With Linux I needed to use single quotes ' rather than back ticks ` around mySearchPattern.)
If you want to search for keys starting with certain characters, you can also use the --prefix argument:
aws s3api list-objects --bucket myBucketName --prefix "myPrefixToSearchFor"
I tried this on Ubuntu 14 with awscli 1.2:
--query "Contents[?contains(Key,'stati')].Key"
--query "Contents[?contains(Key,\'stati\')].Key"
--query "Contents[?contains(Key,`stati`)].Key"
Each attempt failed with:
Illegal token value '?contains(Key,'stati')].Key'
After upgrading the AWS CLI to version 1.16, this worked:
--query "Contents[?contains(Key,'stati')].Key"