Filter S3 list-objects results to find a key matching a pattern

I would like to use the AWS CLI to query the contents of a bucket and see if a particular file exists, but the bucket contains thousands of files. How can I filter the results to only show key names that match a pattern? For example:
aws s3api list-objects --bucket myBucketName --query "Contents[?Key==*mySearchPattern*]"

The --query argument uses JMESPath expressions. JMESPath has a built-in function contains that lets you search for a string pattern.
This should give the desired results:
aws s3api list-objects --bucket myBucketName --query "Contents[?contains(Key, `mySearchPattern`)]"
(On Linux I needed to use single quotes ' rather than backticks ` around mySearchPattern.)
If you want to search for keys starting with certain characters, you can also use the --prefix argument:
aws s3api list-objects --bucket myBucketName --prefix "myPrefixToSearchFor"
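The two approaches can be combined: --prefix narrows the listing on the server side, while the JMESPath filter refines it on the client side. A minimal sketch, reusing the placeholder bucket, prefix and pattern names from above:
aws s3api list-objects --bucket myBucketName --prefix "myPrefixToSearchFor" \
--query "Contents[?contains(Key, 'mySearchPattern')].Key" --output text
The --output text flag prints just the matching key names instead of the full JSON records.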

I tried the following on Ubuntu 14 with awscli 1.2:
--query "Contents[?contains(Key,'stati')].Key"
--query "Contents[?contains(Key,\'stati\')].Key"
--query "Contents[?contains(Key,`stati`)].Key"
and got:
Illegal token value '?contains(Key,'stati')].Key'
After upgrading the AWS CLI to 1.16, this worked:
--query "Contents[?contains(Key,'stati')].Key"

Related

AWS CLI for S3 Select

I have the following code, which runs a SQL query against a keyfile located in an S3 bucket. It runs perfectly, but I do not want the output written to an output file. Could I see the output on the screen instead (my preference #1)? If not, could I append to the output file rather than overwrite it (my preference #2)? I am using the AWS CLI binaries to run this query. If there is another way, I am happy to try it (as long as it stays within bash).
aws s3api select-object-content \
--bucket "project2" \
--key keyfile1 \
--expression "SELECT * FROM s3object s where Lower(s._1) = 'email#search.com'" \
--expression-type 'SQL' \
--input-serialization '{"CSV": {"FieldDelimiter": ":"}, "CompressionType": "GZIP"}' \
--output-serialization '{"CSV": {"FieldDelimiter": ":"}}' "OutputFile"
Of course you can use the AWS CLI to do this, since stdout is just a special file on Linux.
aws s3api select-object-content \
--bucket "project2" \
--key keyfile1 \
--expression "SELECT * FROM s3object s where Lower(s._1) = 'email#search.com'" \
--expression-type 'SQL' \
--input-serialization '{"CSV": {"FieldDelimiter": ":"}, "CompressionType": "GZIP"}' \
--output-serialization '{"CSV": {"FieldDelimiter": ":"}}' /dev/stdout
Note the /dev/stdout at the end.
The AWS CLI does not offer such options.
However, you could instead call S3 Select via an AWS SDK of your choice.
For example, in the boto3 Python SDK, there is a select_object_content() function that returns the data as a stream. You can then read, manipulate, print or save it however you wish.
I think it opens /dev/stdout twice, causing chaos.
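If that double open is a concern, or if you want the append behaviour from preference #2, one possible workaround (only a sketch, reusing the question's bucket and key; results.csv is a hypothetical target file) is to keep stdout a pipe and let tee do the writing:
aws s3api select-object-content \
--bucket "project2" \
--key keyfile1 \
--expression "SELECT * FROM s3object s where Lower(s._1) = 'email#search.com'" \
--expression-type 'SQL' \
--input-serialization '{"CSV": {"FieldDelimiter": ":"}, "CompressionType": "GZIP"}' \
--output-serialization '{"CSV": {"FieldDelimiter": ":"}}' \
/dev/stdout | tee -a results.csv
tee echoes the records to the screen and appends them to results.csv, which covers both preferences at once.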

Using s3cmd, how do I retrieve the newest folder by "Last Modified" date in an S3 directory?

I have a directory containing folders whose names are created from timestamps. I want to use s3cmd to find the file with the most recent "Last Modified" value. If that is not possible, are the solutions to these previous questions the way to go?
looking for s3cmd download command for a certain date
Using S3cmd, how do I get the first and last file in a folder?
Can s3cmd do this natively, or do I have to retrieve all the folder names and sort through them?
Using the AWS Command-Line Interface (CLI), you can list the most recent file with:
aws s3api list-objects --bucket my-bucket-name --prefix folder1/folder2/ --query 'sort_by(Contents, &LastModified)[-1].Key' --output text
The first (oldest) object would be:
aws s3api list-objects --bucket my-bucket-name --prefix folder1/folder2/ --query 'sort_by(Contents, &LastModified)[0].Key' --output text
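If you also want to see when that object was written, a small variation of the same query returns the timestamp together with the key (same placeholder bucket and prefix):
aws s3api list-objects --bucket my-bucket-name --prefix folder1/folder2/ --query 'sort_by(Contents, &LastModified)[-1].[LastModified, Key]' --output text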

How to update ACL for all S3 objects in a folder with AWS CLI?

As part of an automated process in CodeBuild, I want to update the Access Control List for all files in a given folder (more specifically, all objects with a given prefix). How do I do it in a single line of bash code?
The following one-liner works perfectly:
aws s3api list-objects --bucket "$BUCKET_NAME" --prefix "$FOLDER_NAME" --query "(Contents)[].[Key]" --output text | while read -r line ; do aws s3api put-object-acl --acl public-read --bucket "$BUCKET_NAME" --key "$line" ; done
You can also use aws s3 cp to copy the objects over themselves with new grants (use --recursive for a prefix; grants take the form Permission=GranteeType=GranteeId):
aws s3 cp s3://mybucket/mydir s3://mybucket/mydir --recursive --grants read=uri=http://acs.amazonaws.com/groups/global/AllUsers
Reference: https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
List all objects and modify the ACL of each with put-object-acl:
acl=public-read
# awk '{print $4}' keeps the key column of "aws s3 ls"; note it breaks on keys containing spaces
aws s3 ls s3://$bucket --recursive --endpoint-url=$endpoint | awk '{print $4}' > tos-objects.txt
cat tos-objects.txt | while read -r object
do
  echo -e "set acl of \033[31m $object \033[0m as $acl"
  aws s3api put-object-acl --bucket "$bucket" --key "$object" --acl "$acl" --endpoint-url=$endpoint
done

aws-cli fails to work with one particular S3 bucket on one particular machine

I'm trying to remove the objects (empty the bucket) and then copy new ones into an AWS S3 bucket:
aws s3 rm s3://BUCKET_NAME --region us-east-2 --recursive
aws s3 cp ./ s3://BUCKET_NAME/ --region us-east-2 --recursive
The first command fails with the following error:
An error occurred (InvalidRequest) when calling the ListObjects
operation: You are attempting to operate on a bucket in a region that
requires Signature Version 4. You can fix this issue by explicitly
providing the correct region location using the --region argument, the
AWS_DEFAULT_REGION environment variable, or the region variable in the
AWS CLI configuration file. You can get the bucket's location by
running "aws s3api get-bucket-location --bucket BUCKET". Completed 1
part(s) with ... file(s) remaining
Well, the error message is self-explanatory, but the problem is that I've already applied the solution (I've added the --region argument) and I'm completely sure it is the correct region (I got the region the same way the error message suggests).
Now, to make things even more interesting, the error happens in a GitLab CI environment (let's just say some server). But just before this error occurs, the exact same command is executed against other buckets and it works. It's worth mentioning that those other buckets are in different regions.
To top it all off, I can execute the command on my personal computer with the same credentials as on the CI server! So to summarize:
server$ aws s3 rm s3://OTHER_BUCKET --region us-west-2 --recursive <== works
server$ aws s3 rm s3://BUCKET_NAME --region us-east-2 --recursive <== fails
my_pc$ aws s3 rm s3://BUCKET_NAME --region us-east-2 --recursive <== works
Does anyone have any pointers what might the problem be?
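When two machines disagree like this, two quick things worth comparing on both of them (a sketch; BUCKET_NAME is the placeholder used above) are the CLI version and the region the bucket itself reports:
aws --version
aws s3api get-bucket-location --bucket BUCKET_NAME --output text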
For anyone else who might be facing the same problem: make sure your AWS CLI is up to date!
server$ aws --version
aws-cli/1.10.52 Python/2.7.14 Linux/4.13.9-coreos botocore/1.4.42
my_pc$ aws --version
aws-cli/1.14.58 Python/3.6.5 Linux/4.13.0-38-generic botocore/1.9.11
Once I updated the server's AWS CLI, everything worked. The server now reports:
server$ aws --version
aws-cli/1.14.49 Python/2.7.14 Linux/4.13.5-coreos-r2 botocore/1.9.2
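For reference, the upgrade itself is a one-liner if the CLI was installed through pip, which is the usual route for the 1.x versions shown above (adjust if it was installed another way):
pip install --upgrade awscli
aws --version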

Create a subfolder in an S3 bucket?

I already have a root bucket (bigdata). Now I want to create a new folder (Year) inside the bigdata bucket, and then another folder (Month) inside Year.
aws s3 mb s3://bigdata --> bucket created
aws s3 mb s3://bigdata/Year/ --> does not work
Use the syntax below; this is what I use to create a bucket and subfolders. Don't forget the "/" at the end of the folder name.
aws s3api put-object --bucket <your-bucket-name> --key <folder-name>/test.txt --body yourfile.txt
After Googling for 2 hours, this CLI command worked for me:
aws s3api put-object --bucket root-bucket-name --key new-dir-name/
Windows bat file example:
REM build a date string from %Date% (the exact digit order depends on the system's date format)
SET today=%Date:~-10,2%%Date:~-7,2%%Date:~-4,4%
aws s3api put-object --bucket root-backup-sets --key %today%/
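Applied to the Year/Month layout from the question, the same trick would be (a sketch using the bigdata bucket name from the question; the trailing slash is what makes S3 show the zero-byte object as a folder):
aws s3api put-object --bucket bigdata --key Year/
aws s3api put-object --bucket bigdata --key Year/Month/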
If the local file is foo.txt and the remote "folder" Year does not yet exist, you can create it by simply copying the file to the desired path:
$ aws s3 cp foo.txt s3://bigdata/Year/
Or, if the local folder is YearData containing foo.txt and bar.txt:
$ aws s3 cp YearData s3://bigdata/Year/ --recursive
upload: YearData/foo.txt to s3://bigdata/Year/foo.txt
upload: YearData/bar.txt to s3://bigdata/Year/bar.txt
See also:
http://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
and first, http://docs.aws.amazon.com/cli/latest/reference/configure
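Tying those references together, a minimal end-to-end sketch (assuming the bigdata bucket and the local YearData folder from the answer above):
aws configure
aws s3 sync YearData s3://bigdata/Year/
aws configure performs the one-time setup of credentials and default region, and aws s3 sync copies the folder contents, creating the Year/ prefix as needed.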