Move files in S3 bucket to folder based on file name pattern - amazon-s3

I have an S3 bucket with a few thousand files where the file names always match the pattern {hostname}.{contenttype}.{yyyyMMddHH}.zip. I want to create a script that will run once a day to move these files into folders based on the year and month in the file name.
If I try the following aws-cli command
aws s3 mv s3://mybucket/*.202001* s3://mybucket/202001/
I get the following error:
fatal error: An error occurred (404) when calling the HeadObject operation: Key "*.202001*" does not exist
Is there an aws-cli command that I could run on a schedule to achieve this?

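I think the way forward would be through the --exclude and --include filters used in S3 CLI commands. The aws s3 commands do not expand wildcards in the S3 path itself, which is why the key "*.202001*" was looked up literally.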
So, for your case,
aws s3 mv s3://mybucket/ s3://mybucket/202001/ --recursive --exclude "*" --include "*.202001*"
should probably do the trick.
For scheduling the CLI command to run daily, I think you can refer to "On AWS, run an AWS CLI command daily".
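As a rough sketch of how the daily job could look, the function below wraps the command above and adds a trailing --exclude for the destination prefix, since "*.202001*" would otherwise also match keys that already sit under 202001/. The names move_month, AWS and BUCKET are illustrative assumptions, not part of the aws-cli; setting AWS=echo prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move objects named {hostname}.{contenttype}.{yyyyMMddHH}.zip
# into a yyyyMM/ prefix of the same bucket.

AWS="${AWS:-aws}"             # override with AWS=echo to print instead of run
BUCKET="${BUCKET:-mybucket}"  # placeholder bucket name from the question

move_month() {
  local month="$1"   # e.g. 202001
  shift              # any remaining arguments (such as --dryrun) are passed through
  # Filters are applied in order and later ones win, so the last --exclude
  # skips objects that were already moved under the month prefix.
  "$AWS" s3 mv "s3://${BUCKET}/" "s3://${BUCKET}/${month}/" \
    --recursive \
    --exclude '*' \
    --include "*.${month}*" \
    --exclude "${month}/*" \
    "$@"
}
```

Called as move_month "$(date +%Y%m)" --dryrun, it previews the move for the current month; dropping --dryrun performs it.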

Related

AWS cli throws error when copying large files

I'm trying to copy objects from one S3 bucket to another using the aws cli tool.
It works OK for small objects, but on buckets with large files, as soon as the copy starts, I get one of the following errors:
copy failed: s3://bucket/file.ogv to s3://bucket-tmp/file.ogv ('Connection aborted.', OSError(0, 'Error'))
or
copy failed: s3://bucket/file.ogv to s3://bucket-tmp/file.ogv An error occurred (NoSuchKey) when calling the UploadPartCopy operation: Unknown
If I include the --no-guess-mime-type option, I get
fatal error: ('Connection aborted.', OSError(0, 'Error'))
I tried --debug, but I didn't understand much of the debug output, although I could see OSError(0, 'Error') again in the log.
Has anyone seen anything like this? In another answer (this one), people mentioned another tool, s3cmd, but I couldn't make it work.
I'm trying to access ceph on a corporate server with path-style urls and https endpoint.
My command:
aws --endpoint-url https://myendpoint.url s3 cp s3://mybucket s3://mybucket-tmp --recursive
Also, when I tried to configure s3cmd I got an ugly Python debug output with OSError: [Errno 0] Error in the middle.
I discovered that it works if I use the s3api command instead of the s3 command. The format of the working command:
aws --endpoint-url <my-endpoint-url> s3api copy-object --copy-source my-source-bucket/whatever/path/file.txt --key whatever/path/file.txt --bucket my-destination-bucket
It only copies one file at a time. You can get the list of objects in the bucket with the s3 ls command or the s3api list-objects command.
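A minimal sketch of combining the two, assuming the endpoint and bucket names from the question: list the keys with s3api list-objects, then issue one copy-object call per key. The function names and the ENDPOINT, SRC_BUCKET, DST_BUCKET and AWS variables are hypothetical; setting AWS=echo prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: copy a whole bucket one object at a time via s3api.

AWS="${AWS:-aws}"                               # override with AWS=echo to print
ENDPOINT="${ENDPOINT:-https://myendpoint.url}"  # placeholder endpoint
SRC_BUCKET="${SRC_BUCKET:-mybucket}"
DST_BUCKET="${DST_BUCKET:-mybucket-tmp}"

# Copy a single key from SRC_BUCKET to the same key in DST_BUCKET.
copy_one() {
  local key="$1"
  "$AWS" --endpoint-url "$ENDPOINT" s3api copy-object \
    --copy-source "${SRC_BUCKET}/${key}" \
    --key "$key" \
    --bucket "$DST_BUCKET"
}

# List every key (one per line) and copy them in turn.
# Assumes keys contain no newlines and need no URL-encoding in --copy-source.
copy_all() {
  "$AWS" --endpoint-url "$ENDPOINT" s3api list-objects \
      --bucket "$SRC_BUCKET" \
      --query 'Contents[].[Key]' --output text |
  while IFS= read -r key; do
    [ -n "$key" ] || continue
    copy_one "$key"
  done
}
```

On Amazon S3 a single copy-object call is limited to objects of up to 5 GB; whether a Ceph gateway applies the same limit is not something this sketch checks.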

How to upload a directory to a AWS S3 bucket along with a KMS ID through CLI?

I want to upload a directory (a folder consisting of other folders and .txt files) to a folder (partition) in a specific S3 bucket, along with a given KMS ID, via the CLI. I found the following command, which uploads a jar file to an S3 bucket:
aws s3 sync /?? s3://???-??-dev-us-east-2-813426848798/build/tmp/snapshot --sse aws:kms --sse-kms-key-id alias/nbs/dev/data --delete --region us-east-2 --exclude "*" --include "*.?????"
Suppose:
Location (Bucket Name with folder name) - "s3://abc-app-us-east-2-12345678/tmp"
KMS-id - https://us-east-2.console.aws.amazon.com/kms/home?region=us-east-2#/kms/keys/aa11-123aa-45/
Directory to be uploaded - myDirectory
I want to know:
Can the same command be used to upload a directory with a bunch of files and folders in it?
If so, how should the command be changed?
The cp command works this way:
aws s3 cp ./localFolder s3://awsexamplebucket/abc --recursive --sse aws:kms --sse-kms-key-id a1b2c3d4-e5f6-7890-g1h2-123456789abc
I haven't tried the sync command with KMS, but this is how you use sync:
aws s3 sync ./localFolder s3://awsexamplebucket/remotefolder
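For illustration, the sync form with the same encryption options as the command in the question might look like the sketch below. It is untested against a real bucket; the function name and the AWS variable are hypothetical, and it assumes the key ID is the last path segment of the console URL given in the question (aa11-123aa-45). Setting AWS=echo prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sync a local directory tree to an S3 prefix with SSE-KMS.

AWS="${AWS:-aws}"   # override with AWS=echo to print instead of run

sync_dir_with_kms() {
  local src_dir="$1"      # local directory, e.g. ./myDirectory
  local dest="$2"         # S3 destination, e.g. s3://abc-app-us-east-2-12345678/tmp
  local kms_key_id="$3"   # KMS key ID, key ARN, or alias
  # sync recurses into subdirectories on its own, so no --recursive is needed.
  # The --exclude/--include pair from the jar command is left out so that
  # every file is uploaded.
  "$AWS" s3 sync "$src_dir" "$dest" \
    --sse aws:kms \
    --sse-kms-key-id "$kms_key_id"
}
```

With the values from the question this would be sync_dir_with_kms ./myDirectory s3://abc-app-us-east-2-12345678/tmp aa11-123aa-45. The --delete flag from the original command is omitted here because it removes remote objects that are missing locally.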

Files will not move or copy from folder on file system to local bucket

I am using the command
aws s3 mv --recursive Folder s3://bucket/dsFiles/
The aws console is not giving me any feedback. I changed the permissions of the directory with
sudo chmod -R 666 ds000007_R2.0.1/
It looks like AWS is passing over those files and giving "File does not exist" for every directory.
I am confused about why AWS is not actually performing the copy. Is there some size limitation or recursion depth limitation?
I believe you want to cp, not mv. Try the following:
aws s3 cp $local/folder s3://your/bucket --recursive --include "*"
Source, my answer here.

AWS EMR --steps

I am running the following .sh to run a command on AWS using EMR:
aws emr create-cluster --name "Big Matrix Re Run 5" --ami-version 3.1.0 --auto-terminate --log-uri FILE LOCATION --enable-debugging --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=c3.xlarge InstanceGroupType=CORE,InstanceCount=3,InstanceType=c3.xlarge --steps NAME AND LOCATION OF FILE
I've deleted the pertinent file name and locations as those aren't my issue, but I am having an issue with the --steps portion of the script.
How do I specify the steps that I want to run in the cluster? The documentation doesn't give any examples.
Here is the error:
Error parsing parameter '--steps': should be: Key value pairs, where values are separated by commas, and multiple pairs are separated by spaces.
--steps Name=string1,Jar=string1,ActionOnFailure=string1,MainClass=string1,Type=string1,Properties=string1,Args=string1,string2 Name=string1,Jar=string1,ActionOnFailure=string1,MainClass=string1,Type=string1,Properties=string1,Args=string1,string2
Thanks!
The documentation page for the AWS Command-Line Interface create-cluster command shows examples for using the --steps parameter.
Steps can be supplied inline on the command line, or read from a local JSON file through the file:// prefix. In either case the step arguments can refer to scripts stored in Amazon S3.
From a local JSON file:
aws emr create-cluster --steps file://./multiplefiles.json --ami-version 3.3.1 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --auto-terminate
Inline on the command line, with the step arguments referring to Amazon S3:
aws emr create-cluster --steps Type=HIVE,Name='Hive program',ActionOnFailure=CONTINUE,Args=[-f,s3://elasticmapreduce/samples/hive-ads/libs/model-build.q,-d,INPUT=s3://elasticmapreduce/samples/hive-ads/tables,-d,OUTPUT=s3://mybucket/hive-ads/output/2014-04-18/11-07-32,-d,LIBS=s3://elasticmapreduce/samples/hive-ads/libs] --applications Name=Hive --ami-version 3.1.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge
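As an illustration of the JSON-file form, the sketch below writes a steps file that mirrors the inline Hive step above, using the keys listed in the error message (Type, Name, ActionOnFailure, Args). The function name and the steps.json file name are hypothetical, and picking CONTINUE as the failure action is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a one-step definition for --steps file://... to a local file.

write_steps_file() {
  local path="$1"   # e.g. ./steps.json
  cat > "$path" <<'EOF'
[
  {
    "Type": "HIVE",
    "Name": "Hive program",
    "ActionOnFailure": "CONTINUE",
    "Args": [
      "-f", "s3://elasticmapreduce/samples/hive-ads/libs/model-build.q",
      "-d", "INPUT=s3://elasticmapreduce/samples/hive-ads/tables",
      "-d", "OUTPUT=s3://mybucket/hive-ads/output/2014-04-18/11-07-32",
      "-d", "LIBS=s3://elasticmapreduce/samples/hive-ads/libs"
    ]
  }
]
EOF
}
```

After write_steps_file ./steps.json, the file would be passed as --steps file://./steps.json together with the same cluster options as in the examples above.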

working with ebextensions in aws

I am working with AWS Elastic Beanstalk, and since I can't modify the httpd conf file to set AllowOverride All, it was suggested that I work with ebextensions:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers.html
Hence, I have created an .ebextensions folder, and within it a setup.config file with the following command:
container_commands:
  01_setup_apache:
    command: "cp .ebextensions/enable_mod_rewrite.conf /etc/httpd/conf.d/enable_mod_rewrite.conf"
I am not even sure if this is the proper command to enable mod rewrite, but I get the following error while trying to upload the instance:
[Instance: i-80bbbd77] Command failed on instance. Return code: 1 Output: cp: cannot stat '.ebextensions/enable_mod_rewrite.conf': No such file or directory. container_command 01_setup_apache in .ebextensions/setup.config failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
You can't copy from ".ebextensions/enable_mod_rewrite.conf" because that relative path will not be valid from the init script. Using absolute paths may work, but I'd suggest you fetch the file from S3 instead:
container_commands:
  01_setup_apache:
    command: "aws s3 cp s3://[my-ebextensions-bucket]/enable_mod_rewrite.conf /etc/httpd/conf.d/enable_mod_rewrite.conf"
But if you need complex changes to your instance, it may be a better option to run a docker container instead: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_docker.html