AWS CLI throws error when copying large files - amazon-s3

I'm trying to copy objects from one S3 bucket to another using the AWS CLI tool.
It works fine for small objects, but for buckets containing large files, as soon as the copy starts I get one of the following errors:
copy failed: s3://bucket/file.ogv to s3://bucket-tmp/file.ogv ('Connection aborted.', OSError(0, 'Error'))
or
copy failed: s3://bucket/file.ogv to s3://bucket-tmp/file.ogv An error occurred (NoSuchKey) when calling the UploadPartCopy operation: Unknown
If I include the --no-guess-mime-type option, I get
fatal error: ('Connection aborted.', OSError(0, 'Error'))
I tried --debug, but I didn't understand much of the debug output; I could see OSError(0, 'Error') again in the log, though.
Has anyone seen anything like this? In another answer (this one), people suggested a different tool, s3cmd, but I couldn't get it to work.
I'm trying to access Ceph on a corporate server with path-style URLs and an HTTPS endpoint.
My command:
aws --endpoint-url https://myendpoint.url s3 cp s3://mybucket s3://mybucket-tmp --recursive
Also, when I tried to configure s3cmd, I got ugly Python debug output with OSError: [Errno 0] Error in the middle.

I discovered that it works if I use the s3api command instead of the s3 command. Format of the working command:
aws --endpoint-url <my-endpoint-url> s3api copy-object --copy-source my-source-bucket/whatever/path/file.txt --key whatever/path/file.txt --bucket my-destination-bucket
It only copies one object at a time. You can grab a list of the objects in the bucket with the s3 ls command or the s3api list-objects command and loop over it.
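A rough sketch of such a loop (assuming the placeholder endpoint and bucket names from above; keys containing whitespace would need extra quoting care):
aws --endpoint-url <my-endpoint-url> s3api list-objects --bucket my-source-bucket --query 'Contents[].Key' --output text | tr '\t' '\n' | while read -r key; do
  # server-side copy of each key into the destination bucket, keeping the same path
  aws --endpoint-url <my-endpoint-url> s3api copy-object --copy-source "my-source-bucket/$key" --key "$key" --bucket my-destination-bucket
done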

Related

Gitlab CI/CD error: nested array of strings up to 10 levels deep

I am writing a GitLab CI/CD job to enable encryption on an S3 bucket. I am following the official documentation link from AWS, but when I run it in the GitLab CI/CD pipeline I get this error from the editor:
This GitLab CI configuration is invalid: jobs:onestage:script config should be a string or a nested array of strings up to 10 levels deep.
The error line is as follows:
aws s3api put-bucket-encryption --bucket bucket-name --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'
Thanks @Joachim-isaksson for your help; it indeed helped me solve this error. Here is the code I used to fix it:
'aws s3api put-bucket-encryption --bucket my-bucket --server-side-encryption-configuration "{\"Rules\": [{\"ApplyServerSideEncryptionByDefault\": {\"SSEAlgorithm\": \"AES256\"}}]}"'
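The fix is to wrap the whole command in single quotes so the YAML parser treats it as a single string, and to escape the inner double quotes. A minimal sketch of how that might look in .gitlab-ci.yml (the job name is taken from the error message; everything else about the job is assumed):
onestage:
  script:
    - 'aws s3api put-bucket-encryption --bucket my-bucket --server-side-encryption-configuration "{\"Rules\": [{\"ApplyServerSideEncryptionByDefault\": {\"SSEAlgorithm\": \"AES256\"}}]}"'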

Trying to restore glacier deep archive to different s3 bucket

I am trying to restore objects from Glacier Deep Archive to a different S3 bucket, but when I run the command below I get this error: fatal error: An error occurred (404) when calling the HeadObject operation: Key "cf-ant-prod" does not exist
aws s3 cp s3://xxxxxxx/cf-ant-prod s3://xxxxxxx/atest --force-glacier-transfer --storage-class STANDARD --profile xxx

Move files in S3 bucket to folder based on file name pattern

I have an S3 bucket with a few thousand files where the file names always match the pattern {hostname}.{contenttype}.{yyyyMMddHH}.zip. I want to create a script that will run once a day to move these files into folders based on the year and month in the file name.
If I try the following aws-cli command
aws s3 mv s3://mybucket/*.202001* s3://mybucket/202001/
I get the following error:
fatal error: An error occurred (404) when calling the HeadObject operation: Key "*.202001*" does not exist
Is there an aws-cli command that I could run on a schedule to achieve this?
I think the way forward would be through the --exclude/--include filter parameters used in S3 CLI commands.
So, for your case,
aws s3 mv s3://mybucket/ s3://mybucket/202001/ --recursive --exclude "*" --include "*.202001*"
should probably do the trick.
For scheduling the CLI command to run daily, I think you can refer to On AWS, run an AWS CLI command daily
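If you want to handle every month in a single pass rather than one command per month, a rough sketch (assuming bash, the {hostname}.{contenttype}.{yyyyMMddHH}.zip naming, and that the files still sit at the bucket root) could be:
aws s3api list-objects --bucket mybucket --query 'Contents[].Key' --output text | tr '\t' '\n' | while read -r key; do
  ts=$(echo "$key" | awk -F. '{print $(NF-1)}')    # the yyyyMMddHH part of the file name
  aws s3 mv "s3://mybucket/$key" "s3://mybucket/${ts:0:6}/$key"    # move under a yyyyMM prefix
done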

aws-cli fails to work with one particular S3 bucket on one particular machine

I'm trying to remove the objects (empty the bucket) and then copy new ones into an AWS S3 bucket:
aws s3 rm s3://BUCKET_NAME --region us-east-2 --recursive
aws s3 cp ./ s3://BUCKET_NAME/ --region us-east-2 --recursive
The first command fails with the following error:
An error occurred (InvalidRequest) when calling the ListObjects
operation: You are attempting to operate on a bucket in a region that
requires Signature Version 4. You can fix this issue by explicitly
providing the correct region location using the --region argument, the
AWS_DEFAULT_REGION environment variable, or the region variable in the
AWS CLI configuration file. You can get the bucket's location by
running "aws s3api get-bucket-location --bucket BUCKET". Completed 1
part(s) with ... file(s) remaining
Well, the error message is self-explanatory, but the problem is that I've already applied the solution (I've added the --region argument) and I'm completely sure it is the correct region (I got the region the same way the error message suggests).
Now, to make things even more interesting, the error happens in a GitLab CI environment (let's just say some server). But just before this error occurs, the exact same command runs successfully against other buckets. It's worth mentioning that those other buckets are in different regions.
To top it all off, I can execute the command on my personal computer with the same credentials as on the CI server! So to summarize:
server$ aws s3 rm s3://OTHER_BUCKET --region us-west-2 --recursive <== works
server$ aws s3 rm s3://BUCKET_NAME --region us-east-2 --recursive <== fails
my_pc$ aws s3 rm s3://BUCKET_NAME --region us-east-2 --recursive <== works
Does anyone have any pointers as to what the problem might be?
For anyone else facing the same problem: make sure your AWS CLI is up to date!
server$ aws --version
aws-cli/1.10.52 Python/2.7.14 Linux/4.13.9-coreos botocore/1.4.42
my_pc$ aws --version
aws-cli/1.14.58 Python/3.6.5 Linux/4.13.0-38-generic botocore/1.9.11
Once I updated the server's aws cli tool, everything worked. Now my server is:
server$ aws --version
aws-cli/1.14.49 Python/2.7.14 Linux/4.13.5-coreos-r2 botocore/1.9.2
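A minimal sketch of the upgrade itself, assuming the CLI was installed with pip (which the Python-based version strings above suggest):
pip install --upgrade awscli
aws --version    # confirm the new version is picked up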

AWS EMR --steps

I am running the following .sh to run a command on AWS using EMR:
aws emr create-cluster --name "Big Matrix Re Run 5" --ami-version 3.1.0 --auto-terminate --log-uri FILE LOCATION --enable-debugging --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=c3.xlarge InstanceGroupType=CORE,InstanceCount=3,InstanceType=c3.xlarge --steps NAME AND LOCATION OF FILE
I've deleted the pertinent file name and locations as those aren't my issue, but I am having an issue with the --steps portion of the script.
How do I specify the steps that I want to run in the cluster? The documentation doesn't give any examples.
Here is the error:
Error parsing parameter '--steps': should be: Key value pairs, where values are separated by commas, and multiple pairs are separated by spaces.
--steps Name=string1,Jar=string1,ActionOnFailure=string1,MainClass=string1,Type=string1,Properties=string1,Args=string1,string2 Name=string1,Jar=string1,ActionOnFailure=string1,MainClass=string1,Type=string1,Properties=string1,Args=string1,string2
Thanks!
The documentation page for the AWS Command-Line Interface create-cluster command shows examples for using the --steps parameter.
Steps can be supplied inline on the command line, or loaded from a local JSON file via the file:// prefix; the step definitions themselves can reference scripts stored in Amazon S3.
From a local JSON file:
aws emr create-cluster --steps file://./multiplefiles.json --ami-version 3.3.1 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --auto-terminate
Inline, referencing a Hive script stored in Amazon S3:
aws emr create-cluster --steps Type=HIVE,Name='Hive program',ActionOnFailure=CONTINUE,ActionOnFailure=TERMINATE_CLUSTER,Args=[-f,s3://elasticmapreduce/samples/hive-ads/libs/model-build.q,-d,INPUT=s3://elasticmapreduce/samples/hive-ads/tables,-d,OUTPUT=s3://mybucket/hive-ads/output/2014-04-18/11-07-32,-d,LIBS=s3://elasticmapreduce/samples/hive-ads/libs] --applications Name=Hive --ami-version 3.1.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge
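For the file-based form, a rough sketch of what a steps file such as multiplefiles.json might contain (the step name, jar location, and arguments below are placeholders, not values from the question; the keys match those listed in the error message):
[
  {
    "Name": "Example custom JAR step",
    "Type": "CUSTOM_JAR",
    "ActionOnFailure": "CONTINUE",
    "Jar": "s3://mybucket/mycode.jar",
    "Args": ["arg1", "arg2"]
  }
]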