I'm trying to copy an S3 object with boto3, like below:
import boto3
client = boto3.client('s3')
client.copy_object(Bucket=bucket_name, ContentEncoding='gzip', CopySource=copy_source, Key=new_key)
The copy succeeds, but the ContentEncoding metadata is not added to the new object.
When I add the Content-Encoding metadata through the console there is no problem, but I can't get the boto3 copy to set it.
Here is the documentation link for client.copy_object():
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.copy_object
These are the versions in my environment:
python=2.7.16
boto3=1.0.28
botocore=1.13.50
Thank you in advance.
Try adding MetadataDirective='REPLACE' to your copy_object call. Without it, copy_object defaults to the COPY directive, which carries over the source object's metadata and ignores the new ContentEncoding you pass:
client.copy_object(Bucket=bucket_name, ContentEncoding='gzip', CopySource=copy_source, Key=new_key, MetadataDirective='REPLACE')
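Note that MetadataDirective='REPLACE' keeps only the metadata you pass on the call, so if the source object has user metadata you want to preserve, read it first and pass it along. A minimal sketch, assuming bucket_name and new_key are defined as in the question and copy_source is the usual {'Bucket': ..., 'Key': ...} dict:
import boto3

client = boto3.client('s3')

# Fetch the source object's current user metadata so REPLACE doesn't drop it.
head = client.head_object(Bucket=copy_source['Bucket'], Key=copy_source['Key'])

client.copy_object(
    Bucket=bucket_name,
    Key=new_key,
    CopySource=copy_source,
    ContentEncoding='gzip',
    Metadata=head.get('Metadata', {}),
    MetadataDirective='REPLACE',
)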
I am using the boto3 API to update the S3 metadata on an object.
I am making use of How to update metadata of an existing object in AWS S3 using python boto3?
My code looks like this:
s3_object = s3.Object(bucket,key)
new_metadata = {'foo':'bar'}
s3_object.metadata.update(new_metadata)
s3_object.copy_from(CopySource={'Bucket':bucket,'Key':key}, Metadata=s3_object.metadata, MetadataDirective='REPLACE')
This code fails when the object is larger than 5GB. I get this error:
botocore.exceptions.ClientError: An error occurred (InvalidRequest) when calling the CopyObject operation: The specified copy source is larger than the maximum allowable size for a copy source: 5368709120
How does one update the metadata on an object larger than 5GB?
Due to the size of your object, you need to do a multipart copy: start a multipart upload and copy the source over in parts with MultipartUploadPart.copy_from (each copied part can be up to 5 GB). See the boto3 docs here for more information:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.MultipartUploadPart.copy_from
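A rough sketch of that approach using the equivalent client-level calls (create_multipart_upload, upload_part_copy, complete_multipart_upload); the bucket, key, new_metadata and part size below are placeholders, and error handling (e.g. aborting the upload on failure) is left out:
import boto3

s3 = boto3.client('s3')

bucket = 'my-bucket'            # placeholder
key = 'my-large-object'         # placeholder
new_metadata = {'foo': 'bar'}   # the metadata you want on the object

size = s3.head_object(Bucket=bucket, Key=key)['ContentLength']
part_size = 500 * 1024 * 1024   # each copied part must be between 5 MB and 5 GB

# Start a multipart upload that will overwrite the object with the new metadata.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key, Metadata=new_metadata)

parts = []
for part_number, offset in enumerate(range(0, size, part_size), start=1):
    last_byte = min(offset + part_size, size) - 1
    resp = s3.upload_part_copy(
        Bucket=bucket, Key=key, UploadId=mpu['UploadId'], PartNumber=part_number,
        CopySource={'Bucket': bucket, 'Key': key},
        CopySourceRange='bytes={}-{}'.format(offset, last_byte))
    parts.append({'ETag': resp['CopyPartResult']['ETag'], 'PartNumber': part_number})

s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=mpu['UploadId'],
    MultipartUpload={'Parts': parts})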
Apparently, you can't just update the metadata in place - you need to re-copy the object within S3. You can copy it from S3 back to S3, but that's annoying for objects in the 100-500GB range.
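If you would rather not manage the parts yourself, the client's managed copy() switches to a multipart copy automatically for large objects and accepts extra copy arguments; a sketch, assuming bucket and key hold the object's location and that your boto3/s3transfer version passes Metadata through on multipart copies:
import boto3

s3 = boto3.client('s3')

s3.copy(
    CopySource={'Bucket': bucket, 'Key': key},
    Bucket=bucket,
    Key=key,
    ExtraArgs={'Metadata': {'foo': 'bar'}, 'MetadataDirective': 'REPLACE'})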
Code:
import boto3
s3_cli = boto3.client('s3')
object_summary = s3_cli.head_object(
Bucket='test-cf',
Key='RDS.template',
VersionId='szA3ws4bH6k.rDXOEAchlh1x3OgthNEB'
)
print('LastModified: {}'.format(object_summary.get('LastModified')))
print('StorageClass: {}'.format(object_summary.get('StorageClass')))
print('Metadata: {}'.format(object_summary.get('Metadata')))
print('ContentLength(KB): {}'.format(object_summary.get('ContentLength')/1024))
Output:
LastModified: 2017-06-08 09:22:43+00:00
StorageClass: None
Metadata: {}
ContentLength(KB): 15
I am unable to get the StorageClass of the key using the boto3 SDK, even though I can see the storage class set to STANDARD in the AWS console. I have also tried the s3.ObjectSummary and s3.ObjectVersion methods on the boto3 S3 resource, but they also returned None.
Not sure why head_object is returning None here; it may be because S3 only returns the x-amz-storage-class header for objects that are not in the STANDARD storage class, so STANDARD objects come back without it. Meanwhile, use the following code to get the storage class from a bucket listing; I will also check which boto3 version I am on.
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('test-cf')
for obj in bucket.objects.all():
    print('{} {}'.format(obj.key, obj.storage_class))
I am trying to upload content to Amazon S3 but I am getting this error:
boto3.exceptions.UnknownAPIVersionError: The 's3' resource does not an API
Valid API versions are: 2006-03-01
import boto3
boto3.resource('s3',**AWS_ACCESS_KEY_ID**,**AWS_PRIVATE_KEY**)
bucket = s3.Bucket( **NAME OF BUCKET**)
obj = bucket.Object(**KEY**)
obj.upload_fileobj(**FILE OBJECT**)
The error is caused by the DataNotFound exception raised inside boto3.Session (see the boto3.Session source code): because the access key and secret key are passed as positional arguments, boto3.resource() treats them as region_name and api_version, and then fails to find that nonexistent API version. Credentials must be passed as keyword arguments. Following the boto3 documentation, this is the correct way to upload the data:
import boto3

# Pass credentials as keyword arguments (or let boto3 pick them up from your
# environment / AWS config).
s3 = boto3.resource('s3', aws_access_key_id=AWS_ACCESS_KEY_ID,
                    aws_secret_access_key=AWS_PRIVATE_KEY)
bucket = s3.Bucket(NAME_OF_BUCKET)
obj = bucket.Object("prefix/object_key_name")

# You must pass a file object opened in binary mode!
with open('filename', 'rb') as fileobject:
    obj.upload_fileobj(fileobject)
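If the data is simply a local file, it may be easier to let boto3 open it for you; a small sketch using the same bucket object as above (the filename and key are placeholders):
# upload_file takes a local filename and opens/streams it itself.
bucket.upload_file('filename', 'prefix/object_key_name')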
I'm trying to test my Luigi pipelines inside a vagrant machine using FakeS3 to simulate my S3 endpoints. For boto to be able to interact with FakeS3 the connection must be setup with the OrdinaryCallingFormat as in:
from boto.s3.connection import S3Connection, OrdinaryCallingFormat
conn = S3Connection('XXX', 'XXX', is_secure=False,
port=4567, host='localhost',
calling_format=OrdinaryCallingFormat())
but when using Luigi this connection is buried in the s3 module. I was able to pass most of the options by modifying my luigi.cfg and adding an s3 section as in
[s3]
host=127.0.0.1
port=4567
aws_access_key_id=XXX
aws_secret_access_key=XXXXXX
is_secure=0
but I don't know how to pass the required object for the calling_format.
Now I'm stuck and don't know how to proceed. Options I can think of:
Figure out how to pass the OrdinaryCallingFormat to S3Connection through luigi.cfg
Figure out how to force boto to always use this calling format in my Vagrant machine, by setting some option (unknown to me) in either .aws/config or boto.cfg
Make FakeS3 accept the default calling_format used by boto, which happens to be SubdomainCallingFormat (whatever that means).
Any ideas about how to fix this?
Can you not pass it into the constructor as kwargs for the S3Client?
from boto.s3.connection import OrdinaryCallingFormat
from luigi.s3 import S3Client, S3Target  # luigi.contrib.s3 in newer Luigi versions

client = S3Client(aws_access_key, aws_secret_key,
                  calling_format=OrdinaryCallingFormat())
target = S3Target('s3://somebucket/test', client=client)
I did not encounter any problem when using boto3 to connect to FakeS3.
import boto3
s3 = boto3.client(
"s3", region_name="fakes3",
use_ssl=False,
aws_access_key_id="",
aws_secret_access_key="",
endpoint_url="http://localhost:4567"
)
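To sanity-check the client above against the fake endpoint, a quick round trip (the bucket name is a placeholder):
s3.create_bucket(Bucket='somebucket')
s3.put_object(Bucket='somebucket', Key='test', Body=b'hello')
print(s3.get_object(Bucket='somebucket', Key='test')['Body'].read())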
No special calling format is required.
Perhaps I am wrong that you really need OrdinaryCallingFormat; if my code doesn't work, please go through the GitHub issue on boto3 support:
https://github.com/boto/boto3/issues/334
You can set it with the calling_format parameter. Here is a configuration example for fake-s3:
[s3]
aws_access_key_id=123
aws_secret_access_key=abc
host=fake-s3
port=4569
is_secure=0
calling_format=boto.s3.connection.OrdinaryCallingFormat
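Assuming Luigi resolves the calling_format string from that section as described, targets can then be used without passing an explicit client; a small sketch (the bucket and key are placeholders):
from luigi.s3 import S3Target  # luigi.contrib.s3 in newer Luigi versions

# S3Client picks up host, port, is_secure and calling_format from the [s3]
# section of luigi.cfg, so no client needs to be passed here.
target = S3Target('s3://somebucket/test')
with target.open('w') as f:
    f.write('hello fake-s3\n')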
I have about 1000 objects in S3 which are named like
abcyearmonthday1
abcyearmonthday2
abcyearmonthday3
...
and I want to rename them to
abc/year/month/day/1
abc/year/month/day/2
abc/year/month/day/3
How could I do this through boto3? Is there an easier way of doing it?
As explained in Boto3/S3: Renaming an object using copy_object, you cannot rename an object in S3; you have to copy the object under a new name and then delete the old object:
import boto3

s3 = boto3.resource('s3')
s3.Object('my_bucket', 'my_file_new').copy_from(CopySource='my_bucket/my_file_old')
s3.Object('my_bucket', 'my_file_old').delete()
There is no direct way to rename an S3 object. Two steps are needed (see the sketch after this list):
Copy the S3 object to the same location under the new name.
Then delete the old object.
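For roughly 1000 objects, those two steps can be looped over a prefix listing; a sketch, where the bucket name is a placeholder and the way the new key is derived from the old one depends on how year/month/day are actually encoded in your key names:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my_bucket')  # placeholder bucket name

for obj in bucket.objects.filter(Prefix='abc'):
    old_key = obj.key
    # Placeholder parsing: adjust the slices to match your real key format.
    new_key = 'abc/{}/{}/{}/{}'.format(
        old_key[3:7], old_key[7:9], old_key[9:11], old_key[11:])
    bucket.Object(new_key).copy_from(
        CopySource={'Bucket': bucket.name, 'Key': old_key})
    bucket.Object(old_key).delete()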
I had the same problem (in my case I wanted to rename files generated in S3 by the Redshift UNLOAD command). I solved it by creating a boto3 session and then copying and deleting the files one by one, like this:
import boto3

s3_session = boto3.session.Session(
    aws_access_key_id=my_access_key_id,
    aws_secret_access_key=my_secret_access_key).resource('s3')

# Save in a list the tuples of filenames (with prefix):
# [(old_s3_file_path, new_s3_file_path), ...]
# e.g. ('prefix/old_filename.csv000', 'prefix/new_filename.csv')
s3_files_to_rename = []
s3_files_to_rename.append((old_file, new_file))

for old_file, new_file in s3_files_to_rename:
    s3_session.Object(s3_bucket_name, new_file).copy_from(
        CopySource=s3_bucket_name + '/' + old_file)
    s3_session.Object(s3_bucket_name, old_file).delete()