Create an S3 presigned URL (PUT) with custom headers with boto3

My current code looks like this:
import boto3

s3 = boto3.client('s3')
presigned_url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': bucket_name, 'Key': object_key},
    ExpiresIn=3600,
    HttpMethod='PUT'
)
This is working, but I want to include custom headers like x-amz-meta-my-custom-meta-data. I'm pretty sure S3 supports this, so how can I do this with boto3?
It's not clear from the documentation.
Using Python 3.6

It is a no: this is still classified as a feature request as of Oct 2017.
https://github.com/boto/boto3/issues/1294
Hope it helps.

Send it as metadata:
s3 = boto3.client('s3')
presigned_url = s3.generate_presigned_url(
    'put_object',
    Params={
        'Bucket': bucket_name,
        'Key': object_key,
        'Metadata': {'mechaGodzilla': 'anything is possible'},
    },
    ExpiresIn=3600,
    HttpMethod='PUT'
)
In your upload request you must then include the header x-amz-meta-mechagodzilla: anything is possible (header names are case-insensitive, but the value must match exactly, since the metadata header is part of the signed headers).
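For example, the upload could then be done with the Python requests library; a minimal sketch, where file_path is a placeholder for the file to upload:
import requests

# The x-amz-meta-* header is part of the signed headers, so it must be
# sent with exactly the value that was passed in Metadata above.
with open(file_path, 'rb') as f:
    resp = requests.put(
        presigned_url,
        data=f,
        headers={'x-amz-meta-mechagodzilla': 'anything is possible'},
    )
resp.raise_for_status()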

Read CSV from S3 and upload to external API as multipart

I want to read a CSV file from an S3 bucket using boto3 and upload it to an external API with a multipart/form-data request.
So far I am able to read the CSV:
response = s3.get_object(Bucket=bucket, Key=key)
body = response['Body']
I'm not sure how to convert this body into a multipart request. The external API accepts requests as multipart/form-data.
Any suggestions would be helpful.
The following solved my issue:
from requests_toolbelt import MultipartEncoder

body = response['Body'].read()
multipart_data = MultipartEncoder(
    fields={
        'file': (file_name, body, 'application/vnd.ms-excel'),
        'field01': 'test',
    }
)
The .read() call returns the object's contents as bytes, which MultipartEncoder can send as the file part.
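To complete the upload, the encoded body can be POSTed with requests; a minimal sketch, assuming api_url is the external endpoint:
import requests

# MultipartEncoder generates the multipart boundary, so its content_type
# must be forwarded as the Content-Type header.
resp = requests.post(
    api_url,
    data=multipart_data,
    headers={'Content-Type': multipart_data.content_type},
)
resp.raise_for_status()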

Boto3 generate presigned url does not work

Here is the code I use to create an S3 client and generate a presigned URL; it is fairly standard and has been running on the server for quite a while. I pulled it out and ran it locally in a Jupyter notebook:
import os
import boto3

def get_s3_client():
    return get_s3(create_session=False)

def get_s3(create_session=False):
    session = boto3.session.Session() if create_session else boto3
    S3_ENDPOINT = os.environ.get('AWS_S3_ENDPOINT')
    if S3_ENDPOINT:
        AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
        AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']
        AWS_DEFAULT_REGION = os.environ['AWS_DEFAULT_REGION']
        s3 = session.client('s3',
                            endpoint_url=S3_ENDPOINT,
                            aws_access_key_id=AWS_ACCESS_KEY_ID,
                            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                            region_name=AWS_DEFAULT_REGION)
    else:
        s3 = session.client('s3', region_name='us-east-2')
    return s3
s3 = get_s3_client()
BUCKET = [my-bucket-name]
OBJECT_KEY = [my-object-name]
signed_url = s3.generate_presigned_url(
    'get_object',
    ExpiresIn=3600,
    Params={
        'Bucket': BUCKET,
        'Key': OBJECT_KEY,
    }
)
print(signed_url)
When I tried to download the file using the URL in the browser, I got an error message saying "The specified key does not exist." I noticed in the error message that my object key had become "[my-bucket-name]/[my-object-name]" rather than just "[my-object-name]".
Then I used the same bucket/key combination to generate a presigned URL with the AWS CLI, which worked as expected. Somehow the boto3 client method had inserted [my-bucket-name] in front of [my-object-name], compared to the AWS CLI result. Here are the results.
From s3.generate_presigned_url()
https://[my-bucket-name].s3.us-east-2.amazonaws.com/[my-bucket-name]/[my-object-name]?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAV17K253JHUDLKKHB%2F20210520%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20210520T175014Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=5cdcc38e5933e92b5xed07b58e421e5418c16942cb9ac6ac6429ac65c9f87d64
From aws cli s3 presign
https://[my-bucket-name].s3.us-east-2.amazonaws.com/[my-object-name]?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAYA7K15LJHUDAVKHB%2F20210520%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20210520T155926Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=58208f91985bf3ce72ccf884ba804af30151d158d6ba410dd8fe9d2457369894
I've been working on this and searching for solutions for a day and a half, and I couldn't find out what was wrong with my implementation. I guess I ignored some basic but important setting when creating an S3 client with boto3, or something else. Thanks for the help!
OK, mystery solved: I shouldn't provide the endpoint_url=S3_ENDPOINT param when I create the s3 client; boto3 will figure the endpoint out on its own. Presumably the configured endpoint already pointed at the bucket's virtual-hosted hostname, so boto3 appended the bucket name to the path a second time. After I removed it, everything works as expected.
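A minimal sketch of the corrected client creation, assuming the default endpoint resolution is acceptable:
import boto3

# Let boto3 derive the endpoint from the region; passing endpoint_url
# here was what caused the bucket name to be duplicated in the URL path.
s3 = boto3.client('s3', region_name='us-east-2')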

Set GitHub default branch through API call

I need to create a dev branch, separate from the master branch, and set dev as the default branch, using the GitHub API.
Please share details if anyone knows which API to call or a way to do it programmatically. I know it can be done through the web UI, but I am looking for a solution that does not involve manual intervention.
I don't have enough reputation to reply to Adam's answer above, but the problem is that name is a required field. The JSON should actually be:
PATCH /repos/:owner/:repo
{
  "name": ":repo",
  "default_branch": "dev"
}
Following the guide here: https://developer.github.com/v3/repos/#edit , the default_branch input should do what you want:
default_branch (string): Updates the default branch for this repository.
So, you should submit a PATCH request like:
PATCH /repos/:owner/:repo
{"default_branch": "dev"}
You can use the requests library:
import json
import requests

access_token = "your_access_token"
headers = {'Authorization': f'token {access_token}',
           'Content-Type': 'application/json'}
data = {"name": "knowledge-engine", "default_branch": "development"}
owner = "username"
repo_name = "repo_name"
url = f"https://api.github.com/repos/{owner}/{repo_name}"
requests.patch(url, data=json.dumps(data), headers=headers)
<Response [200]>
Docs:
https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line
http://docs.python-requests.org/en/master/
https://developer.github.com/v3/repos/#edit
The easiest way to update the default branch, if you have the GitHub CLI:
gh api repos/{owner}/{repo} --method PATCH --field 'default_branch=dev'
Note that the CLI will fill in {owner} and {repo} for you if you run it inside a locally checked-out repository.
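The answers above only change the default branch; to first create dev via the API, you can create a ref pointing at master's current head commit. A sketch using the Git references endpoints, reusing access_token, owner, and repo_name from the snippet above:
import requests

headers = {'Authorization': f'token {access_token}'}
base = f"https://api.github.com/repos/{owner}/{repo_name}"

# Look up the commit SHA that master currently points to.
master = requests.get(f"{base}/git/ref/heads/master", headers=headers).json()
sha = master['object']['sha']

# Create refs/heads/dev at that commit.
requests.post(f"{base}/git/refs", headers=headers,
              json={'ref': 'refs/heads/dev', 'sha': sha})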

Setting metadata on S3 multipart upload

I'd like to upload a file to S3 in parts and set some metadata on the file. I'm using boto to interact with S3. I'm able to set metadata with single-operation uploads.
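In classic boto, setting metadata on a single-operation upload looks roughly like this sketch (bucket, key, and file names are placeholders):
import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')
key = bucket.new_key('mykey')
key.set_metadata('some-key', 'value')  # must be set before the upload
key.set_contents_from_filename('local-file.txt')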
Is there a way to set metadata with a multipart upload? I've tried this method of copying the key to change the metadata, but it fails with the error: InvalidRequest: The specified copy source is larger than the maximum allowable size for a copy source: <size>
I've also tried doing the following:
key = bucket.create_key(key_name)
key.set_metadata('some-key', 'value')
<multipart upload>
...but the multipart upload overwrites the metadata.
I'm using code similar to this to do the multipart upload.
Sorry, I just found the answer:
Per the docs:
If you want to provide any metadata describing the object being uploaded, you must provide it in the request to initiate multipart upload.
So in boto, the metadata can be set in the initiate_multipart_upload call via its metadata parameter.
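A minimal sketch with classic boto, where part_file stands in for a file-like object holding one part's data:
import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')

# Metadata must be supplied when the multipart upload is initiated;
# it cannot be added afterwards without copying the object.
mp = bucket.initiate_multipart_upload('mykey', metadata={'some-key': 'value'})
mp.upload_part_from_file(part_file, part_num=1)
mp.complete_upload()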
I faced this issue earlier today and discovered that there is no clear information on how to do it correctly.
The code example showing how we solved the issue is below (PHP, AWS SDK v3):
$uploader = new MultipartUploader($client, $source, [
    'bucket' => $bucketName,
    'key' => $filename,
    'before_initiate' => function (\Aws\Command $command) {
        $command['ContentType'] = 'application/octet-stream';
        $command['ContentDisposition'] = 'attachment';
    },
]);
Unfortunately, the documentation at https://docs.aws.amazon.com/aws-sdk-php/v3/guide/service/s3-multipart-upload.html#customizing-a-multipart-upload doesn't make it clear that if you'd like to provide alternative metadata with a multipart upload, you have to go this way.
I hope that helps.

boto - more concise way to get key's value from bucket?

I'm trying to figure out a concise way to get data from S3 via boto.
My current code looks like this; s3_manager is simply a class that does all the S3 setup for my app.
log.debug("generating downloader")
downloader = s3_manager()
log.debug("accessing bucket")
bucket_archive = downloader.s3_buckets['#archive']
log.debug("getting key")
key = bucket_archive.get_key(archive_filename)
log.debug("getting key into string")
source = key.get_contents_as_string()
The problem is that, looking at my debug logs, I'm making two requests to Amazon S3:
key = bucket_archive.get_key(archive_filename)
source = key.get_contents_as_string()
Looking at the docs [ http://boto.readthedocs.org/en/latest/ref/s3.html ], it seems that the call to get_key checks whether the key exists, while the second call gets the actual data. Does anyone know of a method to do both at once? A more concise way of doing this with one request is preferable for our app.
The get_key() method performs a HEAD request on the object to verify that it exists. If you are certain that the bucket and key exist and would prefer not to have the overhead of a HEAD request, you can simply create a Key object directly. Something like this would work:
import boto
s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket', validate=False)
key = bucket.new_key('myexistingkey')
contents = key.get_contents_as_string()
The validate=False on the call to get_bucket eliminates a GET request that also is intended to validate that the bucket exists.
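For comparison, boto3 makes the single-request pattern the default; a sketch:
import boto3

s3 = boto3.client('s3')
# get_object issues one GET request; a missing key raises
# s3.exceptions.NoSuchKey instead of triggering a prior HEAD.
contents = s3.get_object(Bucket='mybucket', Key='myexistingkey')['Body'].read()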