S3 Python client with boto3 SDK - amazon-s3

I'd like to make a Python S3 client to store data in the S3 Dynamic Storage service provided by the appcloud. I've discovered the boto3 SDK for Python and was wondering how it works on the appcloud. Locally you install the AWS CLI to configure your credentials, but how do you do that on the cloud? Does someone have experience with creating an S3 Python client for the internal appcloud and could provide me with a short example (boto3 or a different approach)?
Greetings
Edit 1:
Tried this:
import boto3
s3 = boto3.client('s3', endpoint_url='https://ds31s3.swisscom.com/', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET)
s3.create_bucket(Bucket="sc-testbucket1234")
But I got this exception:
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://ds31s3.swisscom.com"

import boto3
conn = boto3.resource('s3',
                      region_name='eu-west-1',
                      endpoint_url='https://x',
                      aws_access_key_id='xx',
                      aws_secret_access_key='xx')
conn.create_bucket(Bucket="bucketname")

Works with this configuration (with python 3.5):
import boto3
conn = boto3.resource('s3', region_name='eu-west-1', endpoint_url=HOST, aws_access_key_id=KEY, aws_secret_access_key=SECRET)
conn.create_bucket(Bucket="pqdjmalsdnf12098")
Thanks to #user3080315
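For completeness, a minimal sketch of storing an object in the freshly created bucket with the same resource (the key name and payload are just placeholders):

obj = conn.Object("pqdjmalsdnf12098", "example.txt")
obj.put(Body=b"hello appcloud")  # upload a small object into the new bucket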

Related

How to configure my credentials s3 in heroku [duplicate]

On boto I used to specify my credentials when connecting to S3 in such a way:
import boto
from boto.s3.connection import Key, S3Connection
S3 = S3Connection( settings.AWS_SERVER_PUBLIC_KEY, settings.AWS_SERVER_SECRET_KEY )
I could then use S3 to perform my operations (in my case deleting an object from a bucket).
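For reference, the boto 2.x delete looked roughly like this (bucket and key names are placeholders):

bucket = S3.get_bucket('my-bucket')
bucket.delete_key('path/to/my/key')  # remove the object from the bucket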
With boto3 all the examples I found are such:
import boto3
S3 = boto3.resource( 's3' )
S3.Object( bucket_name, key_name ).delete()
I couldn't specify my credentials, and thus all attempts fail with an InvalidAccessKeyId error.
How can I specify credentials with boto3?
You can create a session:
import boto3
session = boto3.Session(
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
)
Then use that session to get an S3 resource:
s3 = session.resource('s3')
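From there the delete from the original question works the same way:

s3.Object(bucket_name, key_name).delete()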
You can also create a client with a new session directly, like below:
s3_client = boto3.client('s3',
                         aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
                         aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
                         region_name=REGION_NAME)
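As a usage note, the client-side equivalent of the delete from the question would be along these lines (same placeholder bucket and key names):

s3_client.delete_object(Bucket=bucket_name, Key=key_name)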
This is older, but placing it here for my reference too. boto3.resource just uses the default session, so you can pass the same session details through boto3.resource.
Help on function resource in module boto3:
resource(*args, **kwargs)
Create a resource service client by name using the default session.
See :py:meth:`boto3.session.Session.resource`.
https://github.com/boto/boto3/blob/86392b5ca26da57ce6a776365a52d3cab8487d60/boto3/session.py#L265
There you can see that it just takes the same arguments as boto3.Session:
import boto3
S3 = boto3.resource('s3', region_name='us-west-2', aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY, aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY)
S3.Object( bucket_name, key_name ).delete()
I'd like to expand on #JustAGuy's answer. The method I prefer is to use the AWS CLI to create a config file. The reason is that, with the config file, the CLI or the SDK will automatically look for credentials in the ~/.aws folder. And the good thing is that the AWS CLI is written in Python.
You can get the CLI from PyPI if you don't have it already. Here are the steps to get the CLI set up from the terminal:
$> pip install awscli  # can add the --user flag
$> aws configure
AWS Access Key ID [****************ABCD]:[enter your key here]
AWS Secret Access Key [****************xyz]:[enter your secret key here]
Default region name [us-west-2]:[enter your region here]
Default output format [None]:
After this you can access boto3 and any of the APIs without having to specify keys (unless you want to use different credentials).
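As an example of what that buys you, after aws configure a client or resource can be created with no explicit credentials at all:

import boto3

# Credentials and region are read automatically from ~/.aws/credentials and ~/.aws/config
s3 = boto3.resource('s3')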
If you rely on your ~/.aws/credentials file to store the id and key for a user, they will be picked up automatically.
For instance
session = boto3.Session(profile_name='dev')
s3 = session.resource('s3')
This will pick up the dev profile (user) if your credentials file contains the following:
[dev]
aws_access_key_id = AAABBBCCCDDDEEEFFFGG
aws_secret_access_key = FooFooFoo
region=ap-southeast-2
There are numerous ways to store credentials while still using boto3.resource().
I'm using the AWS CLI method myself. It works perfectly.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
You can set the default AWS environment variables for the secret and access keys; that way you don't need to change the default client creation code, though it is better to pass them as parameters if you have non-default credentials. A sketch of the environment-variable approach is below.
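A minimal sketch, assuming the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables are exported in the environment:

import boto3

# boto3 reads AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY (and optionally
# AWS_SESSION_TOKEN / AWS_DEFAULT_REGION) from the environment automatically,
# so no keyword arguments are needed here.
s3 = boto3.client('s3')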

How to integrate CEPH with Amazon-S3?

I'm trying to adapt the open-source project mmfashion on Amazon SageMaker, which requires the CEPH module as a backend. Unfortunately pip install ceph doesn't work. The only workaround was to build the ceph source code manually by running this in my container:
!git clone git://github.com/ceph/ceph
!git submodule update --init --recursive
This does allow me to import ceph successfully. But it throws the following error when it comes to fetching data from Amazon S3:
AttributeError: module 'ceph' has no attribute 'S3Client'
Has someone integrated CEPH with Amazon S3 Bucket or has suggestions in the same line on how to tackle this?
You can use the Ceph S3 API to connect to AWS buckets. Here is a simple Python example script to connect to any S3 API:
import boto
import boto.s3.connection

access_key = 'put your access key here!'
secret_key = 'put your secret key here!'

conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host='objects.dreamhost.com',
    # is_secure=False,  # uncomment if you are not using ssl
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
Then you will be able to list the buckets:
for bucket in conn.get_all_buckets():
    print("{name}\t{created}".format(
        name=bucket.name,
        created=bucket.creation_date,
    ))

creating boto3 s3 client on Airflow with an s3 connection and s3 hook

I am trying to move my python code to Airflow. I have the following code snippet:
s3_client = boto3.client('s3',
                         region_name="us-west-2",
                         aws_access_key_id=aws_access_key_id,
                         aws_secret_access_key=aws_secret_access_key)
I am trying to recreate this s3_client using Airflow's S3 hook and S3 connection, but I can't find a way to do it in any documentation without specifying the aws_access_key_id and the aws_secret_access_key directly in code.
Any help would be appreciated
You need to define an AWS connection in Admin -> Connections or with the CLI (see docs).
Once the connection is defined you can use it in S3Hook.
Your connection object can be set as:
Conn Id: <your_choice_of_conn_id_name>
Conn Type: Amazon Web Services
Login: <aws_access_key>
Password: <aws_secret_key>
Extra: {"region_name": "us-west-2"}
In Airflow the hooks wrap a Python package, so if your code uses the hook there shouldn't be a reason to import boto3 directly. A rough sketch of using the hook with that connection is below.
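A minimal sketch, assuming a connection id of my_aws_conn defined as above (the import path varies between Airflow versions; recent provider packages use airflow.providers.amazon.aws.hooks.s3):

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

hook = S3Hook(aws_conn_id="my_aws_conn")

# Prefer the hook's own methods; "my-bucket" is just a placeholder name.
keys = hook.list_keys(bucket_name="my-bucket")

# If you still need the raw boto3 client, the hook can hand it to you.
s3_client = hook.get_conn()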

AWS migrate using import export

I want to migrate my VM from on prem to AWS.
I have exported my VM in OVA format and uploaded it to S3. I used the AWS CLI to import it as an image, but I got this error:
"StatusMessage": "FirstBootFailure: This import request failed because the instance failed to boot and establish network connectivity."
How can I solve this problem?
Thanks

Pyspark not using TemporaryAWSCredentialsProvider

I'm trying to read files from S3 using Pyspark using temporary session credentials but keep getting the error:
Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: null, AWS Request ID: XXXXXXXX, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: XXXXXXX
I think the issue might be that the S3A connection needs to use org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider in order to pull in the session token in addition to the standard access key and secret key, but even with setting the fs.s3a.aws.credentials.provider configuration variable, it is still attempting to authenticate with BasicAWSCredentialsProvider. Looking at the logs I see:
DEBUG AWSCredentialsProviderChain:105 - Loading credentials from BasicAWSCredentialsProvider
I've followed the directions here to add the necessary configuration values, but they do not seem to make any difference. Here is the code I'm using to set it up:
import os
import sys
import pyspark
from pyspark.sql import SQLContext
from pyspark.context import SparkContext
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk-pom:1.11.83,org.apache.hadoop:hadoop-aws:2.7.3 pyspark-shell'
sc = SparkContext()
sc.setLogLevel("DEBUG")
sc._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", os.environ.get("AWS_ACCESS_KEY_ID"))
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", os.environ.get("AWS_SECRET_ACCESS_KEY"))
sc._jsc.hadoopConfiguration().set("fs.s3a.session.token", os.environ.get("AWS_SESSION_TOKEN"))
sql_context = SQLContext(sc)
Why is TemporaryAWSCredentialsProvider not being used?
Which Hadoop version are you using?
S3A STS support was added in Hadoop 2.8.0, and this was the exact error message I got on Hadoop 2.7.
Wafle is right, it's 2.8+ only.
But you might be able to get away with setting the AWS_ environment variables and having the session secrets picked up that way; AWS environment variable support has long been in there, and I think it will pick up AWS_SESSION_TOKEN.
See AWS docs
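A minimal sketch of that fallback, assuming the three variables are exported before the SparkContext is created and the fs.s3a.* credential keys are simply left unset:

import os
from pyspark.context import SparkContext

# These would normally be exported in the shell before launching pyspark;
# they are set here only to make the assumption explicit.
os.environ["AWS_ACCESS_KEY_ID"] = "<access key>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret key>"
os.environ["AWS_SESSION_TOKEN"] = "<session token>"

sc = SparkContext()
# No fs.s3a.access.key / fs.s3a.secret.key / fs.s3a.session.token settings here;
# the idea is to let the AWS SDK's environment-variable support find them.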