URI Format for Creating an Airflow S3 Connection via Environment Variables

I've read the documentation for creating an Airflow Connection via an environment variable and am using Airflow v1.10.6 with Python3.5 on Debian9.
The linked documentation above shows an example S3 connection of s3://accesskey:secretkey@S3. From that, I defined the following environment variable:
AIRFLOW_CONN_AWS_S3=s3://<MY_ACCESS_KEY>:<MY_SECRET_ACCESS_KEY>@S3
And the following function:
import airflow.hooks.S3_hook

def download_file_from_S3_with_hook(key, bucket_name):
    """Get file contents from S3"""
    hook = airflow.hooks.S3_hook.S3Hook('aws_s3')
    obj = hook.get_key(key, bucket_name)
    contents = obj.get()['Body'].read().decode('utf-8')
    return contents
However, when I invoke that function I get the following error:
Using connection to: id: aws_s3.
Host: <MY_ACCESS_KEY>,
Port: None,
Schema: <MY_SECRET_ACCESS_KEY>,
Login: None,
Password: None,
extra: {}
ERROR - Unable to locate credentials
It appears that when I format the URI according to Airflow's documentation, it's setting the access key as the host and the secret access key as the schema.
It's clearly reading the environment variable, since it has the correct conn_id. It also has the correct values for my access key and secret; it's just parsing them into the wrong fields.
When I set the connection in the UI, the function works if I set Login to my access key and Password to my token. So how am I formatting my environment variable URI incorrectly?

Found the issue: s3://accesskey:secretkey@S3 is the correct format. The problem was that my aws_secret_access_key had a special character in it and had to be URL-encoded. That fixed everything.
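For reference, a minimal sketch of building such a URI with both parts percent-encoded (the key values here are placeholders) might look like this:
from urllib.parse import quote_plus

access_key = "<MY_ACCESS_KEY>"          # placeholder
secret_key = "<MY_SECRET_ACCESS_KEY>"   # placeholder; may contain characters like '/' or '+'

# Percent-encode both parts so the URI parser splits on the intended separators
conn_uri = "s3://{}:{}@S3".format(quote_plus(access_key), quote_plus(secret_key))
print(conn_uri)  # export this value as AIRFLOW_CONN_AWS_S3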

How to configure my s3 credentials in Heroku

With boto I used to specify my credentials when connecting to S3 like this:
import boto
from boto.s3.connection import Key, S3Connection
S3 = S3Connection( settings.AWS_SERVER_PUBLIC_KEY, settings.AWS_SERVER_SECRET_KEY )
I could then use S3 to perform my operations (in my case deleting an object from a bucket).
With boto3, all the examples I found look like this:
import boto3
S3 = boto3.resource( 's3' )
S3.Object( bucket_name, key_name ).delete()
I couldn't specify my credentials, and thus all attempts failed with an InvalidAccessKeyId error.
How can I specify credentials with boto3?
You can create a session:
import boto3
session = boto3.Session(
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
)
Then use that session to get an S3 resource:
s3 = session.resource('s3')
You can also get a client with a new session directly, like below:
s3_client = boto3.client('s3',
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
    region_name=REGION_NAME
)
This is older, but I'm placing it here for my reference too. boto3.resource just uses the default session, so you can pass the session details straight through to boto3.resource.
Help on function resource in module boto3:
resource(*args, **kwargs)
    Create a resource service client by name using the default session.
    See :py:meth:`boto3.session.Session.resource`.
https://github.com/boto/boto3/blob/86392b5ca26da57ce6a776365a52d3cab8487d60/boto3/session.py#L265
There you can see that it just takes the same arguments as boto3.Session.
import boto3
S3 = boto3.resource('s3', region_name='us-west-2', aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY, aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY)
S3.Object( bucket_name, key_name ).delete()
I'd like to expand on @JustAGuy's answer. The method I prefer is to use the AWS CLI to create a config file. The reason is that, with the config file, the CLI or the SDK will automatically look for credentials in the ~/.aws folder. And the good thing is that the AWS CLI is written in Python.
You can get the CLI from PyPI if you don't have it already. Here are the steps to set up the CLI from the terminal:
$> pip install awscli  # you can add the --user flag
$> aws configure
AWS Access Key ID [****************ABCD]: [enter your key here]
AWS Secret Access Key [****************xyz]: [enter your secret key here]
Default region name [us-west-2]: [enter your region here]
Default output format [None]:
After this you can access boto and any of the APIs without having to specify keys (unless you want to use different credentials).
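With the config file in place, a minimal sketch like the following needs no explicit keys (the bucket name is a placeholder):
import boto3

# Credentials are resolved automatically from ~/.aws/credentials and ~/.aws/config
s3 = boto3.resource('s3')
for obj in s3.Bucket('my-example-bucket').objects.limit(5):  # placeholder bucket name
    print(obj.key)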
If you rely on your ~/.aws/credentials file to store the id and key for a user, it will be picked up automatically.
For instance:
session = boto3.Session(profile_name='dev')
s3 = session.resource('s3')
This will pick up the dev profile (user) if your credentials file contains the following:
[dev]
aws_access_key_id = AAABBBCCCDDDEEEFFFGG
aws_secret_access_key = FooFooFoo
region = ap-southeast-2
There are numerous ways to store credentials while still using boto3.resource().
I'm using the AWS CLI method myself. It works perfectly.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
You can set the default AWS environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) for the access and secret keys; that way you don't need to change the default client-creation code, though it is better to pass credentials as parameters if you have non-default creds.
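As a rough sketch of that approach (the values shown are placeholders), boto3 picks the variables up without any code changes:
import os
import boto3

# Normally these are exported in the shell; set here only to illustrate what boto3 reads
os.environ['AWS_ACCESS_KEY_ID'] = 'AKIA...'            # placeholder
os.environ['AWS_SECRET_ACCESS_KEY'] = 'your-secret'    # placeholder
os.environ['AWS_DEFAULT_REGION'] = 'us-west-2'         # placeholder

# No credentials passed explicitly: the default credential chain finds them
s3 = boto3.client('s3')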

Getting "Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1" despite having credentials in config file

I have a typescript/node-based application where the following line of code is throwing an error:
const res = await s3.getObject(obj).promise();
The error I'm getting in terminal output is:
❌ Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1
CredentialsError: Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1
However, I do actually have a credentials file in my .aws directory with values for aws_access_key_id and aws_secret_access_key. I have also exported the values for these with the variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. I have also tried this with and without running export AWS_SDK_LOAD_CONFIG=1 but to no avail (same error message). Would anyone be able to provide any possible causes/suggestions for further troubleshooting?
Install dotenv: npm i dotenv
Add a .env file with your AWS_ACCESS_KEY_ID and other credentials in it.
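The .env file would contain entries along these lines (placeholder values):
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=your-secret-key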
Then in your index.js or equivalent file add require("dotenv").config();
Then update the config of your AWS instance:
const AWS = require("aws-sdk");

// Configure the SDK globally (before any service clients are created)
AWS.config.update({
  region: "eu-west-2",
  maxRetries: 3,
  httpOptions: { timeout: 30000, connectTimeout: 5000 },
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
});
Try not setting AWS_SDK_LOAD_CONFIG to anything (unset it). Unset all other AWS variables. On macOS/Linux, you can run export | grep AWS_ to find others you might have set.
Next, do you have AWS connectivity from the command line? Install the AWS CLI v2 if you don't have it yet, and run aws sts get-caller-identity from a terminal window. Don't bother trying to run Node until you get this working. You can also try aws configure list.
Read through all the sections of Configuring the AWS CLI, paying particular attention to how to use the credentials and config files at $HOME/.aws/credentials and $HOME/.aws/config. Are you using the default profile or a named profile?
I prefer to use named profiles, but I use more than one so that may not be needed for you. I have always found success using the AWS_PROFILE environment variable:
export AWS_PROFILE=your_profile_name # macOS/linux
setx AWS_PROFILE your_profile_name # Windows
$Env:AWS_PROFILE="your_profile_name" # PowerShell
This works for me both with an Okta/gimme-aws-creds scenario, as well as an Amazon SSO scenario. With the Okta scenario, just the AWS secret keys go into $HOME/.aws/credentials, and further configuration such as default region or output format go in $HOME/.aws/config (this separation is so that tools can completely rewrite the credentials file without touching the config). With the Amazon SSO scenario, all the settings go in the config.
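For reference, that split typically looks something like this (the profile name and values are placeholders):
# ~/.aws/credentials
[your_profile_name]
aws_access_key_id = AKIA...
aws_secret_access_key = ...

# ~/.aws/config
[profile your_profile_name]
region = us-east-1
output = json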

Authentication Failure when Accessing Azure Blob Storage through Connection String

We got an authentication failure error when we tried to create an Azure blob client from a connection string, using the Python v12 SDK with Azure Blob Storage 12.5.0 and Azure Core 1.8.2.
I used:
azure-storage-blob == 12.5.0
azure-core == 1.8.2
I tried to access my blob storage account using a connection string with the Python v12 SDK and received the error above. The environment I'm running in is a Python venv in a NixShell.
The code for calling the blob upload is as follows:
blob_service_client = BlobServiceClient(account_url=<>, credential=<>)
blob_client = blob_service_client.get_blob_client(container=container_name,
                                                  blob=file)
I printed out blob_client and it looks normal, but the next call, upload_blob, gives the error.
with open(os.path.join(root, file), "rb") as data:
    blob_client.upload_blob(data)
The error message is as follows:
File "<local_address>/.venv/lib/python3.8/site-packages/azure/storage/blob/_upload_helpers.py", in upload_block_blob
    return client.upload(
File "<local_address>/.venv/lib/python3.8/site-packages/azure/storage/blob/_generated/operations/_block_blob_operations.py", in upload
    raise models.StorageErrorException(response, self._deserialize)
azure.storage.blob._generated.models._models_py3.StorageErrorException: Operation returned an invalid status 'Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.'
So I printed out the HTTP PUT request to Azure Blob Storage, and got a [403] response.
The following code works well for me with the same versions as yours:
from azure.storage.blob import BlobClient

# upload_blob lives on a BlobClient; the container and blob names here are placeholders
blob = BlobClient.from_connection_string(conn_str="your connect string in Access Keys",
                                         container_name="your-container",
                                         blob_name="SampleSource.txt")
with open("./SampleSource.txt", "rb") as data:
    blob.upload_blob(data)
Please check your connection string, and check your PC's clock (a skewed clock can make the request signature invalid).
There is a similar issue about the error: AzureStorage Blob Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature
UPDATE:
I tried with this code, and got the same error:
from azure.storage.blob import BlobServiceClient
from azure.identity import DefaultAzureCredential

token_credential = DefaultAzureCredential()
blob_service_client = BlobServiceClient(account_url="https://pamelastorage123.blob.core.windows.net/", credential=token_credential)
blob_client = blob_service_client.get_blob_client(container="pamelac", blob="New Text Document.txt")
with open("D:/demo/python/New Text Document.txt", "rb") as data:
    blob_client.upload_blob(data)
Then I used AzureCliCredential() instead of DefaultAzureCredential(), authenticated via the Azure CLI with az login, and it works.
If you use EnvironmentCredential, you need to set the corresponding environment variables. In any case, I recommend using a specific credential type instead of DefaultAzureCredential.
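As a sketch, the working variant with AzureCliCredential (same account, container, and file as in the snippet above) looks like this:
from azure.identity import AzureCliCredential
from azure.storage.blob import BlobServiceClient

# Relies on a prior `az login`
token_credential = AzureCliCredential()
blob_service_client = BlobServiceClient(account_url="https://pamelastorage123.blob.core.windows.net/", credential=token_credential)
blob_client = blob_service_client.get_blob_client(container="pamelac", blob="New Text Document.txt")
with open("D:/demo/python/New Text Document.txt", "rb") as data:
    blob_client.upload_blob(data)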
For more details about Azure Identity, see here.

How to programmatically set up Airflow 1.10 logging with localstack s3 endpoint?

In an attempt to set up Airflow logging to localstack S3 buckets for local and Kubernetes dev environments, I am following the Airflow documentation for logging to S3. To give a little context, localstack is a local AWS cloud stack with AWS services, including S3, running locally.
I added the following environment variables to my Airflow containers, similar to this other Stack Overflow post, in an attempt to log to my local S3 buckets. This is what I added to docker-compose.yaml for all Airflow containers:
- AIRFLOW__CORE__REMOTE_LOGGING=True
- AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://local-airflow-logs
- AIRFLOW__CORE__REMOTE_LOG_CONN_ID=MyS3Conn
- AIRFLOW__CORE__ENCRYPT_S3_LOGS=False
I've also added my localstack s3 creds to airflow.cfg
[MyS3Conn]
aws_access_key_id = foo
aws_secret_access_key = bar
aws_default_region = us-east-1
host = http://localstack:4572 # s3 port. not sure if this is right place for it
Additionally, I've installed apache-airflow[hooks] and apache-airflow[s3], though it's not clear which one is really needed based on the documentation.
I've followed the steps in a previous Stack Overflow post in an attempt to verify that the S3Hook can write to my localstack S3 instance:
from airflow.hooks import S3Hook
s3 = S3Hook(aws_conn_id='MyS3Conn')
s3.load_string('test','test',bucket_name='local-airflow-logs')
But I get botocore.exceptions.NoCredentialsError: Unable to locate credentials.
After adding credentials in the Airflow console under /admin/connection/edit, this is the new exception: botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records. Other people have encountered this same issue, and it may have been related to networking.
Regardless, a programmatic setup is needed, not a manual one.
I was able to access the bucket using a standalone Python script (entering AWS credentials explicitly with boto), but it needs to work as part of airflow.
Is there a proper way to set up host / port / credentials for S3Hook by adding MyS3Conn to airflow.cfg?
Based on the airflow S3 hook source code, it seems a custom S3 URL may not yet be supported by Airflow. However, based on the airflow aws_hook source code (the parent class), it seems it should be possible to set the endpoint_url, including the port, and it should be read from airflow.cfg.
I am able to inspect and write to my s3 bucket in localstack using boto alone. Also, curl http://localstack:4572/local-mochi-airflow-logs returns the contents of the bucket from the airflow container. And aws --endpoint-url=http://localhost:4572 s3 ls returns Could not connect to the endpoint URL: "http://localhost:4572/".
What other steps might be needed to log to localstack S3 buckets from Airflow running in Docker, with an automated setup, and is this even supported yet?
I think you're supposed to use localhost not localstack for the endpoint, e.g. host = http://localhost:4572.
In Airflow 1.10 you can override the endpoint on a per-connection basis but unfortunately it only supports one endpoint at a time so you'd be changing it for all AWS hooks using the connection. To override it, edit the relevant connection and in the "Extra" field put:
{"host": "http://localhost:4572"}
I believe this should fix it.
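If you want to do this without touching the UI, something along these lines with the Airflow 1.10 CLI should work; the flag names are an assumption based on the 1.10 CLI, so check airflow connections --help for your version:
airflow connections --add --conn_id MyS3Conn --conn_type s3 \
    --conn_login foo --conn_password bar \
    --conn_extra '{"host": "http://localhost:4572"}'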
I managed to make this work by referring to this guide. Basically, you need to create a connection using the Connection class and pass it the credentials that you need; in my case I needed AWS_SESSION_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and REGION_NAME to make this work. Use the function below as a python_callable in a PythonOperator, which should be the first task of the DAG (a sketch of that wiring follows the code).
import os
import json

from airflow.models.connection import Connection
from airflow.exceptions import AirflowFailException


def _create_connection(**context):
    """
    Sets the connection information about the environment using the Connection
    class instead of doing it manually in the Airflow UI
    """
    AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
    AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
    AWS_SESSION_TOKEN = os.getenv("AWS_SESSION_TOKEN")
    REGION_NAME = os.getenv("REGION_NAME")
    credentials = [
        AWS_SESSION_TOKEN,
        AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY,
        REGION_NAME,
    ]
    # Fail the task early if any of the expected environment variables is missing
    if not credentials or any(not credential for credential in credentials):
        raise AirflowFailException("Environment variables were not passed")
    extras = json.dumps(
        dict(
            aws_session_token=AWS_SESSION_TOKEN,
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            region_name=REGION_NAME,
        ),
    )
    try:
        Connection(
            conn_id="s3_con",
            conn_type="S3",
            extra=extras,
        )
    except Exception as e:
        raise AirflowFailException(
            f"Error creating connection to Airflow :{e!r}",
        )
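A rough sketch of the PythonOperator wiring mentioned above might look like this (the DAG id, schedule, and start date are placeholders; the import path is the Airflow 1.10 one):
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10 import path

with DAG(
    "create_s3_connection_example",   # placeholder DAG id
    start_date=datetime(2021, 1, 1),  # placeholder start date
    schedule_interval=None,
) as dag:
    create_connection = PythonOperator(
        task_id="create_connection",
        python_callable=_create_connection,
        provide_context=True,  # needed in Airflow 1.10 so **context is passed to the callable
    )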

S3KeySensor issue: Despite passing host parameter in credentials file, task fails

Airflow version: 1.8
I am using an S3KeySensor in my DAG. In Airflow connections, I have pointed to a credentials file for AWS. I tried passing the 'host' parameter through the credentials file, as well as through the Airflow connections, but I am still getting the same error:
BotoClientError: When using SigV4, you must specify a 'host' parameter
aws credentials file:
host=s3.us-east-2.amazonaws.com
access_key=xxxxxxxxxxxxxxxxx
secret_key=xxxxxxxxxxxxxxxxx
The host parameter is not an option in the credentials file, but it is an option of the class boto.s3.connection.S3Connection. So if you set up the connection in code somewhat like below, then you have to add the host parameter there, for example:
from boto.s3.connection import S3Connection
conn = S3Connection(host=<HOST>)
or
import boto
conn = boto.connect_s3(host=<HOST>)
not in the credentials file.