How to update the aws_lambda_function Terraform resource when the ZIP package is changed on S3?

The ZIP package is uploaded to S3 outside of Terraform.
The Lambda function is provisioned by the Terraform aws_lambda_function resource. When I change the ZIP package on S3 and run terraform apply, Terraform reports that nothing has changed.
There is a source_code_hash field in the aws_lambda_function resource that can be set to a hash of the package content, but whatever value I provide for this hash, it is not updated in the Terraform state.
How do I tell Terraform to update the Lambda function when the ZIP package in S3 is updated?

After numerous experiments to verify how Terraform handles the hash, I found the following:
source_code_hash of the aws_lambda_function resource is stored in the Terraform state at the moment the Lambda function is provisioned.
source_code_hash is updated only if you provide a new value for it in the aws_lambda_function resource and that new value matches the hash of the actual ZIP package in S3.
So Terraform checks the actual hash of the package on S3 only at that moment; it does not check it every time we run terraform apply.
So to make it work we have the following options:
Download the ZIP package from S3, calculate its hash, and pass it to the source_code_hash field of the aws_lambda_function resource, OR
Upload the ZIP package to S3 with Terraform using the aws_s3_bucket_object resource, and set the source_hash field in that resource so the hash is saved in the Terraform state. This value can then be used by the aws_lambda_function resource to detect updates.
Unfortunately this behaviour is not documented, and I spent a lot of time discovering it. Moreover, since it is undocumented, it could change at any moment :-(
So how did I solve this problem?
I generate a base64-encoded SHA256 hash of the Lambda ZIP file and store it as metadata on the ZIP object itself. Then I read this metadata in Terraform and pass it to source_code_hash.
Details:
Generate the hash using the openssl dgst -binary -sha256 lambda_package.zip | openssl base64 command.
Store the hash as object metadata while uploading the package, using the aws s3 cp lambda_package.zip s3://my-best-bucket/lambda_package.zip --metadata hash=[HASH_VALUE] command (a boto3 sketch of these two steps follows the Terraform example below).
Pass the hash to source_code_hash in Terraform:
data "aws_s3_bucket_object" "package" {
bucket = "my-best-bucket"
key = "lambda_package.zip"
}
resource "aws_lambda_function" "main" {
...
source_code_hash = data.aws_s3_bucket_object.package.metadata.Hash
...
}
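For reference, the hashing and upload steps above can also be scripted with boto3 instead of openssl and the AWS CLI. This is a minimal sketch under the same assumptions (bucket and key names are the placeholders used above):

import base64
import hashlib

import boto3

def upload_with_hash_metadata(zip_path, bucket, key):
    # Compute the base64-encoded SHA-256 digest, the same value that
    # `openssl dgst -binary -sha256 ... | openssl base64` produces.
    with open(zip_path, "rb") as f:
        digest = hashlib.sha256(f.read()).digest()
    hash_b64 = base64.b64encode(digest).decode()

    # Upload the package and store the hash as object metadata,
    # mirroring `aws s3 cp ... --metadata hash=[HASH_VALUE]`.
    s3 = boto3.client("s3")
    s3.upload_file(zip_path, bucket, key, ExtraArgs={"Metadata": {"hash": hash_b64}})
    return hash_b64

# Example: upload_with_hash_metadata("lambda_package.zip", "my-best-bucket", "lambda_package.zip")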

Another way of handling this, if you can't use the hash of the S3 object, is to use its version, i.e.
data "aws_s3_bucket_object" "application_zip" {
bucket = var.apps_bucket
key = var.app_key
}
resource "aws_lambda_function" "lambda" {
function_name = var.function_name
s3_bucket = var.apps_bucket
s3_key = var.app_key
handler = var.handler
runtime = var.runtime
memory_size = var.memory_size
timeout = var.timeout
role = aws_iam_role.lambda.arn
s3_object_version = data.aws_s3_bucket_object.application_zip.version_id
}
This means that when the object version changes in S3, the Lambda function is redeployed (note that this requires versioning to be enabled on the bucket).
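If you want to confirm that versioning is active and see the version ID that the aws_s3_bucket_object data source will pick up, a quick boto3 check might look like this (bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")

# head_object only returns a VersionId when bucket versioning is enabled;
# this is the value exposed to Terraform as version_id.
response = s3.head_object(Bucket="my-apps-bucket", Key="application.zip")
print(response.get("VersionId", "versioning not enabled"))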

Related

How to configure my S3 credentials in Heroku [duplicate]

With boto I used to specify my credentials when connecting to S3 like this:
import boto
from boto.s3.connection import Key, S3Connection
S3 = S3Connection( settings.AWS_SERVER_PUBLIC_KEY, settings.AWS_SERVER_SECRET_KEY )
I could then use S3 to perform my operations (in my case deleting an object from a bucket).
With boto3, all the examples I found look like this:
import boto3
S3 = boto3.resource( 's3' )
S3.Object( bucket_name, key_name ).delete()
I couldn't specify my credentials, so all attempts failed with an InvalidAccessKeyId error.
How can I specify credentials with boto3?
You can create a session:
import boto3
session = boto3.Session(
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
)
Then use that session to get an S3 resource:
s3 = session.resource('s3')
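For example, the delete operation from the question then becomes (bucket_name and key_name as in the question):

# The resource created from the session uses the session's credentials.
s3.Object(bucket_name, key_name).delete()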
You can also get a client with a new session directly, like below.
s3_client = boto3.client(
    's3',
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
    region_name=REGION_NAME,
)
This is older, but I'm placing it here for my own reference too. boto3.resource just uses the default Session, so you can pass the same session details straight to boto3.resource.
Help on function resource in module boto3:
resource(*args, **kwargs)
Create a resource service client by name using the default session.
See :py:meth:`boto3.session.Session.resource`.
https://github.com/boto/boto3/blob/86392b5ca26da57ce6a776365a52d3cab8487d60/boto3/session.py#L265
You can see that it just takes the same arguments as boto3.Session:
import boto3
S3 = boto3.resource(
    's3',
    region_name='us-west-2',
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
)
S3.Object(bucket_name, key_name).delete()
I'd like to expand on @JustAGuy's answer. The method I prefer is to use the AWS CLI to create a config file. The reason is that, with the config file, the CLI or the SDK will automatically look for credentials in the ~/.aws folder. And the good thing is that the AWS CLI is written in Python.
You can get the CLI from PyPI if you don't have it already. Here are the steps to get the CLI set up from the terminal:
$> pip install awscli #can add user flag
$> aws configure
AWS Access Key ID [****************ABCD]:[enter your key here]
AWS Secret Access Key [****************xyz]:[enter your secret key here]
Default region name [us-west-2]:[enter your region here]
Default output format [None]:
After this you can use boto3 and any of the APIs without having to specify keys (unless you want to use different credentials).
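Once the config file exists, for example, a plain client call works with no explicit keys:

import boto3

# No credentials are passed here; boto3 finds them in ~/.aws/credentials
# (and the region in ~/.aws/config) written by `aws configure`.
s3 = boto3.client('s3')
print(s3.list_buckets()['Buckets'])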
If you rely on your ~/.aws/credentials file to store the ID and key for a user, it will be picked up automatically.
For instance:
session = boto3.Session(profile_name='dev')
s3 = session.resource('s3')
This will pick up the dev profile (user) if your credentials file contains the following:
[dev]
aws_access_key_id = AAABBBCCCDDDEEEFFFGG
aws_secret_access_key = FooFooFoo
region=ap-southeast-2
There are numerous ways to store credentials while still using boto3.resource().
I'm using the AWS CLI method myself. It works perfectly.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
You can set the default AWS environment variables for the secret and access keys; that way you don't need to change the default client-creation code, though it is better to pass them as parameters if you have non-default credentials.
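For example, a minimal sketch using the standard variable names that boto3's default credential chain reads (the key values are placeholders):

import os

import boto3

# These variables are picked up automatically, so the client-creation
# code itself stays unchanged.
os.environ['AWS_ACCESS_KEY_ID'] = 'AAABBBCCCDDDEEEFFFGG'   # placeholder
os.environ['AWS_SECRET_ACCESS_KEY'] = 'FooFooFoo'          # placeholder
os.environ['AWS_DEFAULT_REGION'] = 'us-west-2'

s3 = boto3.client('s3')  # no explicit credentials needed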

Insufficient log-delivery permissions when using AWS-cdk and aws lambda

I am trying to create a centralized logging bucket and then configure all of my other S3 buckets to log to it, using Lambda and the aws-cdk. The centralized logging bucket has been created, but there is an error when using Lambda to enable logging to it. Here is my code:
import boto3

s3 = boto3.resource('s3')

def handler(event, context):
    setBucketPolicy(target_bucket='s3baselinestack-targetloggingbucketbab31bd5-b6y2hkvqz0of')

def setBucketPolicy(target_bucket):
    for bucket in s3.buckets.all():
        bucket_logging = s3.BucketLogging(bucket.name)
        if not bucket_logging.logging_enabled:
            response = bucket_logging.put(
                BucketLoggingStatus={
                    'LoggingEnabled': {
                        'TargetBucket': target_bucket,
                        'TargetPrefix': f'{bucket.name}/'
                    }
                },
            )
            print(response)
Here is my error:
START RequestId: 320e83c0-ba5e-4d54-a78c-a462d6e0cb87 Version: $LATEST
An error occurred (InvalidTargetBucketForLogging) when calling the PutBucketLogging operation: You must give the log-delivery group WRITE and READ_ACP permissions to the target bucket: ClientError
Traceback (most recent call last):
Note: Everything works except this log-delivery permission; when I enable it through the AWS console it works fine, but I need to do it programmatically! Thank you in advance.
According to the documentation for S3 logging, you must grant the Log Delivery group WRITE and READ_ACP permissions on the target bucket for logs, and this is done using the S3 ACLs.
https://docs.aws.amazon.com/AmazonS3/latest/dev/enable-logging-programming.html#grant-log-delivery-permissions-general
When creating a new bucket with CDK, this is set using the accessControl property. The default value is BucketAccessControl.PRIVATE.
new s3.Bucket(this, 'bucket', {
  accessControl: s3.BucketAccessControl.LOG_DELIVERY_WRITE
})
Since CloudFormation has no way to add ACLs to existing buckets, CDK has no such method either. For an existing bucket, grant Log Delivery via the web console, the API, or the CLI with aws s3api put-bucket-acl.
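For the programmatic route from the question's Lambda, one option via the API is the canned 'log-delivery-write' ACL, which grants the Log Delivery group exactly the WRITE and READ_ACP permissions the error asks for. A sketch, reusing the target bucket name from the question (the Lambda's execution role would also need s3:PutBucketAcl on that bucket):

import boto3

s3_client = boto3.client('s3')

# The 'log-delivery-write' canned ACL grants the S3 Log Delivery group
# WRITE and READ_ACP on the bucket, which PutBucketLogging requires.
s3_client.put_bucket_acl(
    Bucket='s3baselinestack-targetloggingbucketbab31bd5-b6y2hkvqz0of',
    ACL='log-delivery-write',
)

Note that a canned ACL replaces the bucket's existing ACL rather than appending to it.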
Other services, such as CloudFront, don't use ACLs anymore and use IAM policies which can be added using bucket.addToResourcePolicy().
https://docs.aws.amazon.com/cdk/api/latest/docs/#aws-cdk_aws-s3.IBucket.html#add-wbr-to-wbr-resource-wbr-policypermission

How to programmatically set up Airflow 1.10 logging with localstack s3 endpoint?

In an attempt to set up Airflow logging to localstack S3 buckets, for local and Kubernetes dev environments, I am following the Airflow documentation for logging to S3. To give a little context, localstack is a local AWS cloud stack with AWS services, including S3, running locally.
I added the following environment variables to my Airflow containers, similar to this other Stack Overflow post, in an attempt to log to my local S3 buckets. This is what I added to docker-compose.yaml for all Airflow containers:
- AIRFLOW__CORE__REMOTE_LOGGING=True
- AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://local-airflow-logs
- AIRFLOW__CORE__REMOTE_LOG_CONN_ID=MyS3Conn
- AIRFLOW__CORE__ENCRYPT_S3_LOGS=False
I've also added my localstack s3 creds to airflow.cfg
[MyS3Conn]
aws_access_key_id = foo
aws_secret_access_key = bar
aws_default_region = us-east-1
host = http://localstack:4572 # s3 port. not sure if this is right place for it
Additionally, I've installed apache-airflow[hooks], and apache-airflow[s3], though it's not clear which one is really needed based on the documentation.
I've followed the steps in a previous Stack Overflow post in an attempt to verify whether the S3Hook can write to my localstack S3 instance:
from airflow.hooks import S3Hook
s3 = S3Hook(aws_conn_id='MyS3Conn')
s3.load_string('test','test',bucket_name='local-airflow-logs')
But I get botocore.exceptions.NoCredentialsError: Unable to locate credentials.
After adding the credentials in the Airflow console under /admin/connection/edit, this is the new exception: botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records. Other people have encountered this same issue, and it may be related to networking.
Regardless, a programmatic setup is needed, not a manual one.
I was able to access the bucket using a standalone Python script (entering AWS credentials explicitly with boto), but it needs to work as part of airflow.
Is there a proper way to set up host / port / credentials for S3Hook by adding MyS3Conn to airflow.cfg?
Based on the airflow s3 hooks source code, it seems a custom s3 URL may not yet be supported by airflow. However, based on the airflow aws_hook source code (parent) it seems it should be possible to set the endpoint_url including port, and it should be read from airflow.cfg.
I am able to inspect and write to my s3 bucket in localstack using boto alone. Also, curl http://localstack:4572/local-mochi-airflow-logs returns the contents of the bucket from the airflow container. And aws --endpoint-url=http://localhost:4572 s3 ls returns Could not connect to the endpoint URL: "http://localhost:4572/".
What other steps might be needed to log to localstack s3 buckets from airflow running in docker, with automated setup and is this even supported yet?
I think you're supposed to use localhost not localstack for the endpoint, e.g. host = http://localhost:4572.
In Airflow 1.10 you can override the endpoint on a per-connection basis but unfortunately it only supports one endpoint at a time so you'd be changing it for all AWS hooks using the connection. To override it, edit the relevant connection and in the "Extra" field put:
{"host": "http://localhost:4572"}
I believe this will fix it?
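If the host override is wired through the way this answer describes, the effective boto3 call the hook ends up making is roughly the following (an illustrative sketch, using the localstack placeholder credentials from the question):

import boto3

# The "host" extra is passed through to boto3 as endpoint_url.
s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:4572',
    aws_access_key_id='foo',
    aws_secret_access_key='bar',
    region_name='us-east-1',
)
print(s3.list_buckets())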
I managed to make this work by referring to this guide. Basically, you need to create a connection using the Connection class and pass it the credentials that you need; in my case I needed AWS_SESSION_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and REGION_NAME to make this work. Use this function as a python_callable in a PythonOperator, which should be the first task of the DAG (see the usage example after the code).
import os
import json
from airflow import settings
from airflow.models.connection import Connection
from airflow.exceptions import AirflowFailException

def _create_connection(**context):
    """
    Sets the connection information about the environment using the Connection
    class instead of doing it manually in the Airflow UI
    """
    AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
    AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
    AWS_SESSION_TOKEN = os.getenv("AWS_SESSION_TOKEN")
    REGION_NAME = os.getenv("REGION_NAME")
    credentials = [
        AWS_SESSION_TOKEN,
        AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY,
        REGION_NAME,
    ]
    if not credentials or any(not credential for credential in credentials):
        raise AirflowFailException("Environment variables were not passed")
    extras = json.dumps(
        dict(
            aws_session_token=AWS_SESSION_TOKEN,
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            region_name=REGION_NAME,
        ),
    )
    try:
        conn = Connection(
            conn_id="s3_con",
            conn_type="S3",
            extra=extras,
        )
        # Persist the connection in the Airflow metadata database so the
        # S3Hook can actually resolve conn_id "s3_con" at runtime.
        session = settings.Session()
        session.add(conn)
        session.commit()
    except Exception as e:
        raise AirflowFailException(
            f"Error creating connection to Airflow :{e!r}",
        )
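Wiring the callable into a DAG as its first task might look like this (a sketch; the DAG id and schedule are placeholders, and provide_context=True is needed on Airflow 1.10 so the **context argument is passed in):

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

with DAG(
    dag_id="example_s3_logging_dag",   # placeholder name
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    create_connection = PythonOperator(
        task_id="create_s3_connection",
        python_callable=_create_connection,
        provide_context=True,  # Airflow 1.10: pass **context to the callable
    )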

x-amz-server-side-encryption header is not supported for this operation

I am running my application on an AWS AMI. The AMI is launched via a CloudFormation template that creates an AWS::IAM::Role role with sts:AssumeRole. Once the EC2 instance is up, I create an S3 bucket from the EC2 instance using boto3 create_bucket.
In my application I upload a file to the created bucket with the encryption flag on, but while uploading I get an error:
com.amazonaws.services.s3.model.AmazonS3Exception: x-amz-server-side-encryption header is not supported for this operation. (Service: Amazon S3; Status Code: 400; Error Code: InvalidArgument; Request ID: 04DD9259D04F92CA), S3 Extended Request ID: EVdqFn6jUNshxUejZFWa6VN/lHPXHyi0F+TG+UZ3K9Sh8Gy0MPABi1AnxZloIajypLb39/5UAVA=
This is the server-side encryption part of my code:
ObjectMetadata meta = new ObjectMetadata();
meta.setContentLength(contentLength);
meta.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION);
What am I doing wrong? This works as expected when I run my code elsewhere and use an S3 bucket. Is this somehow tied to CloudFormation or sts:AssumeRole?
The put_object function in boto3 has options for setting object-level encryption:
object = bucket.put_object(
    ServerSideEncryption='AES256'|'aws:kms',
    SSECustomerAlgorithm='string',
    SSECustomerKey='string',
    SSEKMSKeyId='string',
)
ServerSideEncryption (string) -- The Server-side encryption algorithm used when storing this object in S3 (e.g., AES256, aws:kms).
StorageClass (string) -- The type of storage to use for the object. Defaults to 'STANDARD'.
SSECustomerAlgorithm (string) -- Specifies the algorithm to use to when encrypting the object (e.g., AES256).
SSECustomerKey (string) -- Specifies the customer-provided encryption key for Amazon S3 to use in encrypting data. This value is used to store the object and then it is discarded; Amazon does not store the encryption key. The key must be appropriate for use with the algorithm specified in the x-amz-server-side-encryption-customer-algorithm header.
SSECustomerKeyMD5 (string) -- Specifies the 128-bit MD5 digest of the encryption key according to RFC 1321. Amazon S3 uses this header for a message integrity check to ensure the encryption key was transmitted without error. Please note that this parameter is automatically populated if it is not provided. Including this parameter is not required.
SSEKMSKeyId (string) -- Specifies the AWS KMS key ID to use for object encryption. All GET and PUT requests for an object protected by AWS KMS will fail if not made via SSL or using SigV4. Documentation on configuring any of the officially supported AWS SDKs and CLI can be found at http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html#specify-signature-version
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html

S3 checkpointing for spark streaming leads to an error

I have enabled checkpointing for my Spark Streaming application using the getOrCreate method. The checkpoint directory points to an S3 bucket.
The problem I have is a credentials issue when accessing S3:
Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
I have already set the environment variables (AWS_SECRET_KEY and AWS_ACCESS_KEY).
Also, my fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey have been specified in application.conf, so I don't know why it still fails.
The environment variables (AWS_SECRET_KEY and AWS_ACCESS_KEY) no longer work after Spark 1.3.
Please refer to the following for the new approach:
How to read input from S3 in a Spark Streaming EC2 cluster application
val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
val sc = new SparkContext(conf)
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", myAccessKey)
hadoopConf.set("fs.s3.awsSecretAccessKey", mySecretKey)