Python Boto3 Lambda Upload Temp File - amazon-s3

I'm working on a Lambda that compresses image files in an S3 bucket. I can download the image in the Lambda and compress it into a new file. When I try to upload the new file to the same S3 bucket, I keep getting the following error:
module initialization error: expected string or bytes-like object
Here's the code to upload:
s3 = boto3.client('s3')
s3.upload_file(filename,my_bucket,basename)
Here are the logs from one of the test uploads:
Getting ready to download Giggidy.png
This is what we're calling our temp file: /tmp/tmp6i7fvb6z.png
Let's compress /tmp/tmp6i7fvb6z.png
Compressed /tmp/tmp6i7fvb6z.png to /tmp/tmpmq23jj5c.png
Getting ready to upload /tmp/tmpmq23jj5c.png
File to Upload, filename: /tmp/tmpmq23jj5c.png
Mime Type: image/png
Name in Bucket, basename: tmpmq23jj5c.png
START RequestId: e9062ca9-ed2c-11e9-99ee-e3a40680ga9d Version: $LATEST
module initialization error: expected string or bytes-like object
END RequestId: e9062ca9-ed2c-11e9-99ee-e3a40680ga9d
How can I upload a file within the context of a Lambda?
UPDATE: I've uploaded my code to a gist for review: https://gist.github.com/kjenney/068531ffe01e14bb7a2351dc55592551
I also moved the boto3 client connection up in my script, thinking that might be hosing the upload, but I still get the same error at the same point. 'process' is my handler function.

Your problem is this line:
client.upload_file(filename,my_bucket,basename)
From the documentation, the format is:
client.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')
Note that the bucket name is a string. That's why the error says it expected a string.
However, your code sets my_bucket as:
my_bucket = s3.Bucket(bucket)
You should use the name of the bucket rather than the bucket object.
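For example, a minimal sketch of the fix, assuming (as in your gist) that bucket holds the bucket name as a string:
import boto3

s3 = boto3.client('s3')

# Pass the bucket *name* (a string), not the boto3 Bucket object
s3.upload_file(filename, bucket, basename)

# If you only have the Bucket object, its .name attribute gives you the string:
# s3.upload_file(filename, my_bucket.name, basename)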

Related

QuickSight dataset suddenly can't refresh from S3 anymore

I have a QuickSight dataset that has been working fine for months, pulling data from S3 via a manifest file, but since yesterday every refresh fails with the error below: FAILURE_TO_PROCESS_JSON_FILE= Error details: S3 Iterator: S3 problem reading data
I've double-checked the manifest file format and the S3 bucket permissions for QuickSight; everything seems fine, and nothing has changed on our end that would explain it suddenly breaking.
Manifest file:
{
    "fileLocations": [
        {
            "URIPrefixes": [
                "https://s3.amazonaws.com/solar-dash-live/"
            ]
        }
    ],
    "globalUploadSettings": {
        "format": "JSON"
    }
}
The error I get in the email alert is different and says "Amazon QuickSight couldn't parse a manifest file as valid JSON." However, I verified that the above JSON is formatted correctly.
Also, if I create a new dataset with the same manifest file, it shows the data in the preview tool; it's only the refresh that fails. So the manifest must be formatted correctly if QuickSight can initially pull data from S3 and only fails later.

Run Cypher file present in S3 using apoc.cypher.runFile

I am trying to run CALL apoc.cypher.runFile(""), but it returns: Failed to invoke procedure apoc.cypher.runFile: Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: (my presigned URL for the S3 file)
I want to know whether it is possible to import Cypher scripts stored in an S3 bucket by using a presigned URL with the apoc.cypher.runFile procedure.
Please help!!
TIA.
Based on the documentation below, you need to add the following to your config and restart the server.
We can enable reading files from anywhere on the file system by setting the following property in apoc.conf:
apoc.import.file.use_neo4j_config=false
Ref: https://neo4j.com/labs/apoc/4.4/overview/apoc.cypher/apoc.cypher.runFile/

How to update aws_lambda_function Terraform resource when ZIP package is changed on S3?

The ZIP package is uploaded to S3, but not by Terraform.
The Lambda is provisioned by the Terraform aws_lambda_function resource. When I change the ZIP package on S3 and run terraform apply, Terraform says nothing has changed.
There is a source_code_hash field in the aws_lambda_function resource that can be set to a hash of the package content, but whatever value I provide for this hash, it's not updated in the Terraform state.
How can I tell Terraform to update the Lambda when the ZIP package is updated in S3?
After numerous experiments to verify how Terraform handles the hash, I found the following:
source_code_hash of the aws_lambda_function resource is stored in the Terraform state at the moment the Lambda is provisioned.
source_code_hash is updated only if you provide a new value for it in the aws_lambda_function resource and this new value matches the hash of the actual ZIP package in S3.
So Terraform checks the actual hash of the package on S3 only at that moment; it doesn't re-check it every time you run terraform apply.
So to make it work, we have the following options:
Download the ZIP package from S3, calculate its hash, and pass it to the source_code_hash field of the aws_lambda_function resource, OR
Upload the ZIP package to S3 with Terraform using the aws_s3_bucket_object resource. Set the source_hash field in that resource so the value is saved in the Terraform state; it can then be used by the aws_lambda_function resource for updates.
Unfortunately this behaviour is not documented, and I spent a lot of time discovering it. Moreover, since it isn't documented, it could change at any moment :-(
So how did I solve this problem?
I generate a base64-encoded SHA256 hash of the Lambda ZIP file and store it as metadata on the ZIP object itself. Then I read this metadata in Terraform and pass it to source_code_hash.
Details:
Generate the hash using the openssl dgst -binary -sha256 lambda_package.zip | openssl base64 command.
Store the hash as metadata when uploading the package: aws s3 cp lambda_package.zip s3://my-best-bucket/lambda_package.zip --metadata hash=[HASH_VALUE].
Pass the hash to source_code_hash in Terraform:
data "aws_s3_bucket_object" "package" {
bucket = "my-best-bucket"
key = "lambda_package.zip"
}
resource "aws_lambda_function" "main" {
...
source_code_hash = data.aws_s3_bucket_object.package.metadata.Hash
...
}
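If you prefer to do the hashing and the upload from a script instead of openssl and the AWS CLI, here is a rough boto3 sketch of the same two steps (the bucket and key are just the example names from above):
import base64
import hashlib

import boto3

# Same value as `openssl dgst -binary -sha256 lambda_package.zip | openssl base64`
with open('lambda_package.zip', 'rb') as f:
    package_bytes = f.read()
hash_b64 = base64.b64encode(hashlib.sha256(package_bytes).digest()).decode('utf-8')

# Upload the package and attach the hash as object metadata,
# which Terraform then reads via data.aws_s3_bucket_object.package.metadata.Hash
s3 = boto3.client('s3')
s3.put_object(
    Bucket='my-best-bucket',
    Key='lambda_package.zip',
    Body=package_bytes,
    Metadata={'hash': hash_b64},
)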
Another way of handling this, if you can't use the hash of the S3 object, is to use the object version. For example:
data "aws_s3_bucket_object" "application_zip" {
bucket = var.apps_bucket
key = var.app_key
}
resource "aws_lambda_function" "lambda" {
function_name = var.function_name
s3_bucket = var.apps_bucket
s3_key = var.app_key
handler = var.handler
runtime = var.runtime
memory_size = var.memory_size
timeout = var.timeout
role = aws_iam_role.lambda.arn
s3_object_version = data.aws_s3_bucket_object.application_zip.version_id
}
This means the Lambda is redeployed whenever the object version changes in S3 (the bucket needs versioning enabled so that version_id is populated).

Databricks to S3 - The backend could not get session tokens for path

I'm trying to move data from Databricks DBFS to my S3 bucket; however, I'm stuck on this error: The backend could not get session tokens for path /mnt/s3/mybucket-upload/product.csv.gz. Did you remove the AWS key for the mount point?
Moving dbfs:/tmp/databricks2s3/product/part-00000-tid-7154689887306924257-8bd689b8-fc4d-46a1-b207-8a6b51aade55-411806-1-c000.csv.gz to /mnt/s3/mybucket-upload/product.csv.gz
An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.mv.
: com.databricks.backend.daemon.data.common.InvalidMountException:
The backend could not get session tokens for path /mnt/s3/mybucket-upload/product.csv.gz. Did you remove the AWS key for the mount point?
at com.databricks.backend.daemon.data.common.InvalidMountException$.apply(DataMessages.scala:520)
at com.databricks.backend.daemon.data.filesystem.MountEntryResolver.resolve(MountEntryResolver.scala:61)
at com.databricks.backend.daemon.data.client.DBFSV2.resolve(DatabricksFileSystemV2.scala:81)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2$$anonfun$getFileStatus$1$$anonfun$apply$15.apply(DatabricksFileSystemV2.scala:757)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2$$anonfun$getFileStatus$1$$anonfun$apply$15.apply(DatabricksFileSystemV2.scala:756)
at com.databricks.s3a.S3AExeceptionUtils$.convertAWSExceptionToJavaIOException(DatabricksStreamUtils.scala:119)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2$$anonfun$getFileStatus$1.apply(DatabricksFileSystemV2.scala:756)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2$$anonfun$getFileStatus$1.apply(DatabricksFileSystemV2.scala:756)
at com.databricks.logging.UsageLogging$$anonfun$recordOperation$1.apply(UsageLogging.scala:440)
at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:251)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:246)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionContext(DatabricksFileSystemV2.scala:450)
at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:288)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionTags(DatabricksFileSystemV2.scala:450)
at com.databricks.logging.UsageLogging$class.recordOperation(UsageLogging.scala:421)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperation(DatabricksFileSystemV2.scala:450)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.getFileStatus(DatabricksFileSystemV2.scala:755)
at com.databricks.backend.daemon.data.client.DatabricksFileSystem.getFileStatus(DatabricksFileSystem.scala:201)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:496)
And here's how I set up the bucket:
try:
    dbutils.fs.mount(
        f's3n://{s3_accesskey_id}:{parse.quote(s3_secret_access_key, "")}@mybucket-upload/mylink',
        '/mnt/s3/mybucket-upload/mylink')
except Exception as error:
    if 'Directory already mounted' not in str(error):
        raise error
I tried passing the AWS credentials directly in the code, but that doesn't work either.
Interestingly, everything works perfectly in DEV.
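One thing commonly suggested for this particular error is to unmount and remount the path so the mount picks up valid credentials again; a minimal sketch, assuming the same mount point and key variables as in the snippet above:
from urllib import parse

mount_point = '/mnt/s3/mybucket-upload/mylink'

# Drop the existing mount so its cached credentials are discarded
dbutils.fs.unmount(mount_point)

# Remount with the current access key and secret
dbutils.fs.mount(
    f's3n://{s3_accesskey_id}:{parse.quote(s3_secret_access_key, "")}@mybucket-upload/mylink',
    mount_point)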

Insufficient log-delivery permissions when using AWS-cdk and aws lambda

I am trying to create a centralized logging bucket that all of my other S3 buckets log to, using Lambda and the AWS CDK. The centralized logging bucket has been created, but there is an error when using Lambda to configure the other buckets to write to it. Here is my code:
import boto3

s3 = boto3.resource('s3')

def handler(event, context):
    setBucketPolicy(target_bucket='s3baselinestack-targetloggingbucketbab31bd5-b6y2hkvqz0of')

def setBucketPolicy(target_bucket):
    for bucket in s3.buckets.all():
        bucket_logging = s3.BucketLogging(bucket.name)
        if not bucket_logging.logging_enabled:
            response = bucket_logging.put(
                BucketLoggingStatus={
                    'LoggingEnabled': {
                        'TargetBucket': target_bucket,
                        'TargetPrefix': f'{bucket.name}/'
                    }
                },
            )
            print(response)
Here is my error:
START RequestId: 320e83c0-ba5e-4d54-a78c-a462d6e0cb87 Version: $LATEST
An error occurred (InvalidTargetBucketForLogging) when calling the PutBucketLogging operation: You must give the log-delivery group WRITE and READ_ACP permissions to the target bucket: ClientError
Traceback (most recent call last):
Note: Everything works except this log-delivery permission; when I enable it through the AWS console it works fine, but I need to do it programmatically! Thank you in advance.
According to the documentation for S3 logging, you must grant the Log Delivery group WRITE and READ_ACP permissions on the target bucket for logs, and this is done using the S3 ACLs.
https://docs.aws.amazon.com/AmazonS3/latest/dev/enable-logging-programming.html#grant-log-delivery-permissions-general
When creating a new bucket with CDK, this is set using the accessControl property. The default value is BucketAccessControl.PRIVATE.
new s3.Bucket(this, 'bucket', {
  accessControl: s3.BucketAccessControl.LOG_DELIVERY_WRITE
})
Since CloudFormation has no way to add ACLs to existing buckets, CDK has no such method either. For an existing bucket, grant Log Delivery via the web console, the API, or the CLI with aws s3api put-bucket-acl.
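Since you're already using boto3 in the Lambda, one programmatic option is the log-delivery-write canned ACL, which grants the Log Delivery group exactly those WRITE and READ_ACP permissions; a minimal sketch (the bucket name is a placeholder for your target logging bucket):
import boto3

s3_client = boto3.client('s3')

# The 'log-delivery-write' canned ACL grants the S3 Log Delivery group
# WRITE and READ_ACP on the bucket, which is what the error is asking for.
s3_client.put_bucket_acl(
    Bucket='my-central-logging-bucket',  # placeholder: your target logging bucket
    ACL='log-delivery-write',
)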
Other services, such as CloudFront, don't use ACLs anymore and use IAM policies which can be added using bucket.addToResourcePolicy().
https://docs.aws.amazon.com/cdk/api/latest/docs/#aws-cdk_aws-s3.IBucket.html#add-wbr-to-wbr-resource-wbr-policypermission