Why is Python code unable to run inside an EKS pod container? - amazon-s3

I have an EKS cluster set up where a pod downloads S3 bucket objects. I have added a service account with a role granting S3 full access and KMS permissions, but I'm unable to download.
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
During handling of the above exception, another exception occurred:
Things I have tried:
Exec into the pod and run the Python code: python3 s3_downloads.py
Passing the access key and secret key explicitly in the boto3 config works well:
s3 = boto3.resource('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
Making buckets public.
Even though I have attached the proper role to the service account, I'm unable to download. Are there any configs I am missing? Any help would really be appreciated.

In my case, the issue was that I was missing the session creation. Here is my original code:
client = boto3.client('s3')
The fixed code:
session = boto3.Session()
s3 = session.client('s3')
I was executing this code in a Pod running on an EKS cluster, and the missing line was preventing me from using the ServiceAccount role I had defined for this deployment.
Details about boto3 and session management can be found in this post: https://ben11kehoe.medium.com/boto3-sessions-and-why-you-should-use-them-9b094eb5ca8e
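Put together, a minimal sketch of the session-based download would look like the following (the bucket and key names are placeholders, not from the original question; the Session lets botocore pick up the web-identity credentials injected for the pod's service account):

import boto3

# An explicit Session resolves the web-identity (IRSA) credentials
# injected into the pod by the EKS service account.
session = boto3.Session()
s3 = session.client('s3')

# Placeholder bucket/key: replace with your own values.
s3.download_file('my-bucket', 'path/to/object.txt', '/tmp/object.txt')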

Related

Databricks to S3 - The backend could not get session tokens for path

I'm trying to move data from Databricks DBFS to my S3 bucket; however, I'm stuck on this error: The backend could not get session tokens for path /mnt/s3/mybucket-upload/product.csv.gz. Did you remove the AWS key for the mount point?
Moving dbfs:/tmp/databricks2s3/product/part-00000-tid-7154689887306924257-8bd689b8-fc4d-46a1-b207-8a6b51aade55-411806-1-c000.csv.gz to /mnt/s3/mybucket-upload/product.csv.gz
An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.mv.
: com.databricks.backend.daemon.data.common.InvalidMountException:
The backend could not get session tokens for path /mnt/s3/mybucket-upload/product.csv.gz. Did you remove the AWS key for the mount point?
at com.databricks.backend.daemon.data.common.InvalidMountException$.apply(DataMessages.scala:520)
at com.databricks.backend.daemon.data.filesystem.MountEntryResolver.resolve(MountEntryResolver.scala:61)
at com.databricks.backend.daemon.data.client.DBFSV2.resolve(DatabricksFileSystemV2.scala:81)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2$$anonfun$getFileStatus$1$$anonfun$apply$15.apply(DatabricksFileSystemV2.scala:757)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2$$anonfun$getFileStatus$1$$anonfun$apply$15.apply(DatabricksFileSystemV2.scala:756)
at com.databricks.s3a.S3AExeceptionUtils$.convertAWSExceptionToJavaIOException(DatabricksStreamUtils.scala:119)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2$$anonfun$getFileStatus$1.apply(DatabricksFileSystemV2.scala:756)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2$$anonfun$getFileStatus$1.apply(DatabricksFileSystemV2.scala:756)
at com.databricks.logging.UsageLogging$$anonfun$recordOperation$1.apply(UsageLogging.scala:440)
at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:251)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:246)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionContext(DatabricksFileSystemV2.scala:450)
at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:288)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionTags(DatabricksFileSystemV2.scala:450)
at com.databricks.logging.UsageLogging$class.recordOperation(UsageLogging.scala:421)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperation(DatabricksFileSystemV2.scala:450)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.getFileStatus(DatabricksFileSystemV2.scala:755)
at com.databricks.backend.daemon.data.client.DatabricksFileSystem.getFileStatus(DatabricksFileSystem.scala:201)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:496)
And here's how I set up the bucket:
try:
    dbutils.fs.mount(
        f's3n://{s3_accesskey_id}:{parse.quote(s3_secret_access_key, "")}@mybucket-upload/mylink',
        '/mnt/s3/mybucket-upload/mylink')
except Exception as error:
    if 'Directory already mounted' not in str(error):
        raise error
I tried to pass AWS credentials directly into the code, but it also doesn't work.
Interestingly, everything works perfectly in DEV.
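For context, when a mount like the one above succeeds, it can be verified from the same notebook before moving files; a minimal sketch (the path simply mirrors the mount point used above):

# List all mount points and confirm the target one resolves.
print(dbutils.fs.mounts())

# Browsing the mounted path should succeed if the stored AWS key is still valid.
print(dbutils.fs.ls('/mnt/s3/mybucket-upload/mylink'))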

AWS S3 Connection in druid

I have set up a clustered Druid deployment with the configuration described in the Druid documentation:
https://druid.apache.org/docs/latest/tutorials/cluster.html
I am using AWS S3 for deep storage. The following is a snippet of my common configuration file:
druid.extensions.loadList=["druid-datasketches", "mysql-metadata-storage", "druid-s3-extensions", "druid-orc-extensions", "druid-lookups-cached-global"]
# For S3:
druid.storage.type=s3
druid.storage.bucket=bucket-name
druid.storage.baseKey=druid/segments
#druid.storage.disableAcl=true
druid.storage.sse.type=s3
#druid.s3.accessKey=...
#druid.s3.secretKey=...
# For S3:
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=bucket-name
druid.indexer.logs.s3Prefix=druid/stage/indexing-logs
While running any ingestion task, I am getting an Access Denied error:
java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ; S3 Extended Request ID: ), S3 Extended Request ID:
at org.apache.druid.storage.s3.S3DataSegmentPusher.push(S3DataSegmentPusher.java:103) ~[?:?]
at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$mergeAndPush$4(AppenderatorImpl.java:791) ~[druid-server-0.19.0.jar:0.19.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:87) ~[druid-core-0.19.0.jar:0.19.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:115) ~[druid-core-0.19.0.jar:0.19.0]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:105) ~[druid-core-0.19.0.jar:0.19.0]
I am using S3 for two purposes:
read data from S3 and ingest it. This connection is working fine and data is being read from the S3 location.
for deep storage. This is where I am getting the error.
I am using the profile-information authentication method to provide the S3 credentials, so I have already configured the AWS CLI with appropriate credentials. Also, the S3 data is encrypted with AES256, so I have added druid.storage.sse.type=s3 to the config file.
Can someone help me out here, as I am not able to debug the issue?
You asked how to approach debugging this. Normally I would:
SSH onto the EC2 instance and run aws sts get-caller-identity. This will tell you which principal your requests are sent from. Then, I would confirm that principal has the expected S3 access.
I would confirm that I can write to the bucket in your configuration (a boto3 sketch of these first two checks follows after this list).
druid.storage.type=s3
druid.storage.bucket=<bucket-name>
druid.storage.baseKey=druid/segments
I would try some of the other auth methods, such as exporting the keys into the environment (mentioned in the third option), since that is a simple test. Then I would run step 1 again to confirm my principal reflects those keys, and then I would try running your code again.
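As a concrete version of the first two checks, here is a minimal boto3 sketch; the profile name is an assumption, and the bucket, prefix, and SSE setting mirror the configuration from the question:

import boto3

# Use the same profile credentials the Druid services are expected to pick up.
session = boto3.Session(profile_name='default')

# Step 1: which principal are requests actually sent from?
print(session.client('sts').get_caller_identity())

# Step 2: can that principal write under the deep-storage prefix with SSE enabled?
session.client('s3').put_object(
    Bucket='bucket-name',               # druid.storage.bucket
    Key='druid/segments/_access_test',  # under druid.storage.baseKey
    Body=b'test',
    ServerSideEncryption='AES256',      # matches druid.storage.sse.type=s3
)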

Insufficient log-delivery permissions when using AWS-cdk and aws lambda

I am trying to create a centralized logging bucket and then write the access logs of all my other S3 buckets to it, using Lambda and the aws-cdk. The centralized logging bucket has been created, but there is an error when using Lambda to write to it. Here is my code:
import boto3

s3 = boto3.resource('s3')

def handler(event, context):
    setBucketPolicy(target_bucket='s3baselinestack-targetloggingbucketbab31bd5-b6y2hkvqz0of')

def setBucketPolicy(target_bucket):
    for bucket in s3.buckets.all():
        bucket_logging = s3.BucketLogging(bucket.name)
        if not bucket_logging.logging_enabled:
            response = bucket_logging.put(
                BucketLoggingStatus={
                    'LoggingEnabled': {
                        'TargetBucket': target_bucket,
                        'TargetPrefix': f'{bucket.name}/'
                    }
                },
            )
            print(response)
Here is my error:
START RequestId: 320e83c0-ba5e-4d54-a78c-a462d6e0cb87 Version: $LATEST
An error occurred (InvalidTargetBucketForLogging) when calling the PutBucketLogging operation: You must give the log-delivery group WRITE and READ_ACP permissions to the target bucket: ClientError
Traceback (most recent call last):
Note: Everything works except this log-delivery permission; when I enable it through the AWS console it works fine, but I need to do it programmatically! Thank you in advance.
According to the documentation for S3 logging, you must grant the Log Delivery group WRITE and READ_ACP permissions on the target bucket for logs, and this is done using the S3 ACLs.
https://docs.aws.amazon.com/AmazonS3/latest/dev/enable-logging-programming.html#grant-log-delivery-permissions-general
When creating a new bucket with CDK, this is set using the accessControl property. The default value is BucketAccessControl.PRIVATE.
new s3.Bucket(this, 'bucket', {
    accessControl: s3.BucketAccessControl.LOG_DELIVERY_WRITE
})
Since CloudFormation has no way to add ACLs to existing buckets, CDK has no such method either. For an existing bucket, grant Log Delivery via the web console, the API, or the CLI with aws s3api put-bucket-acl.
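If it has to happen from the Lambda itself, one way to grant these permissions through the API is the log-delivery-write canned ACL, which gives the Log Delivery group exactly WRITE and READ_ACP. A hedged boto3 sketch (the function name is illustrative; pass the same target bucket the handler already uses):

import boto3

s3 = boto3.client('s3')

def grant_log_delivery(target_bucket):
    # The 'log-delivery-write' canned ACL grants the S3 Log Delivery group
    # WRITE and READ_ACP on the bucket, which PutBucketLogging requires.
    s3.put_bucket_acl(Bucket=target_bucket, ACL='log-delivery-write')

Running this once against the target bucket before the PutBucketLogging loop should clear the InvalidTargetBucketForLogging error.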
Other services, such as CloudFront, don't use ACLs anymore and use IAM policies which can be added using bucket.addToResourcePolicy().
https://docs.aws.amazon.com/cdk/api/latest/docs/#aws-cdk_aws-s3.IBucket.html#add-wbr-to-wbr-resource-wbr-policypermission

How to programmatically set up Airflow 1.10 logging with localstack s3 endpoint?

In an attempt to set up Airflow logging to localstack S3 buckets for local and Kubernetes dev environments, I am following the Airflow documentation for logging to S3. To give a little context, localstack is a local AWS cloud stack that runs AWS services, including S3, locally.
I added the following environment variables to my Airflow containers, similar to this other Stack Overflow post, in an attempt to log to my local S3 buckets. This is what I added to docker-compose.yaml for all Airflow containers:
- AIRFLOW__CORE__REMOTE_LOGGING=True
- AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://local-airflow-logs
- AIRFLOW__CORE__REMOTE_LOG_CONN_ID=MyS3Conn
- AIRFLOW__CORE__ENCRYPT_S3_LOGS=False
I've also added my localstack s3 creds to airflow.cfg
[MyS3Conn]
aws_access_key_id = foo
aws_secret_access_key = bar
aws_default_region = us-east-1
host = http://localstack:4572 # s3 port. not sure if this is right place for it
Additionally, I've installed apache-airflow[hooks] and apache-airflow[s3], though it's not clear which one is really needed based on the documentation.
I've followed the steps in a previous Stack Overflow post in an attempt to verify whether the S3Hook can write to my localstack S3 instance:
from airflow.hooks import S3Hook
s3 = S3Hook(aws_conn_id='MyS3Conn')
s3.load_string('test','test',bucket_name='local-airflow-logs')
But I get botocore.exceptions.NoCredentialsError: Unable to locate credentials.
After adding the credentials in the Airflow console under /admin/connection/edit, a new exception is returned: botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records. Other people have encountered this same issue, and it may have been related to networking.
Regardless, a programmatic setup is needed, not a manual one.
I was able to access the bucket using a standalone Python script (entering AWS credentials explicitly with boto), but it needs to work as part of airflow.
Is there a proper way to set up host / port / credentials for S3Hook by adding MyS3Conn to airflow.cfg?
Based on the Airflow s3 hooks source code, it seems a custom S3 URL may not yet be supported by Airflow. However, based on the Airflow aws_hook source code (the parent), it seems it should be possible to set the endpoint_url, including the port, and it should be read from airflow.cfg.
I am able to inspect and write to my s3 bucket in localstack using boto alone. Also, curl http://localstack:4572/local-mochi-airflow-logs returns the contents of the bucket from the airflow container. And aws --endpoint-url=http://localhost:4572 s3 ls returns Could not connect to the endpoint URL: "http://localhost:4572/".
What other steps might be needed to log to localstack S3 buckets from Airflow running in Docker with an automated setup, and is this even supported yet?
I think you're supposed to use localhost, not localstack, for the endpoint, e.g. host = http://localhost:4572.
In Airflow 1.10 you can override the endpoint on a per-connection basis, but unfortunately it only supports one endpoint at a time, so you'd be changing it for all AWS hooks using the connection. To override it, edit the relevant connection and in the "Extra" field put:
{"host": "http://localhost:4572"}
I believe this will fix it.
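If the override needs to be programmatic rather than done through the UI, the same Extra JSON can be attached when creating the connection record. A minimal sketch, assuming Airflow 1.10's Connection model and metadata session (the foo/bar values are the localstack credentials from the question):

import json

from airflow import settings
from airflow.models import Connection

# Build an S3 connection whose Extra carries the custom localstack endpoint.
conn = Connection(
    conn_id='MyS3Conn',
    conn_type='s3',
    login='foo',      # localstack access key
    password='bar',   # localstack secret key
    extra=json.dumps({'host': 'http://localhost:4572'}),
)

# Persist it to the Airflow metadata database so hooks can look it up.
session = settings.Session()
session.add(conn)
session.commit()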
I managed to make this work by referring to this guide. Basically, you need to create a connection using the Connection class and pass in the credentials that you need; in my case I needed AWS_SESSION_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and REGION_NAME to make this work. Use this function as a python_callable in a PythonOperator, which should be the first part of the DAG.
import os
import json

from airflow.models.connection import Connection
from airflow.exceptions import AirflowFailException


def _create_connection(**context):
    """
    Sets the connection information about the environment using the Connection
    class instead of doing it manually in the Airflow UI
    """
    AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
    AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
    AWS_SESSION_TOKEN = os.getenv("AWS_SESSION_TOKEN")
    REGION_NAME = os.getenv("REGION_NAME")
    credentials = [
        AWS_SESSION_TOKEN,
        AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY,
        REGION_NAME,
    ]
    if not credentials or any(not credential for credential in credentials):
        raise AirflowFailException("Environment variables were not passed")

    extras = json.dumps(
        dict(
            aws_session_token=AWS_SESSION_TOKEN,
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            region_name=REGION_NAME,
        ),
    )
    try:
        Connection(
            conn_id="s3_con",
            conn_type="S3",
            extra=extras,
        )
    except Exception as e:
        raise AirflowFailException(
            f"Error creating connection to Airflow :{e!r}",
        )
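To wire this up as described, a minimal sketch of the first task in the DAG (the DAG id, start date, and schedule are placeholders; Airflow 1.10 import paths are assumed):

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

with DAG('s3_logging_setup', start_date=datetime(2021, 1, 1),
         schedule_interval=None) as dag:
    create_connection = PythonOperator(
        task_id='create_connection',
        python_callable=_create_connection,
        provide_context=True,  # Airflow 1.10: pass **context to the callable
    )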

spring-cloud-aws not able to put files to S3 when run from EC2

I am trying to put some files in an S3 bucket through my Spring Boot app using AmazonS3Client. In AWS, I created an IAM user (test_user1) and granted S3 full access rights to this user. Also, in S3, I granted "s3:*" actions to this user. The same user's credentials are specified for cloud.aws.credentials.accessKey and cloud.aws.credentials.secretKey in my config files.
When I run the app from my local computer, it works fine. I am able to put multiple files in the S3 bucket and view the files.
But when the same app is run from an AWS EC2 instance, I get the below error at application start:
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.cloud.aws.core.env.stack.config.StackResourceRegistryFactoryBean]: Factory method 'stackResourceRegistryFactoryBean' threw exception; nested exception is com.amazonaws.AmazonServiceException: User: arn:aws:iam::560600000009:user/test_user1 is not authorized to perform: cloudformation:DescribeStackResources (Service: AmazonCloudFormation; Status Code: 403; Error Code: AccessDenied;
Is there something else I have to set when accessing S3 from code running on an EC2 instance? I am not using AWS CloudFormation.
Here is what my project looks like:
build.gradle :
compile 'org.springframework.cloud:spring-cloud-aws-autoconfigure:1.0.3.RELEASE'
compile 'org.springframework.cloud:spring-cloud-aws-context:1.0.3.RELEASE'
application.yml:
bucket: test-bucket-1
cloud.aws.credentials.accessKey: AxxxxxxxxxxxxxxA
cloud.aws.credentials.secretKey: jxxxxxxxxxxxxxxR
cloud.aws.credentials.instanceProfile: true
AmazonS3Client is autowired in my service class.
@Autowired
public FileService(AmazonS3Client s3Client) {..}
Spring Cloud AWS tries to auto-configure CloudFormation when the app runs in EC2.
I solved this error by disabling the auto-configuration in application.properties:
cloud.aws.stack.auto=false
For more info, read http://cloud.spring.io/spring-cloud-aws/spring-cloud-aws.html#_automatic_cloudformation_configuration
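Since the question keeps its settings in application.yml rather than application.properties, the equivalent entry there (using the same flat key style as the rest of the file) would presumably be:

cloud.aws.stack.auto: false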