Spark unable to write to S3 Encrypted Bucket even after specifying the hadoopConfigs - amazon-emr

When I try to write to an S3 bucket that is AES-256 encrypted from my Spark Streaming app running on EMR, it throws a 403. For whatever reason, the Spark session is not honoring the "fs.s3a.server-side-encryption-algorithm" config option.
Here is the code I am using:
sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.access.key",accessKeyId);
sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.secret.key", secretKeyId);
sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.server-side-encryption-algorithm","AES256");
When I use plain Java code with the AWS SDK, I can upload the files without any issues.
Somehow the Spark session is not honoring this.
Thanks
Sateesh

I was able to resolve it. It was a silly mistake on my part.
We need to set the following property as well:
sparkSession.sparkContext().hadoopConfiguration().set("fs.s3.enableServerSideEncryption","true");
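For reference, here is a minimal consolidated sketch of the working configuration, combining the settings from the question with the missing property; the helper class, method name, and key variables are placeholders rather than code from the original app.

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.sql.SparkSession;

public class S3SseConfig {
    // Applies SSE-S3 (AES-256) settings to the session's Hadoop configuration.
    static void configureSse(SparkSession sparkSession, String accessKeyId, String secretKeyId) {
        Configuration hadoopConf = sparkSession.sparkContext().hadoopConfiguration();
        hadoopConf.set("fs.s3a.access.key", accessKeyId);
        hadoopConf.set("fs.s3a.secret.key", secretKeyId);
        hadoopConf.set("fs.s3a.server-side-encryption-algorithm", "AES256");
        // The property that was missing in the original attempt:
        hadoopConf.set("fs.s3.enableServerSideEncryption", "true");
    }
}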

Related

Issues writing to S3 using s3-streamlogger

I'm writing in Node.js and trying to send Winston log data to an S3 bucket using s3-streamlogger, but I get access denied.
Testing from the CLI, both read and write work fine. The only reason I can think of is that we are using MFA on our AWS account.
Any ideas or workarounds?
Thanks

Redshift COPY command failing to Load Data from S3

We are facing an error while trying to load a huge zip file from an S3 bucket into Redshift, both from an EC2 instance and from Aginity. What is the real issue here?
As far as we have checked, this could be because of the VPC NACL rules, but we are not sure.
Error:
ERROR: Connection timed out after 50000 milliseconds
I also got this error when Enhanced VPC Routing was enabled; check the routing from your Redshift cluster to S3.
There are several ways to let the Redshift cluster reach S3; see the link below:
https://docs.aws.amazon.com/redshift/latest/mgmt/enhanced-vpc-routing.html
I solved this error by setting up NAT for the private subnet used by my Redshift cluster.
I think you are correct; it might be because of bucket access rules or the secret/access keys.
Here are some pointers to debug further if the above doesn't work:
Create a small zip file and try again, to rule out file size as the cause (though I don't think that is likely).
Split your zip file into multiple zip files and create a manifest file for loading, rather than loading a single file.
I hope you will find this useful.
You should create an IAM role that authorizes Amazon Redshift to access other AWS services such as S3 on your behalf. You must associate that role with your Amazon Redshift cluster before you can use it to load or unload data.
Check the link below for setting up the IAM role:
https://docs.aws.amazon.com/redshift/latest/mgmt/copy-unload-iam-role.html
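For illustration, a rough sketch of how a COPY references such a role once it is attached to the cluster; the JDBC endpoint, table, manifest path, and role ARN below are hypothetical placeholders, and running the statement over JDBC (with the Amazon Redshift JDBC driver on the classpath) is just one option; any SQL client works the same way.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RedshiftCopySketch {
    public static void main(String[] args) throws Exception {
        // Placeholder cluster endpoint, database, and credentials.
        String url = "jdbc:redshift://my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com:5439/dev";
        try (Connection conn = DriverManager.getConnection(url, "awsuser", "my_password");
             Statement stmt = conn.createStatement()) {
            // The COPY authenticates through the role attached to the cluster
            // (named by its ARN), not through access/secret keys.
            stmt.execute(
                "COPY public.my_table "
              + "FROM 's3://my-bucket/load/manifest.json' "
              + "IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole' "
              + "MANIFEST GZIP");
        }
    }
}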
I got this error when the Redshift cluster had Enhanced VPC Routing enabled, but no route in the route table for S3. Adding the S3 endpoint fixed the issue. Link to docs.

Can Apache Drill handle KMS encrypted files?

I've been experimenting with Apache Drill and can successfully query a CSV file in my S3 bucket that is not KMS-encrypted. But when I try to query the exact same file after it has been KMS-encrypted, I get an error.
Is Apache Drill capable of handling KMS-encrypted files? And if so, how?
It looks like Drill doesn't support this yet. Feel free to create a Jira ticket for it, providing details and use cases.
https://issues.apache.org/jira/projects/DRILL

Can't access personal s3 server with boto3

I have a private installation of a server that is fully S3-compatible. I have one bucket there, and I can browse it using the S3 Browser client. I am trying to interact with the server using boto3 for Python (with the same credentials I use in S3 Browser); however, for any request I get a NoSuchBucket error. This is my code:
import boto3

s3 = boto3.resource('s3',
                    endpoint_url=hostname,
                    use_ssl=False,
                    aws_access_key_id=access_key,
                    aws_secret_access_key=secret_key)

for bucket in s3.buckets.all():
    print(bucket.name)
Initially I thought there was an issue with the credentials, but then I was able to interact with the server through the S3 Browser client.
So the problem is that I really don't understand the error code, since I am not querying any particular bucket. What could be the cause of the problem?
Problem solved! It was a DNS resolution issue.

Broken pipe error with Rails 3 while trying to upload data to AWS S3

I am trying to upload some static data to my AWS S3 account.
I am using the aws/s3 gem for this purpose.
I have a simple upload button on my webpage that hits the controller, which creates the AWS connection and tries to upload data to S3.
The connection to AWS is successful; however, while trying to store data in S3, I always get the following error: Errno::EPIPE: Broken pipe.
I tried running the same piece of code from s3sh (the S3 shell) and I am able to execute all the calls properly.
Am I missing something here? I have been facing this issue for quite some time now.
My configuration is: Ruby 1.8, Rails 3, Mongrel, S3 bucket in the US region.
Any help would be great.
I think the broken pipe error could mean a lot of things. I was experiencing it just now and it was because the bucket name in my s3.yml configuration file didn't match the name of the bucket I created on Amazon (typo).
So for people running into this answer in the future, it could be something as silly and simple as that.
In my case the problem was with the file size. S3 puts a limit of 5GB on single file uploads. Chopping up the file into several 500MB files worked for me.
I also had this issue uploading my application.css, which had a compiled size of more than 1.1 MB. I set the fog region with:
config.fog_region = 'us-west-2'
and that seems to have fixed the issue for me...