Best practice to make S3 file accessible for Redshift through COPY operation for anyone - amazon-s3

I want to publish a tutorial where data from a sample TSV file in S3 is used by Redshift. Ideally I want it to be a simple copy-and-paste operation to follow the exercises step by step, similar to what's in Load Sample Data from Amazon S3. The problem is with the first data-import task using the COPY command, as it only supports S3- or EMR-based loads.
This seems like a simple requirement, but there is no hassle-free way to really do it with Redshift COPY (I can make the file available for browser download without any problem, but COPY requires a CREDENTIALS parameter…)
The variety of options for Redshift COPY authorization parameters is quite rich:
Should I ask users to Create an IAM Role for Amazon Redshift themselves?
Should I create it myself and publish the IAM role ARN? That sounds the most hassle-free (copy-paste), but security-wise it doesn't sound right…? Do I need to restrict the S3 permissions so that the role can only access that particular file?
Should I try temporary access instead?

You are correct:
Data can be imported into Amazon Redshift from Amazon S3 via the COPY command
The COPY command requires permission to access the data stored in Amazon S3 (both forms are sketched after this list). This can be granted either via:
Credentials (Access Key + Secret Key) associated with an IAM User, or
An IAM Role
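For illustration, here is a minimal sketch of both forms, issued through the Redshift Data API with boto3. The cluster identifier, database, table, bucket, key and role ARN are all placeholders, not values from the question:

```python
import boto3

# Both statements assume a table "sample_data" already exists and a TSV file
# sits at the placeholder S3 path below.
copy_with_role = """
    COPY sample_data
    FROM 's3://example-tutorial-bucket/sample.tsv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole'
    DELIMITER '\\t';
"""

copy_with_keys = """
    COPY sample_data
    FROM 's3://example-tutorial-bucket/sample.tsv'
    ACCESS_KEY_ID '<access-key-id>'
    SECRET_ACCESS_KEY '<secret-access-key>'
    DELIMITER '\\t';
"""

# The Redshift Data API avoids managing a JDBC/ODBC connection in a tutorial.
client = boto3.client("redshift-data")
client.execute_statement(
    ClusterIdentifier="example-cluster",  # placeholder
    Database="dev",
    DbUser="awsuser",
    Sql=copy_with_role,                   # or copy_with_keys
)
```

Readers could equally paste the COPY statement straight into a SQL client; the Data API is used here only to keep the sketch self-contained.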
You cannot create a Role for people and let them use it, because their Amazon Redshift cluster will be running in a different AWS Account than your IAM Role. You could possibly grant trust access so that other accounts can use the role, but this is not necessarily a wise thing to do.
As for credentials, they could either use their own or ones that you supply. They can obtain their own Access Key + Secret Key in the IAM console.
If you wish to supply credentials for them to use, you could create an IAM User that has permission only to access the Amazon S3 files they need. It is normally unwise to publish your AWS credentials because they might expose a security hole, so you should think carefully before doing this.
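If you do go down that road, the IAM User's policy can be scoped to the single tutorial file. A rough sketch with boto3, where the bucket, key, user name and policy name are placeholders:

```python
import json
import boto3

iam = boto3.client("iam")

# Inline policy that allows reading only the one tutorial object, nothing else.
read_one_file = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-tutorial-bucket/sample.tsv",
        }
    ],
}

iam.create_user(UserName="tutorial-reader")
iam.put_user_policy(
    UserName="tutorial-reader",
    PolicyName="ReadTutorialSampleOnly",
    PolicyDocument=json.dumps(read_one_file),
)

# The resulting key pair is what readers would paste into their COPY command.
keys = iam.create_access_key(UserName="tutorial-reader")
print(keys["AccessKey"]["AccessKeyId"], keys["AccessKey"]["SecretAccessKey"])
```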
At the end of the day, it's probably best to show them the correct process so they understand how to obtain their own credentials. Security is very important in the cloud, so you would also be teaching them good security practice, in addition to Amazon Redshift itself.

Related

AWS Backup from S3 Access Denied

I am trying to set up a simple on-demand backup of an S3 bucket in AWS, and whatever I try I always get an access denied error. See screenshot:
I have tried creating a new bucket which is completely public, I've tried setting the access policy on the vault, and I've tried different regions; all have the same result: Access Denied!
The messaging doesn't advise anything other than Access Denied, which is really helpful!
Can anyone give me some insight into what this message is referring to and, moreover, how I can resolve this issue?
For AWS Backup, you need to set up a service role.
Traditionally you need two policies attached:
[AWSBackupServiceRolePolicyForBackup]
[AWSBackupServiceRolePolicyForRestores]
For S3, it seems there are separate policies that you need to attach to your service role:
[AWSBackupServiceRolePolicyForS3Backup]
[AWSBackupServiceRolePolicyForS3Restore]
Just putting this here for those who will be looking for this answer.
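As a rough sketch, the service role and managed-policy attachments could be created like this with boto3. The role name is a placeholder, and the managed-policy ARNs and paths are written from memory, so verify the exact names in the IAM console:

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy so that the AWS Backup service can assume the role.
trust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "backup.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

iam.create_role(
    RoleName="example-backup-service-role",  # placeholder name
    AssumeRolePolicyDocument=json.dumps(trust),
)

# The classic backup/restore policies plus the S3-specific ones.
for arn in [
    "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForBackup",
    "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForRestores",
    "arn:aws:iam::aws:policy/AWSBackupServiceRolePolicyForS3Backup",
    "arn:aws:iam::aws:policy/AWSBackupServiceRolePolicyForS3Restore",
]:
    iam.attach_role_policy(RoleName="example-backup-service-role", PolicyArn=arn)
```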
To solve this problem with the AWS CDK (JavaScript/TypeScript), you can use the following examples:
https://github.com/SimonJang/blog-aws-backup-s3/blob/68a05f8cb443411a23f02aa0c188adfe15bab0ff/infrastructure/lib/infrastructure-stack.ts#L63-L200
or this:
https://github.com/finnishtransportagency/hassu/blob/8adc0bea3193ff016a9aaa6abe0411292714bbb8/deployment/lib/hassu-database.ts#L230-L312

S3 Replication - s3:PutReplicationConfiguration

I have been attempting to introduce S3 bucket replication into my existing project's stack. I kept getting an 'API: s3:PutBucketReplication Access Denied' error in CloudFormation when updating my stack through my CodeBuild/CodePipeline project after adding the replication rule on the source bucket plus the S3 replication role. For testing, I added full S3 permission (s3:*) to the CodeBuild role for all resources ("*"), as well as full S3 permissions on the S3 replication role; again I got the same result.
Additionally, I tried running a stand-alone, stripped-down version of the CloudFormation template (so not updating my existing application infrastructure stack), which creates the buckets (source and target) and the S3 replication role. It was deployed through CloudFormation while logged in with my admin role via the console, and again I got the same error as when attempting the deployment with my CodeBuild role in CodePipeline.
As a last-ditch sanity check, again logged in with my admin role for the account, I attempted to perform the replication setup manually on buckets that I created using the S3 console, and I got the below error:
You don't have permission to update the replication configuration
You or your AWS admin must update your IAM permissions to allow s3:PutReplicationConfiguration, and then try again. Learn more about Identity and access management in Amazon S3 API response
Access Denied
I confirmed that my role has full S3 access across all resources. This message seems to suggest to me that the permission s3:PutReplicationConfiguration may somehow be different than other S3 permissions, needing to be configured with root access to the account or something?
Also, it seems strange to me that CloudFormation indicates the s3:PutBucketReplication permission, whereas the S3 console error references the permission s3:PutReplicationConfiguration. There doesn't seem to be an IAM action for s3:PutBucketReplication (ref: https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazons3.html), only s3:PutReplicationConfiguration.
Have you checked for a Permissions Boundary? Is this in a corporate Control Tower or a standalone account?
Deny always wins, so if you have a Permissions Boundary that excludes some actions, you may run into issues like this even when you have explicitly allowed the action.
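One way to take CloudFormation out of the equation entirely is to issue the call directly and see whether it is still denied; if it is, the deny is coming from something outside the role's own policies (an SCP, guardrail, or boundary). A rough boto3 sketch with placeholder bucket names and role ARN, assuming versioning is already enabled on both buckets:

```python
import boto3

s3 = boto3.client("s3")

# This is the API behind the s3:PutReplicationConfiguration action that the
# console error mentions; an AccessDenied here confirms the deny is not
# specific to CloudFormation or CodeBuild.
s3.put_bucket_replication(
    Bucket="example-source-bucket",  # placeholder, must have versioning enabled
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/example-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-destination-bucket"
                },
            }
        ],
    },
)
```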
It turns out that the required permission (s3:PutReplicationConfiguration) was actually being blocked by a preventive Control Tower guardrail that was put in place on the OU the AWS account exists in. Unfortunately, this DENY is not visible to a user from anywhere within the AWS account, as it exists outside of any Permissions Boundary or IAM Policy. It required some investigation by our internal IT team to identify the guardrail control as the source of the DENY.
https://docs.aws.amazon.com/controltower/latest/userguide/elective-guardrails.html#disallow-s3-ccr

S3 and semi-public bucket

I am building some small devices running Debian. They need to sync an S3 bucket to a local folder. I have installed S3Tools, and s3cmd sync seems to be the perfect tool. But I have to supply the access credentials, and that seems VERY insecure. I will not be controlling the units once they ship, so I need to somehow use the tool without supplying the credentials, AND I need to make sure the credentials cannot delete anything in the bucket.
Does anyone have an idea as to how I go about this?
Regards, Jacob
Use IAM. It allows creation of AWS credentials with predefined permissions, which are under your control.
So you will create one identity per device. You are free to restrict access to only certain buckets and keys.
You will not be able to update the "device" credentials on your devices (this is simply your constraint), but in case some of your credentials turn out to be compromised, you still have the option to block them via IAM.
And for your primary "root" identity, I strongly recommend using two-factor authentication (and of course never put it on a device you do not have control of).
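A rough sketch of what a per-device identity and its read-only policy could look like with boto3; the bucket name and device name are placeholders. Note there is deliberately no s3:DeleteObject or s3:PutObject, so a leaked key cannot modify the bucket:

```python
import json
import boto3

iam = boto3.client("iam")

# Read-only access to a single bucket: list it and fetch objects, nothing else.
read_only = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-sync-bucket",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-sync-bucket/*",
        },
    ],
}

device = "device-0001"  # one IAM user per shipped unit
iam.create_user(UserName=device)
iam.put_user_policy(
    UserName=device,
    PolicyName="BucketReadOnlySync",
    PolicyDocument=json.dumps(read_only),
)

# Bake this key pair into that unit's s3cmd configuration before shipping;
# if it ever leaks, delete just this user's key in IAM.
key = iam.create_access_key(UserName=device)["AccessKey"]
print(key["AccessKeyId"], key["SecretAccessKey"])
```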

Can I easily limit which files a user can download from an Amazon S3 server?

I have tried looking for an answer to this but I think I am perhaps using the wrong terminology so I figure I will give this a shot.
I have a Rails app where a company can have an account with multiple users, each with various permissions etc. Part of the system will be the ability to upload files, and I am looking at S3 for storage. What I want is the ability to say that users from Company A can only download the files associated with that company.
I get the impression I can't, unless I restrict the downloads to my deployment server's IP range (which will be Heroku) and then feed the files through a controller and a send_file() call. This would work, but then I am reading data from S3 to Heroku and then back to the user, versus directly from S3 to the user.
If I went with the send_file method, could I close off my S3 server to the outside world and have my Heroku app send the files directly?
A less secure idea I had was to create a unique slug for each file and store it under that name to prevent random guessing of file names, e.g. http://mys3server/W4YIU5YIU6YIBKKD.jpg etc. This would be quick and dirty but not 100% secure.
Amazon S3 buckets support policies for granting or denying access based on different conditions. You could probably use those to protect your files from different user groups. Have a look at the policy documentation to get an idea of what is possible. After that, you can switch over to the AWS Policy Generator to generate a valid policy for your needs.
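As a very rough illustration of the bucket-policy route (all names are placeholders, and it assumes each company maps to its own IAM principal and its own key prefix), the policy below lets the "Company A" principal read only objects under the company-a/ prefix:

```python
import json
import boto3

s3 = boto3.client("s3")

# Each company's uploads live under its own prefix; the policy ties a
# company's principal to that prefix only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CompanyAReadsOwnPrefixOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:user/company-a-app"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-uploads-bucket/company-a/*",
        }
    ],
}

s3.put_bucket_policy(
    Bucket="example-uploads-bucket",
    Policy=json.dumps(policy),
)
```

If the end users only authenticate against the Rails app rather than against AWS, the app still has to decide which prefix a given request may touch; the bucket policy then limits what the app's own credentials can reach.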

Should I use the account-level access keys in AWS or should I stick with user-specific ones?

I'm storing all my content in AWS S3 and I would like to know which is the best approach to retrieve my images:
should I use the account access keys or should I create a user with the correct policies and then use the access keys for that "user"?
Always always always create users with their own IAM policies. You should never use the root account credentials to do anything if you can help it.
It's like permanently running commands on your local machine as the root user. The account-level access and secret access keys are the absolute keys to the kingdom. With them, a hacker, malicious employee, or well-intentioned-but-prone-to-accidents administrator could completely destroy every AWS resource you have, download anything off them, and in general cause chaos and discord. Even machines with pem files aren't safe. A root-level user could just cut an AMI off an existing machine.
Take a look at the IAM policy generator. Writing JSON policies is not fun and is error-prone, but tools like that one will help you get most of the way there.
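For completeness, a minimal boto3 sketch of the alternative being recommended: a dedicated IAM user whose keys can only read S3. The user name is a placeholder; AmazonS3ReadOnlyAccess is an AWS managed policy:

```python
import boto3

iam = boto3.client("iam")

# A dedicated application user instead of the account's root keys.
iam.create_user(UserName="image-reader")
iam.attach_user_policy(
    UserName="image-reader",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",  # AWS managed
)

# These are the keys the application should use; the root keys stay locked
# away (or better, are deleted entirely).
response = iam.create_access_key(UserName="image-reader")
print(response["AccessKey"]["AccessKeyId"])
print(response["AccessKey"]["SecretAccessKey"])
```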