Terraform Shared State - amazon-s3

Terraform 0.9.5.
I am in the process of putting together a group of modules that our infrastructure team and automation team will use to create resources in a standard fashion and in turn create stacks to provision different envs. All working well.
As for any team using Terraform, shared state becomes a concern. I have configured Terraform to use an S3 backend that is versioned and encrypted, with locking via a DynamoDB table. Perfect. Everything works with local accounts... Okay, the problem...
We have multiple AWS accounts: one for IAM, one for billing, one for production, one for non-production, one for shared services, etc. You get where I am going. My problem is as follows.
I authenticate as a user in our IAM account and assume the required role. This had been working like a dream until I introduced a Terraform backend configuration to use S3 for shared state. It looks like the backend config requires default credentials to be set in ~/.aws/credentials, and that these have to belong to a user local to the account where the S3 bucket was created.
Is there a way to set up the backend configuration so that it uses the credentials and role configured in the provider? Is there a better way to configure shared state and locking? Any suggestions welcome :)
Update: Got this working. I created a new user in the account where the S3 bucket was created, with a policy that allows that user only s3:DeleteObject, s3:GetObject, s3:PutObject and s3:ListBucket, plus dynamodb:*, on the specific S3 bucket and DynamoDB table. I then created a custom credentials file with a default profile holding the new user's access and secret keys, and used a backend config similar to:
terraform {
  required_version = ">= 0.9.5"

  backend "s3" {
    bucket                  = "remote_state"
    key                     = "NAME_OF_STACK/terraform.tfstate" # later Terraform versions reject keys starting with "/"
    region                  = "us-east-1"
    encrypt                 = true
    shared_credentials_file = "PATH_TO_CUSTOM_CREDENTIALS_FILE"
    lock_table              = "MY_LOCK_TABLE" # deprecated in favour of dynamodb_table in later releases
  }
}
It works, but some initial configuration has to be done in each user's profile to get it working. If anybody knows of a better setup or can spot problems with my backend config, please let me know.
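The least-privilege policy described in the update can be generated rather than written by hand. The Python sketch below is one way to build it; the function name, the example table ARN and the account ID 123456789012 are placeholders, not part of the original setup. It keeps the update's broad dynamodb:* grant scoped to the lock table, which could probably be narrowed further.

```python
import json
from typing import Any, Dict


def state_user_policy(bucket: str, lock_table_arn: str) -> Dict[str, Any]:
    """Build an IAM policy limited to one state bucket and one lock table.

    Hypothetical helper: mirrors the permissions listed in the update.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level actions target the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
            {
                # Same broad DynamoDB grant as in the update, scoped to the lock table.
                "Effect": "Allow",
                "Action": ["dynamodb:*"],
                "Resource": [lock_table_arn],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and table name.
    table_arn = "arn:aws:dynamodb:us-east-1:123456789012:table/MY_LOCK_TABLE"
    print(json.dumps(state_user_policy("remote_state", table_arn), indent=2))
```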

Terraform expects the backend configuration to be static: unlike the rest of the configuration, it cannot contain interpolated variables, because the backend has to be initialized before any other work can be done.
Due to this, applying the same config multiple times using different AWS accounts can be tricky, but is possible in one of two ways.
The lowest-friction way is to create a single S3 bucket and DynamoDB table dedicated to state storage across all environments, and use S3 permissions and/or IAM policies to impose granular access controls.
Organizations adopting this strategy will sometimes create the S3 bucket in a separate "administrative" AWS account, and then grant restrictive access to the individual state objects in the bucket to the specific roles that will run Terraform in each of the other accounts.
This solution has the advantage that once it has been set up correctly in S3 Terraform can be used routinely without any unusual workflow: configure the single S3 bucket in the backend, and provide appropriate credentials via environment variables to allow them to vary. Once the backend is initialized, use workspaces (known as "state environments" prior to Terraform 0.10) to create a separate state for each of the target environments of a single configuration.
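To grant a role access to just one environment's state object, the policy needs the exact object key that workspace uses. A minimal Python sketch, assuming the S3 backend's default layout, where the default workspace uses `key` as-is and any other workspace is stored under `workspace_key_prefix` (default `env:`), then the workspace name, then `key`. Check the S3 backend documentation for your Terraform version; the function names here are hypothetical.

```python
def workspace_state_key(key: str, workspace: str = "default",
                        workspace_key_prefix: str = "env:") -> str:
    """Return the S3 object key the S3 backend uses for a workspace's state.

    Assumes the default layout: the "default" workspace uses `key` as-is;
    any other workspace lives under <prefix>/<workspace>/<key>.
    """
    if workspace == "default":
        return key
    return f"{workspace_key_prefix}/{workspace}/{key}"


def state_object_arn(bucket: str, key: str, workspace: str = "default") -> str:
    """ARN to put in a policy that grants a role access to one workspace's state."""
    return f"arn:aws:s3:::{bucket}/{workspace_state_key(key, workspace)}"


if __name__ == "__main__":
    print(state_object_arn("remote_state", "NAME_OF_STACK/terraform.tfstate", "production"))
```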
The disadvantage is the need to manage a more-complicated access configuration around S3, rather than simply relying on coarse access control with whole AWS accounts. It is also more challenging with DynamoDB in the mix, since the access controls on DynamoDB are not as flexible.
There is a more complete description of this option in the Terraform S3 backend documentation, under Multi-account AWS Architecture.
If a complex S3 configuration is undesirable, the complexity can instead be shifted into the Terraform workflow by using partial configuration. In this mode, only a subset of the backend settings are provided in config and additional settings are provided on the command line when running terraform init.
This allows options to vary between runs, but since it requires extra arguments to be provided most organizations adopting this approach will use a wrapper script to configure Terraform appropriately based on local conventions. This can be just a simple shell script that runs terraform init with suitable arguments.
This allows, for example, the custom credentials file to vary by supplying it on the command line. In this case state environments (workspaces) are not used; instead, switching between environments requires re-initializing the working directory against a new backend configuration.
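Such a wrapper only needs to turn per-environment settings into `-backend-config=KEY=VALUE` arguments for `terraform init`, with the partial `backend "s3"` block in the config holding the settings that never vary. A minimal Python sketch; the naming convention (one credentials file and one state key per environment) and the function names are hypothetical, and the command is only executed when `run=True`.

```python
import shlex
import subprocess
from typing import Dict, List


def terraform_init_args(backend_settings: Dict[str, str]) -> List[str]:
    """Build a `terraform init` command that supplies the backend settings
    left out of the configuration file (partial configuration)."""
    args = ["terraform", "init"]
    for name, value in sorted(backend_settings.items()):
        args.append(f"-backend-config={name}={value}")
    return args


def init_for_environment(environment: str, credentials_dir: str,
                         run: bool = False) -> List[str]:
    """Hypothetical local convention: one credentials file and one state key
    per environment. Runs `terraform init` only when run=True."""
    settings = {
        "shared_credentials_file": f"{credentials_dir}/{environment}.credentials",
        "key": f"{environment}/terraform.tfstate",
    }
    args = terraform_init_args(settings)
    if run:
        subprocess.run(args, check=True)  # raises CalledProcessError on failure
    return args


if __name__ == "__main__":
    print(shlex.join(init_for_environment("production", "/etc/terraform")))
```

Switching environments then means re-running the wrapper with a different environment name, which re-initializes the working directory as described above.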
The advantage of this solution is that it does not impose any particular restrictions on the use of S3 and DynamoDB, as long as the differences can be represented as CLI options.
The disadvantage is the need for unusual workflow or wrapper scripts to configure Terraform.

Related

Read bucket object in CDK

In Terraform, to read an object from an S3 bucket at deployment time, I can use a data source:
data "aws_s3_bucket_object" "config" {
  bucket = "BUCKET_NAME"
  key    = "KEY"
}
Is there a similar concept in CDK? I've seen various methods of uploading assets to S3, as well as importing an existing bucket, but nothing for getting an object from a bucket. I need to read a configuration file from the bucket that will affect the rest of the deployment.
It's important to remember that CDK itself is not a deployment tool. It can deploy, but the code you write in a CDK stack is the definition of your resources, not a method of deployment.
So, you can do one of a few things.
Use the AWS SDK for your language to call S3 and load the data directly. This is a perfectly acceptable and well-understood way to gather information you need before deployment: each time the stack synthesizes (which it does before every cdk deploy), that code will run and pull your data.
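For a CDK app written in Python, the first option could look like the sketch below, which uses boto3's `get_object` to fetch and parse a JSON file while the app is synthesized. The function name, bucket and key are hypothetical, and the S3 client can be passed in so the helper can be exercised without AWS access.

```python
import json
from typing import Any, Optional


def load_config_from_s3(bucket: str, key: str, s3_client: Optional[Any] = None) -> dict:
    """Fetch a JSON document from S3 and return it as a dict.

    Intended to be called from the CDK app (e.g. app.py) so it runs at synth time.
    """
    if s3_client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3
        s3_client = boto3.client("s3")
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # Body is a streaming object; read() returns bytes, which json.loads accepts.
    return json.loads(response["Body"].read())


# Hypothetical usage inside app.py, before the stacks are defined:
# config = load_config_from_s3("my-config-bucket", "deploy/config.json")
# MyStack(app, "MyStack", instance_count=config["instance_count"])
```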
Use a CodePipeline to set up a proper pipeline, and give it two sources - one your version control repo and the second your s3 bucket:
https://docs.aws.amazon.com/codebuild/latest/userguide/sample-multi-in-out.html
The preferred way: drop the JSON file and use Parameter Store. CDK contains modules that create a token for the parameter at synth time; when the stack deploys, that token is resolved against the Systems Manager Parameter Store.
https://docs.aws.amazon.com/cdk/v2/guide/get_ssm_value.html
If your parameters change after deployment, you can handle that as part of your CDK stack fairly easily (using CloudFormation outputs). If they change in the middle of a deployment, you really need a CodePipeline to manage those steps instead of plain CDK.
Because remember: cdk deploy is just a convenience. It executes everything and has no way to pause in the middle to run specific steps (other than basic dependency ordering between resources).

Does Serverless, Inc ever see my AWS credentials?

I would like to start using serverless-framework to manage lambda deploys at my company, but we handle PHI so security’s tight. Our compliance director and CTO had concerns about passing our AWS key and secret to another company.
When doing a serverless deploy, do AWS credentials ever actually pass through to Serverless, Inc?
If not, can someone point me to where in the code I can prove that?
Thanks!
Running serverless deploy isn't just one call, it's many.
AWS example (oversimplification):
Check if deployment s3 bucket already exists
Create an S3 bucket
Upload packages to s3 bucket
Call CloudFormation
Check CloudFormation stack status
Get info on created resources (e.g. endpoint URLs of created APIs)
And those calls can change depending on what you are doing and what you have done before.
The point I'm trying to make is that the calls carrying your credentials are not all in one place, and if you want to do a full code review of the Serverless Framework and all its dependencies, have fun with that.
But under the hood, we know it uses the JavaScript aws-sdk (check the package.json), and we know which endpoints that uses: {service}.{region}.amazonaws.com.
So to prove to your employers that your credentials go nowhere except AWS, you can run serverless deploy with Wireshark running (other network packet analyzers are available). That way you can see any traffic that is not going to amazonaws.com.
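Once you have the capture, checking it can be reduced to filtering hostnames. A minimal Python sketch, assuming you have already exported the contacted hostnames to a list, for example the DNS query names printed by `tshark -r capture.pcap -Y dns -T fields -e dns.qry.name` (capture.pcap is a placeholder). The function name is hypothetical, and it only looks at names, so connections made directly to IP addresses would need a separate look.

```python
from typing import Iterable, List

AWS_SUFFIX = ".amazonaws.com"


def non_aws_hosts(hostnames: Iterable[str]) -> List[str]:
    """Return the distinct hostnames that are not amazonaws.com or a subdomain of it."""
    flagged = set()
    for host in hostnames:
        # Normalise case and the trailing dot that DNS names sometimes carry.
        name = host.strip().lower().rstrip(".")
        if not name:
            continue
        if name == AWS_SUFFIX[1:] or name.endswith(AWS_SUFFIX):
            continue
        flagged.add(name)
    return sorted(flagged)
```

Matching on the ".amazonaws.com" suffix (with the leading dot) avoids treating a lookalike such as evilamazonaws.com as AWS.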
But wait, why are calls being made to serverless.com and serverlessteam.com when I run a deploy?
Well, that's just tracking some stats, and you can see what they track here. But if you are uber-paranoid, this can be turned off with serverless slstats --disable.

Access files stored on Amazon S3 through web browser

Current Situation
I have a project on GitHub that builds after every commit on Travis-CI. After each successful build Travis uploads the artifacts to an S3 bucket. Is there some way for me to easily let anyone access the files in the bucket? I know I could generate a read-only access key, but it'd be easier for the user to access the files through their web browser.
I have website hosting enabled with the root document of "." set.
However, I still get a 403 Forbidden when trying to go to the bucket's endpoint.
The Question
How can I let users easily browse and download artifacts stored on Amazon S3 from their web browser? Preferably without a third-party client.
I found this related question: Directory Listing in S3 Static Website
As it turns out, if you enable public read for the whole bucket, S3 can serve directory listings. Problem is they are in XML instead of HTML, so not very user-friendly.
There are three ways you could go for generating listings:
Generate index.html files for each directory on your own computer, upload them to S3, and update them whenever you add new files to a directory. Very low-tech. Since you're uploading build files straight from Travis, this may not be very practical, as it would require extra work there.
Use a client-side S3 browser tool.
s3-bucket-listing by Rufus Pollock
s3-file-list-page by Adam Pritchard
Use a server-side browser tool.
s3browser (PHP)
s3index (Scala). Going by the existence of a Procfile, it may be readily deployable to Heroku; I'm not sure, since I don't have any experience with Scala.
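The XML listing S3 returns for a publicly readable bucket can also be turned into HTML yourself, which is one way to automate the first option (for example, as an extra step that uploads an index.html after each build). A minimal Python sketch, assuming you already have the text of a ListBucketResult response; the function name and `base_url` parameter are illustrative. It handles a single response only, and S3 returns at most 1,000 keys per listing request, so larger buckets would need pagination.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace used by S3 ListBucketResult documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listing_to_html(listing_xml: str, base_url: str = "") -> str:
    """Turn an S3 ListBucketResult XML document into a simple HTML page of links."""
    root = ET.fromstring(listing_xml)
    items = []
    for contents in root.findall("s3:Contents", S3_NS):
        key = contents.findtext("s3:Key", default="", namespaces=S3_NS)
        size = contents.findtext("s3:Size", default="0", namespaces=S3_NS)
        # Percent-encode the key for the URL (keeping "/") and HTML-escape everything.
        href = html.escape(base_url + quote(key), quote=True)
        items.append(
            f'<li><a href="{href}">{html.escape(key)}</a> ({html.escape(size)} bytes)</li>'
        )
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```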
Filestash is the perfect tool for that:
Log in to your bucket from https://www.filestash.app/s3-browser.html.
Create a shared link.
Share it with the world.
Also Filestash is open source. (Disclaimer: I am the author)
I had the same problem and fixed it by using the new "Make Public" context-menu option.
Go to https://console.aws.amazon.com/s3/home, select the bucket, and then for each folder or file (or several selected at once) right-click and choose "Make public".
You can use a bucket policy to give anonymous users full read access to your objects. Depending on whether you need them to LIST or just perform a GET, you'll want to tweak this (listing the contents of a bucket requires the "s3:ListBucket" action, which applies to the bucket ARN itself rather than to the objects under bucket/*).
http://docs.aws.amazon.com/AmazonS3/latest/dev/AccessPolicyLanguage_UseCases_s3_a.html
Your policy will look something like the following. You can use the S3 console at http://aws.amazon.com/console to upload it.
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AddPerm",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::bucket/*"]
    }
  ]
}
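If you also want anonymous users to list the bucket, the ListBucket permission needs its own statement, because it targets the bucket rather than its objects. A minimal Python sketch that builds either variant; the function name and the "AllowList" Sid are illustrative, and it uses the "2012-10-17" policy language version (the current one) instead of the "2008-10-17" shown in the policy above.

```python
import json
from typing import Any, Dict


def public_read_policy(bucket: str, allow_listing: bool = False) -> Dict[str, Any]:
    """Bucket policy granting anonymous GET (and optionally LIST) on one bucket."""
    statements = [{
        "Sid": "AddPerm",
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": ["s3:GetObject"],
        # Object-level action: applies to the objects, hence the /* suffix.
        "Resource": [f"arn:aws:s3:::{bucket}/*"],
    }]
    if allow_listing:
        statements.append({
            "Sid": "AllowList",
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": ["s3:ListBucket"],
            # Bucket-level action: applies to the bucket ARN itself, no /*.
            "Resource": [f"arn:aws:s3:::{bucket}"],
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(public_read_policy("bucket", allow_listing=True), indent=2))
```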
If you're truly opening up your objects to the world, you'll want to look into setting up CloudWatch billing alarms so you can revoke access to your objects if they become too popular.
https://github.com/jupierce/aws-s3-web-browser-file-listing is a solution I developed for this use case. It leverages AWS CloudFront and Lambda@Edge functions to dynamically render and deliver file listings to a client's browser.
To use it, a simple CloudFormation template will create an S3 bucket and have your file server interface up and running in just a few minutes.
There are many viable alternatives, as already suggested by other posters, but I believe this approach has a unique range of benefits:
Completely serverless and built for web-scale.
Open source and free to use (though, of course, you must pay AWS for resource utilization, such as S3 storage costs).
Simple / static client browser content:
No Ajax or third party libraries to worry about.
No browser compatibility worries.
All backing systems are native AWS components.
You never share account credentials or rely on 3rd party services.
The S3 bucket remains private - allowing you to only expose parts of the bucket.
A custom hostname / SSL certificate can be established for your file server interface.
Some or all of the host files can be protected behind Basic Auth username/password.
An AWS WebACL can be configured to prevent abusive access to the service.

Is it possible to restrict access from EC2 instance to use only S3 buckets from specific account?

Goal: I would like to keep sensitive data in S3 buckets and process it on EC2 instances located in a private VPC. I found that S3 bucket policies can restrict access by IP and by IAM user ARN, so I consider the data in the buckets to be "on the safe side". But I am worried about the following scenario: 1) there is a VPC; 2) inside it there is an EC2 instance; 3) there is a user in a controlled (allowed) account with permission to connect to and work with the EC2 instance and the buckets. The buckets are configured to work only with known (authorized) EC2 instances. Security leak: the user uploads a malware application to the EC2 instance and, while processing the data, runs it to transfer the data to other (unauthorized) buckets in a different AWS account. Disabling uploads to the EC2 instance is not an option in my case. Question: is it possible to restrict access at the VPC firewall so that access to some specific S3 buckets is allowed but access to any other bucket is denied? Assume the user might upload a malware application to the EC2 instance and use it to upload data to other buckets (in a third-party AWS account).
There is not really a solution for what you are asking, but then again, you seem to be attempting to solve the wrong problem (if I understand your question correctly).
If you have a situation where untrustworthy users are in a position where they are able to "connect and work with ec2 instance and buckets" and upload and execute application code inside your VPC, then all bets are off and the game is already over. Shutting down your application is the only fix available to you. Trying to limit the damage by preventing the malicious code from uploading sensitive data to other buckets in S3 should be the absolute least of your worries. There are so many other options available to a malicious user other than putting the data back into S3 but in a different bucket.
It's also possible that I am interpreting "connect and work with ec2 instance and buckets" more broadly than you intended, and all you mean is that users are able to upload data to your application. Well, okay... but your concern still seems to be focused on the wrong point.
I have applications where users can upload data. They can upload all the malware they want, but there's no way any code -- malicious or benign -- that happens to be contained in the data they upload will ever get executed. My systems will never confuse uploaded data with something to be executed or handle it in a way that this is even remotely possible. If your code will, then you again have a problem that can only be fixed by fixing your code -- not by restricting which buckets your instance can access.
Actually, I lied, when I said there wasn't a solution. There is a solution, but it's fairly preposterous:
Set up a reverse web proxy, either in EC2 or somewhere outside, but of course make its configuration inaccessible to the malicious users. In this proxy's configuration, allow access only to the desired bucket. With Apache, for example, if the bucket were called "mybucket," that might look something like this:
ProxyPass /mybucket http://s3.amazonaws.com/mybucket
Additional configuration on the proxy would deny access to the proxy from anywhere other than your instance. Then, instead of allowing your instance to access the S3 endpoints directly, only allow outbound HTTP toward the proxy (via the security group for the compromised instance). Requests for buckets other than yours will not make it through the proxy, which is now the only way "out." Problem solved. At least, the specific problem you were hoping to solve should be solvable by some variation of this approach.
Update to clarify:
To access the bucket called "mybucket" in the normal way, there are two methods:
http://s3.amazonaws.com/mybucket/object_key
http://mybucket.s3.amazonaws.com/object_key
With this configuration, you would block (not allow) all access to all S3 endpoints from your instances via your security group configuration, which would prevent accessing buckets with either method. You would, instead, allow access from your instances to the proxy.
If the proxy, for example, were at 172.31.31.31 then you would access buckets and their objects like this:
http://172.31.31.31/mybucket/object_key
The proxy, being configured to only permit certain patterns in the path to be forwarded -- and any others denied -- would be what controls whether a particular bucket is accessible or not.
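The decision the proxy makes can be expressed in a few lines. This Python sketch is not Apache configuration; it only illustrates the rule for path-style URLs such as http://172.31.31.31/mybucket/object_key, and `ALLOWED_BUCKETS` and `is_forwardable` are hypothetical names. It compares whole path segments rather than string prefixes, which is worth checking in whatever proxy rule you write, since a plain prefix match on /mybucket could also match a different bucket such as /mybucket-other.

```python
from typing import AbstractSet
from urllib.parse import unquote, urlsplit

ALLOWED_BUCKETS = frozenset({"mybucket"})


def is_forwardable(request_url: str, allowed: AbstractSet[str] = ALLOWED_BUCKETS) -> bool:
    """Decide whether a path-style request should be forwarded to S3.

    The first path segment is the bucket name; it must match an allowed
    bucket exactly, so a prefix such as /mybucket-other is not accepted.
    """
    # Decode %2e%2e and similar before inspecting the path.
    path = unquote(urlsplit(request_url).path)
    segments = [segment for segment in path.split("/") if segment]
    if not segments or segments[0] not in allowed:
        return False
    # Refuse "..", which could otherwise be normalised into another bucket's path.
    return ".." not in segments
```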
Use VPC Endpoints. This allows you to restrict which S3 buckets your EC2 instances in a VPC can access. It also allows you to create a private connection between your VPC and the S3 service, so you don't have to allow wide-open outbound internet access. The VPC endpoint documentation includes sample endpoint policies showing how to restrict access to specific buckets.
There's an added bonus with VPC Endpoints for S3: certain major software repos, such as Amazon's yum repos and Ubuntu's apt-get repos, are hosted in S3, so you can also allow your EC2 instances to get their patches without giving them wide-open internet access. That's a big win.
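An endpoint policy that only lets traffic through to named buckets might be generated like this. A minimal Python sketch; the function name and the "my-os-repo-bucket" entry are hypothetical (any package-repository buckets your instances need would have to be listed as well). The policy only governs requests that travel through the endpoint, so the instances should have no other route to S3, such as a NAT gateway.

```python
import json
from typing import Any, Dict, Iterable


def endpoint_policy(buckets: Iterable[str]) -> Dict[str, Any]:
    """VPC endpoint policy that only lets traffic through to the listed buckets."""
    resources = []
    for bucket in buckets:
        resources.append(f"arn:aws:s3:::{bucket}")    # bucket-level actions
        resources.append(f"arn:aws:s3:::{bucket}/*")  # object-level actions
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowOnlyListedBuckets",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": resources,
        }],
    }


if __name__ == "__main__":
    # "mybucket" is from the question; "my-os-repo-bucket" is a hypothetical stand-in.
    print(json.dumps(endpoint_policy(["mybucket", "my-os-repo-bucket"]), indent=2))
```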

Should I use the account-level access keys in AWS or should I stick with user-specific ones?

I'm storing all my content in AWS S3 and I would like to know which is the best approach to retrieve my images:
should I use the account access keys or should I create a user with the correct policies and then use the access keys for that "user"?
Always always always create users with their own IAM policies. You should never use the root account credentials to do anything if you can help it.
It's like permanently running commands on your local machine as the root user. The account-level access and secret access keys are the absolute keys to the kingdom. With them, a hacker, malicious employee, or well-intentioned-but-prone-to-accidents administrator could completely destroy every AWS resource you have, download anything off them, and in general cause chaos and discord. Even machines with pem files aren't safe. A root-level user could just cut an AMI off an existing machine.
Take a look at the IAM policy generator. Writing JSON policies is tedious and error-prone, but tools like that one will help you get most of the way there.