Uploading multiple files to AWS S3 from Terraform

I want to upload multiple files to AWS S3 from a specific folder on my local machine. I am running into the following error.
Here is my Terraform code.
resource "aws_s3_bucket" "testbucket" {
bucket = "test-terraform-pawan-1"
acl = "private"
tags = {
Name = "test-terraform"
Environment = "test"
}
}
resource "aws_s3_bucket_object" "uploadfile" {
bucket = "test-terraform-pawan-1"
key = "index.html"
source = "/home/pawan/Documents/Projects/"
}
How can I solve this problem?

As of Terraform 0.12.8, you can use the fileset function to get a list of files for a given path and pattern. Combined with for_each, you should be able to upload every file as its own aws_s3_bucket_object:
resource "aws_s3_bucket_object" "dist" {
for_each = fileset("/home/pawan/Documents/Projects/", "*")
bucket = "test-terraform-pawan-1"
key = each.value
source = "/home/pawan/Documents/Projects/${each.value}"
# etag makes the file update when it changes; see https://stackoverflow.com/questions/56107258/terraform-upload-file-to-s3-on-every-apply
etag = filemd5("/home/pawan/Documents/Projects/${each.value}")
}
See terraform-providers/terraform-provider-aws : aws_s3_bucket_object: support for directory uploads #3020 on GitHub.
Note: This does not set metadata like content_type, and as far as I can tell there is no built-in way for Terraform to infer the content type of a file. This metadata is important for things like HTTP access from the browser working correctly. If that's important to you, you should look into specifying each file manually instead of trying to automatically grab everything out of a folder.
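If you would still rather pick up every file automatically, one possible workaround is to derive content_type from the file extension yourself. This is only a sketch: the content_types map, the site_dir local, and the dist_typed resource name are made up for illustration, the extension list is far from complete, and try() needs Terraform 0.12.20 or later. It is meant to be used in place of the dist resource above, not alongside it.

```hcl
locals {
  # Hypothetical local: the folder from the question.
  site_dir = "/home/pawan/Documents/Projects"

  # Hypothetical extension-to-MIME-type map; extend it for your own files.
  content_types = {
    ".html" = "text/html"
    ".css"  = "text/css"
    ".js"   = "application/javascript"
    ".json" = "application/json"
    ".png"  = "image/png"
  }
}

resource "aws_s3_bucket_object" "dist_typed" {
  for_each = fileset(local.site_dir, "*")

  bucket = "test-terraform-pawan-1"
  key    = each.value
  source = "${local.site_dir}/${each.value}"
  etag   = filemd5("${local.site_dir}/${each.value}")

  # regex() fails when a file has no extension, so try() falls back to ""
  # and lookup() then returns the generic default type.
  content_type = lookup(
    local.content_types,
    try(regex("\\.[^.]+$", each.value), ""),
    "application/octet-stream"
  )
}
```

Anything whose extension is missing from the map is uploaded as application/octet-stream, so check that list against what your folder actually contains.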

You are trying to upload a directory, whereas Terraform expects a single file in the source field. Uploading a whole folder through a single aws_s3_bucket_object resource is not supported.
However, you can invoke AWS CLI commands from a null_resource with a local-exec provisioner:
resource "null_resource" "remove_and_upload_to_s3" {
provisioner "local-exec" {
command = "aws s3 sync ${path.module}/s3Contents s3://${aws_s3_bucket.site.id}"
}
}
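One caveat with the snippet above: without triggers, a null_resource only runs its provisioner when it is first created. A possible variation, assuming the same s3Contents folder and an aws_s3_bucket.site resource defined elsewhere, is to hash the folder contents so the sync re-runs whenever a file changes. The resource name sync_on_change and the content_hash key are hypothetical.

```hcl
resource "null_resource" "sync_on_change" {
  # Re-create this resource (and so re-run the provisioner) whenever the
  # combined hash of all files under s3Contents changes.
  triggers = {
    content_hash = sha1(join("", [
      for f in fileset("${path.module}/s3Contents", "**") :
      filesha1("${path.module}/s3Contents/${f}")
    ]))
  }

  provisioner "local-exec" {
    command = "aws s3 sync ${path.module}/s3Contents s3://${aws_s3_bucket.site.id}"
  }
}
```

This still relies on the AWS CLI being installed and configured on the machine that runs terraform apply.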

Since June 9, 2020, HashiCorp has published the hashicorp/dir/template module, which infers the content type (and a few other attributes) of each file. You may need these when you upload to an S3 bucket.
HCL format:
module "template_files" {
source = "hashicorp/dir/template"
base_dir = "${path.module}/src"
template_vars = {
# Pass in any values that you wish to use in your templates.
vpc_id = "vpc-abc123"
}
}
resource "aws_s3_bucket_object" "static_files" {
for_each = module.template_files.files
bucket = "example"
key = each.key
content_type = each.value.content_type
# The template_files module guarantees that only one of these two attributes
# will be set for each file, depending on whether it is an in-memory template
# rendering result or a static file on disk.
source = each.value.source_path
content = each.value.content
# Unless the bucket has encryption enabled, the ETag of each object is an
# MD5 hash of that object.
etag = each.value.digests.md5
}
JSON format:
{
"resource": {
"aws_s3_bucket_object": {
"static_files": {
"for_each": "${module.template_files.files}"
#...
}}}
#...
}
Source: https://registry.terraform.io/modules/hashicorp/dir/template/latest

My objective was to make this dynamic, so that whenever I create a folder in a directory, Terraform automatically uploads that new folder and its contents to the S3 bucket with the same key structure.
Here's how I did it.
First you need a local value holding a list of each folder and the files under it. Then you can loop through that list to upload each file to the S3 bucket.
Example: I have a folder called "Directories" with 2 sub folders called "Folder1" and "Folder2" each with their own files.
- Directories
- Folder1
* test_file_1.txt
* test_file_2.txt
- Folder2
* test_file_3.txt
Step 1: Get the local var.
locals{
# "**" matches files at any depth and returns paths relative to Directories,
# so no trimming of "../" prefixes is needed.
folder_files = tolist(fileset("${path.module}/Directories", "**"))
}
Output looks like this:
folder_files = [
"Folder1/test_file_1.txt",
"Folder1/test_file_2.txt",
"Folder2/test_file_3.txt",
]
Step 2: dynamically upload s3 objects
resource "aws_s3_object" "this" {
for_each = { for idx, file in local.folder_files : idx => file }
bucket = aws_s3_bucket.this.bucket
key = "/Directories/${each.value}"
source = "${path.module}/Directories/${each.value}"
# filemd5 hashes the file contents, so the object is re-uploaded when the file changes.
etag = filemd5("${path.module}/Directories/${each.value}")
}
This loops over the local value, so your S3 bucket ends up with the same structure as the local directory, including its subdirectories and files:
Directories
- Folder1
- test_file_1.txt
- test_file_2.txt
- Folder2
- test_file_3.txt

Related

How to Output Terraform Module Variable Names

I'm fairly new to Terraform and I have a question.
I have a bunch of terraform modules calling a main module to create a number of s3 buckets.
module "s3_1" {
source = "../modules/s3-arc"
ENVIRONMENT = var.ENV
bucket_name = var.s3_dep["one"]
}
module "s3_2" {
source = "../modules/s3-arc"
ENVIRONMENT = var.ENV
bucket_name = var.s3_dep["two"]
}
module "s3_3" {
source = "../modules/s3-arc"
ENVIRONMENT = var.ENV
bucket_name = var.s3_dep["three"]
}
It so happens that the policies are being created separately, so there appears to be a race condition that results in a NoSuchBucket: The specified bucket does not exist error, because the policies are created first.
I feel that to resolve this I need to add an explicit dependency using depends_on, but I can't figure out how to output the names of the buckets created by modules s3_1, s3_2, and s3_3 so that I can add the depends_on under the policy section.
How do I output these bucket names, please?
Inside your module you can declare an output value which returns some attribute of the S3 bucket, and optionally any other objects that contribute to the functionality of the bucket.
For example:
terraform {
required_providers {
aws = {
# I'm using resource types introduced in v4
# below, so we'll need at least that version.
source = "hashicorp/aws"
version = ">= 4.0.0"
}
}
}
variable "bucket_name" {
type = string
}
resource "aws_s3_bucket" "example" {
bucket = var.bucket_name
# ...
}
resource "aws_s3_bucket_acl" "example" {
bucket = aws_s3_bucket.example.bucket
acl = "private"
}
resource "aws_s3_bucket_versioning" "example" {
bucket = aws_s3_bucket.example.bucket
versioning_configuration {
status = "Enabled"
}
}
output "bucket" {
value = {
name = aws_s3_bucket.example.bucket
arn = aws_s3_bucket.example.arn
}
# The bucket won't be "ready to use" until
# these other resources are created, so
# these are "hidden dependencies" as described
# in the documentation for depends_on
depends_on = [
aws_s3_bucket_acl.example,
aws_s3_bucket_versioning.example,
]
}
Using depends_on with an output value means that any object which refers to this output value in the calling module indirectly depends on those other resources too, and so all three of the S3-related resources must be created completely before anything in the caller can make use of the S3 bucket.
When you separately declare a policy for one of these buckets in the root module, you'd refer to the bucket name or ARN via the bucket output value, which completes the dependency edges needed to get a correct ordering:
module "s3_1" {
source = "../modules/s3-arc"
bucket_name = var.s3_dep["one"]
}
resource "aws_s3_bucket_policy" "example" {
# This reference to module.s3_1.bucket.name establishes
# the needed dependency relationships.
bucket = module.s3_1.bucket.name
policy = jsonencode({
# ...
})
}
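If you are on Terraform 0.13 or later, the three near-identical module blocks from the question could also be collapsed with for_each. The following is only a sketch under a few assumptions: var.s3_dep is a map of strings, the module exposes the bucket output shown above, and the policy statement is a placeholder rather than anything taken from the question.

```hcl
# Assumed variable shapes; skip these if you already declare them.
variable "ENV" {
  type = string
}

variable "s3_dep" {
  type = map(string) # e.g. { one = "...", two = "...", three = "..." }
}

module "s3" {
  source   = "../modules/s3-arc"
  for_each = var.s3_dep

  ENVIRONMENT = var.ENV
  bucket_name = each.value
}

resource "aws_s3_bucket_policy" "example" {
  # Iterating over module.s3 gives one policy per bucket, and each
  # reference to the bucket output carries the dependency with it.
  for_each = module.s3

  bucket = each.value.bucket.name
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      # Placeholder statement for illustration only.
      Sid       = "DenyInsecureTransport"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:*"
      Resource  = [each.value.bucket.arn, "${each.value.bucket.arn}/*"]
      Condition = { Bool = { "aws:SecureTransport" = "false" } }
    }]
  })
}
```

Because the policies read module.s3[...].bucket, no explicit depends_on is needed in the root module.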

List out files inside folder in S3 bucket using minio

I am trying to read files from S3 bucket using minio client.
https://docs.min.io/docs/java-client-quickstart-guide.html
I am able to make a connection using this client and can access the bucket. Now I need to access a file inside a folder in the bucket, but I am not sure how to do it. I thought that once I had access to the bucket I could list the file names using the File library, but that does not work.
File path : s3 bucket endpoint/4275/input/test.csv
Code :
public void listS3BucketObject() {
MinioClient minioClient =
MinioClient.builder()
.endpoint(s3BucketEndpoint)
.credentials(s3BucketAccessKey, s3BucketSecretKey)
.build();
String fileUrl = s3BucketEndpoint + "/" + "4275" + "/" + "input";
File[] fileList = new File(fileUrl).listFiles();
for(File file : fileList) {
System.out.println("File name: "+file.getName()); // getting null exception here
}
}
java.io.File only reads the local filesystem, so listFiles() returns null for an S3 URL. To list a "folder" (called a prefix in S3 terms), use the listObjects call.
See this for an example: https://docs.min.io/docs/java-client-api-reference.html#listObjects

Terraform wants to replace existing resources

TF Version: 0.12.28 and 0.13.3
My Goal:
Have an AWS S3 bucket for PROD env to store tf state
Have an AWS S3 bucket for NONPROD env to store tf state
Following this tutorial I successfully accomplished the following:
An AWS S3 bucket and a DynamoDB table, created from a folder called TEST:
provider "aws" {
region = var.aws_region_id
}
resource "aws_s3_bucket" "terraform_state" {
bucket = var.aws_bucket_name
versioning {
enabled = true
}
server_side_encryption_configuration {
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
}
resource "aws_dynamodb_table" "terraform_locks" {
name = var.aws_bucket_name
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
terraform {
backend "s3" {
bucket = "test-myproject-poc"
key = "global/s3/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "test-myproject-poc"
encrypt = true
}
}
Up to this point everything was successfully deployed
However when I wanted to have another S3 bucket/Dynamodb for PROD env the following happened:
I went to another folder called PRODUCTION and ran terraform init (initialization was OK).
I copied the same module I have in TEST to this folder and renamed TEST to PROD to match the env.
terraform plan now says it wants to replace my existing deployment to create the new one:
➜ S3 tf plan
Acquiring state lock. This may take a few moments...
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
aws_dynamodb_table.terraform_locks: Refreshing state... [id=test-myproject-poc]
aws_s3_bucket.terraform_state: Refreshing state... [id=test-myproject-poc]
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
# aws_dynamodb_table.terraform_locks must be replaced
-/+ resource "aws_dynamodb_table" "terraform_locks" {
~ arn = "arn:aws:dynamodb:us-east-1:1234567890:table/test-myproject-poc" -> (known after apply)
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
~ id = "test-myproject-poc" -> (known after apply)
~ name = "test-myproject-poc" -> "prod-myproject-poc" # forces replacement
The state is actually on global/s3/terraform.tfstate
I'm not using workspaces
What is the proper way to create S3_PROD without deleting the first one?
I solved the issue! Just found out that I needed to remove this block:
terraform {
backend "s3" {
bucket = "test-myproject-poc"
key = "global/s3/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "test-myproject-poc"
encrypt = true
}
}
Then I deleted the .terraform folder and ran terraform init again.
After these steps, plan ran as expected (it no longer tried to replace my deployment).
I think, though I'm not sure, that the PRODUCTION folder was using the same state file as the TEST deployment, because the backend block pointed at the same bucket and key. So I let Terraform create the new bucket and DynamoDB table first, and only then stored the state of the new folder (PROD) in S3.
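For that last step, a backend block for the PRODUCTION folder might look like the sketch below. It assumes the new bucket and lock table are both named prod-myproject-poc, as the plan output in the question suggests, and that they already exist. Adjust the names if yours differ.

```hcl
terraform {
  backend "s3" {
    # Points at the PROD bucket and lock table, not the TEST ones, so the
    # two folders no longer share a state file.
    bucket         = "prod-myproject-poc"
    key            = "global/s3/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "prod-myproject-poc"
    encrypt        = true
  }
}
```

After adding the block, running terraform init again should detect the backend change and offer to copy the existing local state into the new bucket.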
HTH

how to download zip file from aws s3 using terraform

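I am working with Terraform and am having trouble downloading a zip file from S3 to my local machine.
I want to create a Lambda function using the zip file. Can anyone please help with this?
I believe you can use the aws_s3_bucket_object data source, which lets you read the contents of an S3 object. Note that, as far as I know, body is only populated for objects with a human-readable Content-Type (text/* or application/json), so it will not return the contents of a zip file. Sample code snippet is shown below: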
data "aws_s3_bucket_object" "secret_key" {
bucket = "awesomecorp-secret-keys"
key = "awesomeapp-secret-key"
}
resource "aws_instance" "example" {
## ...
provisioner "file" {
content = "${data.aws_s3_bucket_object.secret_key.body}"
}
}
Hope this helps!
If you want to create a Lambda function using a file in an S3 bucket, you can simply reference it in your resource:
resource "aws_lambda_function" "lambda" {
function_name = "my_function"
s3_bucket = "some_bucket"
s3_key = "lambda.zip"
# ...
}
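If the zip in S3 gets replaced from time to time, one way to have Terraform notice is to pin the function to the object version. This is a rough sketch that assumes versioning is enabled on the bucket (otherwise version_id is not populated). The role variable, handler, and runtime values are hypothetical placeholders.

```hcl
# Hypothetical input: ARN of an existing IAM role for the function.
variable "lambda_role_arn" {
  type = string
}

data "aws_s3_bucket_object" "lambda_zip" {
  bucket = "some_bucket"
  key    = "lambda.zip"
}

resource "aws_lambda_function" "from_s3" {
  function_name = "my_function"

  s3_bucket = data.aws_s3_bucket_object.lambda_zip.bucket
  s3_key    = data.aws_s3_bucket_object.lambda_zip.key
  # A new object version shows up as a change in the plan.
  s3_object_version = data.aws_s3_bucket_object.lambda_zip.version_id

  role    = var.lambda_role_arn
  handler = "index.handler" # placeholder
  runtime = "python3.12"    # placeholder
}
```

Nothing is downloaded to the local machine here. Lambda fetches the archive from S3 itself.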

How do I download an S3 file only if it has changed?

I have a 900 MB file that I'd like to download to disk from S3 if it hasn't already been downloaded. Is there an easy way to download the file only if it isn't already in place? I know S3 supports querying the MD5 checksum of files, but I'm hoping not to have to build this logic myself.
You can use AWS CLI's s3 sync command.
Syncs directories and S3 prefixes. Recursively copies new and updated files from the source directory to the destination.
According to this forum thread, you can use sync to synchronize only one file:
aws s3 sync s3://bucket/path/ local/path/ --exclude "*" --include "File.txt"
It says: sync the given paths, exclude all files, but include "File.txt" - so it will sync only "File.txt" under those given paths.
Or with the Java SDK:
According to the javadoc, there is a getObjectMetadata method which returns information about an S3 object (file) without downloading its contents.
The method returns an ObjectMetadata object which can give you some useful information:
getLastModified method:
Gets the value of the Last-Modified header, indicating the date and time at which Amazon S3 last recorded a modification to the associated object.
getContentMD5 method:
Gets the base64 encoded 128-bit MD5 digest of the associated object (content - not including headers) according to RFC 1864.
getETag method:
Gets the hex encoded 128-bit MD5 digest of the associated object according to RFC 1864.
I have used the code below to download S3 files whose timestamp is later than the local folder's timestamp. It checks whether each file in the S3 folder was modified after the local folder, and downloads only those files.
TransferManager transferManager = TransferManagerBuilder.standard().build();
AmazonS3 amazonS3 = AmazonS3ClientBuilder.standard().build();
Path location = Paths.get("/data/test/");
FileTime lastModifiedTime = null;
try {
lastModifiedTime = Files.getLastModifiedTime(location, LinkOption.NOFOLLOW_LINKS);
} catch (IOException e) {
e.printStackTrace();
}
Date lastUpdatedTime = new Date(lastModifiedTime.toMillis());
ObjectListing listing = amazonS3.listObjects("bucket", "test-folder");
List<S3ObjectSummary> summaries = listing.getObjectSummaries();
for (S3ObjectSummary os: summaries) {
if(os.getLastModified().after(lastUpdatedTime)) {
try {
String fileName="/data/test/"+os.getKey();
Download multipleFileDownload = transferManager.download("bucket", os.getKey(), new File(fileName));
while (multipleFileDownload.isDone() == false) {
Thread.sleep(1000);
}
}catch(InterruptedException i){
LOG.error("Exception Occurred while downloading the file ",i);
}
}
}