Load local CSV file into BigQuery table with Terraform? - google-bigquery

I'm new to Terraform. Is it possible to load the contents of a CSV file into a BigQuery table without uploading it to GCS?
I've studied the documentation below, but the approach doesn't seem to work with local files:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/bigquery_job
Question:
Is it possible somehow to do this without uploading the file into Google's environment?
resource "google_bigquery_table" "my_tyable" {
  dataset_id = google_bigquery_dataset.bq_config_dataset.dataset_id
  table_id   = "my_tyable"
  schema     = file("${path.cwd}/path/to/schema.json")
}
resource "google_bigquery_job" "load_data" {
  job_id = "load_data"
  load {
    source_uris = [
      #"gs://cloud-samples-data/bigquery/us-states/us-states-by-date.csv", # this would work
      "${path.cwd}/path/to/data.csv", # this is not working
    ]
    destination_table {
      project_id = google_bigquery_table.my_tyable.project
      dataset_id = google_bigquery_table.my_tyable.dataset_id
      table_id   = google_bigquery_table.my_tyable.table_id
    }
    skip_leading_rows     = 0
    schema_update_options = ["ALLOW_FIELD_RELAXATION", "ALLOW_FIELD_ADDITION"]
    write_disposition     = "WRITE_APPEND"
    autodetect            = true
  }
}

I was trying this in my own project and I don't think it is possible based on the error message I am seeing:
│ Error: Error creating Job: googleapi: Error 400: Source URI must be a Google Cloud Storage location: [REDACTED].csv, invalid
│
│ with module.[REDACTED].google_bigquery_job.load_data,
│ on modules\[REDACTED]\main.tf line 73, in resource "google_bigquery_job" "load_data":
│ 73: resource "google_bigquery_job" "load_data" {
│
I ended up putting the CSV file into the same bucket as the Terraform state, under the data/ prefix.

Probably the best option is to load it using the file() function:
file("${path.module}/data.csv")
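Since the load job only accepts gs:// URIs, one workaround is to let Terraform itself upload the local CSV to a bucket and then point the load job at that object. A minimal sketch, assuming an existing staging bucket (the bucket name here is a placeholder):

```hcl
# Upload the local CSV to GCS so the load job can reference it.
resource "google_storage_bucket_object" "data_csv" {
  name   = "data/data.csv"
  bucket = "my-staging-bucket" # placeholder: an existing bucket you control
  source = "${path.module}/data.csv"
}

resource "google_bigquery_job" "load_data" {
  job_id = "load_data"
  load {
    source_uris = [
      "gs://${google_storage_bucket_object.data_csv.bucket}/${google_storage_bucket_object.data_csv.name}",
    ]
    destination_table {
      project_id = google_bigquery_table.my_tyable.project
      dataset_id = google_bigquery_table.my_tyable.dataset_id
      table_id   = google_bigquery_table.my_tyable.table_id
    }
    autodetect = true
  }
}
```

The load job then depends on the bucket object implicitly through the interpolation, so the upload always happens first.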

Related

How to use the same variable with different values in different environments in a Terraform module

I am using the Terraform S3 module https://github.com/terraform-aws-modules/terraform-aws-s3-bucket. I created a wrapper module around it for creating an S3 bucket. The buckets need to be created in 3 different AWS accounts, and since the accounts differ, the grant and owner values differ for each of them. Currently I have hardcoded these values for the buckets that I create.
s3_bucket.tf
module "sample_bucket" {
  source = "../../../../modules/aws/data/s3_bucket"
  bucket = "sample_bucket"
  lifecycle_rule = [
    # rule here
  ]
}
../../../../modules/aws/data/s3_bucket/main.tf file:
module "s3_bucket" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "3.6.0"

  bucket                               = var.bucket
  attach_public_policy                 = var.attach_public_policy
  server_side_encryption_configuration = var.server_side_encryption_configuration
  grant                                = var.grant
  owner                                = var.owner
  cors_rule                            = var.cors_rule
  lifecycle_rule                       = var.lifecycle_rule
  tags                                 = var.tags
  versioning                           = var.versioning
  replication_configuration            = var.replication_configuration
  force_destroy                        = var.force_destroy
}
../../../../modules/aws/data/s3_bucket/variable.tf file:
variable "grant" {
  description = "An ACL policy grant. Conflicts with `acl`"
  type        = any
  default     = []
}
variable "owner" {
  description = "Bucket owner's display name and ID. Conflicts with `acl`"
  type        = map(string)
  default     = {}
}
I want to use the grant and owner variables from the module file in main.tf without hardcoding these values for each account in s3_bucket.tf for the buckets I create. Can someone help me with how to use the same grant and owner variables with different values for each account?
You can use Terraform Workspaces for different environments. An example of how you could use workspaces:
module "sample_bucket" {
  source = "../../../../modules/aws/data/s3_bucket"
  bucket = "sample_bucket"
  grant  = "${var.grant}-${terraform.workspace}"
  owner  = "${var.owner}-${terraform.workspace}"
  lifecycle_rule = [
    # rule here
  ]
}
P.S. You might have to handle grant and owner differently from the above code, since they are a list and a map.
Update:
In the case of a different folder for each environment, you can define a separate variables.tfvars file and pass it in as needed. Read more about Input Variables here.
Within each folder you can run terraform plan/apply, and Terraform will automatically detect your terraform.tfvars file.
s3_bucket/
├── main.tf
├── dev/
│   └── terraform.tfvars
├── stg/
│   └── terraform.tfvars
└── prod/
    └── terraform.tfvars
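Since grant and owner are a list and a map rather than strings, an alternative to interpolating the workspace into the value is to key the per-environment values by workspace name and look them up. A sketch with hypothetical owner values:

```hcl
locals {
  # Hypothetical per-workspace values; replace with the real owners/grants.
  owners = {
    dev  = { display_name = "dev-team", id = "DEV_CANONICAL_ID" }
    prod = { display_name = "prod-team", id = "PROD_CANONICAL_ID" }
  }
}

module "sample_bucket" {
  source = "../../../../modules/aws/data/s3_bucket"
  bucket = "sample_bucket"
  owner  = local.owners[terraform.workspace]
}
```

The same pattern works for grant with a map of lists, and it fails loudly if you run in a workspace that has no entry.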

I am trying to add custom validation for variables in my Terraform script using a map, but I am facing an error

I am trying to add custom validation for the variables in my Terraform script for an S3 bucket, but I am facing the error below:
Reference to undeclared input variable
on main.tf line 2, in resource "aws_s3_bucket" "gouth_bucket_1_apr_2021":
2: bucket = var.bucket #"terraform-s3-bucket"
An input variable with the name "bucket" has not been declared. This variable
can be declared with a variable "bucket" {} block.
Can anyone help me with this? Please let me know which file needs the necessary changes and how.
Thanks in advance.
Below is my code :
main.tf :
resource "aws_s3_bucket" "gouth_bucket_1_apr_2021" {
  bucket = var.bucket
  acl    = "private"
  tags   = var.tags
}
s3.tfvars:
bucket = "first-bucket-gouth"

# Variables of Tags
tags = {
  name        = "s3bucket",
  account_id  = "1234567",
  owner       = "abc#def.com",
  os          = "windows",
  backup      = "N",
  application = "abc",
  description = "s3 bucket",
  env         = "dev",
  ticketid    = "101",
  marketami   = "NA",
  patching    = "NA",
  dc          = "bangalore"
}
validation.tf:
variable "tags" {
  type = map(string)
  validation {
    condition     = length(var.tags["env"]) > 0
    error_message = "Environment tag is required !!"
  }
  validation {
    condition     = length(var.tags["owner"]) > 0
    error_message = "Owner tag is required !!"
  }
  validation {
    condition     = length(var.tags["dc"]) > 0
    error_message = "DC tag is required !!"
  }
  validation {
    condition     = can(var.tags["account_id"])
    error_message = "Account ID tag is required !!"
  }
}
I can see two potential issues.
1. You are referencing var.bucket in your resource, but you are not declaring that variable anywhere in your configuration. The declaration could simply be:
variable "bucket" {}
2. You may not be picking up your tfvars file. If you are running Terraform with the tfvars file as an option, like terraform plan -var-file=s3.tfvars, then that's fine; otherwise, rename your tfvars file to something.auto.tfvars or terraform.tfvars so it gets used automatically. (See https://www.terraform.io/docs/language/values/variables.html#variable-definitions-tfvars-files)
I hope this answers your question.
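As an aside, the four validation blocks could also be collapsed into one check over all required tag keys. This is just a sketch, assuming Terraform 0.13+ (for the alltrue function) and the key list shown:

```hcl
variable "tags" {
  type = map(string)

  validation {
    # Every required key must exist in the map and be non-empty.
    condition = alltrue([
      for k in ["env", "owner", "dc", "account_id"] :
      contains(keys(var.tags), k) && length(var.tags[k]) > 0
    ])
    error_message = "Tags env, owner, dc, and account_id are required."
  }
}
```

The per-key validation blocks in the question work too; the combined form just avoids repeating the same pattern four times.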

Self link modules in terraform

I have the following terraform code snippet where I'm trying to use a self_link in the subnet.network resource that references the title of the network resource.
main.tf
resource "google_compute_network" "demo-vpc-network" {
  auto_create_subnetworks         = "false"
  delete_default_routes_on_create = "false"
  name                            = var.GCP_COMPUTE_NETWORK_NAME
  project                         = var.GCP_PROJECT_NAME
  routing_mode                    = "REGIONAL"
}
resource "google_compute_subnetwork" "demo-subnet" {
  ip_cidr_range             = "10.200.0.0/24"
  name                      = "kubernetes"
  network                   = google_compute_network.vpc_network.self.link
  private_ip_google_access  = "false"
  project                   = var.GCP_PROJECT_NAME
  region                    = "us-west1"
}
However, I get the following error.
Error: Reference to undeclared resource
on main.tf line 77, in resource "google_compute_subnetwork" "demo-subnet":
77: network = google_compute_network.vpc_network.self.link
A managed resource "google_compute_network" "vpc_network" has not been
declared in the root module.
google_compute_network.vpc_network.self.link
won't work because google_compute_network.vpc_network doesn't exist.
It's easy to fix because google_compute_network.demo-vpc-network does exist.
Update: Also, as you've noted in your comment self-link (with a hyphen) won't work and needs to be self_link (with an underscore).
Here's the second resource block with both bugs fixed:
resource "google_compute_subnetwork" "demo-subnet" {
  ip_cidr_range             = "10.200.0.0/24"
  name                      = "kubernetes"
  network                   = google_compute_network.demo-vpc-network.self_link
  private_ip_google_access  = "false"
  project                   = var.GCP_PROJECT_NAME
  region                    = "us-west1"
}
Alternatively, the reference would work as written if the resource for the main network were declared as:
resource "google_compute_network" "vpc_network"
and you then set a name for it with the property:
name = "demo-vpc-network"

Workaround for `count.index` in Terraform Module

I need a workaround for using count.index inside a module block for some input variables. I have a habit of over-complicating problems, so maybe there's a much easier solution.
File/Folder Structure:
modules/
├── main.tf
└── ignition/
    ├── main.tf
    └── modules/
        ├── files/
        │   └── main.tf
        └── template_files/
            └── main.tf
End Goal: Create an Ignition file for each instance I'm deploying. Each Ignition file has instance-specific info like hostname, IP address, etc.
All of this code works if I use a static value or a variable without count.index. I need help coming up with a workaround for the address, gateway, and hostname variables specifically. If I need to process count.index inside one of the child modules, that's totally fine; I can't seem to wrap my brain around that, though. I've tried null_data_source and null_resource blocks from the child modules to achieve it, but so far no luck.
Variables:
workers = {
  Lab1 = {
    "lab1k8sc8r001" = "192.168.17.100/24"
  }
  Lab2 = {
    "lab2k8sc8r001" = "192.168.18.100/24"
  }
}
gateway = {
  Lab1 = [
    "192.168.17.1",
  ]
  Lab2 = [
    "192.168.18.1",
  ]
}
From modules/main.tf, I'm calling the ignition module:
module "ignition_workers" {
  source = "./modules/ignition"

  virtual_machines = var.workers[terraform.workspace]
  ssh_public_keys  = var.ssh_public_keys
  files = [
    "files_90-disable-auto-updates.yaml",
    "files_90-disable-console-logs.yaml",
  ]
  template_files = {
    "files_eth0.nmconnection.yaml" = {
      interface-name = "eth0",
      address        = element(values(var.workers[terraform.workspace]), count.index),
      gateway        = element(var.gateway, count.index % length(var.gateway)),
      dns            = join(";", var.dns_servers),
      dns-search     = var.domain,
    }
    "files_etc_hostname.yaml" = {
      hostname = element(keys(var.workers[terraform.workspace]), count.index),
    }
    "files_chronyd.yaml" = {
      ntp_server = var.ntp_server,
    }
  }
}
From modules/ignition/main.tf I take the files and template_files variables to build the Ignition config:
module "ignition_file_snippets" {
  source = "./modules/files"
  files  = var.files
}
module "ignition_template_file_snippets" {
  source         = "./modules/template_files"
  template_files = var.template_files
}
data "ct_config" "fedora-coreos-config" {
  count = length(var.virtual_machines)
  content = templatefile("${path.module}/assets/files_ssh_authorized_keys.yaml", {
    ssh_public_keys = var.ssh_public_keys
  })
  pretty_print = true
  snippets     = setunion(values(module.ignition_file_snippets.files), values(module.ignition_template_file_snippets.files))
}
I am not quite sure what you are trying to achieve, so I cannot give a detailed example.
But modules in Terraform do not support count or for_each yet (module-level count and for_each arrive in Terraform 0.13), so you also cannot use count.index there.
You might want to change your module to take lists/maps of input and create those lists/maps via for-expressions by transforming them from some input variables.
You can combine for with if to create a filtered subset of your source list/map. Like in:
[for s in var.list : upper(s) if s != ""]
I hope this helps you work around the missing count support.
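For instance, the per-instance settings from the question could be pre-computed in the calling module with a for expression over the workers map, so each child module receives a complete map and never needs count.index. A sketch (variable names match the question; the single-gateway lookup is an assumption):

```hcl
locals {
  # One network-settings object per VM, keyed by hostname.
  vm_settings = {
    for hostname, cidr in var.workers[terraform.workspace] : hostname => {
      interface-name = "eth0"
      address        = cidr
      gateway        = var.gateway[terraform.workspace][0]
      hostname       = hostname
    }
  }
}
```

The ignition module can then iterate over local.vm_settings with for_each inside its own resources, where for_each is supported in 0.12.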

Terraform 0.12: Output list of buckets, use as input for another module and iterate

I'm using Terraform 0.12. I have an S3 module that outputs a list of buckets that I would like to use as an input for a CloudFront module that I've got.
The problem I'm facing is that when I do terraform plan/apply I get the following error: count.index is 0 | var.redirect-buckets is tuple with 1 element
I've tried all kinds of splats moving the count.index call around to no avail. My sample code is below.
module.s3
resource "aws_s3_bucket" "redirect" {
  count  = length(var.redirects)
  bucket = element(var.redirects, count.index)
}
module.s3.output
output "redirect-buckets" {
  value = [aws_s3_bucket.redirect.*]
}
module.cdn.variables
...
variable "redirect-buckets" {
  description = "Redirect buckets"
  default     = []
}
....
The error is thrown down here
module.cdn
resource "aws_cloudfront_distribution" "redirect" {
  count = length(var.redirect-buckets)
  default_cache_behavior {
    // Line below throws the error, one amongst many
    target_origin_id = "cloudfront-distribution-origin-${var.redirect-buckets[count.index]}.s3.amazonaws.com"
    ....
    //Another error throwing line
    target_origin_id = "cloudfront-distribution-origin-${var.redirect-buckets[count.index]}.s3.amazonaws.com"
Any help is greatly appreciated.
module.s3
resource "aws_s3_bucket" "redirects" {
  for_each = var.redirects
  bucket   = each.value
}
Your variable definition for redirects needs to change to something like this:
variable "redirects" {
  type = map(string)
}
module.s3.output:
output "redirect_buckets" {
  value = aws_s3_bucket.redirects
}
module.cdn
resource "aws_cloudfront_distribution" "redirects" {
  for_each = var.redirect_buckets
  default_cache_behavior {
    target_origin_id = "cloudfront-distribution-origin-${each.value.id}.s3.amazonaws.com"
  }
}
Your variable definition for redirect-buckets needs to change to something like this (note the underscores; kebab-case names behave strangely in some cases and are not worth it):
variable "redirect_buckets" {
  type = map(object(
    {
      id = string
    }
  ))
}
root module
module "s3" {
  source = "../s3" // or whatever the path is
  redirects = {
    site1 = "some-bucket-name"
    site2 = "some-other-bucket"
  }
}
module "cdn" {
  source           = "../cdn" // or whatever the path is
  redirect_buckets = module.s3.redirect_buckets
}
From an example perspective, this is interesting, but you don't need to use outputs from S3 here since you could just hand the cdn module the same map of redirects and use for_each on those.
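Sketching that alternative: the root module hands the same map to both modules, so the cdn module has no dependency on the S3 outputs at all (bucket names are the hypothetical ones from above, and the cdn module would then derive its origin IDs from the bucket names itself):

```hcl
locals {
  redirects = {
    site1 = "some-bucket-name"
    site2 = "some-other-bucket"
  }
}

module "s3" {
  source    = "../s3"
  redirects = local.redirects
}

module "cdn" {
  source    = "../cdn"
  redirects = local.redirects # no reference to module.s3 outputs
}
```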
There is a tool called Terragrunt which wraps Terraform and supports dependencies.
https://terragrunt.gruntwork.io/docs/features/execute-terraform-commands-on-multiple-modules-at-once/#dependencies-between-modules