Remote-exec provisioner on gcp not connecting with host - ssh

I'm trying to use the remote-exec provisioner for a use case in my project on GCP with Terraform 0.12. Following the format specified in the Terraform docs, I get a known-hosts key mismatch error after the provisioner times out.
resource "google_compute_instance" "secondvm" {
  name         = "secondvm"
  machine_type = "n1-standard-1"
  zone         = "us-central1-a"
  boot_disk {
    initialize_params {
      image = "centos-7-v20190905"
    }
  }
  network_interface {
    network = "default"
    access_config {
      nat_ip       = google_compute_address.second.address
      network_tier = "PREMIUM"
    }
  }
  #metadata = {
  #ssh-keys = "root:${file("~/.ssh/id_rsa.pub")}"
  #}
  metadata_startup_script = "cd /; touch makefile.txt; sudo echo \"string xyz bgv\" >>./makefile.txt"
  provisioner "remote-exec" {
    inline = [
      "sudo sed -i 's/xyz/google_compute_address.first.address/gI' /makefile.txt"
    ]
    connection {
      type = "ssh"
      #port = 22
      host    = self.network_interface[0].access_config[0].nat_ip
      user    = "root"
      timeout = "120s"
      #agent = false
      private_key = file("~/.ssh/id_rsa")
      #host_key = file("~/.ssh/google_compute_engine.pub")
      host_key = file("~/.ssh/id_rsa.pub")
    }
  }
  depends_on = [google_compute_address.second]
}
I'm not sure what exactly I'm doing wrong with the keys here, but this is the error I get:
google_compute_instance.secondvm: Still creating... [2m10s elapsed]
google_compute_instance.secondvm (remote-exec): Connecting to remote host via SSH...
google_compute_instance.secondvm (remote-exec): Host: 104.155.186.128
google_compute_instance.secondvm (remote-exec): User: root
google_compute_instance.secondvm (remote-exec): Password: false
google_compute_instance.secondvm (remote-exec): Private key: true
google_compute_instance.secondvm (remote-exec): Certificate: false
google_compute_instance.secondvm (remote-exec): SSH Agent: false
google_compute_instance.secondvm (remote-exec): Checking Host Key: true
google_compute_instance.secondvm: Still creating... [2m20s elapsed]
Error: timeout - last error: SSH authentication failed (root@104.155.186.128:22): ssh: handshake failed: knownhosts: key mismatch
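One possible reading of this error, offered as a hedged sketch rather than a confirmed fix: the connection block's host_key argument is documented as the public key of the remote host (or of its signing CA), so passing the client's own ~/.ssh/id_rsa.pub would be expected to produce a key mismatch. The fragment below drops host_key and assumes the matching public key is installed through the ssh-keys metadata entry that is commented out above. The user name centos is an assumption, since GCP images often reject direct root logins; it is not something the question states.

```hcl
# Sketch only: fragment to place inside google_compute_instance.secondvm.
# Assumption: the image accepts the key below for the hypothetical user "centos".
metadata = {
  ssh-keys = "centos:${file("~/.ssh/id_rsa.pub")}"
}

provisioner "remote-exec" {
  inline = [
    # Interpolate the address; a bare reference inside the string is sent literally.
    "sudo sed -i 's/xyz/${google_compute_address.first.address}/gI' /makefile.txt",
  ]

  connection {
    type        = "ssh"
    host        = self.network_interface[0].access_config[0].nat_ip
    user        = "centos"
    timeout     = "120s"
    private_key = file("~/.ssh/id_rsa")
    # host_key is omitted: if set, it has to be the server's host key,
    # not the client's public key.
  }
}
```

If host key verification is wanted, host_key would need the key the VM itself presents, which is normally only known after the instance has booted.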

Related

Does a terraform connection w/ bastion work similarly to "ssh -J"?

I am able to connect through an existing jump server using ssh:
ssh -o "CertificateFile ~/.ssh/id_rsa-cert.pub" -J <JUMP_USER>@<JUMP_HOST> <USER>@<HOST> echo connected
And I thought this connection block would work the same way:
resource "null_resource" "connect" {
  connection {
    type                = "ssh"
    port                = 22
    host                = "<HOST>"
    user                = "<USER>"
    bastion_port        = 22
    bastion_host        = "<JUMP_HOST>"
    bastion_user        = "<JUMP_USER>"
    bastion_certificate = "~/.ssh/id_rsa-cert.pub"
    agent               = true
    timeout             = "30s"
  }
  provisioner "remote-exec" {
    inline = [ "echo connected" ]
  }
}
The ssh command succeeded:
% ssh -o "CertificateFile ~/.ssh/id_rsa-cert.pub" -J $BASTION_USER@$BASTION_HOST $USER@$HOST echo connected
connected
But terraform apply appears to end up in a retry loop:
% terraform apply
[...]
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
null_resource.connect: Destroying... [id=5962206430386145659]
null_resource.connect: Destruction complete after 0s
null_resource.connect: Creating...
null_resource.connect: Provisioning with 'remote-exec'...
null_resource.connect (remote-exec): Connecting to remote host via SSH...
null_resource.connect (remote-exec): Host: 3.87.64.117
null_resource.connect (remote-exec): User: <USER>
null_resource.connect (remote-exec): Password: false
null_resource.connect (remote-exec): Private key: false
null_resource.connect (remote-exec): Certificate: false
null_resource.connect (remote-exec): SSH Agent: true
null_resource.connect (remote-exec): Checking Host Key: false
null_resource.connect (remote-exec): Target Platform: unix
null_resource.connect (remote-exec): Using configured bastion host...
null_resource.connect (remote-exec): Host: <JUMP_HOST>
null_resource.connect (remote-exec): User: <JUMP_USER>
null_resource.connect (remote-exec): Password: false
null_resource.connect (remote-exec): Private key: false
null_resource.connect (remote-exec): Certificate: true
null_resource.connect (remote-exec): SSH Agent: true
null_resource.connect (remote-exec): Checking Host Key: false
The "Connecting to remote host via SSH..." and "Using configured bastion host..." blocks repeat until the timeout:
null_resource.connect: Still creating... [20s elapsed]
null_resource.connect: Still creating... [30s elapsed]
╷
│ Error: remote-exec provisioner error
│
│ with null_resource.connect,
│ on t.tf line 15, in resource "null_resource" "connect":
│ 15: provisioner "remote-exec" {
│
│ timeout - last error: Error connecting to bastion: ssh: handshake failed: ssh: unable to authenticate,
│ attempted methods [none publickey], no supported methods remain
I've tried a bunch of permutations without success:
setting certificate instead of bastion_certificate
using file("~/.ssh/id_rsa-cert.pub") for certificate and bastion_certificate
setting agent = false
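A hedged sketch of one more permutation that may be worth trying. As far as I understand the connection documentation, bastion_certificate takes the contents of the signed certificate (not a path string) and is meant to be used together with bastion_private_key, so a certificate supplied without a key may simply go unused. The key path ~/.ssh/id_rsa is an assumption, chosen only because it matches the certificate's file name, and the key is assumed not to be passphrase protected.

```hcl
# Sketch only: assumes the private key matching the certificate lives at
# ~/.ssh/id_rsa (hypothetical path, not given in the question).
resource "null_resource" "connect" {
  connection {
    type = "ssh"
    host = "<HOST>"
    user = "<USER>"

    bastion_host = "<JUMP_HOST>"
    bastion_user = "<JUMP_USER>"
    # Pass file contents via file(), not a bare path string.
    bastion_private_key = file("~/.ssh/id_rsa")
    bastion_certificate = file("~/.ssh/id_rsa-cert.pub")

    agent   = true
    timeout = "30s"
  }

  provisioner "remote-exec" {
    inline = ["echo connected"]
  }
}
```

If the target host also requires the certificate, the same pairing would apply to private_key and certificate for the final hop.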

Can't ping Terraform created droplets with Ansible

Using Terraform, I have created 3 droplets on DigitalOcean. In the same run, Terraform writes the SSH key and an inventory.txt file to a folder.
Here is how it looks in the Terraform code:
resource "local_file" "servers_ipv4" {
  content = join("\n", [
    for idx, s in module.openvpn_do_infrastructure_module.servers_ipv4:
    <<EOT
${var.droplet_names[idx]} ansible_host=${s} ansible_user=root ansible_ssh_private_key=openvpn_do_ssh.key
EOT
  ])
  filename = "${path.module}/ansible/inventory.txt"
}
resource "local_file" "ssh_keys" {
  content  = module.openvpn_do_infrastructure_module.ssh_keys
  filename = "${path.module}/ansible/openvpn_do_ssh.key"
}
Then there is the ansible folder. After running the script and creating the droplets, it contains 3 files. The first is just ansible.cfg:
[defaults]
host_key_checking = false
inventory = ./inventory.txt
The other 2 are created by Terraform: the SSH key, openvpn_do_ssh.key, and inventory.txt:
certificate-authority-server ansible_host=123.123.123.121 ansible_user=root ansible_ssh_private_key=openvpn_do_ssh.key
openvpn-server ansible_host=123.123.123.122 ansible_user=root ansible_ssh_private_key=openvpn_do_ssh.key
nextcloud-server ansible_host=123.123.123.123 ansible_user=root ansible_ssh_private_key=openvpn_do_ssh.key
And here is the problem. When I do ansible all -m ping, I get errors:
certificate-authority-server | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: root@123.123.123.121: Permission denied (publickey).",
"unreachable": true
}
nextcloud-server | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: root@123.123.123.122: Permission denied (publickey).",
"unreachable": true
}
openvpn-server | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: root@123.123.123.123: Permission denied (publickey).",
"unreachable": true
}
Also, I can connect to those droplets with plain SSH and everything works fine. Even when I change the permissions on the .key file, I still get the same error. I tried to get more logs with the -vvv flag, and here is the most interesting output I found:
ESTABLISH SSH CONNECTION FOR USER: root
...
<123.123.123.121> (255, b'', b"Warning: Permanently added '123.123.123.121' (ED25519) to the list of known hosts.\r\nroot@123.123.123.121: Permission denied (publickey).\r\n")
<123.123.123.121> (255, b'', b'root@123.123.123.121: Permission denied (publickey).\r\n')
I have solved this problem. This is what helped me:
First, I changed the extension of the SSH key file from .key to .pem.
Then I added the private_key_file line to ansible.cfg:
[defaults]
host_key_checking = false
inventory = ./inventory.txt
private_key_file = ./openvpn_do_ssh.pem
Finally, I added a read-only file_permission for the SSH key:
resource "local_file" "ssh_keys" {
  content         = module.openvpn_do_infrastructure_module.ssh_keys
  filename        = "${path.module}/ansible/openvpn_do_ssh.pem"
  file_permission = "0400"
}
Hope it can help someone...
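A possible alternative, sketched under the assumption that the inventory variable name contributed to the failure: the connection variable Ansible documents for this purpose is ansible_ssh_private_key_file (or ansible_private_key_file), so ansible_ssh_private_key in the generated inventory would likely be treated as an ordinary host variable and not used for SSH. With the documented name, the key path can stay in the inventory instead of ansible.cfg. The resource and module names are the ones from the question; the key path is relative to the directory ansible is run from, as before.

```hcl
# Sketch only: same resources as in the question, with the inventory
# variable renamed to the one Ansible documents.
resource "local_file" "servers_ipv4" {
  content = join("\n", [
    for idx, s in module.openvpn_do_infrastructure_module.servers_ipv4 :
    "${var.droplet_names[idx]} ansible_host=${s} ansible_user=root ansible_ssh_private_key_file=openvpn_do_ssh.pem"
  ])
  filename = "${path.module}/ansible/inventory.txt"
}

resource "local_file" "ssh_keys" {
  content         = module.openvpn_do_infrastructure_module.ssh_keys
  filename        = "${path.module}/ansible/openvpn_do_ssh.pem"
  file_permission = "0400" # ssh refuses private keys that others can read
}
```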

How to use passphrase protected private ssh key in terraform?

I am following this tutorial, https://www.digitalocean.com/community/tutorials/how-to-use-ansible-with-terraform-for-configuration-management, to learn Terraform and Ansible.
When I execute terraform apply, it throws an error:
digitalocean_droplet.web[2]: Provisioning with 'remote-exec'...
Error: Failed to parse ssh private key: ssh: this private key is passphrase protected
Error: Error creating droplet: POST https://api.digitalocean.com/v2/droplets: 422 Failed to resolve VPC
on droplets.tf line 1, in resource "digitalocean_droplet" "web":
1: resource "digitalocean_droplet" "web" {
This is the code:
provisioner "remote-exec" {
  inline = ["sudo apt update", "sudo apt install python3 -y", "echo DONE!"]
  connection {
    host        = self.ipv4_address
    type        = "ssh"
    user        = "root"
    private_key = file(var.pvt_key)
  }
}
That private SSH key (~/.ssh/id_rsa) on my machine is passphrase protected. How do I use it?
You can add the desired SSH key to the ssh-agent with ssh-add ~/.ssh/id_rsa and then, in the connection block, replace private_key with agent = true:
connection {
  host  = self.ipv4_address
  type  = "ssh"
  user  = "root"
  agent = true
}

Terraform remote-exec on windows with ssh

I have set up a Windows server and installed ssh using Chocolatey. If I run this manually I have no problems connecting and running my commands. When I try to use Terraform to run my commands, it connects successfully but doesn't run any of them.
I started with WinRM, where I could run commands, but because of a problem creating a Service Fabric cluster over WinRM I decided to try ssh instead. When I ran things manually over ssh it worked and the cluster went up, so that seems to be the way forward.
I have also set up a Linux VM and got ssh working there with a private key. I tried the same config on the Windows server, but it still asked me for my password.
What could be the reason that I can run commands over ssh manually, while Terraform only connects and runs no commands? I am running this on OpenStack with Windows Server 2016.
null_resource.sf_cluster_install (remote-exec): Connecting to remote host via SSH...
null_resource.sf_cluster_install (remote-exec): Host: 1.1.1.1
null_resource.sf_cluster_install (remote-exec): User: Administrator
null_resource.sf_cluster_install (remote-exec): Password: true
null_resource.sf_cluster_install (remote-exec): Private key: false
null_resource.sf_cluster_install (remote-exec): SSH Agent: false
null_resource.sf_cluster_install (remote-exec): Checking Host Key: false
null_resource.sf_cluster_install (remote-exec): Connected!
null_resource.sf_cluster_install: Creation complete after 4s (ID: 5017581117349235118)
Here is the script I'm using to run the commands:
resource "null_resource" "sf_cluster_install" {
  # count = "${local.sf_count}"
  depends_on = ["null_resource.copy_sf_package"]
  # Changes to any instance of the cluster requires re-provisioning
  triggers = {
    cluster_instance_ids = "${openstack_compute_instance_v2.sf_servers.0.id}"
  }
  connection = {
    type = "ssh"
    host = "${openstack_networking_floatingip_v2.sf_floatIP.0.address}"
    user = "Administrator"
    # private_key = "${file("~/.ssh/id_rsa")}"
    password = "${var.admin_pass}"
  }
  provisioner "remote-exec" {
    inline = [
      "echo hello",
      "powershell.exe Write-Host hello",
      "powershell.exe New-Item C:/tmp/hello.txt -type file"
    ]
  }
}
Put the connection block inside the provisioner block:
provisioner "remote-exec" {
  connection = {
    type = "ssh"
    ...
  }
  inline = [
    "echo hello",
    "powershell.exe Write-Host hello",
    "powershell.exe New-Item C:/tmp/hello.txt -type file"
  ]
}
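For readers on Terraform 0.12 or later, here is a hedged sketch of the same idea in the newer syntax, where nested blocks are written without "=" and references need no "${...}" wrapper. The values are copied from the question. The target_platform line is an assumption: it exists only in newer Terraform releases and presumes the server's default SSH shell is cmd.exe.

```hcl
# Sketch only, in Terraform 0.12+ syntax; values are taken from the question.
resource "null_resource" "sf_cluster_install" {
  depends_on = [null_resource.copy_sf_package]

  # Changes to the instance trigger re-provisioning.
  triggers = {
    cluster_instance_ids = openstack_compute_instance_v2.sf_servers[0].id
  }

  provisioner "remote-exec" {
    # Nested block: no "=" between the block name and the brace.
    connection {
      type     = "ssh"
      host     = openstack_networking_floatingip_v2.sf_floatIP[0].address
      user     = "Administrator"
      password = var.admin_pass
      # Assumption: a Terraform release that supports target_platform.
      target_platform = "windows"
    }

    inline = [
      "echo hello",
      "powershell.exe Write-Host hello",
      "powershell.exe New-Item C:/tmp/hello.txt -type file",
    ]
  }
}
```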

Sometimes cannot connect via SSH to GCP VM for a certain amount of time

I am setting up a project with Terraform and Google Compute. In this project I start up multiple VMs and configure them directly afterwards via SSH. Sometimes there is a single or multiple VMs to which I cannot connect via SSH with my usual account. The problem magically disappears after approximately 5 minutes, even if I do not do anything. After this time everything works normally again. I am however able to SSH into the instance with the web interface during the down time.
I am not able to reliably reproduce this issue. It just happens sometimes, to a random number of VMs, for about 5 minutes.
I am pretty lost on this and would really appreciate any pointer as to where I might find a solution.
Here is a short summary of the problem:
Cannot connect to GCP VM via SSH with predefined user
Only happens sometimes (issue is not reliably reproducible)
Only lasts for a few minutes (~5 minutes)
During this time I can SSH into the VM via GCP's web interface
This is the Terraform code I am using to start the instances:
module.google:
variable "project" {}
variable "credentials" {}
variable "count" { default = 1 }
variable "name_machine" {}
variable "zone" {}
provider "google" {
  credentials = "${var.credentials}"
  project     = "${var.project}"
}
resource "google_compute_instance" "vm" {
  count        = "${var.count}"
  zone         = "${var.zone}"
  name         = "${var.name_machine}${count.index}"
  machine_type = "n1-standard-1"
  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-1604-lts"
    }
  }
  network_interface {
    network = "default"
    access_config {
    }
  }
}
EDIT The code I use to SSH into the instance.
resource "null_resource" "node" {
  provisioner "remote-exec" {
    inline = "${data.template_file.start_up_script.rendered}"
  }
  connection {
    user        = "${var.ssh_user}"
    host        = "${var.ip_address}"
    type        = "ssh"
    private_key = "${var.ssh_private_key}"
  }
}
EDIT 2 Terraform Output
null_resource.node (remote-exec): Connecting to remote host via SSH...
null_resource.node (remote-exec): Host: xx.xx.xx.xx
null_resource.node (remote-exec): User: Nopx
null_resource.node (remote-exec): Password: false
null_resource.node (remote-exec): Private key: true
null_resource.node (remote-exec): SSH Agent: false
null_resource.node: Still creating... (10s elapsed)
null_resource.node (remote-exec): Connecting to remote host via SSH...
null_resource.node (remote-exec): Host: xx.xx.xx.xx
null_resource.node (remote-exec): User: Nopx
null_resource.node (remote-exec): Password: false
null_resource.node (remote-exec): Private key: true
null_resource.node (remote-exec): SSH Agent: false
null_resource.node (remote-exec): Connecting to remote host via SSH...
null_resource.node (remote-exec): Host: xx.xx.xx.xx
null_resource.node (remote-exec): User: Nopx
null_resource.node (remote-exec): Password: false
null_resource.node (remote-exec): Private key: true
null_resource.node (remote-exec): SSH Agent: false
null_resource.node: Still creating... (20s elapsed)
null_resource.node (remote-exec): Connecting to remote host via SSH...
null_resource.node (remote-exec): Host: xx.xx.xx.xx
null_resource.node (remote-exec): User: Nopx
null_resource.node (remote-exec): Password: false
null_resource.node (remote-exec): Private key: true
null_resource.node (remote-exec): SSH Agent: false
...