Does a terraform connection w/ bastion work similarly to "ssh -J"?

I am able to connect through an existing jump server using ssh:
ssh -o "CertificateFile ~/.ssh/id_rsa-cert.pub" -J <JUMP_USER>#<JUMP_HOST> <USER>#<HOST> echo connected
And I thought this connection block would work the same way:
resource "null_resource" "connect" {
connection {
type = "ssh"
port = 22
host = "<HOST>"
user = "<USER>"
bastion_port = 22
bastion_host = "<JUMP_HOST>"
bastion_user = "<JUMP_USER>"
bastion_certificate = "~/.ssh/id_rsa-cert.pub"
agent = true
timeout = "30s"
}
provisioner "remote-exec" {
inline = [ "echo connected" ]
}
}
The ssh command itself succeeds:
% ssh -o "CertificateFile ~/.ssh/id_rsa-cert.pub" -J $BASTION_USER@$BASTION_HOST $USER@$HOST echo connected
connected
But terraform apply appears to get stuck in a retry loop:
% terraform apply
[...]
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
null_resource.connect: Destroying... [id=5962206430386145659]
null_resource.connect: Destruction complete after 0s
null_resource.connect: Creating...
null_resource.connect: Provisioning with 'remote-exec'...
null_resource.connect (remote-exec): Connecting to remote host via SSH...
null_resource.connect (remote-exec): Host: 3.87.64.117
null_resource.connect (remote-exec): User: <USER>
null_resource.connect (remote-exec): Password: false
null_resource.connect (remote-exec): Private key: false
null_resource.connect (remote-exec): Certificate: false
null_resource.connect (remote-exec): SSH Agent: true
null_resource.connect (remote-exec): Checking Host Key: false
null_resource.connect (remote-exec): Target Platform: unix
null_resource.connect (remote-exec): Using configured bastion host...
null_resource.connect (remote-exec): Host: <JUMP_HOST>
null_resource.connect (remote-exec): User: <JUMP_USER>
null_resource.connect (remote-exec): Password: false
null_resource.connect (remote-exec): Private key: false
null_resource.connect (remote-exec): Certificate: true
null_resource.connect (remote-exec): SSH Agent: true
null_resource.connect (remote-exec): Checking Host Key: false
The "Connecting to remote host via SSH..." and "Using configured bastion host..." messages repeat until:
null_resource.connect: Still creating... [20s elapsed]
null_resource.connect: Still creating... [30s elapsed]
╷
│ Error: remote-exec provisioner error
│
│ with null_resource.connect,
│ on t.tf line 15, in resource "null_resource" "connect":
│ 15: provisioner "remote-exec" {
│
│ timeout - last error: Error connecting to bastion: ssh: handshake failed: ssh: unable to authenticate,
│ attempted methods [none publickey], no supported methods remain
I've tried a bunch of permutations without success:
setting certificate instead of bastion_certificate
using file("~/.ssh/id_rsa-cert.pub") for certificate and bastion_certificate
setting agent = false
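One thing that may be worth noting: Terraform's private_key, certificate, bastion_private_key and bastion_certificate arguments take the contents of the files (hence file(...)), and the docs pair certificate with private_key (and bastion_certificate with bastion_private_key). A permutation not listed above would be to hand both hops the certificate and the matching key explicitly instead of relying on the agent. A minimal sketch, assuming ~/.ssh/id_rsa is the key that belongs to ~/.ssh/id_rsa-cert.pub and that the same pair is accepted by both the jump host and the target; this is untested, not a confirmed fix:

# Sketch only, not a confirmed fix: pass the key material explicitly instead of
# relying on the agent. Assumes ~/.ssh/id_rsa is the private key that matches
# ~/.ssh/id_rsa-cert.pub and that the same pair is valid for both hops.
resource "null_resource" "connect" {
  connection {
    type = "ssh"
    port = 22
    host = "<HOST>"
    user = "<USER>"

    private_key = file("~/.ssh/id_rsa")          # file contents, not a path
    certificate = file("~/.ssh/id_rsa-cert.pub") # file contents, not a path

    bastion_port        = 22
    bastion_host        = "<JUMP_HOST>"
    bastion_user        = "<JUMP_USER>"
    bastion_private_key = file("~/.ssh/id_rsa")
    bastion_certificate = file("~/.ssh/id_rsa-cert.pub")

    timeout = "30s"
  }

  provisioner "remote-exec" {
    inline = ["echo connected"]
  }
}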

Related

Vagrant multi vm ssh connection setup works on one but not the others

I have searched many similar issues but can't seem to figure out the one I'm having. I have a Vagrantfile with which I set up 3 VMs. I add a public key to each VM so that I can run Ansible against the boxes after the vagrant up command (I don't want to use the ansible provisioner). I forward all the SSH ports on each box.
I can vagrant ssh <server_name> onto each box successfully.
With the following:
ssh vagrant@192.168.56.2 -p 2711 -i ~/.ssh/ansible <-- successful connection
ssh vagrant@192.168.56.3 -p 2712 -i ~/.ssh/ansible <-- connection error
ssh: connect to host 192.168.56.3 port 2712: Connection refused
ssh vagrant@192.168.56.4 -p 2713 -i ~/.ssh/ansible <-- connection error
ssh: connect to host 192.168.56.4 port 2713: Connection refused
And
ssh vagrant@localhost -p 2711 -i ~/.ssh/ansible <-- successful connection
ssh vagrant@localhost -p 2712 -i ~/.ssh/ansible <-- successful connection
ssh vagrant@localhost -p 2713 -i ~/.ssh/ansible <-- successful connection
Ansible can connect to the first one (vagrant@192.168.56.2) but not the other two. I can't seem to find out why it connects to one and not the others. Any ideas what I could be doing wrong?
The Ansible inventory:
{
  "all": {
    "hosts": {
      "kubemaster": {
        "ansible_host": "192.168.56.2",
        "ansible_user": "vagrant",
        "ansible_ssh_port": 2711
      },
      "kubenode01": {
        "ansible_host": "192.168.56.3",
        "ansible_user": "vagrant",
        "ansible_ssh_port": 2712
      },
      "kubenode02": {
        "ansible_host": "192.168.56.4",
        "ansible_user": "vagrant",
        "ansible_ssh_port": 2713
      }
    },
    "children": {},
    "vars": {}
  }
}
The Vagrantfile:
# Define the number of master and worker nodes
NUM_MASTER_NODE = 1
NUM_WORKER_NODE = 2

PRIV_IP_NW = "192.168.56."
MASTER_IP_START = 1
NODE_IP_START = 2

# Vagrant configuration
Vagrant.configure("2") do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
  # https://docs.vagrantup.com.

  # default box
  config.vm.box = "ubuntu/jammy64"

  # automatic box update checking.
  config.vm.box_check_update = false

  # Provision master nodes
  (1..NUM_MASTER_NODE).each do |i|
    config.vm.define "kubemaster" do |node|
      # Name shown in the GUI
      node.vm.provider "virtualbox" do |vb|
        vb.name = "kubemaster"
        vb.memory = 2048
        vb.cpus = 2
      end
      node.vm.hostname = "kubemaster"
      node.vm.network :private_network, ip: PRIV_IP_NW + "#{MASTER_IP_START + i}"
      node.vm.network :forwarded_port, guest: 22, host: "#{2710 + i}"

      # argo and traefik access
      node.vm.network "forwarded_port", guest: 8080, host: "#{8080}"
      node.vm.network "forwarded_port", guest: 9000, host: "#{9000}"

      # synced folder for kubernetes setup yaml
      node.vm.synced_folder "sync_folder", "/vagrant_data", create: true, owner: "root", group: "root"
      node.vm.synced_folder ".", "/vagrant", disabled: true

      # setup the hosts, dns and ansible keys
      node.vm.provision "setup-hosts", :type => "shell", :path => "vagrant/setup-hosts.sh" do |s|
        s.args = ["enp0s8"]
      end
      node.vm.provision "setup-dns", type: "shell", :path => "vagrant/update-dns.sh"
      node.vm.provision "shell" do |s|
        ssh_pub_key = File.readlines("#{Dir.home}/.ssh/ansible.pub").first.strip
        s.inline = <<-SHELL
          echo #{ssh_pub_key} >> /home/vagrant/.ssh/authorized_keys
          echo #{ssh_pub_key} >> /root/.ssh/authorized_keys
        SHELL
      end
    end
  end

  # Provision Worker Nodes
  (1..NUM_WORKER_NODE).each do |i|
    config.vm.define "kubenode0#{i}" do |node|
      node.vm.provider "virtualbox" do |vb|
        vb.name = "kubenode0#{i}"
        vb.memory = 2048
        vb.cpus = 2
      end
      node.vm.hostname = "kubenode0#{i}"
      node.vm.network :private_network, ip: PRIV_IP_NW + "#{NODE_IP_START + i}"
      node.vm.network :forwarded_port, guest: 22, host: "#{2711 + i}"

      # synced folder for kubernetes setup yaml
      node.vm.synced_folder ".", "/vagrant", disabled: true

      # setup the hosts, dns and ansible keys
      node.vm.provision "setup-hosts", :type => "shell", :path => "vagrant/setup-hosts.sh" do |s|
        s.args = ["enp0s8"]
      end
      node.vm.provision "setup-dns", type: "shell", :path => "vagrant/update-dns.sh"
      node.vm.provision "shell" do |s|
        ssh_pub_key = File.readlines("#{Dir.home}/.ssh/ansible.pub").first.strip
        s.inline = <<-SHELL
          echo #{ssh_pub_key} >> /home/vagrant/.ssh/authorized_keys
          echo #{ssh_pub_key} >> /root/.ssh/authorized_keys
        SHELL
      end
    end
  end
end
Your Vagrantfile confirms what I suspected:
You define port forwarding as follows:
node.vm.network :forwarded_port, guest: 22, host: "#{2710 + i}"
That means port 22 of the guest is made reachable on the host at port 2710+i. For your 3 VMs, from the host's point of view, this means:
192.168.2.1:22 -> localhost:2711
192.168.2.2:22 -> localhost:2712
192.168.2.3:22 -> localhost:2713
As IP addresses for your VMs you have defined the range 192.168.2.0/24, but you try to access the range 192.168.56.0/24.
If a private IP address is defined (e.g. 192.168.2.2 for your first node), Vagrant implements this in the VM on VirtualBox as follows:
Two network adapters are defined for the VM:
NAT: this gives the VM Internet access
Host-Only: this gives the host access to the VM via IP 192.168.2.2.
For each /24 network, VirtualBox (and Vagrant) creates a separate VirtualBox Host-Only Ethernet Adapter, and the host is .1 on each of these networks.
What this means for you is that if you use an IP address from the 192.168.2.0/24 network, an adapter is created on your host that always gets the IP address 192.168.2.1/24, so you have the addresses 192.168.2.2 - 192.168.2.254 available for your VMs.
This means your master's IP address collides with your host's!
But why does the access to your first VM work?
ssh vagrant@192.168.56.1 -p 2711 -i ~/.ssh/ansible <-- successful connection
That is relatively simple: The network 192.168.56.0/24 is the default network for Host-Only under VirtualBox, so you probably have a VirtualBox Host-Only Ethernet Adapter with the address 192.168.56.1/24.
Because you defined port forwarding in your Vagrantfile, the first VM's SSH is mapped to localhost:2711. So when you access 192.168.56.1:2711 you are reaching your own host, i.e. localhost, where port 2711 is mapped to the SSH of the first VM.
So what do you have to do now?
Change the IP addresses of your VMs, e.g. use 192.168.2.11 - 192.168.2.13.
The access to the VMs is possible as follows:
Node         via Guest-IP        via localhost
kubemaster   192.168.2.11:22     localhost:2711
kubenode01   192.168.2.12:22     localhost:2712
kubenode02   192.168.2.13:22     localhost:2713
Note: if you want to connect via the guest IP address, use port 22; if you want to connect via localhost, use the port 2710+i that you defined.

proxycommand doesn't seem to work with ansible and my environment

I've tried many combinations to get this to work but cannot for some reason. I am not using keys in our environment so passwords will have to do.
I've tried proxyjump and sshuttle as well.
It's strange: the ping module works, but when I try another module or a playbook it doesn't.
Rough set up is:
laptop running ubuntu with ansible installed
[laptop] ---> [productionjumphost] ---> [production_iosxr_router]
ansible.cfg:
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=30m -o ControlPath=/tmp/ansible-%r@%h:%p -F ssh.config
~/.ssh/config and ssh.cfg (both contain the same configuration):
Host modeljumphost
  HostName modeljumphost.fqdn.com.au
  User user
  Port 22

Host productionjumphost
  HostName productionjumphost.fqdn.com.au
  User user
  Port 22

Host model_iosxr_router
  HostName model_iosxr_router
  User user
  ProxyCommand ssh -W %h:22 modeljumphost

Host production_iosxr_router
  HostName production_iosxr_router
  User user
  ProxyCommand ssh -W %h:22 productionjumphost
inventory:
[local]
192.168.xxx.xxx
[router]
production_iosxr_router ansible_connection=network_cli ansible_user=user ansible_ssh_pass=password
[router:vars]
ansible_network_os=iosxr
ansible_ssh_common_args: '-o ProxyCommand="ssh -W %h:%p -q user@productionjumphost.fqdn.com.au"'
ansible_user=user
ansible_ssh_pass=password
playbook.yml:
---
- name: Network Getting Started First Playbook
  hosts: router
  gather_facts: no
  connection: network_cli

  tasks:
    - name: show version
      iosxr_command:
        commands: show version
I can run an ad-hoc ansible command and a successful ping is returned:
result: ansible production_iosxr_router -i inventory -m ping -vvvvv
production_iosxr_router | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "invocation": {
        "module_args": {
            "data": "pong"
        }
    },
    "ping": "pong"
}
running playbook: ansible-playbook -i inventory playbook.yml -vvvvv
production_iosxr_router | FAILED! => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "msg": "[Errno -2] Name or service not known"
}

Remote-exec provisioner on gcp not connecting with host

I'm trying to use the remote-exec provisioner for a use case in my project on GCP, using Terraform 0.12 and the format specified in the Terraform docs. I get a known hosts key mismatch error after the provisioner times out.
resource "google_compute_instance" "secondvm" {
name = "secondvm"
machine_type = "n1-standard-1"
zone = "us-central1-a"
boot_disk {
initialize_params {
image = "centos-7-v20190905"
}
}
network_interface {
network = "default"
access_config {
nat_ip = google_compute_address.second.address
network_tier = "PREMIUM"
}
}
#metadata = {
#ssh-keys = "root:${file("~/.ssh/id_rsa.pub")}"
#}
metadata_startup_script = "cd /; touch makefile.txt; sudo echo \"string xyz bgv\" >>./makefile.txt"
provisioner "remote-exec" {
inline = [
"sudo sed -i 's/xyz/google_compute_address.first.address/gI' /makefile.txt"
]
connection {
type = "ssh"
#port = 22
host = self.network_interface[0].access_config[0].nat_ip
user = "root"
timeout = "120s"
#agent = false
private_key = file("~/.ssh/id_rsa")
#host_key = file("~/.ssh/google_compute_engine.pub")
host_key = file("~/.ssh/id_rsa.pub")
}
}
depends_on = [google_compute_address.second]
}
I'm not sure what exactly I'm doing wrong with the keys here, but the error I get is:
google_compute_instance.secondvm: Still creating... [2m10s elapsed]
google_compute_instance.secondvm (remote-exec): Connecting to remote host via SSH...
google_compute_instance.secondvm (remote-exec): Host: 104.155.186.128
google_compute_instance.secondvm (remote-exec): User: root
google_compute_instance.secondvm (remote-exec): Password: false
google_compute_instance.secondvm (remote-exec): Private key: true
google_compute_instance.secondvm (remote-exec): Certificate: false
google_compute_instance.secondvm (remote-exec): SSH Agent: false
google_compute_instance.secondvm (remote-exec): Checking Host Key: true
google_compute_instance.secondvm: Still creating... [2m20s elapsed]
Error: timeout - last error: SSH authentication failed (root@104.155.186.128:22): ssh: handshake failed: knownhosts: key mismatch
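One thing that stands out: host_key is meant to hold the remote server's host key (what would normally live in known_hosts), so pointing it at your own ~/.ssh/id_rsa.pub would plausibly produce exactly this knownhosts: key mismatch. Below is a sketch of an alternative to try, with loud assumptions: host_key is dropped entirely so the provisioner does not verify the host key, the commented-out metadata block is re-enabled for a regular user ("centos" here is purely a placeholder for whatever user your key should map to, since GCE images normally disallow direct root SSH), and the sed command interpolates the address via ${...} rather than passing the literal text:

resource "google_compute_instance" "secondvm" {
  name         = "secondvm"
  machine_type = "n1-standard-1"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "centos-7-v20190905"
    }
  }

  network_interface {
    network = "default"
    access_config {
      nat_ip       = google_compute_address.second.address
      network_tier = "PREMIUM"
    }
  }

  # Push the public key for a non-root user ("centos" is an assumption --
  # use whichever user your image/key is set up for).
  metadata = {
    ssh-keys = "centos:${file("~/.ssh/id_rsa.pub")}"
  }

  metadata_startup_script = "cd /; touch makefile.txt; sudo echo \"string xyz bgv\" >>./makefile.txt"

  provisioner "remote-exec" {
    inline = [
      # interpolate the address instead of passing the literal text to sed
      "sudo sed -i 's/xyz/${google_compute_address.first.address}/gI' /makefile.txt"
    ]

    connection {
      type        = "ssh"
      host        = self.network_interface[0].access_config[0].nat_ip
      user        = "centos"
      private_key = file("~/.ssh/id_rsa")
      timeout     = "120s"
      # no host_key here: without it the provisioner does not verify the host key
    }
  }

  depends_on = [google_compute_address.second]
}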

Terraform remote-exec on windows with ssh

I have set up a Windows server and installed SSH using Chocolatey. If I run my commands manually I have no problems connecting and running them. When I try to use Terraform to run my commands it connects successfully but doesn't run any commands.
I started by using WinRM, and I could run commands that way, but due to a problem with creating a Service Fabric cluster over WinRM I decided to try SSH instead; when running things manually it worked and the cluster came up. So that seems to be the way forward.
I have also set up a Linux VM and got SSH working by using the private key. I tried to use the same config as on the Linux VM on the Windows machine, but it still asked me for my password.
What could the reason be that I can run commands over SSH manually, while Terraform only connects and no commands are run? I am running this on OpenStack with Windows Server 2016.
null_resource.sf_cluster_install (remote-exec): Connecting to remote host via SSH...
null_resource.sf_cluster_install (remote-exec): Host: 1.1.1.1
null_resource.sf_cluster_install (remote-exec): User: Administrator
null_resource.sf_cluster_install (remote-exec): Password: true
null_resource.sf_cluster_install (remote-exec): Private key: false
null_resource.sf_cluster_install (remote-exec): SSH Agent: false
null_resource.sf_cluster_install (remote-exec): Checking Host Key: false
null_resource.sf_cluster_install (remote-exec): Connected!
null_resource.sf_cluster_install: Creation complete after 4s (ID: 5017581117349235118)
Here is the script I'm using to run the commands:
resource "null_resource" "sf_cluster_install" {
# count = "${local.sf_count}"
depends_on = ["null_resource.copy_sf_package"]
# Changes to any instance of the cluster requires re-provisioning
triggers = {
cluster_instance_ids = "${openstack_compute_instance_v2.sf_servers.0.id}"
}
connection = {
type = "ssh"
host = "${openstack_networking_floatingip_v2.sf_floatIP.0.address}"
user = "Administrator"
# private_key = "${file("~/.ssh/id_rsa")}"
password = "${var.admin_pass}"
}
provisioner "remote-exec" {
inline = [
"echo hello",
"powershell.exe Write-Host hello",
"powershell.exe New-Item C:/tmp/hello.txt -type file"
]
}
}
Put the connection block inside the provisioner block:
provisioner "remote-exec" {
connection = {
type = "ssh"
...
}
inline = [
"echo hello",
"powershell.exe Write-Host hello",
"powershell.exe New-Item C:/tmp/hello.txt -type file"
]
}
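Combined with the resource from the question, the result might look something like this (a sketch only: values are carried over from the question, and it is written with the connection { ... } block syntax used by current Terraform versions):

resource "null_resource" "sf_cluster_install" {
  depends_on = [null_resource.copy_sf_package]

  # Changes to any instance of the cluster require re-provisioning
  triggers = {
    cluster_instance_ids = openstack_compute_instance_v2.sf_servers[0].id
  }

  provisioner "remote-exec" {
    # connection lives inside the provisioner block
    connection {
      type     = "ssh"
      host     = openstack_networking_floatingip_v2.sf_floatIP[0].address
      user     = "Administrator"
      password = var.admin_pass
    }

    inline = [
      "echo hello",
      "powershell.exe Write-Host hello",
      "powershell.exe New-Item C:/tmp/hello.txt -type file",
    ]
  }
}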

Sometimes cannot connect via SSH to to GCP VM for a certain amount of time

I am setting up a project with Terraform and Google Compute. In this project I start up multiple VMs and configure them via SSH directly afterwards. Sometimes there are one or more VMs to which I cannot connect via SSH with my usual account. The problem magically disappears after approximately 5 minutes, even if I do not do anything; after that, everything works normally again. I am, however, able to SSH into the instance with the web interface during the downtime.
I am not able to reliably reproduce this issue. It just magically happens sometimes to a random amount of VMs for about 5 minutes.
I am pretty lost on this and would really appreciate any pointer as to where I might find a solution.
Here is a short summary of the problem:
Cannot connect to GCP VM via SSH with a predefined user
Only happens sometimes (the issue is not reliably reproducible)
Only lasts for a few minutes (~5 minutes)
During this time I can SSH into the VM via GCP's web interface
This is the Terraform code I am using to start the instances:
module.google:
variable "project"{}
variable "credentials"{}
variable "count"{default = 1}
variable "name_machine"{}
variable "zone"{}
provider "google" {
credentials = "${var.credentials}"
project = "${var.project}"
}
resource "google_compute_instance" "vm" {
count = "${var.count}"
zone = "${var.zone}"
name = "${var.name_machine}${count.index}"
machine_type = "n1-standard-1"
boot_disk {
initialize_params {
image = "ubuntu-os-cloud/ubuntu-1604-lts"
}
}
network_interface {
network = "default"
access_config {
}
}
}
EDIT: The code I use to SSH into the instance:
resource "null_resource" "node"{
provisioner "remote-exec" {
inline="${data.template_file.start_up_script.rendered}"
}
connection {
user = "${var.ssh_user}"
host = "${var.ip_address}"
type = "ssh"
private_key="${var.ssh_private_key}"
}
}
EDIT 2: Terraform output
null_resource.node (remote-exec): Connecting to remote host via SSH...
null_resource.node (remote-exec): Host: xx.xx.xx.xx
null_resource.node (remote-exec): User: Nopx
null_resource.node (remote-exec): Password: false
null_resource.node (remote-exec): Private key: true
null_resource.node (remote-exec): SSH Agent: false
null_resource.node: Still creating... (10s elapsed)
null_resource.node (remote-exec): Connecting to remote host via SSH...
null_resource.node (remote-exec): Host: xx.xx.xx.xx
null_resource.node (remote-exec): User: Nopx
null_resource.node (remote-exec): Password: false
null_resource.node (remote-exec): Private key: true
null_resource.node (remote-exec): SSH Agent: false
null_resource.node (remote-exec): Connecting to remote host via SSH...
null_resource.node (remote-exec): Host: xx.xx.xx.xx
null_resource.node (remote-exec): User: Nopx
null_resource.node (remote-exec): Password: false
null_resource.node (remote-exec): Private key: true
null_resource.node (remote-exec): SSH Agent: false
null_resource.node: Still creating... (20s elapsed)
null_resource.node (remote-exec): Connecting to remote host via SSH...
null_resource.node (remote-exec): Host: xx.xx.xx.xx
null_resource.node (remote-exec): User: Nopx
null_resource.node (remote-exec): Password: false
null_resource.node (remote-exec): Private key: true
null_resource.node (remote-exec): SSH Agent: false
...