delegate_to ignores configured ssh_port - ssh

Use-Case:
We deploy virtual machines into a cloud from a default Linux image (Ubuntu 22.04 at the moment). After deploying a machine, we configure our default users and change the SSH port from 22 to 2222 with Ansible.
Side note: We use a jump concept through the internet: Ansible Automation Platform / AWS => internet => SSH jump host => target host
To keep Ansible able to connect to the new machine after the SSH port changes, I followed several Stack Overflow answers and blog posts that check and set ansible_ssh_port: run wait_for on ports 22 and 2222, then set the SSH variable depending on the result (code below).
Right now this works fine for the first SSH host (the jumphost), but it always fails for the second host because the SSH connection cannot be established.
Side note: The SSH daemon is running. If I use my user from the jump host, I get an SSH response from 22/2222 (depending on the current state of deployment).
Edit from questions:
The deployment tasks should only run on the target host, not on the jumphost as well.
I run the deployment on the jumphost first and make sure it is up, running and configured.
After that, I run the deployment on all machines behind the jumphost to configure them.
This also ensures that if I ever need a reboot, I don't kill all tunneled SSH sessions by accident.
Ansible inventory example
all:
  hosts:
  children:
    jumphosts:
      hosts:
        example_jumphost:
          ansible_host: 123.123.123.123
    cloud_hosts:
      hosts:
        example_cloud_host01: #local DNS is resolved on the jumphost - no ansible_host here (yet)
          ansible_ssh_common_args: '-oProxyCommand="ssh -W %h:%p -oStrictHostKeyChecking=no -q ansible@123.123.123.123 -p 2222"' #Tunnel through the appropriate jumphost
          delegation_host: "ansible@123.123.123.123" #delegate jobs to the jumphost in each project if needed
  vars:
    ansible_ssh_port: 2222
SSH check_port role
- name: Set SSH port to 2222
  set_fact:
    ansible_ssh_port: 2222

- name: "Check backend port 2222"
  wait_for:
    port: 2222
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  # delegate_to: "{{ delegation_host }}"
  # vars:
  #   ansible_ssh_port: 2222
  ignore_errors: true
  register: ssh_port

- name: "Check backend port 22"
  wait_for:
    port: "22"
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  # delegate_to: "{{ delegation_host }}"
  # vars:
  #   ansible_ssh_port: 2222
  ignore_errors: true
  register: ssh_port_default
  when:
    - ssh_port is defined
    - ssh_port.state is undefined

- name: Set backend SSH port to 22
  set_fact:
    ansible_ssh_port: 22
  when:
    - ssh_port_default.state is defined
The playbook itself
- hosts: "example_cloud_host01"
  gather_facts: false
  roles:
    - role: check_port #check if we already have the correct port or need 22
    - role: sshd #Set Port to 2222 and restart sshd
    - role: check_port #check the port again, after it has been changed
    - role: install_apps
    - role: configure_apps
Error message:
with delegate_to for the task Check backend port 2222:
fatal: [example_cloud_host01 -> ansible@123.123.123.123]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 123.123.123.123 port 22: Connection refused", "unreachable": true}
This confuses me, because I expect the delegation host to use the same ansible_ssh_port as the target host.
Without delegate_to for task Check backend port 2222 and Check backend port 22:
fatal: [example_cloud_host01]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": false, "elapsed": 5, "msg": "Timeout when waiting for example_cloud_host01:2222"}
fatal: [example_cloud_host01]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": false, "elapsed": 5, "msg": "Timeout when waiting for example_cloud_host01:22"}
I have no idea why this happens. If I try the connection manually, it works fine.
What I tried so far:
I played around with delegate_to, vars, ... as mentioned above.
I wanted to see if I can provide delegate_to with the proper port 2222 for the jump host.
I wanted to see if I can run this without delegate_to (since it should automatically use the proxy command to run on the jump host anyway).
Neither way gave me a solution for connecting to my second-tier servers after changing the SSH port.
Right now, I split the playbook into two:
1. deploy sshd config with port 22
2. run our full deploy afterwards on port 2222

I would do the following. (I only loosely tested this, with fake values in the inventory and localhost as a jumphost, so that the ports are checked on localhost as well.)
Edit: I modified my examples to show you a possible way forward after your comments on your question and on this answer.
Inventory
---
all:
  vars:
    ansible_ssh_port: 2222

proxies:
  vars:
    ansible_user: ansible
  hosts:
    example_jumphost1:
      ansible_host: 123.123.123.123
    example_jumphost2:
      ansible_host: 231.231.231.231
    # ... and more jump hosts ...

cloud_hosts:
  vars:
    jump_vars: "{{ hostvars[jump_host] }}"
    ansible_ssh_common_args: '-oProxyCommand="ssh -W %h:%p -oStrictHostKeyChecking=no -q {{ jump_vars.ansible_user }}@{{ jump_vars.ansible_host }} -p {{ jump_vars.ansible_ssh_port | d(22) }}"'
  children:
    cloud_hosts_north:
      vars:
        jump_host: example_jumphost1
      hosts:
        example_cloud_host01:
        example_cloud_host02:
        # ... and more ...
    cloud_hosts_south:
      vars:
        jump_host: example_jumphost2
      hosts:
        example_cloud_host03:
        example_cloud_host04:
        # ... and more ...
    # ... and more cloud groups ...
Tasks to check ports.
- name: "Check backend inventory configured port {{ ansible_ssh_port }}"
  wait_for:
    port: "{{ ansible_ssh_port }}"
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  delegate_to: "{{ jump_host }}"
  ignore_errors: true
  register: ssh_port

- name: "Check backend default ssh port if relevant"
  wait_for:
    port: "22"
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  delegate_to: "{{ jump_host }}"
  ignore_errors: true
  register: ssh_port_default
  when: ssh_port is failed

- name: "Set backend SSH port to 22 if we did not change it yet"
  set_fact:
    ansible_ssh_port: 22
  when:
    - ssh_port_default is not skipped
    - ssh_port_default is success
Please note that if checks for ports 22/2222 both fail, your configured port will still be 2222 but any later task will obviously fail. You might want to fail fast after checks for those relevant hosts:
- name: "Fail host if no port is available"
  fail:
    msg:
      - "Host {{ inventory_hostname }} does not have"
      - "any ssh port available (tested 22 and 2222)"
  when:
    - ssh_port is failed
    - ssh_port_default is failed
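To confirm which port each host ended up with after the checks above, a debug task can print the result. This is a minimal sketch, assuming the `jump_host` variable from the inventory above and that the task is placed after the port checks; the task name and message wording are only illustrative.

```yaml
# Illustrative only: prints the port chosen by the preceding checks.
# Assumes ansible_ssh_port and jump_host are defined as in the inventory above.
- name: "Show the SSH port selected for this host"
  debug:
    msg: "{{ inventory_hostname }} will be reached on port {{ ansible_ssh_port }} through {{ jump_host }}"
```

Because debug runs on the controller, this task should work even when the target host is not reachable yet.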
With this in place, you can use different targets on your play to reach the relevant hosts:
For jump hosts:
- Run on a single bastion host: e.g. hosts: example_jumphost1
- Run on all bastion hosts: hosts: proxies
For cloud hosts:
- Run on all cloud hosts: hosts: cloud_hosts
- Run on a single child group: e.g. hosts: cloud_hosts_north
- Run on all cloud hosts except a subgroup: e.g. hosts: cloud_hosts:!cloud_hosts_south
For more, see the Ansible documentation on patterns.
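As an illustration of the last pattern, a play header could look like the sketch below. It assumes the group names from the inventory above and reuses the check_port and sshd roles from the question; treat it as an outline, not a tested playbook.

```yaml
# Sketch: target every cloud host except the "south" subgroup.
# gather_facts is disabled because the SSH port may still be wrong
# until the check_port role has run.
- hosts: "cloud_hosts:!cloud_hosts_south"
  gather_facts: false
  roles:
    - role: check_port
    - role: sshd
    - role: check_port
```

Quoting the pattern keeps YAML from misreading the colon and the exclamation mark.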

Related

Ansible: Issues with vars_prompt, error: variable is undefined

I think I may be using vars_prompt incorrectly, because when I define a variable (used as a host) on the command line, the host is used correctly for the following task:
ansible-playbook newfile -v -e 'target_host=uat:prd'
- hosts: '{{ target_host }}'
  tasks:
    ...
But when I define the same variable using vars_prompt:
- name: run task
  hosts: localhost
  gather_facts: no
  vars_prompt:
    - name: target_host
      prompt: please choose a host site
      private: no

- hosts: '{{ target_host }}'
  tasks:
    ...
I get the error 'target_host' is undefined, pointing at the - hosts: '{{ target_host }}' line.
Note: it does show the prompt before the error occurs.
Thank you for the suggestion to add the value to a host group, @JBone. Sadly, I have already tried this approach and I get:
Failed to connect to the host via ssh: ssh: Could not resolve hostname uat:prd: Name or service not known
This happens even though, if I write uat:prd directly as the hosts value in the playbook, it runs on each host.
The approach does work for uat or prd by themselves, but not for uat:prd.
You should add this variable value to a new host group using the add_host module.
- name: run task
  hosts: localhost
  gather_facts: no
  vars_prompt:
    - name: target_host
      prompt: please choose a host site
      private: no
  tasks:
    - name: add host
      add_host:
        name: "{{ target_host }}"
        groups: new_hosts_grp

- hosts: new_hosts_grp
  tasks:
    ...
Try that one.
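Since add_host treats uat:prd as a single host name, one commonly used alternative is to store the prompted value as a fact on localhost and read it back through hostvars in the next play, so that it is evaluated as a host pattern. The sketch below is untested here, and the fact name target_pattern is hypothetical.

```yaml
# Sketch: keep the prompted pattern as a fact on localhost so a later play
# can use it. "target_pattern" is a made-up fact name.
- name: ask for the target pattern
  hosts: localhost
  gather_facts: no
  vars_prompt:
    - name: target_host
      prompt: please choose a host site
      private: no
  tasks:
    - name: store the prompted value as a fact on localhost
      set_fact:
        target_pattern: "{{ target_host }}"

# The hosts value is evaluated as a pattern, so uat:prd selects both groups.
- hosts: "{{ hostvars['localhost']['target_pattern'] }}"
  tasks:
    - debug:
        msg: "running on {{ inventory_hostname }}"
```

The vars_prompt variable itself only exists in the first play, which is why the second play has to go through hostvars.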

Ansible wait_for not response correct error

I am trying to test a batch of connections, but for every connection that fails the error response is "Timeout", although I know (I tested) that some of them are "no route to host".
How can I get that distinction with wait_for in Ansible?
- name: Test connectivity flow
  wait_for:
    host: "{{ item.destination_ip }}"
    port: "{{ item.destination_port }}"
    state: started # Port should be open
    delay: 0 # No wait before first check (sec)
    timeout: 3 # Stop checking after timeout (sec)
  delegate_to: "{{ item.source_ip }}"
  failed_when: false
  register: test_connectivity_flow_result

- name: Append result message to result list msg
  set_fact:
    result_list_msg: "{% if test_connectivity_flow_result.msg is defined %}{{ result_list_msg + [test_connectivity_flow_result.msg] }}{% else %}{{ result_list_msg + [ '' ] }}{% endif %}"
Current response: Timeout when waiting for 1.1.1.1:1040
Expected response: No route to host 1.1.1.1:1040
Quoting the title of the documentation of the wait_for module:
wait_for – Waits for a condition before continuing
If I "rephrase" the condition you have written, this would give something like: "wait for host X to be a resolvable destination and for port 22 to be opened on that destination, retry with no delay and time out after 3s".
This could typically be a test you launch because you started a new VM and registered it in a DNS. So you wait for the DNS to propagate AND for the SSH port to be available.
In your case, you get a timeout because your hostname never becomes a resolvable address.
If you specifically want to test that there is no route to host and don't want to wait until the route eventually becomes available, you need to do that another way. Here is a simple example playbook with the ping module:
---
- name: Very basic connection test
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Test if host is reachable (will report no route if so)
      ping:
      delegate_to: nonexistent.host.local
Which results in:
PLAY [Very basic connection test] *****************************************************
TASK [Test if host is reachable (will report no route if so)] *************************
fatal: [localhost]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname nonexistent.host.local: Name or service not known", "unreachable": true}
PLAY RECAP ****************************************************************************
localhost : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
Note that the ping module:
- Will report that there is no route to host if so
- Will implicitly try to connect to port 22
- Will make sure that the host has Python installed and is ready to be managed through Ansible.
If the host you are trying to check does not need to meet all of the above conditions (e.g. you want the test to succeed even if Python is not installed), you will need another scenario. Running the ICMP ping through the command module is one of several solutions.
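The ICMP option mentioned above could be sketched as follows. The list name connectivity_flows is hypothetical (the question does not show its loop), and the flags assume the Linux iputils ping on the source host. Note that ICMP only tests reachability of the address, not whether the destination port is open.

```yaml
# Sketch: send one ICMP echo request from each source host.
# "connectivity_flows" is a made-up list of items with source_ip and destination_ip.
- name: Send one ICMP echo request from the source host
  command: "ping -c 1 -W 3 {{ item.destination_ip }}"
  delegate_to: "{{ item.source_ip }}"
  loop: "{{ connectivity_flows }}"
  register: icmp_check
  changed_when: false   # ping changes nothing on the host
  failed_when: false    # collect the result instead of failing the play

- name: Show the return code and output of each ping
  debug:
    msg: "{{ item.item.destination_ip }} rc={{ item.rc }} {{ item.stdout }} {{ item.stderr }}"
  loop: "{{ icmp_check.results }}"
```

Whatever ping itself printed, including any network error text, then ends up in the registered stdout or stderr rather than being reduced to a generic timeout message.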
I ended up doing something like this:
- name: Check if port {{ R_PORT }} is open. If it is, let's fail and investigate manually.
  shell: ss -ltpn | grep :{{ R_PORT }} | wc -l
  register: open_redis_port

- debug: msg="{{ open_redis_port.stdout }}"

- name: Fail playbook execution if port {{ R_PORT }} is open
  fail:
    msg: Port {{ R_PORT }} is open. Failing, please investigate manually.
  when: open_redis_port.stdout == "2" or open_redis_port.stdout == "1"

Set ansible connection differently according to given condition

I have a playbook in which, for one of the hosts, the way I need to connect depends on whether certain tasks have previously succeeded.
In this specific case there's a tunnel between two of the hosts, and one routes all its traffic over that tunnel, so once it is configured I need to use the other as a jump box in order to connect. I can imagine many other circumstances where you might want to change the connection method mid-playbook, down to something as simple as modifying users/passwords.
How can I have a conditional connection method?
I can't simply update with set_fact, since by the time I reach that task ansible will already have tried and possibly failed to 'gather facts' at the start, and won't proceed.
The devil is in the details for such a question, for sure, but in general I think use of add_host will be the most legible way to do what you want. You can also change the connection on a per-task basis, or conditionally change the connection for the whole playbook against that host:
- hosts: all
  connection: ssh # <-- or whatever bootstrap connection plugin
  gather_facts: no
  tasks:
    - command: echo "do something here"
      register: the_thing

    # now, you can either switch to the alternate connection per task:
    - command: echo "do the other thing"
      connection: lxd # <-- or whatever
      when: the_thing is success

    # OR, you can make the alternate connection the default
    # for the rest of the current playbook
    - name: switch the rest of the playbook
      set_fact:
        ansible_connection: chroot
      when: the_thing is success

    # OR, perhaps run another playbook using the alternate connection
    # by adding the newly configured host to a special group
    - add_host:
        name: '{{ ansible_host }}'
        groups:
          - configured_hosts
      when: the_thing is success

# and then running the other playbook against configured hosts
- hosts: configured_hosts
  connection: docker # <-- or whatever connection you want
  tasks:
    - setup:
I use the following snippet as a role and invoke this role depending on the situation whether I need jumphost(bastion or proxy) or not. An example is also given in the comments. This role can add multiple hosts at the same time. Put the following contents in roles/inventory/tasks/main.yml
# Description: |
#   Adds given hosts to inventory.
# Inputs:
#   hosts_info: |
#     (mandatory)
#     List of hosts with the structure which looks like this:
#
#     - name: <host name>
#       address: <url or ip address of host>
#       groups: [] list of groups to which this host will be added.
#       user: <SSH user>
#       ssh_priv_key_path: <private key path for ssh access to host>
#       proxy: <define following structure if host should be accessed using proxy>
#         ssh_priv_key_path: <priv key path for ssh access to proxy node>
#         user: <login user on proxy node>
#         host: <proxy host address>
#
# Example Usage:
#   - include_role:
#       name: inventory
#     vars:
#       hosts_info:
#         - name: controller-0
#           address: 10.100.10.13
#           groups:
#             - controller
#           user: user1
#           ssh_priv_key_path: /home/user/.ssh/id_rsa
#         - name: node-0
#           address: 10.10.1.14
#           groups:
#             - worker
#             - nodes
#           user: user1
#           ssh_priv_key_path: /home/user/.ssh/id_rsa
#           proxy:
#             ssh_priv_key_path: /home/user/jumphost_key.rsa.priv
#             user: jumphost-user
#             host: 10.100.10.13
- name: validate inventory input
  assert:
    that:
      - "single_host_info.name is defined"
      - "single_host_info.groups is defined"
      - "single_host_info.address is defined"
      - "single_host_info.user is defined"
      - "single_host_info.ssh_priv_key_path is defined"
  loop: "{{ hosts_info }}"
  loop_control:
    loop_var: single_host_info

- name: validate inventory proxy input
  assert:
    that:
      - "single_host_info.proxy.host is defined"
      - "single_host_info.proxy.user is defined"
      - "single_host_info.proxy.ssh_priv_key_path is defined"
  when: "single_host_info.proxy is defined"
  loop: "{{ hosts_info }}"
  loop_control:
    loop_var: single_host_info

- name: Add hosts to inventory without proxy
  add_host:
    groups: "{{ single_host_info.groups | join(',') }}"
    name: "{{ single_host_info.name }}"
    host: "{{ single_host_info.name }}"
    hostname: "{{ single_host_info.name }}"
    ansible_host: "{{ single_host_info.address }}"
    ansible_connection: ssh
    ansible_ssh_user: "{{ single_host_info.user }}"
    ansible_ssh_extra_args: "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
    ansible_ssh_private_key_file: "{{ single_host_info.ssh_priv_key_path }}"
  loop: "{{ hosts_info | json_query(\"[?contains(keys(@), 'proxy') == `false`]\") | list }}"
  loop_control:
    loop_var: single_host_info

- name: Add hosts to inventory with proxy
  add_host:
    groups: "{{ single_host_info.groups | join(',') }}"
    name: "{{ single_host_info.name }}"
    host: "{{ single_host_info.name }}"
    hostname: "{{ single_host_info.name }}"
    ansible_host: "{{ single_host_info.address }}"
    ansible_connection: ssh
    ansible_ssh_user: "{{ single_host_info.user }}"
    ansible_ssh_extra_args: "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
    ansible_ssh_private_key_file: "{{ single_host_info.ssh_priv_key_path }}"
    ansible_ssh_common_args: >-
      -o ProxyCommand='ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
      -W %h:%p -q -i {{ single_host_info.proxy.ssh_priv_key_path }}
      {{ single_host_info.proxy.user }}@{{ single_host_info.proxy.host }}'
  loop: "{{ hosts_info | json_query(\"[?contains(keys(@), 'proxy') == `true`]\") }}"
  loop_control:
    loop_var: single_host_info

Ansible how to ignore unreachable hosts before ansible 2.7.x

I'm using ansible to run a command against multiple servers at once. I want to ignore any hosts that fail because of the '"SSH Error: data could not be sent to remote host \"1.2.3.4\". Make sure this host can be reached over ssh"' error because some of the hosts in the list will be offline. How can I do this? Is there a default option in ansible to ignore offline hosts without failing the playbook? Is there an option to do this in a single ansible cli argument outside of a playbook?
Update: I am aware that the ignore_unreachable: true works for ansible 2.7 or greater, but I am working in an ansible 2.6.1 environment.
I found a good solution here. You ping each host locally to see if you can connect and then run commands against the hosts that passed:
---
- hosts: all
  connection: local
  gather_facts: no
  tasks:
    - block:
        - name: determine hosts that are up
          wait_for_connection:
            timeout: 5
          vars:
            ansible_connection: ssh
        - name: add devices with connectivity to the "running_hosts" group
          group_by:
            key: "running_hosts"
      rescue:
        - debug: msg="cannot connect to {{inventory_hostname}}"

- hosts: running_hosts
  gather_facts: no
  tasks:
    - command: date
With the current version of Ansible (2.8), something like this is possible:
- name: identify reachable hosts
  hosts: all
  gather_facts: false
  ignore_errors: true
  ignore_unreachable: true
  tasks:
    - block:
        - name: this does nothing
          shell: exit 1
          register: result
      always:
        - add_host:
            name: "{{ inventory_hostname }}"
            group: reachable

- name: Converge
  hosts: reachable
  gather_facts: false
  tasks:
    - debug: msg="{{ inventory_hostname }} is reachable"

vmware_vm_facts vCenter password validation failing

I am using Ansible and vCenter to provision a VM. When I run my playbook, I get an authentication error:
Cannot complete login due to an incorrect user name or password.
However, using the same credentials, I am able to log into vCenter manually.
Here is my simplified playbook:
---
- name: create a new VM on an ESX server
  hosts: localhost
  connection: local
  tasks:
    - name: include vars
      include_vars:
        dir: 'group_vars/prod'
        files_matching: 'secret-esx.yml'

    - name: gather facts from target host
      local_action:
        module: vmware_vm_facts
        hostname: vi-devops-esx9.lab.vi.local
        username: "{{ esx_username }}"
        password: "{{ esx_password }}"
        validate_certs: no
      register: qe_facts
Why can I access vCenter, but vmware_vm_facts cannot with the same credentials?
My hostname was incorrect. Fixing my hostname fixed the authentication error.