Ansible wait_for does not return the correct error when testing connectivity

I am trying to test a batch of connections, but every connection that fails returns the error "Timeout", even though I know (I tested) that some of them are actually "no route to host".
How can I get the real error with wait_for in Ansible?
- name: Test connectivity flow
  wait_for:
    host: "{{ item.destination_ip }}"
    port: "{{ item.destination_port }}"
    state: started # Port should be open
    delay: 0 # No wait before first check (sec)
    timeout: 3 # Stop checking after timeout (sec)
  delegate_to: "{{ item.source_ip }}"
  failed_when: false
  register: test_connectivity_flow_result
- name: Append result message to result list msg
  set_fact:
    result_list_msg: "{% if test_connectivity_flow_result.msg is defined %}{{ result_list_msg + [test_connectivity_flow_result.msg] }}{% else %}{{ result_list_msg + [ '' ] }}{% endif %}"
Current response: Timeout when waiting for 1.1.1.1:1040
Expected response: No route to host 1.1.1.1:1040

Quoting the title of the documentation of the wait_for module
wait_for – Waits for a condition before continuing
If I "rephrase" the condition you have written, this would give something like: "wait for host X to be a resolvable destination and for port Y to be open on that destination, retry with no delay and time out after 3s".
This could typically be a test you launch because you started a new vm and registered it in a dns. So you wait for the dns to propagate AND the ssh port being available.
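In your case, you get a timeout because the condition is never met within the 3 seconds: wait_for keeps retrying whatever the underlying connection error is, and only reports the timeout at the end.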
If you specifically want to test that there is no route to host and don't want to wait until the route eventually becomes available, you need to do that another way. Here is a simple example playbook with the ping module:
---
- name: Very basic connection test
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Test if host is reachable (will report no route if so)
      ping:
      delegate_to: nonexistent.host.local
Which results in:
PLAY [Very basic connection test] *****************************************************
TASK [Test if host is reachable (will report no route if so)] *************************
fatal: [localhost]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname nonexistent.host.local: Name or service not known", "unreachable": true}
PLAY RECAP ****************************************************************************
localhost : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
Note that the ping module:
Will report that there is no route to host if so
Will implicitly try to connect to port 22
Will make sure that the host has python installed and is ready to be managed through ansible.
If the host you are trying to check does not need to meet all of the above conditions (e.g. you want the test to succeed even if python is not installed), you will need another approach. Running the ICMP ping through the command module is one of several possible solutions.
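As a rough sketch of the ICMP variant mentioned above: the play below runs the system ping command from each source host and keeps the raw output. The variable connectivity_flows and the registered name icmp_result are made up for illustration, and the -c / -W flags assume a Linux iputils ping. Note that ICMP only checks that the host is reachable, not that the TCP port is open.

```yaml
---
- name: ICMP reachability test (sketch)
  hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical input list, shaped like the items used in the question
    connectivity_flows:
      - source_ip: 10.0.0.10
        destination_ip: 1.1.1.1
  tasks:
    - name: Send one ICMP echo request from the source host
      ansible.builtin.command:
        # -c 1: one packet, -W 3: wait at most 3 s (Linux iputils flags)
        cmd: "ping -c 1 -W 3 {{ item.destination_ip }}"
      delegate_to: "{{ item.source_ip }}"
      loop: "{{ connectivity_flows }}"
      register: icmp_result
      changed_when: false  # a ping never changes the target
      failed_when: false   # collect the result instead of failing the play
```

Each entry of icmp_result.results then carries rc, stdout and stderr for one flow, so the message printed by ping itself can be appended to the result list instead of the generic timeout text.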

I ended up doing something like that:
- name: Check if port {{ R_PORT }} is open. If it is, let's fail and investigate manually.
  shell: ss -ltpn | grep :{{ R_PORT }} | wc -l
  register: open_redis_port
- debug: msg="{{ open_redis_port.stdout }}"
- name: Fail playbook execution if port {{ R_PORT }} is open
  fail:
    msg: Port {{ R_PORT }} is open. Failing, please investigate manually.
  when: open_redis_port.stdout == "2" or open_redis_port.stdout == "1"

Related

delegate_to ignores configured ssh_port

Use-Case:
We are deploying virtual machines into a cloud with a default linux image (Ubuntu 22.04 at the moment). After deploying a machine, we configure our default users and change the SSH port from 22 to 2222 with Ansible.
Side note: We are using a jump concept through the internet - Ansible automation platform / AWS => internet => SSH jump host => target host
To keep Ansible able to connect to the new machine after the SSH port changes, I found multiple Stack Overflow answers and blog entries that check and set ansible_ssh_port: they run wait_for on ports 22 and 2222 and set the SSH variable depending on the result (code below).
Right now this works fine for the first SSH host (the jumphost), but it always fails for the second host because the SSH connection cannot be established.
Side note: The SSH daemon is running. If I use my user from the jump host, I get an SSH response from 22/2222 (depending on the current state of deployment).
Edit from questions:
The deployment tasks should only be run on the target host. Not the jumphost as well.
I run the deployment on the jumphost first and make sure it is up, running and configured.
After that, I run the deployment on all machines behind the jumphost to configure them.
This also ensures that, if I ever need to reboot, I don't kill all tunneled SSH sessions by accident.
Ansible inventory example
all:
  hosts:
  children:
    jumphosts:
      hosts:
        example_jumphost:
          ansible_host: 123.123.123.123
    cloud_hosts:
      hosts:
        example_cloud_host01: #local DNS is resolved on the jumphost - no ansible_host here (yet)
          ansible_ssh_common_args: '-oProxyCommand="ssh -W %h:%p -oStrictHostKeyChecking=no -q ansible@123.123.123.123 -p 2222"' #Tunnel through the appropriate jumphost
          delegation_host: "ansible@123.123.123.123" #delegate jobs to the jumphost in each project if needed
  vars:
    ansible_ssh_port: 2222
SSH check_port role
- name: Set SSH port to 2222
  set_fact:
    ansible_ssh_port: 2222
- name: "Check backend port 2222"
  wait_for:
    port: 2222
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  # delegate_to: "{{ delegation_host }}"
  # vars:
  #   ansible_ssh_port: 2222
  ignore_errors: true
  register: ssh_port
- name: "Check backend port 22"
  wait_for:
    port: "22"
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  # delegate_to: "{{ delegation_host }}"
  # vars:
  #   ansible_ssh_port: 2222
  ignore_errors: true
  register: ssh_port_default
  when:
    - ssh_port is defined
    - ssh_port.state is undefined
- name: Set backend SSH port to 22
  set_fact:
    ansible_ssh_port: 22
  when:
    - ssh_port_default.state is defined
The playbook itself
- hosts: "example_cloud_host01"
  gather_facts: false
  roles:
    - role: check_port #check if we already have the correct port or need 22
    - role: sshd #Set Port to 2222 and restart sshd
    - role: check_port #check the port again, after it has been changed
    - role: install_apps
    - role: configure_apps
Error message:
with delegate_to for the task Check backend port 2222:
fatal: [example_cloud_host01 -> ansible@123.123.123.123]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 123.123.123.123 port 22: Connection refused", "unreachable": true}
This confuses me, because I expect the delegation host to use the same ansible_ssh_port as the target host.
Without delegate_to for task Check backend port 2222 and Check backend port 22:
fatal: [example_cloud_host01]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": false, "elapsed": 5, "msg": "Timeout when waiting for example_cloud_host01:2222"}
fatal: [example_cloud_host01]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": false, "elapsed": 5, "msg": "Timeout when waiting for example_cloud_host01:22"}
I have no idea why this happens. If I try the connection manually, it works fine.
What I tried so far:
I played around with delegate_to, vars, ... as mentioned above.
I wanted to see if I can provide delegate_to with the proper port 2222 for the jump host.
I wanted to see if I can run this without delegate_to (since it should automatically use the proxy command to run on the jump host anyway).
Neither way gave me a solution on how to connect to my second tier servers after changing the SSH port.
Right now, I split the playbook into two
deploy sshd config with port 22
run our full deploy afterwards on port 2222
I would do the following (I somewhat tested this with fake values in the inventory using localhost as a jumphost to check ports on localhost as well)
Edit: I modified my examples to try to show you a way forward after your comments on your question and on this answer.
Inventory
---
all:
  vars:
    ansible_ssh_port: 2222
proxies:
  vars:
    ansible_user: ansible
  hosts:
    example_jumphost1:
      ansible_host: 123.123.123.123
    example_jumphost2:
      ansible_host: 231.231.231.231
    # ... and more jump hosts ...
cloud_hosts:
  vars:
    jump_vars: "{{ hostvars[jump_host] }}"
    ansible_ssh_common_args: '-oProxyCommand="ssh -W %h:%p -oStrictHostKeyChecking=no -q {{ jump_vars.ansible_user }}@{{ jump_vars.ansible_host }} -p {{ jump_vars.ansible_ssh_port | d(22) }}"'
  children:
    cloud_hosts_north:
      vars:
        jump_host: example_jumphost1
      hosts:
        example_cloud_host01:
        example_cloud_host02:
        # ... and more ...
    cloud_hosts_south:
      vars:
        jump_host: example_jumphost2
      hosts:
        example_cloud_host03:
        example_cloud_host04:
        # ... and more ...
    # ... and more cloud groups ...
Tasks to check ports.
- name: "Check backend inventory configured port {{ ansible_ssh_port }}"
  wait_for:
    port: "{{ ansible_ssh_port }}"
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  delegate_to: "{{ jump_host }}"
  ignore_errors: true
  register: ssh_port
- name: "Check backend default ssh port if relevant"
  wait_for:
    port: "22"
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  delegate_to: "{{ jump_host }}"
  ignore_errors: true
  register: ssh_port_default
  when: ssh_port is failed
- name: "Set backend SSH port to 22 if we did not change it yet"
  set_fact:
    ansible_ssh_port: 22
  when:
    - ssh_port_default is not skipped
    - ssh_port_default is success
Please note that if checks for ports 22/2222 both fail, your configured port will still be 2222 but any later task will obviously fail. You might want to fail fast after checks for those relevant hosts:
- name: "Fail host if no port is available"
  fail:
    msg:
      - "Host {{ inventory_hostname }} does not have"
      - "any ssh port available (tested 22 and 2222)"
  when:
    - ssh_port is failed
    - ssh_port_default is failed
With this in place, you can use different targets on your play to reach the relevant hosts:
For jump hosts
Run on a single bastion host: e.g. hosts: example_jumphost1
Run on all bastion hosts: hosts: proxies
For cloud hosts
Run on all cloud hosts: hosts: cloud_hosts
Run on a single child group: e.g. hosts: cloud_hosts_north
Run on all cloud hosts except a subgroup: e.g. hosts: cloud_hosts:!cloud_hosts_south
For more see ansible patterns
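To illustrate the last pattern, here is a minimal play header, assuming the inventory groups and the check_port / sshd roles shown earlier in this thread; the play name is made up. The pattern is quoted so that YAML treats it as a plain string.

```yaml
---
- name: Configure all cloud hosts except the south group (sketch)
  # Group names come from the example inventory above
  hosts: "cloud_hosts:!cloud_hosts_south"
  gather_facts: false
  roles:
    - role: check_port  # detect whether SSH answers on 2222 or still on 22
    - role: sshd        # move SSH to 2222 and restart sshd
    - role: check_port  # re-check after the port change
```

Swapping the hosts value for proxies or cloud_hosts_north is enough to retarget the same play.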

Ansible: How to check SSH access

Good morning all,
I'm racking my brains over a simple subject.
I'm on a "master" server and I would like to check whether it can connect over SSH to a list of servers.
Example
ansible-playbook -i inventaire_test test_ssh.yml
---
tasks:
  - name: test unreachable
    ansible.builtin.ping:
    register: test_ssh
    ignore_unreachable: true
  - name: test
    fail:
      msg: "test"
    when: test_ssh.unreachable is defined
  - name: header CSV
    lineinfile:
      insertafter: EOF
      dest: /home/list.csv
      line: "Server;OS;access"
    delegate_to: localhost
  - name: Info
    lineinfile:
      dest: /home/list.csv
      line: "{{ inventory_hostname }};OK"
      state: present
    when: test_ssh is successful
    delegate_to: localhost
  - name: Info csv
    lineinfile:
      dest: /home/list.csv
      line: "{{ inventory_hostname }};KO"
      state: present
    when: test_ssh.unreachable is undefined
    delegate_to: localhost
I can't find a check_ssh module. There is ansible.builtin.ssh but I can't use it.
Do you have an idea?
Thanks in advance.
Regarding
I'm on a "master" server and I would like to check whether it can connect over SSH to a list of servers. ... I can't find a check_ssh module.
According to the documentation, there is a
ping module – Try to connect to host, verify a usable python and return pong on success
... test module, this module always returns pong on successful contact. It does not make sense in playbooks, but it is useful from /usr/bin/ansible to verify the ability to login and that a usable Python is configured.
which seems to be doing what you are looking for.
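A minimal sketch of how the ping module could drive the CSV report from the question. It assumes that a host counts as KO when it is unreachable or when ping fails for any other reason; the file path is the one from the question, and throttle needs Ansible 2.9 or later.

```yaml
---
- name: Check SSH access for every inventory host (sketch)
  hosts: all
  gather_facts: false
  tasks:
    - name: Try to connect (needs SSH access and a usable Python)
      ansible.builtin.ping:
      register: test_ssh
      ignore_unreachable: true  # keep unreachable hosts in the play
      ignore_errors: true       # keep hosts where ping fails otherwise

    - name: Record the result on the control node
      ansible.builtin.lineinfile:
        path: /home/list.csv
        create: true
        line: "{{ inventory_hostname }};{{ 'KO' if (test_ssh.unreachable | default(false)) or (test_ssh is failed) else 'OK' }}"
      delegate_to: localhost
      throttle: 1  # one write at a time to the shared file
```

Because the second task is delegated to localhost, it also runs for hosts that could not be reached in the first task.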

Run Ansible Tasks against failed hosts

I'm running an Ansible playbook with several tasks and hosts. In this playbook I'm trying to rerun tasks on failed hosts. I'll try to reproduce the situation:
Inventory:
[hostgroup_1]
host1 ansible_host=1.1.1.1
host2 ansible_host=1.1.1.2
[hostgroup_2]
host3 ansible_host=1.1.1.3
host4 ansible_host=1.1.1.4
The hosts from "hostgroup_1" are supposed to fail, so I can check the error-handling on the two hosts.
Playbook:
---
- name: firstplaybook
  hosts: all
  gather_facts: false
  connection: network_cli
  vars:
    - ansible_network_os: ios
  tasks:
    - name: sh run
      cisco.ios.ios_command:
        commands: show run
    - name: sh run
      cisco.ios.ios_command:
        commands: show run
As expected, the first two hosts (1.1.1.1 & 1.1.1.2) fail and are not considered for the second task. After reading several pages of the Ansible documentation I found the meta clear_host_errors task. So I tried to run the playbook like this:
---
- name: firstplaybook
  hosts: all
  gather_facts: false
  connection: network_cli
  vars:
    - ansible_network_os: ios
  tasks:
    - name: sh run
      cisco.ios.ios_command:
        commands: show run
    - meta: clear_host_errors
    - name: sh run
      cisco.ios.ios_command:
        commands: show run
Sadly the meta task did not reset the hosts, and the playbook went on without considering the failed hosts again.
What I would like to know is how to make Ansible consider failed hosts again within a run, so I can continue with them.
Thank y'all in advance
Regards, Lucas
Do you get any different results when using:
ignore_errors: true
or
ignore_unreachable: yes
with the first task?
Q: "How Ansible considers failed hosts in a run again?"
A: Use ignore_unreachable (New in version 2.7.). For example, in the play below the host test_99 is unreachable
- hosts: test_11,test_12,test_99
  gather_facts: false
  tasks:
    - ping:
    - debug:
        var: inventory_hostname
As expected, the debug task omits the unreachable host
PLAY [test_11,test_12,test_99] ********************************************
TASK [ping] ***************************************************************
fatal: [test_99]: UNREACHABLE! => changed=false
msg: 'Failed to connect to the host via ssh: ssh: Could not resolve
hostname test_99: Name or service not known'
unreachable: true
ok: [test_11]
ok: [test_12]
TASK [debug] ***************************************************************
ok: [test_11] =>
inventory_hostname: test_11
ok: [test_12] =>
inventory_hostname: test_12
PLAY RECAP *****************************************************************
If you set ignore_unreachable: true, the unreachable host is still reported for that task but is included again in the next task
- hosts: test_11,test_12,test_99
  gather_facts: false
  tasks:
    - ping:
      ignore_unreachable: true
    - debug:
        var: inventory_hostname
PLAY [test_11,test_12,test_99] ********************************************
TASK [ping] ***************************************************************
fatal: [test_99]: UNREACHABLE! => changed=false
msg: 'Failed to connect to the host via ssh: ssh: Could not resolve
hostname test_99: Name or service not known'
skip_reason: Host test_99 is unreachable
unreachable: true
ok: [test_11]
ok: [test_12]
TASK [debug] ***************************************************************
ok: [test_11] =>
inventory_hostname: test_11
ok: [test_12] =>
inventory_hostname: test_12
ok: [test_99] =>
inventory_hostname: test_99
PLAY RECAP *****************************************************************
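Applied to the Cisco playbook from the question, this could look like the sketch below. Whether a network_cli connection problem is reported as "unreachable" or as "failed" may depend on the collection version, so both keywords are set here as an assumption; the registered name first_run is made up.

```yaml
---
- name: firstplaybook
  hosts: all
  gather_facts: false
  connection: network_cli
  vars:
    ansible_network_os: ios
  tasks:
    - name: sh run (first attempt)
      cisco.ios.ios_command:
        commands: show run
      register: first_run
      ignore_errors: true       # keep hosts whose task failed
      ignore_unreachable: true  # keep hosts that could not be reached

    - name: sh run (retry only on hosts that had a problem above)
      cisco.ios.ios_command:
        commands: show run
      when: (first_run is failed) or (first_run.unreachable | default(false))
```

Dropping the when condition makes the second task run on every host, including the ones that failed the first attempt.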

Can Ansible match hosts passed as parameter without using add_hosts module

Is it possible to pass the IP address as parameter 'Source_IP' to ansible playbook and use it as hosts ?
Below is my playbook ipinhost.yml:
---
- name: Play 2- Configure Source nodes
  hosts: "{{ Source_IP }}"
  serial: 1
  tasks:
    - name: Copying from "{{ inventory_hostname }}" to this ansible server.
      debug:
        msg: "MY IP IS: {{ Source_IP }}"
The playbook fails to run with the message "Could not match supplied host pattern." Output below:
ansible-playbook ipinhost.yml -e Source_IP='10.8.8.11'
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
[WARNING]: Could not match supplied host pattern, ignoring: 10.8.8.11
PLAY [Play 2- Configure Source nodes] ***********************************************************************************************************************
skipping: no hosts matched
PLAY RECAP **************************************************************************************************************************************************
I do not wish to use Ansible's add_host, i.e. I do not wish to build a dynamic host list, as the Source_IP will always be a single server.
Please let me know if this is possible and how can my playbook be tweaked to make it run with hosts matching '10.8.8.11'?
If it is always a single host, a possible solution is to pass a static inline inventory to ansible-playbook.
Target your play at the 'all' group, i.e. hosts: all
Call your playbook with an inline inventory of one host. Watch out: the comma at the end of the IP in the command is important:
ansible-playbook -i 10.8.8.11, ipinhost.yml
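With that change, ipinhost.yml could look roughly like this. It is a sketch that replaces the Source_IP variable with inventory_hostname, which holds the address given in the inline inventory; the task name is made up.

```yaml
---
- name: Play 2- Configure Source nodes
  hosts: all  # the inline inventory contains only the single target host
  serial: 1
  tasks:
    - name: Show the host taken from the inline inventory
      ansible.builtin.debug:
        msg: "MY IP IS: {{ inventory_hostname }}"
```

No extra variable has to be passed with -e in this variant.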

Ansible how to ignore unreachable hosts before ansible 2.7.x

I'm using ansible to run a command against multiple servers at once. I want to ignore any hosts that fail because of the '"SSH Error: data could not be sent to remote host \"1.2.3.4\". Make sure this host can be reached over ssh"' error because some of the hosts in the list will be offline. How can I do this? Is there a default option in ansible to ignore offline hosts without failing the playbook? Is there an option to do this in a single ansible cli argument outside of a playbook?
Update: I am aware that the ignore_unreachable: true works for ansible 2.7 or greater, but I am working in an ansible 2.6.1 environment.
I found a good solution here. You ping each host locally to see if you can connect and then run commands against the hosts that passed:
---
- hosts: all
  connection: local
  gather_facts: no
  tasks:
    - block:
        - name: determine hosts that are up
          wait_for_connection:
            timeout: 5
          vars:
            ansible_connection: ssh
        - name: add devices with connectivity to the "running_hosts" group
          group_by:
            key: "running_hosts"
      rescue:
        - debug: msg="cannot connect to {{inventory_hostname}}"
- hosts: running_hosts
  gather_facts: no
  tasks:
    - command: date
With the current version of Ansible (2.8) something like this is possible:
- name: identify reachable hosts
  hosts: all
  gather_facts: false
  ignore_errors: true
  ignore_unreachable: true
  tasks:
    - block:
        - name: this does nothing
          shell: exit 1
          register: result
      always:
        - add_host:
            name: "{{ inventory_hostname }}"
            group: reachable
- name: Converge
  hosts: reachable
  gather_facts: false
  tasks:
    - debug: msg="{{ inventory_hostname }} is reachable"