Ansible: Restart the task if server is unreachable

I have the following task to check if a file is available on a host.
- name: Wait for file to be available
  shell: ls /path/to/file.txt
  delay: 60
  retries: 20
  register: file_available
  until: file_available.rc == 0
While the task is waiting for the file, a server process reboots the machine, which makes the host unreachable and the task fail. I want to wait for the server to come back up when it is unreachable, and once it is up, retry the task. I have tried a workaround that pauses the playbook before executing the task, but I'd rather not hard-code the wait time; I'm looking for a solution that handles the error in the task itself instead of explicitly pausing. Is there any way I could do that with Ansible? Thanks in advance.
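One pattern that avoids a fixed pause (a sketch, not a tested answer; it assumes Ansible 2.7+ for ignore_unreachable, and the 600-second timeout is an arbitrary guess) is to let the first attempt fail, wait for the connection to come back, and re-run the check only if it didn't already succeed:

- name: Wait for file to be available (tolerate the reboot)
  shell: ls /path/to/file.txt
  register: file_available
  until: file_available.rc == 0
  retries: 20
  delay: 60
  ignore_errors: true
  ignore_unreachable: true   # keep going on this host if it drops mid-wait

- name: Wait for the host to come back up
  wait_for_connection:
    timeout: 600             # assumption: the reboot finishes within 10 minutes

- name: Re-run the check once the host is reachable again
  shell: ls /path/to/file.txt
  register: file_available
  until: file_available.rc == 0
  retries: 20
  delay: 60
  when: (file_available.rc | default(1)) != 0   # skip if the first attempt succeeded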

Related

GitLab CI/CD runner is failing to start redis server

I am trying to run a GitLab CI/CD pipeline, but the job that runs the test script is failing because the Redis server fails to start. I am getting an error saying:
Starting redis ... error
ERROR: for redis Cannot start service redis: driver failed programming external connectivity on endpoint redis (8c425d45729aecef06c5c15b082ac96867a73986400f5c8bae29ebab55eb5fdf): Error starting userland proxy: listen tcp 0.0.0.0:6379: bind: address already in
I have tried running gitlab-runner restart from the terminal, but the job still fails.
I also tried setting the Maximum job timeout to 10 mins, but the problem persists.
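The bind error means something on the runner host already holds port 6379, typically a leftover Redis container or a system redis-server, rather than anything in the job itself. A quick diagnostic sketch (the container id and service name are placeholders):

# See what is listening on 6379 on the runner host
sudo ss -ltnp | grep 6379

# If it is a leftover container, find and remove it
docker ps -a | grep redis
docker rm -f <container-id>

# If a system Redis holds the port, stop it so the CI container can bind
sudo systemctl stop redis-server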

How Do I Make Ansible Retry on SSH Errors?

I am occasionally getting SSH failures in my Ansible 2.6.19 playbook during operations that use file or copy with large with_items lists. Several items will succeed, then at some point I will get
Shared connection to xxx.xyz.com closed
sudo: PAM account management error: Authentication service cannot retrieve authentication info
Then 2 seconds later there is a SUCCESS message for each of the rest of the files. This suggests to me that something must have happened on the server to cause the issue and then it resolved itself.
I have pipelining = True in my ansible.cfg.
How do I make Ansible playbook try again on SSH errors like this so the playbook doesn't fail?
EDIT: To address the comment, I am investigating the source of the problem, but since I don't control it I need a backup. retry/until works at the task level; however, there are too many tasks to put it on each one. I really need something at the playbook level, e.g. in ansible.cfg.
One option at the configuration level is to use retry files. This lets you rerun the playbook with the --limit @path/to/retry-file option.
Excerpt from ansible.cfg:
retry_files_enabled = True
retry_files_save_path = ~/.ansible-retry
This will cause a <playbook>.retry file to be created (in the ~/.ansible-retry/ directory) when a playbook failure occurs. It doesn't make Ansible retry automatically, but the playbook can be rerun with the --limit option to cover only the hosts on which the failure occurred. This can be combined with error handling (as @Zeitounator commented).
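For example, after a failed run of a playbook called site.yml (hypothetical name), the rerun would look like:

ansible-playbook site.yml --limit @~/.ansible-retry/site.retry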
The other option is to use the wait_for_connection module.
- name: wait for connection to host for 2 mins
  wait_for_connection:
    timeout: 120
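In practice the module goes right after whatever step drops the connection. A sketch, assuming a task that triggers the reboot (the async/poll settings fire the reboot without waiting on the dying connection):

- name: Trigger the reboot
  shell: /sbin/reboot
  async: 1
  poll: 0

- name: wait for connection to host for 2 mins
  wait_for_connection:
    delay: 10
    timeout: 120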

How do I kill idle Redis clients?

I want to timeout and kill idle redis clients. Is there a setting I can set to do this? I seem to remember setting a configuration somewhere but I can't seem to find it again.
I want this to be done automatically, rather than manually calling the client kill command.
Have a look into the Redis configuration file (the one you use to launch Redis).
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0
Just check that the parameter is not commented out, and set timeout to a non-zero value in seconds. The instance must be restarted for the parameter to be taken into account.
To change this parameter on a running Redis instance, you can use a client command:
> src/redis-cli config set timeout 10
OK
> src/redis-cli config get timeout
1) "timeout"
2) "10"
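To confirm the setting is working, CLIENT LIST reports each connection's idle time in seconds (output abridged and illustrative; the exact fields vary by Redis version):

> src/redis-cli client list
addr=127.0.0.1:52555 fd=8 age=120 idle=115 ... cmd=client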

Running Redis in daemonized form and using Upstart to manage it doesn't work

I've written an Upstart script for Redis as follows:
description "Redis Server"
start on runlevel [2345]
stop on shutdown
expect daemon
exec sudo -u redis /usr/local/bin/redis-server /etc/redis/redis.conf
respawn
respawn limit 10 5
I then configure Redis via its redis.conf to:
daemonize yes
All the documentation and my own experimentation say that Redis forks twice in daemonized form and that "expect daemon" should work, but the Upstart script always holds on to the PID of the former parent (PID - 1). Has anyone got this working?
The following upstart config seems to be working for me, with upstart 1.5 on ubuntu 12.04, with redis.conf daemonize set to yes:
description "redis server"
start on (local-filesystems and net-device-up IFACE=eth0)
stop on shutdown
setuid redis
setgid redis
expect fork
exec /opt/redis/redis-server /opt/redis/redis.conf
respawn
Other people have the same problem. See this gist.
When the daemonize option is activated, Redis does not check whether the process is already a daemon (there is no call to getppid). It systematically forks, but only once. This is somewhat unusual; other daemonization mechanisms may require the initial getppid check, and fork to be called twice (before and after the setsid call), but on Linux this is not strictly required.
See this faq for more information about daemonization.
Redis daemonize function is extremely simple:
void daemonize(void) {
    int fd;

    if (fork() != 0) exit(0); /* parent exits */
    setsid(); /* create a new session */

    /* Every output goes to /dev/null. If Redis is daemonized but
     * the 'logfile' is set to 'stdout' in the configuration file
     * it will not log at all. */
    if ((fd = open("/dev/null", O_RDWR, 0)) != -1) {
        dup2(fd, STDIN_FILENO);
        dup2(fd, STDOUT_FILENO);
        dup2(fd, STDERR_FILENO);
        if (fd > STDERR_FILENO) close(fd);
    }
}
Upstart documentation says:
expect daemon
    Specifies that the job's main process is a daemon, and will fork twice after
    being run. init(8) will follow this daemonisation, and will wait for this to
    occur before running the job's post-start script or considering the job to be
    running. Without this stanza init(8) is unable to supervise daemon processes
    and will believe them to have stopped as soon as they daemonise on startup.

expect fork
    Specifies that the job's main process will fork once after being run. init(8)
    will follow this fork, and will wait for this to occur before running the
    job's post-start script or considering the job to be running. Without this
    stanza init(8) is unable to supervise forking processes and will believe them
    to have stopped as soon as they fork on startup.
So I would either deactivate daemonization on the Redis side, or use expect fork rather than expect daemon in the upstart configuration.
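For the first option, a minimal sketch of the non-daemonized variant (assuming daemonize no in redis.conf, so redis-server stays in the foreground and no expect stanza is needed):

description "Redis Server"
start on runlevel [2345]
stop on shutdown
# no 'expect' stanza: with daemonize no, the exec'd process is the daemon itself
exec sudo -u redis /usr/local/bin/redis-server /etc/redis/redis.conf
respawn
respawn limit 10 5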

Monit on CentOS causes httpd.pid not to be created

The solution was to replace this line:
check process apache with pidfile /var/run/httpd.pid
With this line:
check process httpd with pidfile /var/run/httpd/httpd.pid
I also removed the 'group apache' line.
Original post:
After installing Monit on CentOS, and setting an alert for the Apache (httpd) service, the service no longer creates the /var/run/httpd.pid file.
The httpd service IS running properly.
On top of it, as if that's not enough, Monit reports the status of the service as: Execution failed
Naturally, the only way to restart such a service is by killing it, since the 'restart' script doesn't see any running process.
These are the contents of the /etc/monit.d/monitrc file:
set daemon 10
set logfile syslog facility log_daemon
set mailserver localhost
set mail-format { from: me@server.com }
set alert bugs@server.com

set httpd port 2812 and
    # SSL ENABLE
    # PEMFILE /var/certs/monit.pem
    allow user:password

check process apache with pidfile /var/run/httpd.pid
    group apache
    start program = "/etc/init.d/httpd start"
    stop program = "/etc/init.d/httpd stop"
    if cpu is greater than 180% for 1 cycles then alert
    if totalmem > 1200 MB for 2 cycles then restart
    if children > 250 then restart

check process sshd with pidfile /var/run/sshd.pid
    start program "/etc/init.d/sshd start"
    stop program "/etc/init.d/sshd stop"
    if failed port 22 protocol ssh for 5 cycles then restart
    if 5 restarts within 25 cycles then timeout
Output of "service httpd restart":
Stopping httpd: [FAILED]
Starting httpd: (98)Address already in use: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs
[FAILED]
Any help will be greatly appreciated.
Try replacing the stop program with /usr/sbin/httpd -k stop. It works for me.
I had the same problem but /usr/sbin/httpd -k stop didn't seem to help since this still tries to look up the process id from the pid file.
I opted for stop program = "/usr/bin/killall httpd". I don't think this is very elegant (probably kills open requests) but it was the only way I could find to restart apache and have the pid file recreated by monit.
I think that monit is doing a restart as 'stop; start' and is not waiting for 'stop' to finish before starting a new process, and thus is deleting the pid file at an inappropriate time. At least, that's my conclusion after tinkering with all this.
I found a reference to someone who fixed this issue by making monit sleep after the 'stop' statement.
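Something along those lines would look like this, where the five-second sleep is an arbitrary guess rather than a tested value:

stop program = "/bin/sh -c '/etc/init.d/httpd stop && sleep 5'"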
Personally, I found that replacing 'restart' with 'start' when the http server is down worked just fine.