Ansible: will it skip to the next task if the current task fails? - error-handling

Since Ansible executes the tasks in order, will it skip to the next task if the current task fails?

As documented in Error handling in playbooks:
When Ansible receives a non-zero return code from a command or a failure from a module, by default it stops executing on that host and continues on other hosts.
[...]
You can use ignore_errors to continue on in spite of the failure

From the Ansible docs:
By default Ansible stops executing tasks on a host when a task fails on that host. You can use ignore_errors to continue on in spite of the failure.
- name: Do not count this as a failure
  ansible.builtin.command: /bin/false
  ignore_errors: yes
The ignore_errors directive only works when the task is able to run and returns a value of ‘failed’. It does not make Ansible ignore undefined variable errors, connection failures, execution issues (for example, missing packages), or syntax errors.
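For illustration, here is a minimal sketch of this behaviour (the task names and debug message are made up for the example): the first task fails, but because of ignore_errors the second task still runs on the same host.
- hosts: all
  tasks:
    - name: This command fails, but the play continues
      ansible.builtin.command: /bin/false
      ignore_errors: yes

    - name: This task still runs on the failing host
      ansible.builtin.debug:
        msg: "Reached despite the failure above"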
REFERENCES:
Ansible docs: Error handling in playbooks: https://docs.ansible.com/ansible/latest/user_guide/playbooks_error_handling.html#error-handling-in-playbooks

Batch job submission failed: I/O error writing script/environment to file

I installed Slurm on a workstation and it seemed to work; I can use the Slurm commands, and srun is working too.
But when I try to launch a job from a script using sbatch test.sh I get the following error: Batch job submission failed: I/O error writing script/environment to file, even if the script is as simple as
#!/bin/bash
srun hostname
Make sure slurmd is running as root. See the SlurmdUser parameter in slurm.conf. Its default value is root and it should stay that way.
Note this is different from the SlurmUser parameter, which defines the user that runs the controller processes; that one is preferably not root.
If the configuration is correct, then you might have a faulty filesystem at the location referred to in the SlurmdSpoolDir parameter, where slurmd writes the submission script and environment for jobs assigned to the node.
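To check what the running daemons are actually using, you can query the live configuration with scontrol (a quick sketch; the grep pattern just picks out the three parameters discussed above):
# Show the effective values of SlurmUser, SlurmdUser and SlurmdSpoolDir
scontrol show config | grep -Ei 'slurmuser|slurmduser|slurmdspooldir'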

TFS - 'Run SSH task' option times out

In TFS, I am using the SSH task with the 'Commands' option to connect to a remote machine and run a set of commands. I cd to a particular folder and run a shell script using 'sh '.
This script usually takes around 2 hours to finish execution. The SSH task times out after 15 minutes and exits. But when I check the machine manually, the process is still running.
Why doesn't the SSH task wait until the script finishes completely?
According to your description, you may have encountered a timeout limitation of the SSH task or the build definition.
First, please double-check the timeout setting under Control Options:
Specifies the maximum time, in minutes, that a task is allowed to execute before being cancelled by the server. A zero value indicates an infinite timeout.
Another place to check is the build job timeout, under the settings of your build definition: Options -> Build job timeout in minutes:
Specifies the maximum time a build job is allowed to execute on an agent before being cancelled by the server. An empty or zero value indicates an infinite timeout.
If both are set properly and you still hit the timeout, please attach a more detailed log of the failed build, captured in verbose debug mode by setting system.debug=true, for troubleshooting.

Jenkins SSH remote process is getting killed as soon as the Jenkins SSH plugin returns back

Jenkins version: 1.574
I created a simple job which performs the following:
Using "Execute shell script on remote host using SSH" as one of the BUILD steps, I'm just calling a shell script. This shell script performs stop and start operations on Tomcat to restart an application on the target machine.
I have a valid username, password, port defined for the target SSH server in Jenkins Global settings.
I observed that when I run the Jenkins job and call the restart script (which takes the application name as parameter $1), it works fine, but as soon as the "Execute shell script on remote host using SSH" step completes, the new process dies on the remote/target application server.
If I run the script from the target/remote server itself, everything works fine and the new process/PID stays alive indefinitely. But running the same script from Jenkins, though I don't see any errors and everything appears to work, the new process dies as soon as the above-mentioned SSH step is complete and control comes back to the next BUILD step in the Jenkins job, or the Jenkins job finishes.
I saw a few posts/blogs and tried setting BUILD_ID=dontKillMe in the Jenkins job (in various places, i.e. Prepare Environment variables and also using Inject Environment variables...). When the job's particular build# is complete, I can see that the Environment Variables for that build# do show BUILD_ID=dontKillMe as the value (instead of the default timestamp value).
I tried putting nohup before calling the restart script, i.e.,
nohup restart_tomcat.sh "${app}"
I also tried:
BUILD_ID=dontKillMe nohup restart_tomcat.sh "${app}"
This doesn't give any error and creates a nohup.out file on the remote server (but I'm not worried about that, as the restart_tomcat.sh script creates its own LOG file, which I cat after the restart_tomcat.sh script is complete; the cat of the log file is performed by another "Execute shell script on remote host using SSH" build step, and it successfully shows the log file created by the restart script).
I don't know what I'm missing at this point, but as soon as the restart_tomcat.sh step is complete, the new PID/process on the remote/target server dies.
How can I fix this?
I've been through this myself.
On my first iteration, before I knew about Jenkins ProcessTreeKiller, I ended up just daemonizing Tomcat. The Apache Tomcat documentation includes a section on running as a daemon.
You can also try disabling the ProcessTreeKiller for your whole Jenkins instance, if it's relatively small (read the first link for information).
The BUILD_ID=dontKillMe should be passed to the shell, and therefore it should be in your command line, not in Jenkins global configuration or job parameters.
BUILD_ID=dontKillMe restart_tomcat.sh "${app}" should have worked without problems.
You can also try nohup restart_tomcat.sh "${app}" & with the & at the end.
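Putting those two suggestions together, a minimal sketch of the remote command (assuming restart_tomcat.sh is on the remote PATH):
# Inline BUILD_ID so the ProcessTreeKiller skips this process tree,
# detach with nohup, and background the script so the SSH step can return
BUILD_ID=dontKillMe nohup restart_tomcat.sh "${app}" > /dev/null 2>&1 &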
My solution (it worked after trying everything else) in Ubuntu 14.04 (Trusty Tahr) (Amazon AWS - Amazon EC2), Jenkins 1.601:
Exec command: (setsid COMMAND < /dev/null > /dev/null 2>&1 &);
Exec in PTY: DISABLED
// Example COMMAND=socat TCP4-LISTEN:1337,fork TCP4:127.0.0.1:1338
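For reference, a commented breakdown of that one-liner (COMMAND stands for whatever you need to keep running):
# setsid  - start COMMAND in a new session, detached from the SSH session's
#           process group, so it survives when the session is torn down
# < /dev/null > /dev/null 2>&1 - detach stdin/stdout/stderr so the SSH
#           channel is free to close
# &       - run in the background; the surrounding ( ) runs it in a subshell
(setsid COMMAND < /dev/null > /dev/null 2>&1 &)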
I created this Transfer as my last one:
#!/bin/ksh
export BUILD_ID=dontKillMe
I added the export line above to the start of my script and the issue was resolved.

TeamCity config fails on cleanup of selenium-server.jar

I've set up a TC configuration ready to run Selenium tests once we get a build agent able to run them.
This is the first TC config I've created, and it was running until I got TC to run the Selenium test runner. Now it fails when it tries to clean up selenium-server.jar.
Can you exclude file types from the cleanup, or is there another solution that I'm missing here?
TC build errors:
> Problem reported from build script. New build status text is: : {build.status.text}; Swabra cleanup failed
> Error while applying patch: Error while applying patch: Failed to delete: C:\BuildAgent\work\f43641868cf93216\src\django_selenium\selenium-server.jar
You can configure the Swabra cleaner to ignore your selenium-server.jar using a path rule:
-:src\django_selenium\selenium-server.jar
but it looks like there is a problem in the Selenium tests setup. The error message you posted may be caused by some process (most probably the Selenium server) still running after the tests have finished. It has locked the library, and that's why Swabra can't delete it.
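To confirm which process is holding the file on a Windows build agent, one option is the Sysinternals handle utility (a separate download; the jar name is matched as a substring of open handle names):
handle.exe selenium-server.jar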

Jenkins succeed when unit test fails (Rails)

I've barely started to use Jenkins, and this is the first problem I've had so far. Basically, my Jenkins job always succeeds even when some of the tests fail. This is what I'm running in the shell config:
bundle install
rake db:migrate:reset
rake test:units
rake spec:models
The thing is that Jenkins only reports a failure when the task which fails is the last one. For instance, if I make rake test:units the last task, it will report an error if something goes wrong. With the configuration above I only get error reports for the RSpec tests, but not for the unit tests.
For anyone wondering why I don't use only RSpec or only unit tests: we are currently migrating to RSpec, but this problem is still painful.
This is part of the log from Jenkins; as you can see, one of the unit tests fails but Jenkins still finishes with SUCCESS.
314 tests, 1781 assertions, 1 failures, 0 errors, 0 skips
rake aborted!
Command failed with status (1): [/var/lib/jenkins/.rvm/rubies/ruby-1.9.3-p1...]
Tasks: TOP => test:units
(See full trace by running task with --trace)
Lot of rspec tests here....
Finished in 3.84 seconds
88 examples, 0 failures, 42 pending
Pushing HEAD to branch master of origin repository
Pushing HEAD to branch master at repo origin
Finished: SUCCESS
Jenkins executes the commands you type into a Build Step box by writing them to a temporary file and then running the script using /bin/sh -xe.
Usually this produces the desired effect: Commands are executed in sequence (and printed) and the script aborts immediately when a command fails i.e. exits with non-zero exit code.
If this is not happening to you, the only reason can be that you have overridden this behavior. You can override it by starting the first line of your Build Step with these two characters: #!.
For example, if your Build Step looks like this:
#!/bin/bash
bundle install
rake db:migrate:reset
rake test:units
rake spec:models
Then it means Jenkins will write the script to a temporary file and it will be executed with /bin/bash. When invoked like that, bash will execute commands one-by-one and not care if they succeed. The exit code of the bash process will be the exit code of the last command in the script and that will be seen by Jenkins when the script ends.
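To see the difference yourself, here is a small experiment you can run locally (/tmp/step.sh is just an example path):
printf 'false\necho reached\n' > /tmp/step.sh
sh -xe /tmp/step.sh    # aborts at 'false'; 'echo reached' never runs; exits 1
bash /tmp/step.sh      # runs both commands; exits 0 (the exit code of echo)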
So, take care in what you put on the first line of the Build Step. If you do not know how shell works, do not put a hash-bang at all and let Jenkins decide how the script should be run.
If you need more control over how the Build Step is executed, you should study the man page of the shell you use to find out how to make it behave the way you want. Jenkins doesn't have much of a role in here. It just executes the shell you wanted the way you wanted.
Jenkins can only see the exit code of the last command run, so it has no way of knowing what the result of rake test:units is.
The easiest fix is probably to run each of those commands as a separate Jenkins build step.
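Another option, sketched below, is to chain the commands with && so the step's exit status reflects the first failure:
# Each command runs only if the previous one succeeded; the step's exit
# code is non-zero as soon as any of them fails
bundle install && rake db:migrate:reset && rake test:units && rake spec:models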
An alternative solution is to change your first line to the following:
#!/bin/bash -e
This tells your script to fail as soon as any command in the script returns an error.
See: Automatic exit from bash shell script on error