Does Ansible do fault-tolerant SSHing?

Problem Statement:
I use Ansible to spawn slave instances, SSH into them, run some tasks, and then terminate them.
Suppose the playbook spawns 3 instances. While SSHing into the slave instances, if the SSH connection to one of them fails, does Ansible carry on with the ones that connected successfully, or does it fail the task altogether?
If it fails altogether, is there any way I can make it carry on?
PS: I did explore the retries option of ssh_connection. But by a failed SSH I mean one that still fails after all retries.

By default, Ansible runs your playbook against all specified hosts. If any of them fail, it keeps running the playbook on the remaining hosts, and at the end it creates a playbook.retry file listing the failed hosts, which you can then re-run with:
ansible-playbook playbook.yml --limit @playbook.retry
(assuming your playbook is named playbook.yml). Note that this re-runs the whole playbook from the start, including tasks that already succeeded on those hosts, so you should always try to make your playbooks safe to re-run. Also note that if your playbook contains multiple plays that all target the same host, once that host fails, Ansible will not try it in any of the subsequent plays.
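The re-run step can be wrapped in a small helper. This is a minimal sketch, assuming retry files are enabled (recent Ansible releases disable them by default; `retry_files_enabled = True` in ansible.cfg turns them back on) and that the .retry file is written next to the playbook. The function names `retry_file_for` and `rerun_failed_hosts` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: re-run a playbook only against the hosts recorded in
# its .retry file (playbook.yml -> playbook.retry in the same directory).

retry_file_for() {
  # Strip the extension (.yml/.yaml) and append .retry
  printf '%s\n' "${1%.*}.retry"
}

rerun_failed_hosts() {
  local playbook="$1"
  local retry_file
  retry_file="$(retry_file_for "$playbook")"

  # Nothing to do if the previous run left no retry file, or an empty one
  if [ ! -s "$retry_file" ]; then
    echo "no failed hosts recorded in $retry_file" >&2
    return 1
  fi

  # The @ prefix tells --limit to read the host names from a file
  ansible-playbook "$playbook" --limit "@$retry_file"
}
```

Usage: `rerun_failed_hosts playbook.yml`. Because the whole playbook runs again on those hosts, this only helps if the tasks are safe to repeat.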
There are, however, some ways to change the default behaviour:
You can, for example, make a failure in certain tasks abort the play by using any_errors_fatal: true; a failure there makes Ansible stop execution on all hosts. (This assumes you are using the default linear strategy. With the free strategy, other hosts may be at a different stage, so they might abort earlier or later than you'd expect.)
Also, since Ansible 2.2 you can reset unreachable hosts between plays: even if a host failed in one play, Ansible will still try to run the subsequent plays on it (the earlier plays remain marked as failed). To do this, add meta: clear_host_errors to the play in which you want to retry all of the previously unreachable hosts.
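To make both options concrete, here is a hypothetical two-play playbook, written out by a small shell function so it can be saved and run with ansible-playbook. The group name `slaves`, the function name, and the placeholder tasks are assumptions; `clear_host_errors` is placed as the first task of the second play, following the description above, and its exact effect on unreachable hosts may differ between Ansible versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes an example playbook (default: example.yml)
# illustrating any_errors_fatal and meta: clear_host_errors.
# The host group and the tasks are placeholders.
write_example_playbook() {
  local out="${1:-example.yml}"
  cat > "$out" <<'EOF'
# Play 1: default behaviour. A host that fails or is unreachable drops out;
# the remaining hosts carry on.
- name: Connect to the spawned instances
  hosts: slaves
  tasks:
    - name: Check that SSH works
      ping:

# Play 2: first give previously failed/unreachable hosts another chance,
# then stop on all hosts as soon as any host fails a task.
- name: Critical work
  hosts: slaves
  any_errors_fatal: true
  tasks:
    - name: Reset host errors from earlier plays
      meta: clear_host_errors

    - name: Placeholder for the real job
      command: /bin/true
EOF
}
```

Usage: `write_example_playbook example.yml && ansible-playbook -i inventory example.yml`, where `inventory` is your own inventory file defining the `slaves` group.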

Related

gcloud compute ssh connects but shows the wrong instance name

I'm pretty new to the Gcloud environment, but getting the hang of it.
With our first project live on an instance, I've been shuffling some static IPs, instances and snapshots around to get an optimal deployment workflow. But I can't understand what's going on now:
I have two instances, say live-1 and dev-2.
Now I can connect to live-1 using gcloud compute ssh live-1 and it's okay.
When I try to connect to dev-2 using gcloud compute ssh dev-2, it logs me in to live-1.
The first time I tried to ssh to dev-2 it took longer than usual. After that it just connects me to the wrong instance immediately.
The goal was (as you might've guessed) to copy the live environment to a testing one. I created an image of live-1 and used it to set up dev-2. When I tried this before, it was possible and worked as expected.
Whenever I use the Compute Console in the browser and the online SSH tool from the instance list, it connects to dev-2 properly. But on my local machine, the aforementioned command connects me to live-1.
I already removed the IP for dev-2 from my known_hosts, figuring it was cached somewhere, but no luck. What am I missing here?
Edit: I just found out that the instances are separate, though 'named' the same: if I log in to dev-2, I do see myuser@live-1: in the shell, but it appears to be a separate instance. I created a dummy file on the supposed dev-2, and it doesn't show up on the actual live-1 machine.
So this is very confusing; I rely on the 'user-tag' thing in front of every shell line to know where and what I'm actually working on; having two instances with the same name but different environments is confusing.
OK, it was dead simple: just run sudo hostname [desiredhostname] in the terminal, and then restart the terminal.
So in my case I logged in to dev-2 and ran sudo hostname dev-2.
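The same fix as a tiny helper, a sketch that also prints the result as a check. `hostname NAME` changes only the running (kernel) hostname and does not edit /etc/hostname, so the name may not survive a reboot; on systemd-based images `hostnamectl set-hostname` is the usual way to make it persistent. The function name `rename_host` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the running hostname so the shell prompt
# (user@host) matches the instance. `hostname NAME` changes the kernel's
# runtime hostname only; it does not edit /etc/hostname.
rename_host() {
  local new_name="$1"
  sudo hostname "$new_name" || return 1
  # Print the new name as a check; open a new shell so the prompt updates
  hostname
}
```

Usage, on the machine that should be dev-2: `rename_host dev-2`, then log out and back in.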

Ansible: how to make Paramiko use ~/.ssh/config?

Ideally, of course, I'd like Ansible to completely take care of this.
If this is not possible (why?!), then, at least, I want to be able to extract ~/.ssh/config contents into some other format and then make Ansible feed this to Paramiko. I am sure I'm not the first one faced with this task, so what's the accepted way of doing this?
I need this in order to use the authorized_key module to enable passwordless authentication.
Btw, I wish Ansible emitted a warning when falling back to a non-default backend (like Paramiko). I lost a couple of hours yesterday and actually had to download the Ansible sources to figure out why a perfectly working Ansible command suddenly stopped working when I added the -k / --ask-pass option (yes, I am completely new to Ansible).
You can set this in Ansible's INI configuration file (ansible.cfg) or through environment variables; specifically, see the entry for ANSIBLE_SSH_ARGS.
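A minimal sketch of the environment-variable route. ANSIBLE_SSH_ARGS is passed to the OpenSSH client used by the `ssh` connection type, not to Paramiko, so the helper also forces `-c ssh`. Setting the variable replaces Ansible's default ssh_args (`-C -o ControlMaster=auto -o ControlPersist=60s`), which is why those are repeated. OpenSSH already reads ~/.ssh/config on its own; the `-F` only makes that explicit. With `-k` / `--ask-pass`, the ssh connection type needs the `sshpass` program installed. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run an ad-hoc Ansible command over OpenSSH (not
# Paramiko) so that the settings in ~/.ssh/config apply.
ansible_with_ssh_config() {
  local pattern="$1"
  shift
  # Keep Ansible's default ssh_args and add an explicit -F for the user config
  ANSIBLE_SSH_ARGS="-C -o ControlMaster=auto -o ControlPersist=60s -F $HOME/.ssh/config" \
    ansible "$pattern" -c ssh "$@"
}
```

Usage: `ansible_with_ssh_config all -m ping`. The same value can instead go into the `ssh_args` entry of the `[ssh_connection]` section of ansible.cfg.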

SSH over two hops

I have to upload, compile and run some code on a remote system. It turns out that the following mechanism works fine:
rsync -avz /my/code me@the-remote-host.xyz:/my/code
ssh me@the-remote-host.xyz 'cd /my/code; make; ./my_program'
While it's maybe not the best looking solution, it has the advantage that it's completely self-contained.
Now, the problem is that I need to do the same thing on another remote system that is not directly accessible from outside by ssh, only via a proxy node. On that system, if I just want to execute a plain ssh command, I need to do the following:
[my local computer]$ ssh me@the-login-node.xyz
[the login node]$ ssh me@the-actual-system.xyz
[the actual system]$ make
How do I modify the above script to "tunnel" rsync and ssh through the-login-node to the-actual-system? I would also prefer a solution that is completely contained in the script.
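One self-contained way to do this, sketched under the assumption that the local machine has OpenSSH 7.3 or newer, which added `-J` (a shortcut for the ProxyJump option). The host names are the ones from the question; the function name is hypothetical. Note the trailing slashes on the rsync paths: without them, rsync copies the directory itself into the destination, giving /my/code/code on the remote side.

```shell
#!/usr/bin/env bash
# Upload, build and run code on a host that is only reachable through a
# login node. Requires OpenSSH 7.3+ on the local machine for "ssh -J".
JUMP_HOST="me@the-login-node.xyz"
TARGET_HOST="me@the-actual-system.xyz"

deploy_and_run() {
  # -e sets the remote shell rsync uses, so the transfer is tunnelled
  # through the login node as well. Trailing slashes copy the contents
  # of /my/code instead of creating /my/code/code remotely.
  rsync -avz -e "ssh -J $JUMP_HOST" /my/code/ "$TARGET_HOST:/my/code/" &&
    ssh -J "$JUMP_HOST" "$TARGET_HOST" 'cd /my/code && make && ./my_program'
}
```

Calling `deploy_and_run` at the end of the script performs the same two steps as the original script, with both hops handled by ssh itself.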

Use another command instead of ssh for Ansible

I have an Ansible configuration which I know works on my local machines. However, I'm now trying to set it up on my company's machines, which use a wrapper command similar to ssh (let's call it 'myssh').
For example, to access these machines, instead of writing
ssh myuser@123.123.123.123
you write
myssh myuser@123.123.123.123
which ends up calling ssh, among other things.
My question is, is there a way to swap which command ansible uses for accessing machines?
You can create a connection type plugin to achieve this. Looking at the ssh plugin, it appears it might be as easy as replacing ssh_cmd on line 333 and specifying myssh on line 69.
The Ansible documentation explains where to place the modified file. In addition, you can put it in a custom location and tell Ansible about it via the connection_plugins setting in ansible.cfg.
Finally, again in your ansible.cfg, set the transport setting to your new plugin:
transport = myssh
PS: I have never done anything like that before. This is only info from the docs.
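The configuration side of this answer can be sketched as a small setup function. It assumes the modified plugin is saved as connection_plugins/myssh.py next to ansible.cfg (the plugin name Ansible uses is the file name without `.py`); the function name and directory layout are hypothetical, and the plugin file itself is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical setup for a custom "myssh" connection plugin: creates a
# connection_plugins/ directory (the modified copy of the ssh plugin would
# be saved there as myssh.py) and an ansible.cfg that selects it.
write_ansible_cfg() {
  local dir="${1:-.}"
  mkdir -p "$dir/connection_plugins"
  cat > "$dir/ansible.cfg" <<'EOF'
[defaults]
# extra directory searched for connection plugins
connection_plugins = ./connection_plugins
# use the plugin named myssh (connection_plugins/myssh.py) by default
transport = myssh
EOF
}
```

Usage: run `write_ansible_cfg` in the project directory, copy the modified plugin to connection_plugins/myssh.py, then run ansible-playbook from that directory so this ansible.cfg is picked up.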

Calling SSH command from Jenkins

Jenkins keeps using the default "jenkins" user when executing builds. My build requires a number of SSH calls. However, these SSH calls fail with host verification exceptions because I haven't been able to place the public key for this user on the target server.
I don't know where the default "jenkins" user is configured, and therefore can't generate the required public key to place on the target server.
Any suggestions for any of the following?
A way to force Jenkins to use a user I define
A way to enable SSH for the default Jenkins user
Fetch the password for the default 'jenkins' user
Ideally I would like to be able to do both; any help is greatly appreciated.
Solution: I was able to access the default Jenkins user with an SSH request from the target server. Once I was logged in as the jenkins user, I was able to generate the public/private RSA keys, which then allowed password-free access between servers.
Because with numerous slave machines it can be hard to anticipate which of them a build will run on, rather than calling ssh explicitly I highly suggest using existing Jenkins plug-ins for executing remote commands over SSH:
Publish Over SSH - execute SSH commands or transfer files over SCP/SFTP.
SSH - execute SSH commands.
The default 'jenkins' user is the system user running your Jenkins instance (master or slave). Depending on your installation, this user may have been created either by the install scripts (deb/rpm/pkg etc.) or manually by your administrator. It may or may not be called 'jenkins'.
To find out which user your Jenkins instance is running as, open http://$JENKINS_SERVER/systemInfo, available from the Manage Jenkins menu.
There you will find your user.home and user.name. E.g. in my case on a Mac OS X master:
user.home /Users/Shared/Jenkins/Home/
user.name jenkins
Once you have that information you will need to log onto that jenkins server as the user running jenkins and ssh into those remote servers to accept the ssh fingerprints.
An alternative (that I've never tried) would be to use a custom Jenkins job to accept those fingerprints, for example by running the following command in an SSH build task:
ssh -o "StrictHostKeyChecking no" your_remote_server
This last tip is of course completely unacceptable from a pure security point of view :)
So one might make a "job" which writes the host keys as a constant, like:
echo "....." > ~/.ssh/known_hosts
Just fill in the dots with the output of ssh-keyscan -t rsa {ip}, after you have verified it.
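A variant of that idea as a shell function, sketched under these assumptions: you have already checked the key line (for example, by saving the ssh-keyscan output to a file, running `ssh-keygen -lf` on it, and comparing the fingerprint with one obtained from the server's console), and you want to add it without wiping other entries. It appends with `>>`, whereas `>` replaces the whole file, and it skips lines that are already present. The function name and arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add one verified host-key line (as printed by
# ssh-keyscan) to a known_hosts file without duplicating it.
pin_host_key() {
  local key_line="$1"
  local known_hosts="${2:-$HOME/.ssh/known_hosts}"

  # ~/.ssh must exist; -m 700 only applies if the directory is created here
  mkdir -p -m 700 "$(dirname "$known_hosts")"

  # -x: match the whole line, -F: fixed string; append only if missing
  if ! grep -qxF -- "$key_line" "$known_hosts" 2>/dev/null; then
    printf '%s\n' "$key_line" >> "$known_hosts"
  fi
}
```

Running it as the jenkins user (or from a job) with the verified line as the first argument keeps host key checking enabled, unlike the StrictHostKeyChecking shortcut.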
That's correct: pipeline jobs will normally run as the jenkins user, which means that SSH access needs to be granted to this account for it to work in pipeline jobs. People have all sorts of complex build environments, so it seems like a fair requirement.
As stated in one of the answers, each individual configuration could be different, so check under "System Information" or similar, in "Manage Jenkins" on the web UI. There should be a user.home and a user.name for the home directory and the username respectively. On my CentOS installation these are "/var/lib/jenkins/" and "jenkins".
The first thing to do is to get shell access as the jenkins user (in our case). Because this is an auto-generated service account, its shell is not enabled by default. Assuming you can log in as root, or preferably as some other user (in which case you'll need to prepend sudo), switch to jenkins as follows:
su -s /bin/bash jenkins
Now you can verify that it's really jenkins and that you entered the right home directory:
whoami
echo $HOME
If these don't match what you see in the configuration, do not proceed.
All is good so far, let's check what keys we already have:
ls -lah ~/.ssh
There may only be keys created with the hostname. See if you can use them:
ssh-copy-id user@host_ip_address
If there's an error, you may need to generate new keys:
ssh-keygen
Accept the default values (saving the new keys in the home directory, without overwriting any existing ones) and enter no passphrase. Now you can run ssh-copy-id again.
It's a good idea to test it with something like
ssh user@host_ip_address ls
If it works, so should ssh, scp, rsync etc. in the Jenkins jobs. Otherwise, check the console output to see the error messages and try those exact commands on the shell as done above.
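The steps above can be collected into one sketch, meant to be run from the shell opened with `su -s /bin/bash jenkins`. It swaps in an ed25519 key where the steps above simply accept ssh-keygen's defaults, uses `BatchMode=yes` for the final test so that a missing key shows up as an error rather than a password prompt, and takes the expected account name as an optional second argument because it may not be jenkins on your installation. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give the Jenkins service account key-based SSH access
# to one target host. Run it from a shell opened as that account.
setup_jenkins_ssh() {
  local target="$1"                   # e.g. user@host_ip_address
  local expected_user="${2:-jenkins}" # user.name from System Information
  local key="$HOME/.ssh/id_ed25519"

  # Refuse to continue as the wrong account
  if [ "$(whoami)" != "$expected_user" ]; then
    echo "running as $(whoami), expected $expected_user" >&2
    return 1
  fi

  # Generate a passphrase-less key only if none exists yet
  if [ ! -f "$key" ]; then
    mkdir -p -m 700 "$HOME/.ssh"
    ssh-keygen -t ed25519 -N "" -f "$key" || return 1
  fi

  # Install the public key on the target, then check that login works
  # without any prompt (BatchMode fails instead of asking for a password)
  ssh-copy-id -i "$key.pub" "$target" || return 1
  ssh -o BatchMode=yes "$target" ls
}
```

Usage: `setup_jenkins_ssh user@host_ip_address`. If the last command succeeds, Jenkins jobs running as this account should be able to use ssh, scp and rsync against that host.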