Ansible Failing to Connect to Newly Elected VIP - ssh

I am trying to run an upgrade through a virtual ip (VIP). Behind the VIP, are 3 hardware hosts, where one host is "Leader."
Host 1
VIP---->Host 2 {Leader}
Host 3
So in this example, if I connect to the VIP, I actually connect to host 2.
The problem is, when I loop through my routine, and the loop hits the Leader and puts that host in "maintenance mode," ansible fails to continue. Note I use "async: 100" and "poll: 0" to put the host in maintenace mode, so Ansible doesnt "stick around" and moves on.
Ive tried this after issuing the host maintenance command:
- name: Reset the connection in case a new leader was chosen during the host maintenance command
meta: reset_connection
and this (found on a bug report regarding reset_connection on github):
- name: Reset ssh connection
shell: |
sleep 1; pkill -u {{ ansible_ssh_user }} sshd
async: 5
poll: 1
After I put the Leader into maintenance mode, and I use Putty to connect to the VIP, it connects me to the new Leader. So my command to put the host Leader into maintenance mode works. But if I try the above to force Ansible to reconnect to the VIP (after a new Leader is elected), it keeps trying to connect to the old Leader IP.
Does anyone have any workarounds to try and force Ansible to truly reconnect? Is there a "more complete" version of killing the SSH session, since I though that would work.
Any suggestions are welcome!
Thank you!

Related

Ubuntu Jump Host in Open Telekom Cloud not working as expected

Currently, I have built a small datacenter environment in OTC with Terraform. based on Ubuntu 20.04 images.
The idea is to have a jump host in the setup phase and for operational purposes that allows spontaneous access to service frontends via ssh proxy jumps without permanently routing them to the public net.
Basic setup works fine so far - I can access the jump host with ssh, and can access the internal machines from there with ssh when I put the private key onto the jump host. So, cloudwise the security seems to be fine. Key pair is generated with ed25519, I use the same key for jump host and internal servers (for now).
What I cannot achieve is the proxy jump as a chained command from my outside machine.
On the jump host, I set AllowTcpForwarding to "yes" in /etc/ssh/sshd_config and restarted ssh and sshd services.
My current local ssh config looks like this:
Host otc
User ubuntu
Hostname <FloatingIP-Address>
Port 22
StrictHostKeyChecking=no
UserKnownHostsFile=/dev/null
IdentityFile= ~/.ssh/ssh_access
ControlPath ~/.ssh/cm-%r#%h:%p
ControlMaster auto
ControlPersist 10m
Host 10.*
User ubuntu
Port 22
IdentityFile=~/.ssh/ssh_access
ProxyJump otc
StrictHostKeyChecking=no
UserKnownHostsFile=/dev/null
I can use this to ssh otc to the jump host.
What I would expect is that I could use e.g. ssh 10.0.0.56 to reach an internal host without further ado. As well I should be able to use commands like ssh -L 8080:10.0.0.56:8080 10.0.0.56 -N to map an internal server's port to a localhost port on my external machine. This is how I managed that successfully on other hosting scenarios in the public cloud.
All I get is:
Stdio forwarding request failed: Session open refused by peer
kex_exchange_identification: Connection closed by remote host
Journal on the Jump host says:
Jul 30 07:19:04 dev-nc-o-bastion sshd[2176]: refused local port forward: originator 127.0.0.1 port 65535, target 10.0.0.56 port 22
What I checked as well:
ufw is off on the Jump Host.
replaced ProxyJump configuration with ProxyCommand
So I am at the end of my knowledge. Has anyone a hint what else could be the reason? Any help welcome!
Ok, cause is found (but not yet fully explained).
My local ssh setting was allowing multiplexed forwards (ControlMaster auto ) which caused the creation of a unix socket file for the Controlpath in ~/.ssh.
I had to login to the jump host to AllowTcpForwarding in the first place.
After rebooting the sshd, I returned to the local machine and the failure occured when trying to forward to the remote internal machine.
After deleting the socket file in ~/.ssh, the connection can now be established as needed. Obviously, the persistent tunnel was not impacted by the restarted daemon on the jump host and simply refused to follow the new directive.
This cost me two days. On the bright side, I learned a lot about ssh :o

Connecting erlang observer to remote machine via public IP

Background
I have a machine in production running an elixir application (no access to iex, only to erl) and I am tasked with running an analysis on why we are consuming so much CPU. The idea here would be to launch observer, check the processes tab and see the processes with the most reductions.
How am I connecting?
To connect I am following a tutorial from a blog:
https://sgeos.github.io/elixir/erlang/observer/2016/09/16/elixir_erlang_running_otp_observer_remotely.html 1
Their instructions are as follows:
launch the app in the production machine with a cookie and a name
from local run: ssh user#public_ip "epmd -names" to get the name of the app and the port used
from local create a ssh tunnel to the remote machine: ssh -L 4369:user#public_ip:4369 -L 42877:user#public_ip:42877 user#public_ip (4369 is the epmd port by default, 42877 is the port of the app)
from local connect to the remote machine using the node's name: erl -name "user#app_name" -setcookie "mah_cookie" -hidden -run observer
Problem
And now in theory I should be able to use observer on the machine. Instead however I am greeted with the following error:
Protocol ‘inet_tcp’: register/listen error: epmd_close
So, after scouring the dark side of internet, I decided to use sudo journalctl -f to check all the logs of the machine and I found this:
channel 3: open failed: administratively prohibited: open failed
my_app_name sshd[8917]: error: connect_to flame#99.999.99.999: unknown host (Name or service not known)
/scripts/watchdog.sh")
my_app_name CRON[9985]: pam_unix(cron:session): session closed for user flame
Where:
erlang -name: my_app_name
machine user: flame
machine public ip: 99.999.99.999 (obviously not real)
so it tells me, unknown host ?? I am confused since 99.999.99.999 is the public IP of the machine itself!
Questions
What am I doing wrong?
I read that in older versions of erlang I can’t monitor a machine with observer if they are in different networks (which is the case, because I want to monitor this machine from my localhost) but I didn’t find any information regarding this in modern days.
If this is in fact impossible, what alternatives do I have?
Solution
After 3 days of non-stop searching, I finally found something that works.
To summarize I am putting it here everything I did.
All steps in local machine:
get the ports from the remote server:
> ssh remote-user#remote-ip "epmd -names"
epmd: up and running on port 4369 with data:
name super_duper_app at port 43175
create a ssh tunel with the ports:
ssh remote-user#remote-ip -L4369:localhost:4369 -L43175:localhost:43175
On another terminal in your local machine, run a iex terminal with the cookie the app in your remote server is using. Then connect to it and start observer:
iex --name observer#127.0.0.1 --cookie super_duper_cookie
Node.connect :"super_duper_app#127.0.0.1"
> true
:observer.start
With observer started, select the machine from the Nodes menu.
Possible setbacks
If you have tried this and it didn't work there are a few things you can check for:
Check if the EPMD port on your local machine is free, if not, kill the process using it and free it.
Check your ssh tunneling keys and configurations for permissions. As #Roberto Aloi pointed out this link can be useful: https://unix.stackexchange.com/questions/14160/ssh-tunneling-error-channel-1-open-failed-administratively-prohibited-open

Google Cloud VM instance SSH connection ~60 seconds timeout with 30 second keepalive

I've been connecting to a Google Cloud VM instance via gcloud ssh from my macOS:
$ gcloud compute ssh [username]#[instance]
Starting from a week ago, the connection will just drop after ~60 seconds of idle connection and returns:
Connection to [my_external_ip] closed by remote host.
Connection to [my_external_ip] closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
I configured the TCP keepalive time to 30 seconds on both my macbook and the VM. But that did not solve the problem.
Any idea how do I extend the connection duration?
This is unlikely an issue with your timeout setting, but more likely an issue with your firewall rules or routes.
Firstly I would suggest checking your firewall rules and ensure you have an ingress firewall rule opening port 22. If you have, check the configuration of this rule, in particular:
Check the IP range in 'Source filters'. Does the range include the IP address of your home computer? For testing purposes, to ensure it does, you could temporarily set this to 0.0.0.0/0 to include all IP addresses.
Check the 'Targets' drop-down. Is this set to apply to 'All instances in the network' or is it set to 'Specified target tags'? If you have set it to 'Specified target tags', make sure that the same tag is added to the 'Network tags' section of the instance, otherwise the firewall rule will not apply to the instance and allow SSH traffic.
Ensure this rule has a higher priority than any other rules that could counteract it (when I say higher priority I mean lower number, for example, a a rule with a priority of 1000 is a higher priority than a rule with a priority of 20000).
If the above doesn't resolve the issue, run the following command to check the routes:
gcloud compute routes list
Ensure there is an entry which contains the following:
default 0.0.0.0/0 default-internet-gateway
EDIT
If you are able to sometimes SSH into the instance but then the connection drops, there may be some useful information in the logs, or the serial console.
You can access the serial console by clicking on the instance name in the GCP Console, then clicking on "Serial port 1 ".
When you SSH into the instance, information about the SSH session populates the serial console output (this can be refreshed by hitting the 'Refresh' at the top of the page.) Information about the session ending also populates the serial console. There may be some useful information/clues about why the session ends in this output.
It might also be worth checking the status of SSH daemon on the instance and giving it a restart to see if that makes a difference:
Check status of sshd:
systemctl status sshd
Restart sshd:
sudo systemctl restart sshd

How to have an idempotent Ansible playbook if we change the SSH port?

My playbook needs to change the ssh port and updates the firewall rules. (unfortunately, I cannot "get" a new server directly with the desired custom port).
Managing the change during the execution is easy.
However I do not know how to have an idempotent playbook.
The first run must be initiated on the default port (22).
The next runs must be initiated on the custom port.
A solution could be done but with performance issues.
Is there any other possibility with Ansible 2.0+?
You could approach this a couple of ways really.
The simplest way might be to simply separate the SSH port configuration into a separate playbook/role that specifies the SSH port as 22 but then your inventory would normally define the SSH port as your custom one.
ssh_port.yml
- hosts: all
vars:
ansible_ssh_port: 22
tasks:
- name: change the default ssh port
lineinfile ...
notify: restart ssh
handlers:
- name: restart ssh
service:
name : sshd
state: restarted
You would then only run this playbook on the creation of the machine and then only re-run your main playbook again and again, sidestepping the idempotency of this step.
Alternatively, as Mikko Ohtamaa pointed out in the comments, you could have Ansible modify your inventory file when you change the port. This will mean that you can run the whole thing end to end idempotently as the next run through will connect on the non default SSH port and then simply (pointlessly obviously) check that the SSH port is still set to the desired one. You can get at the inventory file by using the "magic variable" inventory_file. A rough example might look like this:
- name: change the default ssh port
lineinfile ...
notify: restart ssh
- name: change ssh port used by ansible
set_fact:
ansible_ssh_port: {{ custom_ssh_port }}
- name: change ssh port in inventory
lineinfile:
dest: inventory_file
insert_after: '[all:vars]'
line: 'ansible_ssh_port="{{ custom_ssh_port }}"'
Just make sure you have an inline group variables block for all in the inventory file and this will mean all future runs of any playbook against this inventory will connect to all of the hosts contained inside it on your custom SSH port.
If you use source control then you will also need a local_action task to push the change back to your remote.

ssh server connect to host xxx port 22: Connection timed out on linux-ubuntu [closed]

Closed. This question is not about programming or software development. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed last month.
Improve this question
I am trying to connect to remote server via ssh but getting connection timeout.
I ran the following command
ssh testkamer#test.dommainname.com
and getting following result
ssh: connect to host testkamer#test.dommainname.com port 22: Connection timed out
but if try to connect on another remote server then I can login successfully.
So I think there is no problem in ssh and other person try to login with same login and password he can successfully login to server.
Please help me
Thanks.
Here are a couple of things that could be preventing you from connecting to your Linode instance:
DNS problem: if the computer that you're using to connect to your
remote server isn't resolving test.kameronderdehamer.nl properly
then you won't be able to reach your host. Try to connect using the
public IP address assigned to your Linode and see if it works (e.g.
ssh user#123.123.123.123). If you can connect using the public IP
but not using the hostname that would confirm that you're having
some problem with domain name resolution.
Network issues: there
might be some network issues preventing you from establishing a
connection to your server. For example, there may be a misconfigured
router in the path between you and your host, or you may be
experiencing packet loss. While this is not frequent, it has
happenned to me several times with Linode and can be very annoying.
It could be a good idea to check this just in case. You can have a look
at Diagnosing network issues with MTR (from the Linode
library).
That error message means the server to which you are connecting does not reply to SSH connection attempts on port 22. There are three possible reasons for that:
You're not running an SSH server on the machine. You'll need to install it to be able to ssh to it.
You are running an SSH server on that machine, but on a different port. You need to figure out on which port it is running; say it's on port 1234, you then run ssh -p 1234 hostname.
You are running an SSH server on that machine, and it does use the port on which you are trying to connect, but the machine has a firewall that does not allow you to connect to it. You'll need to figure out how to change the firewall, or maybe you need to ssh from a different host to be allowed in.
EDIT: as (correctly) pointed out in the comments, the third is certainly the case; the other two would result in the server sending a TCP "reset" package back upon the client's connection attempt, resulting in a "connection refused" error message, rather than the timeout you're getting. The other two might also be the case, but you need to fix the third first before you can move on.
I got this error and found that I don't have my SSH port (non standard number) whitelisted in config server firewall.
Just adding this here because it worked for me. Without changing any settings (to my knowledge), I was no longer able to access my AWS EC2 instance with: ssh -i /path/to/key/key_name.pem admin#ecx-x-x-xxx-xx.eu-west-2.compute.amazonaws.com
It turned out I needed to add a rule for inbound SSH traffic, as explained here by AWS. For Port range 22, I added 0.0.0.0/0, which allows all IPv4 addresses to access the instance using SSH.
Note that making the instance accessible to all IPv4 addresses is a security risk; it is acceptable for a short time in a test environment, but you'll likely need a longer term solution.
If you are on Public Network, Firewall will block all incoming connections by default. check your firewall settings or use private network to SSL
The possibility could be, the SSH might not be enabled on your server/system.
Check sudo systemctl status ssh is Active or not.
If it's not active, try installing with the help of these commands
sudo apt update
sudo apt install openssh-server
Now try to access the server/system with following command
ssh username#ip_address
This happens because of firewall connection.
Reset your firewall connection from your hosting website.
It will start working.
After connecting to the server again add this to your (ufw) security
sudo ufw allow 22/tcp
There can be many possible reasons for this failure.
Some are listed above. I faced the same issue, it is very hard to find the root cause of the failure.
I will recommend you to check the session timeout for shh from ssh_config file.
Try to increase the session timeout and see if it fails again
My VPN connection was not enabled. I was trying all possible way to open up the Firwall and Ports until I realized, I am working from home and my VPN connection was down.
But yes, Firewall and ssh configurations can be a reason.
Try connecting to a vpn, if possible. That was the reason I was facing problem.
Tip: if you're using an ec2 machine, try rebooting it. This worked for me the other day :)
I had this issue while trying to ssh into a local nextcloud server from my Mac.
I had no issues ssh-ing in once, but if I tried to have more than one concurrent connection, it would hang until it timed out.
Note, I was sshing to my user#public-ip-address.
I realized the second connection only didn't work when I tried to ssh into it when on the same network, ie my home network
Furthermore, when I tried ssh user#server-domain it worked!
The end fix was to use ssh user#server-domain rather than ssh user#public-ip
I have experienced a couple of nasty issues that lead to these errors, and these are different from everyone else's answer here:
Wrong folder access rights. You need to have specific directory permissions on you ssh folders and files.
a. The .ssh directory permissions should be 700 (drwx------).
b. The public key (.pub file) should be 644 (-rw-r--r--).
c. The private key (id_rsa) on the client host, and the authorized_keys file on the server, should be 600 (-rw-------).
Nasty docker network configuration. This just happened to me on an AWS EC2 instance. It turned out that I had a docker network with an ip range that interfered with the ssh access granted by the security group and VPC. The docker network's range was e.g. 192.168.176.0/20 (i.e. a range from 192.168.176.1->192.168.191.254), whereas the security group had a range of 192.168.179.0/24; interfering with the SSH access.
I had this error when trying to SSH into my Raspberry pi from my MBP via bash terminal. My RPI was connected to the network via wifi/wlan0 and this IP had been changed upon restart by my routers DHCP.
Check IP being used to login via SSH is correct. Re-check IP of device being SSH'd into (in my case the RPI), which can be checked using hostname -I
Confirm/amend SSH login credentials on "guest" device (in my case the MBP) and it worked fine in my attempt.
I faced a similar issue. I checked for the below:
if ssh is not installed on your machine, you will have to install it firstly. (You will get a message saying ssh is not recognized as a command).
Port 22 is open or not on the server you are trying to ssh.
If the control of remote server is in your hands and you have permissions, try to disable firewall on it.
Try to ssh again.
If port is not an issue then you would have to check for firewall settings as it is the one that is blocking your connection.
For me too it was a firewall issue between my machine and remote server.I disabled the firewall on the remote server and I was able to make a connection using ssh.
my main machine is windows 10 and I have CEntOS 7 VBox
Search in your main machine for "known_hosts"
usually, known_host location in windows in "user/.ssh/known_host"
open it using notepad and delete the line where your centos vbox ip
then try connect in your terminal
in mac os user you can find known_hosts in "~/.ssh/known_hosts"
Make sure to ask the admin to authorize your device.
On Linux run:
sudo zerotier-cli listnetworks
if it returns status ACCESS DENIED ask the admin to authorize your node. This is mentioned here.
https://discuss.zerotier.com/t/solved-cant-join-network/1919
This issue is also caused if the Dynamic Host Configuration Protocol is not set-up properly.
To solve this first check if your IP Address is configured using
ping ipaddress,
If there is no packet loss and the IP Address is working fine try any other solution. If there is no response and you have 100% packet loss, it means that your IP Address is not working and not configured.
Now configure your IP Address using,
sudo dhclient -v devicename
To check your device you can use the 'ip a' command
For eg. My device was usb0 since I had connected the device through usb
This will configure an IP Address automatically and you can even see which one is configured. You can again check with the 'ip a' command to confirm.
This may be very case specific and work in some cases only but
check to see if you were previously connecting through some VPN software/application.
Try connecting again to the VPN. Worked in my case.
This happened to me after enabling port 22 with "sudo ufw allow ssh". Before that, I was getting a refusal from my machine when entering with ssh from another one. After enabling it, I thought it would work, but instead it showed the message "connection timed out". As I had just installed Ubuntu with the option of installing basic functions alongside, I checked whether I had the openssh-server with the command sudo apt list --installed | grep openssh-server. It turned out that Ubuntu had installed by defect the openssh-client instead. I uninstalled it and installed the openssh-server following the basic commands:
sudo apt-get purge openssh-client
sudo apt update
sudo apt install openssh-server
After that, a simple "sudo ufw allow ssh" worked perfectly and I was finally able to access the machine with an ssh command.
What worked for me was that i went to my security group and reset my IP and it worked
Here are some considerations which i took to resolve a similar issue that I had:
Port 22
IGW (Internet Gateway)
VPC
Scene 1> This is for port 22 not enabled with right configurations. If the port is set to custom or myip, the probable scene is this won't work.
Scene 2> When you delete the internet gateway, the network is created and the instance will be functional too, but the routing from the internet will not work. Hence make sure that if there is a VPC, it has an Internet Gateway attached.
Scene 3> Check the VPC for the subnet associations and routing table entries. This might probably tell you the cause. I found one in this kind of troubleshooting. The route used to land up in a "blackhole" (shows up in the route table section of the console). To fix this I had to check and find out my internet gateway and found the issue with the IGW.
Moral of the story: always trace backward in the network!
In my case I'm on windows, I reset my firewall settings, and it fixed
If you get any error check the basic a version control request with ssh -V and If it is not installed, install it with the sudo apt-get install openssh-server command.
Check your virtual machine ssh connection with sudo service ssh status at console.
Check "Active" rows and if write a inactive(dead) the console write sudo service ssh start
Result: Now you can check your connection with sudo service ssh status command and send ssh connection request.
Reset the firewall and reboot your VPS from your hosting service, it will start working perfectly fine
check whether accidentally you have deleted the default vpc or default subnets ,while creating your own vpc and subnets.
I have done this mistake while creating vpc, hence got this error while connecting via ssh.
alos check whether u have attched IGW to public subnets.
Its not complicated.
First, go disable your firewall(USE YOUR CONTROL PANEL)after you check if your openssh is active.
Disable firewall, then use putty or any alternative to basically disable using this command sudo ufw disable
try now
Update the security group of that instance. Your local IP must have updated. Every time it’s IP flips. You will have to go update the Security group.