Can not connect by ssh between nodes - ssh

I have 5 Proxmox nodes with debian, we had some power problems and all nodes went down. The nodes at this time cannot ssh between them... From my computer I can access all of them but from one node to another takes too long and appears disconnected...
Any one had this problem?
Tks.

Related

Redis cluster can not auto failed over

I setup my Redis cluster (version 6.2) with 3 master nodes and 3 slave nodes. It works well for normal scenario.
However, if I kill one of the master nodes, even if I wait a very long time, the auto-failover does not happen. I use the "cluster nodes" command, the output tells me that the killed node is marked as "master, failed", and all 3 slave nodes are still as "slave". From the log, I also can not see any useful information.
My cluster config, except below 2, all are used default:
cluster-node-timeout 5000
cluster-require-full-covearage no
So may I know who has an idea how to check what is wrong, that is very appreciated!

Raspberry (PiHole): 500 Internal Server Error

It happened already twice today, I try to reach the dashboard and I get "500 Internal Server Error"; I can ping the raspberry but SSH does not work (connection closed by peer)
A reboot will fix the problem
Any ideas?
There is a ton of situations here, you probably need some level of monitoring on your system. If it happens "a lot" then I'd look at a partition like your /var partition, filling up and the OS not being able to write to it.
With the reboot, typically your /tmp/ and /var partitions get cleaned out allowing you to administer the machine.
So, tl;dr? The best thing would be to set up some type of monitoring on your raspberry pi and watch the graphs. If you have no idea where to start, datadog will help you get off the ground.

Restarting managed servers by clusters without outage

I want to write script for restarting weblogics managed servers, which would do the following:
It would contain loop ,which would restart first nodes of all clusters at one time.
a.)FORCE_SHUTDOWN
b.)wait for status: SHUTDOWN
c.)START managed servers
d.)wait for status: RUNNING
e.)move to next node of each cluster and repeat until all managed servers are restarted.
So in first iteration it would restart all first nodes of each cluster, in second iteration it would restart the second nodes of each cluster and repeat this action until all managed servers are restarted.
I have not started to writing the script yet, I am newbie with weblogic and this is just concept. Do you have any suggestions how to achieve that goal?
Why reinvent the wheel?
rollingRestart
Category: Control Commands
Use with WLST: Online
Description Initiates a rolling restart of all servers in a domain or all servers in a specific cluster or clusters without interrupting
the service. This command provides the ability to sequentially restart
servers.
This operation involves the graceful shutdown of the servers, and the
servers being restarted without interrupting the service for the user.
Syntax
rollingRestart(target, [options])

DC/OS has three roles, they are master, slave, slave_public, why can't put them on one host?

I just investigate DC/OS, I find that DC/OS has three roles:master, slave, slave_public, I want to deploy a cluster which can host master, slave or slave_public roles on one host, but currently I can't do that.
I want to know that why can't put them on one host when designed. If I do that, could I get some suggestions?
I just have the idea. If I can't do, I'll quit using DCOS, I'll use mesos and marathon.
Is there someone has the idea with me? I look forward to the reply.
This is by design, and things are actually being worked on to re-enforce that an machine is installed with only one role because things break with more than one.
If you're trying to demo / experiment with DC/OS and you only have one machine, you can use Virtual Machines or Docker to partition that one machine into multiple machines / parts which you can install DC/OS on. dcos-vagrant and dcos-docker can help you there.
As far as installing though, the configuration for each of the three roles is incompatible with one another. The "master" role causes a whole bunch of pieces of software to be started / installed on a host (Mesos-DNS, Mesos master, marathon, exhibitor, zookeeper, 3dt, adminrouter, rexray, spartan, navstar among others) which listen on various ports. The "slave" role causes a machine to have a mesos-agent (mesos renamed mesos-slave to mesos-agent, hence the disconnect) configured and started on the agent. The mesos-agent is configured to control / most ports greater than 1024 to tasks which are launched by mesos frameworks on the agent. Several of those ports are used by services which are run on masters, resulting in odd conflicts and hard to fix bad behavior.
In the case of running the "slave" and "slave_public" on the same host, those two conflict more directly, because both of them cause mesos-agent to be run on the host, with slightly different configuration. Both the mesos-agent (the one configured with the "slave" role and the one with the "slave_public" role are configured to listen on port 5051. Only one of them can use it though, so you end up with one of the agents being non-functional.
DC/OS only supports running a node as either a master or an agent(slave). You are correct that Mesos does not have this limitation. But DC/OS is more than just a Mesos/Marathon. To enable all the additional features of DC/OS there are various components built around Mesos and Marathon. At times these components behave differently whether they are running on a master or an agent and at other times the components that exist on a master may or may not exist on an agent or vice versa. So running a master and an agent on the same node would lead to conflicts/issues.
If you are looking to run a small development setup before scaling the solution out to a bigger distributed system DC/OS Vagrant might be a good starting point.

Enterprise Jenkins HA plugin not working as it should

I've been trying to setup Enterprise Jenkins with the High Availabilty setup. The current setup consists of two jenkins masters sharing the same jenkins home, say master1 and master2, an installation of the jenkins-ha-monitor-1.1-1.1 rpm on both these masters, say monitor1 and monitor2. With this setup, according to the documentation atleast, the HA plugin should work as expected. Promotion and demotion scripts are similar to the ones in the documentation (only the ip and interface is different, same approach). i.e
For demotion
ifconfig eth0:2 down
For promotion
ifconfig eth0:2 the.floating.ip
Now for the nodes to get registered correctly I have to start master1, master2, monitor1 and monitor2 in that order. Tailing the logs for both I see that when the services are started in that order they are registered correctly by both monitor services as nodes in a cluster, and in the HA status gui in the jenkins console.
Now when master1 is killed by sending it a KILL signal monitor2 recognizes this and runs the promotion script. But monitor one keeps throwing :
Oct 24, 2012 3:47:36 PM
com.cloudbees.jenkins.ha.singleton.HASingleton$3 suspect INFO:
Suspecting a node failure in a cluster: jenkins-master-1-285 Oct 24,
2012 3:47:39 PM com.cloudbees.jenkins.ha.singleton.HASingleton$3
suspect INFO: Suspecting a node failure in a cluster:
jenkins-master-1-285
continuously without ever runnign the demotion script. Now since master2 has taken up the floating ip via its promotion script, and master1 still has that ip because demotion script is not run the setup ends up with two boxes claiming the same ip. Moreover restarting master1 does not do anything, i.e master1 does not get added to the cluster as a seconday node, monitor1 still keeps spitting the above messages to log, the floating ip keeps returning "Unable to connect" and master2 and monitor2 show the cluster as master2,monitor2 and monitor1. So my question/problem is twofold - why isnt master1 accepted back into the cluster? And why isn't the demotion script run as it should?
Also FYI i have tried to do a
service jenkins stop
and in that case the demotion script runs but again there are similar issues when
service jenkins start
is run on the master that was stopped earlier since the promotion script is run regardless of whether a primary jenkins exists. And in this case the two monitors register different clusters like so monitor1 : master1,monitor1 and monitor2 : master2,monitor2.
Running an ifconfig shows that both masters have taken up the floating ip at this point.
Any help is appreciated! Thanks!
Still under investigation with support. The originally reported problem (here) suggests that the two nodes are communicating fine, but promotions/demotions are not run correctly—either a bug in JGroups or in its usage in Jenkins high availability.
But further tests turned up problems with UDP multicast communication, which has been reported for RedHat/CentOS hosts. Work is underway to offer an alternate JGroups stack which does not rely on multicast (or UDP) at all, using the shared $JENKINS_HOME directory to register Jenkins and monitor instances (as TCP address:port records).