I would like to understand the logs of a RabbitMQ error. Can anyone help me decipher the following error message:
The interesting thing is that in my case, I have a server running RabbitMQ 3.8.8 on Erlang 23, but suddenly, after a long period of running, this crash occurred.
Maybe it's related to port 25672 connectivity?
The error repeated for a few hours until we noticed it, and then the whole server was restarted; after that, RabbitMQ ran again and everything went well.
So the question is: what does this error mean? That port 25672 doesn't respond?
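For what it's worth, port 25672 is the Erlang distribution port that RabbitMQ nodes and CLI tools use to talk to each other. Here is a hedged sketch of how to check whether it is actually listening (assuming default ports and a local node):
# List the ports the node listens on; 25672 should show up as inter-node/CLI communication
rabbitmq-diagnostics listeners
# Ask epmd which Erlang nodes it knows about and which distribution ports they use
epmd -names
# Confirm at the OS level that something is bound to 25672
ss -tlnp | grep 25672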
I've been using a GCP preemptible VM for a few months without problems, but in the last 4 weeks my instances have consistently shut off anywhere from 10 to 20 minutes into operation.
I'll be in the middle of training, and my notebook will suddenly disconnect. The terminal will show this error:
jupyter#fastai-instance:~$ Connection to 104.154.142.171 closed by remote host.
Connection to 104.154.142.171 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
I then check the status of my VM, to see that it has shut down.
I searched the terminal traceback and found this thread, which seemed promising: ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]
When I ran sudo gcloud compute config-ssh, my VM ran for much longer than usual before shutting down, yet shut down in the same way after about an hour. Since then, it's back to the same behavior.
I know preemptible instances can be shut down when the platform needs resources, but my understanding is that this comes with some kind of warning. I've checked the status of GCP's servers after shutdowns and they appear to be fine. This is also happening the same way every time I turn my VM on, which seems too frequent for preempting.
I am not sure where to look for any clues; has anyone else had a problem like this? What's especially puzzling to me is, if it is in fact an SSH problem, why would that cause the VM itself to shut down, rather than just breaking the connection?
Thanks very much for any help!
Have you tried setting a shutdown script that writes something to a file, so you can validate the state of the VM when it goes down?
Try this as a shutdown script:
#!/bin/bash
# Query the metadata server: the "preempted" entry reads TRUE if the VM is being preempted
curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted" -H "Metadata-Flavor: Google" > /tmp/preempted.log
If there is TRUE in the file, it's because the VM has been preempted.
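In case it's useful, a minimal sketch of how to attach it (assuming the instance is called fastai-instance and the script above is saved locally as shutdown.sh; substitute your own names):
# Register the local file as the instance's shutdown script
gcloud compute instances add-metadata fastai-instance \
    --metadata-from-file shutdown-script=shutdown.sh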
If a VM stops and you have an active SSH connection to that VM (via gcloud compute ssh), then it's normal that you receive an error. Since the VM goes down, all connections are closed, including your SSH connection (you cannot connect to a stopped instance). The VM termination causes the SSH error, not the other way around.
When using preemptible instances, Google can reclaim the instance whenever it's needed. Note that (from the docs about preemptible instance limitations):
Compute Engine might terminate preemptible instances at any time due to system events. The probability that Compute Engine will terminate a preemptible instance for a system event is generally low, but might vary from day to day and from zone to zone depending on current conditions.
It means that one day your instance may run for 24 hours without being terminated, but another day your instance may be stopped 30 minutes after being started, if Compute Engine needs to reclaim some resources.
A comment on the "continuously shutting down" part:
(I have experienced this as well)
Keep in mind that Google prefers to shut down RECENTLY STARTED preemptible instances, over ones started earlier.
The link below (and supplied earlier) has the statement:
Generally, Compute Engine avoids preempting too many instances from a single customer and preempts new instances over older instances whenever possible.
This would generally mean that, yes, if you are preempted and boot up again, it is quite likely that you are going to be preempted again and again until the load in the zone drops.
I'm surprised that Google doesn't simply preclude you from starting the preemptible VM for a while (30-60 minutes, say). How much CPU is being wasted bouncing VMs up and down while we cross our fingers?
P.S. There is a dirty trick to end-around your frustration: have 2 VMs identically configured, except for preemptibility, but only 1 underlying boot disk. If you are having a bad day with preempts, simply 'move' the boot disk to the non-preemptible VM, boot it, and carry on. It's a couple of simple gcloud commands to achieve this (see the sketch after the link below), easily scripted and very fast. Don't tell Google I told ya....
https://cloud.google.com/compute/docs/instances/preemptible#limitations
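For the curious, a minimal sketch of that disk swap (all names here are made up: a stopped preemptible VM preempt-vm, a standard VM standby-vm, a shared boot disk shared-boot, zone us-central1-a):
# Detach the boot disk from the (stopped) preemptible VM
gcloud compute instances detach-disk preempt-vm --disk=shared-boot --zone=us-central1-a
# Attach it to the standard VM as its boot disk
gcloud compute instances attach-disk standby-vm --disk=shared-boot --boot --zone=us-central1-a
# Boot the standard VM and carry on
gcloud compute instances start standby-vm --zone=us-central1-a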
I have some issues with a RabbitMQ server installed on CentOS 7 when starting and stopping it. The server works well when connected to the internet; it only takes 1 or 2 seconds to start. But when disconnected from the internet, it takes about 3-5 minutes to either start or stop. There seems to be nothing wrong in the access log.
I also checked what to set in the configuration file, but I couldn't find any settings related to server startup. Can you give me some help?
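One thing I would check, purely as a guess from the symptom: whether hostname resolution stalls when the box is offline, since the node name rabbit@<hostname> has to resolve at startup. A quick sketch:
# Time how long resolving the machine's own hostname takes; if this hangs while offline,
# pinning the name in /etc/hosts usually avoids the DNS timeout
time getent hosts "$(hostname)"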
I am trying to start zookeeper on a remote virtual machine. I use this for my project regularly and I do not have any problems while starting the zookeeper. But lately when I am trying to start the server I am getting an error.
When I give ./zkServer.sh start it shows zookeeper server started.
When I check for status using ./zkServer.sh status it shows "Error contacting service. It is probably not running."
I am working with 5 virtual machines in total. All these machines were fine initially. I started having problems with machine 1, but recently I have the same problem on all my virtual machines. Can someone tell me what the issue is and suggest a way to clear it?
Most probably the ZooKeeper server exited.
If you are running it on a Linux box, use Linux commands to check whether the process is still alive. Some of them:
ps -ef | grep -i zookeeper
jps
etc.
Also, try running it in the foreground:
zkServer.sh start-foreground
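You can also probe the server directly over the client port (a sketch assuming the default clientPort 2181, nc installed, and the four-letter commands permitted; newer ZooKeeper releases require whitelisting them via 4lw.commands.whitelist):
# Ask the server for its status; no reply usually means it is not serving
echo stat | nc localhost 2181
# Simpler liveness probe; a healthy server answers "imok"
echo ruok | nc localhost 2181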
In my case the issue was a $PATH issue...
You can see what the issue is by running ZooKeeper in the foreground:
zkServer.sh start-foreground
I encountered the same problem, too. In my case the problem was that the ZooKeeper server configuration was not the same on each node, so ZooKeeper could not establish a quorum and the affected nodes could not join the cluster.
Please make sure the server definitions are the same on every node.
For example, on all nodes the server definitions must be identical, as below:
server.0=ip0:2888:3888
server.1=ip1:2888:3888
server.2=ip2:2888:3888
server.3=ip3:2888:3888
server.4=ip4:2888:3888
In my case, the clientPort attribute's value was somehow missing on one of the boxes, so the console reported an invalid config path. I investigated with 'zkServer.sh start-foreground' and found the root cause.
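For comparison, a minimal zoo.cfg sketch showing where clientPort belongs (all values illustrative; dataDir, the IPs, and the server list must match your own cluster):
# Minimal illustrative zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.0=ip0:2888:3888
server.1=ip1:2888:3888
server.2=ip2:2888:3888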
I have 2 servers A and B running a glassfish 3.1.2.2 application server on them. Both use a JMS queue for communication, which works fine so far. If the network connection breaks for any reason, I can see in the logs of server B (the one configured to connect to the remote queue of A) that it tries to reconnect and is actually always successful in doing so as soon as A is up again.
But the problem is that if I try to restart the GlassFish instance on B while server A is unreachable, the startup process fails after some retries and remains stuck in a kind of undefined/unusable state: the Java process is started and some ports are open, but the applications are not started, not even the administration console.
IMHO the GlassFish startup process should not wait for the queues to connect; this should be done in some kind of background process.
Has anyone of you experienced something similar? Is there anything I can configure/tune to fix this behaviour?
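In case it helps anyone tuning this: the jms-service element in domain.xml has timeout/reconnect knobs that asadmin can read and set. This is a sketch only; please verify the attribute names against the GlassFish 3.1.2.2 docs for your setup:
# Inspect the current JMS service settings
asadmin get "configs.config.server-config.jms-service.*"
# Example: shorten how long startup waits on the JMS service (value in seconds, illustrative)
asadmin set configs.config.server-config.jms-service.init-timeout-in-seconds=30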
Never mind, it seems to have fixed itself :(
After restarting the computer, removing the deployed EAR and deploying it again, it just worked. I haven't experienced this behaviour since then.
We are occasionally seeing the error "MSDTC encountered an error (HR=0x80000171) while attempting to establish a secure connection with system [servername]" on our servers. We are using NServiceBus.
After we reboot the server, we are OK; things work perfectly. However, after an undetermined amount of time, this error randomly happens, the services stop working, and everything comes to a halt.
Has anyone encountered this before?