RabbitMQ Shovel plugin stuck on "starting" status - rabbitmq

RabbitMQ starts up just fine, but the shovel plugin status is listed as "starting".
Each broker is running on a separate AWS instance. The remote server is Windows Server 2008; the local server is Amazon Linux.
I'm using the following rabbitmq.config:
[{rabbitmq_shovel,
  [{shovels,
    [{scrape_request_shovel,
      [{sources, [{broker, "amqp://test_user:test_password@localhost"}]},
       {destinations, [{broker, "amqp://test_user:test_password@ec2-###-##-###-###.compute-1.amazonaws.com"}]},
       {queue, <<"scp_request">>},
       {ack_mode, on_confirm},
       {publish_properties, [{delivery_mode, 2}]},
       {publish_fields, [{exchange, <<"">>},
                         {routing_key, <<"scp_request">>}]},
       {reconnect_delay, 5}
      ]}
    ]
  }]
}].
Running the following command:
sudo rabbitmqctl eval 'rabbit_shovel_status:status().'
returns:
[{scrape_request_shovel,starting,{{2012,7,11},{23,38,47}}}]
According to this question, this can result if the users haven't been set up correctly on the two brokers. However, I've double-checked that I've set up the users correctly via rabbitmqctl user_add on both machines -- I have even tried it with a different set of users, to be sure.
I also ran an nmap scan of port 5672 on the remote host to verify it was up and running on that port.
UPDATE: The problem isn't solved, but it does appear to be a result of connection problems with the remote server. I changed "reconnect_delay" to 0 in my config file, to stop shovel from retrying the connection indefinitely. I highly recommend that others with this problem do the same, as it lets you get error messages out of rabbit_shovel_status. In my case I got the following error:
[{scrape_request_shovel,
{terminated,
{{badmatch,{error,access_refused}},
[{rabbit_shovel_worker,make_conn_and_chan,1},
{rabbit_shovel_worker,handle_cast,2},
{gen_server2,handle_msg,2},
{proc_lib,init_p_do_apply,3}]}},
{{2012,7,12},{0,4,37}}}]

Answering my own question here, in case others encounter this issue. This error (and also a timeout error, {{badmatch,{error,etimedout}}, if you get that instead) is almost certainly a communications problem between the two machines, most likely due to port access / firewall settings.
There were a couple of dumb things I was doing here:
1) I was using the wrong DNS name for my remote EC2 instance (D'oh! really dumb -- can't tell you how long I spent banging my head against the wall on this one...). Remember that stopping and starting your instance generates a new DNS name if you don't have an Elastic IP associated with the instance.
2) My remote instance is a Windows server, and I realized you have to open up port 5672 both in Windows Firewall and in the EC2 security groups. There are two overlapping levels of access control here, so opening the port in the EC2 management console isn't sufficient on its own if your machine is a Windows server on EC2.
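Both mistakes above show up as a failed TCP connection to port 5672, so a quick check from the local machine can narrow things down. The following is a minimal Python sketch, not part of RabbitMQ; the function name check_tcp and the result labels are made up for illustration, and a "timeout" result is only suggestive of a firewall dropping packets, not proof of one.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port (labels are illustrative)."""
    try:
        # create_connection resolves the name first, then connects.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The name did not resolve, e.g. a stale EC2 DNS name.
        return "dns_failure"
    except socket.timeout:
        # No reply within the timeout; often a firewall silently dropping packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else (no route to host, network unreachable, ...).
        return f"error: {exc}"
```

Calling check_tcp with the remote broker's host name and 5672 from the machine that runs the shovel tests the same path the shovel uses, which an nmap scan from some other machine may not.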

Related

Weblogic 10.3.6 managed server fails to start when unsecured listen port is disabled

This server worked not too long ago (I don't have a specific date). We use it for testing and had successfully deployed a few applications. Upon returning to the project I could no longer access the applications; Chrome says the site cannot be reached.
When I run netstat -an | grep 'LISTEN', I can see the unsecured port but the SSL port is missing in action. I asked the networking team if the ports were being blocked and they said no. I tried to force the application to use the secure port by disabling the unsecured port and restarting the managed server, but it fails to start with this configuration.
Any thoughts? SSL is not really my area of expertise (this is my first exposure). When googling the title I didn't see any results that matched the problem I am having, or at least I did not realize they did...
The server will restart if I enable the unsecured port.
@Gerardo Arroyo, yes, this seems to be the issue. I assumed that this server used the same certs as other servers in the test system, but it seems I was wrong. I will request a new cert from the networking team. Thank you.

Google Compute Engine websocket

I have a Google Compute Engine Instance and have an ASP.NET Core application deployed to it. Within that application, I run
WebSocketServer server = new WebSocketServer("ws://0.0.0.0:2001");
To start a websocket server on port 2001. However, when I try and start a websocket connection to this port (m.y.i.p:2001), it times out. I don't understand why since the VM is tagged with the same network tag for ingress and egress that I created allowing access to all ports. If not the firewall, where else could I investigate?
For anyone else who encounters a similar issue with opening a port on a VM running Windows Server (I was using the 2016 edition), I fixed it by connecting to the machine over Remote Desktop and disabling its firewall. I had to do this even though I had made Compute Engine firewall exceptions. If anyone wants to clarify: I am assuming it's better to handle all firewall-related things in GCP rather than also running the internal firewall of the VM itself, since the two are likely to conflict?

Hyper-V Fails to enable replication between servers with error 0x00002EE2

I am running into an issue trying to enable Hyper-V replication on Windows Server 2016. I have tried via HTTP and HTTPS (AD signed certificates) and neither works. The interesting thing is, I already have another VM replicating between the two servers, so I know it's possible.
The current error is:
[Main Instruction]
Enabling replication failed.
[Expanded Information]
Hyper-V failed to enable replication for virtual machine 'VM2': The operation timed out (0x00002EE2). (Virtual machine ID 134E9F3F-XXXX-XXXX-XXXX-1AC608804212)
However this doesn't make sense as I can ping the server (ping works from both sides) and I can connect to port 80 and 443 from each side (VS1 and VS2) - note they are on different subnets however that shouldn't matter. Also both servers are part of the domain so authentication shouldn't be an issue (I am logged in as a domain admin and have a valid kerberos ticket) and there is nothing in any of the event logs that gives me any sort of clue as to what is wrong.
Anyone have any ideas of what might be wrong?
I just had the same problem. This link helped me a lot:
https://social.technet.microsoft.com/wiki/contents/articles/24258.hyper-v-troubleshooting-error-0x00002ee2-while-enabling-replication.aspx
As described there, it could be a problem with your routing. I was able to solve it by enabling BypassProxyServer.
The PowerShell command should look something like this (keep in mind that this setting can only be configured through PowerShell):
Set-VMReplication -vmname "name" -AuthenticationType Kerberos -ReplicaServerName "servername" -ReplicaServerPort "Port" -BypassProxyServer $true

Mesos Failed to connect error to IP:5050

I am new to Mesos and just finished setting up Mesos along with ZooKeeper on my test server.
Unfortunately I keep getting this error message in my Mesos console indicating that I am unable to connect to Mesos on port 5050, and I can't figure out why.
I have included the error in the screenshot below.
The Mesos log files don't point to why the error is showing either.
I resolved the problem by this:
./bin/mesos-master.sh --ip=x.x.x.x --work_dir=/var/lib/mesos --hostname=x.x.x.x
We can avoid this problem by starting mesos-master with the following options:
--ip=xx.xx.xx.xx --hostname_lookup=false
I have resolved this problem. Open the web page in Chrome and open the developer tools; you will see that Chrome is accessing the web site by domain name. In my case the domain name is "mesosphere", and since there is no "mesosphere" entry in DNS, the requests failed.
I solved the problem by adding mesosphere to the Windows hosts file, C:\Windows\System32\drivers\etc\hosts.
If you use a domain name for the Mesos cluster, you must add that name to the Windows hosts file.
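To confirm whether a name such as "mesosphere" resolves on the client machine before and after editing the hosts file, something like the following Python sketch can help. The helper name resolve is made up for illustration; it goes through the operating system's resolver, which normally consults the hosts file as well as DNS.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the distinct IP addresses a host name resolves to, or [] if none."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The system resolver (hosts file, DNS, ...) does not know this name.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

An empty list for resolve("mesosphere") would match the failure described above; once the hosts entry is in place, the call should return the address you entered.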
There can be multiple issues here.
Is your mesos-master running and healthy?
If so, has the leader election process completed?
Check whether you are able to do
ping leader.mesos
If the above ping doesn't work, that means a leader has not been elected. Fix that first.
I had this problem too. Luckily, I also have a working Mesos server, so I could compare my demo setup against it. I captured the packets between client and server in my demo and found that the browser didn't send fresh requests, only some keepalive packets.
But when I captured the packets on the working Mesos server, I found the browser sending GET requests frequently, as in the image.
I think that if you run some tasks or add some agents, it may prompt the browser to send requests frequently, and then the "Failed to connect" message will disappear.
I was having the same issues and what fixed it for me was the zookeeper configuration. In my case I was using the EC2 public IP Address rather than the private one. Once I changed the /etc/mesos/zk file to zk://<private IP>:2181/mesos I was able to connect without the constant error messages. In other words, zookeeper was reporting to be running in one IP and mesos-master was trying to connect using a different IP.
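A quick way to spot this kind of mismatch is to look at which addresses the zk:// URL actually names. The sketch below is plain Python and not part of Mesos or ZooKeeper; the function names are made up for illustration, it assumes the usual zk://host1:port1,host2:port2/path form, and it does not handle IPv6 literals.

```python
import ipaddress


def zk_hosts(zk_url: str) -> list[str]:
    """Extract the hosts from a zk://host1:port1,host2:port2/path URL."""
    prefix = "zk://"
    if not zk_url.startswith(prefix):
        raise ValueError("expected a zk:// URL")
    # Everything up to the first "/" is the comma-separated host list.
    authority = zk_url[len(prefix):].split("/", 1)[0]
    # Drop optional "user:password@" credentials.
    authority = authority.rsplit("@", 1)[-1]
    return [entry.rsplit(":", 1)[0] for entry in authority.split(",") if entry]


def non_private_ips(hosts: list[str]) -> list[str]:
    """Return the entries that are IP literals outside the private ranges."""
    found = []
    for host in hosts:
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            continue  # a host name rather than an IP literal; skip it
        if not ip.is_private:
            found.append(host)
    return found
```

Feeding it the contents of /etc/mesos/zk shows whether any entry is a public address, which is the situation described in the answer above.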
My configuration was correct as suggested, but the mesos-master service still failed to start. There is an alternative way to start the mesos-master node with exactly the same configuration. Commands to start mesos-master:
$ cd /usr/sbin [or mesos_installation directory/bin]
$ sudo ./mesos-master --work_dir=/var/lib/mesos --log_dir=/home/rajeev/logs/mesos/
This started the mesos-master service successfully for me.

Google compute engine - getting blocked after accessing SSH a few times

I have a google compute engine VM, running ubuntu, and utilising Laravel Forge.
I seem to get blocked by the VM after accessing SSH a few times (2-4), even if I'm logging in correctly. Restarting the VM unblocks me.
I first noticed the issue when I was having trouble logging in over SSH: after a few attempts the VM would become unreachable. My website hosted on it also wouldn't resolve. After restarting the VM, I could try to log in over SSH again and my website worked. This happened a couple of times before I figured out how to log in correctly with SSH.
Next, trying to log in to the database with HeidiSQL, which uses plink, I log in fine. But it seems to keep reconnecting via SSH every time I do something, and after 2-4 of these reconnects, I get the same problem with the VM being unreachable by SSH and my website hosted on it being down.
Using SQLyog, which seems to maintain the one SSH connection, rather than constantly reconnecting like HeidiSQL, I have no problems.
When my website is down, I use those "down for everyone or just me" websites to see if it is down, and apparently it's just down for me, so I must be getting blocked.
So I guess my questions are:
1. Is this normal?
2. Can I unblock myself without restarting the VM?
3. Can I make blocking occur in a less strict way?
4. Why does HeidiSQL keep reconnecting via SSH rather than maintaining the one connection like SQLyog seems to?
You have encountered sshguard, which is enabled by default on the GCE Ubuntu images (at least on the 14.10 image, where I encountered it myself). There is a whitelist file at /etc/sshguard/whitelist.
The sshguard default configuration on my VM has a "dangerousness" threshold of 40. Most "attacks" that sshguard detects incur dangerousness of 10, so getting blocked after 4 reconnects sounds about right.
The attack signatures are listed here: http://www.sshguard.net/docs/reference/attack-signatures/
I would bet that you are connecting from an IP that has an invalid reverse DNS configuration (I was). Four connects like that and the default config blocks you for 20 minutes.
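The arithmetic behind "blocked after 4 reconnects" can be written out as a tiny Python sketch. This is a simplified model, not sshguard's actual implementation: it assumes every detected "attack" adds the same score and that a block triggers as soon as the accumulated score reaches the threshold. The function name is made up for illustration.

```python
import math


def connects_until_block(threshold: int = 40, per_attack: int = 10) -> int:
    """Number of flagged connections needed to reach the threshold (simplified model)."""
    # Assumption: scores simply add up and blocking starts at score >= threshold.
    return math.ceil(threshold / per_attack)
```

With the values quoted above (threshold 40, dangerousness 10 per attack) this gives 4, which is consistent with the 2-4 reconnects reported in the question if some earlier connections had already been counted.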