Spark worker won't bind to master - ssh

When launching my Spark worker, I get an error that may mean the slave cannot contact the master machine, but I am not sure.
16/02/12 23:47:13 INFO Utils: Successfully started service 'sparkWorker' on port 38019.
16/02/12 23:47:13 INFO Worker: Starting Spark worker 192.168.0.38:38019 with 8 cores, 26.5 GB RAM
16/02/12 23:47:13 INFO Worker: Running Spark version 1.6.0
16/02/12 23:47:13 INFO Worker: Spark home: /home/romain/spark-1.6.0-bin-hadoop2.6
16/02/12 23:47:13 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
16/02/12 23:47:13 INFO WorkerWebUI: Started WorkerWebUI at http://192.168.0.38:8081
16/02/12 23:47:13 INFO Worker: Connecting to master 192.168.0.39:7078...
16/02/12 23:47:13 WARN Worker: Failed to connect to master 192.168.0.39:7078
java.io.IOException: Failed to connect to /192.168.0.39:7078
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.0.39:7078
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
On the master, however, I can see that it is up and running:
16/02/12 23:30:30 WARN Utils: Your hostname, pl resolves to a loopback address: 127.0.1.1; using 192.168.0.39 instead (on interface eth0)
16/02/12 23:30:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/12 23:30:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/12 23:30:31 INFO SecurityManager: Changing view acls to: romain
16/02/12 23:30:31 INFO SecurityManager: Changing modify acls to: romain
16/02/12 23:30:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(romain); users with modify permissions: Set(romain)
16/02/12 23:30:31 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/02/12 23:30:31 INFO Utils: Successfully started service 'sparkMaster' on port 7078.
16/02/12 23:30:31 INFO Master: Starting Spark master at spark://pl:7078
16/02/12 23:30:31 INFO Master: Running Spark version 1.6.0
16/02/12 23:30:32 INFO Utils: Successfully started service 'MasterUI' on port 3094.
16/02/12 23:30:32 INFO MasterWebUI: Started MasterWebUI at http://192.168.0.39:3094
16/02/12 23:30:32 WARN Utils: Service could not bind on port 6066. Attempting port 6067.
16/02/12 23:30:32 INFO Utils: Successfully started service on port 6067.
16/02/12 23:30:32 INFO StandaloneRestServer: Started REST server for submitting applications on port 6067
16/02/12 23:30:32 INFO Master: I have been elected leader! New state: ALIVE
Going through blogs and pages, it seems we might need a secure network. I did install a password-less ssh key, but for the "romain" user: under which user is Spark launched? The command-line user, I guess.
Should I check something on the network?
From this page :
Spark worker can not connect to Master
I tried :
telnet 192.168.0.39
Trying 192.168.0.39...
telnet: Unable to connect to remote host: Connection refused
But ping works :
romain@wk:~/spark-1.6.0-bin-hadoop2.6$ ping 192.168.0.39
PING 192.168.0.39 (192.168.0.39) 56(84) bytes of data.
64 bytes from 192.168.0.39: icmp_seq=1 ttl=64 time=0.233 ms
64 bytes from 192.168.0.39: icmp_seq=2 ttl=64 time=0.185 ms
^C
--- 192.168.0.39 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.185/0.209/0.233/0.024 ms
and I do have passwordless ssh connectivity :
$ ssh 192.168.0.39
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-49-generic x86_64)
$
What should be done to make connectivity possible ?
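A note on the checks above: `telnet 192.168.0.39` with no port argument tries the default telnet port (23), and `ping` only exercises ICMP, so neither says whether the master's port 7078 accepts TCP connections. The following is a minimal sketch of a direct port probe; the helper name `probe_tcp` is made up for illustration and is not part of Spark.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome.

    Returns 'open', 'refused', 'timeout', or 'error: <details>'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on that port.
        return "refused"
    except socket.timeout:
        # No answer at all, which often means packets are being dropped.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host".
        return f"error: {exc}"
```

Run from the worker machine, `probe_tcp("192.168.0.39", 7078)` would test the port the worker log reports. A result of "refused" would suggest the master is not listening on an address the worker can reach, rather than an ssh or ping problem.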

By setting the SPARK_LOCAL_IP=127.0.0.1 variable, I was able to get my Spark worker working.
You can either define it as a local bash environment variable in ~/.bashrc, or make a copy of $SPARK_HOME/conf/spark-env.sh.template as conf/spark-env.sh and define it there.
In a cluster environment, you had better set it to the machine's local network IP address. That way you will also be able to see the worker node UI.
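The master log in the question warns that the hostname "pl" resolves to the loopback address 127.0.1.1. As a rough illustration of what that warning is about (this is not Spark's own code, and the function names are invented for the example), the check can be sketched as:

```python
import ipaddress
import socket


def is_loopback(address):
    """True if the textual IP address lies in a loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def local_hostname_is_loopback():
    """Resolve this machine's hostname and report whether it maps to loopback."""
    resolved = socket.gethostbyname(socket.gethostname())
    return resolved, is_loopback(resolved)
```

If `local_hostname_is_loopback()` reports a loopback address on a cluster node, other machines cannot use that address to reach it, which is why a routable value for SPARK_LOCAL_IP is the safer choice outside a single-machine setup.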

Related

Selenium Grid Node can't register to HUB via VPN

I have a VPN set up: the VPN server runs in Oracle Cloud on Oracle Linux 8, and the client is my local VM running Manjaro Linux. The VPN connection works just fine.
My Selenium Hub is running on the same Oracle Cloud instance and the Selenium Node is running on the same local Manjaro VM, so they're on the same network, as needed.
Starting Selenium Hub works, but when starting Node it says:
[SelfRegisteringRemote$1.run] - Couldn't register this node: Error sending the registration request: No route to host (Host unreachable)
I started the hub like this:
java -Djava.net.preferIPv6Stack=false -jar selenium-server-standalone-3.141.59.jar -role hub
It says:
>13:40:52.933 INFO [GridLauncherV3.parse] - Selenium server version: 3.141.59, revision: e82be7d358
>13:40:53.079 INFO [GridLauncherV3.lambda$buildLaunchers$5] - Launching Selenium Grid hub on port 4444
>13:40:53.566:INFO::main: Logging initialized @949ms to org.seleniumhq.jetty9.util.log.StdErrLog
>13:40:53.763 INFO [Hub.start] - Selenium Grid hub is up and running
>13:40:53.766 INFO [Hub.start] - Nodes should register to http://10.9.0.1:4444/grid/register/
>13:40:53.767 INFO [Hub.start] - Clients should connect to http://10.9.0.1:4444/wd/hub
I started the node like this:
java -Djava.net.preferIPv6Stack=false -jar selenium-server-standalone-3.141.59.jar -role node -hub http://10.9.0.1:4444
(10.9.0.1 is the VPN-assigned IP of the machine hosting the Selenium hub)
and it says:
>16:16:25.675 INFO [GridLauncherV3.parse] - Selenium server version: 3.141.59, revision: e82be7d358
>16:16:25.950 INFO [GridLauncherV3.lambda$buildLaunchers$7] - Launching a Selenium Grid node on port 31862
>2021-07-15 16:16:26.077:INFO::main: Logging initialized @754ms to org.seleniumhq.jetty9.util.log.StdErrLog
>16:16:26.368 INFO [WebDriverServlet.<init>] - Initialising WebDriverServlet
>16:16:26.520 INFO [SeleniumServer.boot] - Selenium Server is up and running on port 31862
>16:16:26.521 INFO [GridLauncherV3.lambda$buildLaunchers$7] - Selenium Grid node is up and ready to register to the hub
>16:16:26.629 INFO [SelfRegisteringRemote$1.run] - Starting auto registration thread. Will try to register every 5000 ms.
>16:16:27.092 WARN [SelfRegisteringRemote.registerToHub] - Error getting the parameters from the hub. The node may end up with wrong timeouts.No route to host (Host unreachable)
>16:16:27.102 INFO [SelfRegisteringRemote.registerToHub] - Registering the node to the hub: http://10.9.0.1:4444/grid/register
>16:16:27.266 INFO [SelfRegisteringRemote$1.run] - Couldn't register this node: Error sending the registration request: No route to host (Host unreachable)
Since the VPN works fine, the two machines are on the same network, as Selenium Grid needs, so I have no clue what can be wrong, especially after so many hours of Googling, even here on Stack Overflow.
Any suggestions?

Coldfusion 2018 on Centos 7 failing to setup Apache connector

I've installed CF2018 on a new server and it is running; I can see it if I run ps aux | ack -i coldfusion
$ cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
$ httpd -v
Server version: Apache/2.4.6 (CentOS)
Server built: Jul 29 2019 17:18:49
UPDATE
I had clearly broken something so I've removed earlier errors, but I'm still getting issues with the connector.
I have removed all references and files relating to mod_jk from /etc/httpd/conf, reinstalled CF then re-ran the connector.
It's installed successfully with this command:
$ sudo ./wsconfig -ws Apache -dir /etc/httpd/conf
I have the dir at /opt/coldfusion2018/config/wsconfig/1 setup but I'm now getting these errors:
$ pwd
/opt/coldfusion2018/config/wsconfig/1
$ tail mod_jk.log
[error] ajp_service::jk_ajp_common.c (3000): (cfusion) connecting to tomcat failed (rc=-3, errors=583, client_errors=0).
[info] jk_open_socket::jk_connect.c (816): connect to ::1:8018 failed (errno=13)
[info] ajp_connect_to_endpoint::jk_ajp_common.c (1140): (cfusion) Failed opening socket to (::1:8018) (errno=13)
[error] ajp_send_request::jk_ajp_common.c (1811): (cfusion) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=13)
[info] ajp_service::jk_ajp_common.c (2979): (cfusion) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[info] jk_open_socket::jk_connect.c (816): connect to ::1:8018 failed (errno=13)
[info] ajp_connect_to_endpoint::jk_ajp_common.c (1140): (cfusion) Failed opening socket to (::1:8018) (errno=13)
[error] ajp_send_request::jk_ajp_common.c (1811): (cfusion) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=13)
[info] ajp_service::jk_ajp_common.c (2979): (cfusion) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[error] ajp_service::jk_ajp_common.c (3000): (cfusion) connecting to tomcat failed (rc=-3, errors=584, client_errors=0).
I have no idea where port 8018 has come from; I thought Tomcat used 8500 by default.
UPDATE 2
If I visit my site with :8500 on the end I can get into the CFIDE, so CF is running and that port is accessible.
UPDATE 3
I've found this in my server.xml file, tried setting the port to both 8009 and 8018 and it seems to make no difference to the errors in the mod_jk.log
<!-- Define an AJP 1.3 Connector on port 8009 -->
<!-- begin connector -->
<Connector port="8009" packetSize="65535" protocol="AJP/1.3" redirectPort="8451" tomcatAuthentication="false" maxThreads="500" connectionTimeout="60000"/>
<!-- end connector -->
Pete,
What's the OS and the webserver's version?
Did you try passing the parameters other than dir, explicitly, like so:
sudo ./wsconfig -ws Apache -dir /opt/apache2/conf -bin /opt/apache2/bin/httpd -script /opt/apache2/bin/apachectl -v
Also, the ColdFusion process need not be running for the connector to be configured.
8018 is the default AJP port that the connector uses to talk to Tomcat. 8500 is the default HTTP port that you'd use when you access the CF admin console.
You initially reported an error when configuring the connector. Is that resolved?
Did you check the wsconfig log to see if there were errors configuring the connector?
The mod_jk log excerpts you've shared more recently simply indicate that CF is not running, or at least not listening on the default AJP port.
The problem was SELinux blocking port 8018. I asked my hosting provider, Secura, to look into this for me and they fixed it (based on all the information I'd found from piyush's answer).
I had to allow port 8018 in SELinux:
semanage port -a -t http_port_t -p tcp 8018
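One hint toward this diagnosis is already in the mod_jk log: every failed connect reports errno=13. On Linux that number is EACCES (permission denied), whereas a backend that is simply not listening would normally produce ECONNREFUSED. A small sketch for decoding such numbers on the machine that produced the log (the helper name `describe_errno` is only illustrative):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for a numeric errno."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 13 is the value reported in the mod_jk log above.
    name, message = describe_errno(13)
    print(f"errno 13 is {name}: {message}")
    print(f"a refused connection would be errno {errno.ECONNREFUSED}")
```

A permission error on an outgoing connection from httpd is consistent with a policy layer such as SELinux denying it, which matches the fix above.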

Spring messaging : Can't connect to remote rabbitmq on GCP

This spring guide on messaging with rabbitmq does not talk about the host port configurations. I followed the same and added these properties to application.properties to connect to rabbitmq broker installed on GCP
spring:
  rabbitmq:
    host: XXX.XXX.XXX.XX
    port: 5672
    username: user
    password: bitnami
    virtual-host: /
When running the app, I get a timeout exception while connecting to rabbitmq:
2017-08-06 17:16:54.322 ERROR 7280 --- [ container-1] o.s.a.r.l.SimpleMessageListenerContainer : Failed to check/redeclare auto-delete queue(s).
org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection timed out: connect
at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:62) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:367) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:565) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1430) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1411) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1387) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.core.RabbitAdmin.getQueueProperties(RabbitAdmin.java:336) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.redeclareElementsIfNecessary(SimpleMessageListenerContainer.java:1136) ~[spring-rabbit-1.7.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1387) [spring-rabbit-1.7.2.RELEASE.jar:na]
Tried the following but still same error:
Opened up tcp:5672 through GCP firewall configuration
Changed the rabbitmq config at /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.config to change the allowed ips from localhost (127.0.0.1) to 0.0.0.0
{
rabbit,
[{tcp_listeners, [{"0.0.0.0", 5672}, {"::", 5672}]},
{default_vhost, <<"/">>},
{default_user, <<"user">>},
{default_pass, <<"bitnami">>},
{default_permissions, [<<".*">>, <<".*">>, <<".*">>]}
]}
What could be the problem here ?
Update
I have installed rabbitmq locally and everything works fine.
I suspect the updates to the config file are not actually being picked up. This is how I applied them:
updated the rabbitmq.config
rabbitmqctl stop_app
rabbitmqctl start_app
But still I see some difference under the 'Ports and contexts' section in the UI
[Screenshot: localhost]
[Screenshot: gcp]
Any pointers ? Or is it all looking fine and the problem is something different, like with GCP setup or something ?
After telnet-ing to the port and checking the port config through the GCP console, I figured out that I had made a mistake in setting the tag name on the instance where I installed rabbitmq.
Please verify that the 'target tag' mentioned in your firewall rule is indeed mapped to the VM instance where rabbitmq is installed.
Otherwise, the config mentioned in the question is enough to make it work from a remote client.
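The tag matching described above can be pictured with a deliberately simplified model: a rule that lists target tags only covers instances carrying at least one of them. The function and the tag names below are hypothetical, and the sketch ignores other targeting options such as service accounts.

```python
def rule_applies(rule_target_tags, instance_tags):
    """Simplified model of firewall rule targeting by network tag.

    A rule without target tags is treated as applying to every instance;
    otherwise at least one tag must be shared with the instance.
    """
    if not rule_target_tags:
        return True
    return bool(set(rule_target_tags) & set(instance_tags))


if __name__ == "__main__":
    # Hypothetical tag names: a mismatch like this leaves port 5672 closed.
    print(rule_applies(["rabbitmq-server"], ["rabbitmq"]))
    print(rule_applies(["rabbitmq-server"], ["rabbitmq-server", "http-server"]))
```

In this model a one-character difference between the rule's target tag and the instance's tag means the rule never matches, and the client sees a connection timeout rather than an explicit error.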

Hadoop client not able to connect to server

I set up a 2-node Hadoop cluster, and running start-dfs.sh and start-yarn.sh works nicely (i.e. all expected services are running, no errors in the logs).
However, when I actually try to run an application, several tasks fail:
15/04/01 15:27:53 INFO mapreduce.Job: Task Id : attempt_1427894767376_0001_m_000008_2, Status : FAILED
I checked the yarn and datanode logs, but nothing is reported there.
In the userlogs, the syslogs files on the slave node all contain the following error message:
2015-04-01 15:27:21,077 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave.domain.be./127.0.1.1:53834. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-04-01 15:27:21,078 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From slave.domain.be./127.0.1.1 to slave.domain.be.:53834 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
So the problem is that the slave cannot connect to itself.
I checked whether there is a process running on the slave node listening at port 53834, but there is none.
However, all 'expected' ports are being listened on (50020, 50075, ...). Nowhere in my configuration have I used port 53834, and it is a different port on each run.
Any ideas on fixing this issue?
Your error might be due to the loopback address in your hosts file. Open /etc/hosts and comment out the line with 127.0.1.1 on your slave nodes (and on the master node, if necessary). Then restart the Hadoop processes.
EDITED:
If you are not logged in as root, open a root shell in the terminal to edit the hosts file:
sudo bash
Enter your current user's password to get a root shell. You can now edit your hosts file using:
nano /etc/hosts
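To preview the change before touching the real file, the edit can be sketched as a small text transformation. This is only an illustration: the function name is made up, and it returns the modified text instead of writing to /etc/hosts.

```python
def comment_out_address(hosts_text, address="127.0.1.1"):
    """Return hosts_text with every active entry for `address` commented out.

    Lines that are already comments, or that map other addresses, are kept as-is.
    """
    result = []
    for line in hosts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == address:
            result.append("# " + line)
        else:
            result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        print(comment_out_address(handle.read()), end="")
```

After the 127.0.1.1 entry is disabled, the node's hostname should resolve through the remaining entries (or DNS) to its network address, so make sure such an entry exists.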

RabbitMQ and ActiveMQ running on the same machine

For testing purposes I need ActiveMQ and RabbitMQ running on the same Windows machine. I have both installed, but I can't run them together: I need to stop one service in order to have the other one running.
This is the error I get trying to start RabbitMQ having ActiveMQ running:
=INFO REPORT==== 17-Feb-2015::14:24:00 ===
Error description:
{could_not_start,rabbit,
{bad_return,
{{rabbit,start,[normal,[]]},
{'EXIT',
{rabbit,failure_during_boot,
{boot_step,networking,
{case_clause,
{error,
{{shutdown,
{failed_to_start_child,tcp_listener,
{cannot_listen,{0,0,0,0,0,0,0,0},5672,eacces}}},
{child,undefined,'rabbit_tcp_listener_sup_:::5672',
{tcp_listener_sup,start_link,
[{0,0,0,0,0,0,0,0},
5672,
[inet6,binary,
{packet,raw},
{reuseaddr,true},
{backlog,128},
{nodelay,true},
{linger,{true,0}},
{exit_on_close,false}],
{rabbit_networking,tcp_listener_started,[amqp]},
{rabbit_networking,tcp_listener_stopped,[amqp]},
{rabbit_networking,start_client,[]},
"TCP Listener"]},
transient,infinity,supervisor,
[tcp_listener_sup]}}}}}}}}}}
And this is the error I get trying to start ActiveMQ with RabbitMQ already running:
jvm 1 | INFO | Listening for connections at: tcp://BROKER:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600
jvm 1 | INFO | Connector openwire started
jvm 1 | ERROR | Failed to start Apache ActiveMQ ([localhost, ID:DEV-BROKER01-56290-1424197666199-0:1], java.io.IOException: Transport Connector could not be registered in JMX: java.io.IOException: Failed to bind to server socket: amqp://0.0.0.0:5672?maximumConnections=1000&wireFormat.maxFrameSize=104857600 due to:
java.net.BindException: Address already in use: JVM_Bind)
jvm 1 | INFO | Apache ActiveMQ 5.11.0 (localhost, ID:DEV-BROKER01-56290-1424197666199-0:1) is shutting down
That "Address already in use" is the key I guess.
Any way to sort this out? Thanks
This is the problem:
java.net.BindException: Address already in use: JVM_Bind)
Both brokers use port 5672 (the default AMQP port).
Just change the port for one of the brokers. For RabbitMQ, for example, see this link:
https://www.rabbitmq.com/configure.html
The configuration file rabbitmq.config allows the RabbitMQ core
application, Erlang services and RabbitMQ plugins to be configured. It
is a standard Erlang configuration file, documented on the Erlang
Config Man Page.
An example configuration file follows:
[
{rabbit, [{tcp_listeners, [5673]}]}
].
This example will change the port RabbitMQ listens on from 5672 to 5673.
This configuration file is not the same as rabbitmq-env.conf, which
can be used to set environment variables on non-windows systems.
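For readers who want to see the underlying conflict in isolation, here is a minimal sketch that binds two listeners to the same TCP port inside one process. The function name is invented for the example, and it uses an OS-chosen free port rather than 5672 so that it does not interfere with a running broker.

```python
import errno
import socket


def second_bind_errno(host="127.0.0.1"):
    """Bind two sockets to one port; return the errno raised by the second bind.

    Returns None if the second bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0 lets the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port as the first listener
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = second_bind_errno()
    print("second bind failed with EADDRINUSE:", code == errno.EADDRINUSE)
```

This is the same situation as the two brokers: whichever one starts second finds the port taken, so moving one of them to another port removes the clash.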