Deadlocked compileUnitTestKotlin Gradle task in Android project - kotlin

I wanted to add some tests to an Android project but got stuck with a frozen build. I have no idea why this freeze happens, but looking into it in more detail, it is the compileUnitTestKotlin task; when run on its own in debug mode it produces the log messages shown below. As far as I understand, some deadlocking is happening, but I am not sure what the main issue is or how it can be resolved. If someone has faced this type of issue before, could you please suggest a possible solution?
2022-03-03T23:46:21.953+0300 [DEBUG] [sun.rmi.transport.tcp] RMI RenewClean-[127.0.0.1:17154,org.jetbrains.kotlin.daemon.common.LoopbackNetworkInterface$ClientLoopbackSocketFactory@600c1c6f]: reuse connection
2022-03-03T23:46:21.954+0300 [DEBUG] [sun.rmi.transport.tcp] RMI RenewClean-[127.0.0.1:17154,org.jetbrains.kotlin.daemon.common.LoopbackNetworkInterface$ClientLoopbackSocketFactory@600c1c6f]: create reaper
2022-03-03T23:46:21.988+0300 [DEBUG] [sun.rmi.transport.tcp] RMI TCP Connection(35)-127.0.0.1: accepted socket from [127.0.0.1:62994]
2022-03-03T23:46:21.988+0300 [DEBUG] [sun.rmi.transport.tcp] RMI TCP Connection(35)-127.0.0.1: (port 62675) op = 80
2022-03-03T23:46:21.989+0300 [DEBUG] [sun.rmi.loader] RMI TCP Connection(35)-127.0.0.1: name = "[Ljava.rmi.server.ObjID;", codebase = "", defaultLoader = jdk.internal.loader.ClassLoaders$PlatformClassLoader@4f6b541f
2022-03-03T23:46:21.989+0300 [DEBUG] [sun.rmi.loader] RMI TCP Connection(35)-127.0.0.1: name = "java.rmi.dgc.Lease", codebase = "", defaultLoader = jdk.internal.loader.ClassLoaders$PlatformClassLoader@4f6b541f
2022-03-03T23:46:21.989+0300 [DEBUG] [sun.rmi.loader] RMI TCP Connection(35)-127.0.0.1: name = "java.rmi.dgc.VMID", codebase = "", defaultLoader = jdk.internal.loader.ClassLoaders$PlatformClassLoader@4f6b541f
2022-03-03T23:46:21.989+0300 [DEBUG] [sun.rmi.loader] RMI TCP Connection(35)-127.0.0.1: name = "[B", codebase = "", defaultLoader = jdk.internal.loader.ClassLoaders$PlatformClassLoader@4f6b541f
2022-03-03T23:46:21.989+0300 [DEBUG] [sun.rmi.loader] RMI TCP Connection(35)-127.0.0.1: name = "java.rmi.server.UID", codebase = "", defaultLoader = jdk.internal.loader.ClassLoaders$PlatformClassLoader@4f6b541f
2022-03-03T23:46:25.048+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:46:25.048+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:46:25.048+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:46:25.048+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:46:25.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:46:25.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:46:35.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:46:35.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:46:35.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:46:35.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:46:35.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:46:35.050+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:46:35.193+0300 [DEBUG] [sun.rmi.transport.tcp] RMI Scheduler(0): close connection
2022-03-03T23:46:35.242+0300 [DEBUG] [sun.rmi.transport.tcp] RMI TCP Connection(34)-127.0.0.1: (port 62675) connection closed
2022-03-03T23:46:35.242+0300 [DEBUG] [sun.rmi.transport.tcp] RMI TCP Connection(34)-127.0.0.1: close connection
2022-03-03T23:46:36.958+0300 [DEBUG] [sun.rmi.transport.tcp] RMI Scheduler(0): close connection
2022-03-03T23:46:36.993+0300 [DEBUG] [sun.rmi.transport.tcp] RMI TCP Connection(35)-127.0.0.1: (port 62675) connection closed
2022-03-03T23:46:36.993+0300 [DEBUG] [sun.rmi.transport.tcp] RMI TCP Connection(35)-127.0.0.1: close connection
2022-03-03T23:46:45.046+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:46:45.046+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:46:45.046+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:46:45.046+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:46:45.046+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:46:45.046+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:46:55.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:46:55.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:46:55.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:46:55.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:46:55.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:46:55.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:47:05.048+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:47:05.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:47:05.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:47:05.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:47:05.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:47:05.049+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:47:15.045+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:47:15.045+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:47:15.045+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:47:15.046+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:47:15.046+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:47:15.046+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
2022-03-03T23:47:25.048+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Waiting to acquire shared lock on daemon addresses registry.
2022-03-03T23:47:25.048+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Lock acquired on daemon addresses registry.
2022-03-03T23:47:25.048+0300 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on daemon addresses registry.
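For context, a minimal sketch of the daemon-related knobs involved here, assuming the hang really is between Gradle and the Kotlin compile daemon over the RMI loopback connection shown in the log (kotlin.compiler.execution.strategy is the standard Kotlin Gradle property for choosing where the compiler runs):
$ ./gradlew --stop                             # stop all running Gradle daemons before retrying
# gradle.properties - run the Kotlin compiler inside the Gradle process, bypassing the Kotlin daemon
kotlin.compiler.execution.strategy=in-process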

Related

Coldfusion 2018 on Centos 7 failing to setup Apache connector

I've installed CF2018 on a new server, and it is running; I can see it if I run ps aux | ack -i coldfusion
$ cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
$ httpd -v
Server version: Apache/2.4.6 (CentOS)
Server built: Jul 29 2019 17:18:49
UPDATE
I had clearly broken something, so I've removed the earlier errors, but I'm still getting issues with the connector.
I have removed all references and files relating to mod_jk from /etc/httpd/conf, reinstalled CF then re-ran the connector.
It installed successfully with this command:
$ sudo ./wsconfig -ws Apache -dir /etc/httpd/conf
I have the dir at /opt/coldfusion2018/config/wsconfig/1 set up, but I'm now getting these errors:
$ pwd
/opt/coldfusion2018/config/wsconfig/1
$ tail mod_jk.log
[error] ajp_service::jk_ajp_common.c (3000): (cfusion) connecting to tomcat failed (rc=-3, errors=583, client_errors=0).
[info] jk_open_socket::jk_connect.c (816): connect to ::1:8018 failed (errno=13)
[info] ajp_connect_to_endpoint::jk_ajp_common.c (1140): (cfusion) Failed opening socket to (::1:8018) (errno=13)
[error] ajp_send_request::jk_ajp_common.c (1811): (cfusion) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=13)
[info] ajp_service::jk_ajp_common.c (2979): (cfusion) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[info] jk_open_socket::jk_connect.c (816): connect to ::1:8018 failed (errno=13)
[info] ajp_connect_to_endpoint::jk_ajp_common.c (1140): (cfusion) Failed opening socket to (::1:8018) (errno=13)
[error] ajp_send_request::jk_ajp_common.c (1811): (cfusion) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=13)
[info] ajp_service::jk_ajp_common.c (2979): (cfusion) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[error] ajp_service::jk_ajp_common.c (3000): (cfusion) connecting to tomcat failed (rc=-3, errors=584, client_errors=0).
I have no idea where port 8018 has come from; I thought Tomcat used 8500 by default.
UPDATE 2
If I visit my site with :8500 on the end I can get into the CFIDE, so CF is running and that port is accessible.
UPDATE 3
I've found this in my server.xml file and tried setting the port to both 8009 and 8018, but it seems to make no difference to the errors in mod_jk.log:
<!-- Define an AJP 1.3 Connector on port 8009 -->
<!-- begin connector -->
<Connector port="8009" packetSize="65535" protocol="AJP/1.3" redirectPort="8451" tomcatAuthentication="false" maxThreads="500" connectionTimeout="60000"/>
<!-- end connector -->
Pete,
What's the OS and the webserver's version?
Did you try passing the parameters other than -dir explicitly, like so:
sudo ./wsconfig -ws Apache -dir /opt/apache2/conf -bin /opt/apache2/bin/httpd -script /opt/apache2/bin/apachectl -v
...and the ColdFusion process need not be running for the connector to be configured.
8018 is the default AJP port that the connector uses to talk to Tomcat. 8500 is the default HTTP port that you'd use when you access the CF admin console.
You initially reported an error when configuring the connector. Is that resolved?
Did you check the wsconfig log to see if there were errors configuring the connector?
The mod_jk log excerpts you've shared more recently simply indicate that CF is not running, or, at the least, not listening on the default AJP port.
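A quick way to check that on the CF box (a sketch; assuming the ss tool that ships with CentOS 7) is to see whether anything is listening on the AJP port:
$ sudo ss -lntp | grep 8018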
The problem was SELinux blocking port 8018. I actually asked my hosting provider Secura to look into this for me, and they fixed it (based on all the information I'd found from piyush's answer).
I had to allow port 8018 in SELinux:
semanage port -a -t http_port_t -p tcp 8018
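A quick verification afterwards (a sketch, assuming the policycoreutils tooling that provides semanage is installed):
$ sudo semanage port -l | grep http_port_t    # 8018 should now appear in the list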

Apache mod_proxy_ajp module prematurely sending traffic to spare backend server

We've got a pair of Apache 2.4 web servers (web02, web03) running mod_proxy_ajp talking to a pair of Tomcat 7.0.59 servers (app02, app03).
The Tomcat server on app03 is a standby server that should not get traffic unless app02 is completely offline.
Apache config on web02 and web03:
<Proxy balancer://ajp_cluster>
BalancerMember ajp://app02:8009 route=worker1 ping=3 retry=60
BalancerMember ajp://app03:8009 status=+R route=worker2 ping=3 retry=60
ProxySet stickysession=JSESSIONID|jsessionid lbmethod=byrequests
</Proxy>
Tomcat config for AJP on app02 and app03:
<Connector protocol="AJP/1.3" URIEncoding="UTF-8" port="8009" />
We are seeing issues where Apache starts sending traffic to app03, which is marked as the spare, even when app02 is still available but perhaps a bit busy.
Apache SSL error log:
[Thu Sep 12 14:23:28.028162 2019] [proxy_ajp:error] [pid 24234:tid 140543375898368] (70007)The timeout specified has expired: [client 207.xx.xxx.7:1077] AH00897: cping/cpong failed to 10.160.160.47:8009 (app02)
[Thu Sep 12 14:23:28.028196 2019] [proxy_ajp:error] [pid 24234:tid 140543375898368] [client 207.xx.xxx.7:1077] AH00896: failed to make connection to backend: app02
[Thu Sep 12 14:23:28.098869 2019] [proxy_ajp:error] [pid 24135:tid 140543501776640] [client 207.xx.xxx.7:57809] AH01012: ajp_handle_cping_cpong: ajp_ilink_receive failed, referer: https://site.example.com/cart
[Thu Sep 12 14:23:28.098885 2019] [proxy_ajp:error] [pid 24135:tid 140543501776640] (70007)The timeout specified has expired: [client 207.xx.xxx.7:57809] AH00897: cping/cpong failed to 10.160.160.47:8009 (app02), referer: https://site.example.com/cart
There are hundreds of these messages in our Apache logs.
Any suggestions on settings for making Apache stick to app02 unless it is completely offline?
You are experiencing thread exhaustion in the Tomcat connector causing httpd to think app02 is in a bad state - which, in a way, it is.
The short answer is switch your Tomcat AJP connector to use protocol="org.apache.coyote.ajp.AjpNioProtocol"
The long answer is, well, rather longer.
mod_jk uses persistent connections between httpd and Tomcat. The historical argument for this is performance: it saves the time of establishing a new TCP connection for each request. Generally, testing shows that this argument doesn't hold and that the time taken to establish a new TCP connection, or to perform a CPING/CPONG to confirm that the connection is valid (which you need to do if you use persistent connections), is near enough the same. Regardless, persistent connections are the default with mod_jk.
When using persistent connections mod_jk creates one connection per httpd worker thread and caches that connection in the worker thread.
The default AJP connection in Tomcat 7.x is the BIO connector. This connector uses blocking I/O and requires one thread per connection.
The issue occurs when httpd is configured with more workers than Tomcat has threads. Initially everything is OK. When an httpd worker encounters the first request that needs to be passed to Tomcat, mod_jk creates the persistent connection for that httpd worker and the request is served. Subsequent requests processed by that httpd worker that need to be passed to Tomcat will use that cached connection. Requests are allocated (effectively) randomly to httpd workers. As more httpd workers see their first request that needs to be passed to Tomcat, mod_jk creates the necessary persistent connection for each worker. It is likely that many of the connections to Tomcat will be mostly idle. How idle will depend on the load on httpd and the proportion of those requests that are passed to Tomcat.
All is well until more httpd workers need to create a connection to Tomcat than Tomcat has threads. Remember that the Tomcat AJP BIO connector requires a thread per connection, so maxThreads is essentially the maximum number of AJP connections that Tomcat will allow. At that point mod_jk is unable to create the connection and therefore the failover process is initiated.
There are two solutions. The first - the one I described above - is to remove the one thread per connection limitation. By switching to the NIO AJP connector, Tomcat uses a Poller thread to maintain 1000s of connections, only passing those with data to process to a thread for processing. The limitation for Tomcat processing is then that maxThreads is the maximum number of concurrent requests that Tomcat can process on that Connector.
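Applied to the connector from the question, the change is just the protocol attribute (a sketch; the other attributes are kept exactly as they were):
<Connector protocol="org.apache.coyote.ajp.AjpNioProtocol" URIEncoding="UTF-8" port="8009" />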
The second solution is to disable persistent connections. mod_jk then creates a connection, uses it for a single request and then closes it. This reduces the number of connections that mod_jk requires at any one point between httpd and Tomcat.
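For the mod_proxy_ajp setup in the question (rather than mod_jk), the rough equivalent would be turning off connection reuse on each balancer member; disablereuse is a standard mod_proxy parameter, but treat this as a sketch rather than something the answer above prescribes:
BalancerMember ajp://app02:8009 route=worker1 ping=3 retry=60 disablereuse=On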
Sorry, the above is rather a large wall of text. I've also covered this in various presentations, including this one.

Packer loops through ports when attempting to establish SSH connection

When Packer reaches the "Waiting for SSH to become available..." step, my logs show:
14:07:29 [INFO] Attempting SSH connection...
14:07:29 reconnecting to TCP connection for SSH
14:07:29 handshaking with SSH
14:07:29 handshake error: ssh: handshake failed: read tcp 127.0.0.1:60372->127.0.0.1:3057: read: connection reset by peer
14:07:29 [DEBUG] SSH handshake err: ssh: handshake failed: read tcp 127.0.0.1:60372->127.0.0.1:3057: read: connection reset by peer
14:07:36 [INFO] Attempting SSH connection...
14:07:36 reconnecting to TCP connection for SSH
14:07:36 handshaking with SSH
14:07:36 handshake error: ssh: handshake failed: read tcp 127.0.0.1:60376->127.0.0.1:3057: read: connection reset by peer
14:07:36 [DEBUG] SSH handshake err: ssh: handshake failed: read tcp 127.0.0.1:60376->127.0.0.1:3057: read: connection reset by peer
Note the different port on each attempt:
60372
60376
Packer is trying a new port every 7 seconds.
Is there a way to configure the ports before or during the build to avoid this try/fail approach?
That is the source port from which the SSH connection is made. It's assigned by the OS as a random available high port.
The issue is not with the SSH server or TCP/IP; it is with the way Packer is designed.
When a VM is created, Packer.io will run boot commands. This takes time, and the time varies on different machines. During that time you will see "Waiting for SSH to become available...". In the background, Packer.io will be attempting to establish an SSH connection. The log is saturated with messages like this:
Linux
14:07:29 [INFO] Attempting SSH connection...
14:07:29 reconnecting to TCP connection for SSH
14:07:29 handshaking with SSH
14:07:29 handshake error: ssh: handshake failed: read tcp 127.0.0.1:60372->127.0.0.1:3057: read: connection reset by peer
14:07:29 [DEBUG] SSH handshake err: ssh: handshake failed: read tcp 127.0.0.1:60372->127.0.0.1:3057: read: connection reset by peer
or
Windows
15:54:31 packer.exe: 2017/02/01 15:54:31 [INFO] Attempting SSH connection...
15:54:31 packer.exe: 2017/02/01 15:54:31 reconnecting to TCP connection for SSH
15:54:31 packer.exe: 2017/02/01 15:54:31 handshaking with SSH
15:54:31 packer.exe: 2017/02/01 15:54:31 handshake error: ssh: handshake failed: read tcp 127.0.0.1:62691->127.0.0.1:4289: wsarecv: An existing connection was forcibly closed by the remote host.
15:54:31 packer.exe: 2017/02/01 15:54:31 [DEBUG] SSH handshake err: ssh: handshake failed: read tcp 127.0.0.1:62691->127.0.0.1:4289: wsarecv: An existing connection was forcibly closed by the remote host.
Once the OS boots, the SSH server on the guest becomes available, and at that moment the SSH connection from host to guest should be established.
Reproduced and confirmed on Windows 10 Pro and Ubuntu 16.04.1 LTS.
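For illustration only (a sketch; this just observes the OS-assigned ephemeral source ports, it is not a Packer setting), you can watch the retries from another terminal while the build runs:
$ watch -n 1 'ss -tn dst 127.0.0.1:3057'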

Spark worker won't bind to master

When launching my Spark worker, I got an error which may be related to the slave's ability to contact the master machine, but I am unsure.
16/02/12 23:47:13 INFO Utils: Successfully started service 'sparkWorker' on port 38019.
16/02/12 23:47:13 INFO Worker: Starting Spark worker 192.168.0.38:38019 with 8 cores, 26.5 GB RAM
16/02/12 23:47:13 INFO Worker: Running Spark version 1.6.0
16/02/12 23:47:13 INFO Worker: Spark home: /home/romain/spark-1.6.0-bin-hadoop2.6
16/02/12 23:47:13 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
16/02/12 23:47:13 INFO WorkerWebUI: Started WorkerWebUI at http://192.168.0.38:8081
16/02/12 23:47:13 INFO Worker: Connecting to master 192.168.0.39:7078...
16/02/12 23:47:13 WARN Worker: Failed to connect to master 192.168.0.39:7078
java.io.IOException: Failed to connect to /192.168.0.39:7078
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.0.39:7078
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
While on the master I see it is up and running:
16/02/12 23:30:30 WARN Utils: Your hostname, pl resolves to a loopback address: 127.0.1.1; using 192.168.0.39 instead (on interface eth0)
16/02/12 23:30:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/12 23:30:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/12 23:30:31 INFO SecurityManager: Changing view acls to: romain
16/02/12 23:30:31 INFO SecurityManager: Changing modify acls to: romain
16/02/12 23:30:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(romain); users with modify permissions: Set(romain)
16/02/12 23:30:31 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/02/12 23:30:31 INFO Utils: Successfully started service 'sparkMaster' on port 7078.
16/02/12 23:30:31 INFO Master: Starting Spark master at spark://pl:7078
16/02/12 23:30:31 INFO Master: Running Spark version 1.6.0
16/02/12 23:30:32 INFO Utils: Successfully started service 'MasterUI' on port 3094.
16/02/12 23:30:32 INFO MasterWebUI: Started MasterWebUI at http://192.168.0.39:3094
16/02/12 23:30:32 WARN Utils: Service could not bind on port 6066. Attempting port 6067.
16/02/12 23:30:32 INFO Utils: Successfully started service on port 6067.
16/02/12 23:30:32 INFO StandaloneRestServer: Started REST server for submitting applications on port 6067
16/02/12 23:30:32 INFO Master: I have been elected leader! New state: ALIVE
Going through blogs and pages, it seems we might need a secured network (I did install a password-less SSH key, but for the "romain" user: under which user is Spark launched? The command-line user, I guess).
Should I check something on the network?
From this page:
Spark worker can not connect to Master
I tried:
telnet 192.168.0.39
Trying 192.168.0.39...
telnet: Unable to connect to remote host: Connection refused
But ping works:
romain@wk:~/spark-1.6.0-bin-hadoop2.6$ ping 192.168.0.39
PING 192.168.0.39 (192.168.0.39) 56(84) bytes of data.
64 bytes from 192.168.0.39: icmp_seq=1 ttl=64 time=0.233 ms
64 bytes from 192.168.0.39: icmp_seq=2 ttl=64 time=0.185 ms
^C
--- 192.168.0.39 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.185/0.209/0.233/0.024 ms
and I do have password-less SSH connectivity:
$ ssh 192.168.0.39
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-49-generic x86_64)
$
What should be done to make connectivity possible?
By setting the SPARK_LOCAL_IP=127.0.0.1 variable, I was able to get my Spark worker working.
You can either define it as a local bash environment variable in ~/.bashrc,
or you can make a copy of $SPARK_HOME/conf/spark-env.sh.template as conf/spark-env.sh and define it there.
In a cluster environment, you had better set it to the node's local IP address; that way you will still be able to see the worker node UI.
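A minimal sketch of the second option, assuming the worker's LAN address 192.168.0.38 from the logs above:
$ cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
$ echo 'SPARK_LOCAL_IP=192.168.0.38' >> $SPARK_HOME/conf/spark-env.sh
# then restart the worker so it picks up the new setting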

Hadoop client not able to connect to server

I set up a 2-node Hadoop cluster, and running start-dfs.sh and start-yarn.sh works nicely (i.e. all expected services are running, no errors in the logs).
However, when I actually try to run an application, several tasks fail:
15/04/01 15:27:53 INFO mapreduce.Job: Task Id :
attempt_1427894767376_0001_m_000008_2, Status : FAILED
I checked the yarn and datanode logs, but nothing is reported there.
In the userlogs, the syslogs files on the slave node all contain the following error message:
2015-04-01 15:27:21,077 INFO [main] org.apache.hadoop.ipc.Client:
Retrying connect to server:
slave.domain.be./127.0.1.1:53834. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2015-04-01 15:27:21,078 WARN [main]
org.apache.hadoop.mapred.YarnChild:
Exception running child :
java.net.ConnectException: Call From
slave.domain.be./127.0.1.1 to
slave.domain.be.:53834 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
So the problem is that the slave cannot connect to itself.
I checked whether there is a process running on the slave node listening on port 53834, but there is none.
However, all 'expected' ports are being listened on (50020, 50075, ...). Nowhere in my configuration have I used port 53834, and it's always a different port on different runs.
Any ideas on fixing this issue?
Your error might be due to the loopback address in your hosts file. Go to the /etc/hosts file and comment out the line with 127.0.1.1 on your slave nodes and on the master node (if necessary). Now start the Hadoop processes.
EDITED:
Do this in a terminal to edit the hosts file if you aren't logged in as root:
sudo bash
Enter your current user's password to get a root shell. You can now edit your hosts file using:
nano /etc/hosts
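For reference, a hypothetical /etc/hosts on the slave after the change (the LAN addresses below are made up for illustration; slave.domain.be is the hostname from the logs):
# 127.0.1.1   slave.domain.be slave      <- commented out
127.0.0.1     localhost
192.168.1.12  slave.domain.be slave
192.168.1.11  master.domain.be master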