Issue starting jstatd - visualvm

I'm trying to connect JVisualVM to jstatd. However, on my remote server I'm having issues starting jstatd.
# netstat -nlp | grep rmiregistry
tcp 0 0 0.0.0.0:1098 0.0.0.0:* LISTEN 7320/rmiregistry
tcp 0 0 0.0.0.0:34872 0.0.0.0:* LISTEN 7320/rmiregistry
The error
# ./jstatd -J-Djava.security.policy=jstatd.all.policy -p 1098
Could not bind //:1098/JStatRemoteHost to RMI Registry
java.rmi.UnexpectedException: undeclared checked exception; nested exception is:
java.lang.ClassNotFoundException: sun.jvmstat.monitor.remote.RemoteHost not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}
at sun.rmi.registry.RegistryImpl_Stub.rebind(Unknown Source)
at java.rmi.Naming.rebind(Naming.java:177)
at sun.tools.jstatd.Jstatd.bind(Jstatd.java:57)
at sun.tools.jstatd.Jstatd.main(Jstatd.java:143)
Caused by: java.lang.ClassNotFoundException: sun.jvmstat.monitor.remote.RemoteHost not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}
at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:273)
at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:251)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:377)
... 4 more

You are using the wrong JDK. The message "sun.jvmstat.monitor.remote.RemoteHost not found in gnu.gcj.runtime.SystemClassLoader" shows that you are running jstatd with GCJ. Please use OpenJDK or the Oracle JDK instead.
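For example, on a typical Linux system you can confirm which runtime is actually on the PATH and then invoke jstatd from a real JDK explicitly. This is only a sketch; the JDK path below is an assumption and will differ per system:
java -version                                  # "gij"/"gcj" in the output means GCJ, not a real JDK
which jstatd                                   # shows which jstatd binary is being picked up
JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk      # example path to an OpenJDK installation
"$JAVA_HOME/bin/jstatd" -J-Djava.security.policy=jstatd.all.policy -p 1098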

Related

Pentaho not starting because it tries to bind to port 9092, which is already used by itself

I'm trying to start the Pentaho server on Debian Jessie.
Pentaho craps out with the following error:
15:55:24,198 WARN [PentahoSolutionSpringApplicationContext] Exception encountered during context initialization - cancelling refresh attempt
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.h2.tools.Server' defined in file [/opt/pentaho-biplatform-ce-6.1.0.1-196/biserver-ce/pentaho-solutions/system/GettingStartedDB-spring.xml]: Invocation of init method failed; nested exception is org.h2.jdbc.JdbcSQLException: Exception opening port "H2 TCP Server (tcp://localhost:9092)" (port may be in use), cause: "timeout" [90061-131]
The error is very clear: port 9092 is used by something else. The problem is that it is actually used by Pentaho itself, so it's complaining about a port that it is currently using...
To test that I've changed the port to 9093 in the following file:
./pentaho-solutions/system/GettingStartedDB.properties
The only difference in the exception was the port, which was 9093 this time, so it's definitely complaining about the port it is itself using. Very weird.
Full log can be found here: http://ix.io/1ydv
Ideas?
Try adding the following option to CATALINA_OPTS in the start_pentaho.sh file:
CATALINA_OPTS="... -Dh2.bindAddress=ip_of_your_machine"
It removed the Exception opening port "H2 TCP Server (tcp://localhost:9092)" (port may be in use) error for me.
Adding the following to CATALINA_OPTS in the start_pentaho.sh file also solves this issue:
CATALINA_OPTS="... -Dh2.bindAddress=localhost"
The root cause of the problem is that your server's hostname does not resolve to 127.0.0.1.
Just add (or edit) this line in your /etc/hosts:
127.0.0.1 localhost YOUR_HOST_NAME
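To verify the fix, you can check that the hostname now resolves to the loopback address; a quick sketch (the expected output assumes the mapping above is in place):
getent hosts "$(hostname)"    # should print 127.0.0.1 ... YOUR_HOST_NAME
ping -c 1 "$(hostname)"       # should ping 127.0.0.1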

HiveServer2: Thrift SASL related exception when using custom PasswdAuthenticationProvider

I've created a custom implementation of the PasswdAuthenticationProvider interface, based on OAuth2. I think the code is irrelevant to the problem I'm experiencing; nevertheless, it can be found here.
I've configured hive-site.xml with the following properties:
<property>
<name>hive.server2.authentication</name>
<value>CUSTOM</value>
</property>
<property>
<name>hive.server2.custom.authentication.class</name>
<value>com.telefonica.iot.cosmos.hive.authprovider.OAuth2AuthenticationProviderImpl</value>
</property>
Then I've restarted the Hive service and connected a JDBC-based remote client successfully. This is an example of a successful run found in /var/log/hive/hiveserver2.log:
2016-02-01 11:52:44,515 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (HttpClientFactory.java:<init>(66)) - Setting max total connections (500)
2016-02-01 11:52:44,515 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (HttpClientFactory.java:<init>(67)) - Setting default max connections per route (100)
2016-02-01 11:52:44,799 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(65)) - Doing request: GET https://account.lab.fiware.org/user?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx HTTP/1.1
2016-02-01 11:52:44,800 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(76)) - Response received: {"organizations": [], "displayName": "frb", "roles": [{"name": "provider", "id": "106"}], "app_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "email": "frb#tid.es", "id": "frb"}
2016-02-01 11:52:44,801 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(104)) - User frb authenticated
2016-02-01 11:52:44,868 INFO [pool-5-thread-5]: thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(188)) - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V6
2016-02-01 11:52:44,871 INFO [pool-5-thread-5]: session.SessionState (SessionState.java:start(358)) - No Tez session required at this point. hive.execution.engine=mr.
2016-02-01 11:52:44,873 INFO [pool-5-thread-5]: session.SessionState (SessionState.java:start(358)) - No Tez session required at this point. hive.execution.engine=mr.
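For reference, a connection like the successful one logged above can be made with a JDBC client such as beeline. This is only a sketch: the host is a placeholder and, since the authentication is CUSTOM, the password field carries whatever credential the provider expects (here an OAuth2 access token):
beeline -u "jdbc:hive2://hiveserver2-host:10000/default" -n frb -p ACCESS_TOKEN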
The problem is that, after that, the following error appears recurrently:
2016-02-01 11:52:48,227 ERROR [pool-5-thread-4]: server.TThreadPoolServer (TThreadPoolServer.java:run(215)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more
2016-02-01 11:53:18,323 ERROR [pool-5-thread-5]: server.TThreadPoolServer (TThreadPoolServer.java:run(215)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more
Why? I've seen in several other questions that this occurs when the default value of hive.server2.authentication (i.e. SASL) is used and the client is not doing the handshake. But in my case the value of that property is CUSTOM. I cannot understand it, and any help would be really appreciated.
EDIT 1
I've found there are periodic requests to HiveServer2... from HiveServer2 itself! These are the requests that result in the Thrift SASL errors:
$ sudo tcpdump -i lo port 10000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
...
...
10:18:48.183469 IP dev-fiwr-bignode-11.hi.inet.ndmp > dev-fiwr-bignode-11.hi.inet.55758: Flags [.], ack 7, win 512, options [nop,nop,TS val 1034162147 ecr 1034162107], length 0
^C
21 packets captured
42 packets received by filter
0 packets dropped by kernel
[fiware-portal@dev-fiwr-bignode-11 ~]$ sudo netstat -nap | grep 55758
tcp 0 0 10.95.76.91:10000 10.95.76.91:55758 CLOSE_WAIT 7190/java
tcp 0 0 10.95.76.91:55758 10.95.76.91:10000 FIN_WAIT2 -
[fiware-portal@dev-fiwr-bignode-11 ~]$ ps -ef | grep 7190
hive 7190 1 1 10:10 ? 00:00:10 /usr/java/jdk1.7.0_71//bin/java -Xmx1024m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx1024m -Xmx4096m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-service-0.13.0.2.1.7.0-784.jar org.apache.hive.service.server.HiveServer2 -hiveconf hive.metastore.uris=" " -hiveconf hive.log.file=hiveserver2.log -hiveconf hive.log.dir=/var/log/hive
1011 14158 12305 0 10:19 pts/1 00:00:00 grep 7190
Any idea?
EDIT 2
More research into the connections sent from HiveServer2 to HiveServer2: the data packets always carry the same 5 bytes (hexadecimal): 22 41 30 30 31
Any idea about these connections?
I finally "fixed" this. Since the message was sent by the Ambari agent running in the HiveServer2 machine (some king of weird ping), I simply added an iptables rule blocking all the connections to TCP/10000 port on the loopback interface:
iptables -A INPUT -i lo -p tcp --dport 10000 -j DROP
Of course, now Ambari warns that HiveServer2 is not alive (the pings are dropped). And the above rule must be removed if I want to restart the server from Ambari (there is another alive check in the startup script); after the restart I can enable the rule again. Well, I can live with that.
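The restart dance described above could be scripted roughly as follows. This is only a sketch of that workaround; how HiveServer2 is actually restarted (left as a comment here) depends on the Ambari setup:
# block the Ambari agent's loopback "pings" to HiveServer2
iptables -A INPUT -i lo -p tcp --dport 10000 -j DROP
# before restarting from Ambari, delete the rule so the alive check in the startup script passes
iptables -D INPUT -i lo -p tcp --dport 10000 -j DROP
# ... restart HiveServer2 from Ambari ...
# re-enable the block afterwards
iptables -A INPUT -i lo -p tcp --dport 10000 -j DROP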

java.net.ConnectException: JBAS012144: Could not connect to remote://nnn.nn.nn.88:9999. The connection timed out

I am trying to run a JBoss instance in domain mode. While doing that I am getting the following issue:
[Host Controller] 12:45:56,535 WARN [org.jboss.as.host.controller] (Controller Boot Thread) JBAS010900: Could not connect to remote domain controller at remote://nnn.nn.nn.88:9999 -- java.net.ConnectException: JBAS012144: Could not connect to remote://nnn.nn.nn.88:9999. The connection timed out
I started two JBoss instances in domain mode after configuring them as follows.
First JBoss instance->
./domain.sh -b nnn.nn.nn.88 -Djboss.bind.address.management=nnn.nn.nn.88
Second JBoss Instance ->
./domain.sh -b nnn.nn.nn.89 -Djboss.domain.master.address=nnn.nn.nn.88 --host-config=host-slave.xml
nnn.nn.nn.88 host.xml configuration is as follows...
<domain-controller>
<local/>
</domain-controller>
nnn.nn.nn.89 host-slave.xml configuration is as follows...
<domain-controller>
<remote host="${jboss.domain.master.address}" port="${jboss.domain.master.port:9999}" security-realm="ManagementRealm"/>
</domain-controller>
I am able to telnet to port 9999 on host nnn.nn.nn.88 from .89, since I configured the public and management interfaces to bind to non-loopback IPs. Or is this a consequence of <domain-controller> containing <local/>?
Please help me solve this issue. The JDK version is JDK 7 Update 80, and this is EAP 6.3.
In the HC's host.xml (or, if you use --host-config=host-slave.xml, in that particular XML file), the host controller has to be pointed at the DC under the <domain-controller> element.
jboss.domain.master.address should be the domain controller's address, nnn.nn.nn.88:
<domain-controller>
<remote host="${jboss.domain.master.address:nnn.nn.nn.88}" port="${jboss.domain.master.port:9999}" security-realm="ManagementRealm"/>
</domain-controller>
As per the solution article from Red Hat:
https://access.redhat.com/solutions/218053#
I ran the following commands with the same configuration I had when posting this question, and it succeeded.
DC->
./domain.sh -b my-host-ip1 -bmanagement my-host-ip1
HC->
./domain.sh -Djboss.domain.master.address=my-host-ip1 -b my-host-ip2 -bmanagement my-host-ip2
I am not sure whether configuring things this way gives clustering capability to the DC and HCs; I raised that question to Red Hat on the same solution article, and I hope the answer is yes.
https://access.redhat.com/solutions/218053#comment-975683
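As a sanity check for this setup, you can confirm from a shell that the DC's native management interface is bound to its public address and reachable from the HC. A sketch, using the same placeholder host names as above:
# on the DC (my-host-ip1): the management port should be listening on the public address, not 127.0.0.1
ss -tlnp | grep 9999
# from the HC (my-host-ip2): the DC's management port should be reachable
nc -vz my-host-ip1 9999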

clustering in Servicemix using apache cellar

The aim is to create 2 instances of ServiceMix, start both (changing the RMI host and port number in one of the nodes), install Apache Cellar in both ServiceMix nodes, and deploy the same bundle in both nodes.
The bundle contains routes which use JMS, ActiveMQ and CXF endpoints. I did all these steps, and the last step, deploying the bundle in both nodes, works fine in node 1 but throws an error in node 2 like below.
org.osgi.service.cm.ConfigurationException: null : Cannot start the broker
at org.apache.activemq.osgi.ActiveMQServiceFactory.updated(ActiveMQServiceFactory.java:110)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.felix.cm.impl.helper.ManagedServiceFactoryTracker.provideConfiguration(ManagedServiceFactoryTracker.java:88)[6:org.apache.felix.configadmin:1.8.0]
at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceFactoryUpdate.provide(ConfigurationManager.java:1605)[6:org.apache.felix.configadmin:1.8.0]
at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceFactoryUpdate.run(ConfigurationManager.java:1548)[6:org.apache.felix.configadmin:1.8.0]
at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)[6:org.apache.felix.configadmin:1.8.0]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_75]
Caused by: java.io.IOException: Transport Connector could not be registered in JMX: Failed to bind to server socket: tcp://0.0.0.0:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600 due to: java.net.BindException: Address already in use: JVM_Bind
at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:27)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.broker.BrokerService.registerConnectorMBean(BrokerService.java:2069)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.broker.BrokerService.startTransportConnector(BrokerService.java:2531)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.broker.BrokerService.startAllConnectors(BrokerService.java:2448)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.broker.BrokerService.doStartBroker(BrokerService.java:693)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.broker.BrokerService.startBroker(BrokerService.java:659)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.broker.BrokerService.start(BrokerService.java:595)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.osgi.ActiveMQServiceFactory.updated(ActiveMQServiceFactory.java:104)[92:org.apache.activemq.activemq-osgi:5.10.0]
... 5 more
Caused by: java.io.IOException: Failed to bind to server socket: tcp://0.0.0.0:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600 due to: java.net.BindException: Address already in use: JVM_Bind
at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:33)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.transport.tcp.TcpTransportServer.bind(TcpTransportServer.java:135)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.transport.tcp.TcpTransportFactory.doBind(TcpTransportFactory.java:56)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.transport.TransportFactorySupport.bind(TransportFactorySupport.java:40)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.broker.TransportConnector.createTransportServer(TransportConnector.java:318)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.broker.TransportConnector.getServer(TransportConnector.java:144)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.broker.TransportConnector.asManagedConnector(TransportConnector.java:110)[92:org.apache.activemq.activemq-osgi:5.10.0]
at org.apache.activemq.broker.BrokerService.registerConnectorMBean(BrokerService.java:2064)[92:org.apache.activemq.activemq-osgi:5.10.0]
... 11 more
Caused by: java.net.BindException: Address already in use: JVM_Bind
at java.net.DualStackPlainSocketImpl.bind0(Native Method)[:1.7.0_75]
at java.net.DualStackPlainSocketImpl.socketBind(DualStackPlainSocketImpl.java:106)[:1.7.0_75]
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376)[:1.7.0_75]
at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:190)[:1.7.0_75]
at java.net.ServerSocket.bind(ServerSocket.java:376)[:1.7.0_75]
at java.net.ServerSocket.<init>(ServerSocket.java:237)[:1.7.0_75]
at javax.net.DefaultServerSocketFactory.createServerSocket(ServerSocketFactory.java:231)[:1.7.0_75]
at org.apache.activemq.transport.tcp.TcpTransportServer.bind(TcpTransportServer.java:132)[92:org.apache.activemq.activemq-osgi:5.10.0]
I know this is because both nodes have the same ActiveMQ configuration (mainly port 61616). But the idea of clustering is to use the same port: based on the load on one node, the other ServiceMix node should be called to service the request.
Correct me if I'm wrong. Am I on the right path to achieve clustering?
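If both ServiceMix instances run on the same machine, the second instance's embedded broker needs its own OpenWire port: the bind error above is node 2's broker trying to open 61616 while node 1 already holds it. A sketch of the change, assuming a default installation where the broker configuration lives in etc/activemq.xml of node 2:
# in node 2's ServiceMix directory (file locations are assumptions for a default install)
sed -i 's/61616/61617/' etc/activemq.xml
# if node 2 shares the host with node 1, its other per-instance ports must differ too, e.g.
#   etc/org.apache.karaf.management.cfg -> rmiRegistryPort, rmiServerPort
#   etc/org.apache.karaf.shell.cfg      -> sshPort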

GridGain network connection: Is it possible to forward a node via SSH?

I would like to SSH into a remote machine running a GridGain instance and connect to it from a local GridGain instance. Can this be done?
How is the GridGain network connection made? As far as I could see, the node spins up and listens on the first available port in the range 47100-47200. But it opens some more ports too.
It seems not to be sufficient to just, e.g., forward 47100 on the remote machine (the remote machine's GridGain port) to local 47100. Probably the communication is not just client-server but symmetrical, with the remote node trying to connect back to my home node?
Is there documentation on the network protocol?
I tried symmetrically forwarding the
GridTcpCommunicationSpi.DFLT_PORTs (47100+) and
GridTcpDiscoverySpi.DFLT_PORTs (47500+)
ports.
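For reference, that symmetric forwarding can be expressed as a single SSH invocation. This is only a sketch with a placeholder user name, forwarding just the first port of each range:
ssh -L 47100:localhost:47100 -L 47500:localhost:47500 \
    -R 47100:localhost:47100 -R 47500:localhost:47500 \
    user@146.148.119.62
# the ssh listeners occupy 47100/47500 on each side, so each GridGain node binds the next free
# ports in its ranges (47101/47501), and the nodes still advertise their own internal addresses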
The nodes are able to connect. On the local node I first get this warning:
WARN GridTcpCommunicationSpi - Connect timed out (consider increasing 'connTimeout' configuration property) [addr=/10.240.136.167:47100]
WARN GridTcpDiscoverySpi - Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout. Current timeout: 5000.
WARN GridDhtPreloader - <gg-utility-sys-cache> Failed to wait for initial partition map exchange. Possible reasons are:
^-- Transactions in deadlock.
^-- Long running transactions (ignore if this is the case).
^-- Unreleased explicit locks.
WARN GridTcpDiscoverySpi - Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node. Current timeout: 5000.
This is a timeout from somehow trying to connect to 10.240.136.167:47100, which is the remote machine's internal IP and therefore obviously unreachable from my side.
But otherwise it looks fine, as I get the following:
INFO GridDiscoveryManager - Topology snapshot [ver=2, nodes=2, CPUs=6, heap=2.7GB]
On executing the following broadcast test:
grid.compute().broadcast(new GridRunnable() {
@Override
public void run() {
System.out.println("hello!");
}
});
I get this fatal error on the remote machine, whatever it may be:
[SEVERE][gridgain-#9%pub-null%][GridJobProcessor] Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$Test, taskClsName=at$
at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1732)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[19:58:02,237][SEVERE][gridgain-#11%pub-null%][GridJobProcessor] Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$1, taskClsName=at.a$
For more information see:
Troubleshooting: http://bit.ly/GridGain-Troubleshooting
Documentation Center: http://bit.ly/GridGain-Documentation
class org.gridgain.grid.GridDeploymentException: Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$1, taskClsName=at.ac.ait.is.infrase$
For more information see:
Troubleshooting: http://bit.ly/GridGain-Troubleshooting
Documentation Center: http://bit.ly/GridGain-Documentation
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1107)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1732)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
On the client side I don't see anything but:
INFO GridDeploymentLocalStore - Class locally deployed: class nix.GoogleGridRun$1
hello!
When I trigger the broadcast again via the debugger, I get the following on the local machine, and the same error message as before on the remote machine:
ERROR GridTaskWorker - Failed to obtain remote job result policy for result from GridComputeTask.result(..) method (will fail the whole task): GridJobResultImpl [job=o.g.g.kernal.processors.closure.GridClosureProcessor$10#7e89183d, sib=GridJobSiblingImpl [sesId=4c17983b841-43f8b9fa-87ae-4a20-99a1-8d36f5eb74a4, jobId=0d17983b841-ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, nodeId=ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, isJobDone=false], jobCtx=GridJobContextImpl [jobId=0d17983b841-ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, attrs={}], node=GridTcpDiscoveryNode [id=ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, addrs=[10.240.136.167, 127.0.0.1], sockAddrs=[/10.240.136.167:47500, /10.240.136.167:47500, /127.0.0.1:47500], discPort=47500, order=1, loc=false, ver=6.5.0#20140925-sha1:6dc3d773], ex=class o.g.g.GridDeploymentException: Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$Test, taskClsName=nix.GoogleGridRun$Test, codeVer=0, clsLdrId=eb17983b841-43f8b9fa-87ae-4a20-99a1-8d36f5eb74a4, seqNum=1411761402302, depMode=SHARED, dep=null]
For more information see:
Troubleshooting: http://bit.ly/GridGain-Troubleshooting
Documentation Center: http://bit.ly/GridGain-Documentation
, hasRes=true, isCancelled=false, isOccupied=true]
class org.gridgain.grid.GridException: Remote job threw user exception (override or implement GridComputeTask.result(..) method if you would like to have automatic failover for this exception).
at org.gridgain.grid.compute.GridComputeTaskAdapter.result(GridComputeTaskAdapter.java:109)
at org.gridgain.grid.kernal.processors.task.GridTaskWorker$3.apply(GridTaskWorker.java:819)
at org.gridgain.grid.kernal.processors.task.GridTaskWorker$3.apply(GridTaskWorker.java:812)
at org.gridgain.grid.util.GridUtils.wrapThreadLoader(GridUtils.java:6093)
at org.gridgain.grid.kernal.processors.task.GridTaskWorker.result(GridTaskWorker.java:812)
at org.gridgain.grid.kernal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:708)
at org.gridgain.grid.kernal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:906)
at org.gridgain.grid.kernal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1138)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: class org.gridgain.grid.GridDeploymentException: Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$Test, taskClsName=nix.GoogleGridRun$Test, codeVer=0, clsLdrId=eb17983b841-43f8b9fa-87ae-4a20-99a1-8d36f5eb74a4, seqNum=1411761402302, depMode=SHARED, dep=null]
For more information see:
Troubleshooting: http://bit.ly/GridGain-Troubleshooting
Documentation Center: http://bit.ly/GridGain-Documentation
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1107)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1732)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
On the local host side I have connections between the forwarded and real ports:
tcp6 0 0 127.0.0.1:47100 127.0.0.1:38272 VERBUNDEN 12280/java
tcp6 0 0 127.0.0.1:38272 127.0.0.1:47100 VERBUNDEN 12280/java
And some more to and from the ssh client (also java)
tcp6 45832 0 78.101.12.107:47101 146.148.119.62:51867 VERBUNDEN 12280/java
tcp6 231 0 78.101.12.107:47501 146.148.119.62:46219 CLOSE_WAIT 12280/java
tcp6 48 0 78.101.12.107:37129 146.148.119.62:22 VERBUNDEN 12280/java
tcp6 1 0 78.101.12.107:47501 146.148.119.62:44391 CLOSE_WAIT 12280/java
78.101.12.107 = local ip
146.148.119.62 = remote ip
Looking at netstat on a successful local 2-node grid, I see the following connections being made:
tcp6 0 0 ::1:47501 ::1:43143 VERBUNDEN 10218/java
tcp6 0 0 ::1:47500 ::1:34708 VERBUNDEN 9496/java
tcp6 0 0 ::1:34708 ::1:47500 VERBUNDEN 10218/java
tcp6 0 0 ::1:43143 ::1:47501 VERBUNDEN 9496/java
These are between the GridTcpCommunicationSpi.DFLT_PORT and GridTcpDiscoverySpi.DFLT_PORT ranges, so forwarding those should perhaps be enough.
Any ideas on what could be wrong?
The home node should be reachable from the cluster as well. You have 2 options:
Set up a VPN.
Implement and configure GridAddressResolver for all nodes, which will map their local addresses to external addresses. This will require setting up port forwarding in your home network.