Hadoop client not able to connect to server

I set up a 2-node Hadoop cluster, and running start-dfs.sh and start-yarn.sh works nicely (i.e. all expected services are running and there are no errors in the logs).
However, when I actually try to run an application, several tasks fail:
15/04/01 15:27:53 INFO mapreduce.Job: Task Id :
attempt_1427894767376_0001_m_000008_2, Status : FAILED
I checked the yarn and datanode logs, but nothing is reported there.
In the userlogs, the syslog files on the slave node all contain the following error message:
2015-04-01 15:27:21,077 INFO [main] org.apache.hadoop.ipc.Client:
Retrying connect to server:
slave.domain.be./127.0.1.1:53834. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2015-04-01 15:27:21,078 WARN [main]
org.apache.hadoop.mapred.YarnChild:
Exception running child :
java.net.ConnectException: Call From
slave.domain.be./127.0.1.1 to
slave.domain.be.:53834 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
So the problem is that the slave cannot connect to itself.
I checked whether there is a process on the slave node listening on port 53834, but there is none.
However, all 'expected' ports are being listened on (50020, 50075, ...). Nowhere in my configuration have I used port 53834, and it's a different port on every run.
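(For reference, the usual way to see where that 127.0.1.1 mapping comes from is to check how the slave resolves its own hostname; this is only a sketch, and the outputs in the comments are what a Debian/Ubuntu-style install would typically show.)
hostname -f                        # e.g. slave.domain.be
getent hosts $(hostname -f)        # shows which address the name resolves to, e.g. 127.0.1.1
grep 127.0.1.1 /etc/hosts          # the distribution's loopback alias for the hostname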
Any ideas on fixing this issue?

Your error might be due to the loopback address in your hosts file. Go to the /etc/hosts file and comment out the line with 127.0.1.1 on your slave nodes (and on the master node, if necessary). Then start the Hadoop processes again.
EDITED:
If you are not logged in as root, do this in a terminal to edit the hosts file:
sudo bash
Enter your current user's password to get a root shell. You can now edit your hosts file using:
nano /etc/hosts
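For illustration, assuming the slave's hostname is slave.domain.be and its LAN address is 192.168.1.11 (the addresses here are hypothetical), the edited /etc/hosts on the slave would look roughly like this:
127.0.0.1      localhost
#127.0.1.1     slave.domain.be slave       # commented out so the hostname no longer resolves to loopback
192.168.1.11   slave.domain.be slave       # the slave's real LAN address (hypothetical)
192.168.1.10   master.domain.be master     # the master's real LAN address (hypothetical)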

Related

cannot connect to X.X.X.X:10514: Connection refused

I am configuring an ELK stack version 8.1, based on two virtual machines which both run Oracle Linux 8. I need to send logs from one VM to the other using rsyslog. On the receiving machine the logs will be acquired by Filebeat. The rsyslog.conf file has been configured on the sending machine, adding the target machine's parameters. The filebeat.yml file has been configured to receive logs from rsyslog like this:
- type: syslog
  enabled: true
  format: auto
  protocol.tcp:
    host: "X.X.X.X:10514"
Firewalld on the receiving machine has been configured to open port 10514.
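For reference, the sending-side forwarding rule and the receiving-side firewall opening would typically look something like this (only a sketch; the address placeholder, the choice of plain TCP, and the exact rsyslog syntax variant are assumptions):
# /etc/rsyslog.conf on the sending machine: forward all logs over TCP (@@ = TCP, @ = UDP)
*.* @@X.X.X.X:10514

# on the receiving machine: open the port in firewalld and reload
sudo firewall-cmd --permanent --add-port=10514/tcp
sudo firewall-cmd --reload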
Since rebooting after the configuration, the only thing I get is the error:
cannot connect to X.X.X.X:10514: Connection refused
How can I solve this problem?

Can't establish TCP connection, RabbitMQ

I'm new to RabbitMQ and I want to run a RabbitMQ server instance on CentOS 7 using the following command:
sudo systemctl start rabbitmq-server
The command seemed to take forever, and when I stopped the process and checked the log files, everything was OK and it said that RabbitMQ was up and running. But when I try to execute any command using rabbitmqctl, I get the following error:
Error: unable to perform an operation on node 'rabbit@hostname'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@hostname
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
**DIAGNOSTICS**
attempted to contact: [rabbit@hostname]
rabbit@hostname:
* connected to epmd (port 4369) on hostname
* epmd reports node 'rabbit' uses port 25672 for inter-node and CLI tool traffic
* can't establish TCP connection to the target node, reason: timeout (timed out)
* suggestion: check if host 'hostname' resolves, is reachable and ports 25672, 4369 are not blocked by firewall
Current node details:
* node name: 'rabbitmqcli-806330-rabbit@hostname'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: KgAE7WR3dl5/FGAyWKE5LA==
I tried killing the processes manually, but it didn't work.
Every needed port is listening and I can telnet to them. Can you please help me figure out where the problem might be?
The client machine cannot resolve the hostname pointing to the RabbitMQ server.
If the IP address isn't publicly propagated (e.g. via DNS), you have to put the IP/hostname combination in the /etc/hosts file.
You could also try to connect to the IP address instead of the hostname to rule out any other network-related issues.
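As an illustration, assuming the RabbitMQ node's short hostname is hostname and its address is 10.0.0.5 (both values are placeholders), the entry on the machine running rabbitmqctl would look like this:
# /etc/hosts on the machine running rabbitmqctl
10.0.0.5    hostname
You can then verify the resolution with getent hosts hostname before retrying rabbitmqctl.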

Command not found while starting the secured zookeeper CLI to connect to ZK server

I have configured the ZK server to use SSL (signed cert, truststore, keystore, modified zookeeper.properties; all the setup is done and good). ZooKeeper starts and listens on port 2182 for SSL requests, and there are no errors in the ZooKeeper and Kafka server logs.
#new properties added in kafka/config/zookeeper.properties
secureClientPort=2182
authProvider.x509=org.apache.zookeeper.server.auth.X509AuthenticationProvider
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
ssl.trustStore.location=/path/to/ssl/kafka.zookeeper.truststore.jks
ssl.trustStore.password=serversecret
ssl.keyStore.location=/path/to/ssl/kafka.zookeeper.keystore.jks
ssl.keyStore.password=serversecret
ssl.clientAuth=need
Now, to connect to the secure ZooKeeper using the ZK CLI, I am following a similar approach: create a zk-client cert, get it signed, and create a truststore and keystore for it. I created the properties file, but when I try to connect to the ZK server I get an error:
Command not found: Command not found /path/to/ssl/zookeeper-client.properties
$ kafka/bin/zookeeper-shell.sh localhost:2182 -zk-tls-config-file /Users/path/to/ssl/zookeeper-client.properties
Connecting to localhost:2182
ZooKeeper -server host:port cmd args
addauth scheme auth
close
.....
Command not found: Command not found /Users/path/to/ssl/zookeeper-client.properties
My zookeeper-client.properties looks like this
$cat /Users/path/to/ssl/zookeeper-client.properties
#zookeeper.connect=localhost:2182
zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
zookeeper.ssl.client.enable=true
zookeeper.ssl.protocol=TLSv1.2
zookeeper.ssl.truststore.location=/Users/path/to/ssl/kafka.zookeeper-client.truststore.jks
zookeeper.ssl.truststore.password=serversecret
zookeeper.ssl.keystore.location=/Users/path/to/ssl/kafka.zookeeper-client.keystore.jks
zookeeper.ssl.keystore.password=serversecret
Kafka server logs at ZooKeeper startup:
[2021-07-16 11:27:38,676] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NettyServerCnxnFactory)
[2021-07-16 11:27:43,760] INFO bound to port 2181 (org.apache.zookeeper.server.NettyServerCnxnFactory)
.....
[2021-07-16 11:27:43,819] INFO Using org.apache.zookeeper.server.NettyServerCnxnFactory as server connection factory (org.apache.zookeeper.server.ServerCnxnFactory)
[2021-07-16 11:27:43,819] INFO binding to port 0.0.0.0/0.0.0.0:2182 (org.apache.zookeeper.server.NettyServerCnxnFactory)
[2021-07-16 11:27:43,821] INFO bound to port 2182 (org.apache.zookeeper.server.NettyServerCnxnFactory)
...
When I try to connect to port 2182 with the zk-client, the server logs don't show an entry (probably because it can't connect, since the command to initiate the connection fails).
I am using kafka_2.12 and it bundles zookeeper-3.5.7.
What am I missing here? To me the configurations look as expected, and the zk-cli shouldn't throw this error.
References:
https://atsc.com.sg/docs/edp/7-security/zookeeper-mutual-tls/
https://docs.confluent.io/platform/current/security/zk-security.html
Thanks,
JE
I think the problem is that your CLI is running from an older version that does not yet support this parameter. Check your execution path: are you really executing the "current" version?
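One quick way to check (a sketch; the paths are illustrative) is to confirm which script actually gets executed and whether its usage output mentions the TLS option; in releases that support it, zookeeper-shell.sh prints -zk-tls-config-file in its USAGE line when run without arguments:
# which zookeeper-shell.sh is on the PATH, and which one are you invoking?
which zookeeper-shell.sh
ls -l kafka/bin/zookeeper-shell.sh

# running it with no arguments prints the USAGE line;
# look for -zk-tls-config-file there
kafka/bin/zookeeper-shell.sh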

How to fix ActiveMQ error: port already in use

I cannot start ActiveMQ; it started the day before without problems. It says that port 1883 is in use, but I cannot find any process that is using port 1883.
I call ActiveMQ from the location C:\Users\"user"\Desktop\apache-activemq-5.15.8 and then I run "activemq start".
What I have already tried:
- re-downloading ActiveMQ
- restarting the PC 3 times
- inserting the .jar files into the corresponding Java project
ERROR | Failed to start Apache ActiveMQ (localhost, ID:DESKTOP-H0C9C4R-2808-1550050121460-0:1)
java.io.IOException: Transport Connector could not be registered in JMX: java.io.IOException: Failed to bind to server socket: mqtt://0.0.0.0:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600 due to: java.net.BindException: Address already in use: JVM_Bind
The exception message java.net.BindException: Address already in use: JVM_Bind denotes that the port is already used by some other process. You can check this by executing the following command in the console: netstat -an | find "LISTEN" (Windows) or netstat -an | grep "LISTEN" (Linux and other OSes).
Hope this will help to troubleshoot the issue.
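On Windows you can also narrow it down to the specific process that is holding port 1883 (a sketch; the PID value is illustrative):
REM list the connection on port 1883 together with the owning PID (last column)
netstat -ano | findstr :1883

REM replace 1234 with the PID reported above to see which program owns it
tasklist /FI "PID eq 1234"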

Unable to start OHS component

Middleware: Oracle HTTP Server (OHS)
Version: 12.2.1.3
I configured Oracle HTTP Server (OHS) in standalone mode. Node Manager is running perfectly. While starting "./startComponent.sh ohs1" I am getting the error below:
"""
javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
Error: Error occurred while performing nmConnect : Cannot connect to Node Manager. : Unrecognized SSL message, plaintext connection?
"""
The solution found on the internet is to change SecureListener to false in the Node Manager properties file.
When I did that, I got the error below:
"""
weblogic.nodemanager.NMConnectException: Connection refused (Connection refused). Could not connect to NodeManager. Check that it is running at localhost/XXX.0.X.X:XXXX.
Error: Error occurred while performing nmConnect : Cannot connect to Node Manager. : Connection refused (Connection refused). Could not connect to NodeManager. Check that it is running at localhost/XXX.0.X.X:XXXX.
"""
And the solution for this is setting SecureListener back to true in the Node Manager properties file (the relevant lines are sketched below).
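For context, the properties involved live in the standalone domain's nodemanager.properties and would look roughly like this (a sketch; the port and address values are assumptions):
# DOMAIN_HOME/nodemanager/nodemanager.properties (sketch)
ListenAddress=localhost
ListenPort=5556
# true = Node Manager expects SSL connections; startComponent.sh must then also connect over SSL
SecureListener=true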
I am confused. Can someone help in resolving these errors?
I had installed Oracle Access Manager (OAM) and OHS on the same machine, but OHS was installed in standalone mode in a different folder. I then uninstalled OHS and installed it again in the same folder where I had installed OAM, and it worked.