HiveServer2: Thrift SASL related exception when using custom PasswdAuthenticationProvider - hive

I've created a custom implementation of the PasswdAuthenticationProvider interface, based on OAuth2. I think the code is irrelevant to the problem I'm experiencing; nevertheless, it can be found here.
I've configured hive-site.xml with the following properties:
Then I've restarted the Hive service and I've connected a JDBC based remote client with success. This is an example of a successful run found in /var/log/hive/hiveserver2.log:
2016-02-01 11:52:44,515 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (<init>(66)) - Setting max total connections (500)
2016-02-01 11:52:44,515 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (<init>(67)) - Setting default max connections per route (100)
2016-02-01 11:52:44,799 INFO [pool-5-thread-5]: authprovider.HttpClientFactory ( - Doing request: GET HTTP/1.1
2016-02-01 11:52:44,800 INFO [pool-5-thread-5]: authprovider.HttpClientFactory ( - Response received: {"organizations": [], "displayName": "frb", "roles": [{"name": "provider", "id": "106"}], "app_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "email": "", "id": "frb"}
2016-02-01 11:52:44,801 INFO [pool-5-thread-5]: authprovider.HttpClientFactory ( - User frb authenticated
2016-02-01 11:52:44,868 INFO [pool-5-thread-5]: thrift.ThriftCLIService ( - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V6
2016-02-01 11:52:44,871 INFO [pool-5-thread-5]: session.SessionState ( - No Tez session required at this point. hive.execution.engine=mr.
2016-02-01 11:52:44,873 INFO [pool-5-thread-5]: session.SessionState ( - No Tez session required at this point. hive.execution.engine=mr.
The problem is that, after this, the following error appears repeatedly:
2016-02-01 11:52:48,227 ERROR [pool-5-thread-4]: server.TThreadPoolServer ( - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(
at org.apache.thrift.server.TThreadPoolServer$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TTransport.readAll(
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(
... 4 more
2016-02-01 11:53:18,323 ERROR [pool-5-thread-5]: server.TThreadPoolServer ( - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(
at org.apache.thrift.server.TThreadPoolServer$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TTransport.readAll(
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(
... 4 more
Why? According to several other questions, this occurs when hive.server2.authentication has its default value, i.e. SASL, and the client does not perform the handshake. But in my case the value of that property is CUSTOM. I cannot understand it, and any help would be really appreciated.
I've found that there are periodic requests to HiveServer2... from HiveServer2 itself! These are the requests that result in the Thrift SASL errors:
$ sudo tcpdump -i lo port 10000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
10:18:48.183469 IP dev-fiwr-bignode-11.hi.inet.ndmp > dev-fiwr-bignode-11.hi.inet.55758: Flags [.], ack 7, win 512, options [nop,nop,TS val 1034162147 ecr 1034162107], length 0
21 packets captured
42 packets received by filter
0 packets dropped by kernel
[fiware-portal@dev-fiwr-bignode-11 ~]$ sudo netstat -nap | grep 55758
tcp 0 0 CLOSE_WAIT 7190/java
tcp 0 0 FIN_WAIT2 -
[fiware-portal@dev-fiwr-bignode-11 ~]$ ps -ef | grep 7190
hive 7190 1 1 10:10 ? 00:00:10 /usr/java/jdk1.7.0_71//bin/java -Xmx1024m -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Xmx1024m -Xmx4096m,NullAppender org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-service- org.apache.hive.service.server.HiveServer2 -hiveconf hive.metastore.uris=" " -hiveconf hive.log.file=hiveserver2.log -hiveconf hive.log.dir=/var/log/hive
1011 14158 12305 0 10:19 pts/1 00:00:00 grep 7190
Any idea?
More research on the connections sent from HiveServer2 to HiveServer2: the data packets always carry the same 5 bytes (hexadecimal): 22 41 30 30 31
Any idea about these connections?
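For what it's worth, those five bytes are printable ASCII. The sketch below is plain Python with nothing Hive-specific in it, and the variable names are only illustrative. It simply decodes the payload. Reading it as a short text marker rather than a SASL start message is an assumption, but it would be consistent with the server failing while it waits for the handshake.

```python
# The five payload bytes reported in the capture, written as hexadecimal text.
payload_hex = "22 41 30 30 31"

# bytes.fromhex skips the spaces between the byte pairs.
payload = bytes.fromhex(payload_hex)

# Decode as ASCII so that a printable marker becomes readable.
text = payload.decode("ascii")

print(len(payload), repr(text))
```

The payload decodes to a double quote followed by A001.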

I finally "fixed" this. Since the message was sent by the Ambari agent running on the HiveServer2 machine (some kind of weird ping), I simply added an iptables rule blocking all connections to TCP port 10000 on the loopback interface:
iptables -A INPUT -i lo -p tcp --dport 10000 -j DROP
Of course, Ambari now warns that HiveServer2 is not alive (the pings are dropped). The above rule must also be removed whenever I want to restart the server from Ambari (there is another alive check in the start script); after the restart I can enable the rule again. Well, I can live with that.


erlang failed to resolve ipv6 addresses using parameter from rabbitmq

I'm running a RabbitMQ cluster in k8s, which has only pure IPv6 addresses. inet returns an nxdomain error when resolving the k8s service name.
The parameters passed to Erlang from RabbitMQ are:
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+A 128 -kernel inetrc '/etc/rabbitmq/erl_inetrc' -proto_dist inet6_tcp"
RABBITMQ_CTL_ERL_ARGS="-proto_dist inet6_tcp"
erl_inetrc: |-
{inet6, true}.
When RabbitMQ uses its rabbit_peer_discovery_k8s plugin to call the k8s API, the request fails:
2019-10-15 07:33:55.000 [info] <0.238.0> Peer discovery backend does not support locking, falling back to randomized delay
2019-10-15 07:33:55.000 [info] <0.238.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2019-10-15 07:33:55.000 [debug] <0.238.0> GET https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/tazou/endpoints/zt4-crmq
2019-10-15 07:33:55.015 [debug] <0.238.0> Response: {error,{failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},{inet,[inet]
2019-10-15 07:33:55.015 [debug] <0.238.0> HTTP Error {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},{inet,[inet],nxdom
2019-10-15 07:33:55.015 [info] <0.238.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}}
2019-10-15 07:33:55.016 [error] <0.237.0> CRASH REPORT Process <0.237.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 167 in application_master:init/4 line 138
2019-10-15 07:33:55.016 [info] <0.43.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kub
In the k8s console, the address can be resolved:
[rabbitmq]# nslookup -type=AAAA kubernetes.default.svc.cluster.local
Server: 2019:282:4000:2001::6
Address: 2019:282:4000:2001::6#53
kubernetes.default.svc.cluster.local has AAAA address fd01:abcd::1
inet can also return the IPv6 address:
kubectl exec -ti zt4-crmq-0 rabbitmqctl eval 'inet:gethostbyname("kubernetes.default.svc.cluster.local").'
As far as I know, the plugin calls httpc:request to invoke the k8s API. I don't know what the difference is between httpc:request and inet:gethostbyname, nor what httpc:request uses to resolve the hostname.
I asked about the RabbitMQ plugin and was told that the plugin is not aware of how Erlang resolves the address.
Is there anything else I could set in erl_inetrc so that Erlang can resolve the IPv6 address? What did I miss in the configuration? Or how could I debug this from the Erlang side? I'm new to Erlang.
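To illustrate the kind of gap being asked about: the {inet,[inet],nxdomain} part of the error suggests that the failing lookup was limited to the IPv4 family. The sketch below is only an analogy in Python, not the Erlang code path, and that reading of the log is an assumption. It uses the address from the nslookup output as a literal (so no DNS server is needed) and shows that the same host yields nothing when the lookup is restricted to IPv4 but resolves when IPv6 is allowed. The function name resolve is hypothetical.

```python
import socket

# Address taken from the nslookup output above; a literal, so no DNS query is made.
HOST = "fd01:abcd::1"

def resolve(host, port, family):
    """Return the sorted addresses found when the lookup is limited to one address family."""
    try:
        infos = socket.getaddrinfo(
            host, port,
            family=family,
            type=socket.SOCK_STREAM,
            flags=socket.AI_NUMERICHOST,  # parse the literal only, never ask a resolver
        )
    except socket.gaierror:
        # No usable address exists for this host in the requested family.
        return []
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    print("IPv4 only:", resolve(HOST, 443, socket.AF_INET))
    print("IPv6 only:", resolve(HOST, 443, socket.AF_INET6))
```

If the HTTP client and the manual inet:gethostbyname check use different address families, they can disagree in the same way.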

Zookeeper and ActiveMQ LevelDB replication non reliable

In my current project we are trying to set up an ActiveMQ cluster with LevelDB replication. Our configuration has a ZooKeeper ensemble of three nodes and an ActiveMQ cluster of three nodes.
The following is the configuration used for ActiveMQ (of course, the hostname is different for each node in the cluster):
We start three instances of ZooKeeper and three instances of ActiveMQ. The ZooKeeper leader is elected correctly, but the master election in the ActiveMQ cluster does not happen. Judging from the log, there seems to be an authentication problem with ZooKeeper (I have little knowledge of ZooKeeper and ActiveMQ). The logs are pasted below for reference.
INFO: Loading '/opt/activemq//bin/env'
INFO: Using java '/usr/bin/java'
INFO: Starting in foreground, this is just for debugging purposes (stop process by pressing CTRL+C)
INFO: Creating pidfile /data/activemq/
Java Runtime: Oracle Corporation 1.8.0_91 /usr/lib/jvm/java-8-openjdk-amd64/jre
Heap sizes: current=62976k free=59998k max=932352k
JVM args: -Xms64M -Xmx1G -Djava.awt.headless=true -Dactivemq.classpath=/opt/activemq/conf.tmp:/opt/activemq//../lib/: -Dactivemq.home=/opt/activemq/ -Dactivemq.base=/opt/activemq/ -Dactivemq.conf=/opt/activemq/conf.tmp
Extensions classpath:[/opt/activemq/lib,/opt/activemq/lib/camel,/opt/activemq/lib/optional,/opt/activemq/lib/web,/opt/activemq/lib/extra]
ACTIVEMQ_HOME: /opt/activemq
ACTIVEMQ_BASE: /opt/activemq
ACTIVEMQ_CONF: /opt/activemq/conf.tmp
ACTIVEMQ_DATA: /data/activemq
Loading message broker from: xbean:activemq.xml
INFO | Refreshing org.apache.activemq.xbean.XBeanBrokerFactory$1@7823a2f9: startup date [Sat Jun 17 09:15:51 UTC 2017]; root of context hierarchy
INFO | JobScheduler using directory: /data/activemq/localhost/scheduler
INFO | Using Persistence Adapter: Replicated LevelDB[/data/activemq/leveldb, ip-172-20-44-97.ec2.internal:2181,ip-172-20-45-105.ec2.internal:2181,ip-172-20-48-226.ec2.internal:2181//activemq/leveldb-stores]
INFO | Starting StateChangeDispatcher
INFO | Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
INFO | Client
INFO | Client environment:java.version=1.8.0_91
INFO | Client environment:java.vendor=Oracle Corporation
INFO | Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
INFO | Client environment:java.class.path=/opt/activemq//bin/activemq.jar
INFO | Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
INFO | Client
INFO | Client environment:java.compiler=<NA>
INFO | Client
INFO | Client environment:os.arch=amd64
INFO | Client environment:os.version=4.4.65-k8s
INFO | Client
INFO | Client environment:user.home=/root
INFO | Client environment:user.dir=/tmp
INFO | Initiating client connection, connectString=ip-172-20-44-97.ec2.internal:2181,ip-172-20-45-105.ec2.internal:2181,ip-172-20-48-226.ec2.internal:2181 sessionTimeout=2000 watcher=org.apache.activemq.leveldb.replicated.groups.ZKClient@4b41dd5c
WARN | SASL configuration failed: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/opt/activemq/conf.tmp/login.config'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
WARN | unprocessed event state: AuthFailed
INFO | Opening socket connection to server ip-172-20-45-105.ec2.internal/
WARN | Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect Connection refused at Method)[:1.8.0_91] at[:1.8.0_91] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport([zookeeper-3.4.6.jar:3.4.6-1569965] at org.apache.zookeeper.ClientCnxn$[zookeeper-3.4.6.jar:3.4.6-1569965]
WARN | SASL configuration failed: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/opt/activemq/conf.tmp/login.config'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
INFO | Opening socket connection to server ip-172-20-48-226.ec2.internal/
WARN | unprocessed event state: AuthFailed
WARN | Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect Connection refused at Method)[:1.8.0_91] at[:1.8.0_91] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport( [zookeeper-3.4.6.jar:3.4.6-1569965] at org.apache.zookeeper.ClientCnxn$[zookeeper-3.4.6.jar:3.4.6-1569965]
WARN | SASL configuration failed: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/opt/activemq/conf.tmp/login.config'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
INFO | Opening socket connection to server ip-172-20-44-97.ec2.internal/
WARN | unprocessed event state: AuthFailed
WARN | Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect Connection refused at Method)[:1.8.0_91] at[:1.8.0_91] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport([zookeeper-3.4.6.jar:3.4.6-1569965] at org.apache.zookeeper.ClientCnxn$[zookeeper-3.4.6.jar:3.4.6-1569965]
Please help me solve this problem.
If anyone has experience deploying ZooKeeper with an ActiveMQ cluster in Kubernetes, please share your ideas, since that is where we are trying to deploy it.
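One way to narrow this down: besides the SASL warnings, the log also contains "Connection refused" lines, which suggests the TCP connection to ZooKeeper may be failing before any authentication takes place. The sketch below is a hedged diagnostic, not part of ActiveMQ or ZooKeeper. It splits a connect string like the one in the log into host/port pairs and tries a plain TCP connection to each. The function names parse_connect_string and can_connect are hypothetical.

```python
import socket

def parse_connect_string(connect_string):
    """Split a 'host:port,host:port' string into a list of (host, port) pairs."""
    servers = []
    for entry in connect_string.split(","):
        host, _, port = entry.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers

def can_connect(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False

def probe(connect_string):
    """Map every 'host:port' in the connect string to its reachability."""
    return {
        f"{host}:{port}": can_connect(host, port)
        for host, port in parse_connect_string(connect_string)
    }
```

Running probe() with the connectString value from the log, from inside the ActiveMQ pod, would show whether port 2181 is reachable at all on each ZooKeeper node.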

CumulocityLongPollingTransport - canceling the long poll request because of inactivity

I am using the Cumulocity Java agent (7.38.0). It apparently lost communication with the server somehow and never recovered. The admin interface says:
November 22, 2016 2:25 AM
and the last Cumulocity record in the device syslog was:
Nov 22 01:25:47 localhost root: 01:25:47.166 [CumulocityLongPollingTransport-scheduler-2] WARN c.c.s.c.n.ConnectionHeartBeatWatcher - canceling the long poll request because of inactivity
(The one-hour time difference is due to a device configuration problem.)
The process still appears to be running:
ps -ef | grep -i c8y
root 1341 1257 0 Nov19 ? 00:00:00 /bin/sh ./
root 1342 1341 0 Nov19 ? 00:00:00 /bin/sh ./
root 1344 1342 0 Nov19 ? 00:25:39 java -cp cfg/*:lib/* -Dlogback.configurationFile=cfg/logback.xml c8y.lx.agent.Agent
Has anyone seen this problem before?
We had it once or twice when people were connecting to Cumulocity through a firewall or VPN. The result was exactly as you described: the polling gets stuck after some time, as if connections were blocked. In other words, I would suspect a proxy that is blocking the reconnect.

activemq failover using multiple instances in master slave mode on same linux machine

I have set up multiple ActiveMQ instances to achieve failover in master/slave mode on Windows.
While setting this up, I just created 3 instances under the bin folder without changing any ports and started all 3 instances one by one. The first instance became master and the remaining ones stayed in slave mode until I stopped the master instance.
Now I am trying to achieve the same in a Linux environment. The first instance starts successfully, but when I start the second instance in a different window it throws the error below:
ERROR | Failed to start Apache ActiveMQ ([instance2, ID:132vm6-57227-1478597606120-0:1], Transport Connector could not be registered in JMX: Failed to bind to server socket: tcp:// due to: Address already in use)
INFO | Apache ActiveMQ 5.14.0 (instance2, ID:132vm6-57227-1478597606120-0:1) is shutting down
INFO | Connector openwire stopped
INFO | Connector amqp stopped
INFO | Connector stomp stopped
INFO | Connector mqtt stopped
INFO | Connector ws stopped
INFO | PListStore:[/opt/apache-activemq-5.14.0/bin/instance2/data/instance2/tmp_storage] stopped
INFO | Stopping async queue tasks
INFO | Stopping async topic tasks
INFO | Stopped KahaDB
INFO | Apache ActiveMQ 5.14.0 (instance2, ID:132vm6-57227-1478597606120-0:1) uptime 0.585 seconds
INFO | Apache ActiveMQ 5.14.0 (instance2, ID:132vm6-57227-1478597606120-0:1) is shutdown
INFO | Closing org.apache.activemq.xbean.XBeanBrokerFactory$1@4233871a: startup date [Tue Nov 08 15:03:24 IST 2016]; root of context hierarchy
WARN | Exception thrown from LifecycleProcessor on context close
java.lang.IllegalStateException: LifecycleProcessor not initialized - call 'refresh' before invoking lifecycle methods via the context: org.apache.activemq.xbean.XBeanBrokerFactory$1@4233871a: startup date [Tue Nov 08 15:03:24 IST 2016]; root of context hierarchy
at org.apache.activemq.xbean.XBeanBrokerService.stop([activemq-spring-5.14.0.jar:5.14.0]
at org.apache.activemq.xbean.XBeanBrokerService.afterPropertiesSet([activemq-spring-5.14.0.jar:5.14.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)[:1.7.0_65]
at sun.reflect.NativeMethodAccessorImpl.invoke([:1.7.0_65]
at sun.reflect.DelegatingMethodAccessorImpl.invoke([:1.7.0_65]
at java.lang.reflect.Method.invoke([:1.7.0_65]
I am using ActiveMQ 5.14 version.
If anybody has encountered a similar issue, kindly provide your inputs.
To get multiple instances of ActiveMQ running on the same machine, you need to change the ports that they try to open. There are (at least) 3 ports that need to be changed:
The transportConnector ports that accept messaging traffic. These are defined in the activemq.xml file. Typically you only need the openwire one, which is 61616 by default; I usually change this in the other ActiveMQ instances to 61626, 61636 etc. You can usually comment out the others if you don't intend to use them.
The Jetty HTTP port. This is defined in the jetty.xml file. The default is 8161, set the next ones to 8162, 8163 etc.
The JMX port. This one's a bit tricky, as you need to stick a piece of config into the activemq.xml to explicitly define it as follows:
<managementContext createConnector="true" connectorPort="1099"/>
You can then change this to 1199, 1299 on the other instances. Hope this helps.
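Before starting another instance, it can help to confirm that the ports you picked are actually free. The sketch below is plain Python and not part of ActiveMQ. The port numbers are the second-instance values suggested above, and the names SECOND_INSTANCE_PORTS, port_is_free and check_ports are hypothetical.

```python
import socket

# Ports planned for the second instance, following the numbering suggested above.
SECOND_INSTANCE_PORTS = {"openwire": 61626, "jetty": 8162, "jmx": 1199}

def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP listener can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use", the same condition the broker reports.
            return False
    return True

def check_ports(ports, host="0.0.0.0"):
    """Map each named port to whether it is currently free."""
    return {name: port_is_free(port, host) for name, port in ports.items()}

if __name__ == "__main__":
    for name, free in check_ports(SECOND_INSTANCE_PORTS).items():
        print(name, "free" if free else "IN USE")
```

Any port reported as in use would lead to the same "Address already in use" failure when the broker tries to bind it.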

Datanode error : NameSystem.getDatanode

Help please, folks.
I am trying to set up my Hadoop multi-node environment (1 master, 1 secondary and 3 slaves; Hadoop 2.7.1 on Ubuntu 14 on AWS) and I am getting a "NameSystem.getDatanode" ERROR message. I have browsed, read and tried things, but I have reached my limits. Could you at least point me in some direction?
Logs (extract) from the master; xxx-141/142/143 are the IPs of the slaves:
Line 134: 2016-01-23 17:36:19,432 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.143:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node is expected to serve this storage.
Line 135: 2016-01-23 17:36:19,457 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.142:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node is expected to serve this storage.
Line 159: 2016-01-23 17:36:20,988 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.141:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node XXX.XX.XX.143:50010 is expected to serve this storage.
Extract From SLAVE2 SERVER logs
2016-01-23 17:36:14,812 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
2016-01-23 17:36:18,607 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x3c90bbfe60c, containing 1 storage report(s), of which we sent 0. The reports had 1 total blocks and used 0 RPC(s). This took 4 msec to generate and 144 msecs for RPC and NN processing. Got back no commands.
2016-01-23 17:36:18,608 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d) service to master/MAST.XX.XX.169:9000 is shutting down
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(1XX.XX.XX.142:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 1XX.XX.XX.141:50010 is expected to serve this storage.
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
at org.apache.hadoop.ipc.RPC$
at org.apache.hadoop.ipc.Server$Handler$
at org.apache.hadoop.ipc.Server$Handler$
at Method)
at org.apache.hadoop.ipc.Server$
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(
at com.sun.proxy.$Proxy13.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(
2016-01-23 17:36:18,610 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d) service to master/MAST.XX.XX.169:9000
2016-01-23 17:36:18,611 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d)
2016-01-23 17:36:18,611 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-1050309752-MAST.XX.XX.169-1453113991010
2016-01-23 17:36:20,611 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2016-01-23 17:36:20,613 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2016-01-23 17:36:20,614 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down DataNode at ip-SLAV-XX-XX-142/1XX.XX.XX.142
It looks like you have three slaves, and you created two of them by cloning the first slave after it had already been included in the cluster.
The two clones therefore already have a copy of the DFS and use the same storage ID as the first slave. The NameNode expects only one slave per storage ID. It is trying to tell you this by logging:
[...] is attempting to report storage ID [...].
Node [...]:50010 is expected to serve this storage.
You can try removing the dfs directory on two of the slaves and then restarting them.
That is, stop the slaves and run rm -rf on the dfs directory, for example:
rm -rf /tmp/hadoop-hadoop/dfs/
You can then restart and check that all slaves are connecting and test file replication, e.g. by setting the replication level to 4 for some files like:
hdfs dfs -setrep -w 4 -R /user/somedir
The -w option causes the command to wait until replication has succeeded.
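To confirm the diagnosis from the NameNode log before deleting anything, you can group the reporting addresses by datanodeUuid. The sketch below is plain Python and not a Hadoop tool. The regular expression is written against the DatanodeRegistration(...) format shown in the question, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Captures the address and UUID from "DatanodeRegistration(<addr>:<port>, datanodeUuid=<uuid>, ...".
REGISTRATION = re.compile(
    r"DatanodeRegistration\(([^:,\s]+):\d+, datanodeUuid=([0-9a-f-]+)"
)

def addresses_by_uuid(log_lines):
    """Collect the set of DataNode addresses seen for each datanodeUuid."""
    groups = defaultdict(set)
    for line in log_lines:
        for address, uuid in REGISTRATION.findall(line):
            groups[uuid].add(address)
    return groups

def duplicated_uuids(log_lines):
    """Return only the UUIDs that are reported by more than one address."""
    return {
        uuid: sorted(addresses)
        for uuid, addresses in addresses_by_uuid(log_lines).items()
        if len(addresses) > 1
    }
```

Feeding it the lines of the NameNode log (for example from open(path)) should list the shared UUID together with every slave address that reports it; an empty result would mean each DataNode has its own ID.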