Hive - Out of Memory Exception - hive

which results in MR job. The MR job runs successfully, but when beeswax tries to render the result then I get an OOM Exception.
I was wondering if there is a configuration setting to help me get passed this issue.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:561)
at java.nio.CharBuffer.toString(CharBuffer.java:1201)
at org.apache.hadoop.io.Text.decode(Text.java:394)
at org.apache.hadoop.io.Text.decode(Text.java:371)
at org.apache.hadoop.io.Text.toString(Text.java:273)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:280)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:220)
at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:59)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:427)
at org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:498)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1474)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.materializeResults(BeeswaxServiceImpl.java:434)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.fetch(BeeswaxServiceImpl.java:543)
at com.cloudera.beeswax.BeeswaxServiceImpl$5.run(BeeswaxServiceImpl.java:986)
at com.cloudera.beeswax.BeeswaxServiceImpl$5.run(BeeswaxServiceImpl.java:981)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at com.cloudera.beeswax.BeeswaxServiceImpl.doWithState(BeeswaxServiceImpl.java:772)
at com.cloudera.beeswax.BeeswaxServiceImpl.fetch(BeeswaxServiceImpl.java:980)
at com.cloudera.beeswax.api.BeeswaxService$Processor$fetch.getResult(BeeswaxService.java:987)
at com.cloudera.beeswax.api.BeeswaxService$Processor$fetch.getResult(BeeswaxService.java:971)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Update
I increased the memory settings in cloudera manager but no cigar. After restarting the service the first time I run the query it works. The second time I run it fails:
Hue - Beeswax Server (Default) / Resource Management - Java Heap Size
of Beeswax Server in Bytes [1 Gib]
Hive - Gateway (Default) / Resource Management - Client Java Heap Size in Bytes [1 Gib]
Hive - HiveServer2
(Default) / Resource Management - Java Heap Size of HiveServer2 in
Bytes [1 Gib]

There are three -Xmx that you can play with (increase) - client java, Hive Serever 2 and Hive Meta Store Server. I guess you're hitting one of these limtits.

Related

X-Ray Daemon don't receive any data from envoy

I have a service running a task definition with three containers:
service itself
envoy
x-ray daemon
And I want to trace and monitor my services interacting with each other with x-ray.
But I don't see any data in x-ray.
I can see the request logs and everything in the envoy logs but there are no error messages about missing connection to the x-ray daemon.
Envoy container has three env variables:
APPMESH_VIRTUAL_NODE_NAME = mesh/mesh-name/virtualNode/service-virtual-node
ENABLE_ENVOY_XRAY_TRACING = 1
ENVOY_LOG_LEVEL = trace
The x-ray daemon is pretty plain and has just a name and an image (amazon/aws-xray-daemon:1).
But when looking in the logs of the x-ray dameon, there is only the following:
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] Initializing AWS X-Ray daemon 3.0.0
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] Using buffer memory limit of 76 MB
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] 1216 segment buffers allocated
2022-05-31T14:48:05.051+02:00 2022-05-31T12:48:05Z [Info] Using region: eu-central-1
2022-05-31T14:48:05.788+02:00 2022-05-31T12:48:05Z [Error] Get instance id metadata failed: RequestError: send request failed
2022-05-31T14:48:05.788+02:00 caused by: Get http://169.254.169.254/latest/meta-data/instance-id: dial tcp xxx.xxx.xxx.254:80: connect: invalid argument
2022-05-31T14:48:05.789+02:00 2022-05-31T12:48:05Z [Info] Starting proxy http server on 127.0.0.1:2000
As far as I read, the error you can see in these logs doesn't affect the functionality (https://repost.aws/questions/QUr6JJxyeLRUK5M4tadg944w).
I'm pretty sure I'm missing a configuration or access right.
It's running already on staging but I set this up several weeks ago and I don't find any differences between the configurations.
Thanks in advance!
In my case, I made a copy-paste mistake by copying trailing line break into the name of the environment variable ENABLE_ENVOY_XRAY_TRACING which wasn't visible in the overview and only inside the text field.

JMeter SocketException when uploading 1GB file

JMeter 5.4.1
OpenJDK 15.0.1
My test server is configured by default to allow a max 1073741824 byte file to be uploaded, a limit that is configurable. My goal is to validate that the configured limit is respected.
When I configure it for 1048576 bytes and exceed that limit with my upload, the server sends the response:
"{"type":"https://tools.ietf.org/html/rfc7231#section-6.5.1","title":"One or more validation errors occurred.","status":400,"traceId":"|bd37f36b-4cd68e1b8b433669.","errors":{"":["Failed to read the request form. Multipart body length limit 1048576 exceeded."]}}"
When I configure it for 1073741824 bytes and exceed that limit with my upload, JMeter reports the following error:
java.net.SocketException: Connection reset by peer
at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1051)
at java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:342)
at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1277)
at org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:124)
at org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:136)
at org.apache.http.impl.io.SessionOutputBufferImpl.write(SessionOutputBufferImpl.java:167)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:113)
at org.apache.http.entity.mime.content.FileBody.writeTo(FileBody.java:121)
at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl$ViewableFileBody.writeTo(HTTPHC4Impl.java:1513)
at org.apache.http.entity.mime.AbstractMultipartForm.doWriteTo(AbstractMultipartForm.java:134)
at org.apache.http.entity.mime.AbstractMultipartForm.writeTo(AbstractMultipartForm.java:157)
at org.apache.http.entity.mime.MultipartFormEntity.writeTo(MultipartFormEntity.java:113)
at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:152)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl$2.doSendRequest(HTTPHC4Impl.java:458)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.executeRequest(HTTPHC4Impl.java:935)
at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.sample(HTTPHC4Impl.java:646)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerProxy.sample(HTTPSamplerProxy.java:66)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1296)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1285)
at org.apache.jmeter.threads.JMeterThread.doSampling(JMeterThread.java:638)
at org.apache.jmeter.threads.JMeterThread.executeSamplePackage(JMeterThread.java:558)
at org.apache.jmeter.threads.JMeterThread.processSampler(JMeterThread.java:489)
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:256)
at java.base/java.lang.Thread.run(Thread.java:832)
My JMeter.bat file has the heap set as follows:
set HEAP=-Xms1g -Xmx4g -XX:MaxMetaspaceSize=256m
This looks to be similar to a post from last Oct, but there was no solution suggested/reported.
SocketException after sending huge request via JMeter
Connection reset by peer means that your server has reset the connection so you should rather look for the clue in your server logs.
The only thing I suggest to do on JMeter side is to consider switching to HTTP Raw Request sampler, it has nice feature of streaming the file directly to the server without loading it to memory first, I think you will find it extremely helpful when it comes to load testing with more than one virtual user. See HTTP Raw Request for SOAP + MTOM post on JMeter Plugins support forum for more details.

Opscenter backup failure - S3 500 InternalError

The backup service using Opscenter is failing with below AWS S3 error - 500 InternalError. It is failing on 3-4 nodes for different files with this error code. AWS S3 documentation recommends to retry the operation (We encountered an internal error. Please try again). I can see this file name and error only once in agent log and S3 log, which means there was no retry done after this error. Is there any way to enable retry for S3 500 error code (InternalError) in opscenter? Any suggestion on how to fix this error?
Error while sending opsagent.backups.mytable.SSTable#1d6cb0c5 to
org.jclouds.http.HttpResponseException: request: HEAD https://my-backups.s3.amazonaws.com/snapshots/a3b72e7b-bd70-4f9e-aa2a-
cf6c2a5ff336/sstables/1458057843-my_ks-mytable-ka-57417-Index.db.gz HTTP/1.1 failed with response: HTTP/1.1 500 Internal Server Error
at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:63)
at org.jclouds.http.handlers.DelegatingErrorHandler.handleError(DelegatingErrorHandler.java:67)
at org.jclouds.http.internal.BaseHttpCommandExecutorService.shouldContinue(BaseHttpCommandExecutorService.java:135)
at org.jclouds.http.internal.BaseHttpCommandExecutorService.invoke(BaseHttpCommandExecutorService.java:105)
at org.jclouds.rest.internal.InvokeSyncToAsyncHttpMethod.invoke(InvokeSyncToAsyncHttpMethod.java:128)
at org.jclouds.rest.internal.InvokeSyncToAsyncHttpMethod.apply(InvokeSyncToAsyncHttpMethod.java:94)
at org.jclouds.rest.internal.InvokeSyncToAsyncHttpMethod.apply(InvokeSyncToAsyncHttpMethod.java:55)
at org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:156)
at org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123)
at com.sun.proxy.$Proxy51.objectExists(Unknown Source)
at org.jclouds.s3.blobstore.S3BlobStore.blobExists(S3BlobStore.java:175)
at org.jclouds.blobstore2$blob_exists_QMARK_.invoke(blobstore2.clj:238)
at opsagent.backups.destinations$create_blob.invoke(destinations.clj:48)
at opsagent.backups.destinations$fn__12755.invoke(destinations.clj:185)
at opsagent.backups.destinations$fn__12385$G__12378__12396.invoke(destinations.clj:25)
at opsagent.backups.staging$start_staging_BANG_$fn__12925$state_machine__5264__auto____12926$fn__12931$fn__12962.invoke(staging.clj:61)
at opsagent.backups.staging$start_staging_BANG_$fn__12925$state_machine__5264__auto____12926$fn__12931.invoke(staging.clj:59)
at opsagent.backups.staging$start_staging_BANG_$fn__12925$state_machine__5264__auto____12926.invoke(staging.clj:56)
at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:940)
at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:944)
at clojure.core.async.impl.ioc_macros$take_BANG_$fn__5280.invoke(ioc_macros.clj:953)
at clojure.core.async.impl.channels.ManyToManyChannel$fn__1785.invoke(channels.clj:102)
at clojure.lang.AFn.run(AFn.java:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
1
Details:
OpsCenter 5.2.4
DSE 4.8.2
data size per node ~130GB x 3 nodes x 3 dc
Compression and S3 server side encryption enabled
JVM_OPTS="$JVM_OPTS -Xmx512M -Djclouds.mpu.parts.magnitude=100000 -Djclouds.mpu.parts.size=32000000"
There was an issue in OpsCenter 5.2.4 that placed the blob_exists code outside of our retry loop. This has been fixed in OpsCenter 6.0.1.

JDBC client to Hive - No data or no sasl data in the stream Exception

We have a Kerberised cluster and I'm trying to run a Java action in Oozie where I make a JDBC connection to Hive. This JDBC connections works fine on the Sandbox without Kerberos.
The connection string is as simple as the following, where I'm providing username and password in it:
Connection con = DriverManager.getConnection("jdbc:hive2://W12345:10000/control;principal=hive/W12345.companynet.net#COMPANYNET.NET","user123","passw123");
The Oozie action (strangely) completes succesfully, and the Java action log does not present any error:
1742 [main] INFO org.apache.hive.jdbc.Utils - Supplied authorities: W12345:10000
1742 [main] INFO org.apache.hive.jdbc.Utils - Resolved authority: W12345:10000
1766 [main] INFO org.apache.hive.jdbc.HiveConnection - Will try to open client transport with JDBC Uri: jdbc:hive2://W12345:10000/control;principal=hive/W12345.companynet.net#COMPANYNET.NET
<<< Invocation of Main class completed <<<
Oozie Launcher ends
1785 [main] INFO org.apache.hadoop.mapred.Task - Task:attempt_1464245290012_0129_m_000000_0 is done. And is in the process of committing
1847 [main] INFO org.apache.hadoop.mapred.Task - Task attempt_1464245290012_0129_m_000000_0 is allowed to commit now
1854 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_1464245290012_0129_m_000000_0' to hdfs://danskehadoop/user/user123/oozie-oozi/0000013-160527101253015-oozie-oozi-W/JavaAction--java/output/_temporary/1/task_1464245290012_0129_m_000000
1909 [main] INFO org.apache.hadoop.mapred.Task - Task 'attempt_1464245290012_0129_m_000000_0' done.
But in reality the Java main does not complete correctly the execution (and does not execute the needed queries) because the JDBC connection fails with an exception that I can see only in the Hive log:
ERROR [HiveServer2-Handler-Pool: Thread-78363]: server.TThreadPoolServer (TThreadPoolServer.java:run(296)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:328)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 10 more
I'm actually connected to the cluster, and already done further kinit on my username.
Does anybody know what could the cause of this exception be?
Thanks in advance for the help!
Antonio
This happened to me on MapR hadoop distribution platform.
In my case it was Keepalived checking Hive port every 5 seconds and producing such error. I simply used "nc" command to check if Hive port is in use and did not use any authentication method. Later I switched to "maprcli" command which uses SASL authentication and the error was gone.

Datanode error : NameSystem.getDatanode

Help please Folks
I am trying to set up my Hadoop multinode env (1 master, 1 secondary and 3 slaves - hadoop 2.7.1/Ubuntu 14 on AWS) and i am getting "NameSystem.getDatanode" ERROR message. I browsed and read and tried but reach my limits. Could you point me at least in some direction
Logs (extract) from master - xxx-141/142/143 are the ip of the slaves
'''''''''''''''''''''''''''''''''''
Line 134: 2016-01-23 17:36:19,432 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.143:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 172.31.22.141:50010 is expected to serve this storage.
Line 135: 2016-01-23 17:36:19,457 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.142:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 172.31.22.141:50010 is expected to serve this storage.
Line 159: 2016-01-23 17:36:20,988 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.141:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node XXX.XX.XX.143:50010 is expected to serve this storage.
Extract From SLAVE2 SERVER logs
'''''''''''''''''''''''''''''''''''''
2016-01-23 17:36:14,812 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
`2016-01-23 17:36:18,607 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x3c90bbfe60c, containing 1 storage report(s), of which we sent 0. The reports had 1 total blocks and used 0 RPC(s). This took 4 msec to generate and 144 msecs for RPC and NN processing. Got back no commands.
2016-01-23 17:36:18,608 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d) service to master/MAST.XX.XX.169:9000 is shutting down
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(1XX.XX.XX.142:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 1XX.XX.XX.141:50010 is expected to serve this storage.
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:495)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1791)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1315)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:163)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28543)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy13.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:199)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:463)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:688)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
at java.lang.Thread.run(Thread.java:745)
2016-01-23 17:36:18,610 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d) service to master/MAST.XX.XX.169:9000
2016-01-23 17:36:18,611 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d)
2016-01-23 17:36:18,611 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-1050309752-MAST.XX.XX.169-1453113991010
2016-01-23 17:36:20,611 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2016-01-23 17:36:20,613 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2016-01-23 17:36:20,614 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at ip-SLAV-XX-XX-142/1XX.XX.XX.142
************************************************************/`
it looks like you have three slaves
172.31.22.141:50010
172.31.22.142:50010
172.31.22.143:50010
and you created two of them from a clone of the first slave, after the slave was already included in the cluster.
The two clones now already have a copy of the DFS and use the same storage ID as the first slave. Only one slave with the same ID is expected by the name server. It is trying to tell you this by logging:
[...] is attempting to report storage ID [...].
Node [...]:50010 is expected to serve this storage.
You can try removing the dfs directory on two of the slaves, then restarting them.
i.e. stop the slaves, do a rm -rf on the dfs directory like:
rm -rf /tmp/hadoop-hadoop/dfs/
You can then restart and check that all slaves are connecting and test file replication, e.g. by setting the replication level to 4 for some files like:
hdfs dfs -setrep -w 4 -R /user/somedir
The -w option causes the command to wait until replication has succeeded.