Q: How can I use a data flow with ActiveMQ in Esper? - activemq

I am learning Esper and want to learn more about data flows.
I already use ActiveMQ, so I want to connect to ActiveMQ using a data flow and EsperIO.
I know EsperIO supports AMQP and ActiveMQ speaks AMQP, so I hoped this would work.
However, my code throws an error and cannot find ActiveMQ. How can I connect ActiveMQ and EsperIO via the data flow EPL clause?
Below is the error text:
Exception in thread "Thread-1" java.lang.NoClassDefFoundError: Lcom/rabbitmq/client/Connection;
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Unknown Source)
at java.lang.Class.getDeclaredFields(Unknown Source)
at com.espertech.esper.util.JavaClassHelper.findFieldInternal(JavaClassHelper.java:1799)
at com.espertech.esper.util.JavaClassHelper.findAnnotatedFields(JavaClassHelper.java:1784)
at com.espertech.esper.dataflow.core.DataFlowServiceImpl.injectObjectProperties(DataFlowServiceImpl.java:338)
at com.espertech.esper.dataflow.core.DataFlowServiceImpl.instantiateOperator(DataFlowServiceImpl.java:320)
at com.espertech.esper.dataflow.core.DataFlowServiceImpl.instantiateOperators(DataFlowServiceImpl.java:294)
at com.espertech.esper.dataflow.core.DataFlowServiceImpl.instantiateInternal(DataFlowServiceImpl.java:226)
at com.espertech.esper.dataflow.core.DataFlowServiceImpl.instantiate(DataFlowServiceImpl.java:126)
at com.espertech.esper.dataflow.core.DataFlowServiceImpl.instantiate(DataFlowServiceImpl.java:113)
at Esper.EventReciveThread.run(EventReciveThread.java:40)
at java.lang.Thread.run(Unknown Source)
And below is my EPL code:
"create dataflow writeAMQP EventBusSource -> outstream<Event>{} AMQPSink(outstream){host:'localhost', queueName: 'testQueue', collector:{class:'ObjectToAMQPCollectorSerializable'}}

Related

HQL aggregate execution gets 'Unable to load credentials from service endpoint' error

I use MinIO as the Hive storage system, and there is no problem when I execute simple queries like 'select * from table'.
But when I execute an aggregate query such as 'select max(age) from student', I get this error:
java.nio.file.AccessDeniedException: hive: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:187)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:265)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:261)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:236)
at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:375)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:311)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2610)
at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2606)
at org.apache.hadoop.hive.ql.exec.Utilities$GetInputPathsCallable.call(Utilities.java:3432)
at org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:3370)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:359)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:149)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:159)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1166)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:762)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:724)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4368)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4315)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1344)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1284)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$1(S3AFileSystem.java:376)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
... 39 more
Should I add some configuration to my file system settings?
Yes, we faced the same issue. You can open a tunnel from your local machine to the Amazon instance to check the access.
An article on Medium describes a solution based on a custom class:
https://medium.com/expedia-group-tech/service-slow-to-retrieve-aws-credentials-ebc02a38e95b
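For reference, when the underlying cause is that the MapReduce tasks launched by the aggregate query cannot see any S3A credentials, the usual remedy is to provide the MinIO endpoint and keys explicitly in the Hadoop configuration (typically core-site.xml, so every task JVM inherits them). The sketch below only lists the properties typically involved; all values are placeholders, not a verified configuration.

import org.apache.hadoop.conf.Configuration;

public class S3AConfigSketch {
    // Builds a Configuration with the S3A settings MinIO-backed Hive jobs usually need.
    public static Configuration minioConf() {
        Configuration conf = new Configuration();
        conf.set("fs.s3a.endpoint", "http://minio-host:9000");   // placeholder MinIO endpoint
        conf.set("fs.s3a.access.key", "MINIO_ACCESS_KEY");       // placeholder
        conf.set("fs.s3a.secret.key", "MINIO_SECRET_KEY");       // placeholder
        conf.set("fs.s3a.path.style.access", "true");            // MinIO generally requires path-style access
        conf.set("fs.s3a.connection.ssl.enabled", "false");      // depends on the MinIO deployment
        conf.set("fs.s3a.aws.credentials.provider",
                 "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider");
        return conf;
    }
}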

Spark job gets stuck indefinitely while trying to fetch data from ignite cluster

// Helper that lazily creates one Ignite thin client per thread.
// 'cfg' (an optional, externally supplied ClientConfiguration) and 'logger' are fields defined elsewhere in this class.
private static final ThreadLocal<IgniteClient> igniteClientContext = new ThreadLocal<>();

public static IgniteClient getIgniteClient(String[] address) {
    if (igniteClientContext.get() == null) {
        ClientConfiguration clientConfig;
        if (cfg == null) {
            // No pre-built configuration: connect to the given server addresses.
            clientConfig = new ClientConfiguration().setAddresses(address);
        } else {
            clientConfig = cfg;
        }
        IgniteClient igniteClient = Ignition.startClient(clientConfig);
        logger.info("igniteClient initialized");
        igniteClientContext.set(igniteClient);
    }
    return igniteClientContext.get();
}
From my Spark code, I am trying to create an instance of the Ignite thin client and create a cache object.
val address = config.igniteServers.split(",") // config.igniteServers ="10.xx.xxx.xxx:10800,10.xx.xx.xxx:10800"
The code below is called from the Spark executors. We process a set of records in each executor, and we only read from the cache to compare against the record currently being processed: if the record is already present in the cache we ignore it, otherwise we consume it (a rough sketch of this check follows the cache setup below).
val cacheCfg = new ClientCacheConfiguration()
.setName(PNR_CACHE)
.setCacheMode(CacheMode.REPLICATED)
.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC)
.setDefaultLockTimeout(30000)
val igniteClient = IgniteHelper.getIgniteClient(address)
val cache : ClientCache[Long, Boolean] = igniteClient.getOrCreateCache(cacheCfg);
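As an illustration of the check described above, written in Java alongside the thin-client helper (the key here is a hypothetical record id, not the actual job code):

// Hedged sketch of the check-then-consume logic; 'cache' is the
// ClientCache<Long, Boolean> created as shown above.
long recordId = 12345L; // hypothetical key of the record being processed
if (cache.containsKey(recordId)) {
    // Already present: ignore this record.
} else {
    // Not seen yet: consume the record. The cache itself is updated with all
    // valid records at the end of the job, as described below.
}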
At the end of the job, we update the cache with all valid records.
This ran fine for a couple of runs, but at some point it started getting stuck indefinitely while trying to read data from the cache.
In the executor logs, I can see an "Ignite cluster is unavailable" exception.
org.apache.ignite.client.ClientConnectionException: Ignite cluster is unavailable [sock=Socket[addr=hdpct2ldap01g02.hadoop.sgdcprod.XXXX.com/10.xx.xx.xx,port=10800,localport=20214]]
at org.apache.ignite.internal.client.thin.TcpClientChannel.handleIOError(TcpClientChannel.java:499)
at org.apache.ignite.internal.client.thin.TcpClientChannel.handleIOError(TcpClientChannel.java:491)
at org.apache.ignite.internal.client.thin.TcpClientChannel.access$100(TcpClientChannel.java:92)
at org.apache.ignite.internal.client.thin.TcpClientChannel$ByteCountingDataInput.read(TcpClientChannel.java:538)
at org.apache.ignite.internal.client.thin.TcpClientChannel$ByteCountingDataInput.readInt(TcpClientChannel.java:572)
at org.apache.ignite.internal.client.thin.TcpClientChannel.processNextResponse(TcpClientChannel.java:272)
at org.apache.ignite.internal.client.thin.TcpClientChannel.receive(TcpClientChannel.java:234)
at org.apache.ignite.internal.client.thin.TcpClientChannel.service(TcpClientChannel.java:171)
at org.apache.ignite.internal.client.thin.ReliableChannel.service(ReliableChannel.java:160)
at org.apache.ignite.internal.client.thin.ReliableChannel.request(ReliableChannel.java:187)
at org.apache.ignite.internal.client.thin.TcpIgniteClient.getOrCreateCache(TcpIgniteClient.java:124)
at com.XXXX.eda.pnr.PnrApplication$$anonfun$2$$anonfun$apply$4.apply(PnrApplication.scala:305)
at com.XXXX.eda.pnr.PnrApplication$$anonfun$2$$anonfun$apply$4.apply(PnrApplication.scala:297)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:217)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1094)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1085)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1020)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1085)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:811)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:381)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection timed out (Read failed)
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.ignite.internal.client.thin.TcpClientChannel$ByteCountingDataInput.read(TcpClientChannel.java:535)
... 24 more
20/06/21 05:49:42 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 1949
20/06/21 05:49:42 INFO executor.Executor: Running task 103.1 in stage 25.0 (TID 1949)
The thread dump contains the exception below as well.
20/06/21 05:51:57 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.93.133.157:20952 remote=host.hadoop.sgdcprod.XXXX.com/10.93.133.136:1004]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2354)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessage(DataTransferSaslUtil.java:212)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:451)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
at org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:92)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3593)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:849)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:764)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:377)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:666)
at org.apache.hadoop.hdfs.DFSInputStream.seekToBlockSource(DFSInputStream.java:1663)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:877)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:913)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:981)
at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:197)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.avro.mapred.FsInput.read(FsInput.java:54)
at org.apache.avro.file.DataFileReader$SeekableInputStream.read(DataFileReader.java:210)
at org.apache.avro.io.BinaryDecoder$InputStreamByteSource.readRaw(BinaryDecoder.java:824)
at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:349)
at org.apache.avro.io.BinaryDecoder.readFixed(BinaryDecoder.java:302)
at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:293)
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
at com.databricks.spark.avro.DefaultSource$$anonfun$buildReader$1$$anon$1.hasNext(DefaultSource.scala:215)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1094)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1085)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1020)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1085)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:811)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:381)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/06/21 05:51:57 WARN hdfs.DFSClient: Failed to connect to host.hadoop.sgdcprod.XXXX.com/10.93.133.136:1004 for block BP-1009813635-10.93.133.107-1555169940973:blk_1182405155_108738113, add to deadNodes and continue.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.93.133.157:20952 remote=host.hadoop.sgdcprod.XXXX.com/10.93.133.136:1004]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2354)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessage(DataTransferSaslUtil.java:212)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:451)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
at org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:92)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3593)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:849)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:764)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:377)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:666)
at org.apache.hadoop.hdfs.DFSInputStream.seekToBlockSource(DFSInputStream.java:1663)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:877)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:913)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:981)
at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:197)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.avro.mapred.FsInput.read(FsInput.java:54)
at org.apache.avro.file.DataFileReader$SeekableInputStream.read(DataFileReader.java:210)
at org.apache.avro.io.BinaryDecoder$InputStreamByteSource.readRaw(BinaryDecoder.java:824)
at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:349)
at org.apache.avro.io.BinaryDecoder.readFixed(BinaryDecoder.java:302)
at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:293)
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
at com.databricks.spark.avro.DefaultSource$$anonfun$buildReader$1$$anon$1.hasNext(DefaultSource.scala:215)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1094)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1085)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1020)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1085)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:811)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:381)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
We are setting defaultReadTimeout in the spark.properties file, but the timeout is not being applied correctly.
spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseG1GC -Dsun.net.client.defaultReadTimeout:300000 -Dsun .net.client.defaultConnectTimeout=300000 -DIGNITE_REST_START_ON_CLIENT=true
spark.driver.exetraJavaOptions=-Dsun.net.client.defaultReadTimeout:300000 -Dsun.net.client.defaultConnectTimeout=300000 -DIGNITE_REST_START_ON_CLIENT=true
Please help in resolving the issue.
Ignite versions used: 2.8.0 and 2.8.1.
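For what it's worth, the Ignite thin client also has its own socket timeout on ClientConfiguration; if the goal is to bound how long a cache operation can block, it can be set directly in the helper shown earlier. A sketch with an arbitrary value, assuming the setter is available in the Ignite version in use:

// Hedged sketch: give the thin client an explicit socket timeout (milliseconds)
// instead of relying only on JVM-wide properties. The value is an example.
ClientConfiguration clientConfig = new ClientConfiguration()
        .setAddresses(address)
        .setTimeout(300000);
IgniteClient igniteClient = Ignition.startClient(clientConfig);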
There is a connectivity problem from your Spark cluster to both HDFS and the Ignite cluster:
HDFS:
WARN hdfs.DFSClient: Failed to connect to host.hadoop.sgdcprod.XXXX.com/10.93.133.136:1004 for block BP-1009813635-10.93.133.107-1555169940973:blk_1182405155_108738113, add to deadNodes and continue.
And for Ignite:
Caused by: java.net.SocketException: Connection timed out (Read failed)
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
Given that your Spark job was unable to connect to either of them, I suspect a network connectivity problem.
Please check the following for HDFS:
1) Some of your Spark workers may be unable to reach host.hadoop.sgdcprod.XXXX.com/10.93.133.136:1004 because of closed ports or other connectivity issues. You can check this address with a tool like netstat.
2) You may be using an incorrect HDFS port. Please check that the namenode URL used in your Spark job code is correct.
3) Something may be wrong with the configuration of your HDFS datanodes; perhaps some of them cannot connect to the namenode.
And for Ignite:
1) Check that your cluster is alive and not hanging. Presumably it was started outside Spark and can be observed with a monitoring tool.
2) Check that the server addresses from the IP finder can be resolved from every Spark worker (a small reachability probe is sketched below).
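A minimal TCP reachability probe like the one below can be run on a Spark worker node to check whether a given host and port (for example an HDFS datanode on 1004 or an Ignite server on 10800) can be reached from there. This is only an illustrative sketch; the class name and timeout are arbitrary.

import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) throws Exception {
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        try (Socket socket = new Socket()) {
            // Fail fast if the port cannot be reached within 5 seconds.
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("Reachable: " + host + ":" + port);
        }
    }
}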

Why is my RabbitMQ message impossible to serialize using Apache Beam?

I'm trying to read a RabbitMQ queue using Apache Beam.
I've written some transformation code to have messages written to Kafka.
I've even tested my scenario using simple text messages.
Now I am trying to deploy it with the actual messages this transform is meant to process. These are JSON messages of fairly moderate size.
Strangely, when I try to read "production" messages, I get this exception stack trace:
java.lang.IllegalArgumentException: Unable to encode element 'ValueWithRecordId{id=[], value=org.apache.beam.sdk.io.rabbitmq.RabbitMqMessage#f179a7f}' with coder 'ValueWithRecordId$ValueWithRecordIdCoder(org.apache.beam.sdk.coders.SerializableCoder#76190fb2)'.
org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:300)
org.apache.beam.sdk.coders.Coder.registerByteSizeObserver(Coder.java:291)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:564)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:480)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:400)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:125)
org.apache.beam.runners.dataflow.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:64)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:43)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1283)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:147)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1020)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
Caused by: java.io.NotSerializableException: com.rabbitmq.client.impl.LongStringHelper$ByteArrayLongString
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
java.util.HashMap.internalWriteEntries(HashMap.java:1785)
java.util.HashMap.writeObject(HashMap.java:1362)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
org.apache.beam.sdk.coders.SerializableCoder.encode(SerializableCoder.java:183)
org.apache.beam.sdk.coders.SerializableCoder.encode(SerializableCoder.java:53)
org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:105)
org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:99)
org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:81)
org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:297)
org.apache.beam.sdk.coders.Coder.registerByteSizeObserver(Coder.java:291)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:564)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:480)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:400)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:125)
org.apache.beam.runners.dataflow.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:64)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:43)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1283)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:147)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1020)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
My understanding is that the RabbitMQ reader considers the messages big enough to require the use of LongString, which is not serializable.
Am I right on this point? And if so, how do I tell RabbitMQ to use a simple String (which would be enough for these messages)?
This is an Apache Beam issue (https://issues.apache.org/jira/browse/BEAM-7414) for which a solution exists but has not yet been merged into the Apache Beam repo, purely out of laziness on my part (which is bad). If you want the fix immediately, you can build my branch: https://github.com/Riduidel/beam/tree/fix/rabbitmq-message-not-serializable
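For context on what such a fix has to do: the non-serializable values are the com.rabbitmq.client.LongString entries in the message headers (the trace shows LongStringHelper$ByteArrayLongString inside a HashMap), and coercing them to plain Strings makes the headers map Java-serializable. A minimal illustration of that conversion, not the actual Beam patch:

import com.rabbitmq.client.LongString;
import java.util.HashMap;
import java.util.Map;

public class HeaderSanitizer {
    // Returns a copy of the headers with any LongString values replaced by plain Strings.
    public static Map<String, Object> sanitize(Map<String, Object> headers) {
        Map<String, Object> out = new HashMap<>();
        if (headers != null) {
            for (Map.Entry<String, Object> entry : headers.entrySet()) {
                Object value = entry.getValue();
                out.put(entry.getKey(), value instanceof LongString ? value.toString() : value);
            }
        }
        return out;
    }
}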

OWL 2 Reasoners and custom datatypes not working

I am trying to run a reasoner in Protege; there are two reasoners available, FaCT++ and HermiT 1.3.7.
I tried to run both, but a window appears and immediately disappears. It is difficult to see, so I used a screen recorder to capture it, but it does not contain any information.
There is no error message or log message that I can find.
The reasoner does not start.
I tried the option "Export inferred axioms as ontology" and then got the expected error message "No Reasoner initialized".
Please advise.
EDIT 1:
Error log caused by the newly defined custom datatype:
org.semanticweb.owlapi.reasoner.ReasonerInternalException: Unsupported datatype 'http://www.semanticweb.org/q49f318b/ontologies/2014/6/untitled-ontology-6#percentage'
at uk.ac.manchester.cs.factplusplus.FaCTPlusPlus.getBuiltInDataType(Native Method)
at uk.ac.manchester.cs.factplusplus.owlapiv3.FaCTPlusPlusReasoner.toDataTypePointer(Unknown Source)
at uk.ac.manchester.cs.factplusplus.owlapiv3.FaCTPlusPlusReasoner$AxiomTranslator$DeclarationVisitorEx.visit(Unknown Source)
at uk.ac.manchester.cs.factplusplus.owlapiv3.FaCTPlusPlusReasoner$AxiomTranslator$DeclarationVisitorEx.visit(Unknown Source)
at uk.ac.manchester.cs.owl.owlapi.OWLDatatypeImpl.accept(OWLDatatypeImpl.java:338)
at uk.ac.manchester.cs.factplusplus.owlapiv3.FaCTPlusPlusReasoner$AxiomTranslator.visit(Unknown Source)
at uk.ac.manchester.cs.factplusplus.owlapiv3.FaCTPlusPlusReasoner$AxiomTranslator.visit(Unknown Source)
at uk.ac.manchester.cs.owl.owlapi.OWLDeclarationAxiomImpl.accept(OWLDeclarationAxiomImpl.java:128)
at uk.ac.manchester.cs.factplusplus.owlapiv3.FaCTPlusPlusReasoner.loadAxiom(Unknown Source)
at uk.ac.manchester.cs.factplusplus.owlapiv3.FaCTPlusPlusReasoner.loadReasonerAxioms(Unknown Source)
at uk.ac.manchester.cs.factplusplus.owlapiv3.FaCTPlusPlusReasoner.<init>(Unknown Source)
at uk.ac.manchester.cs.factplusplus.owlapiv3.FaCTPlusPlusReasonerFactory.createReasoner(Unknown Source)
at uk.ac.manchester.cs.factplusplus.owlapiv3.FaCTPlusPlusReasonerFactory.createReasoner(Unknown Source)
at org.protege.editor.owl.model.inference.ReasonerUtilities.createReasoner(ReasonerUtilities.java:21)
at org.protege.editor.owl.model.inference.OWLReasonerManagerImpl$ClassificationRunner.ensureRunningReasonerInitialized(OWLReasonerManagerImpl.java:398)
at org.protege.editor.owl.model.inference.OWLReasonerManagerImpl$ClassificationRunner.run(OWLReasonerManagerImpl.java:354)
at java.lang.Thread.run(Thread.java:745)
Ontology in use:
http://pastebin.com/L0heDLBy
The stack trace confirms that FaCT++ does not allow custom datatypes to be specified; this is currently a known limitation. I don't know about HermiT, but there is a good chance the problem is the same.
Pellet used to have more flexible datatype handling; you can try installing it and using it instead of HermiT or FaCT++.

Create backup of HBase data on S3 and then restore it

I have an HBase cluster running on Amazon EC2 nodes, and I want to create a backup of my HBase table. I came across this tool, and I was able to create a backup of the table dummy on S3 using the following command:
java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.full backup.folder=s3://mybucket/ tables=dummy
But when I tried to restore the same data into another table (model), it failed with the following:
13/10/24 10:52:52 WARN mapred.FileOutputCommitter: Output path is null in cleanup
13/10/24 10:52:52 WARN mapred.LocalJobRunner: job_local_0002
java.lang.NullPointerException
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveBlock(Jets3tFileSystemStore.java:209)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy5.retrieveBlock(Unknown Source)
at org.apache.hadoop.fs.s3.S3InputStream.blockSeekTo(S3InputStream.java:160)
at org.apache.hadoop.fs.s3.S3InputStream.read(S3InputStream.java:119)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
13/10/24 10:52:53 INFO mapred.JobClient: Job complete: job_local_0002
13/10/24 10:52:53 INFO mapred.JobClient: Counters: 0
Error in Job completetion Params
tablename inputputdir
model s3://mybucket/Wed_Oct_23_19_45_49_IST_2013/model
Access Failure to s3://mybucket/Wed_Oct_23_19_45_49_IST_2013/model , tries=1
The restore command I used was:
java com.bizosys.oneline.maintenance.HBaseBackup mode=restore backup.folder=s3://mybucket/Wed_Oct_23_19_45_49_IST_2013 tables="model"
FYI, please don't suggest installing HBase and doing the backup on EMR. I know about that option, but for some reason I am not using it.