Failure while running hive query - Map operator initialization failed - OOM - hive

I am executing hive query which is created by joining 3 tables.But I am geting below error i.e.Please let me know what this error is and how can I resolve this error.
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1535088835132_0435_1_02, diagnostics=[Task failed, taskId=task_1535088835132_0435_1_02_000028, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:262)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:379)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:247)
... 15 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:387)
... 20 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.io.Text.setCapacity(Text.java:268)
at org.apache.hadoop.io.Text.set(Text.java:224)
at org.apache.hadoop.io.Text.set(Text.java:214)
at org.apache.hadoop.io.Text.<init>(Text.java:93)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.copyObject(WritableStringObjectInspector.java:36)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:311)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:346)
at org.apache.hadoop.hive.ql.exec.persistence.MapJoinKeyObject.read(MapJoinKeyObject.java:112)
at org.apache.hadoop.hive.ql.exec.persistence.MapJoinKey.read(MapJoinKey.java:68)
at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.putRow(HashMapWrapper.java:127)
at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:211)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:310)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:179)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:175)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:75)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:92)
... 4 more
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:262)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:379)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:247)
... 15 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:387)
... 20 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.hive.serde2.io.HiveDecimalWritable.getHiveDecimal(HiveDecimalWritable.java:95)
at org.apache.hadoop.hive.serde2.io.HiveDecimalWritable.hashCode(HiveDecimalWritable.java:167)
at java.util.Arrays.hashCode(Arrays.java:4146)
at org.apache.hadoop.hive.ql.exec.persistence.MapJoinKeyObject.hashCode(MapJoinKeyObject.java:83)
at java.util.HashMap.hash(HashMap.java:339)
at java.util.HashMap.put(HashMap.java:612)
at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.put(HashMapWrapper.java:107)
at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.putRow(HashMapWrapper.java:131)
at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:211)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:310)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:179)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:175)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:75)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:92)
... 4 more
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:262)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:379)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:247)
... 15 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:387)
... 20 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.math.BigDecimal.valueOf(BigDecimal.java:1202)
at java.math.BigDecimal.createAndStripZerosToMatchScale(BigDecimal.java:4401)
at java.math.BigDecimal.stripTrailingZeros(BigDecimal.java:2598)
at org.apache.hadoop.hive.common.type.HiveDecimal.trim(HiveDecimal.java:238)
at org.apache.hadoop.hive.common.type.HiveDecimal.normalize(HiveDecimal.java:252)
at org.apache.hadoop.hive.common.type.HiveDecimal.create(HiveDecimal.java:71)
at org.apache.hadoop.hive.serde2.io.HiveDecimalWritable.getHiveDecimal(HiveDecimalWritable.java:95)
at org.apache.hadoop.hive.serde2.io.HiveDecimalWritable.hashCode(HiveDecimalWritable.java:167)
at java.util.Arrays.hashCode(Arrays.java:4146)
at org.apache.hadoop.hive.ql.exec.persistence.MapJoinKeyObject.hashCode(MapJoinKeyObject.java:83)
at java.util.HashMap.hash(HashMap.java:339)
at java.util.HashMap.put(HashMap.java:612)
at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.put(HashMapWrapper.java:107)
at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.putRow(HashMapWrapper.java:131)
at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:211)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:310)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:179)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:175)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:75)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:92)
... 4 more
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:262)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:379)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:247)
... 15 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:387)
... 20 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.hive.common.type.HiveDecimal.create(HiveDecimal.java:71)
at org.apache.hadoop.hive.serde2.io.HiveDecimalWritable.getHiveDecimal(HiveDecimalWritable.java:95)
at org.apache.hadoop.hive.serde2.io.HiveDecimalWritable.hashCode(HiveDecimalWritable.java:167)
at java.util.Arrays.hashCode(Arrays.java:4146)
at org.apache.hadoop.hive.ql.exec.persistence.MapJoinKeyObject.hashCode(MapJoinKeyObject.java:83)
at java.util.HashMap.hash(HashMap.java:339)
at java.util.HashMap.put(HashMap.java:612)
at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.put(HashMapWrapper.java:107)
at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.putRow(HashMapWrapper.java:131)
at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:211)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:310)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:179)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:175)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:75)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:92)
... 4 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:59, Vertex vertex_1535088835132_0435_1_02 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1535088835132_0435_1_03, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:51, Vertex vertex_1535088835132_0435_1_03 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
(less...)

According to the log provided, you are running map join on Tez and it failed on loading hashMap. And the exception is java.lang.OutOfMemoryError: GC overhead limit exceeded.
Try to increase mapper container size and java heap size. Check your current configuration and increase accordingly. This is only example:
set hive.tez.container.size=9216;
set hive.tez.java.opts=-Xmx6144m;
Good tuning guide with explanation about these configuration parameters can be found here: Demystify Apache Tez Memory Tuning - Step by Step
If nothing helps, maybe the table is too big to fit in memory, consider decreasing the value of hive.auto.convert.join.noconditionaltask.size to force the query to use a Shuffle Join for the table of this big size.

Related

Idea debug Error: Cannot initialize Kotlin context: org/jetbrains/kotlin/platform/jvm/JvmPlatforms

idea debug then the error is happen ...............................
is it possible any plugins auto update ?
recently idea runs well suddently this problem happens
Error:Kotlin: Error: Cannot initialize Kotlin context: org/jetbrains/kotlin/platform/jvm/JvmPlatforms
java.lang.Error: Cannot initialize Kotlin context: org/jetbrains/kotlin/platform/jvm/JvmPlatforms
at org.jetbrains.kotlin.jps.build.KotlinBuilder.ensureKotlinContextInitialized(KotlinBuilder.kt:124)
at org.jetbrains.kotlin.jps.build.KotlinBuilder.chunkBuildStarted(KotlinBuilder.kt:176)
at org.jetbrains.jps.incremental.IncProjectBuilder.runModuleLevelBuilders(IncProjectBuilder.java:1274)
at org.jetbrains.jps.incremental.IncProjectBuilder.runBuildersForChunk(IncProjectBuilder.java:1006)
at org.jetbrains.jps.incremental.IncProjectBuilder.buildTargetsChunk(IncProjectBuilder.java:1073)
at org.jetbrains.jps.incremental.IncProjectBuilder.buildChunkIfAffected(IncProjectBuilder.java:967)
at org.jetbrains.jps.incremental.IncProjectBuilder.buildChunks(IncProjectBuilder.java:796)
at org.jetbrains.jps.incremental.IncProjectBuilder.runBuild(IncProjectBuilder.java:378)
at org.jetbrains.jps.incremental.IncProjectBuilder.build(IncProjectBuilder.java:178)
at org.jetbrains.jps.cmdline.BuildRunner.runBuild(BuildRunner.java:140)
at org.jetbrains.jps.cmdline.BuildSession.runBuild(BuildSession.java:297)
at org.jetbrains.jps.cmdline.BuildSession.run(BuildSession.java:130)
at org.jetbrains.jps.cmdline.BuildMain$MyMessageHandler.lambda$channelRead0$0(BuildMain.java:232)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/jetbrains/kotlin/platform/jvm/JvmPlatforms
at org.jetbrains.kotlin.platform.DefaultIdeTargetPlatformKindProvider$Companion.getDefaultPlatform(DefaultIdeTargetPlatformKindProvider.kt:21)
at org.jetbrains.kotlin.jps.targets.KotlinTargetsIndexBuilder.detectTargetPlatform(KotlinTargetsIndex.kt:156)
at org.jetbrains.kotlin.jps.targets.KotlinTargetsIndexBuilder.access$detectTargetPlatform(KotlinTargetsIndex.kt:31)
at org.jetbrains.kotlin.jps.targets.KotlinTargetsIndexBuilder$ensureLoaded$1.apply(KotlinTargetsIndex.kt:138)
at org.jetbrains.kotlin.jps.targets.KotlinTargetsIndexBuilder$ensureLoaded$1.apply(KotlinTargetsIndex.kt:31)
at java.util.HashMap.computeIfAbsent(HashMap.java:1126)
at org.jetbrains.kotlin.jps.targets.KotlinTargetsIndexBuilder.ensureLoaded(KotlinTargetsIndex.kt:137)
at org.jetbrains.kotlin.jps.targets.KotlinTargetsIndexBuilder.build(KotlinTargetsIndex.kt:45)
at org.jetbrains.kotlin.jps.build.KotlinCompileContext.<init>(KotlinCompileContext.kt:60)
at org.jetbrains.kotlin.jps.build.KotlinBuilder.initializeKotlinContext(KotlinBuilder.kt:134)
at org.jetbrains.kotlin.jps.build.KotlinBuilder.ensureKotlinContextInitialized(KotlinBuilder.kt:122)
... 15 more
Caused by: java.lang.ClassNotFoundException: org.jetbrains.kotlin.platform.jvm.JvmPlatforms
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 26 more

Error while reading data in Hive using Dbeaver

I am trying to select a data using dbeaver with the following query:
Select * from tablename where datecol='';
Above query works fine with major dates, however for one date it is throwing below error:
SQL Error [2] [08S01]: Error while processing statement: FAILED:
Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
vertexName=Map 2, vertexId=vertex_1663507669262_19026_3_00,
diagnostics=[Task failed, taskId=task_1663507669262_19026_3_00_000009,
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running
task ( failure ) :
attempt_1663507669262_19026_3_00_000009_0:java.lang.RuntimeException:
java.lang.RuntimeException: java.io.IOException:
com.google.protobuf.InvalidProtocolBufferException: Protocol message
tag had invalid wire type.
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:422) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

Using Pandas UDF with Large Broadcast object

I am trying to use Pandas UDF GROUPEDMAP to do some processing. The code is below:
from pyspark.sql.functions import pandas_udf, PandasUDFType, spark_partition_id
models_broadcast = self.spark_session.sparkContext.broadcast(models)
#pandas_udf("id string, score string",
PandasUDFType.GROUPED_MAP)
def _segment_partition_score(segment_partition_pd):
my_models = models_broadcast.value # comment this line, the code ran run through
segment_partition_pd["score"] = "aaa"
segment_score_pd = segment_partition_pd[['id', 'score']]
return segment_score_pd
model_score = my_data_df.withColumn("pid", spark_partition_id()).groupby('pid').apply(_segment_partition_score)
model_score.show(100)
Here is the error message. It simply states " java.net.SocketException: Connection reset"
y4JJavaError: An error occurred while calling o1525.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 31.0 failed 4 times, most recent failure: Lost task 2.3 in stage 31.0 (TID 2466, 10.139.64.43, executor 10): java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:181)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:144)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:494)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:62)
at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:159)
at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:158)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
at org.apache.spark.scheduler.Task.run(Task.scala:113)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2362)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2350)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2349)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2349)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1102)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1102)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1102)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2582)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2529)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2517)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:897)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2280)
at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:270)
at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:280)
at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:80)
at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:86)
at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:508)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollectResult(limit.scala:57)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectResult(Dataset.scala:2905)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3517)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2634)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2634)
at org.apache.spark.sql.Dataset$$anonfun$54.apply(Dataset.scala:3501)
at org.apache.spark.sql.Dataset$$anonfun$54.apply(Dataset.scala:3496)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1$$anonfun$apply$1.apply(SQLExecution.scala:112)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:217)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:98)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:835)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:74)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:169)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3496)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2634)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2848)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:279)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:316)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:181)
at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:144)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:494)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:62)
at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:159)
at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:158)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
at org.apache.spark.scheduler.Task.run(Task.scala:113)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
The broadcast object is a pickle object which is around ~100 MB. It is a trained ski-learn machine learning model.
But you may find, in the UDF _segment_partition_score, I actually didn't use this broadcast object. I just want to get it at the UDF, for debugging purpose. Even this, the pyspark program will crash.
If I don't receive this broadcast object in the Pandas UDF, the program can run through, and generate result.
The dataframe "my_data_df" is very small (700 MB in Parquet) . I am sure given the existing cluster size (64GB) * 20 workers, it should be no problem to deal with it.
I highly suspect that it is the problem of serialization for the broadcast object, either size or type of serializer. But I have no idea what should I tune.
Could anyone point me out?
Thanks in advance.

HIVE NIFI ORC java.lang.IllegalArgumentException: Column has wrong number of index entries found: 0 expected: 1

I have a pipeline in NIFI and everything works fine except sometimes(very rarely) some record go to the retry queue in PUTHIVESTREAMING processor when trying to write to HIVE.
The nifi logs show the exception
ERROR [put-hive-streaming-0] o.a.h.h.streaming.AbstractRecordWriter Unable to close org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater[hdfs://hdfscluster/user/hive/metastore/bi_sureyield_db.db/bi_sureyield_events/year=2020/month=1/day=10/delta_127318799_127318808/bucket_00000] due to: Column has wrong number of index entries found: 0 expected: 1
java.lang.IllegalArgumentException: Column has wrong number of index entries found: 0 expected: 1
at org.apache.orc.impl.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:695)
at org.apache.orc.impl.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:2147)
at org.apache.orc.impl.WriterImpl.flushStripe(WriterImpl.java:2661)
at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2834)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:321)
at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.close(OrcRecordUpdater.java:502)
at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.closeBatch(AbstractRecordWriter.java:218)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl$6.run(HiveEndPoint.java:998)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl$6.run(HiveEndPoint.java:995)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.closeImpl(HiveEndPoint.java:994)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.markDead(HiveEndPoint.java:760)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.commit(HiveEndPoint.java:854)
at org.apache.nifi.util.hive.HiveWriter$4.call(HiveWriter.java:237)
at org.apache.nifi.util.hive.HiveWriter$4.call(HiveWriter.java:234)
at org.apache.nifi.util.hive.HiveWriter.lambda$null$3(HiveWriter.java:373)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.nifi.util.hive.HiveWriter.lambda$callWithTimeout$4(HiveWriter.java:373)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-04-29 14:55:39,542 ERROR [put-hive-streaming-0] o.a.h.h.streaming.AbstractRecordWriter Unable to close org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater[hdfs://hdfscluster/user/hive/metastore/bi_sureyield_db.db/bi_sureyield_events/year=2020/month=1/day=10/delta_127318799_127318808/bucket_00001] due to: null
java.nio.channels.ClosedChannelException: null
at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1521)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:104)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.orc.impl.PhysicalFsWriter$BufferedStream.spillToDiskAndClear(PhysicalFsWriter.java:286)
at org.apache.orc.impl.PhysicalFsWriter.finalizeStripe(PhysicalFsWriter.java:337)
at org.apache.orc.impl.WriterImpl.flushStripe(WriterImpl.java:2665)
at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2834)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:321)
at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.close(OrcRecordUpdater.java:502)
at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.closeBatch(AbstractRecordWriter.java:218)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl$6.run(HiveEndPoint.java:998)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl$6.run(HiveEndPoint.java:995)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.closeImpl(HiveEndPoint.java:994)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.markDead(HiveEndPoint.java:760)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.commit(HiveEndPoint.java:854)
at org.apache.nifi.util.hive.HiveWriter$4.call(HiveWriter.java:237)
at org.apache.nifi.util.hive.HiveWriter$4.call(HiveWriter.java:234)
at org.apache.nifi.util.hive.HiveWriter.lambda$null$3(HiveWriter.java:373)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.nifi.util.hive.HiveWriter.lambda$callWithTimeout$4(HiveWriter.java:373)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I'm unable to understand where the issue lies and how to fix it.

Hive error : java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

I have hive query which run successful sometimes but maximum time gives an error "java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
Below is my error log
java.lang.RuntimeException: java.io.IOException: Couldn't create proxy
provider class org.apache.hadoop.hdfs.server.namenode.ha.Con\
figuredFailoverProxyProvider at
org.apache.hadoop.mapred.lib.CombineFileInputFormat.isSplitable(CombineFileInputFormat.java:154)
at
org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:283)
at
org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:239)
at
org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:75)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:336)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:302)
at
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:435)
at
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:525)
at
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:517)
at
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:399)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295) at
org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:415) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292) at
org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564) at
org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:415) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
at
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
at
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1516) at
org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1283) at
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1101) at
org.apache.hadoop.hive.ql.Driver.run(Driver.java:924) at
org.apache.hadoop.hive.ql.Driver.run(Driver.java:914) at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367)
at
org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:464)
at
org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:474)
at
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694) at
org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606) at
org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by:
java.io.IOException: Couldn't create proxy provider class
org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverPr\
oxyProvider at
org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:475)
at
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:148)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:632) at
org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:570) at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:147)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367) at
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169) at
org.apache.hadoop.mapred.lib.CombineFileInputFormat.isSplitable(CombineFileInputFormat.java:151)
... 45 more Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown
Source) at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:458)
... 53 more Caused by: java.lang.OutOfMemoryError: GC overhead limit
exceeded at java.util.Arrays.copyOf(Arrays.java:2219) at
java.util.ArrayList.grow(ArrayList.java:242) at
java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216) at
java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208) at
java.util.ArrayList.add(ArrayList.java:440) at
java.lang.String.split(String.java:2288) at
sun.net.util.IPAddressUtil.textToNumericFormatV4(IPAddressUtil.java:47)
at java.net.InetAddress.getAllByName(InetAddress.java:1129) at
java.net.InetAddress.getAllByName(InetAddress.java:1098) at
java.net.InetAddress.getByName(InetAddress.java:1048) at
org.apache.hadoop.security.SecurityUtil$StandardHostResolver.getByName(SecurityUtil.java:474)
at
org.apache.hadoop.security.SecurityUtil.getByName(SecurityUtil.java:461)
at
org.apache.hadoop.net.NetUtils.createSocketAddrForHost(NetUtils.java:235)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:215)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152)
at
org.apache.hadoop.hdfs.DFSUtil.getAddressesForNameserviceId(DFSUtil.java:677)
at
org.apache.hadoop.hdfs.DFSUtil.getAddressesForNsIds(DFSUtil.java:645)
at org.apache.hadoop.hdfs.DFSUtil.getAddresses(DFSUtil.java:628) at
org.apache.hadoop.hdfs.DFSUtil.getHaNnRpcAddresses(DFSUtil.java:727)
at
org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.(ConfiguredFailoverProxyProvider.java:88)
at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown
Source) at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:458)
at
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:148)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:632) at
org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:570) at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:147)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367) at
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169) Job
Submission failed with exception
'java.lang.RuntimeException(java.io.IOException: Couldn't create proxy
provider class org.apac\
he.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider)'
Could anyone tell me why this happen?
I just stumbled across a similar exception myself, and increasing the hive client heap didn't help. I found I was able to clear up the OutOfMemory GC Overhead exception by adding a partition column to the where clause of the query, so I've concluded that having a very large number of splits is causing this exception. I haven't dug into the code, but I believe I've seen this happen with string concatenation in a loop triggering gc thrashing, and something similar might be happening in the CombineHiveInputFormat.getSplits method.