Configuration values for hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode in HIVE - hive

I am trying to add data to an external table using apache-hive. I am getting the following error in the hive logs
2015-06-15 17:27:44,614 ERROR [LocalJobRunner Map Task Executor #0]: mr.ExecMapper (ExecMapper.java:map(171)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"transactiondate":"05-01-2015 08:26:21","transactiontype":"CASHOUT","transactionid":144590889,"sourcenumber":null,"destnumber":null,"amount":19000,"assumedfield1":880,"customerid":33394093,"transactionstatus":"COMPLETED","assumedfield2":325,"assumedfield3":175870}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 256
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:933)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:709)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
... 10 more
I googled for this error and came across this link which says that we must change the values of hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode variables to higher values. What are the optimum configurations for these variables on a single node hadoop installation? None of these configuration values are working for me. Please help.

set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=250;
Please do not try to increase hive partitions to higher value .
It may cause Namenode crash . If possible try to change the partition column and apply new logic over it

Related

Issue in saving the content of a dataframe to table

I have a data source (hive external tables) which refresh the data in adhoc manner. To avoid any discrepancies in the execution i'm trying to save the data as a table in my location.
Initially, i have loaded the data from data source to a dataframe
source = hqlContext.table("datasourcedb.table1") // this is working fine
Then, trying to save it the my application location -
source.write.mode('overwrite').saveAsTable("appdb.table1") //No read/write operations on appdb.table1 while doing this action
Above actions throwing exceptions:
java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: BLOCK
at org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:146)
at org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:138)
at org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:195)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:153)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)
at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetRelation.scala:101)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.abortTask$1(WriterContainer.scala:294)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:271)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/03/02 04:31:32 ERROR TaskSetManager: Task 9 in stage 1.0 failed 4 times; aborting job
18/03/02 04:31:32 ERROR InsertIntoHadoopFsRelation: Aborting job.
**Note: The size of the source is abot 6GB. Hence, no persist action is planned **

Kylin build is failed at 3rd step

I am new to Kylin,I create kylin model and cube by following url,
http://kylin.apache.org/
initially it is successfull,again i created new cube for the same model,at that time cube build is failed at 3rd step as,
#3 Step Name: Extract Fact Table Distinct Columns
actually i have some duplicated rows,so i deleted those rows in hive and i did sync kylin tables with hive tables.But that is not completing that 3rd step.I gone through the logs,i find the following error,
2016-12-29 11:50:45,421 ERROR [IPC Server handler 18 on 46096] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1482297779079_0128_m_000000_0 - exited : java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.kylin.engine.mr.steps.FactDistinctHiveColumnsMapper.putRowKeyToHLL(FactDistinctHiveColumnsMapper.java:179)
at org.apache.kylin.engine.mr.steps.FactDistinctHiveColumnsMapper.map(FactDistinctHiveColumnsMapper.java:155)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
2016-12-29 11:50:45,421 INFO [IPC Server handler 18 on 46096] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1482297779079_0128_m_000000_0: Error: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.kylin.engine.mr.steps.FactDistinctHiveColumnsMapper.putRowKeyToHLL(FactDistinctHiveColumnsMapper.java:179)
at org.apache.kylin.engine.mr.steps.FactDistinctHiveColumnsMapper.map(FactDistinctHiveColumnsMapper.java:155)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
anybody please share any idea how to solve this one.what is cardinality means in kylin data sources.

Hive with Tez, No input paths specified in job

I have used hadoop-0.20.x.x, hive-0.11.0. I would talk about hive queries: with the specified configuration every thing is good and working fine.
Now, we have upgraded to hadoop-2.6.x (hadoop2)and hive-0.14.x. Also using Apache Tez.
The problem is, hadoop works as is. But hive sql queries doesn't.
The below query works fine in the older version's. But throw errors in the upgraded version's:
QUERY : SELECT abc.property_name, xyz.date, xyz.time, xyz.value_as_number, xyz.value_units FROM dbname.xyz JOIN dbname.abc ON (xyz.id = abc.src_id) WHERE xyz.person_id=138312;
EXCEPTION:
INFO : Session is already open
INFO : Tez session was closed. Reopening...
INFO : Session re-established.
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1435524970199_0035)
INFO : Map 1: -/- Map 2: -/-
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1435524970199_0035_1_00, diagnostics=[Vertex vertex_1435524970199_0035_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: concept initializer failed, vertex=vertex_1435524970199_0035_1_00 [Map 1], java.io.IOException: No input paths specified in job
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:328)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:130)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
]
ERROR : Vertex failed, vertexName=Map 2, vertexId=vertex_1435524970199_0035_1_01, diagnostics=[Vertex vertex_1435524970199_0035_1_01 [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: observation initializer failed, vertex=vertex_1435524970199_0035_1_01 [Map 2], java.io.IOException: No input paths specified in job
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputPaths(HiveInputFormat.java:318)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:328)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:130)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
]
ERROR : DAG failed due to vertex failure. failedVertices:2 killedVertices:0
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=2)
Exception says, No input path specified. Well, i understand and know how to do solve in haodop-mapreduce program. But, how do we do it using hive query. Anyway, i don't think this is the same.
To make out, i have used hive shell and beeline shell, hive returned expected output but, beeline returned the same exception as above.
The beauty of the problem is query on individual table works fine. But, when i try to work on the JOIN, it throws the above exception.
But, i have understood that, there's an impact of Apache Tez on my query. Can some one suggest the solution or pin point tez reference, so i could read and rewrite the query accordingly. Thanks
It worked by disabling apache tez.
Look's like apache tez isn't stable yet.

PriviledgedActionException: Able to populate Hbase via Hive, however, unable to query HBase via Hive

I'm using the current Cloudera Quick Start VM. I've created an Hive table with some data. Then, I've created an external table with the Hive Storage Handler. I was able to populate the HBase table. However, while quering the Hive/HBase table, I got the following error (NullpointerException):
14/04/16 01:18:51 ERROR security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:BeeswaxException(message:java.io.IOException: java.lang.NullPointerException, log_context:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11, handle:QueryHandle(id:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11, log_context:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11), SQLState: )
14/04/16 01:18:51 ERROR beeswax.BeeswaxServiceImpl: Caught BeeswaxException
BeeswaxException(message:java.io.IOException: java.lang.NullPointerException, log_context:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11, handle:QueryHandle(id:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11, log_context:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11), SQLState: )
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.fetch(BeeswaxServiceImpl.java:545)
at com.cloudera.beeswax.BeeswaxServiceImpl$5.run(BeeswaxServiceImpl.java:986)
at com.cloudera.beeswax.BeeswaxServiceImpl$5.run(BeeswaxServiceImpl.java:981)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at com.cloudera.beeswax.BeeswaxServiceImpl.doWithState(BeeswaxServiceImpl.java:772)
at com.cloudera.beeswax.BeeswaxServiceImpl.fetch(BeeswaxServiceImpl.java:980)
at com.cloudera.beeswax.api.BeeswaxService$Processor$fetch.getResult(BeeswaxService.java:987)
at com.cloudera.beeswax.api.BeeswaxService$Processor$fetch.getResult(BeeswaxService.java:971)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
I embedded Guava, zookeeper, hbase and hive-hbase-handler JARs. I followed the instructions made in this tutorial: http://www.n10k.com/blog/hbase-via-hive-pt2/
I am using the current Cloudera-Quick-Start VM. Job and Task-Tracker logs as well as Beeswax logs are telling me nothing.
Do you have any ideas about what I am doing wrong?
I am thankfull for any advise!
Best regards, Lena
This is the solution:
Nullpointer exception in HBase MapReduce
The logs were misleading (for me). HBase or Hive was not able to resolve the NameNode.

Pig: STORE with MongoInsertStorage don't work

I'm executing this simple code in a pig script:
REGISTER /home/myuser/mongodb/mongo-2.10.1.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-core_cdh4.3.0-1.2.0.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-pig_cdh4.3.0-1.2.0.jar
set mapred.map.tasks.speculative.execution false;
set mapred.reduce.tasks.speculative.execution false;
col = LOAD 'mongodb://localhost:27017/mydb.mycollection' using com.mongodb.hadoop.pig.MongoLoader ('id:chararray, companyId:chararray, ts:chararray', 'id');
STORE col INTO 'mongodb://localhost:27017/mydb.mycollection2' USING com.mongodb.hadoop.pig.MongoInsertStorage ('', '');
it returns the following error:
Location Config: Configuration: For URI: file:/tmp/temp449583595/tmp-109467318
2014-04-04 14:30:40,913 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
Details at logfile: /home/myuser/pig/pig_1396614639609.log
the end of file pig_1396614639609.log:
... at org.apache.hadoop.util.RunJar.main(RunJar.java:208) Caused
by: java.lang.IllegalArgumentException: Invalid URI Format. URIs must
begin with a mongodb:// protocol string. at
com.mongodb.hadoop.pig.MongoInsertStorage.setStoreLocation(MongoInsertStorage.java:159)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:576)
... 17 more
I don't know where is the error so that mongodb protocol string "mongodb://" is well-written.
I have a similar issue when running LOAD and STORE using mongo-hadoop on the same Pig script.
It throws
java.net.UnknownHostException: localhost:27017 is not a valid Inet address
at org.apache.hadoop.net.NetUtils.verifyHostnames(NetUtils.java:587)
at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:734)
at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I didn't investigate further, but either is a bug or some parameter related to locking. I don't know.
If I run the same code, but loading and storing in different scripts it runs without a problem.