Unable to increase Hive dynamic partitions in Spark using spark-sql

I am running a Hive query with spark-sql that selects data from one table and inserts the result into another, partitioned Hive table. The insert needs 1536 partitions, but Spark fails to insert the data even though I increased the maximum number of dynamic partitions to 2000.
Below is the command:
spark-sql --master yarn --num-executors 14 --executor-memory 45G
--executor-cores 30 --driver-memory 10G --conf spark.dynamicAllocation.enabled=false -e "SET
hive.exec.dynamic.partition = true;SET
hive.exec.dynamic.partition.mode = nonstrict;SET
hive.exec.max.dynamic.partitions = 2000; insert into table
weatherdata_part_rv.weather_data_daily_model_location_mapping_rv
partition (model_id,record_date) select
y.rec_id,x.municipal_id,x.model_id,y.record_date from (select * from
weatherdata_part_rv.model_location_xref) x left outer join
weatherdata_part_rv.weather_data_daily y on
x.municipal_id=y.weather_station_id;"
Error stack:
spark-sql --master yarn --num-executors 14 --executor-memory 45G --executor-cores 30 --driver-memory 10G --conf spark.dynamicAllocation.enabled=false -e "SET hive.exec.dynamic.partition = true;SET hive.exec.dynamic.partition.mode = nonstrict;SET hive.exec.max.dynamic.partitions = 2000;
> insert into table weatherdata_part_rv.weather_data_daily_model_location_mapping_rv partition (model_id,record_date) select y.rec_id,x.municipal_id,y.temprature_min_in_celcius,y.temprature_max_in_celcius,y.rainfall_in_mm,y.relative_humidity_min,y.relative_humidity_max,y.radiation_max,y.wind_intensity,y.wind_direction,y.cloud_coverage,y.soil_temprature_in_celcius,y.water_quantity_in_soil,y.lmdt,y.icon,y.probablity_of_rainfall,y.rain_acc_20feb_onwards,x.model_id,y.record_date from (select * from weatherdata_part_rv.model_location_xref) x left outer join weatherdata_part_rv.weather_data_daily y on x.municipal_id=y.weather_station_id;"
17/05/12 09:44:05 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/05/12 09:44:05 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/05/12 09:44:08 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
hive.exec.dynamic.partition true
Time taken: 1.874 seconds, Fetched 1 row(s)
hive.exec.dynamic.partition.mode nonstrict
Time taken: 0.67 seconds, Fetched 1 row(s)
hive.exec.max.dynamic.partitions 2000
Time taken: 0.047 seconds, Fetched 1 row(s)
17/05/12 09:58:30 ERROR SparkSQLDriver: Failed in [
insert into table weatherdata_part_rv.weather_data_daily_model_location_mapping_rv partition (model_id,record_date) select y.rec_id,x.municipal_id,y.temprature_min_in_celcius,y.temprature_max_in_celcius,y.rainfall_in_mm,y.relative_humidity_min,y.relative_humidity_max,y.radiation_max,y.wind_intensity,y.wind_direction,y.cloud_coverage,y.soil_temprature_in_celcius,y.water_quantity_in_soil,y.lmdt,y.icon,y.probablity_of_rainfall,y.rain_acc_20feb_onwards,x.model_id,y.record_date from (select * from weatherdata_part_rv.model_location_xref) x left outer join weatherdata_part_rv.weather_data_daily y on x.municipal_id=y.weather_station_id]
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.sql.hive.client.Shim_v1_2.loadDynamicPartitions(HiveShim.scala:823)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:689)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:687)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:687)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:230)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:229)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:272)
at org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:687)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:796)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:784)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:784)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)
at org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:784)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:268)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:335)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:168)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1536, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1536.
at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1578)
... 48 more
Is there a limit on the maximum number of Hive partitions in Spark?
If so, is there a way to increase the maximum number of partitions?

Can you add the property below to hive-site.xml, both at spark_home/conf/hive-site.xml and at hive-home/conf/hive-site.xml?
hive.exec.max.dynamic.partitions=2000
<property>
<name>hive.exec.max.dynamic.partitions</name>
<value>2000</value>
<description></description>
</property>
This should resolve the issue.
If the value is not picked up, try restarting the HiveServer2 (hs2) process.

I resolved this error by keeping the partition columns at the end of the DataFrame.
Check the column order in your DataFrame and make sure the partition columns come last when selecting in spark.sql.
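To illustrate the column-order point: in a Hive dynamic-partition insert, the trailing SELECT expressions are matched to the partition columns by position, in the order given in the PARTITION clause, not by name. The sketch below reuses the table and column names from the question's shortened command; the elided middle columns are an assumption and must follow the target table's own column order.

```sql
-- Sketch only: the non-partition column list is abbreviated as in the question.
-- Dynamic partition columns are matched by position, so they must come last
-- and in the same order as in the PARTITION clause.
INSERT INTO TABLE weatherdata_part_rv.weather_data_daily_model_location_mapping_rv
PARTITION (model_id, record_date)
SELECT
    y.rec_id,
    x.municipal_id,
    -- remaining non-partition columns go here, in the target table's order
    x.model_id,      -- feeds the first partition column, model_id
    y.record_date    -- feeds the second partition column, record_date
FROM weatherdata_part_rv.model_location_xref x
LEFT OUTER JOIN weatherdata_part_rv.weather_data_daily y
    ON x.municipal_id = y.weather_station_id;
```

If a non-partition column ends up in one of the last two positions, its values become partition values, which can produce far more partitions than expected.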

Related

Hive MERGE INTO fails with HiveAuthzPluginException: Invalid number of user privilege objects: 2

When I execute the SQL below in Hive:
MERGE INTO foo a USING bar b ON (a.datamonth = b.datamonth AND a.productlink = b.productlink)
WHEN MATCHED THEN UPDATE SET product_name_keywords = b.keywords;
it throws the error below:
2022-10-28T21:06:44,274 ERROR [e5363ba5-57fc-4f0b-a8ab-e0e27eb324f5 HiveServer2-Handler-Pool: Thread-80] ql.Driver: FAILED: HiveAuthzPluginException Invalid number of user privilege objects: 2
org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthzPluginException: Invalid number of user privilege objects: 2
at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLAuthorizationUtils.getRequiredPrivsFromThrift(SQLAuthorizationUtils.java:328)
at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLAuthorizationUtils.getPrivilegesFromMetaStore(SQLAuthorizationUtils.java:209)
at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizationValidator.checkPrivileges(SQLStdHiveAuthorizationValidator.java:145)
at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizationValidator.checkPrivileges(SQLStdHiveAuthorizationValidator.java:84)
at org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizerImpl.checkPrivileges(HiveAuthorizerImpl.java:86)
at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1307)
at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1071)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:698)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:197)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:247)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy37.executeStatementAsync(Unknown Source)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
What is the reason, and how can I fix it?

How to use WHERE statement on JSON stored in Presto SQL column to filter?

In Presto, I have a table whose column data looks as follows:
| header | header 2 |
|---|---|
| {Data: [{'item1': 'stuff1', 'item2': 'stuff2', 'item3': 'stuff3'}, {...}]} | cell 2 |
| {Data: [{'item1': 'stuff11', 'item2': 'stuff21', 'item3': 'stuff31'}, {...}]} | cell 4 |
I was able to SELECT using JSON syntax:
SELECT header.Data[1].item1 FROM table
which returns:
header
stuff1
stuff11
However, if I want to filter the table using the WHERE statement:
SELECT * FROM table WHERE header.Data[1].item1 = 'stuff1'
The above statement threw an error and didn't work.
I would like to return something like:
| header | header 2 |
|---|---|
| {Data: [{'item1': 'stuff1', 'item2': 'stuff2', 'item3': 'stuff3'}, {...}]} | cell 2 |
Any input would be helpful. Thanks.
I've also tried several other SQL queries, such as the one below, but all returned a similar error:
WHERE header.Data[1].item1 = 'stuff1'
An example of the error:
Query:
SELECT header.Data[1].item1 AS f FROM table WHERE f LIKE '%stuff%'
Error:
An error occurred while calling o12.execute. : java.sql.SQLException: Query failed (#20220330_200148_01673_9bq5k): line 2:7: Column 'f' cannot be resolved at io.prestosql.jdbc.AbstractPrestoResultSet.resultsException(AbstractPrestoResultSet.java:1761) at io.prestosql.jdbc.PrestoResultSet.getColumns(PrestoResultSet.java:252) at io.prestosql.jdbc.PrestoResultSet.create(PrestoResultSet.java:54) at io.prestosql.jdbc.PrestoStatement.internalExecute(PrestoStatement.java:249) at io.prestosql.jdbc.PrestoStatement.execute(PrestoStatement.java:227) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) at py4j.Gateway.invoke(Gateway.java:259) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:209) at java.lang.Thread.run(Thread.java:750) Caused by: io.prestosql.spi.PrestoException: line 2:7: Column 'f' cannot be resolved at io.prestosql.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:48) at io.prestosql.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:43) at io.prestosql.sql.analyzer.SemanticExceptions.missingAttributeException(SemanticExceptions.java:33) at io.prestosql.sql.analyzer.Scope.lambda$resolveField$7(Scope.java:228) at java.base/java.util.Optional.orElseThrow(Optional.java:408) at io.prestosql.sql.analyzer.Scope.resolveField(Scope.java:228) at io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.visitIdentifier(ExpressionAnalyzer.java:438) at 
io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.visitIdentifier(ExpressionAnalyzer.java:342) at io.prestosql.sql.tree.Identifier.accept(Identifier.java:72) at io.prestosql.sql.tree.StackableAstVisitor.process(StackableAstVisitor.java:27) at io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.process(ExpressionAnalyzer.java:365) at io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.visitLikePredicate(ExpressionAnalyzer.java:702) at io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.visitLikePredicate(ExpressionAnalyzer.java:342) at io.prestosql.sql.tree.LikePredicate.accept(LikePredicate.java:76) at io.prestosql.sql.tree.StackableAstVisitor.process(StackableAstVisitor.java:27) at io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.process(ExpressionAnalyzer.java:365) at io.prestosql.sql.analyzer.ExpressionAnalyzer.analyze(ExpressionAnalyzer.java:303) at io.prestosql.sql.analyzer.ExpressionAnalyzer.analyzeExpression(ExpressionAnalyzer.java:1691) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.analyzeExpression(StatementAnalyzer.java:2606) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.analyzeWhere(StatementAnalyzer.java:2465) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.lambda$visitQuerySpecification$23(StatementAnalyzer.java:1528) at java.base/java.util.Optional.ifPresent(Optional.java:183) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:1528) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:322) at io.prestosql.sql.tree.QuerySpecification.accept(QuerySpecification.java:144) at io.prestosql.sql.tree.AstVisitor.process(AstVisitor.java:27) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:339) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:349) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:1039) at 
io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:322) at io.prestosql.sql.tree.Query.accept(Query.java:107) at io.prestosql.sql.tree.AstVisitor.process(AstVisitor.java:27) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:339) at io.prestosql.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:308) at io.prestosql.sql.analyzer.Analyzer.analyze(Analyzer.java:83) at io.prestosql.sql.analyzer.Analyzer.analyze(Analyzer.java:75) at io.prestosql.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:256) at io.prestosql.execution.SqlQueryExecution.(SqlQueryExecution.java:182) at io.prestosql.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:757) at io.prestosql.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:123) at io.prestosql.$gen.Presto_343____20220330_135137_2.call(Unknown Source) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829)
The alias f introduced by SELECT header.Data[1].item1 AS f is not available in the WHERE clause, so you need to repeat the whole expression:
where header.Data[1].item1 LIKE '%stuff%'
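If repeating the expression is undesirable, one alternative is to define the alias in a subquery or WITH clause, since aliases from an inner query are visible to the outer query's WHERE. This is a minimal sketch: my_table is a hypothetical table name standing in for the question's placeholder, and it assumes header.Data[1].item1 resolves in SELECT as it did for the asker.

```sql
-- Sketch: my_table is a hypothetical name, not the asker's real table.
-- The alias f is defined in the inner query, so the outer WHERE can use it.
WITH extracted AS (
    SELECT
        header,
        header.Data[1].item1 AS f
    FROM my_table
)
SELECT header
FROM extracted
WHERE f LIKE '%stuff%';
```

Both forms filter on the same expression; the WITH form only avoids writing it twice.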

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

I had been running a Hive job successfully, but since yesterday it fails with an error after the map phase completes. Below are the query and the logs:
INSERT INTO TABLE zong_dwh.TEMP_P_UFDR_imp6
SELECT
from_unixtime(begin_time+5*3600,'yyyy-MM-dd') AS Date1,
from_unixtime(begin_time+5*3600,'HH') AS Hour1,
MSISDN AS MSISDN,
A.prot_type AS Protocol,
B.protocol as Application,
host AS Domain,
D.browser_name AS browser_type,
cast (null as varchar(10)) as media_format,
C.ter_type_name_en as device_category,
C.ter_brand_name as device_brand,
rat as session_technology,
case
when rat=1 then Concat(mcc,mnc,lac,ci)
when rat=2 then Concat(mcc,mnc,lac,sac)
when rat=6 then concat(mcc,mnc,eci)
end AS Actual_Site_ID,
sum(coalesce(L4_DW_THROUGHPUT,0)+coalesce(L4_UL_THROUGHPUT,0)) as total_data_volume,
sum(coalesce(TCP_UL_RETRANS_WITHPL,0)/coalesce(TCP_DW_RETRANS_WITHPL,1)) AS retrans_rate,
sum(coalesce(DATATRANS_UL_DURATION,0) + coalesce(DATATRANS_DW_DURATION,0)) as duration,
count(sessionkey) as usage_quantity,
round(sum(L4_DW_THROUGHPUT)/1024/1024,4)/sum(end_time*1000+end_time_msel-begin_time*1000-begin_time_msel) AS downlink_throughput,
round(sum(L4_UL_THROUGHPUT)/1024/1024,4)/sum(end_time*1000+end_time_msel-begin_time*1000-begin_time_msel) as uplink_throughput
from
ps.detail_ufdr_http_browsing_17923 A
INNER JOIN ps.dim_protocol B ON B.protocol_id=A.prot_type
INNER JOIN ps.dim_terminal C on substr(A.imei,1,8)=C.tac
inner join ps.dim_browser_type D on A.browser_type=D.browser_type_id
Group by
from_unixtime(begin_time+5*3600,'yyyy-MM-dd'),
from_unixtime(begin_time+5*3600,'HH'),MSISDN,
prot_type,
B.protocol,
host,
D.browser_name,
cast (null as varchar(10)),
C.ter_type_name_en,
C.ter_brand_name,
rat,
case
when rat=1 then Concat(mcc,mnc,lac,ci)
when rat=2 then Concat(mcc,mnc,lac,sac)
when rat=6 then concat(mcc,mnc,eci)
end;
Logs:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":"2019-02-11","_col1":"05","_col2":"3002346407","_col3":146,"_col4":"","_col5":null,"_col6":null,"_col7":"35538908","_col8":6,"_col9":"","_col10":"","_col11":"","_col12":"0ED1102"},"value":{"_col0":75013,"_col1":4.0,"_col2":2253648000,"_col3":5,"_col4":0}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:182)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1769)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:176)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":"2019-02-11","_col1":"05","_col2":"3002346407","_col3":146,"_col4":"","_col5":null,"_col6":null,"_col7":"35538908","_col8":6,"_col9":"","_col10":"","_col11":"","_col12":"0ED1102"},"value":{"_col0":75013,"_col1":4.0,"_col2":2253648000,"_col3":5,"_col4":0}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.UDFConv.evaluate(org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.IntWritable) on object org.apache.hadoop.hive.ql.udf.UDFConv#2e2f720 of class org.apache.hadoop.hive.ql.udf.UDFConv with arguments {:org.apache.hadoop.io.Text, 16:org.apache.hadoop.io.IntWritable, 10:org.apache.hadoop.io.IntWritable} of size 3
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1034)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:193)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:104)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1019)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:821)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
... 7 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1010)
... 18 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.hadoop.hive.ql.udf.UDFConv.evaluate(UDFConv.java:160)
... 23 more

Hive sum(column1 * column2) Issue

Hive version : 1.0
select SUM(table.quantity * table.our_price) from table;
This simple query fails with this error,
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:310)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:215)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:199)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:353)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:353)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:197)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:183)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:248)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:455)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166) ]
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:455)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:310)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:215)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:199)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:353)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:353)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:197)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:183)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:248)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:455)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166) ]
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:791)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:310)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:215)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:199)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:597)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:888)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:718)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:786)
... 8 more
I can't make much sense of this error.
Judging from the "ArrayIndexOutOfBoundsException", my guess is that you have NULL or empty values in table.quantity and table.our_price, or that the result of the sum is too big. If the SUM is too big, you should cast the value to bigint:
SELECT CAST(SUM(table.quantity * table.our_price) AS bigint) FROM table;
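To make the overflow guess concrete, here is a small Python sketch (not Hive code) that emulates signed 32-bit multiplication. It assumes, hypothetically, that quantity and our_price are 32-bit INT columns and that the product is computed at that width; the sample rows and the helper name to_int32 are made up for illustration.

```python
def to_int32(value: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


# Hypothetical (quantity, our_price) rows, invented for this illustration.
rows = [(100_000, 50_000), (3, 7)]

# Each product is computed at 32-bit width first, then summed in a wider type.
narrow_sum = sum(to_int32(q * p) for q, p in rows)

# Operands are widened before multiplying (Python ints are unbounded).
wide_sum = sum(q * p for q, p in rows)

print("product wrapped at 32 bits:", narrow_sum)
print("operands widened first:   ", wide_sum)
```

If that assumption holds for your table, a cast applied to the finished sum cannot recover a product that has already wrapped, so it may be worth trying the cast on one of the columns inside the SUM as well.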

Hive throws ArrayIndexOutOfBoundsException when select count(1) on ORC table

I have a simple table with 9 fields, using ORCFile format (I followed the steps mentioned here). When I try to count the number of rows in that table (350 million rows, btw) by submitting:
select count(1) from my_orc_table;
I get an 'ArrayIndexOutOfBoundsException'. Let me copy the stack, just in case it provides more information:
Error: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Thanks!!