Hive sum(column1 * column2) Issue - apache

Hive version : 1.0
select SUM(table.quantity * table.our_price) from table;
This simple query fails with the following error:
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:310)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:215)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:199)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:353)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:353)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:197)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:183)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:248)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:455)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166) ]
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:455)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:310)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:215)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:199)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:353)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:353)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:197)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:183)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:248)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:455)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166) ]
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:791)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:310)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:215)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:199)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:597)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:888)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:718)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:786)
... 8 more
I can't make much sense of this error.

Judging from the "ArrayIndexOutOfBoundsException", my guess is that table.quantity or table.our_price contains NULL or empty values, or that the sum is too large for its type. If the sum is too large, cast the value to bigint:
SELECT CAST(SUM(table.quantity * table.our_price) AS bigint) FROM table;
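One caveat on the cast: casting the result of SUM cannot repair a product that has already overflowed, so any widening has to happen before the multiplication. The sketch below shows that arithmetic in plain Java, not Hive code. The class and method names are made up for illustration, and it assumes both columns are 32-bit integers, which the question does not state. A NULL operand is treated as 0, much as COALESCE(col, 0) would.

```java
final class SumOfProducts {
    /** Sums quantity[i] * price[i], widening to long BEFORE multiplying. */
    static long sum(Integer[] quantity, Integer[] price) {
        long total = 0L;
        for (int i = 0; i < quantity.length; i++) {
            // A null operand contributes 0 instead of throwing on unboxing.
            long q = quantity[i] == null ? 0L : quantity[i];
            long p = price[i] == null ? 0L : price[i];
            total += q * p; // 64-bit multiplication
        }
        return total;
    }

    /** Multiplies in 32 bits and widens afterwards: the product has already wrapped. */
    static long castAfter(int q, int p) {
        return (long) (q * p);
    }
}
```

For example, SumOfProducts.sum on quantities {100000, null} and prices {100000, 5} gives 10000000000, while SumOfProducts.castAfter(100000, 100000) gives the wrapped value 1410065408.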

Related

org.apache.hadoop.hive.ql.exec.UDF is deprecated

Why does Hive 3.x deprecate org.apache.hadoop.hive.ql.exec.UDF?
Because of the deprecation, I switched to org.apache.hadoop.hive.ql.udf.generic.GenericUDF for a job that joins several tables.
The SQL looks like this:
select
dw_rk.STRDEREPEAT(',', t1.d_deptname, t2.d_deptname, t3.d_deptname) as d_deptname,
dw_rk.STRDEREPEAT(',', t1.d_tabname, t2.d_tabname, t3.d_tabname) as d_tabname
from
dw_rk.f_t_rk_jcxx t1
LEFT join dw_rk.f_t_rk_hjxx t2 on t1.sfzhm = t2.sfzhm
LEFT JOIN dw_rk.f_t_rk_dzxx t3 on t1.sfzhm = t3.sfzhm
The function splits the columns on a separator and returns the combined values as a new string.
The job then fails in YARN with this error:
2022-12-09 15:22:54,852 ERROR [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Unexpected exception from VectorMapJoinOperator : Error evaluating _col1
org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating _col1
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:149)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinBaseOperator.flushOutput(VectorMapJoinBaseOperator.java:254)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinBaseOperator.internalForward(VectorMapJoinBaseOperator.java:249)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:848)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:932)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:517)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.process(VectorMapJoinOperator.java:233)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:812)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:845)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.NullPointerException
at com.cestc.str.StrDerepeat.evaluate(StrDerepeat.java:33)
at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:190)
at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
... 26 more
2022-12-09 15:22:54,853 ERROR [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:973)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception from VectorMapJoinOperator : Error evaluating _col1
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:530)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.process(VectorMapJoinOperator.java:233)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:812)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:845)
... 10 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating _col1
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:149)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinBaseOperator.flushOutput(VectorMapJoinBaseOperator.java:254)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinBaseOperator.internalForward(VectorMapJoinBaseOperator.java:249)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:848)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:932)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:517)
... 19 more
Caused by: java.lang.NullPointerException
at com.cestc.str.StrDerepeat.evaluate(StrDerepeat.java:33)
at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:190)
at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
... 26 more
I set
hive.vectorized.execution.enabled
and
hive.vectorized.execution.reduce.enabled
to false, but the query still fails.
I then used concat_ws() to join the three columns into a temporary table and ran STRDEREPEAT over that column to overwrite the table, which succeeded.
However, I don't understand why the original query fails.
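A possible reading of the trace: the NullPointerException is thrown inside StrDerepeat.evaluate (StrDerepeat.java:33), and a LEFT JOIN leaves the t2 and t3 columns NULL for rows without a match. concat_ws() skips NULL arguments, which would explain why the temp-table workaround succeeds. The source of StrDerepeat is not shown, so this is only a guess. The sketch below is plain Java with no Hive dependency. The class name, the method name, and the assumed behavior (split every column on the separator, drop duplicates, keep first-seen order) are illustrative assumptions, not the actual UDF. In a real GenericUDF, the value returned by each DeferredObject.get() would need the same null check before use.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

final class StrDerepeatSketch {
    /**
     * Splits each column on sep, removes duplicates and joins the rest with sep.
     * NULL columns (e.g. from an unmatched LEFT JOIN) are skipped, not dereferenced.
     */
    static String derepeat(String sep, String... cols) {
        if (sep == null) {
            return null; // no separator, nothing sensible to return
        }
        Set<String> seen = new LinkedHashSet<>(); // keeps first-seen order
        if (cols != null) {
            for (String col : cols) {
                if (col == null || col.isEmpty()) {
                    continue; // the null check that prevents the NPE
                }
                // Pattern.quote treats the separator as a literal, not a regex.
                for (String part : col.split(Pattern.quote(sep))) {
                    if (!part.isEmpty()) {
                        seen.add(part);
                    }
                }
            }
        }
        return String.join(sep, seen);
    }
}
```

With this logic, derepeat(",", "a,b", null, "b,c") returns "a,b,c", and a row where every column is NULL yields an empty string rather than an exception.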

Hive MERGE INTO fails with HiveAuthzPluginException: Invalid number of user privilege objects: 2

When I execute the following SQL in Hive:
MERGE INTO foo a USING bar b ON (a.datamonth = b.datamonth AND a.productlink = b.productlink)
WHEN MATCHED THEN UPDATE SET product_name_keywords = b.keywords;
it throws the following error:
2022-10-28T21:06:44,274 ERROR [e5363ba5-57fc-4f0b-a8ab-e0e27eb324f5 HiveServer2-Handler-Pool: Thread-80] ql.Driver: FAILED: HiveAuthzPluginException Invalid number of user privilege objects: 2
org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthzPluginException: Invalid number of user privilege objects: 2
at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLAuthorizationUtils.getRequiredPrivsFromThrift(SQLAuthorizationUtils.java:328)
at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLAuthorizationUtils.getPrivilegesFromMetaStore(SQLAuthorizationUtils.java:209)
at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizationValidator.checkPrivileges(SQLStdHiveAuthorizationValidator.java:145)
at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizationValidator.checkPrivileges(SQLStdHiveAuthorizationValidator.java:84)
at org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizerImpl.checkPrivileges(HiveAuthorizerImpl.java:86)
at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:1307)
at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:1071)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:698)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:197)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:247)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy37.executeStatementAsync(Unknown Source)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
What causes this, and how can I fix it?

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

I had been running a Hive job successfully, but since yesterday it fails with an error after the map phase completes. The query and logs are below:
INSERT INTO TABLE zong_dwh.TEMP_P_UFDR_imp6
SELECT
from_unixtime(begin_time+5*3600,'yyyy-MM-dd') AS Date1,
from_unixtime(begin_time+5*3600,'HH') AS Hour1,
MSISDN AS MSISDN,
A.prot_type AS Protocol,
B.protocol as Application,
host AS Domain,
D.browser_name AS browser_type,
cast (null as varchar(10)) as media_format,
C.ter_type_name_en as device_category,
C.ter_brand_name as device_brand,
rat as session_technology,
case
when rat=1 then Concat(mcc,mnc,lac,ci)
when rat=2 then Concat(mcc,mnc,lac,sac)
when rat=6 then concat(mcc,mnc,eci)
end AS Actual_Site_ID,
sum(coalesce(L4_DW_THROUGHPUT,0)+coalesce(L4_UL_THROUGHPUT,0)) as total_data_volume,
sum(coalesce(TCP_UL_RETRANS_WITHPL,0)/coalesce(TCP_DW_RETRANS_WITHPL,1)) AS retrans_rate,
sum(coalesce(DATATRANS_UL_DURATION,0) + coalesce(DATATRANS_DW_DURATION,0)) as duration,
count(sessionkey) as usage_quantity,
round(sum(L4_DW_THROUGHPUT)/1024/1024,4)/sum(end_time*1000+end_time_msel-begin_time*1000-begin_time_msel) AS downlink_throughput,
round(sum(L4_UL_THROUGHPUT)/1024/1024,4)/sum(end_time*1000+end_time_msel-begin_time*1000-begin_time_msel) as uplink_throughput
from
ps.detail_ufdr_http_browsing_17923 A
INNER JOIN ps.dim_protocol B ON B.protocol_id=A.prot_type
INNER JOIN ps.dim_terminal C on substr(A.imei,1,8)=C.tac
inner join ps.dim_browser_type D on A.browser_type=D.browser_type_id
Group by
from_unixtime(begin_time+5*3600,'yyyy-MM-dd'),
from_unixtime(begin_time+5*3600,'HH'),MSISDN,
prot_type,
B.protocol,
host,
D.browser_name,
cast (null as varchar(10)),
C.ter_type_name_en,
C.ter_brand_name,
rat,
case
when rat=1 then Concat(mcc,mnc,lac,ci)
when rat=2 then Concat(mcc,mnc,lac,sac)
when rat=6 then concat(mcc,mnc,eci)
end;
Logs:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":"2019-02-11","_col1":"05","_col2":"3002346407","_col3":146,"_col4":"","_col5":null,"_col6":null,"_col7":"35538908","_col8":6,"_col9":"","_col10":"","_col11":"","_col12":"0ED1102"},"value":{"_col0":75013,"_col1":4.0,"_col2":2253648000,"_col3":5,"_col4":0}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:182)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1769)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:176)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":"2019-02-11","_col1":"05","_col2":"3002346407","_col3":146,"_col4":"","_col5":null,"_col6":null,"_col7":"35538908","_col8":6,"_col9":"","_col10":"","_col11":"","_col12":"0ED1102"},"value":{"_col0":75013,"_col1":4.0,"_col2":2253648000,"_col3":5,"_col4":0}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text org.apache.hadoop.hive.ql.udf.UDFConv.evaluate(org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable,org.apache.hadoop.io.IntWritable) on object org.apache.hadoop.hive.ql.udf.UDFConv#2e2f720 of class org.apache.hadoop.hive.ql.udf.UDFConv with arguments {:org.apache.hadoop.io.Text, 16:org.apache.hadoop.io.IntWritable, 10:org.apache.hadoop.io.IntWritable} of size 3
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1034)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:193)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:104)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1019)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:821)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
... 7 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1010)
... 18 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.hadoop.hive.ql.udf.UDFConv.evaluate(UDFConv.java:160)
... 23 more

apache-spark-sql: Error does not return the column name with error

When I use Spark SQL to query a DataFrame, the query fails with an error, and I cannot tell from the error which column is causing it.
My table is gigantic with 120 columns and 176M rows.
Here is my query:
%sql
select order_entry_date, count(1) cnt, sum(paid_units) paid_unit, sum(total_revenue) rev
from mart_bc_order_item
group by 1
order by 1
The error is below:
java.lang.NumberFormatException: For input string: "�"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:252)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:125)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:94)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:167)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:166)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Can someone help here?
Thanks,
Vivek
You are getting the NumberFormatException because a string could not be parsed as an integer. Check your code and data again.
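To narrow that down: the Integer.parseInt frame under CSVTypeCast.castTo suggests that a column declared as an integer holds a non-numeric value in the CSV source. One way to find the culprit is to re-read the suspect columns as raw strings and test each value. The following is a minimal plain-Java sketch of that check, not Spark code. The class and method names are hypothetical, and treating paid_units as an integer column is an assumption, since the schema is not shown. Empty cells are skipped on the assumption that they are read as NULL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BadIntFinder {
    /** Lists every non-empty cell in the named columns that Integer.parseInt rejects. */
    static List<String> findBadInts(String[] header, List<String[]> rows, String... intColumns) {
        List<String> problems = new ArrayList<>();
        List<String> names = Arrays.asList(header);
        for (String column : intColumns) {
            int idx = names.indexOf(column);
            if (idx < 0) {
                continue; // column not present in this file
            }
            for (int r = 0; r < rows.size(); r++) {
                String[] row = rows.get(r);
                String raw = idx < row.length ? row[idx] : "";
                if (raw == null || raw.isEmpty()) {
                    continue; // assumed to be read as NULL, not parsed
                }
                try {
                    Integer.parseInt(raw);
                } catch (NumberFormatException e) {
                    // Report the position so the bad record can be located.
                    problems.add("row=" + r + " column=" + column + " value=" + raw);
                }
            }
        }
        return problems;
    }
}
```

Running this over a sample of the file with the columns used in the aggregation reports the row index, column name, and raw value of each cell that cannot be parsed, which is the information the Spark stack trace leaves out.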

Hive throws ArrayIndexOutOfBoundsException when select count(1) on ORC table

I have a simple table with 9 fields, using ORCFile format (I followed the steps mentioned here). When I try to count the number of rows in that table (350 million rows, btw) by submitting:
select count(1) from my_orc_table;
I get an 'ArrayIndexOutOfBoundsException'. Let me copy the stack, just in case it provides more information:
Error: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Thanks!!