java.lang.ArrayIndexOutOfBoundsException: -1 - Hive update statement - hive

When I try to run the Hive UPDATE statement, I get the following error.
2021-02-25 15:38:54,934 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1592334694783_33388_r_000007_3: Error: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":3}},"value":{"_col0":"T","_col1":1111111,"......."_col44":""}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:790)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
The update query is simple.
All the columns in the target table are string or decimal.
I found a similar issue described in a Cloudera link, but the problem is that this query runs most of the time and only fails for certain data.
Update Statement
UPDATE Table1 a
SET
email = MaskData(email,1)
WHERE d_Date >= '2017-01-01' and
email IN (select distinct email from Table2);
Any path forward or assistance would be helpful. Thanks in advance.

It looks like the data was not bucketed properly because we were inserting it from Spark.
We had to rebuild the tables completely, after which the update worked properly.
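For reference, a minimal sketch of the kind of rebuild we did is below. The table name, the two columns shown, the bucket column, and the bucket count are all placeholders; on Hive versions that require bucketing for ACID tables, UPDATE only works on ORC tables declared transactional.
-- Hypothetical, simplified layout (only two of the real table's columns shown).
CREATE TABLE table1_rebuilt (
  email  STRING,
  d_date STRING
)
CLUSTERED BY (email) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
-- Reload through Hive itself so the bucket files and ACID metadata are written by Hive, not Spark.
INSERT INTO table1_rebuilt
SELECT email, d_date FROM table1;
Rewriting the data this way is consistent with the bucketid of -1 in the failing row key above, which suggests rows that Hive could not map back to a bucket file.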

Related

Flink hive api query exception: Distinct without an aggregation

I get the following exception when querying Hive with Flink's Hive Table API connector: Distinct without an aggregation.
However, the same SQL query executes correctly when querying Hive through the Hue interface.
I'm wondering whether this problem is due to a compatibility issue with Flink.
Flink version: 1.14.2
Hive version: 2.1.1
SQL statement:
select devid as pdevid,
count(distinct vtype) as vip_type_trans
from events
where dt = '20220702'
and utype > -1
group by devid
having count(distinct vtype) > 1
Exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.hadoop.hive.ql.parse.SemanticException: Distinct without an aggregation.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.SemanticException: Distinct without an aggregation.
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.logicalPlan(HiveParserCalcitePlanner.java:304)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:272)
at org.apache.flink.table.planner.delegation.hive.HiveParser.analyzeSql(HiveParser.java:290)
at org.apache.flink.table.planner.delegation.hive.HiveParser.processCmd(HiveParser.java:238)
at org.apache.flink.table.planner.delegation.hive.HiveParser.parse(HiveParser.java:208)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:716)
at com.zhhainiao.wp.stat.PaidConversationRate$.main(PaidConversationRate.scala:158)
at com.zhhainiao.wp.stat.PaidConversationRate.main(PaidConversationRate.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
... 11 more
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Distinct without an aggregation.
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genSelectLogicalPlan(HiveParserCalcitePlanner.java:2275)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:2749)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:2647)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:2688)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:2647)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:2688)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.logicalPlan(HiveParserCalcitePlanner.java:284)
That is, is Flink not fully compatible with Hive SQL syntax?
Flink is not fully compatible with Hive SQL syntax. There is an open Flink ticket regarding Hive and DISTINCT; see https://issues.apache.org/jira/browse/FLINK-19004. If that matches your current problem, you can track that item. Otherwise, I would recommend opening a new Flink Jira ticket for this bug.
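If the planner only trips over the DISTINCT aggregate being repeated in the HAVING clause, one untested workaround (assuming the rest of the query is accepted) is to compute the count once in a subquery and filter on the alias in an outer query:
SELECT pdevid, vip_type_trans
FROM (
  SELECT devid AS pdevid,
         COUNT(DISTINCT vtype) AS vip_type_trans
  FROM events
  WHERE dt = '20220702'
    AND utype > -1
  GROUP BY devid
) t
WHERE vip_type_trans > 1;
If this rewrite still fails, that would point to the planner issue tracked in the ticket rather than to the HAVING clause.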

Druid : No column with timestamp with local time-zone type on query result; one column should be of timestamp with local time-zone type

I am trying to create a Druid table based on an existing Druid table using the query below, but I'm getting an error.
Query:
CREATE TABLE IF NOT EXISTS database.druid_table2 STORED BY
'org.apache.hadoop.hive.druid.DruidStorageHandler' AS SELECT '__time' as `__time`,column1, column2, column3 FROM database.druid_table1;
Error:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED:
SemanticException No column with timestamp with local time-zone type on query result; one column
should be of timestamp with local time-zone type
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286)
at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:324)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:265)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:718)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:801)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
NOTE: I'm using Hive interactive mode since this is a Druid-related query, and the Hadoop version is 3.2.x.
I am not familiar with this method or syntax, but judging from the error, it works for other cases? If so, perhaps you have to cast __time into a type that carries a time zone. I'm not sure how it's being passed.
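One thing worth noting from the query itself: '__time' is written with single quotes, which in HiveQL selects the string literal __time rather than the column, so the result would not contain a timestamp column at all. A hedged sketch of what I would try, assuming Hive 3's timestamp with local time zone type is available and that the Druid storage handler only needs the __time column typed that way:
-- Backtick-quote the column and cast it to the type the error asks for.
CREATE TABLE IF NOT EXISTS database.druid_table2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS
SELECT CAST(`__time` AS TIMESTAMP WITH LOCAL TIME ZONE) AS `__time`,
       column1, column2, column3
FROM database.druid_table1;
If `__time` already has that type in druid_table1, backtick-quoting the column without the cast may be enough.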

SMB join not working over Hive Tables

While performing an SMB join over two ORC tables, both bucketed and sorted on subscription_id, the join fails with the error below:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:210)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.joinFinalLeftData(SMBMapJoinOperator.java:345)
at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.closeOp(SMBMapJoinOperator.java:610)
at org.apache.hadoop.hive.ql.exec.vector.VectorSMBMapJoinOperator.closeOp(VectorSMBMapJoinOperator.java:275)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:192)
... 8 more
The task tracker URL also doesn't give much detail.
The query is:
SELECT * FROM
user_plays_buck
INNER JOIN small_user_subscription_buck
ON user_plays_buck.subscription_id = small_user_subscription_buck.subscription_id
LIMIT 1;
I got exactly the same issue on Hive 1.1. The same query works on Hive 2.1, so upgrade your Hive.
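If upgrading is not an option right away, a possible stop-gap (assuming the NullPointerException is specific to the SMB conversion, which the SMBMapJoinOperator frames in the stack trace suggest) is to disable the sort-merge bucket join optimizations for the session so the query falls back to a regular join:
-- Session-level settings; defaults vary by Hive version.
SET hive.auto.convert.sortmerge.join=false;
SET hive.optimize.bucketmapjoin=false;
SET hive.optimize.bucketmapjoin.sortedmerge=false;
SELECT * FROM
user_plays_buck
INNER JOIN small_user_subscription_buck
ON user_plays_buck.subscription_id = small_user_subscription_buck.subscription_id
LIMIT 1;
This gives up the SMB optimization, so treat it only as a workaround until the upgrade.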

Query Failed Error: Unexpected. Please try again. Google BigQuery

I am unable to execute a simple select query (select * from [<DatasetName>.<TableName>] LIMIT 1000) with Google BigQuery. It gives me the error below:
Query Failed
Error: Unexpected. Please try again.
Job ID: job_51MSM2uo1vE5QomZA8Sv_RUE7M8
The table contains around 10 records, and I am able to execute queries on other tables. The Job ID is job_51MSM2uo1vE5QomZA8Sv_RUE7M8.
It looks like there was a temporary issue where we were seeing timeouts when querying tables that had recently received data via streaming insert (tabledata.insertAll()). We're currently addressing the underlying issue, but it should be working now. Please ping this thread if you see it again.

Insufficient data written when inserting rows

I am facing this error today when running my unit test to insert some rows into my BigQuery table:
Caused by: java.io.IOException: insufficient data written
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3213)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:81)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:960)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:482)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithBackOffAndGZip(MediaHttpUploader.java:504)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeUploadInitiation(MediaHttpUploader.java:456)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:348)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:418)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:460)
I thought it was due to the new version of google-http-client (1.16.0-rc), because I updated it just before running the test, but rolling back to 1.15.0-rc has no effect.
Any ideas?
Me too. It also seems like a sign that BigQuery just stops receiving any data: if you query your table with count(*) after this exception, the result won't change anymore. If I keep my program running for a while, it gives me errors such as:
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1343)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1371)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
waiting for answers...
These errors usually happen when there is a communication failure, especially for large files.
The way to avoid them is to use a resumable upload.