Flink hive api query exception: Distinct without an aggregation - hive

I get the following exception when querying Hive with Flink's Hive table API connector: Distinct without an aggregation.
However, the same SQL query executes correctly when querying Hive through the Hue interface.
I'm wondering whether this problem is due to a compatibility issue with Flink?
Flink version: 1.14.2
Hive version: 2.1.1
SQL statement:
select devid as pdevid,
count(distinct vtype) as vip_type_trans
from events
where dt = '20220702'
and utype > -1
group by devid
having count(distinct vtype) > 1
Exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.hadoop.hive.ql.parse.SemanticException: Distinct without an aggregation.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.SemanticException: Distinct without an aggregation.
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.logicalPlan(HiveParserCalcitePlanner.java:304)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:272)
at org.apache.flink.table.planner.delegation.hive.HiveParser.analyzeSql(HiveParser.java:290)
at org.apache.flink.table.planner.delegation.hive.HiveParser.processCmd(HiveParser.java:238)
at org.apache.flink.table.planner.delegation.hive.HiveParser.parse(HiveParser.java:208)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:716)
at com.zhhainiao.wp.stat.PaidConversationRate$.main(PaidConversationRate.scala:158)
at com.zhhainiao.wp.stat.PaidConversationRate.main(PaidConversationRate.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
... 11 more
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Distinct without an aggregation.
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genSelectLogicalPlan(HiveParserCalcitePlanner.java:2275)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:2749)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:2647)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:2688)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:2647)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:2688)
at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.logicalPlan(HiveParserCalcitePlanner.java:284)

So Flink is not fully compatible with Hive SQL syntax?
Flink is not fully compatible with the Hive SQL syntax. There is an open Flink ticket regarding the use of DISTINCT with Hive, see https://issues.apache.org/jira/browse/FLINK-19004. If that matches your current problem, you can track that item; otherwise I would recommend opening a new Flink Jira ticket for this bug.
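In the meantime, a possible workaround (an untested sketch, assuming the planner only rejects the DISTINCT aggregate repeated in the HAVING clause) is to compute the aggregate once in a derived table and filter in the outer query:
-- Sketch: compute count(distinct) once and filter outside,
-- so HAVING never has to plan a second DISTINCT aggregate
SELECT pdevid, vip_type_trans
FROM (
    SELECT devid AS pdevid,
           count(DISTINCT vtype) AS vip_type_trans
    FROM events
    WHERE dt = '20220702'
      AND utype > -1
    GROUP BY devid
) t
WHERE vip_type_trans > 1;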

Related

HSQLDB error writing an object into the database

I have one server that should write to an HSQLDB database stored on another server, but the write fails with the error shown in the log trace below.
Apparently, where the logs mention the "current / latest database version", this is not related to upgrading the database itself; it is related to a data object that is being written to a database field, without success.
I have no idea why this error occurs or how to solve it, because there is another node that also connects to the server database, apparently with the same configuration, and it runs perfectly.
Any help would be appreciated.
Many thanks.
2022-11-08 12:45:00,001 INFO [.maudit.task.BackupLogProcessorTask] Task finished: Backup Log Proccesor.
2022-11-08 12:45:00,001 INFO [.common.operations.rest.client.CommonOperationsClient] Starting the call to the getLastDDBBVersion method of the CommonOperations rest service. Current database version: 167.
2022-11-08 12:45:03,080 INFO [.gob.afirma.common.operations.rest.client.CommonOperationsClient] The latest version of the database available is 329. The update proceeds.
2022-11-08 12:45:04,155 ERROR [.common.operations.rest.client.CommonOperationsClient] Error updating database. Some statement has not been executed correctly.
java.sql.BatchUpdateException: integrity constraint violation: unique constraint or index violation; "PSC_UNIQUE_IDENTIFICATOR" table: "PSC"
at org.hsqldb.jdbc.JDBCStatement.executeBatch(Unknown Source)
at .persistence.utils.UtilsDataBase.updateDataBase(UtilsDataBase.java:411)
at .common.operations.rest.client.CommonOperationsClient.getLastDDBBVersion(CommonOperationsClient.java:142)
at sun.reflect.GeneratedMethodAccessor576.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at .rest.utils.UtilsCommonOperationsClientOperations.getLastDDBBVersion(UtilsCommonOperationsClientOperations.java:228)
at .ddbb.version.rest.task.DDBBVersionTask.doActionOfTheTask(DDBBVersionTask.java:86)
at .mplanificacion.Task.doTask(Task.java:52)
at .quartz.job.AbstractAfirmaTaskQuartzJob.execute(AbstractAfirmaTaskQuartzJob.java:105)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
2022-11-08 12:45:04,157 WARN [.malarm.AlarmsModuleManager] It has been generated the AL055 alarm in the module [MODGENERAL]: Not possible upgrade to the latest local database version...
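One way to narrow this down would be to check whether the failing node's PSC table already holds rows that collide with the unique constraint before the upgrade statements run. A minimal sketch, assuming the constraint covers a single column; the column name IDENTIFICATOR is hypothetical, substitute the real one:
-- Hypothetical check for values that would violate the unique
-- constraint "PSC_UNIQUE_IDENTIFICATOR" on table "PSC".
-- Replace IDENTIFICATOR with the column(s) the constraint actually covers.
SELECT IDENTIFICATOR, COUNT(*) AS occurrences
FROM PSC
GROUP BY IDENTIFICATOR
HAVING COUNT(*) > 1;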

java.lang.ArrayIndexOutOfBoundsException: -1 - Hive update statement

When I try to run the Hive update statement, I get the following error.
2021-02-25 15:38:54,934 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1592334694783_33388_r_000007_3: Error: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":3}},"value":{"_col0":"T","_col1":1111111,"......."_col44":""}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:790)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
The update query is simple.
All the columns in the target table are STRING or DECIMAL.
I identified a similar issue in a Cloudera link, but the problem is that this query runs most of the time and only fails for certain kinds of data.
Update Statement
UPDATE Table1 a
SET
email = MaskData(email,1)
WHERE d_Date >= '2017-01-01' and
email IN (select distinct email from Table2);
Any path forward or assistance would be helpful. Thanks in advance.
It looks like the data was not bucketed properly, because we were inserting it from Spark.
I had to rebuild the complete tables, after which everything worked properly.
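For reference, a minimal sketch of such a rebuild, assuming the target is a Hive ACID table; the bucket count, the abbreviated column list, and the name table1_rebuilt are placeholders:
-- Hypothetical rebuild: recreate the table bucketed by Hive itself,
-- then reload the data through Hive so bucket ids are assigned correctly.
CREATE TABLE table1_rebuilt (
    email  STRING,
    d_date STRING
    -- ... remaining columns from Table1
)
CLUSTERED BY (email) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

INSERT INTO table1_rebuilt
SELECT * FROM Table1;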

Druid : No column with timestamp with local time-zone type on query result; one column should be of timestamp with local time-zone type

I am trying to create a Druid table based on an existing Druid table using the query below, for which I'm facing an error.
query :
CREATE TABLE IF NOT EXISTS database.druid_table2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS SELECT '__time' as `__time`, column1, column2, column3
FROM database.druid_table1;
error :
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED:
SemanticException No column with timestamp with local time-zone type on query result; one column
should be of timestamp with local time-zone type
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286)
at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:324)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:265)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:718)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:801)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
NOTE: I'm using Hive interactive mode since this is a Druid-related query, and the Hadoop version is 3.2.x.
I am not familiar with this method or syntax, but judging from the error, it works in other cases? If so, perhaps you have to cast __time into a type that carries a time zone? I'm not sure how it's being passed. Note also that the query selects the string literal '__time' (single quotes) rather than the column `__time` (backticks), so the result may not contain a timestamp column at all.
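If that is the cause, a sketch of what the corrected CTAS might look like; the backticks make __time a column reference, and the cast matches the type the error message asks for (untested on your setup):
CREATE TABLE IF NOT EXISTS database.druid_table2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
AS SELECT CAST(`__time` AS TIMESTAMP WITH LOCAL TIME ZONE) AS `__time`,
          column1, column2, column3
FROM database.druid_table1;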

CTE based sequence generation with HSQLDB

I am using a recursive common table expression to fetch a batch of sequence numbers. The following query works with Postgres, SQL Server and H2 (minus the VALUES part).
WITH RECURSIVE t(n, level_num) AS (
SELECT next value for seq_parent_id as n,
1 as level_num
FROM (VALUES(0))
UNION ALL
SELECT next value for seq_parent_id as n,
level_num + 1 as level_num
FROM t
WHERE level_num < ?)
SELECT n FROM t
However, with HSQLDB 2.4.0 I get the following exception:
java.sql.SQLSyntaxErrorException: user lacks privilege or object not found: T
at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
at org.hsqldb.jdbc.JDBCStatement.fetchResult(Unknown Source)
at org.hsqldb.jdbc.JDBCStatement.executeQuery(Unknown Source)
...
Caused by: org.hsqldb.HsqlException: user lacks privilege or object not found: T
at org.hsqldb.error.Error.error(Unknown Source)
at org.hsqldb.error.Error.error(Unknown Source)
at org.hsqldb.ParserDQL.readTableName(Unknown Source)
at org.hsqldb.ParserDQL.readTableOrSubquery(Unknown Source)
at org.hsqldb.ParserDQL.XreadTableReference(Unknown Source)
at org.hsqldb.ParserDQL.XreadFromClause(Unknown Source)
at org.hsqldb.ParserDQL.XreadTableExpression(Unknown Source)
at org.hsqldb.ParserDQL.XreadQuerySpecification(Unknown Source)
at org.hsqldb.ParserDQL.XreadSimpleTable(Unknown Source)
at org.hsqldb.ParserDQL.XreadQueryPrimary(Unknown Source)
at org.hsqldb.ParserDQL.XreadQueryTerm(Unknown Source)
at org.hsqldb.ParserDQL.XreadSetOperation(Unknown Source)
at org.hsqldb.ParserDQL.XreadQueryExpressionBody(Unknown Source)
at org.hsqldb.ParserDQL.XreadQueryExpression(Unknown Source)
at org.hsqldb.ParserDQL.XreadSubqueryTableBody(Unknown Source)
at org.hsqldb.ParserDQL.XreadTableNamedSubqueryBody(Unknown Source)
at org.hsqldb.ParserDQL.XreadQueryExpression(Unknown Source)
at org.hsqldb.ParserDQL.compileCursorSpecification(Unknown Source)
at org.hsqldb.ParserCommand.compilePart(Unknown Source)
at org.hsqldb.ParserCommand.compileStatements(Unknown Source)
at org.hsqldb.Session.executeDirectStatement(Unknown Source)
at org.hsqldb.Session.execute(Unknown Source)
... 37 more
This specific use case could also be solved with a combination of UNNEST and SEQUENCE_ARRAY, but I'd like to avoid introducing an HSQLDB-specific code path.
I'd start with the simplest form of the recursive query, without sequences and with a hard-coded limit, and then gradually add the extra bits to it.
Based on the example in the docs, With Clause and Recursive Queries, the syntax should look like this:
WITH RECURSIVE t(level_num) AS (
    VALUES(1)
    UNION ALL
    SELECT level_num + 1
    FROM t
    WHERE level_num < 10
)
SELECT level_num
FROM t;
By the way, the docs say:
HyperSQL limits recursion to 265 rounds. If this is exceeded, an error
is raised.
I'd try the simplest query, like the one above, make sure that it works, then try it with, say, 1000 instead of 10 and see what error it returns. If it is the same error that you had originally, then you found the reason.
A side note: for this kind of task I'd use a permanent table of numbers instead of generating them on the fly recursively. We have a table with 100K numbers in our system; it is simple and works in any DBMS. Populate it once and use it as needed. I know that in SQL Server a recursive query is significantly slower for this kind of task; I don't know about HyperSQL, though. Also, the recursion depth limit of just 265 is rather harsh; with such a low limit it would most likely be impossible to detect any difference in performance. But, again, are 265 numbers enough for your purposes?
HSQLDB has an issue with UNION ALL. In the example referred to above, With Clause and Recursive Queries, there is no UNION ALL, only UNION (note that the documentation might have changed since).
There is some discussion about it in this thread, but at the moment I cannot get a working recursive statement in HSQLDB that uses UNION ALL.
So use UNION for HSQLDB v2.3.2+.
Fred Toussi commented on the above thread: in version 2.5.1 and upwards, UNION ALL should behave as expected.
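Putting that together with the original query, a sketch of the sequence-fetching CTE adapted for HSQLDB versions where only UNION works; since level_num differs on every row, UNION's duplicate removal cannot collapse anything (untested):
-- Sketch for HSQLDB before 2.5.1: same query as the original, but with UNION.
WITH RECURSIVE t(n, level_num) AS (
    SELECT NEXT VALUE FOR seq_parent_id, 1
    FROM (VALUES(0)) AS v(x)
    UNION
    SELECT NEXT VALUE FOR seq_parent_id, level_num + 1
    FROM t
    WHERE level_num < 10
)
SELECT n FROM t;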

SMB join not working over Hive Tables

While performing an SMB join over two ORC tables, both bucketed and sorted on subscription_id, the join fails with the error below:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:210)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.joinFinalLeftData(SMBMapJoinOperator.java:345)
at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.closeOp(SMBMapJoinOperator.java:610)
at org.apache.hadoop.hive.ql.exec.vector.VectorSMBMapJoinOperator.closeOp(VectorSMBMapJoinOperator.java:275)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:192)
... 8 more
The task tracker URL doesn't give many details either.
The query is:
SELECT * FROM
user_plays_buck
INNER JOIN small_user_subscription_buck
ON user_plays_buck.subscription_id = small_user_subscription_buck.subscription_id
LIMIT 1;
I got exactly the same issue in Hive 1.1. The same query works in Hive 2.1, so upgrade your Hive.
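If upgrading is not immediately an option, a possible workaround (a sketch using standard Hive settings, untested on 1.1) is to stop the planner from converting the join into an SMB join so it falls back to an ordinary join:
-- Possible workaround: disable sort-merge bucket joins for the session.
SET hive.auto.convert.sortmerge.join=false;
SET hive.optimize.bucketmapjoin.sortedmerge=false;

SELECT *
FROM user_plays_buck
INNER JOIN small_user_subscription_buck
ON user_plays_buck.subscription_id = small_user_subscription_buck.subscription_id
LIMIT 1;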