How to do COUNT in Flink SQL

I'd like to do count(0) in Flink SQL, but it throws an exception like:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: SQL parse failed. UDT in DDL is not supported yet.
I don't know what's wrong; I expected the following to work fine:
INSERT INTO request_join
select requestId,count(0) from requests
GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR),requestId;
The schema of the table is:
name: request_join
schema:
- '`requestId` VARCHAR'
- '`count` LONG'
properties:
  'connector.type': 'kafka'
  'connector.version': 'universal'
  'connector.topic': 'request_join_test'
  'connector.startup-mode': 'latest-offset'
  'connector.properties.0.key': 'zookeeper.connect'
  'connector.properties.0.value': '10.XXXXXXXXX'
  'connector.properties.1.key': 'bootstrap.servers'
  'connector.properties.1.value': '10.XXXXXXXXX'
  'connector.properties.2.key': 'group.id'
  'connector.properties.2.value': 'request_join_test'
  'update-mode': 'append'
  'format.type': 'json'
  'format.json-schema': '{type: "object", properties: {requestId: {type: "string"}, count: {type: "number"}}}'
I can't find anything wrong, but it just doesn't work. If I drop the count and remove the count column from the schema, it works fine, so I'm sure the SQL itself is good.
I checked the Flink SQL documentation and it says some functions are not supported in DDL, so does Flink not support COUNT? From the examples it seems to support SUM just fine.

There is something wrong with your schema: LONG is not a Flink SQL type, which is why the DDL parser falls back to reporting an unsupported UDT. Use BIGINT instead:
schema:
- 'requestId VARCHAR'
- 'count BIGINT'
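For reference, here is a minimal sketch of the whole job with the corrected sink type, assuming a Flink 1.10-era setup (the connector/format properties are abbreviated, the source table requests with a rowtime attribute is assumed to exist, and the class name is made up); count is a reserved word, so it is escaped with backticks in both the DDL and the query:

import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.EnvironmentSettings
import org.apache.flink.table.api.scala.StreamTableEnvironment

object RequestCountJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build()
    val tEnv = StreamTableEnvironment.create(env, settings)

    // Sink table: BIGINT instead of LONG; Kafka/JSON properties shortened for the sketch.
    tEnv.sqlUpdate(
      """CREATE TABLE request_join (
        |  `requestId` VARCHAR,
        |  `count` BIGINT
        |) WITH (
        |  'connector.type' = 'kafka',
        |  'connector.version' = 'universal',
        |  'connector.topic' = 'request_join_test',
        |  'format.type' = 'json',
        |  'update-mode' = 'append'
        |)""".stripMargin)

    // COUNT itself is fine in a windowed query; only the sink column type was the problem.
    tEnv.sqlUpdate(
      """INSERT INTO request_join
        |SELECT requestId, COUNT(*) AS `count`
        |FROM requests
        |GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), requestId""".stripMargin)

    tEnv.execute("request_join count")
  }
}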

Related

Delta Table : org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'FROM'

I am trying to run the query on EMR/EMR Notebooks (Spark with Scala) -
SELECT max(version), max(timestamp) FROM (DESCRIBE HISTORY delta.`s3://a/b/c/d`)
But I am getting a parse error (the "mismatched input 'FROM'" exception from the title). The same query works fine on Databricks.
Another doubt I have: why does the colour of the s3 location change after the //?
So I broke the query apart and ran only the DESCRIBE HISTORY part, and for some reason it fails with:
Error Log -
An error was encountered:
org.apache.spark.sql.AnalysisException: Table or view not found: HISTORY;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:835)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:787)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:817)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:810)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:71)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:810)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:756)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:88)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:80)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:164)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withLocalMetrics(Analyzer.scala:104)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:155)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:126)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:125)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:125)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:80)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:630)
at org.apache.spark.sql.execution.command.DescribeColumnCommand.run(tables.scala:714)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:196)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:196)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3391)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3390)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:196)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:81)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:644)
... 50 elided
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'history' not found in database 'default';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:81)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:84)
at org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:141)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:723)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:723)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:98)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:722)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:138)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:706)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:832)
UPDATE (18-Feb-2021): What I have tried so far.
Query using Spark SQL:
spark.sql("SELECT max(version), max(timestamp) FROM (DESCRIBE HISTORY delta.s3://a/b/c/d)")
But this didn't work; same error.
Creating the Spark session with
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
and spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog.
But it's throwing the same error.
UPDATE 2 (18-Feb-2021): Trying the approach mentioned by @alex, using PySpark. It was working partly, but not completely.
Thanks in advance.
Per the documentation, to get support for DESCRIBE HISTORY you need to configure the Spark SQL extensions and catalog by passing two properties (see the docs and the sketch after this list):
spark.sql.extensions to value io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog to value org.apache.spark.sql.delta.catalog.DeltaCatalog
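For example, a minimal sketch of building the session with those two properties and then aggregating over the history (this assumes Spark 3.x with a matching delta-core artifact on the classpath; the path is the one from the question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-history")
  // The two properties that register Delta's SQL extensions and catalog.
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

// With the extension active, DESCRIBE HISTORY parses; aggregating over the returned
// DataFrame avoids nesting the command inside a FROM clause.
val history = spark.sql("DESCRIBE HISTORY delta.`s3://a/b/c/d`")
history.selectExpr("max(version)", "max(timestamp)").show()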
Update:
For Spark 2.4.x, Delta 0.6.1 should be used, and its documentation has the following code snippet to activate the extensions:
spark.sparkContext._jvm.io.delta.sql.DeltaSparkSessionExtension() \
.apply(spark._jsparkSession.extensions())
spark = SparkSession(spark.sparkContext, spark._jsparkSession.cloneSession())

Not able to start the Ignite server through Java code

I am using Ignite native persistence with atomicity set to TRANSACTIONAL_SNAPSHOT. When I try to load the old storage, which was configured with atomicity TRANSACTIONAL, it gives the "Unknown page type" issue even after deleting the .dat file, but if I use new storage it works fine. Can anybody help me?
org.h2.jdbc.JdbcSQLException: General error: "java.lang.IllegalStateException: Unknown page type: 10009 pageId: 0002ffff00000006"; SQL statement:
CREATE TABLE "DFM"."ANSWER_TYPE_ENUM" (_KEY VARCHAR INVISIBLE NOT NULL,_VAL OTHER INVISIBLE,"ID" VARCHAR,"ENUM_VALUE" VARCHAR) engine "org.apache.ignite.internal.processors.query.h2.H2TableEngine" [50000-197]
I've never seen errors like these, but I would say that TRANSACTIONAL_SNAPSHOT is experimental and should be avoided for now.
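For what it's worth, here is a minimal sketch of starting a persistent node with plain TRANSACTIONAL atomicity, in line with that advice (Ignite's Java API called from Scala; the cache name is only illustrative):

import org.apache.ignite.Ignition
import org.apache.ignite.cache.CacheAtomicityMode
import org.apache.ignite.configuration.{CacheConfiguration, DataStorageConfiguration, IgniteConfiguration}

object StartIgniteNode {
  def main(args: Array[String]): Unit = {
    val storageCfg = new DataStorageConfiguration()
    storageCfg.getDefaultDataRegionConfiguration.setPersistenceEnabled(true)

    // TRANSACTIONAL, not the experimental TRANSACTIONAL_SNAPSHOT (MVCC) mode.
    val cacheCfg = new CacheConfiguration[String, AnyRef]("ANSWER_TYPE_ENUM")
      .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)

    val cfg = new IgniteConfiguration()
      .setDataStorageConfiguration(storageCfg)
      .setCacheConfiguration(cacheCfg)

    val ignite = Ignition.start(cfg)
    ignite.cluster().active(true) // activate the cluster so the persistent storage is usable
  }
}

Tables created through SQL take the mode in the WITH clause instead, e.g. CREATE TABLE ... WITH "atomicity=transactional".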

Spring Boot - Hibernate - Import Date from SQL Query

I am currently building a REST API with Spring Data (and Boot).
I have an SQL dump from an H2 database, which is accessed by Hibernate.
My Application.yaml:
spring:
  profiles: "dev"
  datasource:
    data: classpath:/data_api.sql
  jpa:
    hibernate:
      naming:
        physical-strategy: org.hibernate.boot.model.naming.PhysicalNamingStrategyStandardImpl
      ddl-auto: create-drop
The SQL query looks like this:
Insert into TABLE (name, number, date) values ("James", 123, to_date('28-JUL-17','DD-MON-RR'))
Information.java looks like this:
@Entity
@Table(name="TABLE")
@Data
public class Information {
    @Column(name="name")
    String name;
    @Column(name="number")
    Integer number;
    @Column(name="date")
    String date;
}
When I try to run the API I get the following exception:
Caused by: org.h2.jdbc.JdbcSQLException: Function "TO_DATE": Invalid date format:
"Tried to parse one of '[Jul, Feb, Apr, Jun, Aug, Mai, Nov, Jan, Dez, Okt, Mär, Sep]' but failed (may be an internal error?). Details:
TO_DATE('16-MAR-17', 'DD-MON-RR')
^ , ^ <-- Parsing failed at this point"; SQL statement
My guess is that Hibernate uses some locale where the months are configured in German. Does anybody know how to change that?
Another way would be to tell Hibernate to ignore the SQL function to_date() and just read the field as a string. I tried writing @org.hibernate.annotations.Type(type = "text") over the @Column annotation, but it doesn't work :(
EDIT: When I change the month names in the import queries to German, e.g. OCT --> OKT and DEC --> DEZ, it works. So it seems that Hibernate is using my local language settings for the mapping. Is there a way to change this to English?
If anybody else runs into this problem: I figured it out. Hibernate reads the locale settings, so a single line in the main application solves it:
Locale.setDefault(new Locale("en", "US"));
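In context, that line has to run before the application context starts (and therefore before the data_api.sql import is executed). A minimal sketch of the entry point (written in Scala only to match the other snippets here; the class name is made up, and the same two lines work in a Java main):

import java.util.Locale

import org.springframework.boot.SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication

@SpringBootApplication
class ApiApplication

object ApiApplication {
  def main(args: Array[String]): Unit = {
    // Force an English locale before Spring Boot runs data_api.sql,
    // so H2's TO_DATE accepts MAR/OCT/DEC instead of expecting MÄR/OKT/DEZ.
    Locale.setDefault(new Locale("en", "US"))
    SpringApplication.run(classOf[ApiApplication], args: _*)
  }
}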

liquibase-hibernate shows all tables as "unexpected"

I followed these steps to get liquibase-hibernate working. I hope I correctly understood the instructions in the wiki.
Our hibernate entities are declared in the file applicationContext.xml. We do not have a hibernate.cfg.xml. My liquibase properties are:
url=jdbc:postgresql://localhost:1234/MY_DATABASE
username=user
password=pass
referenceUrl=hibernate:spring:somePackage?dialect=org.hibernate.dialect.PostgreSQLDialect
The thing is, no matter what I enter as somePackage, Liquibase shows everything (tables, columns, constraints) as "unexpected". Liquibase "finds" somePackage even if it does not exist.
liquibase diff
INFO 09.08.17 10:41: liquibase-hibernate: Reading hibernate configuration hibernate:spring:somePackage?dialect=org.hibernate.dialect.PostgreSQLDialect
INFO 09.08.17 10:41: liquibase-hibernate: Found package somePackage
And the comparison result looks like this:
Reference Database: null @ hibernate:spring:somePackage?dialect=org.hibernate.dialect.PostgreSQLDialect (Default Schema: HIBERNATE)
Comparison Database: postgres @ jdbc:postgresql://localhost:1234/MY_DATABASE (Default Schema: public)
Compared Schemas: HIBERNATE -> public
Product Name:
Reference: 'Hibernate'
Target: 'PostgreSQL'
Product Version:
Reference: '4.3.11.Final'
Target: '9.5.4'
Missing Catalog(s): NONE
Unexpected Catalog(s): NONE
Changed Catalog(s):
HIBERNATE
name changed from 'HIBERNATE' to 'MY_DATABASE'
Missing Column(s): NONE
[...]
Unexpected Table(s):
activityentity
addressentity
advertisemententity
advertisementusageentity
[...]
I really don't know what's going on or whether I'm doing something wrong. Any help would be appreciated.

MS SQL JDBC error on Execution exception - Invalid object name [Play 2.x scala app]

I'm using Play framework 2.x with SQL driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
I'm trying to run a simple query:
SELECT [org].[name] FROM [ref].[organisations_bak] AS org
but I get the following error:
play.api.Application$$anon$1: Execution exception[[SQLServerException: Invalid object name 'ref.organisations_bak'.]]
at play.api.Application$class.handleError(Application.scala:293) ~[play_2.10-2.2.2.jar:2.2.2]
at play.api.DefaultApplication.handleError(Application.scala:399) [play_2.10-2.2.2.jar:2.2.2]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$12$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:165) [play_2.10-2.2.2.jar:2.2.2]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$12$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:162) [play_2.10-2.2.2.jar:2.2.2]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) [scala-library-2.10.3.jar:na]
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185) [scala-library-2.10.3.jar:na]
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'ref.organisations_bak'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216) ~[sqljdbc4-4.0.2206.100.jar:na]
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515) ~[sqljdbc4-4.0.2206.100.jar:na]
at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:792) ~[sqljdbc4-4.0.2206.100.jar:na]
at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:689) ~[sqljdbc4-4.0.2206.100.jar:na]
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696) ~[sqljdbc4-4.0.2206.100.jar:na]
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715) ~[sqljdbc4-4.0.2206.100.jar:na]
I need to use the schema reference in my queries, but I can't even get a simple query like this to work in my Play app. Simple queries without schema references work fine:
SELECT name FROM organisations_bak
My Scala code looks like this:
import java.sql.ResultSet
import play.api.db.DB

DB.withConnection { conn =>
  val res = conn.createStatement.execute("SELECT [org].[name] FROM [ref].[organisations_bak] AS org")
}
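As an aside, execute discards the returned rows; a minimal sketch of the same call with executeQuery so the result set can actually be read (Play 2.2-style DB helper, with the implicit Application provided by Play.current):

import java.sql.ResultSet
import scala.collection.mutable.ListBuffer

import play.api.Play.current
import play.api.db.DB

val names: List[String] = DB.withConnection { conn =>
  val rs: ResultSet = conn.createStatement.executeQuery(
    "SELECT [org].[name] FROM [ref].[organisations_bak] AS org")
  val buf = ListBuffer.empty[String]
  while (rs.next()) buf += rs.getString("name") // read the name column row by row
  buf.toList
}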
Any help would be appreciated.
Thanks
In my case the issue was user permissions. The server I used for development had been set up in some weird way such that my user didn't have permission to access the [ref] schema.
Just as a test I switched over to an AWS RDS SQL Server instance with the default DBA (owner) user settings and everything worked.
This means that the library and the code work; it's my server that's at fault, but that's another issue.
try using
SELECT [org].[name] FROM [ref].[dbo].[organisations_bak] AS org