I'm using Solr version 3.2.
I need to get the current date in the format yyyyMMdd and then use that result in a delta query.
I've tried using this wiki: http://wiki.apache.org/solr/DataImportHandler#A_VariableResolver
${dataimporter.functions.formatDate('NOW', yyyyMMdd)}
But I get this exception:
Throwable occurred: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.EvaluatorBag$4.evaluate(EvaluatorBag.java:146)
at org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:222)
at org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:209)
at org.apache.solr.handler.dataimport.VariableResolverImpl.resolve(VariableResolverImpl.java:113)
at org.apache.solr.handler.dataimport.TemplateString.fillTokens(TemplateString.java:81)
at org.apache.solr.handler.dataimport.TemplateString.replaceTokens(TemplateString.java:75)
at org.apache.solr.handler.dataimport.VariableResolverImpl.replaceTokens(VariableResolverImpl.java:96)
at org.apache.solr.handler.dataimport.ContextImpl.replaceTokens(ContextImpl.java:256)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:84)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:262)
at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:884)
at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:284)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:178)
at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:374)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:413)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
You need to quote both arguments:
${dataimporter.functions.formatDate('NOW', 'yyyyMMdd')}
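For context, here is a hedged sketch of how the corrected call might sit in a deltaQuery in data-config.xml. The entity, table, and column names are hypothetical; only the formatDate expression comes from the question:
<!-- Hypothetical entity: "item" table whose last_modified column stores dates as yyyyMMdd -->
<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item WHERE last_modified >= '${dataimporter.functions.formatDate('NOW', 'yyyyMMdd')}'"/>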
I have a column in my Databricks table with a customised date-time format stored as a string. While trying to convert the string to datetime I am observing the error below.
SQL command:
select to_date(ORDERDATE, 'M/dd/yyyy H:mm') from sales_kaggle_chart limit 10;
The format of the ORDERDATE column is M/dd/yyyy H:mm; example values are 10/10/2003 0:00 and 8/25/2003 0:00.
Complete error message:
Job aborted due to stage failure: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0:
Fail to parse '5/7/2003' in the new parser. You can set "legacy_time_parser_policy" to "LEGACY" to restore the behavior before Spark 3.0, or set to "CORRECTED" and treat it as an invalid datetime string.
Note: the same command works for a single value:
SELECT to_date("12/24/2003 0:00", 'M/d/yyyy H:mm') as date;
Have you tried setting the legacy parser policy, as the error message hints?
SET legacy_time_parser_policy = legacy;
SELECT to_date(ORDERDATE, 'M/dd/yyyy H:mm') FROM sales_kaggle_chart LIMIT 10;
This error is quite common, and adjusting configuration typically does the job.
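Alternatively, you may be able to keep the new parser. It is strict: 'dd' requires a two-digit day, which fails on values like '5/7/2003', while single-letter 'd' (as in your working single-value test) accepts one or two digits. A hedged sketch against the same table:
-- Sketch: single-letter d accepts both 1- and 2-digit days under the Spark 3 parser
select to_date(ORDERDATE, 'M/d/yyyy H:mm') from sales_kaggle_chart limit 10;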
I am facing an error while converting a string to datetime format in Databricks:
select to_date('01Jan1971:00:00:00','DDMONYYYY:HH:MI:SS')
Error in SQL statement: IllegalArgumentException: All week-based patterns are unsupported since Spark 3.0, detected: Y, Please use the SQL function EXTRACT instead
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: java.lang.IllegalArgumentException: All week-based patterns are unsupported since Spark 3.0, detected: Y, Please use the SQL function EXTRACT instead
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$.$anonfun$convertIncompatiblePattern$4(DateTimeFormatterHelper.scala:323)
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$.$anonfun$convertIncompatiblePattern$4$adapted(DateTimeFormatterHelper.scala:321)
This command worked; the Oracle-style tokens (DD, MON, YYYY, MI) map to the Java-style pattern ddMMMyyyy:HH:mm:ss, and yyyy avoids the unsupported week-based Y:
select to_date(upper('01Jan1971:00:00:00'),'ddMMMyyyy:HH:mm:ss')
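If you also need the time-of-day (to_date discards it), to_timestamp should take the same corrected pattern; a hedged sketch:
-- Sketch: same corrected pattern, but keeping the time component
select to_timestamp(upper('01Jan1971:00:00:00'), 'ddMMMyyyy:HH:mm:ss')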
I am trying to run this query on EMR / EMR Notebooks (Spark with Scala):
SELECT max(version), max(timestamp) FROM (DESCRIBE HISTORY delta.`s3://a/b/c/d`)
But I am getting an error, while the same query works fine on Databricks.
Another doubt I have: why does the colour of the s3 location change after the //?
So I tried breaking the query apart and running only the DESCRIBE HISTORY part, and for some reason that fails too.
Error log:
An error was encountered:
org.apache.spark.sql.AnalysisException: Table or view not found: HISTORY;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:835)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:787)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:817)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:810)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:71)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:810)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:756)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1$$anonfun$2.apply(RuleExecutor.scala:92)
at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:88)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:80)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:164)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$execute$1.apply(Analyzer.scala:156)
at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withLocalMetrics(Analyzer.scala:104)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:155)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:126)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:125)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:125)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:80)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:630)
at org.apache.spark.sql.execution.command.DescribeColumnCommand.run(tables.scala:714)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:196)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:196)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3391)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3390)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:196)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:81)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:644)
... 50 elided
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'history' not found in database 'default';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:81)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:81)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:84)
at org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:141)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:723)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:723)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:98)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:722)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:138)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:706)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:832)
UPDATE (18-Feb-2021): What I have tried so far.
Query using Spark SQL:
spark.sql("SELECT max(version), max(timestamp) FROM (DESCRIBE HISTORY delta.`s3://a/b/c/d`)")
But this didn't work; same error.
Created the Spark session with:
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
But it's throwing the same error.
UPDATE 2 (18-Feb-2021): Trying the approach mentioned by @alex, using PySpark. It was partly working, but not completely.
Thanks in advance.
Per the documentation, to get support for DESCRIBE HISTORY you need to configure the Spark SQL extensions and catalog by passing two properties:
spark.sql.extensions to value io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog to value org.apache.spark.sql.delta.catalog.DeltaCatalog
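For example, a minimal PySpark sketch of passing both properties at session creation (this assumes Spark 3.x, where the catalog plugin property is supported, and that the delta-core package is on the classpath):
# Sketch: build a session with the two Delta properties set
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())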
Update:
For Spark 2.4.x, Delta 0.6.1 should be used, and its documentation has the following code snippet to activate the extensions:
# Activate the Delta SQL extensions on the current JVM session
spark.sparkContext._jvm.io.delta.sql.DeltaSparkSessionExtension() \
    .apply(spark._jsparkSession.extensions())
# Clone the session so the registered extensions take effect
spark = SparkSession(spark.sparkContext, spark._jsparkSession.cloneSession())
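With the extensions registered by either route, the original query should then run; a hedged usage sketch keeping the questioner's placeholder path:
# Hypothetical usage once Delta's SQL extensions are active
spark.sql("SELECT max(version), max(timestamp) FROM (DESCRIBE HISTORY delta.`s3://a/b/c/d`)").show()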
I am using ManifoldCF v2.7.1 and Solr v5.2.1, trying to crawl Jira using the Jira connector, and am getting the following error in ManifoldCF:
Error: Repeated service interruptions - failure processing document:
Error from server at (servername:port/solr/jira): String index out of range: -11
Note: I removed my server and port info from the error message
One of the error logs from Solr is showing the following at the top of the stacktrace:
java.lang.StringIndexOutOfBoundsException: String index out of range: -11
at org.apache.solr.request.macro.MacroExpander._expand(MacroExpander.java:144)
I don't know what is causing this error or how to fix it. Thanks in advance!
It turns out there was a Jira issue with Java code written in its comments section. I think it wasn't being escaped properly by ManifoldCF. To avoid this, I excluded that one problematic issue from future crawls.
I am using the kibi-community-demo-full-4.6.4-linux-x64 version.
In the datasource:
"connection_string": "jdbc:hive://localhost:10000/root",
"libpath": "/home/pare/Downloads/jar/",
"drivername": "org.apache.hadoop.hive.jdbc.HiveDriver",
"libs": "hive-jdbc-0.11.0.jar,hive-metastore-0.11.0.jar,libthrift-0.9.1.jar,hive-service-0.13.1.jar,hive-jdbc-1.2.1.2.3.2.0-2950-standalone.jar,hadoop-common-2.7.1.2.3.2.0-2950.jar",
After that, when I write a query in the Queries editor, it shows an error like:
Queries Editor: Error 400 Bad Request: Error running static method java.lang.IllegalArgumentException: Bad URL format at org.apache.hive.jdbc.Utils.parseURL(Utils.java:185) at org.apache.hive.jdbc.HiveConnection.&lt;init&gt;(HiveConnection.java:84)
What does this error mean? Can anyone explain how to solve it?
I was able to connect after changing the jar versions, and I also changed the driver name to "org.apache.hive.jdbc.HiveDriver".
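For reference, the org.apache.hive.jdbc.HiveDriver (HiveServer2) driver expects the jdbc:hive2:// URL scheme, so the working datasource presumably ended up looking roughly like this (a sketch; the jar file names depend on your Hive distribution):
"connection_string": "jdbc:hive2://localhost:10000/root",
"libpath": "/home/pare/Downloads/jar/",
"drivername": "org.apache.hive.jdbc.HiveDriver",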