Apache Spark Catalyst Parser SQL Exception

The following Scala code (you can run it in a Scala worksheet)
import org.apache.spark.sql.catalyst.parser._
import org.apache.spark.sql.internal.SQLConf
val sqlParser = new CatalystSqlParser(SQLConf.get)
val query = """select col1 from table1;"""
//import sqlParser.astBuilder
val parsed = sqlParser.parseExpression(query)
//println(astBuilder.toString)
println(s"parsed: ${parsed.prettyJson}")
throws what looks like an absurd error:
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'from' expecting {<EOF>, '-'}(line 1, pos 12)
== SQL ==
select col1 from table1;
------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:133)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseExpression(ParseDriver.scala:49)
... 37 elided
Has anybody seen this before? I saw the error message on SO, but this is a very simple query, and it shouldn't be erroring out this way.

I am not familiar with calling CatalystSqlParser directly, but the SparkSession.sql method seems happy with your query. Perhaps using that suits your needs?
The following is successfully parsed:
val query = """select col1 from table1;"""
val df = spark.sql(query)
(CatalystSqlParser doesn't appear to be part of the documented API).
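If you do need to call the Catalyst parser directly, note that parseExpression parses a single expression (e.g. col1 + 1), not a whole statement, which is why it trips over from. A minimal sketch using parsePlan instead, assuming a Spark 2.x build where CatalystSqlParser takes a SQLConf:
import org.apache.spark.sql.catalyst.parser._
import org.apache.spark.sql.internal.SQLConf
val sqlParser = new CatalystSqlParser(SQLConf.get)
// parsePlan accepts a full statement and returns an (unresolved) LogicalPlan
// (trailing semicolon dropped; parsePlan expects a single statement)
val plan = sqlParser.parsePlan("select col1 from table1")
println(s"parsed: ${plan.prettyJson}")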

Related

Passing table name and list of values as argument to psycopg2 query

Context
I would like to pass a table name along with query parameters in a psycopg2 query in a python3 function.
If I understand correctly, I should not format the query string with Python's .format() method before executing the query, but let psycopg2 do the substitution.
Issue
I can't manage to pass both the table name and the parameters as arguments to my query.
Code sample
Here is a code sample:
import psycopg2
from psycopg2 import sql
connection_string = "host={} port={} dbname={} user={} password={}".format(*PARAMS.values())
conn = psycopg2.connect(connection_string)
curs = conn.cursor()
table = 'my_customers'
cities = ["Paris", "London", "Madrid"]
data = (table, tuple(cities))
query = sql.SQL("SELECT * FROM {} WHERE city = ANY (%s);")
curs.execute(query, data)
rows = curs.fetchall()
Error(s)
But I get the following error message:
TypeError: not all arguments converted during string formatting
I also tried to replace the data definition by:
data = (sql.Identifier(table), tuple(cities))
But then this error pops:
ProgrammingError: can't adapt type 'Identifier'
If I put ANY {} instead of ANY (%s) in the query string, in both previous cases this error shows:
SyntaxError: syntax error at or near "{"
LINE 1: ...* FROM {} WHERE c...
^
Initially, I didn't use the sql module and was passing the data as the second argument to the curs.execute() method, but then the table name ended up single-quoted in the command, which caused trouble. So I gave the sql module a try, hoping it's not a deprecated habit.
If possible, I would like to keep the curly braces {} for parameter substitution instead of %s, unless that's a bad idea.
Environment
Ubuntu 18.04 64 bit 5.0.0-37-generic x86_64 GNU/Linux
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
psycopg2.__version__
'2.8.4 (dt dec pq3 ext lo64)'
You want something like the following. sql.Identifier(table) fills the {} placeholder via .format(), while the %s placeholder is left for execute() to bind; psycopg2 adapts a Python list to a PostgreSQL array, so = ANY (%s) works with cities passed directly:
table = 'my_customers'
cities = ["Paris", "London", "Madrid"]
query = sql.SQL("SELECT * FROM {} WHERE city = ANY (%s)").format(sql.Identifier(table))
curs.execute(query, (cities,))
rows = curs.fetchall()

Passing a parameter to a sql query using pyodbc failing

I have read dozens of similar posts and tried everything, but I still get an error message when trying to pass a parameter to a simple query using pyodbc. Apologies if there is an answer to this elsewhere, but I cannot find it.
I have a very simple table:
select * from Test
yields
a
b
c
This works fine:
import pyodbc
import pandas
connection = pyodbc.connect('DSN=HyperCube SYSTEST',autocommit=True)
result = pandas.read_sql("""select * from Test where value = 'a'""",connection,params=None)
print(result)
result:
value
0 a
However if I try to do the where clause with a parameter it fails
result = pandas.read_sql("""select * from Test where value = ?""",connection,params='a')
yields
Error: ('01S02', '[01S02] Unknown column/parameter value (9001) (SQLPrepare)')
I also tried this
cursor = connection.cursor()
cursor.execute("""select * from Test where value = ?""",['a'])
pyodbcResults = cursor.fetchall()
and still received the same error
Does anyone know what is going on? Could it be an issue with the database I am querying?
PS: I looked at the following post, and the syntax in the first part of answer 9, where dates are passed as strings, looks identical to what I am doing:
pyodbc 'the sql contains 0 parameter markers but 1 parameters were supplied' 'hy000'
Thanks
pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None) (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html)
params : list, tuple or dict, optional, default: None
Note that params must be a list, tuple or dict, not a bare string. Example:
cursor.execute("select * from Test where value = %s", ['a'])
Or, with named arguments:
result = pandas.read_sql('select * from Test where value = %(par)s',
                         db, params={"par": 'p'})
In pyodbc, write the parameters directly after the sql argument:
cursor.execute(sql, *parameters)
for example:
onepar = 'a'
cursor.execute("select * from Test where value = ?", onepar)
cursor.execute("select a from tbl where b=? and c=?", x, y)

Spark sql SQLContext

I'm trying to select data from an MSSQL database via SQLContext.sql in a Spark application.
The connection works, but I'm not able to select data from the table, because it always fails on the table name.
Here is my code:
val prop=new Properties()
val url2="jdbc:jtds:sqlserver://servername;instance=MSSQLSERVER;user=sa;password=Pass;"
prop.setProperty("user","username")
prop.setProperty("driver" , "net.sourceforge.jtds.jdbc.Driver")
prop.setProperty("password","mypassword")
val test=sqlContext.read.jdbc(url2,"[dbName].[dbo].[Table name]",prop)
sqlContext.sql("""
SELECT *
FROM 'dbName.dbo.Table name'
""")
I tried the table name without quotes (') and as [dbName].[dbo].[Table name], but it's still the same:
Exception in thread "main" java.lang.RuntimeException: [3.14] failure:
``union'' expected but `.' found
dependencies:
// https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.1" //%"provided"
// https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector_2.10
libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.6.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.1" //%"provided"
I think the problem in your code is that the query you pass to sqlContext has no access to the original table in the source database. It only has access to the tables saved within the SQL context, for example with df.write.saveAsTable() or with df.registerTempTable() (df.createTempView in Spark 2+).
So, in your specific case, I can suggest a couple of options:
1) If you want the query to be executed on the source database with the exact syntax of your database's SQL, you can pass the query to the "dbtable" argument:
val query = "SELECT * FROM dbName.dbo.TableName"
val df = sqlContext.read.jdbc(url2, s"($query) AS subquery", prop)
df.show
Note that the query needs to be in parentheses, because it will be passed to a "FROM" clause, as specified in the docs:
dbtable: The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.
2) If you don't need to run the query on the source database, you can just pass the table name and then create a temp view in the sqlContext:
val table = sqlContext.read.jdbc(url2, "dbName.dbo.TableName", prop)
table.registerTempTable("temp_table")
val df = sqlContext.sql("SELECT * FROM temp_table")
// or sqlContext.table("temp_table")
df.show()
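For reference, the same flow on Spark 2+ would use createOrReplaceTempView; a sketch, assuming a SparkSession named spark and the url2/prop values from the question:
val table = spark.read.jdbc(url2, "dbName.dbo.TableName", prop)
table.createOrReplaceTempView("temp_table")
val df = spark.sql("SELECT * FROM temp_table")
df.show()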

how to write not like queries in spark sql using scala api?

I want to convert the following query to Spark SQL using the Scala API:
select ag.part_id name from sample c join testing ag on c.part=ag.part and concat(c.firstname,c.lastname) not like 'Dummy%'
Any ideas?
Thanks in advance
Maybe this would work:
import org.apache.spark.sql.functions._
val c = sqlContext.table("sample")
val ag = sqlContext.table("testing")
val fullnameCol = concat(c("firstname"), c("lastname"))
val resultDF = c.join(ag, (c("part") === ag("part")) && !fullnameCol.like("Dummy%"))
For more information about the functions I used above, please check the following links:
org.apache.spark.sql.functions
org.apache.spark.sql.DataFrame
org.apache.spark.sql.Column
You mean this:
df.filter("field1 not like 'Dummy%'").show
or:
df.filter("field1 ! like 'Dummy%'").show
Or use the Column API like this (the 'col1 symbol syntax requires import sqlContext.implicits._):
df.filter(!'col1.like("%COND%")).show
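Putting the pieces together, a minimal end-to-end sketch of the original query (an illustration only: it assumes sample and testing are registered as tables in the sqlContext):
import org.apache.spark.sql.functions._
val c = sqlContext.table("sample")
val ag = sqlContext.table("testing")
// join condition carries both the equality and the "not like" predicate, as in the SQL
val resultDF = c.join(ag, c("part") === ag("part") && !concat(c("firstname"), c("lastname")).like("Dummy%"))
  .select(ag("part_id").as("name"))
resultDF.show()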

QueryDSL like operation SimplePath

Similarly to this question, I would like to perform an SQL "like" operation on my own user-defined type called "AccountNumber".
In the QueryDSL entity class, the field which defines the column looks like this:
public final SimplePath<com.myorg.types.AccountNumber> accountNumber;
I have tried the following code to achieve a "like" operation in SQL, but I get an error when the types are compared before the query is run:
final Path path=QBusinessEvent.businessEvent.accountNumber;
final Expression<AccountNumber> constant = Expressions.constant(AccountNumber.valueOfWithWildcard(pRegion.toString()));
final BooleanExpression booleanOperation = Expressions.booleanOperation(Ops.STARTS_WITH, path, constant);
expressionBuilder.and(booleanOperation);
The error is:
org.springframework.dao.InvalidDataAccessApiUsageException: Parameter value [7!%%] did not match expected type [com.myorg.types.AccountNumber (n/a)]
Has anyone ever been able to achieve this using the QueryDSL/JPA combination?
Did you try using a String constant instead?
Path<?> path = QBusinessEvent.businessEvent.accountNumber;
Expression<String> constant = Expressions.constant(pRegion.toString());
Predicate predicate = Expressions.predicate(Ops.STARTS_WITH, path, constant);
In the end, I was given a tip by my colleague to do the following:
if (pRegion != null) {
expressionBuilder.and(Expressions.booleanTemplate("{0} like concat({1}, '%')", qBusinessEvent.accountNumber, pRegion));
}
This seems to do the trick!
It seems like there is a bug/ambiguity here. In my case, I need to search by a couple of fields with different types (String, Number); e.g. the SQL looks like:
SELECT * FROM table AS t WHERE t.name LIKE '%some%' OR t.id LIKE '%some%';
My code looks like:
BooleanBuilder where = _getDefaultPredicateBuilder();
BooleanBuilder whereLike = new BooleanBuilder();
for (String likeField : _likeFields) {
    whereLike = whereLike.or(_pathBuilder.getString(likeField).contains(likeValue));
}
where.and(whereLike);
If the first of _likeFields is of type String, the request works fine; otherwise it throws an exception.