Confused about behavior of setResultsName in Pyparsing - sql

I am trying to parse a few SQL statements. Here is a sample:
select
ms.member_sk a,
dd.date_sk b,
st.subscription_type,
(SELECT foo FROM zoo) e
from dim_member_subscription_all p,
dim_subs_type
where a in (select moo from t10)
I am interested in getting only the tables at this time, so I would like to see
[zoo, dim_member_subscription_all, dim_subs_type] & [t10]
I have put together a small script after looking at Paul McGuire's example:
#!/usr/bin/env python
import sys
import pprint
from pyparsing import *
pp = pprint.PrettyPrinter(indent=4)
semicolon = Combine(Literal(';') + lineEnd)
comma = Literal(',')
lparen = Literal('(')
rparen = Literal(')')
update_kw, volatile_kw, create_kw, table_kw, as_kw, from_kw, \
where_kw, join_kw, left_kw, right_kw, cross_kw, outer_kw, \
on_kw, insert_kw, into_kw = \
    map(lambda x: Keyword(x, caseless=True),
        ['UPDATE', 'VOLATILE', 'CREATE', 'TABLE', 'AS', 'FROM',
         'WHERE', 'JOIN', 'LEFT', 'RIGHT',
         'CROSS', 'OUTER', 'ON', 'INSERT', 'INTO'])
select_kw = Keyword('SELECT', caseless=True) | Keyword('SEL', caseless=True)
reserved_words = (update_kw | volatile_kw | create_kw | table_kw | as_kw |
                  select_kw | from_kw | where_kw | join_kw |
                  left_kw | right_kw | cross_kw | on_kw | insert_kw |
                  into_kw)
ident = ~reserved_words + Word(alphas, alphanums + '_')
table = Combine(Optional(ident + Literal('.')) + ident)
column = Combine(Optional(ident + Literal('.')) + (ident | Literal('*')))
column_alias = Optional(Optional(as_kw).suppress() + ident)
table_alias = Optional(Optional(as_kw).suppress() + ident).suppress()
select_stmt = Forward()
nested_table = lparen.suppress() + select_stmt + rparen.suppress() + table_alias
table_list = delimitedList((nested_table | table) + table_alias)
column_list = delimitedList((nested_table | column) + column_alias)
txt = """
select
ms.member_sk a,
dd.date_sk b,
st.subscription_type,
(SELECT foo FROM zoo) e
from dim_member_subscription_all p,
dim_subs_type
where a in (select moo from t10)
"""
select_stmt << select_kw.suppress() + column_list + from_kw.suppress() + \
    table_list.setResultsName('tables', listAllMatches=True)
print txt
for token in select_stmt.searchString(txt):
    pp.pprint(token.asDict())
I am getting the following nested output. Can anybody please help me understand what I am doing wrong?
{ 'tables': ([(['zoo'], {}), (['dim_member_subscription_all', 'dim_subs_type'], {})], {})}
{ 'tables': ([(['t10'], {})], {})}

searchString will return a list of all matching ParseResults - you can see the tables value of each using:
for token in select_stmt.searchString(txt):
    print token.tables
Giving:
[['zoo'], ['dim_member_subscription_all', 'dim_subs_type']]
[['t10']]
So searchString found two SELECT statements.
Recent versions of pyparsing support summing this list into a single consolidated result using the Python builtin sum. Accessing the tables value of this consolidated result looks like this:
print sum(select_stmt.searchString(txt)).tables
[['zoo'], ['dim_member_subscription_all', 'dim_subs_type'], ['t10']]
I think the parser is doing all you want; you just need to figure out how to process the returned results.
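For instance, here is a minimal sketch (using the grammar above) that flattens each match's tables value into the flat lists you asked for:
for token in select_stmt.searchString(txt):
    # each entry in token.tables is one group of table names
    print [name for group in token.tables for name in group]
This prints ['zoo', 'dim_member_subscription_all', 'dim_subs_type'] for the first statement and ['t10'] for the second.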
For further debugging, you should start using the dump method on ParseResults to see what you are getting, which will print the nested list of returned tokens, and then a hierarchical tree of all named results. For your example:
for token in select_stmt.searchString(txt):
    print token.dump()
    print
prints:
['ms.member_sk', 'a', 'dd.date_sk', 'b', 'st.subscription_type', 'foo', 'zoo', 'dim_member_subscription_all', 'dim_subs_type']
- tables: [['zoo'], ['dim_member_subscription_all', 'dim_subs_type']]
['moo', 't10']
- tables: [['t10']]

Related

karate.exec() breaks the argument by space when passed via table

When I pass text or a string as a variable from a table to a feature, karate.exec() breaks the argument on spaces for some reason.
I have a main feature where the code is:
#Example 1
* def calcModel = '":: decimal calcModel = get_calc_model();"'
#Example 2
* text calcModel =
"""
:: decimal calcModel = get_calc_model();
return calcModel;
"""
* table calcDetails
| field | code | desc |
| 31 | '":: return get_name();"' | '"this is name"' |
| 32 | calcModel | '"this is the calc model"' |
* call read('classpath:scripts/SetCalcModel.feature') calcDetails
Inside SetCalcModel.feature the code is:
* def setCalcModel = karate.exec('/opt/local/SetCalcModel.sh --timeout 100 -field ' + field + ' -code ' + code + ' -description '+desc)
For row 1 of the table it works fine and executes the following command:
command: [/opt/local/SetCalcModel.sh, --timeout, 100, -field, 31, -code, :: decimal calcModel = get_calc_model();, -description, this is the calc model], working dir: null
For row 2 it breaks, producing the following command:
command: [/opt/local/SetCalcModel.sh, --timeout, 100, -field, 32, -code, ::, decimal, calcModel, =, get_calc_model();, -description, this is the calc model], working dir: null
I have tried this with both example 1 and example 2, and it keeps doing the same thing.
I have also tried passing inline JSON as an argument to karate.exec(); that has the same issue.
Is there a workaround here?
There is a way to pass arguments as an array of strings; use that approach instead.
For example:
* karate.exec({ args: [ 'curl', 'https://httpbin.org/anything' ] })
Refer: https://stackoverflow.com/a/73230200/143475
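Applied to your script, that would look something like the sketch below (the flags are taken from your question); each array element is passed to the process as-is, so embedded spaces survive:
* def setCalcModel = karate.exec({ args: ['/opt/local/SetCalcModel.sh', '--timeout', '100', '-field', field, '-code', code, '-description', desc] })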

Create a select with a struct within a list in PySpark

I have the following DataFrame view df_view:
+---+-----+
| b | c |
+---+-----+
| 1 | 3 |
+---+-----+
I need to select this data to form a key whose value is a list of structs:
{
"a": [
{
"b": 1,
"c": 3
}
]
}
The select below creates only a struct, not the list:
df = spark.sql(
'''
SELECT
named_struct(
'b', b,
'c', c
) AS a
FROM df_view
'''
)
And after that I'll save to the database:
df.write \
    .mode("overwrite") \
    .format("com.microsoft.azure.cosmosdb.spark") \
    .options(**cosmosConfig) \
    .save()
How is it possible to create a struct inside a list in SQL?
You can wrap the struct in array():
df = spark.sql(
'''
SELECT
array(named_struct(
'b', b,
'c', c
)) AS a
FROM df_view
'''
)
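As a quick sanity check (a sketch, assuming the df above), converting the first row to JSON shows the desired shape:
print(df.toJSON().first())
# {"a":[{"b":1,"c":3}]}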

Adding a new row to an empty dataframe located in a data lake

I created an empty Delta table at a data lake location using the code below:
deltaResultPath = "/ml/streaming-analysis/delta/Result"
# Create Delta Lake table
sqlq = "CREATE TABLE stockDailyPrices_delta USING DELTA LOCATION '" + deltaResultPath + "'"
spark.sql(sqlq)
I am new to Spark and do not fully understand Spark SQL. Instead of inserting values from another DataFrame, I would like to add values generated in a Python script.
Something like modifying the code from:
insert_sql = "insert into stockDailyPrices_delta select f.* from stockDailyPrices f where f.price_date >= '" + price_date_min.strftime('%Y-%m-%d') + "' and f.price_date <= '" + price_date_max.strftime('%Y-%m-%d') + "'"
spark.sql(insert_sql)
to
Time = 10
cpu_temp = 3
dsp_temp = 5
insert_sql = "insert into df (Time, cpu_temp, dsp_temp) values (%s, %s, %s)"
spark.sql(insert_sql)
However, I see the following error:
org.apache.spark.sql.catalyst.parser.ParseException:
ParseException: "\nmismatched input 'Time' expecting {'(', 'SELECT', 'FROM', 'DESC', 'VALUES', 'TABLE', 'INSERT', 'DESCRIBE', 'MAP', 'MERGE', 'UPDATE', 'REDUCE'}(line 1, pos 16)\n\n== SQL ==\ninsert into df (Time, cpu_temp, dsp_temp) values (%s, %s, %s)\n----------------^^^\n"
How can I fix this code?
I could make it work with something like this:
spark.sql("insert into Result_delta select {} as Time, {} as cpu_temp, {} as dsp_temp".format(Time, cpu_temp, dsp_temp))

Karate - Can I send multiple dynamic data sets in a Scenario Outline?

Below is the code:
Feature:
Background:
* def Json = Java.type('Json')
* def dq = new Json()
* def result = dq.makeJson()
* def Sku = dq.makeSku()
Scenario Outline: id : <id>
* print '<id>' #From result
* print '<abc>' #From Sku
Examples:
|result|Sku|
The following is the output I need. Is it possible in Karate?
If I have id = {1,2} and abc = {3,4}, I want the output to be:
id = 1 and abc = 3
id = 1 and abc = 4
id = 2 and abc = 3
id = 2 and abc = 4
Also, can this be done for more than 2 variable inputs as well?
Write the permutation logic yourself and build an array with the results; a sketch follows below.
Note that you can iterate over the key-value pairs of a JSON using karate.forEach().
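For example, the permutation could be built with a small JavaScript helper (a sketch, assuming the id and abc values arrive as plain arrays):
* def ids = [1, 2]
* def abcs = [3, 4]
* def permute =
"""
function(ids, abcs) {
  var out = [];
  for (var i = 0; i < ids.length; i++) {
    for (var j = 0; j < abcs.length; j++) {
      out.push({ id: ids[i], abc: abcs[j] });
    }
  }
  return out;
}
"""
* def array = permute(ids, abcs)
# array is now [{ id: 1, abc: 3 }, { id: 1, abc: 4 }, { id: 2, abc: 3 }, { id: 2, abc: 4 }]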
Then either use a data-driven loop call (of a second feature file):
# array can be [{ id: 1, abc: 3 }, {id: 1, abc: 4 }] etc
* def result = call read('second.feature') array
Or a dynamic scenario outline:
Scenario Outline:
* print 'id = <id> and abc = <abc>'
Examples:
| array |
Refer:
https://github.com/intuit/karate#data-driven-features
https://github.com/intuit/karate#dynamic-scenario-outline

In Karate, how can we verify that a query with a where condition has two results?

I have a scenario where a SQL query with a where condition returns 2 rows. How can I assert that it returns 2 rows? At present, Karate throws the error org.springframework.dao.IncorrectResultSizeDataAccessException: Incorrect result size: expected 1, actual 2
* def response = db.readRow( 'SELECT * from database_name.table_name where id = \"'+ id + '\";')
I believe this should help you: https://github.com/intuit/karate#schema-validation
* def foo = ['bar', 'baz']
# should be an array of size 2
* match foo == '#[2]'
Also, you should use db.readRows instead of db.readRow.
* def dogs = db.readRows('SELECT * FROM DOGS')
* match dogs contains { ID: '#(id)', NAME: 'Scooby' }
* def dog = db.readRow('SELECT * FROM DOGS D WHERE D.ID = ' + id)
* match dog.NAME == 'Scooby'
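Putting the two together for your original query, a sketch like this should assert the row count (reusing the SQL from your question; quote the id value if it is a string):
* def rows = db.readRows("SELECT * from database_name.table_name where id = '" + id + "'")
* match rows == '#[2]'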