How to pass a dynamic variable in a Scenario Outline in Karate DSL

I have a situation where I need to pass different kinds of date values in Karate.
For this, I created a Java method and am calling it in a feature file as shown below.
I read that this is a Cucumber limitation, which does not support dynamic variables in a Scenario Outline. I also read https://github.com/intuit/karate#the-karate-way but somehow I cannot work out how to solve the situation below.
Scenario Outline: test scenario outline
* def testData = Java.type('zoomintegration.utils.DataGenerator')
* def meetDate = testData.futureDate(2)
* def jsonPayLoad =
  """
  {
    "meetingSource": <meetingSource>,
    "hostId": <host>,
    "topic": <topic>,
    "agenda": <topic>,
    "startDateTime": <meetingDate>",
    "timeZone": "Asia/Calcutta",
    "duration": <duration>
  }
  """
* print jsonPayLoad
Examples:
  | meetingSource | host | topic           | duration | meetingDate  |
  | ZOOM          | abc  | Quarter meeting | 30       | 0            |
  | SKYPE         | abc  | Quarter meeting | 30       | '1980-08-12' |
  | MS            | abc  | Quarter meeting | 30       | '2030-12-12' |

The code below works for me:
Scenario Outline: test scenario outline
* def testData = Java.type('zoomintegration.utils.DataGenerator')
* def meetDate = testData.futureDate(<meetingDate>)
* def jsonPayLoad =
  """
  {
    "meetingSource": <meetingSource>,
    "hostId": <host>,
    "topic": <topic>,
    "agenda": <topic>,
    "startDateTime": #(meetDate),
    "timeZone": "Asia/Calcutta",
    "duration": <duration>
  }
  """
* print jsonPayLoad
Examples:
  | meetingSource | host | topic           | duration | meetingDate |
  | ZOOM          | abc  | Quarter meeting | 30       | 1           |
  | SKYPE         | abc  | Quarter meeting | 30       | 2           |
  | MS            | abc  | Quarter meeting | 30       | 3           |

Feature: test something

Scenario Outline: test scenario outline
* def testData = Java.type('zoomintegration.utils.DataGenerator')
* def meetDate = testData.futureDate(2)
* def jsonPayLoad =
  """
  {
    "meetingSource": <meetingSource>,
    "hostId": <host>,
    "topic": <topic>,
    "agenda": <topic>,
    "startDateTime": <meetingDate>,
    "timeZone": "Asia/Calcutta",
    "duration": <duration>
  }
  """
* eval if (jsonPayLoad.startDateTime == 0) jsonPayLoad.startDateTime = meetDate
* print jsonPayLoad
Examples:
  | meetingSource | host | topic           | duration | meetingDate  |
  | ZOOM          | abc  | Quarter meeting | 30       | 0            |
  | SKYPE         | abc  | Quarter meeting | 30       | '1980-08-12' |
  | MS            | abc  | Quarter meeting | 30       | '1999-08-12' |

You must be missing something, and it looks like you have a few typos.
Let's take a simple example that works for me:
Feature:

Background:
* def fun = function(x){ return 'hello ' + x }

Scenario Outline:
* match fun(name) == 'hello foo'
Examples:
  | name |
  | foo  |
So the point is - you can plug in a function that uses data from your Examples table to dynamically generate more data.
If you are still stuck, please follow this process: https://github.com/intuit/karate/wiki/How-to-Submit-an-Issue

Related

karate.exec() breaks the argument on spaces when passed via a table

When I pass text or a string as a variable from a table to a feature, karate.exec() breaks the argument up on spaces for some reason.
I have a main feature where the code is:
# Example 1
* def calcModel = '":: decimal calcModel = get_calc_model();"'

# Example 2
* text calcModel =
  """
  :: decimal calcModel = get_calc_model();
  return calcModel;
  """

* table calcDetails
  | field | code                      | desc                       |
  | 31    | '":: return get_name();"' | '"this is name"'           |
  | 32    | calcModel                 | '"this is the calc model"' |

* call read('classpath:scripts/SetCalcModel.feature') calcDetails
Inside SetCalcModel.feature the code is:
* def setCalcModel = karate.exec('/opt/local/SetCalcModel.sh --timeout 100 -field ' + field + ' -code ' + code + ' -description '+desc)
For row 1 of the table it works fine and executes the following command:
command: [/opt/local/SetCalcModel.sh, --timeout, 100, -field, 31, -code, :: decimal calcModel = get_calc_model();, -description, this is the calc model], working dir: null
For row 2 it breaks, with the following command:
command: [/opt/local/SetCalcModel.sh, --timeout, 100, -field, 32, -code, ::, decimal, calcModel, =, get_calc_model();, -description, this is the calc model], working dir: null
I have tried this with both Example 1 and Example 2, and it keeps doing the same thing.
I have also tried passing inline JSON as the argument to karate.exec(), which has the same issue.
Is there a workaround here?
There is a way to pass arguments as an array of strings; use that approach instead.
For example:
* karate.exec({ args: [ 'curl', 'https://httpbin.org/anything' ] })
Refer: https://stackoverflow.com/a/73230200/143475
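Applied to the call inside SetCalcModel.feature from the question, that could look roughly like the sketch below (untested, with the script path and flags copied from the question). Because each argument is its own array element, the spaces inside code and desc no longer split the command line:
* def setCalcModel = karate.exec({ args: [ '/opt/local/SetCalcModel.sh', '--timeout', '100', '-field', field, '-code', code, '-description', desc ] })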

Parsing a DataFrame column in Scala

I have a problem parsing a DataFrame:
val result = df_app_clickstream
  .withColumn(
    "attributes",
    explode(expr(raw"transform(attributes, x -> str_to_map(regexp_replace(x, '{\\}',''), ' '))"))
  )
  .select(
    col("userId"),
    col("attributes").getField("campaign_id").alias("app_campaign_id"),
    col("attributes").getField("channel_id").alias("app_channel_id")
  )
result.show()
I have input like this:
-------------------------------------------------------------------------------
| userId | attributes |
-------------------------------------------------------------------------------
| f6e8252f-b5cc-48a4-b348-29d89ee4fa9e |{'campaign_id':082,'channel_id':'Chnl'}|
-------------------------------------------------------------------------------
and I need to get output like this:
--------------------------------------------------------------------
| userId | campaign_id | channel_id |
--------------------------------------------------------------------
| f6e8252f-b5cc-48a4-b348-29d89ee4fa9e | 082 | Chnl |
--------------------------------------------------------------------
but I get an error.
You can try the solution below:
import org.apache.spark.sql.functions._

val data = Seq(
  ("f6e8252f-b5cc-48a4-b348-29d89ee4fa9e", """{'campaign_id':082, 'channel_id':'Chnl'}""")
).toDF("user_id", "attributes")

val out_df = data
  .withColumn("splitted_col", split(regexp_replace(col("attributes"), "'|\\}|\\{", ""), ","))
  .withColumn("campaign_id", split(element_at(col("splitted_col"), 1), ":")(1))
  .withColumn("channel_id", split(element_at(col("splitted_col"), 2), ":")(1))

out_df.show(truncate = false)
+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+
|user_id |attributes |splitted_col |campaign_id|channel_id|
+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+
|f6e8252f-b5cc-48a4-b348-29d89ee4fa9e|{'campaign_id':082, 'channel_id':'Chnl'}|[campaign_id:082, channel_id:Chnl]|082 |Chnl |
+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+

Transform a column with int flags to an array of strings in PySpark

I have a dataframe with a column called "traits", which is an integer composed of multiple flags.
I need to convert this column to a list of strings (for Elasticsearch indexing). The conversion looks like this:
from typing import List

TRAIT_0 = 0
TRAIT_1 = 1
TRAIT_2 = 2

def flag_to_list(flag: int) -> List[str]:
    trait_list = []
    # independent `if` checks, so that multiple flags can be set at once
    if flag & (1 << TRAIT_0):
        trait_list.append("TRAIT_0")
    if flag & (1 << TRAIT_1):
        trait_list.append("TRAIT_1")
    if flag & (1 << TRAIT_2):
        trait_list.append("TRAIT_2")
    return trait_list
What is the most efficient way to do this transformation in PySpark? I have seen lots of examples of how to concatenate and split strings, but not an operation like this.
I am using PySpark version 2.4.5.
The input JSON looks like this:
{ "name": "John Doe", "traits": 5 }
The output JSON should look like this:
{ "name": "John Doe", "traits": ["TRAIT_0", "TRAIT_2"] }
IIUC, you can try Spark SQL built-in functions: (1) use conv + split to convert the integer (base 10) to binary (base 2), then to a string, and then to a reversed array of strings; (2) based on the 0/1 values and their array indices, filter and transform the array into the corresponding array of named traits:
from pyspark.sql.functions import expr
df = spark.createDataFrame([("name1", 5),("name2", 1),("name3", 0),("name4", 12)], ['name', 'traits'])
#DataFrame[name: string, traits: bigint]
traits = [ "Traits_{}".format(i) for i in range(8) ]
traits_array = "array({})".format(",".join("'{}'".format(e) for e in traits))
# array('Traits_0','Traits_1','Traits_2','Traits_3','Traits_4','Traits_5','Traits_6','Traits_7')
sql_expr = """
filter(
transform(
/* convert int -> binary -> string -> array of strings, and then reverse the array */
reverse(split(string(conv(traits,10,2)),'(?!$)')),
/* take the corresponding items from the traits_array when value > 0, else NULL */
(x,i) -> {}[IF(x='1',i,NULL)]
),
/* filter out NULL items from the array */
y -> y is not NULL
) AS trait_list
""".format(traits_array)
# filter(
# transform(
# reverse(split(string(conv(traits,10,2)),'(?!$)')),
# (x,i) -> array('Traits_0','Traits_1','Traits_2','Traits_3','Traits_4','Traits_5','Traits_6','Traits_7')[IF(x='1',i,NULL)]
# ),
# y -> y is not NULL
# )
df.withColumn("traits_list", expr(sql_expr)).show(truncate=False)
+-----+------+--------------------+
|name |traits|traits_list |
+-----+------+--------------------+
|name1|5 |[Traits_0, Traits_2]|
|name2|1 |[Traits_0] |
|name3|0 |[] |
|name4|12 |[Traits_2, Traits_3]|
+-----+------+--------------------+
Below is the result of running reverse(split(string(conv(traits,10,2)),'(?!$)')); notice that the split pattern (?!$) is used to avoid a NULL appearing as the last array item.
df.selectExpr("*", "reverse(split(string(conv(traits,10,2)),'(?!$)')) as t1").show()
+-----+------+------------+
| name|traits| t1|
+-----+------+------------+
|name1| 5| [1, 0, 1]|
|name2| 1| [1]|
|name3| 0| [0]|
|name4| 12|[0, 0, 1, 1]|
+-----+------+------------+
We can define a UDF to wrap your function (with if instead of elif, so that multiple flags can be set at once) and then call it. Here is some sample code:
from typing import List

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

TRAIT_0 = 0
TRAIT_1 = 1
TRAIT_2 = 2

def flag_to_list(flag: int) -> List[str]:
    trait_list = []
    if flag & (1 << TRAIT_0):
        trait_list.append("TRAIT_0")
    if flag & (1 << TRAIT_1):
        trait_list.append("TRAIT_1")
    if flag & (1 << TRAIT_2):
        trait_list.append("TRAIT_2")
    return trait_list

flag_to_list_udf = udf(lambda x: None if x is None else flag_to_list(x),
                       ArrayType(StringType()))

# Create dummy data to test
data = [
    {"name": "John Doe", "traits": 5},
    {"name": "Jane Doe", "traits": 2},
    {"name": "Jane Roe", "traits": 0},
    {"name": "John Roe", "traits": 6},
]
df = spark.createDataFrame(data, 'name STRING, traits INT')
df.show()
# +--------+------+
# | name|traits|
# +--------+------+
# |John Doe| 5|
# |Jane Doe| 2|
# |Jane Roe| 0|
# |John Roe| 6|
# +--------+------+
df = df.withColumn('traits_processed', flag_to_list_udf(df['traits']))
df.show()
# +--------+------+------------------+
# |    name|traits|  traits_processed|
# +--------+------+------------------+
# |John Doe|     5|[TRAIT_0, TRAIT_2]|
# |Jane Doe|     2|         [TRAIT_1]|
# |Jane Roe|     0|                []|
# |John Roe|     6|[TRAIT_1, TRAIT_2]|
# +--------+------+------------------+
If you don't want to create a new column, you can replace traits_processed with traits.

Spark dataframe inner join without duplicate match

I want to join two dataframes based on a certain condition in Spark Scala. However, the catch is that if a row in df1 matches any row in df2, it should not try to match that same row of df1 with any other row in df2. Below are sample data and the outcome I am trying to get.
DF1
--------------------------------
Emp_id | Emp_Name | Address_id
1      | ABC      | 1
2      | DEF      | 2
3      | PQR      | 3
4      | XYZ      | 1

DF2
-----------------------
Address_id | City
1          | City_1
1          | City_2
2          | City_3
REST       | Some_City

Output DF
----------------------------------------
Emp_id | Emp_Name | Address_id | City
1      | ABC      | 1          | City_1
2      | DEF      | 2          | City_3
3      | PQR      | 3          | Some_City
4      | XYZ      | 1          | City_1
Note: REST is like a wildcard; any value can be matched against REST.
So in the above sample, Emp_Name "ABC" can match City_1, City_2 or Some_City. The output DF contains only City_1 because it is found first.
You seem to have custom logic for your join. Basically, I've come up with the UDF below.
Note that you may want to change the logic of the UDF as per your requirement.
import spark.implicits._
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions.first

// dataframe 1
val df_1 = Seq(("1", "ABC", "1"), ("2", "DEF", "2"), ("3", "PQR", "3"), ("4", "XYZ", "1"))
  .toDF("Emp_Id", "Emp_Name", "Address_Id")

// dataframe 2
val df_2 = Seq(("1", "City_1"), ("1", "City_2"), ("2", "City_3"), ("REST", "Some_City"))
  .toDF("Address_Id", "City_Name")

// UDF logic
val join_udf = udf((a: String, b: String) => {
  (a, b) match {
    case ("1", "1")  => true
    case ("1", _)    => false
    case ("2", "2")  => true
    case ("2", _)    => false
    case (_, "REST") => true
    case (_, _)      => false
  }
})

val dataframe_join = df_1.join(df_2, join_udf(df_1("Address_Id"), df_2("Address_Id")), "inner")
  .drop(df_2("Address_Id"))
  .orderBy($"City_Name")
  .groupBy($"Emp_Id", $"Emp_Name", $"Address_Id")
  .agg(first($"City_Name"))
  .orderBy($"Emp_Id")

dataframe_join.show(false)
Basically, after applying the UDF, what you get is all possible combinations of the matches.
After that, when you apply groupBy and make use of the first function in agg, you get only the filtered values you are looking for.
+------+--------+----------+-----------------------+
|Emp_Id|Emp_Name|Address_Id|first(City_Name, false)|
+------+--------+----------+-----------------------+
|1 |ABC |1 |City_1 |
|2 |DEF |2 |City_3 |
|3 |PQR |3 |Some_City |
|4 |XYZ |1 |City_1 |
+------+--------+----------+-----------------------+
Note that I've made use of Spark 2.3 and hope this helps!
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object JoinTwoDataFrame extends App {

  val spark = SparkSession.builder()
    .master("local")
    .appName("DataFrame-example")
    .getOrCreate()

  import spark.implicits._

  val df1 = Seq(
    (1, "ABC", "1"),
    (2, "DEF", "2"),
    (3, "PQR", "3"),
    (4, "XYZ", "1")
  ).toDF("Emp_id", "Emp_Name", "Address_id")

  val df2 = Seq(
    ("1", "City_1"),
    ("1", "City_2"),
    ("2", "City_3"),
    ("REST", "Some_City")
  ).toDF("Address_id", "City")

  val restCity: Option[String] = Some(df2.filter('Address_id.equalTo("REST")).select('City).first()(0).toString)

  val res = df1.join(df2, df1.col("Address_id") === df2.col("Address_id"), "left_outer")
    .select(
      df1.col("Emp_id"),
      df1.col("Emp_Name"),
      df1.col("Address_id"),
      df2.col("City")
    )
    .withColumn("city2", when('City.isNotNull, 'City).otherwise(restCity.getOrElse("")))
    .drop("City")
    .withColumnRenamed("city2", "City")
    .orderBy("Address_id", "City")
    .groupBy("Emp_id", "Emp_Name", "Address_id")
    .agg(collect_list("City").alias("cityList"))
    .withColumn("City", 'cityList.getItem(0))
    .drop("cityList")
    .orderBy("Emp_id")

  res.show(false)
  // +------+--------+----------+---------+
  // |Emp_id|Emp_Name|Address_id|City     |
  // +------+--------+----------+---------+
  // |1     |ABC     |1         |City_1   |
  // |2     |DEF     |2         |City_3   |
  // |3     |PQR     |3         |Some_City|
  // |4     |XYZ     |1         |City_1   |
  // +------+--------+----------+---------+
}

Karate - Can I send multiple dynamic data sets in a Scenario Outline?

Below is the code:
Feature:

Background:
* def Json = Java.type('Json')
* def dq = new Json()
* def result = dq.makeJson()
* def Sku = dq.makeSku()

Scenario Outline: id : <id>
* print '<id>'  # From result
* print '<abc>' # From Sku
Examples:
  | result | Sku |
The following is the output I need. Is it possible in Karate?
If I have id = {1,2} and abc = {3,4}, I want the output to be:
id = 1 and abc = 3
id = 1 and abc = 4
id = 2 and abc = 3
id = 2 and abc = 4
Also, can this be done for more than 2 variable inputs as well?
Write the permutation logic yourself and build an array with the results.
Note that you can iterate over the key-value pairs of a JSON using karate.forEach().
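For example, here is a rough sketch of the permutation step; the ids and abcs arrays are hypothetical placeholders for whatever your Java helpers return, and karate.appendTo() assumes a reasonably recent Karate version:
* def ids = [1, 2]
* def abcs = [3, 4]
* def array = []
* karate.forEach(ids, function(id){ karate.forEach(abcs, function(abc){ karate.appendTo('array', { id: id, abc: abc }) }) })
# array is now [{ id: 1, abc: 3 }, { id: 1, abc: 4 }, { id: 2, abc: 3 }, { id: 2, abc: 4 }]
If you go with the dynamic Scenario Outline option below, define array in the Background so the Examples: | array | row can see it.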
Then either use a data-driven loop call (of a second feature file):
# array can be [{ id: 1, abc: 3 }, {id: 1, abc: 4 }] etc
* def result = call read('second.feature') array
Or a dynamic scenario outline:
Scenario Outline:
* print 'id = <id> and abc = <abc>'
Examples:
| array |
Refer:
https://github.com/intuit/karate#data-driven-features
https://github.com/intuit/karate#dynamic-scenario-outline