Scala Spark: Parse SQL string to Column - sql

I have two functions, foo and bar, that I want to write like follows:
def foo(df : DataFrame, conditionString : String) =
val conditionColumn : Column = something(conditionString) //help me define "something"
bar(df, conditionColumn)
}
def bar(df : DataFrame, conditionColumn : Column) = {
df.where(conditionColumn)
}
Where condition is a sql string like "person.age >= 18 AND person.citizen == true" or something.
Because reasons, I don't want to change the type signatures here. I feel this should work because if I could change the type signatures, I could just write:
def foobar(df : DataFrame, conditionString : String) = {
df.where(conditionString)
}
As .where is happy to accept a sql string expression.
So, how can I turn a string representing a column expression into a column? If the expression were just the name of a single column in df I could just do col(colName), but that doesn't seem to take the range of expressions that .where does.
If you need more context for why I'm doing this, I'm working on a databricks notebook that can only accept string arguments (and needs to take a condition as an argument), which calls a library I want to take column-typed arguments.

You can use functions.expr:
def expr(expr: String): Column
Parses the expression string into the column that it represents

Related

Convert String into list of Pairs: Kotlin

Is there an easier approach to convert an Intellij IDEA environment variable into a list of Tuples?
My environment variable for Intellij is
GROCERY_LIST=[("egg", "dairy"),("chicken", "meat"),("apple", "fruit")]
The environment variable gets accessed into Kotlin file as String.
val g_list = System.getenv("GROCERY_LIST")
Ideally I'd like to iterate over g_list, first element being ("egg", "dairy") and so on.
And then ("egg", "dairy") is a tuple/pair
I have tried to split g_list by comma that's NOT inside quotes i.e
val splitted_list = g_list.split(",(?=(?:[^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*\$)".toRegex()).toTypedArray()
this gives me first element as [("egg", second element as "dairy")] and so on.
Also created a data class and tried to map the string into data class using jacksonObjectMapper following this link:
val mapper = jacksonObjectMapper()
val g_list = System.getenv("GROCERY_LIST")
val myList: List<Shopping> = mapper.readValue(g_list)
data class Shopping(val a: String, val b: String)
You can create a regular expression to match all strings in your environmental variable.
Regex::findAll()
Then loop through the strings while creating a list of Shopping objects.
// Raw data set.
val groceryList: String = "[(\"egg\", \"dairy\"),(\"chicken\", \"meat\"),(\"apple\", \"fruit\")]"
// Build regular expression.
val regex = Regex("\"([\\s\\S]+?)\"")
val matchResult = regex.findAll(groceryList)
val iterator = matchResult.iterator()
// Create a List of `Shopping` objects.
var first: String = "";
var second: String = "";
val shoppingList = mutableListOf<Shopping>()
var i = 0;
while (iterator.hasNext()) {
val value = iterator.next().value;
if (i % 2 == 0) {
first = value;
} else {
second = value;
shoppingList.add(Shopping(first, second))
first = ""
second = ""
}
i++
}
// Print Shopping List.
for (s in shoppingList) {
println(s)
}
// Output.
/*
Shopping(a="egg", b="dairy")
Shopping(a="chicken", b="meat")
Shopping(a="apple", b="fruit")
*/
data class Shopping(val a: String, val b: String)
Never a good idea to use regex to match parenthesis.
I would suggest a step-by-step approach:
You could first match the name and the value by
(\w+)=(.*)
There you get the name in group 1 and the value in group 2 without caring about any subsequent = characters that might appear in the value.
If you then want to split the value, I would get rid of start and end parenthesis first by matching by
(?<=\[\().*(?=\)\])
(or simply cut off the first and last two characters of the string, if it is always given it starts with [( and ends in )])
Then get the single list entries from splitting by
\),\(
(take care that the split operation also takes a regex, so you have to escape it)
And for each list entry you could split that simply by
,\s*
or, if you want the quote character to be removed, use a match with
\"(.*)\",\s*\"(.*)\"
where group 1 contains the key (left of equals sign) and group 2 the value (right of equals sign)

What is Diff between "contains" and "in" (Kotlin)

fun main() {
val input = "ABC"
val output = "ABC,"
println(input.contains(output,false))
print(input in (output))
}
Output :
false
true
I just checked that, in and contains are using same method, but why giving difference results.
The Kotlin contains function
Returns true if this char sequence contains the specified other sequence of characters as a substring.
Actually, you're checking if input.contains(output), and "ABC" does not contain "ABC,".
The correct syntax is
println(output.contains(input,false))

calling a double from object seems to return the possition not value of the double

In a function i have this:
val sunRise = SunEquation(2459622)
binding.timeDisplay.setText("$sunRise.n")
The SunEquation-Class looks like this:
class SunEquation(var jDate: Int,) {
val jYear = 2451545
val ttOffset = .0008
var n = jDate - jYear + ttOffset
}
the button- text that appears is:
com.example.soluna.SunEquation#6d1a94b.n
i would expect a double-value
You have to add curly brackets around the value you want to inject into the String, like this:
binding.timeDisplay.setText("${sunRise.n}")
The shorthand syntax without brackets only works for a single variable, but not
for access to a nested field or other more complex expressions.
In your case, this results in the object itself being injected into the String, which is resembled by com.example.soluna.SunEquation#6d1a94b based on the result of the corresponding toString() call, which defaults to the class name and the reference id of the object. Followed by the literal String .n.
Alternatively, you could extract the value into a val beforehand and reference that.
val customN = sunRise.n
binding.timeDisplay.setText("$customN")

Kotlin String substitution not working when string is read from file

I have written a code that reads a text file. The text files contain placeholders which I would like to replace. The substitution does not work this way and the string is printed with the placeholders. Here is the code that I have written for this:
class TestSub(val sub: Sub) {
fun create() = template()
fun template() = Files.newBufferedReader(ClassPathResource(templateId.location).file.toPath()).readText()
}
data class Sub(val name: String, val age: Int)
Here is the main function that tries to print the final string:
fun main(args: Array<String>) {
val sub = Sub("Prashant", 32)
println(TestSub(sub).create())
}
However, when, instead of reading a file, I use a String, the following code works (Replacing fun template())
fun template() = "<h1>Hello ${sub.name}. Your age is ${sub.age}</h1>"
Is there a way to make string Substitution work when reading the content of a file?
Kotlin does not support String templates from files. I.e. code like "some variable: $variable" gets compiled to "some variable: " + variable. String templates are handled at compile time, which means it does not work with text loaded from files, or if you do something else to get the String escaped into a raw form. Either way, it would, as danielspaniol mentioned, be a security threat.
That leaves three options:
String.format(str)
MessageFormat.format(str)
Creating a custom engine
I don't know what your file contains, but if it's the String you used in the template function, change it to:
<h1>Hello {0}. Your age is {1,integer}</h1>
This is for MessageFormat, which is my personal preference. If you use String.format, use %s instead, and the other appropriate formats.
Now, use that in MessageFormat.format:
val result = MessageFormat.format(theString, name, age);
Note that if you use MessageFormat, you'll need to escape ' as ''. See this.
String substitution using ${...} is part of the string literals syntax and works roughly like this
val a = 1
val b = "abc ${a} def" // gets translated to something like val b = "abc " + a + " def"
So there is no way for this to work when you load from a text file. This would also be a huge security risk as it would allow for arbitrary code execution.
However I assume that Kotlin has something like a sprintf function where you can have placeholders like %s in your string and you can replace them with values
Take a look here. It looks like the easiest way is to use String.format
You are looking for something similar to Kotlin String templates for raw Strings, where placeholders like $var or ${var} are substituted by values, but this functionality needs to be available at runtime (for text read from files).
Methods like String.format(str) or MessageFormat.format(str) use other formats than the notation with the dollar prefix of Kotlin String templates. For "Kotlin-like" placeholder substitution you could use the function below (which I developed for similar reasons). It supports placeholders as $var or ${var} as well as dollar escaping by ${'$'}
/**
* Returns a String in which placeholders (e.g. $var or ${var}) are replaced by the specified values.
* This function can be used for resolving templates at RUNTIME (e.g. for templates read from files).
*
* Example:
* "\$var1\${var2}".resolve(mapOf("var1" to "VAL1", "var2" to "VAL2"))
* returns VAL1VAL2
*/
fun String.resolve(values: Map<String, String>): String {
val result = StringBuilder()
val matcherSimple = "\\$([a-zA-Z_][a-zA-Z_0-9]*)" // simple placeholder e.g. $var
val matcherWithBraces = "\\$\\{([a-zA-Z_][a-zA-Z_0-9]*)}" // placeholder within braces e.g. ${var}
// match a placeholder (like $var or ${var}) or ${'$'} (escaped dollar)
val allMatches = Regex("$matcherSimple|$matcherWithBraces|\\\$\\{'(\\\$)'}").findAll(this)
var position = 0
allMatches.forEach {
val range = it.range
val placeholder = this.substring(range)
val variableName = it.groups.filterNotNull()[1].value
val newText =
if ("\${'\$'}" == placeholder) "$"
else values[variableName] ?: throw IllegalArgumentException("Could not resolve placeholder $placeholder")
result.append(this.substring(position, range.start)).append(newText)
position = range.last + 1
}
result.append(this.substring(position))
return result.toString()
}
String templates only work for compile-time Sting literals, while what u read from a file is generated at runtime.
What u need is a template engine, which can render templates with variables or models at runtime.
For simple cases, String.format or MessageFormat.format in Java would work.
And for complex cases, check thymeleaf, velocity and so on.

Strings concatenation in Spark SQL query

I'm experimenting with Spark and Spark SQL and I need to concatenate a value at the beginning of a string field that I retrieve as output from a select (with a join) like the following:
val result = sim.as('s)
.join(
event.as('e),
Inner,
Option("s.codeA".attr === "e.codeA".attr))
.select("1"+"s.codeA".attr, "e.name".attr)
Let's say my tables contain:
sim:
codeA,codeB
0001,abcd
0002,efgh
events:
codeA,name
0001,freddie
0002,mercury
And I would want as output:
10001,freddie
10002,mercury
In SQL or HiveQL I know I have the concat function available, but it seems Spark SQL doesn't support this feature. Can somebody suggest me a workaround for my issue?
Thank you.
Note:
I'm using Language Integrated Queries but I could use just a "standard" Spark SQL query, in case of eventual solution.
The output you add in the end does not seem to be part of your selection, or your SQL logic, if I understand correctly. Why don't you proceed by formatting the output stream as a further step ?
val results = sqlContext.sql("SELECT s.codeA, e.code FROM foobar")
results.map(t => "1" + t(0), t(1)).collect()
It's relatively easy to implement new Expression types directly in your project. Here's what I'm using:
case class Concat(children: Expression*) extends Expression {
override type EvaluatedType = String
override def foldable: Boolean = children.forall(_.foldable)
def nullable: Boolean = children.exists(_.nullable)
def dataType: DataType = StringType
def eval(input: Row = null): EvaluatedType = {
children.map(_.eval(input)).mkString
}
}
val result = sim.as('s)
.join(
event.as('e),
Inner,
Option("s.codeA".attr === "e.codeA".attr))
.select(Concat("1", "s.codeA".attr), "e.name".attr)