DataFrame processing generically using Scala

I understand the code below, and it was helpful.
However, I would like to turn it into a generic approach, but I cannot get started, and I suspect it is not actually possible with the case statement. I am looking at another approach, but I am interested in whether a generic approach is also possible here.
import spark.implicits._
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.functions.{col, from_json}

// Creating case classes with the schema of your json objects. We're making
// these to make use of strongly typed Datasets. Notice that MyChgClass has
// each field as an Option: this will enable us to choose between "chg" and
// "before"
case class MyChgClass(b: Option[String], c: Option[String], d: Option[String])
case class MyFullClass(k: Int, b: String, c: String, d: String)
case class MyEndClass(id: Int, after: MyFullClass)

// Creating schemas for the from_json function
val chgSchema = Encoders.product[MyChgClass].schema
val beforeSchema = Encoders.product[MyFullClass].schema

// Your dataframe from the example
val df = Seq(
  (1, """{"b": "new", "c": "new"}""", """{"k": 1, "b": "old", "c": "old", "d": "old"}"""),
  (2, """{"b": "new", "d": "new"}""", """{"k": 2, "b": "old", "c": "old", "d": "old"}""")
).toDF("id", "chg", "before")

// Parsing the json strings into our case classes and finishing by creating a
// strongly typed dataset with the .as[] method
val parsedDf = df
  .withColumn("parsedChg", from_json(col("chg"), chgSchema))
  .withColumn("parsedBefore", from_json(col("before"), beforeSchema))
  .drop("chg")
  .drop("before")
  .as[(Int, MyChgClass, MyFullClass)]

// Mapping over our dataset with a lot of control of exactly what we want. Since
// the "chg" fields are options, we can use the getOrElse method to choose
// between either the "chg" field or the "before" field
val output = parsedDf.map {
  case (id, chg, before) =>
    MyEndClass(id, MyFullClass(
      before.k,
      chg.b.getOrElse(before.b),
      chg.c.getOrElse(before.c),
      chg.d.getOrElse(before.d)
    ))
}

output.show(false)
parsedDf.printSchema()
We have many such situations, but with differing payloads. I can get the fields of the case class, but cannot see the forest for the trees when it comes to making this generic, e.g. a [T] type approach for the below. I am wondering whether this can be done at all.
I can get a List of attributes, and am wondering if something like attrList.map(x => ...) with substitution can be used for the chg.b etc.?
val output = parsedDf.map {
  case (id, chg, before) =>
    MyEndClass(id, MyFullClass(
      before.k,
      chg.b.getOrElse(before.b),
      chg.c.getOrElse(before.c),
      chg.d.getOrElse(before.d)
    ))
}

Does the following macro work for your use case?
// libraryDependencies += scalaOrganization.value % "scala-reflect" % scalaVersion.value
import scala.language.experimental.macros
import scala.reflect.macros.blackbox

def mkInstance[A, B](before: A, chg: B): A = macro mkInstanceImpl[A]

def mkInstanceImpl[A: c.WeakTypeTag](c: blackbox.Context)(before: c.Tree, chg: c.Tree): c.Tree = {
  import c.universe._
  val A = weakTypeOf[A]
  // Collect the case-class accessors (k, b, c, d) in declaration order
  val classAccessors = A.decls.collect {
    case m: MethodSymbol if m.isCaseAccessor => m
  }
  // The first field is always taken from `before`
  val arg = q"$before.${classAccessors.head}"
  // For every other field, prefer the Option in `chg`, falling back to `before`
  val args = classAccessors.tail.map(m => q"$chg.${m.name}.getOrElse($before.$m)")
  q"new $A($arg, ..$args)"
}
// in a different subproject
val output = parsedDf.map {
  case (id, chg, before) =>
    MyEndClass(id, mkInstance(before, chg))
}
// scalacOptions += "-Ymacro-debug-lite"
// scalac: new MyFullClass(before.k, chg.b.getOrElse(before.b), chg.c.getOrElse(before.c), chg.d.getOrElse(before.d))
https://scastie.scala-lang.org/bXq5FHb3QuC5PqlhZOfiqA
Alternatively, you can use Shapeless:
// libraryDependencies += "com.chuusai" %% "shapeless" % "2.3.10"
import shapeless.{Generic, HList, LabelledGeneric, Poly2}
import shapeless.ops.hlist.{IsHCons, Mapped, ZipWith}
import shapeless.ops.record.Keys

def mkInstance[A, B, L <: HList, H, T <: HList, OptT <: HList, L1 <: HList, T1 <: HList, T2 <: HList, K <: HList](
  before: A, chg: B
)(implicit
  // checking that the field names in the tail of A are equal to the field names in B
  aLabelledGeneric: LabelledGeneric.Aux[A, L1],
  bLabelledGeneric: LabelledGeneric.Aux[B, T2],
  isHCons1: IsHCons.Aux[L1, _, T1],
  keys: Keys.Aux[T1, K],
  keys1: Keys.Aux[T2, K],
  // checking that the field types in B are Options of the field types in the tail of A
  aGeneric: Generic.Aux[A, L],
  isHCons: IsHCons.Aux[L, H, T],
  mapped: Mapped.Aux[T, Option, OptT],
  bGeneric: Generic.Aux[B, OptT],
  zipWith: ZipWith.Aux[OptT, T, getOrElsePoly.type, T]
): A = {
  val aHList = aGeneric.to(before)
  aGeneric.from(isHCons.cons(isHCons.head(aHList), zipWith(bGeneric.to(chg), isHCons.tail(aHList))))
}

object getOrElsePoly extends Poly2 {
  implicit def cse[A]: Case.Aux[Option[A], A, A] = at(_ getOrElse _)
}
Since all the classes are known at compile time, it's better to use compile-time reflection (macros themselves, or macros hidden inside type classes as in Shapeless), but in principle runtime reflection can also be used:
import scala.reflect.ClassTag
import scala.reflect.runtime.{currentMirror => rm}
import scala.reflect.runtime.universe._

def mkInstance[A: TypeTag : ClassTag, B: TypeTag : ClassTag](before: A, chg: B): A = {
  val A = typeOf[A]
  val B = typeOf[B]
  val classAccessors = A.decls.collect {
    case m: MethodSymbol if m.isCaseAccessor => m
  }.toList
  // The first field is always taken from `before`
  val arg = rm.reflect(before).reflectMethod(classAccessors.head)()
  // For the remaining fields, prefer the Option in `chg`, falling back to `before`
  val args = classAccessors.tail.map(m =>
    rm.reflect(chg).reflectMethod(B.decl(m.name).asMethod)()
      .asInstanceOf[Option[_]].getOrElse(
        rm.reflect(before).reflectMethod(m)()
      )
  )
  rm.reflectClass(A.typeSymbol.asClass)
    .reflectConstructor(A.decl(termNames.CONSTRUCTOR).asMethod)(arg :: args: _*)
    .asInstanceOf[A]
}
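A usage sketch for the runtime-reflection variant, assuming the `mkInstance` above and the question's case classes are in scope (the values are made up for illustration). The call shape is the same as in the macro example, but the field lookup and the `getOrElse` fallback happen at runtime, so type mismatches surface as runtime exceptions instead of compile errors:

```scala
case class MyChgClass(b: Option[String], c: Option[String], d: Option[String])
case class MyFullClass(k: Int, b: String, c: String, d: String)

// k is copied from `before`; defined Options in `chg` win, None falls back
val merged = mkInstance(
  MyFullClass(1, "old", "old", "old"),
  MyChgClass(Some("new"), None, Some("new"))
)
// merged should equal MyFullClass(1, "new", "old", "new")
```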

Related

Error when trying to convert a list of objects in a string using reduce function

I am playing with the Kotlin language and I tried the following code:
data class D( val n: Int, val s: String )
val lst = listOf( D(1,"one"), D(2, "two" ) )
val res = lst.reduce { acc:String, d:D -> acc + ", " + d.toString() }
The last statement causes the following errors:
Expected parameter of type String
Expected parameter of type String
Type mismatch: inferred type is D but String was expected
while this version of the last statement works:
val res = lst.map { e -> e.toString() }.reduce { acc, el -> acc + ", " + el }
I do not understand why the first version does not work. The formal definition of the reduce function, found here, is the following:
inline fun <S, T : S> Iterable<T>.reduce(
    operation: (acc: S, T) -> S
): S
But this seems in contrast with the following sentence, on the same page:
Accumulates value starting with the first element and applying
operation from left to right to current accumulator value and each
element.
That is, as explained here:
The difference between the two functions is that fold() takes an
initial value and uses it as the accumulated value on the first step,
whereas the first step of reduce() uses the first and the second
elements as operation arguments on the first step.
But, to be able to apply the operation to the first and second elements, and so on, it seems to me that the operation must have both arguments of the base type of the Iterable.
So, what am I missing?
Reduce is not the right tool here. The best function in this case is joinToString:
listOf(D(1, "one"), D(2, "two"))
    .joinToString(", ")
    .let { println(it) }
This prints:
D(n=1, s=one), D(n=2, s=two)
reduce is not designed for converting types; it's designed for reducing a collection of elements to a single element of the same type. You don't want to reduce to a single D; you want a String. You could try implementing it with fold, which is like reduce but takes an initial element you want to fold into:
listOf(D(1, "one"), D(2, "two"))
    .fold("") { acc, d -> "$acc, $d" }
    .let { println(it) }
However, this will add an extra comma:
, D(n=1, s=one), D(n=2, s=two)
Which is exactly why joinToString exists.
You can see the definition to understand why it's not working.
To make it work, you can simply create an extension function:
fun List<D>.reduce(operation: (acc: String, D) -> String): String {
    if (isEmpty())
        throw UnsupportedOperationException("Empty list can't be reduced.")
    var accumulator = this[0].toString()
    for (index in 1..lastIndex) {
        accumulator = operation(accumulator, this[index])
    }
    return accumulator
}
you can use it as:
val res = lst.reduce { acc:String, d:D -> acc + ", " + d.toString() }
or simply:
val res = lst.reduce { acc, d -> "$acc, $d" }
You can modify the function to be more generic if you want to.
TL;DR
The acc: String in your code is already incorrect in this line:
val res = lst.reduce { acc: String, d: D -> acc + ", " + d.toString() }
because acc can only be D, never a String! reduce returns the same type as the Iterable it is performed on, and lst is an Iterable<D>.
Explanation
You already looked up the definition of reduce:
inline fun <S, T : S> Iterable<T>.reduce(
    operation: (acc: S, T) -> S
): S
so let's try to plug your code into it:
lst is of type List<D>
since List extends Iterable, we can write lst : Iterable<D>
reduce will now look like this:
inline fun <D, T : D> Iterable<T>.reduce(
    operation: (acc: D, T) -> D // String is literally impossible here, because D is not a String
): D
and written out:
lst<D>.reduce { acc: D, d: D -> }

Chaining Lambdas - Kotlin

I'm working on a problem from the Kotlin Apprentice book in the Lambdas chapter. Here is the problem:
Using the same nameList list, first filter the list to contain only names which
have more than four characters in them, and then create the same concatenation
of names as in the above exercise. (Hint: you can chain these operations
together.)
How can I chain the two lambdas together? Below is my code with separate lines for the 2 lambdas.
fun main() {
    val nameList = listOf("John", "Jacob", "Jingleheimer", "Schmidt")
    val onlyLongNames = nameList.filter { it.length > 4 }
    val myNameToo = onlyLongNames.fold("") { a, b -> a + b }
    println(myNameToo)
}
You can chain the calls directly:
fun main() {
    val nameList = listOf("John", "Jacob", "Jingleheimer", "Schmidt")
    val myNameToo = nameList.filter { it.length > 4 }.fold("") { a, b -> a + b }
    println(myNameToo)
}

How to declare an abstract function in Inox

I'm proving certain properties on elliptic curves and for that I rely on some functions that deal with field operations. However, I don't want Inox to reason about the implementation of these functions but to just assume certain properties on them.
Say for instance I'm proving that the addition of points p1 = (x1,y1) and p2 = (x2,y2) is commutative. For implementing the addition of points I need a function that implements addition over its components (i.e. the elements of a field).
The addition will have the following shape:
val addFunction = mkFunDef(addID)() { case Seq() =>
  val args: Seq[ValDef] = Seq("f1" :: F, "f2" :: F)
  val retType: Type = F
  val body: Seq[Variable] => Expr = { case Seq(f1, f2) =>
    // do the addition for this field
  }
  (args, retType, body)
}
For this function I can state properties such as:
val addAssociative: Expr = forall("x" :: F, "y" :: F, "z" :: F) { case (x, y, z) =>
  (x ^+ (y ^+ z)) === ((x ^+ y) ^+ z)
}
where ^+ is just the infix operator corresponding to add as presented in this other question.
What is a proper expression to insert in the body so that Inox does not assume anything on it while unrolling?
There are two ways you can go about this:
Use a choose statement in the body of addFunction:
val body: Seq[Variable] => Expr = { _ =>
  choose("r" :: F)(_ => E(true))
}
During unrolling, Inox will simply replace the choose with a fresh
variable and assume the specified predicate (in this case true) on
this variable.
Use a first-class function. Instead of using add as a named function,
use a function-typed variable:
val add: Expr = Variable(FreshIdentifier("add"), (F, F) =>: F)
You can then specify your associativity property on add and prove the
relevant theorems.
In your case, it's probably better to go with the second option. The issue with proving things about an addFunction with a choose body is that you can't substitute add with some other function in the theorems you've shown about it. However, since the second option only shows things about a free variable, you can then instantiate your theorems with concrete function implementations.
Your theorem would then look something like:
val thm = forallI("add" :: ((F, F) =>: F)) { add =>
  implI(isAssociative(add)) { isAssoc => someProperty }
}
and you can instantiate it through
val isAssocAdd: Theorem = ... /* prove associativity of concreteAdd */
val somePropertyForAdd = implE(
forallE(thm)(\("x" :: F, "y" :: F)((x,y) => E(concreteAdd)()(x, y))),
isAssocAdd
)

Pattern behind shapeless Aux classes

While studying the shapeless and spray libraries, I've seen many inner Aux types, traits, objects and classes. It's not hard to understand that they are used for augmenting the existing internal API; it looks much like a "companion object pattern" for factories and helper methods. An example from the HList sources:
trait Length[-L <: HList] {
  type Out <: Nat
  def apply(): Out
}

trait LengthAux[-L <: HList, N <: Nat] {
  def apply(): N
}

object Length {
  implicit def length[L <: HList, N <: Nat](implicit length: LengthAux[L, N]) = new Length[L] {
    type Out = N
    def apply() = length()
  }
}

object LengthAux {
  import Nat._
  implicit def hnilLength = new LengthAux[HNil, _0] {
    def apply() = _0
  }
  implicit def hlistLength[H, T <: HList, N <: Nat](implicit lt: LengthAux[T, N], sn: Succ[N]) = new LengthAux[H :: T, Succ[N]] {
    def apply() = sn
  }
}
In the case of Length, the Length trait is the shape we are hoping to end up with, because it conveniently has the length encoded as a type member, but that isn't a convenient form for an implicit search. So an "Aux" class is introduced which takes the result parameter, named Out in the Length trait, and adds it to the type parameters of LengthAux as N, which is the length. Once this result parameter is encoded into the actual type of the trait, we can search for LengthAux instances in implicit scope, knowing that if we find one with the L we are searching for, that type will have the correct length as its N parameter.
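The snippet above is from an early shapeless version, where Aux was a separate trait. Later shapeless versions fold it into the companion object as a type alias, which is where the `Length.Aux` spelling in modern code comes from. A minimal, library-free sketch of that pattern (the names here are illustrative, not shapeless's actual code):

```scala
trait Length[L] {
  type Out
  def apply(): Out
}

object Length {
  // Re-expose the Out type member as a type parameter, so implicit
  // search can constrain it (or capture it) without a second trait
  type Aux[L, N] = Length[L] { type Out = N }

  // Summoner that preserves the refined type: the result is typed
  // Aux[L, len.Out], not just Length[L]
  def apply[L](implicit len: Length[L]): Aux[L, len.Out] = len
}
```

The alias avoids duplicating instances across two traits: every implicit is defined once, with the refined `Length[L] { type Out = N }` type, and `Aux` is just a shorthand for writing that refinement in implicit parameter lists.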

Should there be an indicesWhere method on Scala's List class?

Scala's List classes have indexWhere methods, which return a single index for a List element which matches the supplied predicate (or -1 if none exists).
I recently found myself wanting to gather all indices in a List which matched a given predicate, and found myself writing an expression like:
list.zipWithIndex.filter({case (elem, _) => p(elem)}).map({case (_, index) => index})
where p here is some predicate function for selecting matching elements. This seems a bit of an unwieldy expression for such a simple requirement (but I may be missing a trick or two).
I was half expecting to find an indicesWhere function on List which would allow me to write instead:
list.indicesWhere(p)
Should something like this be part of the Scala's List API, or is there a much simpler expression than what I've shown above for doing the same thing?
Well, here's a shorter expression that removes some of the syntactic noise you have in yours (modified to use Travis's suggestion):
list.zipWithIndex.collect { case (x, i) if p(x) => i }
Or alternatively:
for ((x,i) <- list.zipWithIndex if p(x)) yield i
But if you use this frequently, you should just add it as an implicit method:
class EnrichedWithIndicesWhere[T, CC[X] <: Seq[X]](xs: CC[T]) {
  def indicesWhere(p: T => Boolean)(implicit bf: CanBuildFrom[CC[T], Int, CC[Int]]): CC[Int] = {
    val b = bf()
    for ((x, i) <- xs.zipWithIndex if p(x)) b += i
    b.result
  }
}

implicit def enrichWithIndicesWhere[T, CC[X] <: Seq[X]](xs: CC[T]) = new EnrichedWithIndicesWhere(xs)
val list = List(1, 2, 3, 4, 5)
def p(i: Int) = i % 2 == 1
list.indicesWhere(p) // List(0, 2, 4)
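The enrichment above is written against the Scala 2.12-era collections, where CanBuildFrom existed. On Scala 2.13+ a plain implicit value class is enough if you are content with returning a Seq[Int] rather than preserving the exact collection type (a sketch, not part of the original answer):

```scala
// Scala 2.13+ sketch: an extension method without CanBuildFrom
implicit class IndicesWhereOps[T](private val xs: Seq[T]) extends AnyVal {
  def indicesWhere(p: T => Boolean): Seq[Int] =
    // Walk the sequence once, keeping only the indices whose element matches
    xs.iterator.zipWithIndex.collect { case (x, i) if p(x) => i }.toSeq
}

List(1, 2, 3, 4, 5).indicesWhere(_ % 2 == 1) // List(0, 2, 4)
```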
You could use unzip to replace the map:
list.zipWithIndex.filter({case (elem, _) => p(elem)}).unzip._2