Mapping Slick query to default projection after modifying column value

When creating a table query, I would like to modify my select statement by mapping the default table query. However, I cannot find a way to modify the value of a column and still map the result to my case class:
case class MyRecord(id: Int, name: String, value: Int)
class MyTable(tag: Tag) extends Table[MyRecord](tag, "MYTABLE") {
  def id = column[Int]("id")
  def name = column[String]("name")
  def value = column[Int]("value")
  def * = (id, name, value) <> (MyRecord.tupled, MyRecord.unapply)
}
lazy val tableQuery = TableQuery[MyTable]
I would like to trim leading zeros from the value of name with this function:
def trimLeading0: (Rep[String]) => Rep[String] = SimpleExpression.unary[String, String] {
  (str, queryBuilder) =>
    import slick.util.MacroSupport._
    import queryBuilder._
    b"TRIM(LEADING 0 FROM $str)"
}
Now I am at a loss about what to do here:
val trimmedTableQuery: Query[MyTable, MyRecord, Seq] = tableQuery.map(s => ???)
I have tried mapping the Rep like I would do with a case class:
val trimmedTableQuery = tableQuery.map(s => s.copy(name = trimLeading0(s.name)))
This refuses to compile with the error "value copy is not a member of MyTable".
My current workaround is to use a custom function instead of MyRecord.tupled for the default projection:
def trimming(t: (Int, String, Int)) = MyRecord(t._1, t._2.dropWhile(_ == '0'), t._3)
def * = (id, name, value) <> (trimming, MyRecord.unapply)
Alternatively, I could have the DBIOAction return tuples and map the result to the case class afterwards, which is much less elegant:
val action = tableQuery.map{ s => (s.id, trimLeading0(s.name), s.value)}.result
val futureTuples: Future[Seq[(Int, String, Int)]] = db.run(action)
val records = futureTuples map (s => s.map(MyRecord.tupled))
But how can I do it inside the map method while building the query? Or would it be better to change the def name column definition?

You can't mess with the default projection (i.e. def *) in MyTable, as it needs to be symmetric: it is used for both queries and inserts. But you can create a trimmedTableQuery based on a specialisation of MyTable with an overridden default projection, and keep tableQuery based on the symmetric default projection. You will get an error if you try to do inserts through trimmedTableQuery (but you shouldn't need to do that; just use tableQuery for inserts).
lazy val tableQuery = TableQuery[MyTable]
lazy val trimmedTableQuery = new TableQuery(new MyTable(_) {
  override def * = (id, trimLeading0(name), value) <> (MyRecord.tupled, MyRecord.unapply)
})
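For illustration, a minimal usage sketch (assuming a Slick 3 Database instance named db, as in the question's db.run snippets; the inserted record is made up):
// Reads go through the overridden projection, so name comes back trimmed.
val trimmed = db.run(trimmedTableQuery.result)              // Future[Seq[MyRecord]]
// Inserts must use the symmetric projection, i.e. the plain tableQuery.
val inserted = db.run(tableQuery += MyRecord(1, "007", 42)) // Future[Int]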

How to use Lucene's DistinctValuesCollector?

My objective is to collect the distinct values of selected fields to provide them as filter options for the frontend. DistinctValuesCollector seems to be the tool for this; however, since I haven't found any code samples or documentation apart from the Javadocs, I can't currently construct this collector correctly. Can anyone provide an example?
This is my attempt, which doesn't deliver the desired distinct values of the field PROJEKTSTATUS.name:
val groupSelector = TermGroupSelector(PROJEKTSTATUS.name)
val searchGroup = SearchGroup<BytesRef>()
val valueSelector = TermGroupSelector(PROJEKTSTATUS.name)
val groups = mutableListOf(searchGroup)
val distinctValuesCollector = DistinctValuesCollector(groupSelector, groups, valueSelector)
That field is indexed as follows:
document.add(TextField(PROJEKTSTATUS.name, aggregat.projektstatus, YES))
document.add(SortedDocValuesField(PROJEKTSTATUS.name, BytesRef(aggregat.projektstatus)))
Thanks to @andrewJames's hint pointing to a test class, I could figure it out:
fun IndexSearcher.collectFilterOptions(query: Query, field: String, topNGroups: Int = 128, mapper: Function<String?, String?> = Function { it }): Set<String?> {
    val firstPassGroupingCollector = FirstPassGroupingCollector(TermGroupSelector(field), Sort(), topNGroups)
    search(query, firstPassGroupingCollector)
    val topGroups = firstPassGroupingCollector.getTopGroups(0)
    val groupSelector = firstPassGroupingCollector.groupSelector
    val distinctValuesCollector = DistinctValuesCollector(groupSelector, topGroups, groupSelector)
    search(query, distinctValuesCollector)
    return distinctValuesCollector.groups.map { mapper.apply(it.groupValue.utf8ToString()) }.toSet()
}

Room Relation Specify Column Names When Extending

I am working with some datasets in my Room database, and my approach is to have one table with information about a dataset, called DatasetInfo, which stores things like name, type of value stored, id, etc.; and a second table where I store the values in 3 columns: (id, date, value). This ordered triplet is defined as a DatasetValue entity. Here, (date, value) is an ordered pair that I want to plot.
To plot these ordered pairs, I have to convert them to a list of Entry objects, where Entry takes the values x and y. It makes the most sense to query my database and simply ask for List<Entry>, because right now I ask for List<DatasetValue> and then have to map that result to List<Entry>, which is an unnecessary step.
I query for the dataset information table DatasetInfo as follows:
data class DatasetWithValues(
    @Embedded
    var datasetInfo: DatasetInfo,
    @Relation(
        parentColumn = DATASET_COLUMN_DATASET_ID,
        entityColumn = VALUES_COLUMN_ID,
        entity = DatasetValue::class,
    )
    var values : List<Entry>
)
Now, as I said above, Entry has the values x and y, while DatasetValue calls them date and value. Of course, when I ask for this relation, it will fail because Room doesn't know how to assign values from a table with the columns id, date, and value to an object which takes x and y. So, I define a new class:
class DatasetEntry(
    @ColumnInfo(name = "date")
    var date : Float,
    @ColumnInfo(name = "value")
    val value : Float
) : Entry(date, value)
and then make the following adjustment:
//var values : List<Entry>
var values : List<DatasetEntry>
That does not help. The code doesn't compile because:
SQL error or missing database (no such column: x)
Well, what if I instead write:
class DatasetEntry(
    @ColumnInfo(name = "date")
    var date : Float,
    @ColumnInfo(name = "value")
    val value : Float
) : Entry() {
    init {
        x = date
        y = value
    }
}
That doesn't help either; same error. Even if I remove the init block, it still wants x.
The plot thickens, because inside Entry I can see that x is declared private. So I have absolutely no clue what is happening here. How does Room even know to look for x? Is there any workaround for this other than renaming the columns in my table to x and y?
Is there any workaround for this other than renaming the columns in my table to x and y?
If you have that option, it would be the easiest. Still, there are some alternatives you could consider:
1. Mapping Room's result to the one you need
You ask Room for a raw result and then map it to the final one. For that you add 2 classes:
data class DatasetWithValuesRaw(
    @Embedded
    var datasetInfo: DatasetInfo,
    @Relation(
        parentColumn = DATASET_COLUMN_DATASET_ID,
        entityColumn = VALUES_COLUMN_ID,
    )
    var values : List<DatasetValue>
)
data class DatasetWithValuesReady(
    var datasetInfo: DatasetInfo,
    var values : List<Entry>
)
Let's say you have a dao method:
Query("select * ....")
fun getRawData(): List<DatasetWithValuesRaw>
For mapping you use:
fun getReadyData() =
    getRawData().map { item ->
        DatasetWithValuesReady(
            item.datasetInfo,
            item.values.map { Entry(it.date, it.value) }
        )
    }
2. Replacing Room's @Relation with an explicit query
It's not what you really want, but it is still an option.
Use a class like this:
data class DatasetWithSeparateValues(
    @Embedded
    var datasetInfo: DatasetInfo,
    @Embedded
    var value : Entry // <----- it's not a list, just a single value
)
and in your DAO you write a query with explicit column names (x and y). Something like this:
Query("SELECT *, values.date as x, values.value as y FROM dataset LEFT JOIN values on dataset.DATASET_COLUMN_DATASET_ID = values.VALUES_COLUMN_ID")
fun getData(): List<DatasetWithSeparateValues>
As a result you'll get a list, but if there is one dataset with 5 values, the list will contain 5 items with the same dataset and separate values. After that you can use Kotlin collection methods (groupBy, for example) to reshape the result as needed.

VarcharType mismatch Spark dataframe

I'm trying to change the schema of a dataframe. Every time I have a column of string type, I want to change its type to VarcharType(max), where max is the maximum length of a string in that column. I wrote the following code. (I want to export the dataframe later to SQL Server, and I don't want nvarchar in SQL Server, so I'm trying to limit it on the Spark side.)
val df = spark.sql(s"SELECT * FROM $tableName")
var l: List[StructField] = List()
val schema = df.schema
schema.fields.foreach(x => {
  if (x.dataType == StringType) {
    val dataColName = x.name
    val maxLength = df.select(dataColName).reduce((x, y) => {
      if (x.getString(0).length >= y.getString(0).length) {
        x
      } else {
        y
      }
    }).getString(0).length
    val dataType = VarcharType(maxLength)
    l = l :+ StructField(dataColName, dataType)
  } else {
    l = l :+ x
  }
})
val newSchema = StructType(l)
val newDf = spark.createDataFrame(df.rdd, newSchema)
However, when running it I get this error:
20/01/22 15:29:44 ERROR ApplicationMaster: User class threw exception: scala.MatchError:
VarcharType(9) (of class org.apache.spark.sql.types.VarcharType)
scala.MatchError: VarcharType(9) (of class org.apache.spark.sql.types.VarcharType)
Can a dataframe column be of type VarcharType(n)?
The datatype mapping between a dataframe and a database happens in the dialect class. For MS SQL Server the class is org.apache.spark.sql.jdbc.MsSqlServerDialect. You can inherit from it and override getJDBCType to influence the datatype mapping from a dataframe to a table. Then register your dialect for it to take effect.
I have done this for Oracle (not SQL Server); however, it can be done similarly.
// Change this
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
  case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))
  case StringType => Some(JdbcType("NVARCHAR(MAX)", java.sql.Types.NVARCHAR))
  case BooleanType => Some(JdbcType("BIT", java.sql.Types.BIT))
  case _ => None
}
You can't use VarcharType here, because Spark does not accept it as a dataframe column type (as the MatchError above shows). Also, you can't check the length of the actual data, because it is not exposed; you only have access to dt: DataType, so you can set a default size for the NVARCHAR if MAX is not acceptable.
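For completeness, here is a rough sketch of what defining and registering such a dialect could look like, extending JdbcDialect directly; the object name and the VARCHAR(4000) default are illustrative, not taken from the question:
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types.{DataType, StringType}

// Illustrative custom dialect: map StringType columns to a bounded VARCHAR
// instead of NVARCHAR(MAX) when writing to SQL Server over JDBC.
object BoundedVarcharDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sqlserver")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("VARCHAR(4000)", java.sql.Types.VARCHAR))
    case _          => None // fall back to the default mapping for other types
  }
}

// Register once (e.g. at application start-up) so Spark uses it for matching JDBC URLs.
JdbcDialects.registerDialect(BoundedVarcharDialect)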

How to pass in a map into UDF in spark

Here is my problem: I have a map of type Map[Array[String],String], and I want to pass it into a UDF.
Here is my UDF:
def lookup(lookupMap:Map[Array[String],String]) =
  udf((input:Array[String]) => lookupMap.lift(input))
And here is my Map variable:
val srdd = df.rdd.map { row => (
  Array(row.getString(1),row.getString(5),row.getString(8)).map(_.toString),
  row.getString(7)
)}
Here is how I call the function:
val combinedDF = dftemp.withColumn("a",lookup(lookupMap))(Array($"b",$"c","d"))
I first got an error about an immutable array, so I changed my array into an immutable type; then I got an error about a type mismatch. I googled a bit; apparently I can't pass a non-column type directly into a UDF. Can somebody help? Kudos.
Update: So I did convert everything to a wrapped array. Here is what I did:
val srdd = df.rdd.map{row => (WrappedArray.make[String](Array(row.getString(1),row.getString(5),row.getString(8))),row.getString(7))}
val lookupMap = srdd.collectAsMap()
def lookup(lookupMap:Map[collection.mutable.WrappedArray[String],String]) = udf((input:collection.mutable.WrappedArray[String]) => lookupMap.lift(input))
val combinedDF = dftemp.withColumn("a",lookup(lookupMap))(Array($"b",$"c",$"d"))
Now I am getting an error like this:
required: Map[scala.collection.mutable.WrappedArray[String],String]
-ksh: Map[scala.collection.mutable.WrappedArray[String],String]: not found [No such file or directory]
I tried to do something like this:
val m = collection.immutable.Map(1->"one",2->"Two")
val n = collection.mutable.Map(m.toSeq: _*)
but then I just got the column type error again.
First, you have to pass a Column as an argument of the UDF; since you want this argument to be an array, you should use the array function in org.apache.spark.sql.functions, which creates an array Column from a series of other Columns. So the UDF call would be:
lookup(lookupMap)(array($"b",$"c",$"d"))
Now, since array columns are deserialized into mutable.WrappedArray, in order for the map lookup to succeed you'd best make sure that's the type used by your UDF:
def lookup(lookupMap: Map[mutable.WrappedArray[String],String]) =
  udf((input: mutable.WrappedArray[String]) => lookupMap.lift(input))
So altogether:
import spark.implicits._
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._
import scala.collection.mutable

// Create an RDD[(mutable.WrappedArray[String], String)]:
val srdd = df.rdd.map { row: Row => (
  mutable.WrappedArray.make[String](Array(row.getString(1), row.getString(5), row.getString(8))),
  row.getString(7)
)}

// collect it into a map (I assume this is what you're doing with srdd...);
// toMap turns the collection.Map returned by collectAsMap into an immutable Map
val lookupMap: Map[mutable.WrappedArray[String], String] = srdd.collectAsMap().toMap

def lookup(lookupMap: Map[mutable.WrappedArray[String],String]) =
  udf((input: mutable.WrappedArray[String]) => lookupMap.lift(input))

val combinedDF = dftemp.withColumn("a", lookup(lookupMap)(array($"b", $"c", $"d")))
Anna, your code for srdd/lookupMap is of type org.apache.spark.rdd.RDD[(Array[String], String)]:
val srdd = df.rdd.map { row => (
  Array(row.getString(1),row.getString(5),row.getString(8)).map(_.toString),
  row.getString(7)
)}
Whereas in the lookup method you are expecting a Map as a parameter:
def lookup(lookupMap:Map[Array[String],String]) =
  udf((input:Array[String]) => lookupMap.lift(input))
That is the reason why you are getting the type mismatch error.
First change srdd from an RDD of tuples to an RDD of Maps, and then try converting the RDD to a Map to resolve this error:
val srdd = df.rdd.map { row => Map(
  Array(row.getString(1),row.getString(5),row.getString(8)).map(_.toString) ->
    row.getString(7)
)}

Slick 2 - Update columns in a table and return whole table object

How would you update a few columns in a table while returning the entire updated row when using Slick?
Assuming SomeTables is some TableQuery, you would typically write a query like this if you want to, for example, add an item to the table (and return the newly added item):
val returnedItem = SomeTables returning SomeTables += someTable
How would you do the same if you want to update an item and get back the whole updated item? I suspect you would do something like this:
val q = SomeTables.filter(_.id === id).map(x => (x.someColumn,x.anotherColumn)) returning SomeTables
val returnedItem = q.update((3,"test"))
This code, however, does not work, and I can't find any documentation on how to do it.
Note that I am aware you can just query the item beforehand, update it, and then use copy on the original object; however, this requires a lot of boilerplate (and extra DB trips as well).
This feature is not supported in Slick (v2 or v3-M1); although I don't see any specific reason prohibiting its implementation, UPDATE ... RETURNING is not a standard SQL feature (for example, H2 does not support it: http://www.h2database.com/html/grammar.html#update). I'll leave it as an exercise to the reader to explore how one might safely and efficiently emulate the feature for RDBMSes lacking UPDATE ... RETURNING.
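One simple emulation, as a sketch: update the selected columns and re-read the row inside a single transaction. This assumes Slick 2.x's lifted embedding and the hypothetical names from the question (SomeTables, id, someColumn, anotherColumn), plus a Database instance db:
// Not UPDATE ... RETURNING, just an update followed by a select in one transaction.
val returnedItem = db.withTransaction { implicit session =>
  val q = SomeTables.filter(_.id === id)
  q.map(x => (x.someColumn, x.anotherColumn)).update((3, "test"))
  q.firstOption
}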
When you call "returning" on a scala.slick.lifted.Query, it gives you a JdbcInsertInvokerComponent$ReturningInsertInvokerDef. You'll find no update method, although there is an insertOrUpdate method; however, insertOrUpdate only returns the returning expression result if an insert occurs, None is returned for updates, so no help here.
From this we can conclude that if you want to use the UPDATE ... RETURNING SQL feature, you'll either need to use StaticQuery or roll your own patch to Slick. You can manually write your queries (and re-implement your table projections as GetResult / SetParameter serializers), or you can try this snippet of code:
package com.spingo.slick

import scala.slick.driver.JdbcDriver.simple.{queryToUpdateInvoker, Query}
import scala.slick.driver.JdbcDriver.{updateCompiler, queryCompiler, quoteIdentifier}
import scala.slick.jdbc.{ResultConverter, CompiledMapping, JdbcBackend, JdbcResultConverterDomain, GetResult, SetParameter, StaticQuery => Q}
import scala.slick.util.SQLBuilder
import slick.ast._

object UpdateReturning {
  implicit class UpdateReturningInvoker[E, U, C[_]](updateQuery: Query[E, U, C]) {
    def updateReturning[A, F](returningQuery: Query[A, F, C], v: U)(implicit session: JdbcBackend#Session): List[F] = {
      val ResultSetMapping(_,
        CompiledStatement(_, sres: SQLBuilder.Result, _),
        CompiledMapping(_updateConverter, _)) = updateCompiler.run(updateQuery.toNode).tree

      val returningNode = returningQuery.toNode
      val fieldNames = returningNode match {
        case Bind(_, _, Pure(Select(_, col), _)) =>
          List(col.name)
        case Bind(_, _, Pure(ProductNode(children), _)) =>
          children map { case Select(_, col) => col.name } toList
        case Bind(_, TableExpansion(_, _, TypeMapping(ProductNode(children), _, _)), Pure(Ref(_), _)) =>
          children map { case Select(_, col) => col.name } toList
      }

      implicit val pconv: SetParameter[U] = {
        val ResultSetMapping(_, compiled, CompiledMapping(_converter, _)) = updateCompiler.run(updateQuery.toNode).tree
        val converter = _converter.asInstanceOf[ResultConverter[JdbcResultConverterDomain, U]]
        SetParameter[U] { (value, params) =>
          converter.set(value, params.ps)
        }
      }

      implicit val rconv: GetResult[F] = {
        val ResultSetMapping(_, compiled, CompiledMapping(_converter, _)) = queryCompiler.run(returningNode).tree
        val converter = _converter.asInstanceOf[ResultConverter[JdbcResultConverterDomain, F]]
        GetResult[F] { p => converter.read(p.rs) }
      }

      val fieldsExp = fieldNames map (quoteIdentifier) mkString ", "
      val sql = sres.sql + s" RETURNING ${fieldsExp}"
      val unboundQuery = Q.query[U, F](sql)
      unboundQuery(v).list
    }
  }
}
I'm certain the above can be improved; I've written it based on my somewhat limited understanding of Slick internals, and it works for me and can leverage the projections / type-mappings you've already defined.
Usage:
import com.spingo.slick.UpdateReturning._
val tq = TableQuery[MyTable]
val st = tq filter(_.id === 1048003) map { e => (e.id, e.costDescription) }
st.updateReturning(tq map (identity), (1048003, Some("such cost")))
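For comparison, the manual route mentioned above (plain SQL via StaticQuery plus a GetResult serializer) could look roughly like this sketch; the table and column names are made up to match the usage example, and it assumes a database that actually supports UPDATE ... RETURNING (e.g. PostgreSQL):
import scala.slick.jdbc.{GetResult, JdbcBackend, StaticQuery => Q}
import Q.interpolation

// Hypothetical row mapping for the returned (id, costDescription) columns.
implicit val getIdAndCost = GetResult(r => (r.<<[Int], r.<<[Option[String]]))

def updateReturningRaw(id: Int, cost: Option[String])(implicit session: JdbcBackend#Session): List[(Int, Option[String])] =
  sql"UPDATE my_table SET cost_description = $cost WHERE id = $id RETURNING id, cost_description"
    .as[(Int, Option[String])]
    .list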