Slick Plain SQL Generic Return Type

I am trying to write a configurable SQL query executor using Slick. The user provides a prepared statement containing ? placeholders, and at run time the exact query is formed by replacing each ? with a value.
Generally this is how one would run a plain sql query using slick.
val query = sql"#$queryString".as[(String,Int)]
In my case I would not know the result type in advance, so I want to get back a generic result type: maybe a list of tuples, with each tuple representing a row of the result set.
Any ideas on how this would be done?

I found a solution in one of the Scala GitHub issues. Here it is:
import slick.jdbc.{GetResult, PositionedResult}

object ResultMap extends GetResult[Map[String, Any]] {
  def apply(pr: PositionedResult): Map[String, Any] = {
    val resultSet = pr.rs
    val metaData = resultSet.getMetaData()
    (1 to pr.numColumns).map { i =>
      metaData.getColumnName(i) -> resultSet.getObject(i)
    }.toMap
  }
}
and then we can simply do:
val query = sql"#$queryString".as(ResultMap)
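A minimal usage sketch, assuming a Slick Database instance named db is in scope (the name is illustrative):
import scala.concurrent.Future

// Each row comes back as a Map from column name to the raw JDBC value
val rows: Future[Vector[Map[String, Any]]] = db.run(query)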
Hope it helps!!

Slick plain sql query with pagination

I have something like this, using Akka, Alpakka + Slick
Slick
  .source(
    sql"""select #${onlyTheseColumns.mkString(",")} from #${dbSource.table}"""
      .as[Map[String, String]]
      .withStatementParameters(
        rsType = ResultSetType.ForwardOnly,
        rsConcurrency = ResultSetConcurrency.ReadOnly,
        fetchSize = batchSize
      )
      .transactionally
  ).map( doSomething )...
I want to update this plain SQL query to skip the first N elements.
But that is very DB specific.
Is it possible to get the pagination bit generated by Slick? (Like for type-safe queries, where one can just do a drop, filter, take?)
ps: I don't have the schema, so I cannot go the type-safe way; I just want all tables as Maps and to filter, drop, etc. on them.
ps2: at the Akka level, flow.drop works, but it's not optimal/slow, because it still consumes the rows.
Cheers
Since you are using plain SQL, you have to provide workable SQL in the code snippet. Plain SQL may not be type-safe, but it is agile.
BTW, the most efficient way is to skip the first N elements in the database itself, for example with LIMIT in MySQL.
depending on your database engine, you could use something like
val page = 1
val pageSize = 10

val query = sql"""
  select #${onlyTheseColumns.mkString(",")}
  from #${dbSource.table}
  limit #${pageSize + 1}
  offset #${pageSize * (page - 1)}
"""
the pageSize + 1 part tells you whether a next page exists: if the query returns more than pageSize rows, there is one.
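A hedged sketch of that check, assuming a Database instance db, an execution context, and the same GetResult for Map[String, String] used above are in scope:
db.run(query.as[Map[String, String]]).map { rows =>
  val hasNext = rows.size > pageSize
  (rows.take(pageSize), hasNext) // the requested page, plus a next-page flag
}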
I want to update this plain SQL query to skip the first N elements. But that is very DB specific.
As you're concerned about changing the SQL for different databases, I suggest you abstract away that part of the SQL and decide what to do based on the Slick profile being used.
If you are working with multiple database products, you've probably already abstracted away from any specific profile, perhaps using JdbcProfile. In that case you could place your "skip N elements" helper in a class and use the active slickProfile to decide on the SQL to use. (As an alternative, you could of course check via some other means, such as an environment value you set.)
In practice that could be something like this:
case class Paginate(profile: slick.jdbc.JdbcProfile) {
  // Return the correct LIMIT/OFFSET SQL for the current Slick profile
  def page(size: Int, firstRow: Int): String =
    if (profile.isInstanceOf[slick.jdbc.H2Profile]) {
      s"LIMIT $size OFFSET $firstRow"
    } else if (profile.isInstanceOf[slick.jdbc.MySQLProfile]) {
      s"LIMIT $firstRow, $size"
    } else {
      // And so on... or a default
      // Danger: I've no idea if the above SQL is correct - it's just placeholder
      ???
    }
}
Which you could use as:
// Import your profile
import slick.jdbc.H2Profile.api._

val paginate = Paginate(slickProfile)
val action: DBIO[Seq[Int]] =
  sql"""SELECT cols FROM table #${paginate.page(100, 10)}""".as[Int]
In this way, you get to isolate (and control) RDBMS-specific SQL in one place.
To make the helper more usable, and as slickProfile is implicit, you could instead write:
def page(size: Int, firstRow: Int)(implicit profile: slick.jdbc.JdbcProfile) =
  // Logic for deciding on SQL goes here
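Filled in, that might look like the following sketch; the body just mirrors the Paginate class above, including its unverified placeholder SQL:
def page(size: Int, firstRow: Int)(implicit profile: slick.jdbc.JdbcProfile): String =
  if (profile.isInstanceOf[slick.jdbc.H2Profile]) s"LIMIT $size OFFSET $firstRow"
  else if (profile.isInstanceOf[slick.jdbc.MySQLProfile]) s"LIMIT $firstRow, $size"
  else ??? // add further profiles, or a sensible default

// With an implicit profile in scope, call sites stay short:
implicit val profile: slick.jdbc.JdbcProfile = slick.jdbc.H2Profile
val action = sql"""SELECT cols FROM table #${page(100, 10)}""".as[Int]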
I feel obliged to comment that using a splice (#$) in plain SQL opens you to SQL injection attacks if any of the values are provided by a user.

Need help on using Spark Filter

I am new to Apache Spark and need help forming either a SQL query or a Spark filter on a DataFrame.
Below is how my data is formed, i.e. I have a large number of users, each with data like this:
{ "User1": "Joey", "Department": ["History", "Maths", "Geography"] }
I have multiple search conditions like the one below, where I need to search the array of data based on an operator defined by the user, for example and / or.
{
  "SearchCondition": "1",
  "Operator": "and",
  "Department": ["Maths", "Geography"]
}
Can you point me to a path for how to achieve this in Spark?
Thanks,
-Jack
I assume you are using Scala and have parsed the data into a DataFrame:
val df = spark.read.json(pathToFile)
I would use Datasets for this because they provide type safety:
import spark.implicits._

case class User(department: Array[String], user1: String)
val ds = df.as[User]

def pred(user: User): Boolean = Set("Geography", "Maths").subsetOf(user.department.toSet)
ds.filter(pred _)
You can read more about DataSets here and here.
If you prefer to use DataFrames, you can do it with a user-defined function:
import org.apache.spark.sql.functions._

val pred = udf((arr: Seq[String]) => Set("Geography", "Maths").subsetOf(arr.toSet))
df.filter(pred($"Department"))
In the same package you can find a Spark built-in function for this. You can do
df.filter(array_contains($"Department", "Maths")).filter(array_contains($"Department", "Geography"))
but someone could argue that this is not as efficient and the optimizer can't improve it much.
Note that for each search condition you need a different predicate.
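To make the and / or operator from the search condition concrete, here is a hedged sketch (buildPred is an illustrative helper, not a Spark API):
// "and" = every requested department must be present; "or" = at least one
def buildPred(op: String, depts: Set[String]): User => Boolean = op match {
  case "and" => user => depts.subsetOf(user.department.toSet)
  case "or"  => user => user.department.exists(depts.contains)
  case other => throw new IllegalArgumentException(s"Unknown operator: $other")
}

ds.filter(buildPred("and", Set("Maths", "Geography")))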

Trouble iterating over resultset and generating json

I am struggling a bit trying to generate some JSON from a query I execute. All of the Groovy JsonBuilder examples I've looked at only seem to deal with a statically defined dataset.
code:
def db = new Sql(datasource)
def builder = new JsonBuilder()

db.eachRow('SELECT t.day, t.start FROM mytable') { row ->
    builder.days {
        day(
            date row.day
        )
    }
}

println builder.toString()
At one point I had it printing only the last value in the result set.
Currently I am receiving the following error:
unexpected token: $ @ line 46, column 18.
date row.day
I'm still a bit of a novice at groovy, any help greatly appreciated.
I generally prefer to present JsonBuilder with a complete object rather than use the DSL, so my solution would look something like this:
import groovy.json.JsonBuilder
import groovy.sql.Sql

def map = [days: []]
def db = new Sql(dataSource)

db.eachRow('SELECT t.day, t.start FROM mytable') { row ->
    map.days << [day: [date: row.day]]
}

println new JsonBuilder(map).toString()
If you have a large number of results, this approach has the advantage of not forcing you to compile a huge list of GroovyRowResult objects, only a huge list of much smaller LinkedHashMap objects.
The builder there does not open a list, add items, and then close it; you have to provide the whole structure in a single go, e.g. collect all rows as maps:
def builder = new groovy.json.JsonBuilder()
def dbresult = [1, 2, 3]

builder {
    days dbresult.collect {
        [day: [date: it]]
    }
}

println builder
Use db.rows to get the list. You might have to experiment with what happens if you just pass the result in directly; it may need a cast to a Map, or you may have to do the mapping yourself.
If your row count is very high, you might be better off with some other library that doesn't require you to materialize the whole list beforehand.

Creating User Defined Function in Spark-SQL

I am new to Spark and Spark SQL, and I was trying to query some data using Spark SQL.
I need to fetch the month from a date which is given as a string.
I think it is not possible to query the month directly in Spark SQL, so I was thinking of writing a user-defined function in Scala.
Is it possible to write a UDF in Spark SQL, and if so, can anybody suggest the best method of writing one?
You can do this, at least for filtering, if you're willing to use a language-integrated query.
For a data file dates.txt containing:
one,2014-06-01
two,2014-07-01
three,2014-08-01
four,2014-08-15
five,2014-09-15
You can pack as much Scala date magic in your UDF as you want but I'll keep it simple:
def myDateFilter(date: String) = date contains "-08-"
Set it all up as follows -- a lot of this is from the Programming guide.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
// case class for your records
case class Entry(name: String, when: String)
// read and parse the data
val entries = sc.textFile("dates.txt").map(_.split(",")).map(e => Entry(e(0),e(1)))
You can use the UDF as part of your WHERE clause:
val augustEntries = entries.where('when)(myDateFilter).select('name, 'when)
and see the results:
augustEntries.map(r => r(0)).collect().foreach(println)
Notice the version of the where method I've used, declared as follows in the doc:
def where[T1](arg1: Symbol)(udf: (T1) ⇒ Boolean): SchemaRDD
So, the UDF can only take one argument, but you can compose several .where() calls to filter on multiple columns.
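For example, a hedged sketch of chaining two filters (the name predicate is made up for illustration):
val filtered = entries
  .where('when)(myDateFilter)
  .where('name)((n: String) => n.startsWith("f"))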
Edit for Spark 1.2.0 (and really 1.1.0 too)
While it's not really documented, Spark now supports registering a UDF so it can be queried from SQL.
The above UDF could be registered using:
sqlContext.registerFunction("myDateFilter", myDateFilter)
and if the table was registered
sqlContext.registerRDDAsTable(entries, "entries")
it could be queried using
sqlContext.sql("SELECT * FROM entries WHERE myDateFilter(when)")
For more details see this example.
In Spark 2.0, you can do this:
// define the UDF
def convert2Years(date: String) = date.substring(7, 11)
// register to session
sparkSession.udf.register("convert2Years", convert2Years(_: String))
val moviesDf = getMoviesDf // create dataframe usual way
moviesDf.createOrReplaceTempView("movies") // 'movies' is used in sql below
val years = sparkSession.sql("select convert2Years(releaseDate) from movies")
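Since the original question was about the month, here is a hedged variant of the same pattern, assuming the date strings are formatted as yyyy-MM-dd:
// extract "MM" from "yyyy-MM-dd" and convert to Int
def extractMonth(date: String): Int = date.substring(5, 7).toInt
sparkSession.udf.register("extractMonth", extractMonth(_: String))
val months = sparkSession.sql("select extractMonth(releaseDate) from movies")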
In PySpark 1.5 and above, we can easily achieve this with a built-in function.
Following is an example:
raw_data = [
    ("2016-02-27 23:59:59", "Gold", 97450.56),
    ("2016-02-28 23:00:00", "Silver", 7894.23),
    ("2016-02-29 22:59:58", "Titanium", 234589.66)]

Time_Material_revenue_df = sqlContext.createDataFrame(raw_data, ["Sold_time", "Material", "Revenue"])

from pyspark.sql.functions import *

Day_Material_revenue_df = Time_Material_revenue_df.select(to_date("Sold_time").alias("Sold_day"), "Material", "Revenue")

LINQ display row numbers

I simply want to include a row number against the returned results of my query.
I found the following post that describes what I am trying to achieve, but it gives me an exception:
http://vaultofthoughts.net/LINQRowNumberColumn.aspx
"An expression tree may not contain an assignment operator"
In MS SQL I would just use the ROW_NUMBER() function; I'm simply looking for the equivalent in LINQ.
Use AsEnumerable() to evaluate the final part of your query on the client, and in that final part add a counter column:
int rowNo = 0;
var results = (from data in db.Data
               // Add any processing to be performed server side
               select data)
              .AsEnumerable()
              .Select(d => new { Data = d, Count = ++rowNo });
I'm not sure whether LINQ to SQL supports it (but it probably will), but there's an overload of the Queryable.Select method that accepts a lambda with an index. You can write your query as follows:
db.Authors.Select((author, index) => new
{
    Lp = index,
    Name = author.Name
});
UPDATE:
I ran a few tests, but unfortunately LINQ to SQL does not support this overload (both 3.5sp1 and 4.0). It throws a NotSupportedException with the message:
Unsupported overload used for query operator 'Select'.
LINQ to SQL allows you to map a SQL function. While I've not tested this, I think this construct will work:
public partial class YourDataContext : DataContext
{
    [Function(Name = "ROWNUMBER")]
    public int RowNumber()
    {
        throw new InvalidOperationException("Not called directly.");
    }
}
And write a query as follows:
from author in db.Authors
select new { Lp = db.RowNumber(), Name = author.Name };