Picking max item by column from group by in Slick 3.x - sql

I'm trying to write a Slick query to find the "max" element within a group and then continue querying based on that result, however I'm getting a massive error when I try what I thought was the obvious way:
val articlesByUniqueLink = for {
  (link, groupedArticles) <- historicArticles.groupBy(_.link)
  latestPerLink <- groupedArticles.sortBy(_.pubDate.desc).take(1)
} yield latestPerLink
Since this doesn't seem to work, I'm wondering if there's some other way to find the "latest" element out of "groupedArticles" above, assuming these come from an Articles table with a pubDate Timestamp and a link that can be duplicated. I'm effectively looking for HAVING articles.pub_date = max(articles.pub_date).
The other equivalent way to express it yields the same result:
val articlesByUniqueLink = for {
  (link, groupedArticles) <- historicArticles.groupBy(_.link)
  latestPerLink <- groupedArticles.filter(_.pubDate === groupedArticles.map(_.pubDate).max.get)
} yield latestPerLink
[SlickTreeException: Unreachable reference to s2 after resolving monadic joins], followed by about 50 lines of Slick node trees.

The best way I have found to get the max, min, etc. per group in Slick is a self join against the grouped result:
// link -> latest pubDate within that link
val latestPerLink = historicArticles.groupBy(_.link)
  .map { case (link, group) => (link, group.map(_.pubDate).max) }
val articlesByUniqueLink = for {
  (article, _) <- historicArticles join latestPerLink on ((article, latest) =>
    article.link === latest._1 && article.pubDate === latest._2)
} yield article
If the on condition can match more than one row per group (for example, two articles with the same link and the same pubDate), drop the duplicates afterwards.
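To make the semantics of that self join concrete, here is a minimal in-memory sketch using plain Scala collections rather than Slick. The Article case class, the Long-valued pubDate (epoch millis) and the tie-breaking rule (lowest id wins) are assumptions for illustration and are not taken from the original schema.

```scala
// Hypothetical in-memory stand-in for a row of the Articles table.
final case class Article(id: Long, link: String, pubDate: Long)

object LatestPerLink {
  def latestPerLink(articles: Seq[Article]): Seq[Article] = {
    // Grouped side of the join: link -> max(pubDate) within that link.
    val maxDateByLink: Map[String, Long] =
      articles.groupBy(_.link).map { case (link, group) =>
        link -> group.map(_.pubDate).max
      }

    // Join condition: same link and pubDate equal to the group's maximum.
    val joined = articles.filter(a => maxDateByLink.get(a.link).contains(a.pubDate))

    // Ties on pubDate leave several rows per link; keep the lowest id
    // (an arbitrary rule chosen for this sketch).
    joined.groupBy(_.link).values.map(_.minBy(_.id)).toSeq.sortBy(_.id)
  }
}
```

With two articles sharing both link and pubDate, the join step alone returns both, which is why a de-duplication step is needed after it.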

Related

filter spark dataframe using udf

I have a student dataframe:
var student = Seq(("h123","078","Ryan"),("h789","078","John"),("h456","ad0","Mike")).toDF("id","div","name")
Now I want to filter student on the div column based on some logic; for this example, assume only the value 078 should be kept.
For this, I have a UDF defined as:
val filterudf = udf((div: String) => div == "078")
Currently, I am using the following approach to get my work done:
val allowedDivs = student.select(col("div")).distinct().filter(filterudf(col("div")))
  .collectAsList().asScala.map(row => row.getAs[String](0)).toList
val resultDF = student.filter(col("div").isInCollection(allowedDivs))
The actual table where I have to apply this filter is huge, and to improve performance I want to use a spark.sql query to benefit from codegen and other Tungsten optimizations.
This is what I have come up with, but the query does not work:
filterudf.registerTemplate("filterudf")
val resultDF = spark.sql("select * from student where div in (filterudf(select distinct div from student).div)")
Any help is appreciated.

play-slick scala many to many

I have an endpoint, let's say /order/, where I can send a JSON object (my order) that contains some products etc. My problem is that I first have to save the order, wait for the order id to come back from the db, and then save my products with this new order id (this is a many-to-many relation, which is why there is another table).
Consider this controller method
def postOrder = Action(parse.json[OrderRest]) { req =>
  Created(Json.toJson(manageOrderService.insertOrder(req.body)))
}
This is what my repo methods look like:
def addOrder(order: Order) = db.run {
  (orders returning orders) += order
}
How can I chain db.run calls to first insert the order, get the order id, and then insert my products with the order id I just got?
I'm thinking about putting a service between my controller and repo and managing those actions there, but I have no idea where to start.
You can use for to chain database operations. Here is an example of adding a table to a db by adding a header row to represent the table and then adding the data rows. In this case it is a simple table containing (age, value).
/** Add a new table to the database */
def addTable(name: String, table: Seq[(Int, Int)]) = {
  val action = for {
    key <- (Headers returning Headers.map(_.tableId)) += HeadersRow(0, name)
    _ <- Values ++= table.map { case (age, value) => ValuesRow(key, age, value) }
  } yield key
  db.run(action.transactionally)
}
This is cut down from working code, but it should give the idea of how to do what you want. In your case, the first generator in the for comprehension would insert the order and return its id, and the second would insert the products with that order id.
This is done transactionally so that the new order will not be created unless the order data is valid (in database terms).
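If each repository method calls db.run and returns a Future, as addOrder in the question does, the same for-comprehension shape can also live in a service between the controller and the repo. The following is a rough sketch with an in-memory stand-in for the repository. Order, OrderProduct, OrderRepo, addProducts and the insertOrder signature are all hypothetical, and unlike the DBIO version with transactionally, chaining Futures this way gives no transaction.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical models; the real Order and product types are not shown in the question.
final case class Order(id: Long, customer: String)
final case class OrderProduct(orderId: Long, productId: Long)

// In-memory stand-in for the repo. In the real code each method would wrap one db.run(...).
class OrderRepo()(implicit ec: ExecutionContext) {
  private val nextId = new AtomicLong(0L)
  val orders = TrieMap.empty[Long, Order]
  val orderProducts = TrieMap.empty[(Long, Long), OrderProduct]

  // Stores the order and returns it with its generated id.
  def addOrder(order: Order): Future[Order] = Future {
    val saved = order.copy(id = nextId.incrementAndGet())
    orders.put(saved.id, saved)
    saved
  }

  // Stores the rows of the join table and returns how many were written.
  def addProducts(rows: Seq[OrderProduct]): Future[Int] = Future {
    rows.foreach(r => orderProducts.put((r.orderId, r.productId), r))
    rows.size
  }
}

// Service layer: the second step only starts once the first has produced the id.
class ManageOrderService(repo: OrderRepo)(implicit ec: ExecutionContext) {
  def insertOrder(order: Order, productIds: Seq[Long]): Future[Order] =
    for {
      saved <- repo.addOrder(order)
      _     <- repo.addProducts(productIds.map(pid => OrderProduct(saved.id, pid)))
    } yield saved
}

object OrderDemo {
  import scala.concurrent.ExecutionContext.Implicits.global

  // Blocks only for demonstration; a Play controller would return the Future via Action.async.
  def run(): (Order, Set[(Long, Long)]) = {
    val repo = new OrderRepo()
    val service = new ManageOrderService(repo)
    val saved = Await.result(service.insertOrder(Order(0L, "alice"), Seq(7L, 9L)), 5.seconds)
    (saved, repo.orderProducts.keySet.toSet)
  }
}
```

If the two inserts must succeed or fail together, composing the DBIO actions and running them once with transactionally, as in the answer above, is the safer choice.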

Union between optional and non-optional tables

I have two queries that select records where a union needs to be taken, one of which is a left join and one of which is a regular (i.e. inner) join.
Here's the left join case:
def regularAccountRecords = for {
  (customer, account) <- customers joinLeft accounts on (_.accountId === _.accountId) // + some other special conditions
} yield (customer, account)
Here's the regular join case:
def specialAccountRecords = for {
  (customer, account) <- customers join accounts on (_.accountId === _.accountId) // + some other special conditions
} yield (customer, account)
Now I want to take a union of the two record sets:
regularAccountRecords ++ specialAccountRecords
Obviously this doesn't work because in the regular join case it returns Query[(Customer, Account),...] and in the left join case it returns Query[(Customer, Rep[Option[Account]]),...] and this results in a Type Mismatch error.
Now, if this were a regular column type (e.g. Rep[String]), I could convert it to an optional via the ? operator (i.e. record.?) and get Rep[Option[String]], but using it on a table (i.e. the accounts table) causes:
Error:(62, 85) value ? is not a member of com.test.Account
How do I work around this issue and do the union properly?
Okay, looks like this is what the '?' projection is for but I didn't realize it because I disabled the optionEnabled option in the Codegen. Here's what your codegen extension is supposed to look like:
class MyCodegen extends SourceCodeGenerator(inputModel) {
  override def TableClass = new TableClassDef {
    override def optionEnabled = true
  }
}
Alternatively, you can use implicit classes to tack this thing onto the generated TableClass yourself. Here is how that would look:
implicit class AccountExtensions(account: Account) {
  def ? = (Rep.Some(account.id), account.name).shaped.<>(
    { r => r._1.map(_ => Account.tupled((r._2, r._1.get))) },
    (_: Any) => throw new Exception("Inserting into ? projection not supported.")
  )
}
NOTE: be sure to check the field ordering. Depending on how this projection is done, the union query might put the ID field in the wrong place in the output; use println(query.result.statements.headOption) to debug the output SQL to be sure.
Once you do that, you will be able to use account.? in the yield statement:
def specialAccountRecords = for {
  (customer, account) <- customers join accounts on (_.accountId === _.accountId)
} yield (customer, account.?)
...and then you will be able to take the union of the two queries correctly:
regularAccountRecords ++ specialAccountRecords
I really wish the Slick people would put a note on how the '?' projection is useful in the documentation beyond the vague statement 'useful for outer joins'.
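The type problem behind this answer can be reproduced without Slick. The sketch below is only an analogy with plain Scala collections: Customer, Account and the helper names are hypothetical, and wrapping in Option plays the role that account.? plays in the Slick query.

```scala
// Hypothetical row types standing in for the generated table classes.
final case class Customer(id: Long, accountId: Long)
final case class Account(accountId: Long, name: String)

object UnionSketch {
  // Left join analogue: every customer appears, the account side is optional.
  def leftJoin(cs: Seq[Customer], as: Seq[Account]): Seq[(Customer, Option[Account])] =
    cs.flatMap { c =>
      val matches = as.filter(_.accountId == c.accountId)
      if (matches.isEmpty) Seq((c, Option.empty[Account]))
      else matches.map(a => (c, Option(a)))
    }

  // Inner join analogue: only matching pairs, the account side is not optional.
  def innerJoin(cs: Seq[Customer], as: Seq[Account]): Seq[(Customer, Account)] =
    for {
      c <- cs
      a <- as if a.accountId == c.accountId
    } yield (c, a)

  // The two results only line up once the inner-join side is lifted to Option.
  // Seq's ++ keeps duplicates.
  def unionAll(regular: Seq[Customer], special: Seq[Customer], as: Seq[Account]): Seq[(Customer, Option[Account])] =
    leftJoin(regular, as) ++ innerJoin(special, as).map { case (c, a) => (c, Option(a)) }
}
```

Without the final map, the two sides have different element types, which corresponds to the Type Mismatch error in the question.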

Raven DB Count Queries

I need to get a count of the documents in a particular collection.
There is an existing index Raven/DocumentCollections that stores the Count and Name of the collection paired with the actual documents belonging to the collection. I'd like to pick up the count from this index if possible.
Here is the Map-Reduce of the Raven/DocumentCollections index :
from doc in docs
let Name = doc["@metadata"]["Raven-Entity-Name"]
where Name != null
select new { Name , Count = 1}
from result in results
group result by result.Name into g
select new { Name = g.Key, Count = g.Sum(x=>x.Count) }
On a side note, var Count = DocumentSession.Query<Post>().Count(); always returns 0 for me, even though there are clearly 500-odd documents in my DB and at least 50 of them have "Raven-Entity-Name" set to "Posts" in their metadata. I have no idea why this Count query keeps returning 0. The Raven logs show this when Count is run:
Request # 106: GET - 0 ms - TestStore - 200 - /indexes/dynamic/Posts?query=&start=0&pageSize=1&aggregation=None
For anyone still looking for the answer (this question was posted in 2011), the appropriate way to do this now is:
var numPosts = session.Query<Post>().Count();
To get the results from the index, you can use:
session.Query<Collection>("Raven/DocumentCollections")
    .Where(x => x.Name == "Posts")
    .FirstOrDefault();
That will give you the result you want.
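For readers unfamiliar with map-reduce indexes, the Raven/DocumentCollections definition quoted in the question can be modelled in memory roughly as follows. This is a plain Scala sketch of the map and reduce steps only, not RavenDB code; Doc, CollectionCount and countFor are hypothetical names.

```scala
object CollectionCounts {
  // Hypothetical document model: only the metadata map matters here.
  final case class Doc(metadata: Map[String, String])
  final case class CollectionCount(name: String, count: Int)

  // Map step: one (Name, 1) entry per document that has an entity name.
  def map(docs: Seq[Doc]): Seq[CollectionCount] =
    docs.flatMap(_.metadata.get("Raven-Entity-Name")).map(CollectionCount(_, 1))

  // Reduce step: group by Name and sum the counts.
  def reduce(results: Seq[CollectionCount]): Seq[CollectionCount] =
    results
      .groupBy(_.name)
      .map { case (name, group) => CollectionCount(name, group.map(_.count).sum) }
      .toSeq
      .sortBy(_.name)

  // Lookup for a single collection, returning 0 when the collection is absent.
  def countFor(docs: Seq[Doc], name: String): Int =
    reduce(map(docs)).find(_.name == name).map(_.count).getOrElse(0)
}
```

Querying the index by Name, as in the answer above, corresponds to the final lookup in countFor.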

How can I recreate this complex SQL Query using NHibernate QueryOver?

Imagine the following (simplified) database layout:
We have many "holiday" records that relate to going to a particular Accommodation on a certain date etc.
I would like to pull from the database the "best" holiday going to each accommodation (i.e. lowest price), given a set of search criteria (e.g. duration, departure airport etc).
There will be multiple records with the same price, so then we need to choose by offer saving (descending), then by departure date ascending.
I can write SQL to do this that looks like this (I'm not saying this is necessarily the most optimal way):
SELECT *
FROM Holiday h1 INNER JOIN (
    SELECT h2.HolidayID,
           h2.AccommodationID,
           ROW_NUMBER() OVER (
               PARTITION BY h2.AccommodationID
               ORDER BY OfferSaving DESC
           ) AS RowNum
    FROM Holiday h2 INNER JOIN (
        SELECT AccommodationID,
               MIN(price) as MinPrice
        FROM Holiday
        WHERE TradeNameID = 58001
        /*** Other Criteria Here ***/
        GROUP BY AccommodationID
    ) mp
    ON mp.AccommodationID = h2.AccommodationID
    AND mp.MinPrice = h2.price
    WHERE TradeNameID = 58001
    /*** Other Criteria Here ***/
) x on h1.HolidayID = x.HolidayID and x.RowNum = 1
As you can see, this uses a subquery within another subquery.
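As a reference for what any rewrite has to return, the selection rule described above (lowest price per accommodation, then highest offer saving, then earliest departure date) can be written as an in-memory sketch in plain Scala. The Holiday case class and its field names are hypothetical, departureDay is simplified to an Int, and the tie-break on departure date follows the prose description rather than the SQL, which only orders by OfferSaving.

```scala
// Hypothetical, simplified row of the Holiday table.
final case class Holiday(
  holidayId: Long,
  accommodationId: String,
  price: BigDecimal,
  offerSaving: BigDecimal,
  departureDay: Int // days since an arbitrary epoch, to keep the sketch small
)

object BestHoliday {
  // One holiday per accommodation: min price, then max saving, then earliest departure.
  def bestPerAccommodation(holidays: Seq[Holiday]): Seq[Holiday] =
    holidays
      .groupBy(_.accommodationId)
      .values
      .map(_.minBy(h => (h.price, -h.offerSaving, h.departureDay)))
      .toSeq
      .sortBy(_.price)
}
```

The search criteria from the question would be applied as a filter on holidays before the grouping.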
However, for several reasons my preference would be to achieve this same result in NHibernate.
Ideally, this would be done with QueryOver - the reason being that I build up the search criteria dynamically and this is much easier with QueryOver's fluent interface. (I had started out hoping to use NHibernate Linq, but unfortunately it's not mature enough).
After a lot of effort (being a relative newbie to NHibernate) I was able to re-create the very inner query that fetches all accommodations and their min price.
public IEnumerable<HolidaySearchDataDto> CriteriaFindAccommodationFromPricesForOffers(IEnumerable<IHolidayFilter<PackageHoliday>> filters, int skip, int take, out bool hasMore)
{
    IQueryOver<PackageHoliday, PackageHoliday> queryable = NHibernateSession.CurrentFor(NHibernateSession.DefaultFactoryKey).QueryOver<PackageHoliday>();
    queryable = queryable.Where(h => h.TradeNameId == website.TradeNameID);
    var accommodation = Null<Accommodation>();
    var accommodationUnit = Null<AccommodationUnit>();
    var dto = Null<HolidaySearchDataDto>();
    // Apply search criteria
    foreach (var filter in filters)
        queryable = filter.ApplyFilter(queryable, accommodationUnit, accommodation);
    var query1 = queryable
        .JoinQueryOver(h => h.AccommodationUnit, () => accommodationUnit)
        .JoinQueryOver(h => h.Accommodation, () => accommodation)
        .SelectList(hols => hols
            .SelectGroup(() => accommodation.Id).WithAlias(() => dto.AccommodationId)
            .SelectMin(h => h.Price).WithAlias(() => dto.Price)
        );
    var list = query1.OrderByAlias(() => dto.Price).Asc
        .Skip(skip).Take(take+1)
        .Cacheable().CacheMode(CacheMode.Normal).List<object[]>();
    // Caching doesn't work this way...
    /*.TransformUsing(Transformers.AliasToBean<HolidaySearchDataDto>())
    .Cacheable().CacheMode(CacheMode.Normal).List<HolidaySearchDataDto>();*/
    // take + 1 rows were requested, so an extra row means there is another page
    hasMore = list.Count() > take;
    var dtos = list.Take(take).Select(h => new HolidaySearchDataDto
    {
        AccommodationId = (string)h[0],
        Price = (decimal)h[1],
    });
    return dtos;
}
So my question is...
Any ideas on how to achieve what I want using QueryOver, or if necessary Criteria API?
I'd prefer not to use HQL, but if it is necessary then I'm willing to see how it can be done with that too (although it makes building up the search criteria harder, or at least messier).
If this just isn't doable using NHibernate, then I could use a SQL query. In which case, my question is can the SQL be improved/optimised?
I have managed to achieve this kind of dynamic search criteria using the Criteria API. The problem I ran into was duplicates from inner and outer joins, especially in combination with sorting and pagination, so I had to resort to two queries: the first applies the restrictions, and its result is used as an 'in' clause in the second criteria query.