Joiners with filtering perform very slowly - OptaPlanner

I have a constraint with some joiners, but the performance is very poor. Is there a way to improve it?
I need to count the WorkingDay instances (with ::hasPermission) within the four days preceding the day currently being analyzed.
Here is my current constraint:
private Constraint fiveConsecutiveWorkingDaysMax(ConstraintFactory constraintFactory) {
    return constraintFactory
            .from(WorkingDay.class)
            .filter(WorkingDay::hasPermission)
            .join(WorkingDay.class,
                    Joiners.equal(WorkingDay::hasPermission),
                    Joiners.equal(WorkingDay::getAgent),
                    Joiners.filtering((wd1, wd2) -> {
                        LocalDate fourDaysBefore = wd1.getDayJava().minusDays(4);
                        Boolean wd2IsBeforeWd1 = wd2.getDayJava().isBefore(wd1.getDayJava());
                        Boolean wd2IsAfterFourDaysBeforeWd1 = wd2.getDayJava().compareTo(fourDaysBefore) >= 0;
                        return (wd2IsBeforeWd1 && wd2IsAfterFourDaysBeforeWd1);
                    }))
            .groupBy((wd1, wd2) -> wd2, ConstraintCollectors.countBi())
            .filter((wd2, count) -> count >= 4)
            .penalizeConfigurable(FIVE_CONSECUTIVE_WORKING_DAYS_MAX);
}
Thanks for your help.

There is potential for improvement here. First, we pre-filter the right-hand side of the join to reduce the size of the Cartesian product:
return constraintFactory
        .forEach(WorkingDay.class)
        .filter(WorkingDay::hasPermission)
        .join(constraintFactory.forEach(WorkingDay.class)
                        .filter(WorkingDay::hasPermission),
                Joiners.equal(WorkingDay::getAgent),
                Joiners.filtering((wd1, wd2) -> {
                    LocalDate fourDaysBefore = wd1.getDayJava().minusDays(4);
                    Boolean wd2IsBeforeWd1 = wd2.getDayJava().isBefore(wd1.getDayJava());
                    Boolean wd2IsAfterFourDaysBeforeWd1 = wd2.getDayJava().compareTo(fourDaysBefore) >= 0;
                    return (wd2IsBeforeWd1 && wd2IsAfterFourDaysBeforeWd1);
                }))
...
This has the added benefit of simplifying the index as it removes one equals joiner. Next, part of the filter can be replaced by a joiner as well:
return constraintFactory
        .forEach(WorkingDay.class)
        .filter(WorkingDay::hasPermission)
        .join(constraintFactory.forEach(WorkingDay.class)
                        .filter(WorkingDay::hasPermission),
                Joiners.equal(WorkingDay::getAgent),
                Joiners.greaterThan(wd -> wd.getDayJava()),
                Joiners.filtering((wd1, wd2) -> {
                    LocalDate fourDaysBefore = wd1.getDayJava().minusDays(4);
                    Boolean wd2IsAfterFourDaysBeforeWd1 = wd2.getDayJava().compareTo(fourDaysBefore) >= 0;
                    return wd2IsAfterFourDaysBeforeWd1;
                }))
...
Finally, the method does needless boxing of boolean into Boolean, wasting CPU cycles and memory. This is a micro-optimization, but if the filter happens often enough, the benefit will be measurable.
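For illustration, the hot filter with primitive locals instead (a minimal sketch of the change, applied to the joiner-reduced version above):

Joiners.filtering((wd1, wd2) -> {
    LocalDate fourDaysBefore = wd1.getDayJava().minusDays(4);
    // a plain boolean avoids allocating/boxing a Boolean on every evaluation
    boolean wd2IsAfterFourDaysBeforeWd1 = wd2.getDayJava().compareTo(fourDaysBefore) >= 0;
    return wd2IsAfterFourDaysBeforeWd1;
})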
A constraint refactored like this should perform better. That said, large joins are still going to take considerable time and the only way to work around that is to figure out a way to make them smaller.
Also, as Geoffrey said, I'd consider penalizing by the actual count, as what you have here is a textbook example of a score trap.
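For example, the tail of the constraint could weigh each match by the count; a sketch (the -3 offset, which keeps the minimal violation at weight 1, is an assumption):

.groupBy((wd1, wd2) -> wd2, ConstraintCollectors.countBi())
.filter((wd2, count) -> count >= 4)
.penalizeConfigurable(FIVE_CONSECUTIVE_WORKING_DAYS_MAX,
        (wd2, count) -> count - 3);

This way a longer streak costs more than a shorter one, so the solver always sees an improvement when it shortens a streak, instead of hitting a flat penalty wall.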

I don't see why this should be slow, except maybe because the Cartesian product explodes for a long time window. How many days is your time window?
Do note that the nurse rostering example has a totally different approach to detecting consecutive working days, using a custom collector. You might want to look at that in optaplanner-examples.

Related

How to optimize querying multiple unrelated tables in SQLite?

I have a scenario where I have to iterate through multiple tables in a quite big SQLite database. The tables store information about planet positions in the sky over the years. So, e.g., for Mars I have tables Mars_2000, Mars_2001, and so on. The table structure is always the same:
|id:INTEGER|date:TEXT|longitude:REAL|
The thing is that for a certain task I need to iterate through these tables, which costs a lot of time (for more than 10 queries it's painful).
I suppose that if I merge all the per-year tables into one big table, performance might be better, as one query against one big table should beat 50 queries against smaller ones. I wanted to make sure this would work, as the database is humongous (around 20 GB) and reshaping it would take a while.
Is the plan I just described viable? Is there any other solution for such a case?
It might be helpful, so I attach the function that produces my SQL query, which is unique for each table:
pub fn transition_query(
    select_param: &str,              // usually an asterisk
    table_name: &str,                // table I'd like to query
    birth_degree: &f64,              // constant number
    wanted_degree: &f64,             // another constant number
    orb: &f64,                       // another constant number
    upper_date_limit: DateTime<Utc>, // cast to an SQL-like string
    lower_date_limit: DateTime<Utc>, // cast to an SQL-like string
) -> String {
    let parsed_upper_date_limit = CelestialBodyPosition::parse_date(upper_date_limit);
    let parsed_lower_date_limit = CelestialBodyPosition::parse_date(lower_date_limit);
    return format!("
        SELECT *,
               (SECOND_LAG > 60 OR SECOND_LAG IS NULL) AS TRANSIT_START,
               (SECOND_LEAD > 60 OR SECOND_LEAD IS NULL) AS TRANSIT_END,
               time
        FROM (
            SELECT *,
                   UNIX_TIME - LAG(UNIX_TIME, 1) OVER (ORDER BY time) AS SECOND_LAG,
                   LEAD(UNIX_TIME, 1) OVER (ORDER BY time) - UNIX_TIME AS SECOND_LEAD
            FROM (
                SELECT {select_param},
                       DATE(time) AS day_scoped_date,
                       CAST(strftime('%s', time) AS INT) AS UNIX_TIME,
                       longitude
                FROM {table_name}
                WHERE ((-{orb} <= abs(realModulo(longitude - {birth_degree} - {wanted_degree}, 360))
                        AND abs(realModulo(longitude - {birth_degree} - {wanted_degree}, 360)) <= {orb})
                       OR
                       (-{orb} <= abs(realModulo(longitude - {birth_degree} + {wanted_degree}, 360))
                        AND abs(realModulo(longitude - {birth_degree} + {wanted_degree}, 360)) <= {orb}))
                  AND time < '{parsed_upper_date_limit}' AND time > '{parsed_lower_date_limit}'
            )
        ) WHERE (TRANSIT_START AND NOT TRANSIT_END) OR (TRANSIT_END AND NOT TRANSIT_START);
    ");
}
I solved the issue programmatically. The whole thing was done in Rust with the r2d2_sqlite library. I'm still doing a lot of queries, but now they run in threads, which allowed me to reduce the execution time from 25 s to around 3 s. Here's the code:
use std::sync::mpsc;
use std::thread;
use r2d2_sqlite::SqliteConnectionManager;
use r2d2;

let manager = SqliteConnectionManager::file("db_path");
let pool = r2d2::Pool::builder().build(manager).unwrap();
let mut result: Vec<CelestialBodyPosition> = vec![]; // vector of structs
let (tx, rx) = mpsc::channel(); // allows asynchronous communication
let mut children = vec![]; // vector of join handles (not sure if needed at all)
for query in queries {
    let pool = pool.clone(); // each thread should have its own connection
    let inner_tx = tx.clone(); // and its own sender
    children.push(thread::spawn(move || {
        let conn = pool.get().unwrap();
        add_real_modulo_function(&conn); // adds the custom SQLite function I needed
        let mut sql = conn.prepare(&query).unwrap();
        // run the query and map the result to my internal type
        let positions: Vec<CelestialBodyPosition> = sql
            .query_map(params![], |row| {
                Ok(CelestialBodyPosition::new(row.get(1)?, row.get(2)?))
            })
            .unwrap()
            .map(|position| position.unwrap())
            .collect();
        // send the partial result to the receiver
        inner_tx.send(positions).unwrap();
    }));
}
// the original sender has to be dropped, otherwise the loop below will wait for input forever
drop(tx);
for received in rx {
    result.extend(received); // combine all results
}
return result;
As you can see, no optimization happened on the SQLite side, which kind of makes me feel I'm doing something wrong, but for now it's all right. It might be good to exert some more control over the number of spawned threads.
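For reference, if the single-table merge proposed in the question is still worth trying, it can be scripted as a one-off migration from any host language. A minimal sketch in Java/JDBC (the mars_all table name, the year range, and the index choice are assumptions; the column layout follows the question's table structure):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MergeTables {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:db_path");
             Statement st = conn.createStatement()) {
            // combined table with a discriminator column for the source year
            st.executeUpdate("CREATE TABLE IF NOT EXISTS mars_all "
                    + "(id INTEGER, date TEXT, longitude REAL, year INTEGER)");
            for (int year = 2000; year <= 2050; year++) {
                st.executeUpdate("INSERT INTO mars_all (id, date, longitude, year) "
                        + "SELECT id, date, longitude, " + year + " FROM Mars_" + year);
            }
            // without an index, every query would scan the whole ~20 GB table,
            // which would be slower than the per-year tables, not faster
            st.executeUpdate("CREATE INDEX IF NOT EXISTS idx_mars_all_date ON mars_all(date)");
        }
    }
}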

OptaPlanner. School timetabling. Force first lesson

I'm trying to add constraints to the School timetabling example. For example: "all groups should have the first lesson".
I tried EasyScore and Streaming, with no success. EasyScore can't find a proper solution and shuffles lessons a lot. Streaming gave me an error: Undo for (Lesson(subj...)) does not exist
Code for Streaming:
from(Lesson::class.java)
    .filter { it.timeslot != null }
    .groupBy({ it.studentGroup }, { it.timeslot!!.day }, ConstraintCollectors.toList())
    .filter { group, day, list ->
        list.any { it.timeslot!!.number != 1 }
    }
    .penalize(
        "Student must have first lesson",
        HardSoftScore.ONE_HARD
    ) { group, day, list -> list.count { it.timeslot!!.number != 1 } },
Looks like I'm thinking in the wrong direction.
https://github.com/Lewik/timetable
Any help will be greatly appreciated.
update: fixed == -> !=
As far as I understand it, I don't think you're enforcing what you intend to enforce. From what I can make of your source code, you penalize every student group's first lesson of the day.
What you should do to enforce the intended goal is to penalize every student group that does NOT have a timeslot with number == 1 but DOES have one (on the same day) where the timeslot number != 1.
So something like:
join all Lesson.class instances with all Lesson.class instances where the first lesson's studentGroup equals the second lesson's studentGroup AND the first lesson's timeSlot's day equals the second lesson's timeSlot's day. You obtain a BiConstraintStream<Lesson, Lesson> this way...
from this, filter the pairs where the first lesson's timeSlot number is less than the second lesson's timeSlot number
then penalise the remaining pairs where the first lesson's timeSlot number differs from 1. That amounts to penalising all of a studentGroup's days where they have some lesson that day without having any lesson during the first timeslot.
If I understood you correctly, that's what you wanted?
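A minimal sketch of those steps as a Java constraint stream (the getter names are assumptions based on the Kotlin model; untested against the linked project):

from(Lesson.class)
        .filter(lesson -> lesson.getTimeslot() != null)
        .join(Lesson.class,
                Joiners.equal(Lesson::getStudentGroup),
                Joiners.equal(lesson -> lesson.getTimeslot().getDay()),
                // first lesson's timeslot number is less than the second's
                Joiners.lessThan(lesson -> lesson.getTimeslot().getNumber()))
        .filter((lesson1, lesson2) -> lesson1.getTimeslot().getNumber() != 1)
        .penalize("Student group must have the first lesson", HardSoftScore.ONE_HARD);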
I don't know the real source of the problem, but it's about hashCode: the exception was thrown because a HashMap keyed by the object could no longer find that object.
Lesson class:
#Serializable
#NoArg
#PlanningEntity
data class Lesson(
val subject: String,
val teacher: String,
val studentGroup: String,
#PlanningVariable(valueRangeProviderRefs = ["timeslotRange"])
var timeslot: TimeSlot? = null,
#PlanningId
val id: String = UUID.randomUUID().toString(),
)
The implementation above will not work. It can be fixed by removing data or by adding override fun hashCode() = Objects.hash(id). @PlanningId does not help here. Kotlin generates hashCode for data classes from all properties, including the mutable planning variable, and that seems not to work with OptaPlanner (or vice versa).
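For comparison, this is roughly what the working identity looks like written out in Java: equals and hashCode depend only on the immutable @PlanningId field, so they stay stable while the solver reassigns the planning variable (a sketch, not taken from the linked project):

import java.util.Objects;
import java.util.UUID;

public class Lesson {
    private final String id = UUID.randomUUID().toString(); // the @PlanningId field
    private TimeSlot timeslot; // planning variable, mutated by the solver

    @Override
    public boolean equals(Object other) {
        return this == other
                || (other instanceof Lesson && id.equals(((Lesson) other).id));
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}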
How about using .ifNotExists()?
First, convert the student group from a String into a class and add @ProblemFactCollectionProperty List<StudentGroup> on your solution, then do:
from(StudentGroup.class)
        .ifNotExists(Lesson.class,
                Joiners.equal(studentGroup -> studentGroup, Lesson::getStudentGroup),
                Joiners.filtering((studentGroup, lesson) -> lesson.isFirstTimeslot()))
        .penalize(...);

Java 8 Stream API - convert for loop over map & list iterator inside it

In the code below, I am trying to calculate the total price of a basket, where the basket is a HashMap containing products as keys and quantities as values. Promotions are available as a list of Promotion.
I am looping over every map entry and, for each of them, iterating the promotions. If a promotion matches, I take the promotion price (promotion.computeDiscountedPrice()) and remove the promotion from the list (because a promotion applies to only one product, and each product is unique in the list).
If there is no promotion, we execute the block:
if (!offerApplied) { /* .... */ }
Can you please help me do this same operation using the Java 8 Stream API?
BigDecimal basketPrice = new BigDecimal("0.0");
Map<String, Integer> basket = buildBasket(input);
List<Promotion> promotions = getOffersApplicable(basket);
for (Map.Entry<String, Integer> entry : basket.entrySet()) {
    boolean offerApplied = false;
    Iterator<Promotion> promotionIterator = promotions.iterator();
    while (promotionIterator.hasNext()) {
        Promotion promotion = promotionIterator.next();
        if (entry.getKey().equalsIgnoreCase(promotion.getProduct().getProductName())) {
            basketPrice = basketPrice.add(promotion.computeDiscountedPrice());
            offerApplied = true;
            promotionIterator.remove();
            break;
        }
    }
    if (!offerApplied) {
        basketPrice = basketPrice.add(Product.valueOf(entry.getKey()).getPrice()
                .multiply(new BigDecimal(entry.getValue())));
    }
}
return basketPrice;
The simplest and cleanest solution, with better performance than iterating the entire promotions list, is to start by creating a map of promotions keyed by the product name (in lower case or upper case, assuming no case collision occurs given the use of equalsIgnoreCase(..)).
Map<String, Promotion> promotionByProduct = promotions.stream()
        .collect(Collectors.toMap(prom -> prom.getProduct()
                .getProductName().toLowerCase(), Function.identity()));
This avoids iterating over the entire list when searching for promotions. It also avoids deleting items from it, which, in the case of an ArrayList, shifts the remaining elements to the left each time remove is used.
BigDecimal basketPrice = basket.keySet().stream()
        .map(name -> Optional.ofNullable(promotionByProduct.get(name.toLowerCase()))
                .map(Promotion::computeDiscountedPrice) // promotion exists
                .orElseGet(() -> Product.valueOf(name).getPrice()) // no promotion
                .multiply(BigDecimal.valueOf(basket.get(name))))
        .reduce(BigDecimal.ZERO, BigDecimal::add);
It iterates over each product name in the basket and checks whether a promotion exists: if so, it uses the computeDiscountedPrice method; otherwise it looks the product up with Product.valueOf(..) and takes its price. It then multiplies this value by the quantity of that product in the basket, and finally the results are reduced (all the values of the basket are added) with BigDecimal.add().
An important thing to note is that in your code you don't multiply the result of promotion.computeDiscountedPrice() by the quantity (the code above does); I'm not sure whether that is a typo in your code or the way it should behave.
In case it is in fact the way it should behave (you don't want to multiply promotion.computeDiscountedPrice() by the quantity), the code would be:
BigDecimal basketPrice = basket.keySet().stream()
        .map(name -> Optional.ofNullable(promotionByProduct.get(name.toLowerCase()))
                .map(Promotion::computeDiscountedPrice)
                .orElseGet(() -> Product.valueOf(name).getPrice()
                        .multiply(BigDecimal.valueOf(basket.get(name)))))
        .reduce(BigDecimal.ZERO, BigDecimal::add);
Here the only value multiplied by quantity would be the product price obtained with Product.valueOf(name).getPrice().
Finally, another option, all in one go and without the intermediate map (iterating over the promotions), using the first approach (multiplying by the quantity at the end):
BigDecimal basketPrice = basket.keySet().stream()
        .map(name -> promotions.stream()
                .filter(prom -> name.equalsIgnoreCase(prom.getProduct().getProductName()))
                .findFirst().map(Promotion::computeDiscountedPrice) // promotion exists
                .orElseGet(() -> Product.valueOf(name).getPrice()) // no promotion
                .multiply(BigDecimal.valueOf(basket.get(name))))
        .reduce(BigDecimal.ZERO, BigDecimal::add);

Picking max item by column from group by in Slick 3.x

I'm trying to write a Slick query to find the "max" element within a group and then continue querying based on that result; however, I'm getting a massive error when I try what I thought was the obvious way:
val articlesByUniqueLink = for {
  (link, groupedArticles) <- historicArticles.groupBy(_.link)
  latestPerLink <- groupedArticles.sortBy(_.pubDate.desc).take(1)
} yield latestPerLink
Since this doesn't seem to work, I'm wondering if there's some other way to find the "latest" element out of "groupedArticles" above, assuming these come from an Articles table with a pubDate Timestamp and a link that can be duplicated. I'm effectively looking for HAVING articles.pub_date = max(articles.pub_date).
The other equivalent way to express it yields the same result:
val articlesByUniqueLink = for {
  (link, groupedArticles) <- historicArticles.groupBy(_.link)
  latestPerLink <- groupedArticles.filter(_.pubDate === groupedArticles.map(_.pubDate).max.get)
} yield latestPerLink
SlickTreeException: Unreachable reference to s2 after resolving monadic joins (followed by 50 lines of Slick node trees).
The best way I found to get the max (or min, etc.) per group in Slick is to use a self join on the grouping result:
val articlesByUniqueLink = for {
  (article, _) <- historicArticles join historicArticles.groupBy(_.link)
    .map({ case (link, group) => (link, group.map(_.pubDate).max) })
    on ((article, tuple) => article.link === tuple._1 &&
      article.pubDate === tuple._2)
} yield article
If the on condition can produce duplicates, just deduplicate the results afterwards.

Using LINQ to pull collection until aggregate condition met

At a high level, I need a query that can pull a subset of records based on the sum of a column, just like Linq: How to query items from a collection until the sum reaches a certain value.
However, the key difference is that he's already got his records in an object, and I don't and can't. My table can have millions of records. If I build my query the way he did, I get this error:
"A lambda expression with a statement body
cannot be converted to an expression tree"
Which makes sense after researching it: LINQ can't turn the answer in the above-referenced question into valid SQL.
I'm going to make a hypothetical table that represents my situation.
Order Id | Cookie Name    | Qty
       1 | Sugar          |   5
       2 | Snickerdoodle  |   4
       3 | Chocolate chip |   8
       4 | Snickerdoodle  |  10
       5 | Snickerdoodle  |   5
Given this sample, I need to write a query that grabs the first X orders of Snickerdoodle until the summed Qty exceeds an input parameter (i.e., if the user chooses 13, it would return records 2 & 4).
I'm using NHibernate.Linq because I'm more comfortable in LINQ, but I'm completely open to ICriteria if the need arises.
As a side note, I'm interested in this as a concept as well as a direct problem. Even though I need a Sum, there has to be a way to do something akin to a takeWhile that executes until a condition is met.
A pragmatic approach:
int needed = ...;
int actual = 0;
int page = 0;
const int pagesize = 20; // set to some sensible value, e.g. the page size of the grid shown to the user
var results = new List<CookieOrder>();
while (actual < needed)
{
    var partialResults = session.Query<CookieOrder>()
        .Where(c => c.Name == "Snickerdoodle")
        .OrderBy(c => c.Id)
        .Skip(page * pagesize)
        .Take(pagesize)
        .ToList();
    if (partialResults.Count == 0)
    {
        break; // no more rows: avoid looping forever if the target can't be reached
    }
    for (int i = 0; i < partialResults.Count && actual < needed; i++)
    {
        results.Add(partialResults[i]);
        actual += partialResults[i].Quantity;
    }
    page++;
}
return results;
return results;
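For the in-memory concept the question mentions ("something akin to a takeWhile"), the running-sum cutoff is simple to write by hand. A sketch in Java purely for illustration (the CookieOrder type and getQuantity() accessor are assumed; a LINQ provider could not translate this to SQL):

import java.util.ArrayList;
import java.util.List;

// Take orders until the running quantity total reaches the target.
static List<CookieOrder> takeUntilQuantity(List<CookieOrder> orders, int needed) {
    List<CookieOrder> taken = new ArrayList<>();
    int sum = 0;
    for (CookieOrder order : orders) {
        if (sum >= needed) {
            break; // target reached: stop taking orders
        }
        taken.add(order);
        sum += order.getQuantity();
    }
    return taken;
}

The paged loop above applies the same cutoff, just one database page at a time.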