Constraint to require a certain number of matches - optaplanner

I am trying to write a hard constraint that requires that a certain value has been chosen a certain number of times. I have a constraint written below, which (I think) filters down to the set of results that match this criterion, and I want it to penalize if there are no such results. I cannot figure out how to work .ifNotExists() into this. I think I am missing some understanding.
fun cpMustUseN(constraintFactory: ConstraintFactory): Constraint {
    return constraintFactory.forEach(MealMenu::class.java)
        .join(CpMustUse::class.java, equal({ mm -> mm.slottedCp!!.id }, CpMustUse::cpId))
        .groupBy({ _, cpMustUse -> cpMustUse.numRequired }, countBi())
        .filter { numRequired, count -> count >= numRequired }
        .penalize(HardSoftScore.ONE_HARD)
        .asConstraint("cpMustUseN")
}
MealMenu is an entity:
@PlanningEntity
class MealMenu {
    @PlanningId
    var id = 0
    @PlanningVariable(valueRangeProviderRefs = ["cpRange"])
    var slottedCp: Cp? = null
}
CpMustUse is a @ProblemFactCollectionProperty on my solution class, and the class looks like this:
class CpMustUse {
    var cpId = 1
    var numRequired = 4
}
I want to, in this case, constrain the result such that cpId 1 is chosen at least 4 times.

There are two conceptual issues here:
groupBy() only matches if the join returns a non-zero number of matches. You will therefore never see a countBi() of zero - in that case, groupBy() simply never matches. As a consequence, you cannot use grouping to check that something does not exist.
ifNotExists() always applies to a fact from the working memory. You cannot use it to check whether the result of a previous calculation exists.
Taken together, these two points make your approach infeasible. This particular requirement will be a bit trickier to implement.
Start by inverting the logic of the constraint you pasted: penalize every time count < numRequired. This handles all cases where count >= 1.
Then introduce a second constraint that handles specifically the case where the count would be zero - here you should be able to use forEach(CpMustUse::class.java).ifNotExists(MealMenu::class.java, ...), so that a required Cp with no matching MealMenu gets penalized.
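For illustration, here is a minimal Kotlin sketch of those two constraints, assuming the classes from the question and OptaPlanner 8's penalize(...).asConstraint(...) API; the constraint names and the match weights (numRequired - count, and numRequired for the zero case) are my own choices, not something prescribed by the answer above:
import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore
import org.optaplanner.core.api.score.stream.Constraint
import org.optaplanner.core.api.score.stream.ConstraintCollectors
import org.optaplanner.core.api.score.stream.ConstraintFactory
import org.optaplanner.core.api.score.stream.Joiners

// Case 1: the Cp is used at least once, but fewer than numRequired times.
fun cpUsedLessThanN(constraintFactory: ConstraintFactory): Constraint {
    return constraintFactory.forEach(CpMustUse::class.java)
        .join(MealMenu::class.java,
            Joiners.equal({ req: CpMustUse -> req.cpId }, { mm: MealMenu -> mm.slottedCp!!.id }))
        .groupBy({ req: CpMustUse, _: MealMenu -> req }, ConstraintCollectors.countBi())
        .filter { req, count -> count < req.numRequired }
        .penalize(HardSoftScore.ONE_HARD) { req, count -> req.numRequired - count }
        .asConstraint("cpUsedLessThanN")
}

// Case 2: the Cp is never used at all, which the grouping above can never see.
fun cpNeverUsed(constraintFactory: ConstraintFactory): Constraint {
    return constraintFactory.forEach(CpMustUse::class.java)
        .ifNotExists(MealMenu::class.java,
            Joiners.equal({ req: CpMustUse -> req.cpId }, { mm: MealMenu -> mm.slottedCp!!.id }))
        .penalize(HardSoftScore.ONE_HARD) { req -> req.numRequired }
        .asConstraint("cpNeverUsed")
}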

Related

How to optimize a problem using only linear constraints

I am new to AMPL. Currently I am trying to optimize a networking problem. I can only use the CPLEX solver; others, like ILOG CP, are forbidden.
Input:
Demands - a set of demands that have to be fulfilled (basically QoS rules), each consisting of multiple paths between a given pair of servers. Every path must be assigned some guaranteed bandwidth >= 0
demand_maxPath - number of paths for a demand d
hd - bandwidth requested for a demand d
Goal
The goal is to assign bandwidths to every path with regard to some cost function.
Notation: d denotes a demand id, x(d,1) is the first path belonging to demand d, and so on; hd denotes the bandwidth requested for demand d.
Constraints
There are a couple of constraints:
sum of bandwidths of all paths for a demand d must equal hd
x(d,p) must be equal to hd, where p is a variable
Constraint 2 enforces that all other paths of demand d, except for path x(d,p), must be equal to 0 (both constraints are formalized below).
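Writing P_d for demand_maxPath[d], the two constraints can be stated as, for every demand d:

$$\sum_{p=1}^{P_d} x_{d,p} = h_d \qquad \text{(constraint 1)}$$
$$\exists\, p \in \{1,\dots,P_d\}: \; x_{d,p} = h_d \ \text{ and } \ x_{d,q} = 0 \ \ \forall q \neq p \qquad \text{(constraint 2)}$$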
Approach
My approach: I declared a variable:
var demandPath_signalCount { d in Demands, 1..demand_maxPath[d]}, >= 0;
which holds the bandwidth value for each of the paths. The following constraint reflects constraint 1:
subject to demand_satisfaction_constraint { d in Demands }:
sum { dp in 1..demand_maxPath[d] } demandPath_signalCount[d,dp] = h[d];
However, I can't think of a way to write constraint 2. For example:
subject to path_value_satisfaction_constraint { d in Demands }:
max { dp in 1..demand_maxPath[d] } demandPath_signalCount[d,dp] = h[d];
doesn't work, since the max() function is nonlinear.
Another idea was to declare another variable:
var demand_chosenPath { d in Demands }, >= 0;
and to use it like so:
subject to path_value_satisfaction_constraint { d in Demands }:
demandPath_signalCount[d,demand_chosenPath[d]] = h[d];
It obviously doesn't work either since variables cannot be used as indices.
Yet another way I tried was to constrain the values that demandPath_signalCount may take, like so:
set possible_values {d in Demands } = 0..demand_volume[d] by demand_volume[d];
and
subject to possible_values_satisfaction_constraint { d in Demands, dp in 1..demand_maxPath[d] }:
demandPath_signalCount[d,dp] in possible_values[d];
But then again, the error is: continuous variable in tuple
How to formulate the second constraint?
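For what it's worth (this is not part of the original post), the standard MILP trick to make constraint 2 linear, which CPLEX accepts, is to introduce binary path-selection variables u_{d,p}:

$$\sum_{p=1}^{P_d} u_{d,p} = 1, \qquad 0 \le x_{d,p} \le h_d \, u_{d,p}, \qquad u_{d,p} \in \{0,1\}.$$

Combined with constraint 1, the single selected path must then carry the whole h_d and every other path is forced to 0; in AMPL this amounts to declaring the u variables as binary.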

Results DataSet from DynamoDB Query using GSI is not returning correct results

I have a DynamoDB table where I am currently storing all the events that happen in my system for every product. On the main table, the primary key is a hash combination of productid, eventtype and eventcategory, and the sort key is CreationTime. The table was created and data was added to it.
Later I added a new GSI on the table, with a secondary hash key (which is just the combination of eventcategory and eventtype, excluding productid) and CreationTime as the sort key. This was added so that I can query for multiple products at once.
The GSI seems to work fine; however, I only later realized that the data being returned is incorrect.
Here is the scenario (I am running all these queries against the newly created index):
I was querying for products within the last 30 days and the query returns 312 records. However, when I run the same query for the last 90 days, it returns only 128 records (which is wrong - it should be at least equal to or greater than the 30-day result).
I already have pagination logic embedded in my code, so that LastEvaluatedKey is checked every time in order to loop and fetch the next set of records, and after the loop all the results are combined.
Not sure if I am missing something.
Any suggestions would be appreciated.
var limitPtr *int64
if limit > 0 {
    limit64 := int64(limit)
    limitPtr = &limit64
}
input := dynamodb.QueryInput{
    ExpressionAttributeNames: map[string]*string{
        "#sch": aws.String("SecondaryHash"),
        "#pkr": aws.String("CreationTime"),
    },
    ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
        ":sch": {
            S: aws.String(eventHash),
        },
        ":pkr1": {
            N: aws.String(strconv.FormatInt(startTime, 10)),
        },
        ":pkr2": {
            N: aws.String(strconv.FormatInt(endTime, 10)),
        },
    },
    KeyConditionExpression: aws.String("#sch = :sch AND #pkr BETWEEN :pkr1 AND :pkr2"),
    ScanIndexForward:       &scanForward,
    Limit:                  limitPtr,
    TableName:              aws.String(ddbTableName),
    IndexName:              aws.String(ddbIndexName),
}
Your query stopped because it reached the maximum number of items to evaluate (not necessarily the number of matching items) - either the Limit you set or DynamoDB's 1 MB-per-request cap on evaluated data.
In that case the response contains a LastEvaluatedKey parameter, which is the key of the last evaluated item. You have to perform a new query with an extra ExclusiveStartKey parameter (ExclusiveStartKey should be set to LastEvaluatedKey's value).
When LastEvaluatedKey is empty, you have reached the end of the result set.
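The pagination loop looks roughly like this; the sketch below uses the AWS SDK for Java v2 from Kotlin rather than the Go SDK v1 shown in the question, and the helper name queryAllPages is made up, but the table, index, key names and expressions are taken from the question:
import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.AttributeValue
import software.amazon.awssdk.services.dynamodb.model.QueryRequest

fun queryAllPages(client: DynamoDbClient, eventHash: String, startTime: Long, endTime: Long,
                  tableName: String, indexName: String): List<Map<String, AttributeValue>> {
    val items = mutableListOf<Map<String, AttributeValue>>()
    var exclusiveStartKey: Map<String, AttributeValue>? = null
    do {
        val builder = QueryRequest.builder()
            .tableName(tableName)
            .indexName(indexName)
            .keyConditionExpression("#sch = :sch AND #pkr BETWEEN :pkr1 AND :pkr2")
            .expressionAttributeNames(mapOf("#sch" to "SecondaryHash", "#pkr" to "CreationTime"))
            .expressionAttributeValues(mapOf(
                ":sch" to AttributeValue.builder().s(eventHash).build(),
                ":pkr1" to AttributeValue.builder().n(startTime.toString()).build(),
                ":pkr2" to AttributeValue.builder().n(endTime.toString()).build()))
        if (exclusiveStartKey != null) {
            builder.exclusiveStartKey(exclusiveStartKey) // resume where the previous page stopped
        }
        val response = client.query(builder.build())
        items += response.items() // collect this page
        exclusiveStartKey = if (response.hasLastEvaluatedKey()) response.lastEvaluatedKey() else null
    } while (exclusiveStartKey != null) // no LastEvaluatedKey means the result set is exhausted
    return items
}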

SQL "group by" like - grouping algorithm

I have a table with more than 2 columns (let's say A, B and C). One column (C) holds numbers, and I want to do a "GROUP BY"-like grouping, summing the numbers in C, but I don't know an algorithm for doing so.
I tried sorting the table by each column (from last to first, leaving out the numbers column C, so in this case: sort by B and then by A). Then, wherever the nth row holds the same values in A and B as the (n-1)th row, I add the number from the nth row to the (n-1)th row (in column C) and delete the nth row; otherwise, if the A or B value in row n differs from that of row n-1, I just move on to the next row. I repeat this until the last row of the table. But somehow this isn't working all the time, especially when there are a lot more columns (some rows remain ungrouped, maybe because of the sorting method).
I want to know whether this is a sound grouping algorithm and the problem lies in my sorting method, or whether I need another (sorting and/or grouping) algorithm, and if so, which one. Thank you.
Later edit: apparently the algorithm I used works well, after a thorough check of the code and fixing some minor mistakes that junior programmers like me often make :)
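For reference, here is a compact sketch (not part of the original question) of the sort-then-merge idea described above, assuming rows with two hypothetical grouping columns A and B and a summed column C:
// Sort by every grouping column, then fold equal neighbours together.
data class Row(val a: String, val b: String, var c: Int)

fun groupBySorting(rows: MutableList<Row>): List<Row> {
    // One sort over all grouping columns (A first, then B) replaces the
    // "sort by B, then stable-sort by A" passes described above.
    rows.sortWith(compareBy<Row>({ it.a }, { it.b }))
    val grouped = mutableListOf<Row>()
    for (row in rows) {
        val last = grouped.lastOrNull()
        if (last != null && last.a == row.a && last.b == row.b) {
            last.c += row.c          // same group: add C into the previous row
        } else {
            grouped.add(row.copy())  // a new group starts here
        }
    }
    return grouped
}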
I think a good way to do this would be to wrap your row in a class, implement the equals and hashCode methods, and then use a Map to add the values up:
public class MyRow {
    private Long columnA;
    private String columnB;
    private int columnC;

    // getters (getColumnA, getColumnB, getColumnC) omitted

    @Override
    public boolean equals(final Object other) {
        if (!(other instanceof MyRow)) {
            return false;
        }
        final MyRow otherRow = (MyRow) other;
        return this.columnA.equals(otherRow.getColumnA()) && this.columnB.equals(otherRow.getColumnB());
    }

    @Override
    public int hashCode() {
        // Required so that equal rows land in the same HashMap bucket
        return 31 * this.columnA.hashCode() + this.columnB.hashCode();
    }
}
Then you can iterate over all the rows, and create a Map for holding the sums of C.
final Map<MyRow, Integer> computedCSums = new HashMap<MyRow, Integer>();
for (final MyRow myRow : myRows) {
    if (computedCSums.get(myRow) == null) {
        computedCSums.put(myRow, myRow.getColumnC());
    } else {
        computedCSums.put(myRow, computedCSums.get(myRow) + myRow.getColumnC());
    }
}
Then, to get the sum of the grouped Cs for any row, you just do:
computedCSums.get(mySelectedRow);
I think there are three things to consider for a GROUP BY:
How rows are compared
Comparing two rows A and B on their grouping columns (C1..Cn) works like this: compare each column from C1 to Cn; if one value is less than the other, return that result; if the two values are equal, move on to the next column, and repeat until a result is returned.
Which algorithm to choose
1) Build a binary search tree or a hash table to store the tuples. When we get a tuple, search for an equal tuple; if one exists, merge the new tuple into the group with the same grouping values, otherwise insert it into the search structure (sketched below).
2) Read some tuples, sort them, then walk the buffer and merge the tuples belonging to the same group.
I prefer 1) over 2).
Memory size
If the input is huge, we must consider the memory limit. We can use a merge algorithm to deal with this:
if memory exceeds the limit, write the tuples currently in memory out to tape (or disk), ordered by their grouping columns;
when we finish reading the input, merge the sorted runs from tape.
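As a minimal sketch of option 1) with a hash table (reusing the hypothetical Row class from the sketch after the question above):
fun groupByHashing(rows: List<Row>): Map<Pair<String, String>, Int> =
    rows.groupingBy { it.a to it.b }          // grouping key = all non-summed columns
        .fold(0) { sum, row -> sum + row.c }  // sum column C within each group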

Search within an array with a condition

I have two arrays I'm trying to compare at many levels. Both have the same structure, with 3 "columns".
The first column contains the polygon's ID, the second an area type, and the third the percentage of each area type for a polygon.
So, for many rows, it will compare, for example: ID: 1, Type: aaa, %: 100.
But for some elements, I have many rows for the same ID. For example, I'll have ID 2, type aaa, 25% --- ID 2, type bbb, 25% --- ID 2, type ccc, 50%. And in the second array, I'll have ID 2, type aaa, 25% --- ID 2, type bbb, 10% --- ID 2, type eee, 38% --- ID 2, type fff, 27%.
So, my function has to compare these two arrays and send me an email if there are differences.
(I won't show you the real code because there are 811 lines.) The first "if" condition is:
If array1.id = array2.id Then
    If array1.type = array2.type Then
        If array1.percent = array2.percent Then
            zone_verification = True
        Else
            zone_verification = False
The problem is that there are more than 50,000 rows in each array. So when I run the function, for each array1.id, it searches through 50,000 rows in array2. 50,000 searches for 50,000 rows... it takes pretty long to run!
I'm looking for a way to make the search more specific and get it running faster. Example: there are many rows with id 2 in array1. If there are many id 2 rows in array2, find them and push them all into a "sub-array" or something like that, and search only within these specific rows. That way I'd have just X rows in array1 to compare with X rows in array2, not with 50,000. And when every "id 2" in array1 is done, do the same thing for "id 4", then for "id 5"...
I hope it's clear. It's almost the first time I've used VB.NET, and I have this big function to get running.
Thanks
EDIT
Here's what I want to do.
I have two different layers in a geospatial database. Both layers have the same structure. They are a "spatial join" of the land parcels (55,000) and the land use layer. The first layer is the current one, and the second layer is the one we'll use after 2015.
So, for each land parcel, I have the percentage of each land use. For a land parcel with ID 7580-80-2532, I can have 50% farming use (type FAR-23) and 50% residential use (RES-112). In the first array, I'll have 2 rows with the same ID (7580-80-2532), but each one with a different type (FAR-23, RES-112) and a different %.
In the second layer, the municipal zoning (land use) has changed. So the same land parcel will now be 40% residential use (RES-112), 20% commercial (COM-54) and 40% a new farming use (FAR-33).
So, I want to know whether there are differences. Some land parcels will be exactly the same. Some parcels will keep the same land uses, but not the same percentage of each. And for some land parcels, there will be more or fewer land use types, with different percentages of each.
I want this script to compare these two layers and send me an email when there are differences between them for the same land parcel ID.
The script is already working, but it takes too much time.
The problem is, I think, that the script goes through all of array2 for each row in array1.
What I want is: when there is more than 1 row with the same ID in array1, take only that ID in both arrays.
Maybe if I order them by ID, I could write a condition, something like "once you've found what you're looking for, stop searching when you hit a different value"?
It's hard to explain clearly because I've only been using VB since last week... and English isn't my first language! ;)
If you just want to find out if there are any differences between the first and second array, you could do:
Dim diff = New HashSet(Of Polygon)(array1)
diff.SymmetricExceptWith(array2)
diff will contain any Polygon which is unique to array1 or array2. If you want to do other types of comparisons, maybe you should explain what you're trying to do exactly.
UPDATE:
You could use grouping and lookups like this:
'Create lookup with first array, for fast access by ID
Dim lookupByID = array1.ToLookup(Function(p) p.id)

'Loop through each group of items with same ID in array2
For Each secondArrayValues In array2.GroupBy(Function(p) p.id)
    Dim currentID As Integer = secondArrayValues.Key 'Current ID is the grouping key

    'Retrieve values with same ID in array1
    'Use a hashset to easily compare for equality
    Dim firstArrayValues As New HashSet(Of Polygon)(lookupByID(currentID))

    'Check for differences between the two sets of data, for this ID
    If Not firstArrayValues.SetEquals(secondArrayValues) Then
        'Data has changed, do something
        Console.WriteLine("Differences for ID " & currentID)
    End If
Next
I am answering this question based on the first part that you wrote (that is, without the EDIT section). The correct answer should probably explain a good algorithm, but I suggest using the DB's capabilities, because databases are already optimized for this kind of query.
Put all the records into two DB tables - O(n) time. If the records are static, you don't need to perform this step every time.
Table 1
id type percent
Table 2
id type percent
Then use a DB query, something like this:
select count(*) from table1 t1, table2 t2 where t1.id!=t2.id and t1.type!=t2.type
(you can use better queries; what I am trying to say is: give control to the DB to perform this operation)
Retrieve the result in your code and perform the necessary operation.
EDIT
1) You can sort both arrays in O(n log n) time based on ID + type + percent and then perform binary searches.
2) Or store the first array's records in a hash map with an appropriate key - it could be ID only, or ID + type.
Building the map takes O(n) time, and each search, if the key is right, takes constant time.
You need to define a structure to store this data. We'll store all the data in a LandParcel class, which will have a HashSet<ParcelData>
public class ParcelData
{
    public ParcelType Type { get; set; } // This can be an enum, string, etc.
    public int Percent { get; set; }

    // Redefine Equals and GetHashCode conveniently
}

public class LandParcel
{
    public ID Id { get; set; } // Whatever the type of the ID is...
    public HashSet<ParcelData> Data { get; set; }
}
Now you have to build your data structure, with something like this:
Dictionary<ID, LandParcel> data1 = new ....
foreach (var item in array1)
{
    LandParcel p;
    if (!data1.TryGetValue(item.id, out p))
        data1[item.id] = p = new LandParcel(item.id);
    // Can this data be repeated?
    p.Data.Add(new ParcelData(item.type, item.percent));
}
You do the same with a data2 dictionary for the second array. Then you iterate over all items in data2 and compare them with the item with the same id in data1:
foreach (var parcel2 in data2.Values)
{
    var parcel1 = data1[parcel2.ID]; // Beware with exceptions here !!!
    if (!parcel1.Data.SetEquals(parcel2.Data))
    {
        // You have different parcels
    }
}
(Now that I look at it, we are practically doing a small database query here, kind of smelly code ...)
Sorry for the C# code since I don't really feel so comfortable with VB, but it should be fairly straightforward.

Maths! Approximating the mean, without storing the whole data set

Obvious (but expensive) solution:
I would like to store the rating of a track (1-10) in a table like this:
TrackID
Vote
And then a simple
SELECT AVG(Vote) FROM `table` WHERE `TrackID` = some_val
to calculate the average.
However, I am worried about scalability on this, especially as it needs to be recalculated each time.
Proposed, but possibly stupid, solution:
TrackID
Rating
NumberOfVotes
Every time someone votes, the Rating is updated with
new_rating = ((old_rating * NumberOfVotes) + vote) / (NumberOfVotes + 1)
and stored as the TrackID's new Rating value. Now whenever the Rating is wanted, it's a simple lookup, not a calculation.
Clearly, this does not calculate the mean. I've tried a few small data sets, and it approximates the mean. I believe it might converge as the data set increases? But I'm worried that it might diverge!
What do you guys think? Thanks!
Assuming you had infinite numeric precision, that calculation does update the mean correctly. In practice, you're probably using integer types, so it won't be exact.
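A quick check (not in the original answer) of why the update is exact under exact arithmetic: if $r_n = \frac{1}{n}\sum_{i=1}^{n} v_i$ is the stored rating after $n$ votes, then

$$\frac{r_n \cdot n + v_{n+1}}{n+1} = \frac{\sum_{i=1}^{n} v_i + v_{n+1}}{n+1} = r_{n+1},$$

so the formula reproduces the true mean at every step; any drift comes only from rounding, not from the formula itself.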
How about storing the cumulative vote count, and the number of votes? (i.e. total=total+vote, numVotes=numVotes+1). That way, you can get the exact mean by dividing one by the other.
This approach will only break if you get so many votes that you overflow the range of the data type you're using. So use a big data type (32-bit ought to be enough, unless you're expecting ~4 billion votes)!
Store TrackId, RatingsSum, NumberOfVotes in your table.
Every time someone votes,
NumberOfVotes = NumberOfVotes + 1
RatingsSum = RatingsSum + [rating supplied by user]
Then when selecting
SELECT TrackId, RatingsSum / NumberOfVotes FROM ...
Your solution is completely legit, and differs from a value calculated over the full source set by only roughly a few multiples of the floating-point precision.
You can certainly calculate a running mean and standard deviation without having all the points in hand. You merely need to accumulate the sum, sum of squares, and number of points.
It's not an approximation; the mean and standard deviation are exact.
Here's a Java class that demonstrates. You can adapt to your SQL solution as needed:
package statistics;

public class StatsUtils
{
    private double sum;
    private double sumOfSquares;
    private long numPoints;

    public StatsUtils()
    {
        this.init();
    }

    private void init()
    {
        this.sum = 0.0;
        this.sumOfSquares = 0.0;
        this.numPoints = 0L;
    }

    public void addValue(double value)
    {
        // Check for overflow in either number of points or sum of squares; reset if overflow is detected
        if ((this.numPoints == Long.MAX_VALUE) || (this.sumOfSquares > (Double.MAX_VALUE - value*value)))
        {
            this.init();
        }
        this.sum += value;
        this.sumOfSquares += value*value;
        ++this.numPoints;
    }

    public double getMean()
    {
        double mean = 0.0;
        if (this.numPoints > 0)
        {
            mean = this.sum/this.numPoints;
        }
        return mean;
    }

    public double getStandardDeviation()
    {
        double standardDeviation = 0.0;
        if (this.numPoints > 1)
        {
            standardDeviation = Math.sqrt((this.sumOfSquares - this.sum*this.sum/this.numPoints)/(this.numPoints - 1L));
        }
        return standardDeviation;
    }

    public long getNumPoints() { return this.numPoints; }
}
Small improvement on your solution. You have the table:
TrackID
SumOfVotes
NumberOfVotes
When someone votes,
NumberOfVotes = NumberOfVotes + 1
SumOfVotes = SumOfVotes + ThisVote
and to get the average, you then just do a division:
SELECT TrackID, (SumOfVotes/NumberOfVotes) AS Rating FROM `table`
I would add that the original (obvious and expensive) solution is only expensive compared to the provided solution when calculating the average.
It is cheaper when a vote is added, deleted or changed.
I guess that the original table
TrackID
Vote
VoterID
would still need to be used in the provided solution, to keep track of the vote (rating) of every voter. So, two tables have to be updated for every change to this table (insert, delete or vote update).
In other words, the original solution may be the best way to go.