How can I add an array of values to Google OR-Tools versus a lower and upper bound? - optimization

In the documentation and all the examples I can find (in terms of nurse scheduling at least), everyone just declares shift values within a search space such as {1, 4}, let's say for shifts 1, 2, 3, 4:
solver = pywrapcp.Solver("schedule_shifts")
num_nurses = 4
num_shifts = 4  # Nurse assigned to shift 0 means not working that day.
num_days = 7
# [START]
# Create shift variables.
shifts = {}
for j in range(num_nurses):
    for i in range(num_days):
        shifts[(j, i)] = solver.IntVar(0, num_shifts - 1, "shifts(%i,%i)" % (j, i))
shifts_flat = [shifts[(j, i)] for j in range(num_nurses) for i in range(num_days)]
# Create nurse variables.
nurses = {}
for j in range(num_shifts):
    for i in range(num_days):
        nurses[(j, i)] = solver.IntVar(0, num_nurses - 1, "shift%d day%d" % (j, i))
I want to avoid using a range of values when I call solver.IntVar(lowerbound, upperbound, ...).
Instead, I want something like IntVar([available values that you can choose], ...).
I created a matrix with all shifts as the columns, flowing from the first day to the last. My row indexes don't matter, but in each day/shift column I have the index values of nurses in descending order of who bid the highest for that shift. I then want to create a constraint where, if I choose a nurse, I choose the maximum bid that is allowed by the other constraints from that column. However, I don't know how to do that given the limited documentation OR-Tools has for the Python IntVar.

Can you try
solver.IntVar([values...], 'name')
It should work.
See https://github.com/google/or-tools/blob/master/examples/python/einav_puzzle2.py
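For illustration, here is a minimal sketch of that list-of-values form (the bid values and variable names are hypothetical, and the search setup is just one way to enumerate solutions; the overload is the same one used in einav_puzzle2.py):
from ortools.constraint_solver import pywrapcp

solver = pywrapcp.Solver("bid_example")

# Hypothetical bid values: the variable may only take one of these values,
# rather than anything inside a lower/upper bound range.
allowed_bids = [50, 80, 95, 120]
bid = solver.IntVar(allowed_bids, "bid_shift_1")

# Enumerate the feasible values, trying the largest first.
db = solver.Phase([bid], solver.CHOOSE_FIRST_UNBOUND, solver.ASSIGN_MAX_VALUE)
solver.NewSearch(db)
while solver.NextSolution():
    print(bid.Value())
solver.EndSearch()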

Related

GAMS - manipulating an expression within a loop

I have a matrix with i rows and j columns, a specific element of which is called x(i,j), where, say, i are plants and j are markets. In standard GAMS notation:
Sets
i canning plants / seattle, san-diego /
j markets / new-york, chicago, topeka / ;
Now, I also wish to create a loop over time, for 5 periods. Essentially, say I define:
Set t time period
/period1
period2
period3
period4
period5
/ ;
Parameters
time(t)
/ period1 1,
period2 2,
period3 3,
period4 4,
period5 5
/ ;
Basically, I want to re-run this loop, which contains a bunch of other commands, but I wish to re-define this matrix from period 2 onwards, to look like this:
x("seattle",j)=x("seattle",j)+s("new-york",j)
x("new-york",j)=0'
Essentially, within the loop, I want the matrix x(i,j) to look different after period 2: the column x("seattle",j) is replaced with the former x("seattle",j)+s("new-york",j), and the column x("new-york",j) is set to 0.
The loop would start like :
loop(t,
    ...
    Option reslim = 20000 ;
    option nlp = conopt3 ;
    solve example using NLP maximizing VARIABLE ;
) ;
I am not sure how to keep redefining this matrix within the loop for each period > 2.
Please note: after period 2, the matrix stays the same. The change only happens once (i.e., the matrix elements do not keep updating from the previous period, but switch once, at the end of period 2, and then stay constant thereafter).
Any help on this is much appreciated!
You can use a $ condition to make this change in the loop for period2 only, like this:
x("seattle",j)$sameAs(t,'period2')=x("seattle",j)+s("new-york",j);

Select column with the most unique values from csv, python

I'm trying to come up with a way to select, from a CSV file, the one numeric column that shows the most unique values. If there are multiple columns with the same number of unique values, it should be the left-most one. The output should be either the name of the column or its index.
Position,Experience in Years,Salary,Starting Date,Floor,Room
Middle Management,5,5584.10,2019-02-03,12,100
Lower Management,2,3925.52,2016-04-18,12,100
Upper Management,1,7174.46,2019-01-02,10,200
Middle Management,5,5461.25,2018-02-02,14,300
Middle Management,7,7471.43,2017-09-09,17,400
Upper Management,10,12021.31,2020-01-01,11,500
Lower Management,2,2921.92,2019-08-17,11,500
Middle Management,5,5932.94,2017-11-21,15,600
Upper Management,7,10192.14,2018-08-18,18,700
So here I would want 'Floor' or 4 as my output, given that Floor and Room have the same number of unique values but Floor is the left-most one. (I need it in pure Python; I can't use pandas.)
I have this nested in a whole bunch of other code for what I need to do as a whole; I will spare you the details, but these are the relevant elements in the code:
new_types_list = [str, int, str, datetime.datetime, int, int] #all the datatypes of the columns
l1_listed = ['Position', 'Experience in Years', 'Salary', 'Starting Date', 'Floor', 'Room'] #the header for each column
difference = [3, 5, 9, 9, 6, 7] #is basically the amount of unique values each column has
And here I try to do exactly what I mentioned before:
another_list = []  # now I create another list
for i in new_types_list:  # this is where the error occurs: it only fills the list with the index of the first integer, 3 times, instead of the individual indices
    if i == int:
        another_list.append(new_types_list.index(i))
integer_listi = [difference[i] for i in another_list]  # and this list holds the unique-value counts of the integer columns
for i in difference:  # now we want to find the highest one
    if i == max(integer_listi):
        chosen_one_i = difference.index(i)  # the index of the column with the most unique values is the chosen one
MUV_LMNC = l1_listed[chosen_one_i]
You can use .nunique() to get the number of unique values in each column:
df = pd.read_csv("your_file.csv")
print(df.nunique())
Prints:
Position 3
Experience in Years 5
Salary 9
Starting Date 9
Floor 7
Room 7
dtype: int64
Then to find max, use .idxmax():
print(df.nunique().idxmax())
Prints:
Salary
EDIT: To select only integer columns:
print(df.loc[:, df.dtypes == np.integer].nunique().idxmax())
Prints:
Floor
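Since the question says pandas isn't available, here is a minimal pure-Python sketch of the same idea (the function name and file path are placeholders; it reuses the new_types_list convention from the question):
import csv

def most_unique_numeric_column(path, numeric_types):
    # numeric_types mirrors new_types_list from the question,
    # e.g. [str, int, str, datetime.datetime, int, int].
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    best_name, best_index, best_count = None, None, -1
    for idx, (name, typ) in enumerate(zip(header, numeric_types)):
        if typ is not int:
            continue
        unique_count = len({row[idx] for row in rows})
        if unique_count > best_count:  # strict '>' keeps the left-most column on ties
            best_name, best_index, best_count = name, idx, unique_count
    return best_name, best_index

# print(most_unique_numeric_column("your_file.csv", new_types_list))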

Pandas manipulation: matching data from other columns to one column, applied uniquely to all rows

I have a model that predicts 10 words for a particular course in order of likelihood, and I'd like the first 5 of those words that appear in the course's description.
This is the format of the data:
course_name course_title course_description predicted_word_10 predicted_word_9 predicted_word_8 predicted_word_7 predicted_word_6 predicted_word_5 predicted_word_4 predicted_word_3 predicted_word_2 predicted_word_1
Xmath 32 Precalculus Polynomial and rational functions, exponential... directed scholars approach build african different visual cultures placed global
Xphilos 2 Morality Introduction to ethical and political philosop... make presentation weekly european ways general range questions liberal speakers
My idea is, for each row, to start iterating from predicted_word_1 until I get the first 5 that are in the description. I'd like to save those words, in the order they appear, into additional columns description_word_1 ... description_word_5. (If there are <5 predicted words in the description, I plan to return NaN in the corresponding columns.)
To clarify with an example: if the course_description of a course is 'Polynomial and rational functions, exponential and logarithmic functions, trigonometry and trigonometric functions. Complex numbers, fundamental theorem of algebra, mathematical induction, binomial theorem, series, and sequences. ' and its first few predicted words are irrelevantword1, induction, exponential, logarithmic, irrelevantword2, polynomial, algebra...
I would want to return induction, exponential, logarithmic, polynomial, algebra for that course, in that order, and do the same for the rest of the courses.
My attempt was to define an apply function that will take in a row and iterate from the first predicted word until it finds the first 5 that are in the description, but the part I am unable to figure out is how to create these additional columns that have the correct words for each course. This code will currently only keep the words for one course for all the rows.
def find_top_description_words(row):
    print(row['course_title'])
    description_words_index = 1
    for i in range(num_words_per_course):
        description = row.loc['course_description']
        word_i = row.loc['predicted_word_' + str(i + 1)]
        if (word_i in description) & (description_words_index <= 5):
            print(description_words_index)
            row['description_word_' + str(description_words_index)] = word_i
            description_words_index += 1

df.apply(find_top_description_words, axis=1)
The end goal of this data manipulation is to keep the top 10 predicted words from the model and the top 5 predicted words in the description so the dataframe would look like:
course_name course_title course_description top_description_word_1 ... top_description_word_5 predicted_word_1 ... predicted_word_10
Any pointers would be appreciated. Thank you!
If I understand correctly:
Create a new DataFrame with just the predicted words:
pred_words_lists = df.apply(lambda x: list(x[3:].dropna())[::-1], axis = 1)
Please note that each row now holds a list of predicted words. The order is preserved: the first non-empty predicted word is in first place, the second in second place, and so on.
Now let's create a new DataFrame:
pred_words_df = pd.DataFrame(pred_words_lists.tolist())
pred_words_df.columns = df.columns[:2:-1]
And The final DataFrame:
final_df = df[['course_name', 'course_title', 'course_description']].join(pred_words_df.iloc[:,0:11])
Hope this works.
EDIT
def common_elements(xx, yy):
    temp = pd.Series(range(0, len(xx)), index=xx)
    return list(temp.reindex(yy).sort_values()[0:10].dropna().index)

pred_words_lists = df.apply(lambda x: common_elements(x[2].replace(',', '').split(), list(x[3:].dropna())), axis=1)
Does it satisfy your requirements?
Adapted solution (OP):
def get_sorted_descriptions_words(course_description, predicted_words, k):
    description_words = course_description.replace(',', '').split()
    predicted_words_list = list(predicted_words)
    predicted_words = pd.Series(range(0, len(predicted_words_list)), index=predicted_words_list)
    predicted_words = predicted_words[~predicted_words.index.duplicated()]
    ordered_description = predicted_words.reindex(description_words).dropna().sort_values()
    ordered_description_list = pd.Series(ordered_description.index).unique()[:k]
    return ordered_description_list

df.apply(lambda x: get_sorted_descriptions_words(x['course_description'], x.filter(regex=r'predicted_word_.*'), k), axis=1)
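As a possible follow-up (a sketch only, assuming k = 5 and the adapted function above), the per-row word lists can be expanded into the description_word_1 ... description_word_5 columns the question asks for, padding with NaN when fewer than k predicted words appear in the description:
import numpy as np

k = 5
top_words = df.apply(
    lambda x: list(get_sorted_descriptions_words(
        x['course_description'], x.filter(regex=r'predicted_word_.*'), k)),
    axis=1)
word_cols = pd.DataFrame(
    [words + [np.nan] * (k - len(words)) for words in top_words],
    index=df.index,
    columns=['description_word_%d' % (i + 1) for i in range(k)])
df = df.join(word_cols)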

How to get same rank for same scores in Redis' ZRANK?

If I have 5 members with scores as follows
a - 1
b - 2
c - 3
d - 3
e - 5
ZRANK of c returns 2, ZRANK of d returns 3
Is there a way to get same rank for same scores?
Example: ZRANK c = 2, d = 2, e = 3
If yes, then how to implement that in spring-data-redis?
Any real solution needs to fit the requirements, which are kind of missing in the original question. My 1st answer had assumed a small dataset, but this approach does not scale as dense ranking is done (e.g. via Lua) in O(N) at least.
So, assuming that there are a lot of users with scores, the direction that for_stack suggested is better, in which multiple data structures are combined. I believe this is the gist of his last remark.
To store users' scores you can use a Hash. While conceptually you can use a single key to store a Hash of all users scores, in practice you'd want to hash the Hash so it will scale. To keep this example simple, I'll ignore Hash scaling.
This is how you'd add (update) a user's score in Lua:
local hscores_key = KEYS[1]
local user = ARGV[1]
local increment = ARGV[2]
local new_score = redis.call('HINCRBY', hscores_key, user, increment)
Next, we want to track the current count of users per discrete score value so we keep another hash for that:
local old_score = new_score - increment
local hcounts_key = KEYS[2]
local old_count = redis.call('HINCRBY', hcounts_key, old_score, -1)
local new_count = redis.call('HINCRBY', hcounts_key, new_score, 1)
Now, the last thing we need to maintain is the per score rank, with a sorted set. Every new score is added as a member in the zset, and scores that have no more users are removed:
local zdranks_key = KEYS[3]
if new_count == 1 then
  redis.call('ZADD', zdranks_key, new_score, new_score)
end
if old_count == 0 then
  redis.call('ZREM', zdranks_key, old_score)
end
This three-piece script's complexity is O(log N) due to the use of the Sorted Set, but note that N is the number of discrete score values, not the number of users in the system. Getting a user's dense ranking is done via another, shorter and simpler script:
local hscores_key = KEYS[1]
local zdranks_key = KEYS[2]
local user = ARGV[1]
local score = redis.call('HGET', hscores_key, user)
return redis.call('ZRANK', zdranks_key, score)
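The question asks about spring-data-redis, but purely as an illustration, here is one way such scripts could be registered and invoked from Python with redis-py (the key names and the user 'alice' are hypothetical):
import redis

r = redis.Redis()

rank_lua = """
local hscores_key = KEYS[1]
local zdranks_key = KEYS[2]
local user = ARGV[1]
local score = redis.call('HGET', hscores_key, user)
return redis.call('ZRANK', zdranks_key, score)
"""

# The update script above would be registered the same way, with
# KEYS = [scores_hash, counts_hash, dranks_zset] and ARGV = [user, increment].
get_dense_rank = r.register_script(rank_lua)
print(get_dense_rank(keys=["scores", "dranks"], args=["alice"]))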
You can achieve the goal with two Sorted Set: one for member to score mapping, and one for score to rank mapping.
Add
Add items to member to score mapping: ZADD mem_2_score 1 a 2 b 3 c 3 d 5 e
Add the scores to score to rank mapping: ZADD score_2_rank 1 1 2 2 3 3 5 5
Search
Get score first: ZSCORE mem_2_score c, this should return the score, i.e. 3.
Get the rank for the score: ZRANK score_2_rank 3, this should return the dense ranking, i.e. 2.
In order to run it atomically, wrap the Add and Search operations into 2 Lua scripts.
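A non-atomic redis-py sketch of this two-Sorted-Set idea, just to make the flow concrete (in production you would wrap these calls in Lua scripts as suggested; the key names are taken from the commands above):
import redis

r = redis.Redis()

def add(member, score):
    r.zadd("mem_2_score", {member: score})  # member -> score mapping
    r.zadd("score_2_rank", {score: score})  # score -> rank mapping (score is both member and score)

def dense_rank(member):
    score = r.zscore("mem_2_score", member)
    if score is None:
        return None
    # zscore returns a float; cast back to int so the lookup matches the
    # integer member stored in score_2_rank (integer scores assumed).
    return r.zrank("score_2_rank", int(score))

for m, s in [("a", 1), ("b", 2), ("c", 3), ("d", 3), ("e", 5)]:
    add(m, s)

print(dense_rank("c"), dense_rank("d"), dense_rank("e"))  # 2 2 3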
Then there's this Pull Request - https://github.com/antirez/redis/pull/2011 - which is dead, but appears to make dense rankings on the fly. The original issue/feature request (https://github.com/antirez/redis/issues/943) got some interest so perhaps it is worth reviving it /cc #antirez :)
The rank is unique in a sorted set, and elements with the same score are ordered (ranked) lexically.
There is no Redis command that does this "dense ranking"
You could, however, use a Lua script that fetches a range from a sorted set and reduces it to your requested form. This could work on small data sets, but you'd have to devise something more complex in order to scale.
unsigned long zslGetRank(zskiplist *zsl, double score, sds ele) {
    zskiplistNode *x;
    unsigned long rank = 0;
    int i;

    x = zsl->header;
    for (i = zsl->level-1; i >= 0; i--) {
        while (x->level[i].forward &&
                (x->level[i].forward->score < score ||
                    (x->level[i].forward->score == score &&
                    sdscmp(x->level[i].forward->ele,ele) <= 0))) {
            rank += x->level[i].span;
            x = x->level[i].forward;
        }

        /* x might be equal to zsl->header, so test if obj is non-NULL */
        if (x->ele && x->score == score && sdscmp(x->ele,ele) == 0) {
            return rank;
        }
    }
    return 0;
}
https://github.com/redis/redis/blob/b375f5919ea7458ecf453cbe58f05a6085a954f0/src/t_zset.c#L475
This is the piece of code Redis uses to compute the rank in sorted sets. Right now, it just gives the rank based on the position in the skiplist (which is sorted by score).
What does the skiplistNode variable "span" mean in redis.h? (What is span?)

Create 20 unique bingo cards

I'm trying to create 20 unique cards with numbers, but I'm struggling a bit. Basically, I need to create 20 unique 3x3 matrices having numbers 1-10 in the first column, numbers 11-20 in the second column and 21-30 in the third column. Any ideas? I'd prefer to have it done in R, especially as I don't know Visual Basic. In Excel I know how to generate the cards, but I'm not sure how to ensure they are unique.
It seems quite precise and straightforward to me. Anyway, I needed to create 20 matrices that would look like:
[,1] [,2] [,3]
[1,] 5 17 23
[2,] 8 18 22
[3,] 3 16 24
Each of the matrices should be unique and each of the columns should consist of three unique numbers (the 1st column - numbers 1-10, the 2nd column - 11-20, the 3rd column - 21-30).
Generating random numbers is easy, but how do I make sure the generated cards are unique? Please have a look at the post that I accepted as the answer, as it gives a thorough explanation of how to achieve it.
(N.B.: I misread "rows" instead of "columns", so the following code and explanation deal with matrices with random numbers 1-10 on the 1st row, 11-20 on the 2nd row etc., instead of columns, but it's exactly the same, just transposed.)
This code should guarantee uniqueness and good randomness:
library(gtools)

# helper function
getKthPermWithRep <- function(k, n, r) {
  k <- k - 1
  if (n^r < k) {
    stop('k is greater than possible permutations')
  }
  v <- rep.int(0, r)
  index <- length(v)
  while (k != 0) {
    remainder <- k %% n
    k <- k %/% n
    v[index] <- remainder
    index <- index - 1
  }
  return(v + 1)
}

# get all possible permutations of 10 elements taken 3 at a time
# (singlerowperms = 720)
allperms <- permutations(10, 3)
singlerowperms <- nrow(allperms)

# get 20 random and unique bingo cards
cards <- lapply(sample.int(singlerowperms^3, 20), FUN = function(k) {
  perm2use <- getKthPermWithRep(k, singlerowperms, 3)
  m <- allperms[perm2use, ]
  m[2, ] <- m[2, ] + 10
  m[3, ] <- m[3, ] + 20
  return(m)
  # if you want to transpose the result just do:
  # return(t(m))
})
Explanation
(disclaimer tl;dr)
To guarantee both randomness and uniqueness, one safe approach is to generate all the possible bingo cards and then choose randomly among them without replacement.
To generate all the possible cards, we should :
generate all the possibilities for each row of 3 elements
get the cartesian product of them
Step (1) can be easily obtained using function permutations of package gtools (see the object allPerms in the code). Note that we just need the permutations for the first row (i.e. 3 elements taken from 1-10) since the permutations of the other rows can be easily obtained from the first by adding 10 and 20 respectively.
Step (2) is also easy to get in R, but let's first consider how many possibilities will be generated. Step (1) returned 720 cases for each row, so, in the end we will have 720*720*720 = 720^3 = 373248000 possible bingo cards!
Generating all of them is not practical since the memory required would be huge; thus we need a way to get 20 random elements from this big range of possibilities without actually keeping them all in memory.
The solution comes from the function getKthPermWithRep, which, given an index k, returns the k-th permutation with repetition of r elements taken from 1:n (note that in this case a permutation with repetition corresponds to the cartesian product).
e.g.
# all permutations with repetition of 2 elements in 1:3 are
permutations(n = 3, r = 2,repeats.allowed = TRUE)
# [,1] [,2]
# [1,] 1 1
# [2,] 1 2
# [3,] 1 3
# [4,] 2 1
# [5,] 2 2
# [6,] 2 3
# [7,] 3 1
# [8,] 3 2
# [9,] 3 3
# using the getKthPermWithRep you can get directly the k-th permutation you want :
getKthPermWithRep(k=4,n=3,r=2)
# [1] 2 1
getKthPermWithRep(k=8,n=3,r=2)
# [1] 3 2
Hence now we just choose 20 random indexes in the range 1:720^3 (using sample.int function), then for each of them we get the corresponding permutation of 3 numbers taken from 1:720 using function getKthPermWithRep.
Finally, these triplets of numbers can be converted to actual card rows by using them as indexes to subset allPerms and get our final matrix (after, of course, adding +10 and +20 to the 2nd and 3rd rows).
Bonus
Explanation of getKthPermWithRep
If you look at the example above (permutations with repetition of 2 elements in 1:3) and subtract 1 from all the numbers in the results, you get this:
> permutations(n = 3, r = 2,repeats.allowed = T) - 1
[,1] [,2]
[1,] 0 0
[2,] 0 1
[3,] 0 2
[4,] 1 0
[5,] 1 1
[6,] 1 2
[7,] 2 0
[8,] 2 1
[9,] 2 2
If you consider each number in each row as a digit, you can see that those rows (00, 01, 02, ...) are all the numbers from 0 to 8 represented in base 3 (yes, 3 as n). So, when you ask for the k-th permutation with repetition of r elements in 1:n, you are really asking to translate k-1 into base n and return the digits increased by 1.
Therefore, given the algorithm to change any number from base 10 to base n :
changeBase <- function(num, base) {
  v <- NULL
  while (num != 0) {
    remainder = num %% base  # assume K > 1
    num = num %/% base       # integer division
    v <- c(remainder, v)
  }
  if (is.null(v)) {
    return(0)
  }
  return(v)
}
you can easily obtain the getKthPermWithRep function.
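For readers who want to sanity-check the base-conversion idea outside R, here is a rough Python equivalent of getKthPermWithRep (purely illustrative):
def kth_perm_with_rep(k, n, r):
    # Translate k-1 into base n using r digits, then add 1 to each digit.
    k -= 1
    digits = [0] * r
    i = r - 1
    while k:
        k, digits[i] = divmod(k, n)
        i -= 1
    return [d + 1 for d in digits]

print(kth_perm_with_rep(4, 3, 2))  # [2, 1]
print(kth_perm_with_rep(8, 3, 2))  # [3, 2]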
One 3x3 matrix with the desired value range can be generated with the following code:
mat <- matrix(c(sample(1:10,3), sample(11:20,3), sample(21:30, 3)), nrow=3)
Furthermore, you can use a for loop to build a list of 20 such matrices as follows (note that this alone does not guarantee the matrices are unique):
mat <- list()
for (i in 1:20) {
  mat[[i]] <- matrix(c(sample(1:10, 3), sample(11:20, 3), sample(21:30, 3)), nrow = 3)
  print(mat[[i]])
}
Well OK I may fall on my face here but I propose a checksum (using Excel).
This is a unique signature for each bingo card which will remain invariant if the order of numbers within any column is changed without changing the actual numbers. The formula is
=SUM(10^MOD(A2:A4,10)+2*10^MOD(B2:B4,10)+4*10^MOD(C2:C4,10))
where the bingo numbers for the first card are in A2:C4.
The idea is to generate a 10-digit number for each column, then multiply each by a constant and add them to get the signature.
So here I have generated two random bingo cards using a standard formula from here plus two which are deliberately made to be just permutations of each other.
Then I check if any of the signatures are duplicates using the formula
=MAX(COUNTIF(D5:D20,D5:D20))
which shouldn't give an answer of more than 1.
In the unlikely event that there were duplicates, then you would just press F9 and generate some new cards.
All formulae are array formulae and must be entered with Ctrl+Shift+Enter.
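A rough Python translation of the checksum idea (illustrative only; it assumes each card is a 3x3 list of rows, with the columns holding 1-10, 11-20 and 21-30 respectively):
def card_signature(card):
    # Each column contributes a weighted power-of-ten "digit histogram", so
    # reordering numbers within a column leaves the signature unchanged.
    return sum(10 ** (card[r][0] % 10)
               + 2 * 10 ** (card[r][1] % 10)
               + 4 * 10 ** (card[r][2] % 10)
               for r in range(3))

cards = [
    [[5, 17, 23], [8, 18, 22], [3, 16, 24]],
    [[3, 16, 24], [5, 17, 23], [8, 18, 22]],  # same card with rows shuffled
]
signatures = [card_signature(c) for c in cards]
print(len(signatures) != len(set(signatures)))  # True -> a duplicate was detected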
Here is an inelegant way to do this: generate all possible combinations and then sample without replacement. (These are really permutations, not combinations: order does matter in bingo.)
library(dplyr)
library(tidyr)
library(magrittr)

generate_samples = function(n) {
  first = data_frame(first = (n - 9):n)
  first %>%
    merge(first %>% rename(second = first)) %>%
    merge(first %>% rename(third = first)) %>%
    sample_n(20)
}

suffix = function(df, suffix)
  df %>%
    setNames(names(.) %>%
               paste0(suffix))

generate_samples(10) %>% suffix(10) %>%
  bind_cols(generate_samples(20) %>% suffix(20)) %>%
  bind_cols(generate_samples(30) %>% suffix(30)) %>%
  rowwise %>%
  do(matrix = t(.) %>% matrix(3)) %>%
  use_series(matrix)