CP-SAT: create a bool variable that represents whether a sum of variables is greater than 0

I have this piece of code in which I'm trying to make a variable teacher_doesnt_work that represents whether Sum(classes_by_teacher[t]) is 0 or not:
classes_by_teacher = {}
for t in all_teachers:
    cur_classes = []
    for d in all_days:
        for p in all_day_parts:
            for g in all_groups:
                for s in all_subjects:
                    cur_classes.append(classes[(t, d, p, g, s)])
    classes_by_teacher[t] = cur_classes
teacher_i_doesnt_work = []
for t in all_teachers:
    teacher_works_n_times = model.NewIntVar(0, 10000, f"How many times does {t} work").Sum(classes_by_teacher[t])
    teacher_doesnt_work = model.NewBoolVar(f"teacher {t} does not work")
    model.Add(teacher_doesnt_work == (teacher_works_n_times == 0))
    teacher_i_doesnt_work.append(teacher_doesnt_work)
model.Maximize(sum(teacher_i_doesnt_work))
As expected, it gives me this error. Is there some workaround for this?
File ~/notebook/jupyterenv/lib/python3.8/site-packages/ortools/sat/python/cp_model.py:412, in LinearExpr.__eq__(self, arg)
410 return BoundedLinearExpression(self, [arg, arg])
411 else:
--> 412 return BoundedLinearExpression(self - arg, [0, 0])
File ~/notebook/jupyterenv/lib/python3.8/site-packages/ortools/sat/python/cp_model.py:334, in LinearExpr.__sub__(self, arg)
332 if cmh.is_zero(arg):
333 return self
--> 334 return _Sum(self, -arg)
TypeError: bad operand type for unary -: 'BoundedLinearExpression'

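I suggest reading the section of the CP-SAT documentation on channeling constraints.
The solution uses two linear constraints with OnlyEnforceIf() enforcement literals: one that must hold when the Boolean is true (the sum is 0) and one that must hold when it is false (the sum is at least 1).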

Related

Remove the first or last character so that the values in a column start with numbers

I'm new to Pandas and I'd like to ask your advice.
Let's take this dataframe:
df_test = pd.DataFrame({'Dimensions': ['22.67x23.5', '22x24.6', '45x56', 'x23x56.22','46x23x','34x45'],
'Other': [59, 29, 73, 56,48,22]})
I want to detect the values that start with "x" (row 4) or end with "x" (row 5) and remove that "x", so my dataframe should look like this:
Dimensions Other
22.67x23.5 59
22x24.6 29
45x56 73
23x56.22 56
46x23 48
34x45 22
I wanted to create a function and apply it to a column
def remove_x(x):
    if (x.str.match('^[a-zA-Z]') == True):
        x = x[1:]
        return x
    if (x.str.match('.*[a-zA-Z]$') == True):
        x = x[:-1]
        return x
If I apply this function to the column:
df_test['Dimensions'] = df_test['Dimensions'].apply(remove_x)
I get the error 'str' object has no attribute 'str'.
I removed .str from the function and re-ran everything, but with no success.
What should I do?
Thank you for any suggestions; if there is another way to do it, I'm interested in that too.
Just use str.strip:
df_test['Dimensions'] = df_test['Dimensions'].str.strip('x')
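For general patterns, you can try str.replace (pass regex=True; since pandas 2.0 the pattern is treated as a literal string by default):
df_test['Dimensions'].str.replace('(^x)|(x$)', '', regex=True)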
Output:
Dimensions Other
0 22.67x23.5 59
1 22x24.6 29
2 45x56 73
3 23x56.22 56
4 46x23 48
5 34x45 22
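A small runnable comparison of the two vectorized approaches on the question's data. One difference worth knowing: str.strip('x') removes every leading and trailing "x", while the regex removes at most one at each end; the edge_case value below is a made-up example that shows this.

```python
import pandas as pd

df_test = pd.DataFrame({'Dimensions': ['22.67x23.5', '22x24.6', '45x56', 'x23x56.22', '46x23x', '34x45'],
                        'Other': [59, 29, 73, 56, 48, 22]})

# Vectorized: strip all leading/trailing 'x' characters
stripped = df_test['Dimensions'].str.strip('x')

# Regex: remove one 'x' at the start and/or one at the end.
# regex=True is explicit because pandas >= 2.0 treats the pattern literally by default.
replaced = df_test['Dimensions'].str.replace(r'^x|x$', '', regex=True)

# The two differ when a value has repeated x's at an edge
edge_case = pd.Series(['xx12x'])
print(edge_case.str.strip('x')[0])                         # 12
print(edge_case.str.replace(r'^x|x$', '', regex=True)[0])  # x12
```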
Quang Hoang's answer is better (for simplicity and efficiency), but here's what went wrong in your approach. In your apply function, you call the .str accessor methods, which exist on a pandas Series, not on a plain string. But when you call df_test['Dimensions'].apply(remove_x), the values passed to remove_x are the individual elements of df_test['Dimensions'], i.e. the str values themselves. So you should write the function as if x is an incoming str.
Here's how you could implement that (avoiding any regex):
def remove_x(x):
    if x[0] == 'x':
        return x[1:]
    elif x[-1] == 'x':
        return x[:-1]
    else:
        return x
More idiomatically:
def remove_x(x):
    return x.strip('x')
Or even:
df_test['Dimensions'] = df_test['Dimensions'].apply(lambda x : x.strip('x'))
All that said, it's better not to use apply and to stick with the built-in string methods Quang showed.

Numba / Numpy - Understanding Error Message

I'm experimenting with Numba to try and speed up a union-find algorithm I'm working on. Here's some example code. When I experiment with some sample data I cannot understand the type complaint that Numba appears to be raising.
from numba import jit
import numpy as np
indices = np.arange(8806806, dtype=np.int64)
sizes = np.ones(8806806, dtype=np.int64)
connected_components = 8806806
@jit(nopython=True)
def root(p: int) -> int:
    while p != indices[p]:
        indices[p] = indices[indices[p]]
        p = indices[p]
    return p
@jit(nopython=True)
def connected(p: int, q: int) -> bool:
    return root(p) == root(q)
@jit(nopython=True)
def union(p: int, q: int) -> None:
    root1 = root(p)
    root2 = root(q)
    if root1 == root2:
        return
    if (sizes[root1] < sizes[root2]):
        indices[root1] = root2
        sizes[root2] += sizes[root1]
    else:
        indices[root2] = root1
        sizes[root1] += sizes[root2]
    connected_components -= 1
@jit(nopython=True)
def process_values(arr):
    for row in arr:
        typed_arr = row.astype('int64')
        for first, second in zip(arr, arr[1:]):
            union(first, second)
process_values(
    np.array(
        [np.array([8018361, 4645960]),
         np.array([1137555, 7763897]),
         np.array([7532943, 2248813]),
         np.array([5352737, 71466, 3590473, 5352738, 2712260])], dtype='object'))
I cannot understand this error:
TypingError Traceback (most recent call last)
<ipython-input-45-62735e65f581> in <module>
44 np.array([1137555, 7763897]),
45 np.array([7532943, 2248813]),
---> 46 np.array([5352737, 71466, 3590473, 5352738, 2712260])], dtype='object'))
/opt/conda/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
399 e.patch_message(msg)
400
--> 401 error_rewrite(e, 'typing')
402 except errors.UnsupportedError as e:
403 # Something unsupported is present in the user code, add help info
/opt/conda/lib/python3.7/site-packages/numba/core/dispatcher.py in error_rewrite(e, issue_type)
342 raise e
343 else:
--> 344 reraise(type(e), e, None)
345
346 argtypes = []
/opt/conda/lib/python3.7/site-packages/numba/core/utils.py in reraise(tp, value, tb)
78 value = tp()
79 if value.__traceback__ is not tb:
---> 80 raise value.with_traceback(tb)
81 raise value
82
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type array(pyobject, 1d, C)
[1] During: typing of argument at <ipython-input-45-62735e65f581> (36)
File "<ipython-input-45-62735e65f581>", line 36:
def process_values(arr):
for row in arr:
^
Does this have anything to do with process_values taking an array of irregularly shaped arrays? Any pointers? Thanks!
The problem is that Numba's nopython mode does not accept arrays of dtype 'object'. You are placing arrays inside an array; you will have to use lists instead (lists inside lists, or a list of arrays). Look at the typed.List class in Numba: https://numba.pydata.org/numba-doc/dev/reference/pysupported.html#typed-list
Alternatively, you can use awkward arrays: https://github.com/scikit-hep/awkward-1.0

NumPy, OOP and callables

I'm implementing a Markov chain Monte Carlo with Metropolis and Barker alphas for numerical integration. I've created a class called MCMCIntegrator(). I've loaded it with some attributes; one of them is the pdf of the function we're trying to integrate (a lambda), called g.
import numpy as np
import scipy.stats as st
class MCMCIntegrator:
    def __init__(self):
        self.g = lambda x: st.gamma.pdf(x, 0, 1, scale=1 / 1.23452676)*np.abs(np.cos(1.123454156))
        self.size = 10000
        self.std = 0.6
        self.real_int = 0.06496359
There are other methods in this class. size is the size of the sample that the class must generate, and std is the standard deviation of the normal kernel, which you will see in a moment. real_int is the value of the integral from 1 to 2 of the function we're integrating; I generated it with an R script. Now, to the problem.
def _chain(self, method=None):
    """
    Markov chain heat-up with burn-in
    :param method: Metropolis or Barker alpha
    :return: np.array containing the sample
    """
    old = 0
    sample = np.zeros(int(self.size * 1.5))
    i = 0
    if method:
        def alpha(a, b): return min(1, self.g(b) / self.g(a))
    else:
        def alpha(a, b): return self.g(b) / (self.g(a) + self.g(b))
    while i != len(sample):
        new = st.norm(loc=old, scale=self.std).rvs()
        alpha = alpha(old, new)
        u = st.uniform.rvs()
        if alpha > u:
            sample[i] = new
            old = new
            i += 1
    return np.array(sample)
When I call the _chain() method, I get the following error:
44 while i != len(sample):
45 new = st.norm(loc=old, scale=self.std).rvs()
---> 46 alpha = alpha(old, new)
47 u = st.uniform.rvs()
48
TypeError: 'numpy.float64' object is not callable
alpha returns a numpy.float64, but I don't know why it says it's not callable.
You conditionally define a nested function named alpha in an earlier section of the code:
if method:
    def alpha(a, b): return min(1, self.g(b) / self.g(a))
else:
    def alpha(a, b): return self.g(b) / (self.g(a) + self.g(b))
And then, in the while loop (a later part of the code), you assign the return value of this function to a variable that is also named alpha.
Because both live in the same local namespace under the same name, that assignment rebinds alpha: the first call works, but from then on alpha refers to the numpy.float64 it returned rather than to the function, so on the next iteration alpha(old, new) tries to call a float and fails.
Renaming the variable (for example to acceptance) fixes this, and it doesn't seem to affect the rest of your program logic.
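A dependency-free sketch of both the bug and the fix. g here is a hypothetical stand-in for self.g (an unnormalised standard normal density), and random.Random replaces scipy.stats for the draws; otherwise the loop mirrors the question's _chain, including only recording accepted proposals.

```python
import math
import random


def g(x):
    # Hypothetical target density standing in for self.g (unnormalised standard normal)
    return math.exp(-0.5 * x * x)


def broken_loop():
    def alpha(a, b):
        return min(1.0, g(b) / g(a))
    for _ in range(2):
        # Rebinds the name: on the second pass `alpha` is a float, not a function
        alpha = alpha(0.0, 0.5)


def chain(size, std=0.6, metropolis=True, seed=0):
    rng = random.Random(seed)
    if metropolis:
        def alpha(a, b):
            return min(1.0, g(b) / g(a))
    else:
        def alpha(a, b):
            return g(b) / (g(a) + g(b))
    old = 0.0
    sample = []
    while len(sample) < size:
        new = rng.gauss(old, std)
        # A different name keeps `alpha` bound to the function
        acceptance = alpha(old, new)
        if acceptance > rng.random():
            sample.append(new)
            old = new
    return sample
```

Calling broken_loop() raises TypeError: 'float' object is not callable on its second iteration, which is the same failure as in the question; chain() runs to completion.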

Binary-search without an explicit array

I want to perform a binary-search using e.g. np.searchsorted, however, I do not want to create an explicit array containing values. Instead, I want to define a function giving the value to be expected at the desired position of the array, e.g. p(i) = i, where i denotes the position within the array.
Generating an array of values regarding the function would, in my case, be neither efficient nor elegant. Is there any way to achieve this?
What about something like:
import collections.abc
class GeneratorSequence(collections.abc.Sequence):
    def __init__(self, func, size):
        self._func = func
        self._len = size
    def __len__(self):
        return self._len
    def __getitem__(self, i):
        if 0 <= i < self._len:
            return self._func(i)
        else:
            raise IndexError
    def __iter__(self):
        for i in range(self._len):
            yield self[i]
This would work with np.searchsorted(), e.g.:
import numpy as np
gen_seq = GeneratorSequence(lambda x: x ** 2, 100)
np.searchsorted(gen_seq, 9)
# 3
You could also write your own binary search function. You do not really need NumPy in this case, and avoiding it can actually be beneficial:
def bin_search(seq, item):
    first = 0
    last = len(seq) - 1
    found = False
    while first <= last and not found:
        midpoint = (first + last) // 2
        if seq[midpoint] == item:
            first = midpoint
            found = True
        else:
            if item < seq[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1
    return first
Which gives identical results:
all(bin_search(gen_seq, i) == np.searchsorted(gen_seq, i) for i in range(100))
# True
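Incidentally, this is also WAY faster, because np.searchsorted() first converts the whole sequence into an array (calling the function at every index), while bin_search only evaluates about log2(n) positions: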
gen_seq = GeneratorSequence(lambda x: x ** 2, 1000000)
%timeit np.searchsorted(gen_seq, 10000)
# 1 loop, best of 3: 1.23 s per loop
%timeit bin_search(gen_seq, 10000)
# 100000 loops, best of 3: 16.1 µs per loop
Inspired by norok2's comment, I think you can use something like this:
from collections.abc import Sequence
def f(i):
    return i*2  # Just an example
class MySeq(Sequence):
    def __init__(self, f, maxi):
        self.maxi = maxi
        self.f = f
    def __getitem__(self, x):
        if x < 0 or x > self.maxi:
            raise IndexError()
        return self.f(x)
    def __len__(self):
        return self.maxi + 1
In this case f is your function, while maxi is the maximum index. Of course, this only works if f returns values in sorted order.
At this point you can use an object of type MySeq inside np.searchsorted.

("'function' object is not subscriptable", 'occurred at index 0')

I have a dataframe (maple) that, amongst others, has the columns 'THM', which is filled with float64 and 'Season_index', which is filled with int64. The 'THM' column has some missing values, and I want to fill them using the following function:
def fill_thm(cols):
    THM = cols[0]
    Season_index = cols[1]
    if pd.isnull[THM]:
        if Season_index == 1:
            return 10
        elif Season_index == 2:
            return 20
        elif Season_index == 3:
            return 30
        else:
            return 40
    else:
        return THM
Then, to apply the function I used
maple['THM']= maple[['THM','Season_index']].apply(fill_thm,axis=1)
But I am getting the ("'function' object is not subscriptable", 'occurred at index 0') error. Does anyone have any idea why? Thanks!
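The error comes from pd.isnull[THM]: the square brackets try to index into the function pd.isnull itself, which is exactly what "'function' object is not subscriptable" means. Call it with parentheses instead. Try this:
def fill_thm(THM, S_i):
    if pd.isnull(THM):
        if S_i == 1:
            return 10
        elif S_i == 2:
            return 20
        elif S_i == 3:
            return 30
        else:
            return 40
    else:
        return THM
And apply with:
maple['THM'] = maple[['THM','Season_index']].apply(lambda row: fill_thm(row['THM'], row['Season_index']), axis=1)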
Try this code:
def fill(cols):
    Age = cols.iloc[0]
    Pclass = cols.iloc[1]
    if pd.isnull(Age):
        if Pclass == 1:
            return 37
        elif Pclass == 2:
            return 30
        else:
            return 28
    else:
        return Age
train['Age'] = train[['Age','Pclass']].apply(fill, axis=1)
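First of all, when you call apply on a single column (a Series), you don't need to specify axis=1; axis=1 is only needed when applying a function row by row over a DataFrame, as here.
Second, if you are using pandas 0.22, try upgrading to 0.24; in my experience that resolved problems with apply on DataFrames.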
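A runnable sketch on a small hypothetical maple frame, showing the apply version with pd.isnull(...) called correctly, plus a vectorized alternative using Series.map and fillna that avoids apply altogether. SEASON_DEFAULTS is an illustrative name, not something from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the question's column names
maple = pd.DataFrame({'THM': [5.0, np.nan, np.nan, np.nan, np.nan],
                      'Season_index': [1, 1, 2, 3, 4]})

SEASON_DEFAULTS = {1: 10, 2: 20, 3: 30}  # any other season falls back to 40


def fill_thm(row):
    thm = row['THM']
    if pd.isnull(thm):  # parentheses: pd.isnull is a function call
        return SEASON_DEFAULTS.get(row['Season_index'], 40)
    return thm


via_apply = maple[['THM', 'Season_index']].apply(fill_thm, axis=1)

# Vectorized: look up a default per row, then fill only the missing THM values
defaults = maple['Season_index'].map(SEASON_DEFAULTS).fillna(40)
vectorized = maple['THM'].fillna(defaults)
```

In the apply version each row is upcast to float, so Season_index arrives as 1.0, 2.0 and so on; the dict lookup still works because 1.0 == 1 in Python. The vectorized version is usually preferable on large frames.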