Minimize cost based on purchased volume Pyomo - optimization

I'd like to find the optimal solution for buying goods from suppliers where the shipping cost depends on the cost of the goods bought from a given supplier. I'm using Pyomo. My code so far is:
from pyomo.environ import *

model = ConcreteModel(name="(MN_2)")
# products
N = ['prod1', 'prod2', 'prod3']
# suppliers
M = ['A', 'B']
# price
p = {('prod1', 'A'): 10,
     ('prod2', 'A'): 9,
     ('prod3', 'A'): 50,
     ('prod1', 'B'): 16,
     ('prod2', 'B'): 20,
     ('prod3', 'B'): 35}
# user quantity constraint
q_u = {('prod1', 'A'): 2,
       ('prod2', 'A'): 1,
       ('prod3', 'A'): 1,
       ('prod1', 'B'): 1,
       ('prod2', 'B'): 1,
       ('prod3', 'B'): 1}
# seller quantity constraint
q_s = {('prod1', 'A'): 20,
       ('prod2', 'A'): 10,
       ('prod3', 'A'): 10,
       ('prod1', 'B'): 10,
       ('prod2', 'B'): 10,
       ('prod3', 'B'): 10}
# quantity of product n bought in shop m
model.x = Var(N, M, bounds=(0, 10))

def obj_rule(model):
    return sum(p[n, m] * model.x[n, m] for n in N for m in M)
model.obj = Objective(rule=obj_rule)

def user_quantity(model, n, m):
    return model.x[n, m] >= q_u[n, m]
model.user_quantity = Constraint(N, M, rule=user_quantity)

def seller_quantity(model, n, m):
    return model.x[n, m] <= q_s[n, m]
model.seller_quantity = Constraint(N, M, rule=seller_quantity)

solver = SolverFactory('glpk')
solver.solve(model)
model.x.pprint()
What I'm struggling with is how to include a shipping cost that depends on the cost of the goods bought from a given supplier. For example:
For supplier A, the shipping cost is:
10 if the sum of the costs of the products bought from them is <= 100,
0 if that sum is > 100.
For supplier B, the shipping cost is:
8 if the sum of the costs of the products bought from them is <= 150,
0 if that sum is > 150.

The constraints you're describing are an implementation of an if-then condition, which is described here. The quirk is that your conditions require the binary variable to be 1 if your procurement cost is less than or equal to a threshold, rather than strictly less than the threshold. We can add a very small number (0.0001) to the threshold; it doesn't affect adherence to the condition and lets us use the new value in a strictly-less-than inequality.
To your initial model, you can add one new binary variable per seller (model.shipping_bin) and one constraint per binary variable that forces the binary variable to be 1 if the cost is below the threshold and allows it to be 0 otherwise. We can then multiply these variables by the shipping costs in the objective function.
# add new binary variables to track costs per supplier
model.shipping_bin = Var(M, within=Binary)
shipping_costs = {'A': 10, 'B': 8}
shipping_thresholds = {'A': 100, 'B': 150}  # threshold to meet to not incur shipping
# We need big-M values to multiply the binaries to enforce the constraint
# without constraining the procurement cost incurred.
# We can set them to the maximum amount we expect to procure from each seller:
# the largest cost you could incur from a seller is the price times the max quantity.
shipping_big_m = {seller: sum([p[(prod, seller)] * q_s[(prod, seller)] for prod in N])
                  for seller in M}

# add constraints
def shipping_bin_rule(model, seller):
    # Sets the shipping binary var to 1 if the threshold is not met
    # Allows it to be 0 otherwise
    # 790 * (model.shipping_bin['A']) >= 100.0001 - cost of products from seller 'A'
    # if cost of products from 'A' < 100.0001 then binary variable = 1
    # 710 * (model.shipping_bin['B']) >= 150.0001 - cost of products from seller 'B'
    # if cost of products from 'B' < 150.0001 then binary variable = 1
    epsilon = .0001  # to make sure the case where cost == threshold is still accounted for
    return (shipping_big_m[seller] * model.shipping_bin[seller]
            >= shipping_thresholds[seller] + epsilon
            - sum([p[(product, seller)] * model.x[product, seller] for product in N]))
model.shipping_bin_con = Constraint(M, rule=shipping_bin_rule)
# new objective function adding the shipping cost
def obj_with_shipping_rule(model):
    orig_cost = obj_rule(model)  # call the original function, but can combine into one function if desired
    # apply the shipping cost if the cost of products is below the threshold (binary is 1)
    shipping_cost = sum([shipping_costs[seller] * model.shipping_bin[seller]
                         for seller in M])
    return orig_cost + shipping_cost

# deactivate the original objective to apply the new one
model.obj.deactivate()
model.obj_with_shipping = Objective(rule=obj_with_shipping_rule)
# solve the model with new obj
solver.solve(model)
model.x.pprint() # x values remain unchanged
# x : Size=6, Index=x_index
# Key : Lower : Value : Upper : Fixed : Stale : Domain
# ('prod1', 'A') : 0 : 2.0 : 10 : False : False : Reals
# ('prod1', 'B') : 0 : 1.0 : 10 : False : False : Reals
# ('prod2', 'A') : 0 : 1.0 : 10 : False : False : Reals
# ('prod2', 'B') : 0 : 1.0 : 10 : False : False : Reals
# ('prod3', 'A') : 0 : 1.0 : 10 : False : False : Reals
# ('prod3', 'B') : 0 : 1.0 : 10 : False : False : Reals
# cost from A = 2 * 10 + 1 * 9 + 1 * 50 = 79 < 100 so model.shipping_bin['A'] = 1
# cost from B = 1 * 16 + 1 * 20 + 1 * 35 = 71 < 150 so model.shipping_bin['B'] = 1
model.shipping_bin.pprint()
# shipping_bin : Size=2, Index=shipping_bin_index
# Key : Lower : Value : Upper : Fixed : Stale : Domain
# A : 0 : 1.0 : 1 : False : False : Binary
# B : 0 : 1.0 : 1 : False : False : Binary
value(model.obj_with_shipping) # 168 (18 units larger than original because of shipping)
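If you want to double-check which thresholds were hit, here is a small sanity-check sketch (assuming the model above has just been solved, with value available from the pyomo.environ import):
# recompute each supplier's procurement cost and compare it with the binaries above
for seller in M:
    cost = sum(p[(prod, seller)] * value(model.x[prod, seller]) for prod in N)
    print(seller, cost, value(model.shipping_bin[seller]))
# should print roughly: A 79.0 1.0 and B 71.0 1.0, matching the comments above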

Related

Comparison of values in Dataframes with different size

I have a DataFrame in which I want to compare the speed of certain IDs under different conditions.
Boundary conditions:
IDs do not have to be represented in every condition,
an ID is not represented in every condition with the same frequency.
My goal is to assign whether the speed remained
larger (speed > speed in Cond_A + 10%),
smaller (speed < speed in Cond_A - 10%), or
the same (speed < speed in Cond_A + 10% and speed > speed in Cond_A - 10%),
depending on the condition.
The data
import numpy as np
import pandas as pd
data1 = {
    'ID': [1, 1, 1, 2, 3, 3, 4, 5],
    'Condition': ['Cond_A', 'Cond_A', 'Cond_A', 'Cond_A', 'Cond_A', 'Cond_A', 'Cond_A', 'Cond_A'],
    'Speed': [1.2, 1.05, 1.2, 1.3, 1.0, 0.85, 1.1, 0.85],
}
df1 = pd.DataFrame(data1)
data2 = {
    'ID': [1, 2, 3, 4, 5, 6],
    'Condition': ['Cond_B', 'Cond_B', 'Cond_B', 'Cond_B', 'Cond_B', 'Cond_B'],
    'Speed': [0.8, 0.55, 0.7, 1.15, 1.2, 1.4],
}
df2 = pd.DataFrame(data2)
data3 = {
    'ID': [1, 2, 3, 4, 6],
    'Condition': ['Cond_C', 'Cond_C', 'Cond_C', 'Cond_C', 'Cond_C'],
    'Speed': [1.8, 0.99, 1.7, 131, 0.2],
}
df3 = pd.DataFrame(data3)
lst_of_dfs = [df1,df2, df3]
# creating a Dataframe object
data = pd.concat(lst_of_dfs)
My goal is to achieve a result like this:
Condition ID Speed Category
0 Cond_A 1 1.150 NaN
1 Cond_A 2 1.300 NaN
2 Cond_A 3 0.925 NaN
3 Cond_A 4 1.100 NaN
4 Cond_A 5 0.850 NaN
5 Cond_B 1 0.800 faster
6 Cond_B 2 0.550 slower
7 Cond_B 3 0.700 slower
8 Cond_B 4 1.150 equal
...
My attempt:
Calculate average of speed for each ID per condition
data = data.groupby(["Condition", "ID"]).mean()["Speed"].reset_index()
Definition of thresholds, assuming I want thresholds of up to 10 percent around the Cond_A values:
threshold_upper = data.loc[(data.Condition == 'CondA')]['Speed'] + (data.loc[(data.Condition == 'CondA')]['Speed']*10/100)
threshold_lower = data.loc[(data.Condition == 'CondA')]['Speed'] - (data.loc[(data.Condition == 'CondA')]['Speed']*10/100)
Mapping strings 'faster', 'equal', 'slower' based on condition using numpy select.
conditions = [
(data.loc[(data.Condition == 'CondB')]['Speed'] > threshold_upper), #check whether Speed of each ID in CondB is faster than Speed in CondA+10%
(data.loc[(data.Condition == 'CondC')]['Speed'] > threshold_upper), #check whether Speed of each ID in CondC is faster than Speed in CondA+10%
((data.loc[(data.Condition == 'CondB')]['Speed'] < threshold_upper) & (data.loc[(data.Condition == 'CondB')]['Speed'] > threshold_lower)), #check whether Speed of each ID in CondB is slower than Speed in CondA+10% AND faster than Speed in CondA-10%
((data.loc[(data.Condition == 'CondC')]['Speed'] < threshold_upper) & (data.loc[(data.Condition == 'CondC')]['Speed'] > threshold_lower)), #check whether Speed of each ID in CondC is slower than Speed in CondA+10% AND faster than Speed in CondA-10%
(data.loc[(data.Condition == 'CondB')]['Speed'] < threshold_upper), #check whether Speed of each ID in CondB is slower than Speed in CondA-10%
(data.loc[(data.Condition == 'CondC')]['Speed'] < threshold_upper), #check whether Speed of each ID in CondC is faster than Speed in CondA-10%
]
values = [
'faster',
'faster',
'equal',
'equal',
'slower',
'slower'
]
data['Category'] = np.select(conditions, values)
Produces this error: ValueError: Length of values (0) does not match length of index (16)
My data frames unfortunately have different lengths (since not all IDs performed all trials in each condition). I appreciate any hint. Many thanks in advance.
# Dataframe created
data
ID Condition Speed
0 1 Cond_A 1.20
1 1 Cond_A 1.05
2 1 Cond_A 1.20
# Reset the index
data = data.reset_index(drop=True)
# Creating based on ID
data['group'] = data.groupby(['ID']).ngroup()
# Creating functions which return the upper and lower limits of speed
def lowlimit(x):
    return x[x['Condition'] == 'Cond_A'].Speed.mean() * 0.9

def upperlimit(x):
    return x[x['Condition'] == 'Cond_A'].Speed.mean() * 1.1

# Calculate the upperlimit and lowerlimit for the groups
df = pd.DataFrame()
df['ul'] = data.groupby('group').apply(lambda x: upperlimit(x))
df['ll'] = data.groupby('group').apply(lambda x: lowlimit(x))
# reseting the index
# So that we can merge the values of 'group' column
df = df.reset_index()
# Merging the data and df dataframe
data_new = pd.merge(data,df,on='group',how='left')
data_new
ID Condition Speed group ul ll
0 1 Cond_A 1.20 0 1.2650 1.0350
1 1 Cond_A 1.05 0 1.2650 1.0350
2 1 Cond_A 1.20 0 1.2650 1.0350
3 2 Cond_A 1.30 1 1.4300 1.1700
Now we have to apply the conditions
data_new.loc[(data_new['Speed'] >= data_new['ul']) & (data_new['Condition'] != 'Cond_A'),'Category'] = 'larger'
data_new.loc[(data_new['Speed'] <= data_new['ll']) & (data_new['Condition'] != 'Cond_A'),'Category'] = 'smaller'
data_new.loc[(data_new['Speed'] < data_new['ul']) & (data_new['Speed'] > data_new['ll']) & (data_new['Condition'] != 'Cond_A'),'Category'] = 'Same'
Here is the output
You can drop the other columns now, if you want
data_new = data_new.drop(columns=['group','ul','ll'])
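For reference, a more compact variant of the same idea is sketched below; it assumes data is the concatenated frame from pd.concat(lst_of_dfs) (as at the start of this answer) and uses groupby/transform to broadcast the Cond_A mean per ID instead of building and merging a separate df:
import numpy as np
import pandas as pd

data = data.reset_index(drop=True)
# Cond_A mean speed per ID, broadcast to every row of that ID (NaN for IDs with no Cond_A rows)
base = (data.assign(a_speed=data['Speed'].where(data['Condition'] == 'Cond_A'))
            .groupby('ID')['a_speed'].transform('mean'))
data['Category'] = np.select(
    [data['Condition'] == 'Cond_A', base.isna(),
     data['Speed'] >= base * 1.1,
     data['Speed'] <= base * 0.9],
    [None, None, 'larger', 'smaller'],
    default='Same')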

How to index the unique value count in numpy? [duplicate]

Consider the following lists short_list and long_list
import numpy as np
import pandas as pd
from string import ascii_letters

short_list = list('aaabaaacaaadaaac')
np.random.seed([3,1415])
long_list = pd.DataFrame(
    np.random.choice(list(ascii_letters),
                     (10000, 2))
).sum(1).tolist()
How do I calculate the cumulative count by unique value?
I want to use numpy and do it in linear time. I want this to compare timings with my other methods. It may be easiest to illustrate with my first proposed solution
def pir1(l):
    s = pd.Series(l)
    return s.groupby(s).cumcount().tolist()
print(np.array(short_list))
print(pir1(short_list))
['a' 'a' 'a' 'b' 'a' 'a' 'a' 'c' 'a' 'a' 'a' 'd' 'a' 'a' 'a' 'c']
[0, 1, 2, 0, 3, 4, 5, 0, 6, 7, 8, 0, 9, 10, 11, 1]
I've tortured myself trying to use np.unique because it returns a counts array, an inverse array, and an index array. I was sure I could use these to get at a solution. The best I got is in pir4 below, which scales in quadratic time. Also note that I don't care if counts start at 1 or zero, as we can simply add or subtract 1.
Below are some of my attempts (none of which answer my question)
%%cython
from collections import defaultdict

def get_generator(l):
    counter = defaultdict(lambda: -1)
    for i in l:
        counter[i] += 1
        yield counter[i]

def pir2(l):
    return [i for i in get_generator(l)]

def pir3(l):
    return [i for i in get_generator(l)]

def pir4(l):
    unq, inv = np.unique(l, 0, 1, 0)
    a = np.arange(len(unq))
    matches = a[:, None] == inv
    return (matches * matches.cumsum(1)).sum(0).tolist()
setup
short_list = np.array(list('aaabaaacaaadaaac'))
functions
dfill takes an array and returns the positions where the array changes and repeats that index position until the next change.
# dfill
#
# Example with short_list
#
# 0 0 0 3 4 4 4 7 8 8 8 11 12 12 12 15
# [ a a a b a a a c a a a d a a a c]
#
# Example with short_list after sorting
#
# 0 0 0 0 0 0 0 0 0 0 0 0 12 13 13 15
# [ a a a a a a a a a a a a b c c d]
argunsort returns the permutation necessary to undo a sort, given the argsort array. The existence of this method became known to me via this post. With this, I can get the argsort array and sort my array with it. Then I can undo the sort without the overhead of sorting again.
cumcount will take an array, sort it, and find the dfill array. An np.arange less dfill will give me the cumulative count. Then I un-sort.
# cumcount
#
# Example with short_list
#
# short_list:
# [ a a a b a a a c a a a d a a a c]
#
# short_list.argsort():
# [ 0 1 2 4 5 6 8 9 10 12 13 14 3 7 15 11]
#
# Example with short_list after sorting
#
# short_list[short_list.argsort()]:
# [ a a a a a a a a a a a a b c c d]
#
# dfill(short_list[short_list.argsort()]):
# [ 0 0 0 0 0 0 0 0 0 0 0 0 12 13 13 15]
#
# np.arange(short_list.size):
# [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
#
# np.arange(short_list.size) -
# dfill(short_list[short_list.argsort()]):
# [ 0 1 2 3 4 5 6 7 8 9 10 11 0 0 1 0]
#
# unsorted:
# [ 0 1 2 0 3 4 5 0 6 7 8 0 9 10 11 1]
foo function recommended by #hpaulj using defaultdict
div function recommended by #Divakar (old, I'm sure he'd update it)
code
def dfill(a):
    n = a.size
    b = np.concatenate([[0], np.where(a[:-1] != a[1:])[0] + 1, [n]])
    return np.arange(n)[b[:-1]].repeat(np.diff(b))

def argunsort(s):
    n = s.size
    u = np.empty(n, dtype=np.int64)
    u[s] = np.arange(n)
    return u

def cumcount(a):
    n = a.size
    s = a.argsort(kind='mergesort')
    i = argunsort(s)
    b = a[s]
    return (np.arange(n) - dfill(b))[i]

def foo(l):
    n = len(l)
    r = np.empty(n, dtype=np.int64)
    counter = defaultdict(int)
    for i in range(n):
        counter[l[i]] += 1
        r[i] = counter[l[i]]
    return r - 1

def div(l):
    a = np.unique(l, return_counts=1)[1]
    idx = a.cumsum()
    id_arr = np.ones(idx[-1], dtype=int)
    id_arr[0] = 0
    id_arr[idx[:-1]] = -a[:-1] + 1
    rng = id_arr.cumsum()
    return rng[argunsort(np.argsort(l))]
demonstration
cumcount(short_list)
array([ 0, 1, 2, 0, 3, 4, 5, 0, 6, 7, 8, 0, 9, 10, 11, 1])
time testing
code
from timeit import timeit
from string import ascii_letters

functions = pd.Index(['cumcount', 'foo', 'foo2', 'div'], name='function')
lengths = pd.RangeIndex(100, 1100, 100, name='array length')
results = pd.DataFrame(index=lengths, columns=functions)

for i in lengths:
    a = np.random.choice(list(ascii_letters), i)
    for j in functions:
        # DataFrame.set_value has been removed from pandas; plain .loc assignment does the same job
        results.loc[i, j] = timeit(
            '{}(a)'.format(j),
            'from __main__ import a, {}'.format(j),
            number=1000
        )
results.plot()
Here's a vectorized approach using a custom grouped-range creating function and np.unique for getting the counts -
def grp_range(a):
    idx = a.cumsum()
    id_arr = np.ones(idx[-1], dtype=int)
    id_arr[0] = 0
    id_arr[idx[:-1]] = -a[:-1] + 1
    return id_arr.cumsum()

count = np.unique(A, return_counts=1)[1]
out = grp_range(count)[np.argsort(A).argsort()]
Sample run -
In [117]: A = list('aaabaaacaaadaaac')
In [118]: count = np.unique(A,return_counts=1)[1]
...: out = grp_range(count)[np.argsort(A).argsort()]
...:
In [119]: out
Out[119]: array([ 0, 1, 2, 0, 3, 4, 5, 0, 6, 7, 8, 0, 9, 10, 11, 1])
For getting the count, a few other alternatives could be proposed with a focus on performance -
np.bincount(np.unique(A,return_inverse=1)[1])
np.bincount(np.fromstring('aaabaaacaaadaaac',dtype=np.uint8)-97)
Additionally, with A containing single-letter characters, we could get the count simply with -
np.bincount(np.array(A).view('uint8')-97)
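For reference, here is a quick sanity check of these counting variants on the short_list example (a sketch; note that np.fromstring is deprecated in recent NumPy, where np.frombuffer on a bytes object, or a view of a 1-byte string dtype, does the same job):
import numpy as np

A = list('aaabaaacaaadaaac')
print(np.bincount(np.unique(A, return_inverse=True)[1]))                     # [12  1  2  1]
print(np.bincount(np.frombuffer(b'aaabaaacaaadaaac', dtype=np.uint8) - 97))  # [12  1  2  1]
print(np.bincount(np.array(A, dtype='S1').view(np.uint8) - 97))              # [12  1  2  1]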
Besides defaultdict there are a couple of other counters. Testing a slightly simpler case:
In [298]: from collections import defaultdict
In [299]: from collections import defaultdict, Counter
In [300]: def foo(l):
     ...:     counter = defaultdict(int)
     ...:     for i in l:
     ...:         counter[i] += 1
     ...:     return counter
     ...:
In [301]: short_list = list('aaabaaacaaadaaac')
In [302]: foo(short_list)
Out[302]: defaultdict(int, {'a': 12, 'b': 1, 'c': 2, 'd': 1})
In [303]: Counter(short_list)
Out[303]: Counter({'a': 12, 'b': 1, 'c': 2, 'd': 1})
In [304]: arr=[ord(i)-ord('a') for i in short_list]
In [305]: np.bincount(arr)
Out[305]: array([12, 1, 2, 1], dtype=int32)
I constructed arr because bincount only works with ints.
In [306]: timeit np.bincount(arr)
The slowest run took 82.46 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.63 µs per loop
In [307]: timeit Counter(arr)
100000 loops, best of 3: 13.6 µs per loop
In [308]: timeit foo(arr)
100000 loops, best of 3: 6.49 µs per loop
I'm guessing it would be hard to improve on pir2, based on defaultdict.
Searching and counting like this are not a strong area for numpy.

Pulp solves multiple combinatorial problems

As the title says,
I want to solve a problem of finding multiple schemes that sum to a fixed constant. However, when I set up the constrained optimization model, I can't enumerate all the basic schemes. One idea is to add a constraint every time I get a solution, but the added constraint leads to an incomplete set of solutions, while adding nothing leads to an endless loop.
Here is my problem description
I have a list of benchmark data, detail_list. My goal is to select several numbers from this list (detail_list), but not all of them, so that the sum of the amounts taken from them reaches the target number (plan_amount) I want.
For example:
detail_list = [50, 100, 80, 40, 120, 25],
plan_amount = 20,
The feasible schemes are:
Taking 20 from detail_list[2] satisfies it; detail_list[1] (only 10) + detail_list[3] (only 10) = plan_amount (20) and detail_list[1] (only 5) + detail_list[3] (only 15) = plan_amount (20) can also be satisfied, as can detail_list[1] + detail_list[2] + detail_list[3] = plan_amount (20). But you can't combine four elements of detail_list, because number_max = 3, meaning a maximum of three elements are allowed to be combined.
from pulp import *

num = 6  # the list max length
number_max = 3  # how many combinations can there be at most
plan_amount = 20
detail_list = [50, 100, 80, 40, 120, 25]  # basic data

plan_model = LpProblem("plan_model")
alpha = [LpVariable("alpha_{0}".format(i + 1), cat="Binary") for i in range(num)]
upBound_num = [int(detail_list_money) for detail_list_money in detail_list]
num_channel = [
    LpVariable("fin_money_{0}".format(i + 1), lowBound=0, upBound=upBound_num[i], cat="Integer")
    for i in range(num)]
plan_model += lpSum(num_channel) == plan_amount
plan_model += lpSum(alpha) <= number_max
for i in range(num):
    plan_model += num_channel[i] >= alpha[i] * 5
    plan_model += num_channel[i] <= alpha[i] * detail_list[i]
plan_model.writeLP("2222.lp")

test_dd = open("2222.txt", "w", encoding="utf-8")
i = 0
while True:
    plan_model.solve()
    if LpStatus[plan_model.status] == "Optimal":
        test_dd.write(str(i + 1) + " times result\n")
        for v in plan_model.variables():
            test_dd.write(v.name + "=" + str(v.varValue))
            test_dd.write("\n")
        test_dd.write("============================\n\n")
        alpha_0_num = 0
        alpha_1_num = 0
        for alpha_value in alpha:
            if value(alpha_value) == 0:
                alpha_0_num += 1
            if value(alpha_value) == 1:
                alpha_1_num += 1
        plan_model += (lpSum(
            alpha[k] for k in range(num) if value(alpha[k]) == 1)) <= alpha_1_num - 1
        plan_model.writeLP("2222.lp")
        i += 1
    else:
        break
test_dd.close()
I don't know how to change my constraints to achieve this goal. Can you help me?
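One standard building block for this kind of enumeration (a sketch, not necessarily the complete fix, since it only excludes distinct alpha patterns rather than distinct num_channel splits over the same pattern) is a "no-good" cut that references both the variables at 1 and the variables at 0 in the incumbent solution:
# hypothetical no-good cut: forbid exactly the alpha pattern of the current solution
ones = [k for k in range(num) if value(alpha[k]) > 0.5]
zeros = [k for k in range(num) if value(alpha[k]) < 0.5]
plan_model += (lpSum(alpha[k] for k in ones)
               - lpSum(alpha[k] for k in zeros)) <= len(ones) - 1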

unit commitment constraint pyomo

I am currently trying to use this unit commitment example to build my own model with Pyomo. After defining switch-on and switch-off variables, I struggle to implement the following minimum up-time equation:
The YALMIP example is pretty straightforward:
for k = 2:Horizon
    for unit = 1:Nunits
        % indicator will be 1 only when switched on
        indicator = onoff(unit,k) - onoff(unit,k-1);
        range = k:min(Horizon, k+minup(unit)-1);
        % Constraints will be redundant unless indicator = 1
        Constraints = [Constraints, onoff(unit,range) >= indicator];
    end
end
Right now I am only looking into one unit, which gives me this model.
model = ConcreteModel()
p = prices
ts = timesteps
ut = min_uptime1

model.x = Var(ts, within=Binary)  # onoff
model.v = Var(ts, within=Binary)  # switch_on
model.w = Var(ts, within=Binary)  # switch_off

def obj_rule(model):
    return sum(p[t] * model.x[t] - 0.001 * (model.v[t] + model.w[t]) for t in ts)
model.revenue = Objective(rule=obj_rule, sense=maximize)
# start-up, shut-down costs will be added

def daily_uptime_rule(model):
    return sum(model.x[t] for t in ts) == 12
model.daily_uptime_rule = Constraint(rule=daily_uptime_rule)

def switch_on(model, t):
    if t == ts[0]:
        return model.v[t] >= 1 - (1 - model.x[t])
    else:
        return model.v[t] >= 1 - model.x[t-1] - (1 - model.x[t])
model.switch_on = Constraint(ts, rule=switch_on)

def switch_off(model, t):
    if t == ts[23]:
        return model.w[t] >= model.x[t]
    else:
        return model.w[t] >= 1 - model.x[t+1] + (model.x[t] - 1)
model.switch_off = Constraint(ts, rule=switch_off)

def min_ut(model, t):
    a = list(range(t, (min(ts[23], t + ut - 1) + 1)))
    for i in a:
        return model.x[i] >= model.v[t]
model.min_ut = Constraint(ts, rule=min_ut)
My problem here is that I can't access the variable x the same way in Pyomo. For every timestep t we need constraints for t+1, t+2, ..., t+min_up-1, but I can't use ranges with variables (model.x). Can I use the YALMIP example in Pyomo or do I need a new formulation?
Ok, so it seems the fundamental issue here is that the index of summation that you would like to do is dependent on the RHS of the inequality. You can construct the indices of the summation in a couple ways. You just need to be careful that the values you construct are valid. Here is an idea that might help you. This toy model tries to maximize the sum of x[t], but limits x[t] <= x[t-1] + x[t-2] just for giggles. Note the construction of the summation range "on the fly" from the passed value of t:
from pyomo.environ import *

m = ConcreteModel()
m.t = Set(initialize=range(5))
m.x = Var(m.t)

# constrain x_t to be less than the sum of x_(t-1), x_(t-2)
def x_limiter(m, t):
    if t == 0:
        return m.x[t] <= 1  # limit the first value
    # upper limit on the remainder is the sum of the previous 2
    return sum(m.x[i] for i in range(t-2, t) if i in m.t) >= m.x[t]
m.c1 = Constraint(m.t, rule=x_limiter)

# try to maximize x
m.OBJ = Objective(expr=sum(m.x[t] for t in m.t), sense=maximize)
solver = SolverFactory('glpk')
m.pprint()
solver.solve(m)
m.display()
It gets the job done:
Variables:
x : Size=5, Index=t
Key : Lower : Value : Upper : Fixed : Stale : Domain
0 : None : 1.0 : None : False : False : Reals
1 : None : 1.0 : None : False : False : Reals
2 : None : 2.0 : None : False : False : Reals
3 : None : 3.0 : None : False : False : Reals
4 : None : 5.0 : None : False : False : Reals
Objectives:
OBJ : Size=1, Index=None, Active=True
Key : Active : Value
None : True : 12.0
This recent post also has a similar idea:
Pyomo creating a variable time index
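Carried back to the min-uptime constraint from the question, the same build-the-window-on-the-fly idea could look roughly like this (a sketch, assuming ts is a list of consecutive integer timesteps and ut is the minimum up-time, and that it replaces the original model.min_ut):
def min_ut_rule(model, t):
    # timesteps that must stay on after a switch-on at t
    window = list(range(t, min(ts[-1], t + ut - 1) + 1))
    # if v[t] == 1 every x in the window is forced to 1, otherwise the constraint is redundant
    return sum(model.x[i] for i in window) >= len(window) * model.v[t]
model.min_ut = Constraint(ts, rule=min_ut_rule)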

How to find word frequency per country list in pandas?

Let's say I have a .CSV which has three columns: tidytext, location, vader_senti
I was already able to get the amount of *positive, neutral and negative texts (instead of words)* per country using the following code:
data_vis = pd.read_csv(r"csviamcrpreprocessed.csv", usecols=fields)
def print_sentiment_scores(text):
    vadersenti = analyser.polarity_scores(str(text))
    return pd.Series([vadersenti['pos'], vadersenti['neg'], vadersenti['neu'], vadersenti['compound']])
data_vis[['vadersenti_pos', 'vadersenti_neg', 'vadersenti_neu', 'vadersenti_compound']] = data_vis['tidytext'].apply(print_sentiment_scores)
data_vis['vader_senti'] = 'neutral'
data_vis.loc[data_vis['vadersenti_compound'] > 0.3 , 'vader_senti'] = 'positive'
data_vis.loc[data_vis['vadersenti_compound'] < 0.23 , 'vader_senti'] = 'negative'
data_vis['vader_possentiment'] = 0
data_vis.loc[data_vis['vadersenti_compound'] > 0.3 , 'vader_possentiment'] = 1
data_vis['vader_negsentiment'] = 0
data_vis.loc[data_vis['vadersenti_compound'] <0.23 , 'vader_negsentiment'] = 1
data_vis['vader_neusentiment'] = 0
data_vis.loc[(data_vis['vadersenti_compound'] <=0.3) & (data_vis['vadersenti_compound'] >=0.23) , 'vader_neusentiment'] = 1
sentimentbylocation = data_vis.groupby(["Location"])['vader_senti'].value_counts()
sentimentbylocation
sentimentbylocation gives me the following results:
Location vader_senti
Afghanistan negative 151
positive 25
neutral 2
Albania negative 6
positive 1
Algeria negative 116
positive 13
neutral 4
TO GET THE MOST COMMON POSITIVE WORDS, I USED THIS CODE:
def process_text(text):
    tokens = []
    for line in text:
        toks = tokenizer.tokenize(line)
        toks = [t.lower() for t in toks if t.lower() not in stopwords_list]
        tokens.extend(toks)
    return tokens
tokenizer=TweetTokenizer()
punct = list(string.punctuation)
stopwords_list = stopwords.words('english') + punct + ['rt','via','...','…','’','—','—:',"‚","â"]
pos_lines = list(data_vis[data_vis.vader_senti == 'positive'].tidytext)
pos_tokens = process_text(pos_lines)
pos_freq = nltk.FreqDist(pos_tokens)
pos_freq.most_common()
Running this will give me the most common words and the number of times they appeared, such as
[('good', 1212),
 ('amazing', 123),
However, what I want to see is how many of these positive words appeared in a country.
For example:
I have a sample CSV here: https://drive.google.com/file/d/112k-6VLB3UyljFFUbeo7KhulcrMedR-l/view?usp=sharing
Create a column for each most_common word, then do a groupby location and use agg to apply a sum for each count:
words = [i[0] for i in pos_freq.most_common()]
# lowering all cases in tidytext
data_vis.tidytext = data_vis.tidytext.str.lower()
for i in words:
    data_vis[i] = data_vis.tidytext.str.count(i)
funs = {i: 'sum' for i in words}
grouped = data_vis.groupby('Location').agg(funs)
Based on the example from the CSV and using most_common as ['good', 'amazing'] the result would be:
grouped
# good amazing
# Location
# Australia 0 1
# Belgium 6 4
# Japan 2 1
# Thailand 2 0
# United States 1 0
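One caveat worth noting (an aside, not part of the answer above): Series.str.count treats its argument as a regular expression and counts substring matches, so 'good' would also be counted inside 'goodbye'. If whole-word counts are wanted, a sketch with an explicit word boundary is:
import re

for i in words:
    data_vis[i] = data_vis.tidytext.str.count(r'\b{}\b'.format(re.escape(i)))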