Initializing variables in gekko using the array model function - optimization

Defining an array of Gekko variables does not allow any arguments to initialize the variables. For example, I am unable to make an array of integer variables using the m.Array function.
I can make an array of variables using this syntax: m.Array(m.Var, (42, 42)). However, I don't know how to make this array an array of integer variables because the m.Var passed in to the m.Array function does not take any arguments.
I have a single variable as an integer variable:
my_var_is_an_integer_var = m.Var(0, lb=0, ub=1, integer=True)
I have an array of variables that are not integer variables:
my_array_vars_are_not_integer_vars = m.Array(m.Var, (42, 42))
I want an array of integer variables: my_array_vars_are_integer_vars = m.Array(m.Var(0, lb=0, ub=1, integer=True), (42,42)) (Throws error)
HOW DO I INITIALIZE THE VARIABLES IN THE ARRAY TO BE INTEGER VARIABLES???
Error when trying to initialize array as integer variables:
Traceback (most recent call last):
File "integer_array.py", line 7, in <module>
my_array_vars_are_not_integer_vars = m.Array(m.Var(0, lb=0, ub=1,
integer=True), (42,42))
File "C:\Users\wills\Anaconda3\lib\site-packages\gekko\gekko.py", line
1831, in Array
i[...] = f(**args)
TypeError: 'GKVariable' object is not callable

If you need to pass additional arguments when creating a variable array, you can use one of the following options. Option 1 creates a Numpy array while Options 2 and 3 create a Python list.
Option 1 (Preferred)
Create a numpy array with the m.Array function with additional argument integer=True:
y = m.Array(m.Var,(42,42),lb=0,ub=1,integer=True)
Option 2
Create a 2D list of variables with a list comprehension:
y = [[m.Var(lb=0,ub=1,integer=True) for i in range(42)] for j in range(42)]
Option 3
Alternatively, you can create an empty list (y) and append binary values to that list.
y = [[None]*42]*42
for i in range(42):
for j in range(42):
y[i][j] = m.Var(lb=0,ub=1,integer=True)
The UPPER and LOWER bounds can be changed after the variable creation but the integer option is only available at initialization. Don't forget to switch to the APOPT MINLP solver for integer variable solutions with m.options.SOLVER = 1. Below is a complete example that uses all three options but with a 3x4 array for x, y, and z.
from gekko import GEKKO
import numpy as np
m = GEKKO()
# option 1
x = m.Array(m.Var,(3,4),lb=0,ub=1,integer=True)
# option 2
y = [[m.Var(lb=0,ub=1,integer=True) for i in range(4)] for j in range(3)]
# option 3
z = [[None]*4]*3
for i in range(3):
for j in range(4):
z[i][j] = m.Var(lb=0,ub=1,integer=True)
# switch to APOPT
m.options.SOLVER = 1
# define objective function
m.Minimize(m.sum(m.sum(x)))
m.Minimize(m.sum(m.sum(np.array(y))))
m.Minimize(m.sum(m.sum(np.array(z))))
# define equation
m.Equation(x[1,2]==0)
m.Equation(m.sum(x[:,0])==2)
m.Equation(m.sum(x[:,1])==3)
m.Equation(m.sum(x[2,:])==1)
m.solve(disp=True)
print(x)
The objective is to minimize sum of all the elements in x, y, and z but there are certain constraints on an element, row, and columns of x. The solution is:
[[[1.0] [1.0] [0.0] [0.0]]
[[1.0] [1.0] [0.0] [0.0]]
[[0.0] [1.0] [0.0] [0.0]]]

Related

different dimension numpy array broadcasting issue with '+=' operator

I'm new to numpy, and I have an interesting observation on the broadcasting. When I'm adding a 3x5 array directly to a 3x1 array, and update the original 3x1 array with the result, there is no broadcasting issue.
import numpy as np
total = np.random.uniform(-1,1, size=(3))[:,np.newaxis]
print(f'init = \n {total}')
for i in range(3):
total = total + np.ones(shape=(3,5))
print(f'total_{i} = \n {total}')
However, if i'm using '+=' operator to increment the 3x1 array with the value of 3x5 array, there is a broadcasting issue. May I know which rule of numpy broadcasting did I violate in the latter case?
total = np.random.uniform(-1,1, size=(3))[:,np.newaxis]
print(f'init = \n {total}')
for i in range(3):
total += np.ones(shape=(3,5))
print(f'total_{i} = \n {total}')
Thank you!
hawkoli1987
according to add function overridden in numpy array,
def add(x1, x2, *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__
"""
add(x1, x2, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])
Add arguments element-wise.
Parameters
----------
x1, x2 : array_like
The arrays to be added.
If ``x1.shape != x2.shape``, they must be broadcastable to a common
shape (which becomes the shape of the output).
out : ndarray, None, or tuple of ndarray and None, optional
A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated array is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
add function returns a freshly-allocated array when dimensions of arrays are different.
In python, a=a+b and a+=b aren't absolutly same. + calls __add__ function and += calls __iadd__.
a = np.array([1, 2])
b = np.array([3, 4])
first_id = id(a)
a = a + b
second_id = id(a)
assert first_id == second_id # False
a = np.array([1, 2])
b = np.array([3, 4])
first_id = id(a)
a += b
second_id = id(a)
assert first_id == second_id # True
+= function does not create new objects and updates the value to the same address.
numpy add function updates an existing instance when adding an array of the same dimensions, but returns a new object when adding arrays of different dimensions. So when use += functions, the two functions must have the same dimension because the results of the add function must be updated on the same object.
For example,
a = np.array()
total = np.random.uniform(-1,1, size=(3))[:,np.newaxis]
print(id(total))
for i in range(3):
total += np.ones(shape=(3,1))
print(id(total))
id(total) are all same because add function just updates the instance in same address because dimmension of two arrays are same.
In [29]: arr = np.zeros((1,3))
The actual error message is:
In [30]: arr += np.ones((2,3))
Traceback (most recent call last):
Input In [30] in <cell line: 1>
arr += np.ones((2,3))
ValueError: non-broadcastable output operand with shape (1,3) doesn't match the broadcast shape (2,3)
I read that as say that arr on the left is "non-broadcastable", where as arr+np.ones((2,3)) is the result of broadcasting. The wording may be awkward; it's probably produced in some deep compiled function where it makes more sense.
We get a variant on this when we try to assign an array to a slice of an array:
In [31]: temp = arr + np.ones((2,3))
In [32]: temp.shape
Out[32]: (2, 3)
In [33]: arr[:] = temp
Traceback (most recent call last):
Input In [33] in <cell line: 1>
arr[:] = temp
ValueError: could not broadcast input array from shape (2,3) into shape (1,3)
This is clearer, saying that the RHS (2,3) cannot be put into the LHS (1,3) slot.
Or trying to put the (2,3) into one "row" of arr:
In [35]: arr[0] = temp
Traceback (most recent call last):
Input In [35] in <cell line: 1>
arr[0] = temp
ValueError: could not broadcast input array from shape (2,3) into shape (3,)
arr[0] = arr works because it tries to put a (1,3) into a (3,) shape - that's a workable broadcasting combination.
arr[0] = arr.T tries to put a (3,1) into a (3,), and fails.

How do I efficiently compute the gyration tensor in numpy?

The gyration tensor of a set of N points in 3d space is defined as
assuming the condition
.
How do I compute this in numpy without using an explicit for loop? I know that I can just do something like
import numpy as np
def calculate_gyration_tensor(points):
'''
Calculates the gyration tensor of a set of points.
'''
COM = centre_of_mass(points)
gyration_tensor = np.zeros((3, 3))
for p in points:
gyration_tensor += np.outer(p-COM, p-COM)
return gyration_tensor / len(points)
but this quickly becomes inefficient for large N, because of the for loop. Is there a better way to do it?
You can do with np.einsum like this:
def gyration(points):
'''
Calculate the gyrason tensor
points : numpy array of shape N x 3
'''
center = points.mean(0)
# normalized points
normed_points = points - center[None,:]
return np.einsum('im,in->mn', normed_points,normed_points)/len(points)
# test
points = np.arange(36).reshape(12,3)
gyration(points)
Output:
array([[107.25, 107.25, 107.25],
[107.25, 107.25, 107.25],
[107.25, 107.25, 107.25]])

I am trying to take an 1D slice from 2D numpy array, but something goes wrong

I am trying to filter evident measurement mistakes from my data using the 3-sigma rule. x is a numpy array of measurement points and y is an arrray of measured values. To remove wrong points from my data, I zip x.tolist() and y.tolist(), then filter by the second element of each tuple, then I need to convert my zip back into two lists. I tried to first covert my list of tuples into a list of lists, then convert it to numpy 2D array and then take two 1D-slices of it. It looks like the first slice is correct, but then it outputs the following:
x = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 0]
IndexError: too many indices for array
I don't understand what am I doing wrong. Here's the code:
x = np.array(readCol(0, l))
y = np.array(readCol(1, l))
n = len(y)
stdev = np.std(y)
mean = np.mean(y)
print("Stdev is: " + str(stdev))
print("Mean is: " + str(mean))
def flt(n):
global mean
global stdev
global x
if abs(n[1] - mean) < 3*stdev:
return True
else:
print('flt function finds an error: ' + str(n[1]))
return False
def filtration(N):
print(Fore.RED + 'Filtration function launched')
global y
global x
global stdev
global mean
zap = zip(x.tolist(), y.tolist())
for i in range(N):
print(Fore.RED + ' Filtration step number ' + str(i) + Style.RESET_ALL)
y = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 1]
print(Back.GREEN + 'This is y: \n' + Style.RESET_ALL)
print(y)
x = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 0]
print(Back.GREEN + 'This is x: \n' + Style.RESET_ALL)
print(x)
print('filtration fuction main step')
stdev = np.std(y)
print('second step')
mean = np.mean(y)
print('third step')
Have you tried to test the problem line step by step?
x = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 0]
for example:
temp = np.array(list(map(list, list(filter(flt, list(zap))))))
print(temp.shape, temp.dtype)
x = temp[:, 0]
Further break down might be needed, but since [:,0] is the only indexing operation in this line, I'd start there.
Without further study of the code and/or some examples, I'm not going to try to speculate what the nested lists are doing.
The error sounds like temp is not 2d, contrary to your expectations. That could be because temp is object dtype, and composed of lists the vary in length. That seems to be common problem when people make arrays from downloaded databases.

How to optimize the linear coefficients for numpy arrays in a maximization function?

I have to optimize the coefficients for three numpy arrays which maximizes my evaluation function.
I have a target array called train['target'] and three predictions arrays named array1, array2 and array3.
I want to put the best linear coefficients i.e., x,y,z for these three arrays which will maximize the function
roc_aoc_curve(train['target'], xarray1 + yarray2 +z*array3)
the above function would be maximum when prediction is closer to the target.
i.e, xarray1 + yarray2 + z*array3 should be closer to train['target'].
The range of x,y,z >=0 and x,y,z <= 1
Basically I am trying to put the weights x,y,z for each of the three arrays which would make the function
xarray1 + yarray2 +z*array3 closer to the train['target']
Any help in getting this would be appreciated.
I used pulp.LpProblem('Giapetto', pulp.LpMaximize) to do the maximization. It works for normal numbers, integers etc, however failing while trying to do with arrays.
import numpy as np
import pulp
# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)
# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)
score = roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)
prob += score
coef = x+y+z
prob += (coef==1)
# solve the LP using the default solver
optimization_result = prob.solve()
# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal
# display the results
for var in (x, y,z):
print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
Getting error at the line
score = roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)
TypeError: unsupported operand type(s) for /: 'int' and 'LpVariable'
Can't progress beyond this line when using arrays. Not sure if my approach is correct. Any help in optimizing the function would be appreciated.
When you add sums of array elements to a PuLP model, you have to use built-in PuLP constructs like lpSum to do it -- you can't just add arrays together (as you discovered).
So your score definition should look something like this:
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
A few notes about this:
[+] You didn't provide the definition of roc_auc_score so I just pretended that it equals the sum of the element-wise difference between the target array and the weighted sum of the other 3 arrays.
[+] I suspect your actual calculation for roc_auc_score is nonlinear; more on this below.
[+] arr_ind is a list of the indices of the arrays, which I created like this:
# build array index
arr_ind = range(len(array1))
[+] You also didn't include the arrays, so I created them like this:
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
train = {}
train['target'] = np.ones((10, 1))
Here is my complete code, which compiles and executes, though I'm sure it doesn't give you the result you are hoping for, since I just guessed about target and roc_auc_score:
import numpy as np
import pulp
# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)
# dummy arrays since arrays weren't in OP code
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
# build array index
arr_ind = range(len(array1))
# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)
# dummy roc_auc_score since roc_auc_score wasn't in OP code
train = {}
train['target'] = np.ones((10, 1))
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
prob += score
coef = x + y + z
prob += coef == 1
# solve the LP using the default solver
optimization_result = prob.solve()
# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal
# display the results
for var in (x, y,z):
print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
Output:
Optimal weekly number of x to produce: 0
Optimal weekly number of y to produce: 0
Optimal weekly number of z to produce: 1
Process finished with exit code 0
Now, if your roc_auc_score function is nonlinear, you will have additional troubles. I would encourage you to try to formulate the score in a way that is linear, possibly using additional variables (for example, if you want the score to be an absolute value).

TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

Strange error from numpy via matplotlib when trying to get a histogram of a tiny toy dataset. I'm just not sure how to interpret the error, which makes it hard to see what to do next.
Didn't find much related, though this nltk question and this gdsCAD question are superficially similar.
I intend the debugging info at bottom to be more helpful than the driver code, but if I've missed something, please ask. This is reproducible as part of an existing test suite.
if n > 1:
return diff(a[slice1]-a[slice2], n-1, axis=axis)
else:
> return a[slice1]-a[slice2]
E TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')
../py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py:1567: TypeError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(1567)diff()
-> return a[slice1]-a[slice2]
(Pdb) bt
[...]
py2.7.11-venv/lib/python2.7/site-packages/matplotlib/axes/_axes.py(5678)hist()
-> m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(606)histogram()
-> if (np.diff(bins) < 0).any():
> py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(1567)diff()
-> return a[slice1]-a[slice2]
(Pdb) p numpy.__version__
'1.11.0'
(Pdb) p matplotlib.__version__
'1.4.3'
(Pdb) a
a = [u'A' u'B' u'C' u'D' u'E']
n = 1
axis = -1
(Pdb) p slice1
(slice(1, None, None),)
(Pdb) p slice2
(slice(None, -1, None),)
(Pdb)
I got the same error, but in my case I am subtracting dict.key from dict.value. I have fixed this by subtracting dict.value for corresponding key from other dict.value.
cosine_sim = cosine_similarity(e_b-e_a, w-e_c)
here I got error because e_b, e_a and e_c are embedding vector for word a,b,c respectively. I didn't know that 'w' is string, when I sought out w is string then I fix this by following line:
cosine_sim = cosine_similarity(e_b-e_a, word_to_vec_map[w]-e_c)
Instead of subtracting dict.key, now I have subtracted corresponding value for key
I had a similar issue where an integer in a row of a DataFrame I was iterating over was of type numpy.int64. I got the
TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')
error when trying to subtract a float from it.
The easiest fix for me was to convert the row using pd.to_numeric(row).
Why is it applying diff to an array of strings.
I get an error at the same point, though with a different message
In [23]: a=np.array([u'A' u'B' u'C' u'D' u'E'])
In [24]: np.diff(a)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-24-9d5a62fc3ff0> in <module>()
----> 1 np.diff(a)
C:\Users\paul\AppData\Local\Enthought\Canopy\User\lib\site-packages\numpy\lib\function_base.pyc in diff(a, n, axis)
1112 return diff(a[slice1]-a[slice2], n-1, axis=axis)
1113 else:
-> 1114 return a[slice1]-a[slice2]
1115
1116
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
Is this a array the bins parameter? What does the docs say bins should be?
I am fairly new to this myself, but I had a similar error and found that it is due to a type casting issue. I was trying to concatenate rather than take the difference but I think the principle is the same here. I provided a similar answer on another question so I hope that is OK.
In essence you need to use a different data type cast, in my case I needed str not float, I suspect yours is the same so my suggested solution is. I am sorry I cannot test it before suggesting but I am unclear from your example what you were doing.
return diff(str(a[slice1])-str(a[slice2]), n-1, axis=axis)
Please see my example code below for the fix to my code, the change occurs on the third to last line. The code is to produce a basic random forest model.
import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation
Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Ues the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"
npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay = npArray.shape
names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1] .astype(float)
X = preprocessing.scale(X)
XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)
# Predictions results initialised
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain) # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)
with open(test_name,'a') as fpred :
lenpredictions = len(RFpreds)
lentrue = yTest.shape[0]
if lenpredictions == lentrue :
fpred.write("Names/Label,, Prediction Random Forest,, True Value,\n")
for i in range(0,lenpredictions) :
fpred.write(RFpreds[i]+",,"+yTest[i]+",\n")
else :
print "ERROR - names, prediction and true value array size mismatch."
This leads to an error of;
Traceback (most recent call last):
File "min_example.py", line 40, in <module>
fpred.write(RFpreds[i]+",,"+yTest[i]+",\n")
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')
The solution is to make each variable a str() type on the third to last line then write to file. No other changes to then code have been made from the above.
import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation
Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Ues the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"
npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay = npArray.shape
names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1] .astype(float)
X = preprocessing.scale(X)
XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)
# Predictions results initialised
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain) # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)
with open(test_name,'a') as fpred :
lenpredictions = len(RFpreds)
lentrue = yTest.shape[0]
if lenpredictions == lentrue :
fpred.write("Names/Label,, Prediction Random Forest,, True Value,\n")
for i in range(0,lenpredictions) :
fpred.write(str(RFpreds[i])+",,"+str(yTest[i])+",\n")
else :
print "ERROR - names, prediction and true value array size mismatch."
These examples are from a larger code so I hope the examples are clear enough.
I think #James is right. I got stuck by same error while working on Polyval(). And yeah solution is to use the same type of variabes. You can use typecast to cast all variables in the same type.
BELOW IS A EXAMPLE CODE
import numpy
P = numpy.array(input().split(), float)
x = float(input())
print(numpy.polyval(P,x))
here I used float as an output type. so even the user inputs the INT value (whole number). the final answer will be typecasted to float.
I ran into the same issue, but in my case it was just a Python list instead of a Numpy array used. Using two Numpy arrays solved the issue for me.