NumPy array creation with fromfunction - numpy

I need something like this:
def qqq(i, j):
    if i + j > 2:
        return 0.5
    else:
        return 0

n = 3
dcdt = np.fromfunction(lambda i, j: qqq(i, j)*i*j, (n, n), dtype=int)
but with a more complicated qqq. This fails with the error "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". I know the problem is that the function is called only once, with whole index arrays rather than scalars. How can I do such array creation with an "if-elif-else" structure in the function?

You should turn your qqq function into something like:
def qqq(i, j):
    return np.where(i + j > 2, 0.5, 0)
See np.where's docs for details.
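Putting the pieces together, a minimal sketch (assuming the same n = 3 grid from the question; qqq2 and np.select are shown as one way to express a longer if/elif/else chain, not something from the original post):

```python
import numpy as np

def qqq(i, j):
    # np.where evaluates the condition elementwise, so it works when
    # fromfunction passes in whole index arrays instead of scalars
    return np.where(i + j > 2, 0.5, 0)

def qqq2(i, j):
    # an if/elif/else chain maps to np.select: the first matching
    # condition wins, and `default` plays the role of the final else
    return np.select([i + j > 3, i + j > 2], [1.0, 0.5], default=0.0)

n = 3
dcdt = np.fromfunction(lambda i, j: qqq(i, j) * i * j, (n, n), dtype=int)
print(dcdt)
# [[0. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 2.]]
```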

Related

Explanation of pandas DataFrame.assign() behaviour using lambda

import pandas as pd
import numpy as np

np.random.seed(99)
rows = 10
df = pd.DataFrame({'A': np.random.choice(range(0, 2), rows, replace=True),
                   'B': np.random.choice(range(0, 2), rows, replace=True)})

def get_C1(row):
    return row.A + row.B

def get_C2(row):
    return 'X' if row.A + row.B == 0 else 'Y'

def get_C3(row):
    is_zero = row.A + row.B
    return "X" if is_zero else "Y"

df = df.assign(C = lambda row: get_C3(row))
Why do the get_C2 and get_C3 functions raise an error?
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You're thinking that df.assign, when passed a function, behaves like df.apply with axis=1, which calls the function for each row.
That's incorrect.
Per the docs for df.assign: "Where the value is a callable, evaluated on df".
That means that the function you pass to assign is called on the whole dataframe instead of each individual row.
So, in your function get_C3, the row parameter is not a row at all. It's a whole dataframe (and should be renamed to df or something else) and so row.A and row.B are two whole columns, rather than single cell values.
Thus, is_zero is a whole column as well, and ... if is_zero ... will not work.
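A sketch of two fixes that fit those semantics (reusing the question's setup; the column name D is made up for illustration): a vectorized np.where over whole columns, or df.apply with axis=1 if you really want per-row calls.

```python
import numpy as np
import pandas as pd

np.random.seed(99)
rows = 10
df = pd.DataFrame({'A': np.random.choice(range(0, 2), rows, replace=True),
                   'B': np.random.choice(range(0, 2), rows, replace=True)})

# d is the whole DataFrame, so d.A + d.B is a Series;
# np.where evaluates the condition elementwise
df = df.assign(C=lambda d: np.where(d.A + d.B == 0, 'X', 'Y'))

# the row-by-row logic still works if routed through apply(axis=1)
df = df.assign(D=lambda d: d.apply(
    lambda row: 'X' if row.A + row.B == 0 else 'Y', axis=1))
```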

Why do I have to use a.any() or a.all() in this code?

In the code below, I found that when I pass a number it works, but when I pass an ndarray it raises an error.
Why do I have to use a.any() or a.all() in this case?
import numpy as np

def ht(x):
    if x % 2 == 1:
        return 1
    else:
        return 0

ht(1)
[Example]
ht(1): 1
ht(np.array([1,2,3,4])): The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
When evaluating an if statement, the condition has to be a bool:
if var:
    pass
var has to be of type bool.
If x is a number, then x % 2 == 1 is a bool.
If x is a np.array, then x % 2 == 1 is a np.array, which isn't a bool, but rather an array of bools, in which each cell states whether *that cell* % 2 == 1.
You can check whether all elements are truthy, or whether any of them are, with np.all or np.any.
This is because when np.array([1,2,3,4]) % 2 is performed, the output is also a np array - array([1, 0, 1, 0]). To check whether these individual array elements are 1 or 0, one has to use the any() or all() function. There is no problem when we pass a single element.
So, here is the modified code:
import numpy as np

def ht(x):
    if all(x % 2 == 1):  # True when all modulus results are == 1
        return 1
    else:
        return 0

ht(np.array([1,2,3,4]))
Output for the above code is 0
import numpy as np

def ht(x):
    if any(x % 2 == 1):  # True when any modulus result is == 1
        return 1
    else:
        return 0

ht(np.array([1,2,3,4]))
Output for the above code is 1
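As an aside (not part of the original answers), if what you actually want is a per-element result rather than a single 0/1, a vectorized variant sidesteps the ambiguity entirely:

```python
import numpy as np

def ht_elementwise(x):
    # returns 1 or 0 per element instead of collapsing to one bool
    return np.where(np.asarray(x) % 2 == 1, 1, 0)

print(ht_elementwise(np.array([1, 2, 3, 4])))  # [1 0 1 0]
```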

I am trying to take a 1D slice from a 2D numpy array, but something goes wrong

I am trying to filter evident measurement mistakes from my data using the 3-sigma rule. x is a numpy array of measurement points and y is an array of measured values. To remove wrong points from my data, I zip x.tolist() and y.tolist(), then filter by the second element of each tuple, then I need to convert my zip back into two lists. I tried to first convert my list of tuples into a list of lists, then convert it to a numpy 2D array and then take two 1D slices of it. The first slice looks correct, but then it outputs the following:
x = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 0]
IndexError: too many indices for array
I don't understand what I am doing wrong. Here's the code:
x = np.array(readCol(0, l))
y = np.array(readCol(1, l))
n = len(y)
stdev = np.std(y)
mean = np.mean(y)
print("Stdev is: " + str(stdev))
print("Mean is: " + str(mean))

def flt(n):
    global mean
    global stdev
    global x
    if abs(n[1] - mean) < 3*stdev:
        return True
    else:
        print('flt function finds an error: ' + str(n[1]))
        return False

def filtration(N):
    print(Fore.RED + 'Filtration function launched')
    global y
    global x
    global stdev
    global mean
    zap = zip(x.tolist(), y.tolist())
    for i in range(N):
        print(Fore.RED + ' Filtration step number ' + str(i) + Style.RESET_ALL)
        y = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 1]
        print(Back.GREEN + 'This is y: \n' + Style.RESET_ALL)
        print(y)
        x = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 0]
        print(Back.GREEN + 'This is x: \n' + Style.RESET_ALL)
        print(x)
        print('filtration function main step')
        stdev = np.std(y)
        print('second step')
        mean = np.mean(y)
        print('third step')
Have you tried to test the problem line step by step?
x = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 0]
for example:
temp = np.array(list(map(list, list(filter(flt, list(zap))))))
print(temp.shape, temp.dtype)
x = temp[:, 0]
Further breakdown might be needed, but since [:, 0] is the only indexing operation in this line, I'd start there.
Without further study of the code and/or some examples, I'm not going to speculate about what the nested lists are doing.
The error sounds like temp is not 2d, contrary to your expectations. That could be because temp has object dtype and is composed of lists that vary in length. That seems to be a common problem when people make arrays from downloaded databases.
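A small illustration of that diagnosis (toy data, not the question's): a 1-D result makes [:, 0] fail, and one way that can happen with this code shape is that zip is a one-shot iterator in Python 3, so a second list(zap) sees an already-exhausted iterator:

```python
import numpy as np

zap = zip([1.0, 2.0], [10.0, 20.0])
first = np.array(list(map(list, list(zap))))   # consumes the iterator
second = np.array(list(map(list, list(zap))))  # iterator is now empty

print(first.shape)   # (2, 2) -- 2-D, so first[:, 0] works
print(second.shape)  # (0,)   -- 1-D, so second[:, 0] raises IndexError
try:
    second[:, 0]
except IndexError as e:
    print('IndexError:', e)
```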

remove duplicated values from numpy array

I have three numpy arrays
x = np.array([1, 2, 3, 4, 2, 1, 2, 3, 3, 3])
y = np.array([10, 20, 30, 40, 20, 10, 20, 30, 39, 39])
z = np.array([100, 200, 300, 400, 200, 100, 200, 300, 300, 300])
I want to check if x[i]==x[j] and y[i]==y[j] and z[i]!=z[j]. If this is true I want to remove z[j].
In pseudo code:
label: check
for i in range(0, np.size(x)):
    for j in range(0, np.size(x)):
        if x[i] == x[j] and y[i] == y[j] and z[i] != z[j] and i < j:
            x = delete(x, j)
            y = delete(y, j)
            z = delete(z, j)
            print "start again from above"
            goto check
Since I use goto, and I don't know any other way around this, I want to ask if there is a quick and elegant way to do this (maybe based on predefined numpy functions)?
This should do it:
np.unique(np.array([x, y, z]), axis=1)
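For the arrays in the question, that one-liner behaves like this (note that np.unique sorts the columns, and that it only drops columns that are identical in all three of x, y, and z):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 2, 1, 2, 3, 3, 3])
y = np.array([10, 20, 30, 40, 20, 10, 20, 30, 39, 39])
z = np.array([100, 200, 300, 400, 200, 100, 200, 300, 300, 300])

# stack into a 3x10 array and keep only the unique columns
result = np.unique(np.array([x, y, z]), axis=1)
print(result)
# [[  1   2   3   3   4]
#  [ 10  20  30  39  40]
#  [100 200 300 300 400]]

x2, y2, z2 = result  # unpack back into three 1-D arrays
```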

scipy.optimize.minimize with general array indexing

I want to solve an optimization problem with the method 'COBYLA' in scipy.optimize.minimize as follows:
test = spopt.minimize(testobj, x_init, method='COBYLA', constraints=cons1)
y = test.x
print('solution x =', y)
However, since the program is quite large, a scalable way to write the objective function (and the constraints) is to use a general index for the arguments. For example, if I could use x['parameter1'] or x.param1 instead of x[0], the program would be easier to read and debug. I tried writing x as an object or a pandas Series with general indexing like x['parameter1'], as follows:
def testobj(x):
    return x['a']**2 + x['b'] + 1

def testcon1(x):
    return x['a']

def testcon2(x):
    return x['b']

def testcon3(x):
    return 1 - x['a'] - x['b']

x_init = pd.Series([0.1, 0.1])
x_init.index = ['a', 'b']

cons1 = ({'type': 'ineq', 'fun': testcon1},
         {'type': 'ineq', 'fun': testcon2},
         {'type': 'ineq', 'fun': testcon3})
but whenever I pass that into the minimize routine, it throws an error:
return x['a']**2 + x['b'] + 1
ValueError: field named a not found
It works perfectly if I use a normal numpy array. Perhaps I'm not doing it right, but is it a limitation of the minimize function that I have to use a numpy array and not any other data structure? The scipy documentation on this topic mentions that the initial guess has to be an ndarray, but I'm curious how the routine calls the arguments, because for a pandas Series, indexing with x[0] or x['a'] is equivalent.
As you note, scipy.optimize uses numpy arrays as input, not pandas Series. When you initialize with a pandas Series, it effectively converts it to an array, so you can no longer access the fields by name.
Probably the easiest way to go is to just create a function which re-wraps the parameters each time you call them; for example:
def make_series(params):
    return pd.Series(params, index=['a', 'b'])

def testobj(x):
    x = make_series(x)
    return x['a']**2 + x['b'] + 1

def testcon1(x):
    x = make_series(x)
    return x['a']

def testcon2(x):
    x = make_series(x)
    return x['b']

def testcon3(x):
    x = make_series(x)
    return 1 - x['a'] - x['b']

x_init = make_series([1, 1])
test = spopt.minimize(testobj, x_init, method='COBYLA', constraints=cons1)
print('solution x =', test.x)
# solution x = [ 1.38777878e-17 0.00000000e+00]