I get the error in the title for my code below, but I do not have any negative values or NaNs in my array.
Is there anything else I should check for?
I don't get the error for small arrays that I could copy here. I only get it in my arrays that are approx 100,000 x 1,000
def norm2(A,B):
    """numpy.ndarray A shape (m1,d), B shape (m2,d)
    Returns ndarray of shape (m1, m2) dists_{i,j} = ||A_i - B_j||
    """
    print("A.shape: ", A.shape)
    print("B.shape: ", B.shape)
    sums = np.sum(np.square(A[:,None,:] - B[None,:,:]),
                  axis = 2
                  )
    print("sums.shape: ", sums.shape)
    negs = np.sum(sums < 0)
    nans = np.sum(np.isnan(sums))
    warn = "There are " + str(negs) + " negative numbers and " + str(nans) + " NaNs."
    dists = np.sqrt(sums) # dists_{i,j} = || A_i - B_j ||
    return dists, warn
The print statements output:
A.shape: (100000, 1000)
B.shape: (100000, 1000)
sums.shape: (100000, 100000)
The warn returned is:
'There are 0 negative numbers and 0 NaNs.'
What else can I check?
Other similar questions were due to negative numbers:
Numpy, RuntimeWarning: invalid value encountered in sqrt
I checked the manual, but couldn't find anything about the RuntimeWarning:
https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html
I tried to check the source code, but couldn't figure out where the code for sqrt is:
https://github.com/numpy/numpy/tree/master/numpy/lib
I am trying to filter evident measurement mistakes from my data using the 3-sigma rule. x is a numpy array of measurement points and y is an array of measured values. To remove wrong points from my data, I zip x.tolist() and y.tolist(), then filter by the second element of each tuple, and then I need to convert my zip back into two lists. I tried to first convert my list of tuples into a list of lists, then convert it to a 2D numpy array and take two 1D slices of it. It looks like the first slice is correct, but the second one fails with the following:
x = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 0]
IndexError: too many indices for array
I don't understand what I am doing wrong. Here's the code:
x = np.array(readCol(0, l))
y = np.array(readCol(1, l))
n = len(y)
stdev = np.std(y)
mean = np.mean(y)
print("Stdev is: " + str(stdev))
print("Mean is: " + str(mean))
def flt(n):
    global mean
    global stdev
    global x
    if abs(n[1] - mean) < 3*stdev:
        return True
    else:
        print('flt function finds an error: ' + str(n[1]))
        return False
def filtration(N):
    print(Fore.RED + 'Filtration function launched')
    global y
    global x
    global stdev
    global mean
    zap = zip(x.tolist(), y.tolist())
    for i in range(N):
        print(Fore.RED + ' Filtration step number ' + str(i) + Style.RESET_ALL)
        y = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 1]
        print(Back.GREEN + 'This is y: \n' + Style.RESET_ALL)
        print(y)
        x = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 0]
        print(Back.GREEN + 'This is x: \n' + Style.RESET_ALL)
        print(x)
        print('filtration fuction main step')
        stdev = np.std(y)
        print('second step')
        mean = np.mean(y)
        print('third step')
Have you tried to test the problem line step by step?
x = np.array(list(map(list, list(filter(flt, list(zap))))))[:, 0]
for example:
temp = np.array(list(map(list, list(filter(flt, list(zap))))))
print(temp.shape, temp.dtype)
x = temp[:, 0]
Further break down might be needed, but since [:,0] is the only indexing operation in this line, I'd start there.
Without further study of the code and/or some examples, I'm not going to try to speculate what the nested lists are doing.
The error sounds like temp is not 2d, contrary to your expectations. That could be because temp is object dtype and composed of lists that vary in length. That seems to be a common problem when people make arrays from downloaded databases.
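One other way temp can end up 1-D, assuming zap is a zip iterator as in the question: a zip object can only be consumed once, so the second list(zap) is empty and np.array([]) has shape (0,). The sketch below uses made-up data (the x and y values are illustrative, not from the question) to reproduce the error and shows a boolean mask as a possible alternative to the zip/filter round trip.

```python
import numpy as np

# Illustrative data: 20 ordinary points and one obvious outlier.
x = np.arange(21.0)
y = np.array([1.0] * 20 + [100.0])

zap = zip(x.tolist(), y.tolist())
first = np.array(list(map(list, zap)))    # consumes the iterator
second = np.array(list(map(list, zap)))   # zap is exhausted, so this is empty

print(first.shape)    # 2-D array of (x, y) pairs
print(second.shape)   # 1-D empty array

try:
    second[:, 0]
except IndexError as exc:
    print("IndexError:", exc)

# Possible alternative: apply the 3-sigma rule with a boolean mask,
# which keeps x and y aligned without any zipping.
mask = np.abs(y - y.mean()) < 3 * y.std()
x_kept, y_kept = x[mask], y[mask]
print(len(x_kept), len(y_kept))
```

If the zip approach is kept, materialising it once with pairs = list(zip(...)) and reusing pairs avoids the exhaustion.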
I've been trying to solve the Newtonian two-body problem using RK45 from scipy, but I keep running into the TypeError: 'Required step size is less than spacing between numbers.' I've tried different values of t_eval than the one below, but nothing seems to work.
from scipy import optimize
from numpy import linalg as LA
import matplotlib.pyplot as plt
from scipy.optimize import fsolve
import numpy as np
from scipy.integrate import solve_ivp
AU=1.5e11
a=AU
e=0.5
mss=2E30
ms = 2E30
me = 5.98E24
mv=4.867E24
yr=3.15e7
h=100
mu1=ms*me/(ms+me)
mu2=ms*me/(ms+me)
G=6.67E11
step=24
vi=np.sqrt(G*ms*(2/(a*(1-e))-1/a))
#sun=sphere(pos=vec(0,0,0),radius=0.1*AU,color=color.yellow)
#earth=sphere(pos=vec(1*AU,0,0),radius=0.1*AU)
sunpos=np.array([-903482.12391302, -6896293.6960525, 0. ])
earthpos=np.array([a*(1-e),0,0])
earthv=np.array([0,vi,0])
sunv=np.array([0,0,0])
def accelerations2(t,pos):
    norme=sum( (pos[0:3]-pos[3:6])**2 )**0.5
    gravit = G*(pos[0:3]-pos[3:6])/norme**3
    sunaa = me*gravit
    earthaa = -ms*gravit
    tota=earthaa+sunaa
    return [*earthaa,*sunaa]
def ode45(f,t,y,h):
    """Calculate next step of an initial value problem (IVP) of an ODE with a RHS described
    by the RHS function with an order 4 approx. and an order 5 approx.
    Parameters:
    t: float. Current time.
    y: float. Current step (position).
    h: float. Step-length.
    Returns:
    w: float. Order 4 approx.
    q: float. Order 5 approx.
    """
    s1 = f(t, y[0],y[1])
    s2 = f(t + h/4.0, y[0] + h*s1[0]/4.0,y[1] + h*s1[1]/4.0)
    s3 = f(t + 3.0*h/8.0, y[0] + 3.0*h*s1[0]/32.0 + 9.0*h*s2[0]/32.0,y[1] + 3.0*h*s1[1]/32.0 + 9.0*h*s2[1]/32.0)
    s4 = f(t + 12.0*h/13.0, y[0] + 1932.0*h*s1[0]/2197.0 - 7200.0*h*s2[0]/2197.0 + 7296.0*h*s3[0]/2197.0,y[1] + 1932.0*h*s1[1]/2197.0 - 7200.0*h*s2[1]/2197.0 + 7296.0*h*s3[1]/2197.0)
    s5 = f(t + h, y[0] + 439.0*h*s1[0]/216.0 - 8.0*h*s2[0] + 3680.0*h*s3[0]/513.0 - 845.0*h*s4[0]/4104.0,y[1] + 439.0*h*s1[1]/216.0 - 8.0*h*s2[1] + 3680.0*h*s3[1]/513.0 - 845.0*h*s4[1]/4104.0)
    s6 = f(t + h/2.0, y[0] - 8.0*h*s1[0]/27.0 + 2*h*s2[0] - 3544.0*h*s3[0]/2565 + 1859.0*h*s4[0]/4104.0 - 11.0*h*s5[0]/40.0,y[1] - 8.0*h*s1[1]/27.0 + 2*h*s2[1] - 3544.0*h*s3[1]/2565 + 1859.0*h*s4[1]/4104.0 - 11.0*h*s5[1]/40.0)
    w1 = y[0] + h*(25.0*s1[0]/216.0 + 1408.0*s3[0]/2565.0 + 2197.0*s4[0]/4104.0 - s5[0]/5.0)
    w2 = y[1] + h*(25.0*s1[1]/216.0 + 1408.0*s3[1]/2565.0 + 2197.0*s4[1]/4104.0 - s5[1]/5.0)
    q1 = y[0] + h*(16.0*s1[0]/135.0 + 6656.0*s3[0]/12825.0 + 28561.0*s4[0]/56430.0 - 9.0*s5[0]/50.0 + 2.0*s6[0]/55.0)
    q2 = y[1] + h*(16.0*s1[1]/135.0 + 6656.0*s3[1]/12825.0 + 28561.0*s4[1]/56430.0 - 9.0*s5[1]/50.0 + 2.0*s6[1]/55.0)
    return w1,w2, q1,q2
t=0
T=10**5
poss=[-903482.12391302, -6896293.6960525, 0. ,a*(1-e),0,0 ]
sol = solve_ivp(accelerations2, [0, 10**5], poss,t_eval=np.linspace(0,10**5,1))
print(sol)
I'm not sure what the error even means, because I've tried many different t_eval values and nothing seems to work.
The default values in solve_ivp are made for a "normal" situation where the scales of the variables are not too different from the range from 0.1 to 100. You could achieve these scales by rescaling the problem so that all lengths and related constants are in AU and all times and related constants are in days.
Or you can try to set the absolute tolerance to something reasonable like 1e-4*AU.
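As a rough sketch of the rescaling idea, the problem could be set up with lengths in AU and times in days. Everything here is an assumption made for illustration: the name rhs, the constant DAY, the choice to start the sun at rest at the origin, the tolerances, and the corrected SI value of G (6.67e-11) converted to AU^3 kg^-1 day^-2.

```python
import numpy as np
from scipy.integrate import solve_ivp

AU = 1.5e11                  # metres, the value used in the question
DAY = 86400.0                # seconds per day
G_SI = 6.67e-11              # m^3 kg^-1 s^-2
G = G_SI * DAY**2 / AU**3    # same constant in AU^3 kg^-1 day^-2

ms, me = 2e30, 5.98e24       # masses in kg
a, e = 1.0, 0.5              # semi-major axis in AU, eccentricity

def rhs(t, state):
    """First-order system [x', v'] = [v, a(x)] with the sun first, Earth second."""
    pos, vel = state.reshape(2, -1)
    sun, earth = pos.reshape(-1, 3)
    d = earth - sun
    grav = G * d / np.linalg.norm(d)**3
    acc = np.concatenate([me * grav, -ms * grav])
    return np.concatenate([vel, acc])

# Earth starts at perihelion with the vis-viva speed, in AU per day.
vi = np.sqrt(G * ms * (2 / (a * (1 - e)) - 1 / a))
state0 = np.array([0, 0, 0, a * (1 - e), 0, 0,    # positions: sun, earth
                   0, 0, 0, 0, vi, 0], dtype=float)  # velocities: sun, earth

sol = solve_ivp(rhs, [0.0, 365.0], state0, rtol=1e-8, atol=1e-10)
print(sol.message)
dist = np.linalg.norm(sol.y[3:6] - sol.y[0:3], axis=0)
print(dist.min(), dist.max())   # separation in AU over the integration
```

In these units the state components are of order 1 or smaller, which is closer to the range the default tolerances are designed for.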
It also helps to use the correct first order system, as I told you recently in another question on this topic. In a mechanical system you get usually a second order ODE x''=a(x). Then the first order system to pass to the ODE solver is [x', v'] = [v, a(x)], which could be implemented as
def firstorder(t,state):
    pos, vel = state.reshape(2,-1)
    return [*vel, *accelerations2(t,pos)]
Next it is always helpful to apply the acceleration of Earth to Earth and of the sun to the sun. That is, fix an order of the objects. At the moment the initialization has the sun first, while in the acceleration computation you treat the state as Earth first. Switch everything to sun first:
def accelerations2(t,pos):
    pos=pos.reshape(-1,3)
    # pos[0] = sun, pos[1] = earth
    norme=sum( (pos[1]-pos[0])**2 )**0.5
    gravit = G*(pos[1]-pos[0])/norme**3
    sunacc = me*gravit
    earthacc = -ms*gravit
    totacc=earthacc+sunacc
    return [*sunacc,*earthacc]
And then it never goes amiss to use the correctly reproduced natural constants like
G = 6.67E-11
Then the solver call and print formatting as
state0=[*sunpos, *earthpos, *sunvel, *earthvel]
sol = solve_ivp(firstorder, [0, T], state0, first_step=1e+5, atol=1e-6*a)
print(sol.message)
for t, pos in zip(sol.t, sol.y[[0,1,3,4]].T):
    print("%.6e"%t, ", ".join("%8.4g"%x for x in pos))
gives the short table
The solver successfully reached the end of the integration interval.
t x_sun y_sun x_earth y_earth
0.000000e+00 -9.035e+05, -6.896e+06, 7.5e+10, 0
1.000000e+05 -9.031e+05, -6.896e+06, 7.488e+10, 5.163e+09
that is, for this step the solver only needs one internal step.
I have to optimize the coefficients for three numpy arrays so that they maximize my evaluation function.
I have a target array called train['target'] and three prediction arrays named array1, array2 and array3.
I want to find the best linear coefficients x, y, z for these three arrays, i.e. the ones that maximize the function
roc_auc_score(train['target'], x*array1 + y*array2 + z*array3)
The above function is at its maximum when the prediction is closest to the target,
i.e. x*array1 + y*array2 + z*array3 should be as close as possible to train['target'].
The coefficients are bounded: 0 <= x, y, z <= 1.
Basically, I am trying to find the weights x, y, z for the three arrays that make
x*array1 + y*array2 + z*array3 closest to train['target'].
Any help in getting this would be appreciated.
I used pulp.LpProblem('Giapetto', pulp.LpMaximize) to do the maximization. It works for plain numbers, integers etc., but fails when I try it with arrays.
import numpy as np
import pulp
# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)
# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)
score = roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)
prob += score
coef = x+y+z
prob += (coef==1)
# solve the LP using the default solver
optimization_result = prob.solve()
# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal
# display the results
for var in (x, y,z):
    print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
Getting error at the line
score = roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)
TypeError: unsupported operand type(s) for /: 'int' and 'LpVariable'
Can't progress beyond this line when using arrays. Not sure if my approach is correct. Any help in optimizing the function would be appreciated.
When you add sums of array elements to a PuLP model, you have to use built-in PuLP constructs like lpSum to do it -- you can't just add arrays together (as you discovered).
So your score definition should look something like this:
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
A few notes about this:
[+] You didn't provide the definition of roc_auc_score so I just pretended that it equals the sum of the element-wise difference between the target array and the weighted sum of the other 3 arrays.
[+] I suspect your actual calculation for roc_auc_score is nonlinear; more on this below.
[+] arr_ind is a list of the indices of the arrays, which I created like this:
# build array index
arr_ind = range(len(array1))
[+] You also didn't include the arrays, so I created them like this:
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
train = {}
train['target'] = np.ones((10, 1))
Here is my complete code, which compiles and executes, though I'm sure it doesn't give you the result you are hoping for, since I just guessed about target and roc_auc_score:
import numpy as np
import pulp
# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)
# dummy arrays since arrays weren't in OP code
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
# build array index
arr_ind = range(len(array1))
# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)
# dummy roc_auc_score since roc_auc_score wasn't in OP code
train = {}
train['target'] = np.ones((10, 1))
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
prob += score
coef = x + y + z
prob += coef == 1
# solve the LP using the default solver
optimization_result = prob.solve()
# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal
# display the results
for var in (x, y,z):
    print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
Output:
Optimal weekly number of x to produce: 0
Optimal weekly number of y to produce: 0
Optimal weekly number of z to produce: 1
Process finished with exit code 0
Now, if your roc_auc_score function is nonlinear, you will have additional troubles. I would encourage you to try to formulate the score in a way that is linear, possibly using additional variables (for example, if you want the score to be an absolute value).
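To illustrate the "additional variables" idea for an absolute-value score, here is a minimal sketch that minimizes the sum of |target - (x*array1 + y*array2 + z*array3)| as a linear program. It is not the ROC AUC, and it swaps in scipy.optimize.linprog instead of PuLP so that it only depends on NumPy and SciPy; the same constraints could be written in PuLP. The data, preds, true_w and the auxiliary error variables are all made up for the example.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n = 50
preds = rng.random((n, 3))              # columns stand in for array1, array2, array3
true_w = np.array([0.2, 0.5, 0.3])      # hypothetical weights used to build the target
target = preds @ true_w

# Variables: [x, y, z, e_1, ..., e_n], where e_i bounds |target_i - preds_i . w|.
c = np.concatenate([np.zeros(3), np.ones(n)])      # minimize the sum of e_i

# preds @ w - e <= target   and   -preds @ w - e <= -target
A_ub = np.block([[preds, -np.eye(n)],
                 [-preds, -np.eye(n)]])
b_ub = np.concatenate([target, -target])

# The weights sum to 1, as in the question's constraint.
A_eq = np.concatenate([np.ones(3), np.zeros(n)])[None, :]
b_eq = [1.0]

bounds = [(0, 1)] * 3 + [(0, None)] * n
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

weights = res.x[:3]
print(res.status, weights)
```

Each absolute value is replaced by one extra variable and two inequalities, which keeps the objective and the constraints linear.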
num_samples = 10
def predict(x):
    sampled_models = [guide(None, None) for _ in range(num_samples)]
    yhats = [model(x).data for model in sampled_models]
    mean = torch.mean(torch.stack(yhats), 0)
    return np.argmax(mean.numpy(), axis=1)
print('Prediction when network is forced to predict')
correct = 0
total = 0
for j, data in enumerate(test_loader):
    images, labels = data
    predicted = predict(images.view(-1,28*28))
    total += labels.size(0)
    correct += (predicted == labels).sum().item()
print("accuracy: %d %%" % (100 * correct / total))
Error:
correct += (predicted == labels).sum().item() TypeError:
eq() received an invalid combination of arguments - got (numpy.ndarray), but expected one of:
* (Tensor other)
didn't match because some of the arguments have invalid types: (!numpy.ndarray!)
* (Number other)
didn't match because some of the arguments have invalid types: (!numpy.ndarray!)
*
You are trying to compare predicted and labels. However, predicted is an np.array while labels is a torch.tensor, so eq() (the == operator) cannot compare them.
Replace the np.argmax with torch.argmax:
return torch.argmax(mean, dim=1)
And you should be okay.
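A small self-contained check of that fix, assuming random tensors in place of the sampled models' outputs and of the labels (the batch size and class count are arbitrary):

```python
import torch

torch.manual_seed(0)
num_samples, batch, classes = 10, 4, 3

# Stand-ins for model(x).data from each sampled model.
yhats = [torch.randn(batch, classes) for _ in range(num_samples)]
mean = torch.mean(torch.stack(yhats), 0)

predicted = torch.argmax(mean, dim=1)          # stays a torch tensor
labels = torch.randint(0, classes, (batch,))   # stand-in for a batch of labels

# Tensor-to-tensor comparison, then count the matches as a Python int.
correct = (predicted == labels).sum().item()
print(predicted.shape, correct)
```

Keeping both sides as tensors also means the comparison runs on whatever device the tensors live on.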
I am drawing a histogram of a column from pandas data frame:
%matplotlib notebook
import matplotlib.pyplot as plt
import matplotlib
df.hist(column='column_A', bins = 100)
but got the following errors:
62 raise ValueError(
63 "num must be 1 <= num <= {maxn}, not {num}".format(
---> 64 maxn=rows*cols, num=num))
65 self._subplotspec = GridSpec(rows, cols)[int(num) - 1]
66 # num - 1 for converting from MATLAB to python indexing
ValueError: num must be 1 <= num <= 0, not 1
Does anyone know what this error means? Thanks!
Problem
The problem you encounter arises when column_A does not contain numeric data. As you can see in the excerpt from pandas.plotting._core below, the numeric data is essential to make the function hist_frame (which you call by DataFrame.hist()) work correctly.
def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None,
               xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False,
               sharey=False, figsize=None, layout=None, bins=10, **kwds):
    # skipping part of the code
    # ...
    if column is not None:
        if not isinstance(column, (list, np.ndarray, Index)):
            column = [column]
        data = data[column]
    data = data._get_numeric_data() # there is no numeric data in the column
    naxes = len(data.columns) # so the number of axes becomes 0
    # naxes is passed to the subplot generating function as 0 and later determines the number of columns as 0
    fig, axes = _subplots(naxes=naxes, ax=ax, squeeze=False,
                          sharex=sharex, sharey=sharey, figsize=figsize,
                          layout=layout)
    # skipping the rest of the code
    # ...
Solution
If your problem is to represent numeric data (but not of numeric dtype yet) with a histogram, you need to cast your data to numeric, either with pd.to_numeric or df.astype(a_selected_numeric_dtype), e.g. 'float64', and then proceed with your code.
If your problem is to represent non-numeric data in one column with a histogram, you can call the function hist_series with the following line: df['column_A'].hist(bins=100).
If your problem is to represent non-numeric data in many columns with a histogram, you may resort to a handful of options:
Use matplotlib and create subplots and histograms directly
Update pandas at least to version 0.25
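For the first case above (numeric values stored with a non-numeric dtype), a minimal sketch of the cast could look like this. The DataFrame contents are invented for illustration, and errors="coerce" is one possible choice that turns unparseable entries into NaN.

```python
import pandas as pd

# Illustrative frame: numbers stored as strings, plus one bad entry.
df = pd.DataFrame({"column_A": ["1.5", "2.0", "oops", "4.25"]})

# Before the cast there is no numeric column for DataFrame.hist to plot.
numeric_before = df.select_dtypes("number").shape[1]

# Cast to a numeric dtype; values that cannot be parsed become NaN.
df["column_A"] = pd.to_numeric(df["column_A"], errors="coerce")
numeric_after = df.select_dtypes("number").shape[1]

print(numeric_before, numeric_after, df["column_A"].dtype)
# After this, df.hist(column='column_A', bins=100) has a numeric column to work with.
```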
mta['penn'] = [mta_bystation[mta_bystation.STATION == "34 ST-PENN STA"], 'Penn Station']
mta['grdcntrl'] = [mta_bystation[mta_bystation.STATION == "GRD CNTRL-42 ST"], 'Grand Central']
mta['heraldsq'] = [mta_bystation[mta_bystation.STATION == "34 ST-HERALD SQ"], 'Herald Sq']
mta['23rd'] = [mta_bystation[mta_bystation.STATION == "23 ST"], '23rd St']
#mta['portauth'] = [mta_bystation[mta_bystation.STATION == "42 ST-PORT AUTH"], 'Port Auth']
#mta['unionsq'] = [mta_bystation[mta_bystation.STATION == "14 ST-UNION SQ"], 'Union Sq']
mta['timessq'] = [mta_bystation[mta_bystation.STATION == "TIMES SQ-42 ST"], 'Ti