PyPlot throws an error when DataFrame-Column has missing values - matplotlib

I have the following problem:
I would like to plot a variable from a DataFrame with missing values, which are denoted as "NA". However, if I just go ahead and plot it with PyPlot
x = df[df[:country] .== "Belgium",:year]
y = df[df[:country] .== "Belgium",:hpNormLog]
plot(x, y, "b-", linewidth=2)
I get the following error message:
PyError (:PyObject_Call) <class 'TypeError'> TypeError("float() argument must be a string or a number, not 'PyCall.jlwrap'",)
File "C:\Anaconda3\lib\site-packages\matplotlib\pyplot.py", line 3154, in plot
ret = ax.plot(*args, **kwargs)
File "C:\Anaconda3\lib\site-packages\matplotlib\__init__.py", line 1811, in inner
return func(ax, *args, **kwargs)
File "C:\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 1425, in plot
self.add_line(line)
File "C:\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 1708, in add_line
self._update_line_limits(line)
File "C:\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 1730, in _update_line_limits
path = line.get_path()
File "C:\Anaconda3\lib\site-packages\matplotlib\lines.py", line 925, in get_path
self.recache()
File "C:\Anaconda3\lib\site-packages\matplotlib\lines.py", line 621, in recache
y = np.asarray(yconv, np.float_)
File "C:\Anaconda3\lib\site-packages\numpy\core\numeri...
I would be very grateful for a solution or a workaround.
Best,
Ilja

I found the following solution. I am not deep enough into how Julia works, so I can only say what works and what does not. Arrays containing NaN can be plotted with the code above, but DataFrame columns containing missing values cannot. The column needs to be converted to an Array before it can be plotted. The following code solves the problem:
x = df[df[:country] .== "Belgium",:year]
ytest = df[df[:country] .== "Belgium",:hpNormLog]
y = convert(Array,ytest,NaN)
plot(x, y, "b-", linewidth=2)
x does not contain missing values and therefore I can keep using the DataFrame, but y does contain missing values, so it needs to be converted to an Array. The third argument of convert specifies to what missing values should be converted, in this case to NaN.

Why don't you perform error handling?
try
    plot(x, y, "b-", linewidth=2)
catch err
    # ignore the plotting error
end
This suppresses the error when the call works most of the time for your input, but skips plotting whenever "NA" values are present.

Related

train_test_split errors with two csv files

I am working with 2 csv files and I want to compare values from both using the train_test_split function.
My code is the following:
X = np.append(y1[:100])
X_train, X_test, y_train, y_test = train_test_split(X, y1)
I know that X and y1 are not of the same length and I was trying to fix this error:
ValueError: Found input variables with inconsistent numbers of samples: [4840242, 44898]
However, with the first line I am currently getting this error:
File "<array_function internals>", line 179, in append
TypeError: _append_dispatcher() missing 1 required positional argument: 'values'
How would I be able to fix this?
You are using the numpy append function incorrectly. It expects two arguments: the array to append to and the values to append. You are passing only one of them. If the first 100 entries in y1 are supposed to be your X, simply writing X = y1[:100] will suffice.
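As a rough illustration of the two-argument form described above, here is a minimal sketch. The array `y1` below is a small hypothetical stand-in for the real data, not the asker's CSV contents.

```python
import numpy as np

# Hypothetical stand-in for the real y1 data.
y1 = np.arange(10)

# np.append needs both the array to extend and the values to add.
extended = np.append(y1[:3], [100, 200])

# Taking the first entries needs only a slice, no append at all.
X = y1[:3]

print(extended)
print(X)
```

Note that train_test_split still requires X and the target to have the same number of samples, so the slice used for X has to match the length of whatever is passed as the second argument.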

Inconsistencies between latest numpy and scikit-learn versions?

I just upgraded my versions of numpy and scikit-learn to the latest versions, i.e. numpy-1.16.3 and sklearn-0.21.0 (for Python 3.7). A lot is crashing, e.g. a simple PCA on a numeric matrix will not work anymore. For instance, consider this toy matrix:
Xt
Out[3561]:
matrix([[-0.98200559, 0.80514289, 0.02461868, -1.74564111],
[ 2.3069239 , 1.79912014, 1.47062378, 2.52407335],
[-0.70465054, -1.95163302, -0.67250316, -0.56615338],
[-0.75764211, -1.03073475, 0.98067997, -2.24648769],
[-0.2751523 , -0.46869694, 1.7917171 , -3.31407694],
[-1.52269241, 0.05986123, -1.40287416, 2.57148354],
[ 1.38349325, -1.30947483, 0.90442436, 2.52055143],
[-0.4717785 , -1.46032344, -1.50331841, 3.58598692],
[-0.03124986, -3.52378987, 1.22626145, 1.50521572],
[-1.01453403, -3.3211243 , -0.00752532, 0.56538522]])
Then run PCA on it:
import sklearn.decomposition as skd
est2 = skd.PCA(n_components=4)
est2.fit(Xt)
This fails:
Traceback (most recent call last):
File "<ipython-input-3563-1c97b7d5474f>", line 2, in <module>
est2.fit(Xt)
File "/home/sven/anaconda3/lib/python3.7/site-packages/sklearn/decomposition/pca.py", line 341, in fit
self._fit(X)
File "/home/sven/anaconda3/lib/python3.7/site-packages/sklearn/decomposition/pca.py", line 407, in _fit
return self._fit_full(X, n_components)
File "/home/sven/anaconda3/lib/python3.7/site-packages/sklearn/decomposition/pca.py", line 446, in _fit_full
total_var = explained_variance_.sum()
File "/home/sven/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 36, in _sum
return umr_sum(a, axis, dtype, out, keepdims, initial)
TypeError: float() argument must be a string or a number, not '_NoValueType'
My impression is that numpy has been restructured at a very fundamental level, including single column matrix referencing, such that functions such as np.sum, np.sqrt etc don't behave as they did in older versions.
Does anyone know what the path forward with numpy is and what exactly is going on here?
At this point fit has run scipy.linalg.svd on your Xt, and is looking at the singular values S.
self.mean_ = np.mean(X, axis=0)
X -= self.mean_
U, S, V = linalg.svd(X, full_matrices=False)
# flip eigenvectors' sign to enforce deterministic output
U, V = svd_flip(U, V)
components_ = V
# Get variance explained by singular values
explained_variance_ = (S ** 2) / (n_samples - 1)
total_var = explained_variance_.sum()
In my working case:
In [175]: est2.explained_variance_
Out[175]: array([6.12529695, 3.20400543, 1.86208619, 0.11453425])
In [176]: est2.explained_variance_.sum()
Out[176]: 11.305922832602981
The np.sum documentation explains that, as of v1.15, it takes an initial parameter (ref. ufunc.reduce), and the default is initial=np._NoValue
In [178]: np._NoValue
Out[178]: <no value>
In [179]: type(np._NoValue)
Out[179]: numpy._globals._NoValueType
So that explains, in part, the _NoValueType reference in the error.
What's your scipy version?
In [180]: import scipy
In [181]: scipy.__version__
Out[181]: '1.2.1'
I wonder if your scipy.linalg.svd is returning an S array that is an 'old' ndarray and doesn't fully implement this initial parameter. I can't explain why that could happen, but I can't otherwise explain why the array sum is having problems with np._NoValue.
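For reference, the initial parameter discussed above can be seen in isolation with a small sketch. The values are made up for illustration, and this does not reproduce the failure itself.

```python
import numpy as np

values = np.array([6.0, 3.0, 2.0])

# Without `initial`, the reduction uses NumPy's default "no value" sentinel.
plain_total = values.sum()

# With `initial`, the reduction starts from the given value.
offset_total = values.sum(initial=10.0)

print(plain_total, offset_total)
```

If a sum call like this works in a fresh session but the PCA call still fails, that would point toward the installation rather than the data, though this is only a guess.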

DataFrame.apply(func, raw=True) doesn't seem to take effect?

I am trying to hash together only a few columns of my dataframe df so I do
temp = df[['field1', 'field2']]
df["hash"] = temp.apply(lambda x: hash(x), raw=True, axis=1)
I set raw to true because the doc (I am using 0.22) says it will pass a numpy array instead of a mutable Series but even with raw=True I am getting a Series, why?
File "/nix/store/9ampki9dbq0imhhm7i27qkh56788cjpz-python3.6-pandas-0.22.0/lib/python3.6/site-packages/pandas/core/frame.py", line 4877, in apply
ignore_failures=ignore_failures)
File "/nix/store/9ampki9dbq0imhhm7i27qkh56788cjpz-python3.6-pandas-0.22.0/lib/python3.6/site-packages/pandas/core/frame.py", line 4973, in _apply_standard
results[i] = func(v)
File "/home/teto/mptcpanalyzer/mptcpanalyzer/data.py", line 190, in _hash_row
return hash(x)
File "/nix/store/9ampki9dbq0imhhm7i27qkh56788cjpz-python3.6-pandas-0.22.0/lib/python3.6/site-packages/pandas/core/generic.py", line 1045, in __hash__
' hashed'.format(self.__class__.__name__))
TypeError: ("'Series' objects are mutable, thus they cannot be hashed", 'occurred at index 1')
It's strange, as I can't reproduce your exact error (that is, for me, raw=True does result in an np.ndarray being passed). In any case, neither a Series nor an np.ndarray is hashable. The following works, though:
temp.apply(lambda x: hash(tuple(x)), axis=1)
A tuple is hashable.
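A small sketch of the difference, using a plain NumPy array as a stand-in for one row (the values are hypothetical):

```python
import numpy as np

# Hypothetical row, as it would be passed to the function with raw=True.
row = np.array([1, 2, 3])

# An ndarray is mutable and does not support hashing.
try:
    hash(row)
    array_hashable = True
except TypeError:
    array_hashable = False

# Converting the row to an immutable tuple makes it hashable.
row_hash = hash(tuple(row))

print(array_hashable, row_hash)
```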

Tuple indices must be integers not tuple, matplot

I'm trying to code a program that will integrate a function in different ways (Euler, Runge...) and using the built-in function scipy.integrate.odeint.
Everything works and I'm getting the right results, but I also need to create a graph with the results and that's when everything goes wrong.
For the odeint function I can't draw the graph.
Here is my code and the ERROR, I hope someone will be able to help me.
def odeint(phi, t0tf, Y0, N):
    T6=numpy.zeros((N+1))
    T6[0]=t0tf[0]
    h=(t0tf[1]-t0tf[0])/N
    for i in range (N):
        T6[i+1]=T6[i]+h
    def f(t,x):
        return phi(x,t)
    Y6 = scipy.integrate.odeint(f,Y0,T6, full_output=True)
    return Y6
Y6 = edo.odeint(phi, t0tf, Y0, N)
T6Y6 = numpy.hstack([Y6])
print("Solutions Scipy :")
print()
print(T6Y6)
print()
mpl.figure("Courbes")
mpl.plot(Y6[0:N,0],Y6[0:N,1],color="yellow",label="Isoda")
mpl.show()
And the error is :
mpl.plot(Y6[0:N,0],Y6[0:N,1],color="yellow",label="Isoda")
TypeError: tuple indices must be integers, not tuple
Thanks in advance (PS: I'm french so my sentences might be kinda shaky)
Y6 seems to be a tuple that you are indexing incorrectly. It's difficult to point out exactly what is wrong since you didn't provide the data, but the following example shows how to index elements of a tuple:
y = ((1,2,3,4,5),)
print('This one works: ',y[0][1:])
print(y[1:,0])
The result is this:
This one works: (2, 3, 4, 5)
Traceback (most recent call last):
File "E:\armatita\stackoverflow\test.py", line 9, in <module>
print(y[1:,0])
TypeError: tuple indices must be integers, not tuple
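One likely source of the tuple in the question is full_output=True, which makes scipy.integrate.odeint return a (solution, info dictionary) pair instead of just the solution array. Here is a minimal sketch under that assumption. The function `decay` and the time grid are hypothetical and are not the asker's phi or T6.

```python
import numpy as np
from scipy.integrate import odeint

def decay(y, t):
    # Hypothetical right-hand side: two independent exponential decays.
    return [-y[0], -2.0 * y[1]]

t = np.linspace(0.0, 1.0, 11)
y0 = [1.0, 1.0]

# With full_output=True the result is a (solution, info_dict) tuple.
result = odeint(decay, y0, t, full_output=True)
solution, info = result

# Slice the solution array, not the tuple that wraps it.
first_component = solution[:, 0]
second_component = solution[:, 1]

print(type(result), solution.shape)
```

With the unpacked array, a call such as mpl.plot(solution[:, 0], solution[:, 1]) indexes an ndarray rather than a tuple. Dropping full_output=True would also return the array directly.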

Using spy() in Julia

I am trying to use spy(), but I can't get it to work. I think my error has something to do with this: https://github.com/JuliaLang/julia/issues/2121
I have a 300x300 Array{Float64,2}
using PyPlot
pygui(true)
spy(I) # where I is my 300x300 array
and it gives me this error:
LoadError: PyError (:PyObject_Call) <type 'exceptions.TypeError'>
TypeError("object of type 'PyCall.jlwrap' has no len()",)
File "/home/ashley/.julia/v0.4/Conda/deps/usr/lib/python2.7/site-packages/matplotlib/pyplot.py", line 3154, in plot
ret = ax.plot(*args, **kwargs)
File "/home/ashley/.julia/v0.4/Conda/deps/usr/lib/python2.7/site-packages/mpl_toolkits/mplot3d/axes3d.py", line 1539, in plot
zs = np.ones(len(xs)) * zs
I have tried specifying spy(I, zs=zeros(size(I))) but then I just get the error:
LoadError: ArgumentError: function spy does not accept keyword arguments
while loading In[260], in expression starting on line 13
Any ideas?
spy shows the non-zero elements. Apparently it doesn't show anything if there are no non-zero elements.
M = sprand(300, 300, 0.1) # generate a sparse matrix with density 0.1 of non-zeros
M = full(M)
spy(M)
works for me.