Enforcing compatibility between numpy 1.8 and 1.9 nansum? - numpy

I have code that needs to behave identically independent of numpy version, but the underlying np.nansum function has changed behavior such that np.nansum([np.nan,np.nan]) is 0.0 in 1.9 and NaN in 1.8. The <=1.8 behavior is the one I would prefer, but the more important thing is that my code be robust against the numpy version.
The tricky thing is, the code applies an arbitrary numpy function (generally, a np.nan[something] function) to an ndarray. Is there any way to force the new or old numpy nan[something] functions to conform to the old or new behavior shy of monkeypatching them?
A possible solution I can think of is something like outarr[np.allnan(inarr, axis=axis)] = np.nan, but there is no np.allnan function - if this is the best solution, is the best implementation np.all(np.isnan(arr), axis=axis) (which would require only supporting np>=1.7, but that's probably OK)?

In Numpy 1.8, nansum was defined as:
a, mask = _replace_nan(a, 0)
if mask is None:
    return np.sum(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
mask = np.all(mask, axis=axis, keepdims=keepdims)
tot = np.sum(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
if np.any(mask):
    tot = _copyto(tot, np.nan, mask)
    warnings.warn("In Numpy 1.9 the sum along empty slices will be zero.",
                  FutureWarning)
return tot
In Numpy 1.9, it is:
a, mask = _replace_nan(a, 0)
return np.sum(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
I don't think there is a way to make the new nansum behave the old way, but given that the original nansum code isn't that long, can you just include a copy of that code (without the warning) if you care about preserving the <=1.8 behavior?
Note that _copyto can be imported from numpy.lib.nanfunctions.
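If monkeypatching is off the table, another option along the lines of the np.all(np.isnan(...)) idea above is to wrap nansum and overwrite the all-NaN slices yourself. A minimal sketch, assuming you only need the axis argument and can tolerate a float result:
import numpy as np

def nansum_compat(a, axis=None):
    # Version-independent nansum keeping the <=1.8 convention:
    # all-NaN slices yield NaN instead of 0.
    a = np.asarray(a)
    tot = np.nansum(a, axis=axis)              # 0.0 for all-NaN slices on numpy >= 1.9
    all_nan = np.all(np.isnan(a), axis=axis)   # True where every element was NaN
    # np.where upcasts integer sums to float so NaN can be inserted
    return np.where(all_nan, np.nan, tot)

nansum_compat([np.nan, np.nan])   # nan on both 1.8 and 1.9 (1.8 still emits its FutureWarning)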

Related

Range of Beta in Kaiser Window

Why is the biggest value for Beta in the Kaiser Window function 709? I get errors when I use values of 710 or more (in Matlab and NumPy).
Seems like a stupid question but I'm still not able to find an answer...
Thanks in advance!
[Image: the Kaiser window with a length of M = 64 and Beta = 710.]
For large x, i0(x) is approximately exp(x), and exp(x) overflows at x near 709.79.
If you are programming in Python and you don't mind the dependency on SciPy, you can use the function scipy.special.i0e to implement the function in a way that avoids the overflow:
In [47]: from scipy.special import i0e
In [48]: def i0_ratio(x, y):
...: return i0e(x)*np.exp(x - y)/i0e(y)
...:
Verify that the function returns the same value as np.i0(x)/np.i0(y):
In [49]: np.i0(3)/np.i0(4)
Out[49]: 0.4318550956673735
In [50]: i0_ratio(3, 4)
Out[50]: 0.43185509566737346
An example where the naive implementation overflows, but i0_ratio does not:
In [51]: np.i0(650)/np.i0(720)
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/lib/function_base.py:3359: RuntimeWarning: overflow encountered in exp
return exp(x) * _chbevl(32.0/x - 2.0, _i0B) / sqrt(x)
Out[51]: 0.0
In [52]: i0_ratio(650, 720)
Out[52]: 4.184118428217628e-31
In Matlab, to get the same scaled Bessel function as i0e(x), you can use besseli(0, x, 1).
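If the goal is to actually evaluate a Kaiser window with a very large beta, the same trick applies: keep everything in terms of the scaled Bessel function so exp never sees a huge argument. A minimal sketch of the textbook window formula rewritten with i0e (this is not NumPy's implementation, just the identity I0(a)/I0(b) = i0e(a)/i0e(b) * exp(a - b) applied to the window definition):
import numpy as np
from scipy.special import i0e

def kaiser_large_beta(M, beta):
    # Kaiser window: w[n] = I0(beta * sqrt(1 - x**2)) / I0(beta), with x in [-1, 1].
    # Rewritten with the scaled Bessel function so nothing overflows (assumes M > 1).
    n = np.arange(M)
    x = 2.0 * n / (M - 1) - 1.0
    a = beta * np.sqrt(1.0 - x * x)
    return i0e(a) / i0e(beta) * np.exp(a - beta)   # a <= beta, so the exp argument is <= 0

w = kaiser_large_beta(64, 710.0)   # finite values, whereas np.kaiser(64, 710) overflows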

Inconsistencies between latest numpy and scikit-learn versions?

I just upgraded my versions of numpy and scikit-learn to the latest versions, i.e. numpy-1.16.3 and sklearn-0.21.0 (for Python 3.7). A lot is crashing, e.g. a simple PCA on a numeric matrix will not work anymore. For instance, consider this toy matrix:
Xt
Out[3561]:
matrix([[-0.98200559, 0.80514289, 0.02461868, -1.74564111],
[ 2.3069239 , 1.79912014, 1.47062378, 2.52407335],
[-0.70465054, -1.95163302, -0.67250316, -0.56615338],
[-0.75764211, -1.03073475, 0.98067997, -2.24648769],
[-0.2751523 , -0.46869694, 1.7917171 , -3.31407694],
[-1.52269241, 0.05986123, -1.40287416, 2.57148354],
[ 1.38349325, -1.30947483, 0.90442436, 2.52055143],
[-0.4717785 , -1.46032344, -1.50331841, 3.58598692],
[-0.03124986, -3.52378987, 1.22626145, 1.50521572],
[-1.01453403, -3.3211243 , -0.00752532, 0.56538522]])
Then run PCA on it:
import sklearn.decomposition as skd
est2 = skd.PCA(n_components=4)
est2.fit(Xt)
This fails:
Traceback (most recent call last):
File "<ipython-input-3563-1c97b7d5474f>", line 2, in <module>
est2.fit(Xt)
File "/home/sven/anaconda3/lib/python3.7/site-packages/sklearn/decomposition/pca.py", line 341, in fit
self._fit(X)
File "/home/sven/anaconda3/lib/python3.7/site-packages/sklearn/decomposition/pca.py", line 407, in _fit
return self._fit_full(X, n_components)
File "/home/sven/anaconda3/lib/python3.7/site-packages/sklearn/decomposition/pca.py", line 446, in _fit_full
total_var = explained_variance_.sum()
File "/home/sven/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 36, in _sum
return umr_sum(a, axis, dtype, out, keepdims, initial)
TypeError: float() argument must be a string or a number, not '_NoValueType'
My impression is that numpy has been restructured at a very fundamental level, including single-column matrix referencing, such that functions like np.sum, np.sqrt, etc. don't behave as they did in older versions.
Does anyone know what the path forward with numpy is and what exactly is going on here?
At this point your code's fit has run scipy.linalg.svd on your Xt, and is looking at the singular values S.
self.mean_ = np.mean(X, axis=0)
X -= self.mean_
U, S, V = linalg.svd(X, full_matrices=False)
# flip eigenvectors' sign to enforce deterministic output
U, V = svd_flip(U, V)
components_ = V
# Get variance explained by singular values
explained_variance_ = (S ** 2) / (n_samples - 1)
total_var = explained_variance_.sum()
In my working case:
In [175]: est2.explained_variance_
Out[175]: array([6.12529695, 3.20400543, 1.86208619, 0.11453425])
In [176]: est2.explained_variance_.sum()
Out[176]: 11.305922832602981
The np.sum documentation explains that, as of v1.15, it takes an initial parameter (cf. ufunc.reduce), and the default is initial=np._NoValue:
In [178]: np._NoValue
Out[178]: <no value>
In [179]: type(np._NoValue)
Out[179]: numpy._globals._NoValueType
So that explains, in part, the _NoValueType reference in the error.
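For reference, this is all the initial keyword does on a numpy that supports it (1.15 or later); np._NoValue is just the sentinel meaning "no initial value was passed":
import numpy as np

a = np.array([1.0, 2.0, 3.0])
np.sum(a)                  # 6.0
np.sum(a, initial=10.0)    # 16.0 -- the reduction starts from the given initial value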
What's your scipy version?
In [180]: import scipy
In [181]: scipy.__version__
Out[181]: '1.2.1'
I wonder if your scipy.linalg.svd is returning an S array that is an 'old' ndarray and doesn't fully implement this initial parameter. I can't explain why that would happen, but I can't otherwise explain why the array sum is having problems with np._NoValue.

reshape is deprecated issue when I pick series from pandas Dataframe

When I try to take one series from a dataframe, I get this warning:
anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py:52: FutureWarning: reshape is deprecated and will raise in a subsequent release. Please use .values.reshape(...) instead
return getattr(obj, method)(*args, **kwds)
This is the code snippet:
for idx, categories in enumerate(categorical_columns):
    ax = plt.subplot(3, 3, idx + 1)
    ax.set_xlabel(categories[0])
    box = [df[df[categories[0]] == atype].price for atype in categories[1]]
    ax.boxplot(box)
To avoid chained indexing, use DataFrame.loc:
box = [df.loc[df[categories[0]] == atype, 'price'] for atype in categories[1]]
And to remove the FutureWarning, it is necessary to upgrade pandas together with matplotlib.
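If upgrading is not an option, the warning can usually be avoided by handing matplotlib plain numpy arrays instead of pandas Series, as the message itself hints; a sketch using the same df and categorical_columns as above:
box = [df.loc[df[categories[0]] == atype, 'price'].values
       for atype in categories[1]]
ax.boxplot(box)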

Declaring theano variables for pymc3

I am having issues replicating a pymc2 code using pymc3.
I believe it is due to the fact that pymc3 uses theano-typed variables, which are not compatible with the numpy operations I am using, so I am using the @theano.compile.ops.as_op decorator:
I have this function:
with pymc3.Model() as model:
    z_stars = pymc3.Uniform('z_star', self.z_min_ssp_limit, self.z_max_ssp_limit)
    Av_stars = pymc3.Uniform('Av_star', 0.0, 5.00)
    sigma_stars = pymc3.Uniform('sigma_star', 0.0, 5.0)

    # Fit observational wavelength
    ssp_fit_output = self.ssp_fit_theano(z_stars, Av_stars, sigma_stars,
                                         self.obj_data['obs_wave_resam'],
                                         self.obj_data['obs_flux_norm_masked'],
                                         self.obj_data['basesWave_resam'],
                                         self.obj_data['bases_flux_norm'],
                                         self.obj_data['int_mask'],
                                         self.obj_data['normFlux_obs'])

    # Define likelihood
    like = pymc3.Normal('ChiSq', mu=ssp_fit_output,
                        sd=self.obj_data['obs_fluxEr_norm'],
                        observed=self.obj_data['obs_fluxEr_norm'])

    # Run the sampler
    trace = pymc3.sample(iterations, step=step, start=start_conditions, trace=db)
where:
@theano.compile.ops.as_op(itypes=[t.dscalar, t.dscalar, t.dscalar, t.dvector,
                                  t.dvector, t.dvector, t.dvector, t.dvector,
                                  t.dscalar],
                          otypes=[t.dvector])
def ssp_fit_theano(self, input_z, input_sigma, input_Av, obs_wave, obs_flux_masked,
                   rest_wave, bases_flux, int_mask, obsFlux_mean):
    ...
    ...
The first three variables are scalars (from the pymc3 uniform distributions). The remaining variables are numpy arrays, and the last one is a float. However, I am getting this "'numpy.ndarray' object has no attribute 'type'" error:
File "/home/user/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 615, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/user/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 963, in make_node
if not all(inp.type == it for inp, it in zip(inputs, self.itypes)):
File "/home/user/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 963, in <genexpr>
if not all(inp.type == it for inp, it in zip(inputs, self.itypes)):
AttributeError: 'numpy.ndarray' object has no attribute 'type'
Please any advice in the right direction will be most welcomed.
I had a bunch of time-wasting stops when I went from pymc2 to pymc3. The problem, I think, is that the documentation is quite poor; I suspect they neglect it while the code is still evolving. Three comments/pieces of advice:
I hope you can find some help on using '@theano.compile.ops.as_op' here: failure to adapt pymc2 into pymc3, or here: how to fit a method belonging to an instance with pymc3? (see also the sketch after these notes).
The drawback of '@theano.compile.ops.as_op' is that you implicitly exclude any analysis related to the gradient of your function. To have access to the gradient, I think you need to define your function in the more involved way presented here: how to fit a method belonging to an instance with pymc3?
Warning: for the moment, using theano seems to be a source of problems if you want to distribute your code under Windows. See build a .exe for Windows from a python 3 script importing theano with pyinstaller, but I am not sure whether it is just personal clumsiness or really a problem. Personally, I had to give up theano to be able to distribute my code...
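For what it's worth, here is a minimal sketch of how @theano.compile.ops.as_op is usually applied: it decorates a free function (not a bound method, so no self in the signature), and each entry in itypes must correspond to one argument passed at call time. The function and variable names below are hypothetical, not the questioner's model:
import numpy as np
import theano
import theano.tensor as t

@theano.compile.ops.as_op(itypes=[t.dscalar, t.dvector], otypes=[t.dvector])
def scale_flux(norm, flux):
    # Inside the Op we are back in plain numpy land; the return value
    # must match the declared otypes (here a float64 vector).
    return np.asarray(flux / norm, dtype=np.float64)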

Why the difference between octave's prctile and numpy's percentile?

I've been rewriting a matlab/octave program into numpy and ran across a difference in some resultant values.
This occurs with both the percentile/prctile and the standard-deviation functions.
In Numpy:
import matplotlib.mlab as ml
import numpy
>>> t = numpy.linspace(0,100, 100)
>>> numpy.percentile(t,95)
95.0
>>> numpy.std(t)
29.157646512850626
>>> ml.prctile(t,95)
95.000000000000014
In Octave:
octave:1> t = linspace(0,100,100)';
octave:2> prctile(t,95)
ans = 95.454545
octave:3> std(t)
ans = 29.304537
Although the array values of 't' are the same, the results are more different than I would suspect.
In the numpy help(numpy.std) they specifically mention that the algorithm is:
std = sqrt(mean(abs(x - x.mean())**2))
So I implemented that in octave and got the exact answer numpy gives. So it seems the std-deviation function differs.
But why/how? And which is correct? (if there is such a thing)
And even prctile/percentile?
Just in case since I'm in Linux aptosid...
GNU Octave, version 3.6.2
numpy.__version__ '1.6.2rc1'
Numpy simply uses a different algorithm when the percentile lies between two data points. Octave, Matlab and R always center it exactly between two points when needed (I believe); numpy does a bit more than that. If you check http://en.wikipedia.org/wiki/Percentile you will see there are a couple of ways to calculate percentiles.
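On newer numpy versions (the method keyword was added to percentile in 1.22) you can reproduce Octave's convention directly: Octave's prctile appears to place the sample points at (k - 0.5)/n, which corresponds to numpy's 'hazen' method. A quick check, assuming numpy >= 1.22:
import numpy as np

t = np.linspace(0, 100, 100)
np.percentile(t, 95)                   # 95.0       (default 'linear' method)
np.percentile(t, 95, method='hazen')   # 95.4545... (matches Octave's prctile above)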
It seems like Octave assumes ddof=1, at least by default, and numpy uses 0 by default:
>>> numpy.std(t, ddof=0)
29.157646512850633
>>> numpy.std(t, ddof=1)
29.304537349375785
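Spelled out, the two defaults differ only in the divisor; a quick illustration of the formulas (not the library internals):
import numpy as np

t = np.linspace(0, 100, 100)
n = t.size

np.sqrt(np.sum((t - t.mean())**2) / n)        # 29.1576...  population form, numpy's ddof=0
np.sqrt(np.sum((t - t.mean())**2) / (n - 1))  # 29.3045...  sample form, Octave's default (ddof=1)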