np.polyfit gives different results on different machines when the input elements differ only slightly - numpy

Given input arr = np.array([0.023456, 0.023456, 0.023456]),
On my local machine:
import numpy as np
arr = np.array([0.023456, 0.023456, 0.023456])
np.polyfit(range(3), arr, 1, full=True)
(array([-4.65475154e-18, 2.34560000e-02]),
array([1.33213988, 0.47476661]),
The output from Colab notebook:
(array([-1.54182823e-18, 2.34560000e-02]),
array([1.33213988, 0.47476661]),
The polynomial coefficients are slightly different.
What causes this? Does it come from differences in floating-point computation between the machines?
How can I ensure that different machines get the same result when my input vector has only one unique element, or elements that differ only slightly, like arr = np.array([0.37304688, 0.37109375, 0.37304688])?
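One hedged way to read the two outputs above: the fitted slope is mathematically zero, and both machines return something near 1e-18, which looks like rounding noise from the least-squares solve (plausibly from different BLAS/LAPACK builds, though that is an assumption the post does not confirm). The sketch below does not make the raw results bit-identical. It only shows comparing with a tolerance, or rounding to a chosen number of decimals, so that such noise is ignored. The 12-decimal cutoff is an arbitrary choice for illustration.

```python
import numpy as np

arr = np.array([0.023456, 0.023456, 0.023456])
coeffs = np.polyfit(range(3), arr, 1)

# Slopes reported by the two machines in the post
slope_local, slope_colab = -4.65475154e-18, -1.54182823e-18

# Compare with an absolute tolerance instead of testing exact equality
same_slope = np.isclose(slope_local, slope_colab, atol=1e-12)

# Or round the coefficients before storing/comparing them;
# adding 0.0 turns a possible -0.0 into 0.0
rounded = np.round(coeffs, 12) + 0.0

print(same_slope, rounded)
```

Whether a tolerance of this size is acceptable depends on the scale of the data, so it would need to be chosen per application.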


How to compare numpy arrays of tuples?

Here's an MWE that illustrates the issue I have:
import numpy as np
arr = np.full((3, 3), -1, dtype="i,i")
doesnt_work = arr == (-1, -1)
n_arr = np.full((3, 3), -1, dtype=int)
works = n_arr == 10
arr is supposed to be an array of tuples, but it doesn't behave as expected.
works is an array of booleans, as expected, but doesnt_work is False. Is there a way to get numpy to do elementwise comparisons on more complex types, or do I have to resort to list comprehension, flatten and reshape?
There's a second problem:
f = arr[(0, 0)] == (-1, -1)
f is False, because arr[(0,0)] is of type numpy.void rather than a tuple. So even if the componentwise comparison worked, it would give the wrong result. Is there a clever numpy way to do this or should I just resort to list comprehension?
Both problems are actually the same problem! Both are related to the custom data type you created when you specified dtype="i,i".
If you run arr.dtype you will get dtype([('f0', '<i4'), ('f1', '<i4')]). That is two signed integers placed in one contiguous block of memory. This is not a Python tuple. So it is clear why the naive comparison fails: (-1, -1) is a Python tuple and is not represented in memory the same way as the NumPy data type.
However if you compare with a_comp = np.array((-1,-1), dtype="i,i") you get the exact behavior you are expecting!
You can read more about how custom (structured) dtypes work in the NumPy docs.
Oh, and to address what np.void is: the name comes from the idea of a void C pointer, which is essentially the address of a contiguous block of memory of unspecified type. Provided you (the programmer) know what is going to be stored in that memory (in this case two back-to-back integers), it's fine as long as you are careful (compare with the same custom data type).
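A minimal sketch of the comparison described in this answer. The array is built with np.zeros plus per-field assignment, which is just one way to get the same contents as the question's np.full call. The names mask and f are illustrative.

```python
import numpy as np

# Same contents as the question's array: every element is the pair (-1, -1)
arr = np.zeros((3, 3), dtype="i,i")
arr["f0"] = -1
arr["f1"] = -1

# Comparison value with the SAME structured dtype
a_comp = np.array((-1, -1), dtype="i,i")

mask = arr == a_comp      # elementwise boolean array, shape (3, 3)
f = arr[0, 0] == a_comp   # single element (numpy.void) against a_comp

print(mask)
print(bool(f))
```

If only one field matters, comparing a field directly (for example arr["f0"] == -1) is another option that avoids the structured comparison altogether.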

Writing SKLearn Regression Coefficients To Pandas Series

I have a regression model that I fit in SKlearn's LinearRegression module:
To extract the coefficients, I used the code:
coefficients = model.coef_
It produced the following array with a shape of (1, 10):
[-4.72307152e-05 1.29731143e-04 8.75483702e-05 -6.28749019e-04
1.75096740e-04 -3.30209379e-06 1.35937650e-03 3.89048429e-11
8.48406857e-03 -1.36499030e-05]
Now, I would like to save the array to a pd.Series. I am taking the following approach:
features = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10"]
model_coefs = pd.Series(coefficients, index=features)
And, the system gives me the following error:
ValueError: Length of passed values is 1, index implies 10.
What I have tried:
Transposing the underlying array, coefficients, to give it a length of 10.
Reshaping the array to give it a shape of (10,1).
But nothing seems to work. I am not sure where I am going wrong.
In your case you want to flatten the array, so .ravel() should do the trick. For example:
pd.Series(np.zeros((1, 10)).ravel(), index=features)
It's strange that your coefficients have shape (1, 10). When I run the base sklearn example (with multiple features), my coefficients are 1-D:
In [27]: regr.coef_
array([ 3.03499549e-01, -2.37639315e+02, 5.10530605e+02, 3.27736980e+02,
-8.14131709e+02, 4.92814588e+02, 1.02848452e+02, 1.84606489e+02,
7.43519617e+02, 7.60951722e+01])
In [28]: regr.coef_.shape
Out[28]: (10,)
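A small sketch of the flattening step, using a stand-in NumPy array rather than a fitted model (the coefficient values are made up for illustration). One possible reason for the (1, 10) shape, offered as an assumption about the asker's setup, is that y was passed as a 2-D column, in which case LinearRegression stores coef_ with shape (n_targets, n_features).

```python
import numpy as np
import pandas as pd

features = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10"]

# Stand-in for model.coef_ with shape (1, 10); values are placeholders
coefficients = np.arange(10, dtype=float).reshape(1, 10)

# Flatten to 1-D so the length matches the 10-element index
model_coefs = pd.Series(coefficients.ravel(), index=features)

print(model_coefs)
```

Indexing the single row with coefficients[0] would give the same 1-D values. Transposing or reshaping to (10, 1) does not help because the array stays 2-D.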

How to write into numpy arrays in shared memory with Ray?

I am attempting to rewrite Python multiprocessing code using Ray since it appears to be able to abstract shared memory management issues and perform parallel computation faster than straight multiprocessing (based on this article). My goal is to process all timeseries for a lat/lon grid (with both input and output arrays having shape [lat, lon, time]) in parallel without unnecessary copies of the input/output arrays. The idea is to have both input and output arrays in shared memory and multiple processes will read and write into the shared memory arrays so no copies/serialization are needed for each process to access the arrays being worked upon.
My use case is that I have a CPU-heavy function that I want to apply on all 1-D sub-arrays of a 3-D array. I have managed to make this execute much faster using a home-rolled approach for shared memory objects with multiprocessing, but the code is much more convoluted/complicated than I'm comfortable with and I'm hoping to simplify it using ray. However I've not yet worked out how to write into shared memory with ray, and without that, I don't see how this can be done. Hopefully someone reading this can suggest a solution.
I have a Jupyter notebook with a simple example of what I've tried to get this to work using ray.
Here's the gist:
I initialize my environment for ray and create a function that performs a simple operation on a 1-D slice of a 3-D array and writes the result into an output array, with the expectation that this function can run in parallel and read/write on shared memory representations of my input/output arrays:
import psutil
import numpy as np
import ray
num_cpus = psutil.cpu_count(logical=False)
ray.init(num_cpus=num_cpus, ignore_reinit_error=True)
@ray.remote
def add_average_ray(
    in_ary: np.ndarray,
    out_ary: np.ndarray,
    lat_index: int,
    lon_index: int,
):
    ary = in_ary[lat_index, lon_index]
    out_ary[lat_index, lon_index] = ary + np.mean(ary)
Next, I create a function that will loop over a 3-D grid of values and apply the above function to each in parallel using ray:
def compute_with_ray(
    input_array: np.ndarray,
) -> np.ndarray:
    # create an output array into which computed values will be written
    output_array = np.full(shape=input_array.shape, fill_value=np.nan)
    # put the input and output arrays into ray's object store
    in_array_id = ray.put(input_array)
    out_array_id = ray.put(output_array)
    # make a list of futures, one per lat/lon (assuming shape (lat, lon, time))
    futures = []
    for lat_index in range(input_array.shape[0]):
        for lon_index in range(input_array.shape[1]):
            futures.append(add_average_ray.remote(in_array_id, out_array_id, lat_index, lon_index))
    # wait for the remote tasks, which run in parallel, to finish
    ray.get(futures)
    return output_array
Next I make an input array and exercise the code:
# create an array that can be used to represent a 2x2 cell lat/lon map with 3 times
tst_ary = np.array([[[1, 6, 5], [3, 2, 7]], [[8, 4., 6.], [9, 4, 2]]])
# exercise the ray remote function in parallel
average_added_ray = compute_with_ray(tst_ary)
Apparently this is not possible since the arrays that have been added into ray's object store are read-only, and it results in an error:
RayTaskError(ValueError): ray_worker (pid=5260, host=skypilot)
File "<ipython-input-5-0bf9c2bf3f2e>", line 9, in add_average_ray
ValueError: assignment destination is read-only
Is there a better way to approach/accomplish this parallel processing on numpy arrays using ray?
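The error message above is NumPy's standard one for writing into an array whose writeable flag is off. The sketch below reproduces it without Ray, using a locally frozen copy as a stand-in for an array fetched from a read-only store, and then shows one possible workaround pattern: have the per-cell function return its result and let the caller assemble the output. The function add_average and the frozen-copy setup are illustrative assumptions, not Ray API. With Ray, the same idea would mean each task returns its slice instead of writing into out_ary.

```python
import numpy as np

tst_ary = np.array([[[1, 6, 5], [3, 2, 7]], [[8, 4., 6.], [9, 4, 2]]])

# Stand-in for an array obtained from a read-only object store
read_only = tst_ary.copy()
read_only.flags.writeable = False

message = None
try:
    read_only[0, 0] = 0.0  # in-place write, as in add_average_ray
except ValueError as exc:
    message = str(exc)
print(message)

def add_average(in_ary, lat_index, lon_index):
    # Read from the (read-only) input and RETURN the result
    ary = in_ary[lat_index, lon_index]
    return ary + np.mean(ary)

# The caller owns a writable output array and fills it from returned values
output = np.full(read_only.shape, np.nan)
for lat in range(read_only.shape[0]):
    for lon in range(read_only.shape[1]):
        output[lat, lon] = add_average(read_only, lat, lon)
print(output)
```

This pattern gives up writing into shared memory from the workers, so each result is copied back to the caller once. Whether that cost is acceptable depends on the size of the output.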

NumPy vectorization with integration

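I have a vector ws and wish to make another vector of the same length whose k-th component is the integral over [-1, 1] of f(x, ws[k]) * log(sum over j of f(x, ws[j])), with f as defined in the code below.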
The question is: how can we vectorize this for speed? NumPy vectorize() is actually a for loop, so it doesn't count.
Veedrac pointed out that "There is no way to apply a pure Python function to every element of a NumPy array without calling it that many times". Since I'm using NumPy functions rather than "pure Python" ones, I suppose it's possible to vectorize, but I don't know how.
import numpy as np
from scipy.integrate import quad
ws = 2 * np.random.random(10) - 1
n = len(ws)
integrals = np.empty(n)
def f(x, w):
    if w < 0: return np.abs(x * w)
    else: return np.exp(x) * w
def temp(x): return np.array([f(x, w) for w in ws]).sum()
def integrand(x, w): return f(x, w) * np.log(temp(x))
## Python for loop
for k in range(n):
    integrals[k] = quad(integrand, -1, 1, args = ws[k])[0]
## NumPy vectorize
integrals = np.vectorize(quad)(integrand, -1, 1, args = ws)[0]
On a side note, is a Cython for loop always faster than NumPy vectorization?
The function quad executes an adaptive algorithm, which means the computations it performs depend on the specific thing being integrated. This cannot be vectorized in principle.
In your case, a for loop of length 10 is a non-issue. If the program takes long, it's because integration takes long, not because you have a for loop.
When you absolutely need to vectorize integration (not in the example above), use a non-adaptive method, with the understanding that precision may suffer. These can be directly applied to a 2D NumPy array obtained by evaluating all of your functions on some regularly spaced 1D array (a linspace). You'll have to choose the linspace yourself since the methods aren't adaptive.
numpy.trapz (named numpy.trapezoid in NumPy 2.0 and later) is the simplest and least precise.
scipy.integrate.simps (named scipy.integrate.simpson in current SciPy) is equally easy to use and more precise (Simpson's rule requires an odd number of samples, but the method works around having an even number, too).
scipy.integrate.romb is in principle of higher accuracy than Simpson (for smooth data), but it requires the number of samples to be 2**n+1 for some integer n.
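A minimal sketch of the non-adaptive approach listed above, applied to the question's integrand. The fixed ws values, the 2001-point grid, and the names fx and integrand_on_grid are choices made for this illustration. The function f is rewritten with np.where so it can be evaluated on the whole grid at once.

```python
import numpy as np
from scipy.integrate import simpson

# Fixed weights for reproducibility (the question uses random ones)
ws = np.linspace(-0.9, 0.9, 10)

# Regularly spaced samples; an odd count suits Simpson's rule,
# and x = 0 (where abs has a kink) falls on a panel boundary
x = np.linspace(-1, 1, 2001)

def f(x, w):
    # x has shape (n_x,), w has shape (n_w,); result has shape (n_w, n_x)
    w = w[:, None]
    return np.where(w < 0, np.abs(x * w), np.exp(x) * w)

fx = f(x, ws)
integrand_on_grid = fx * np.log(fx.sum(axis=0))

# One call integrates all 10 rows along the last axis
integrals = simpson(integrand_on_grid, x=x, axis=-1)
print(integrals)
```

The accuracy here depends entirely on the chosen grid, which is the trade-off against quad mentioned above.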
@zaq's answer focusing on quad is spot on, so I'll look at some other aspects of the problem.
In recent answers I have argued that vectorize is of most value when you need to apply the full broadcasting mechanism to a function that only takes scalar values. Your quad qualifies as taking scalar inputs. But you are only iterating over one array, ws. The x that is passed on to your functions is generated by quad itself. quad and integrand are still Python functions, even if they use numpy operations.
Cython improves low-level iteration, the stuff that it can convert to C code. Your primary iteration is at a high level, calling an imported function, quad. Cython can't touch or rewrite that.
You might be able to speed up integrand (and on down) with cython, but first focus on getting the most speed from that with regular numpy code.
def f(x, w):
    if w < 0: return np.abs(x * w)
    else: return np.exp(x) * w
Because of the test if w < 0, w must be a scalar. Can f be written so that it works with an array w? If so, then
np.array([f(x, w) for w in ws]).sum()
could be rewritten as
fn(x, ws).sum()
Alternatively, since both x and w are scalar, you might get a bit of speed improvement by using math.exp etc instead of np.exp. Same for log and abs.
I'd try to write f(x,w) so it takes arrays for both x and w, returning a 2d result. If so, then temp and integrand would also work with arrays. Since quad feeds a scalar x, that may not help here, but with other integrators it could make a big difference.
If f(x,w) can be evaluated on a regular nx10 grid of x=np.linspace(-1,1,n) and ws, then an integral (of sorts) just requires a couple of summations over that space.
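One way the array-aware version suggested above might look, as a sketch: np.where replaces the scalar if/else, so the same function accepts a scalar or an array for w. The name fn follows the answer's placeholder. The fixed ws values and the float() wrapper (to hand quad a plain Python float) are choices made for this illustration.

```python
import numpy as np
from scipy.integrate import quad

ws = np.linspace(-0.9, 0.9, 10)  # fixed weights for reproducibility

def f(x, w):
    # Original scalar-only version, kept for comparison
    if w < 0: return np.abs(x * w)
    else: return np.exp(x) * w

def fn(x, w):
    # Works for scalar or array w: both branches are computed, np.where selects
    w = np.asarray(w)
    return np.where(w < 0, np.abs(x * w), np.exp(x) * w)

def temp(x):
    return fn(x, ws).sum()  # replaces the list comprehension over ws

def integrand(x, w):
    return float(fn(x, w) * np.log(temp(x)))

# quad is still called once per weight; only the inner sum is vectorized
integrals = np.array([quad(integrand, -1, 1, args=(w,))[0] for w in ws])
print(integrals)
```

Note that np.where evaluates both branches for every element, which is harmless here because abs and exp are defined everywhere.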
You can use quadpy for fully vectorized computation. You'll have to adapt your function to allow for vector inputs first, but that is done rather easily:
import numpy as np
import quadpy
ws = 2 * np.random.random(10) - 1
def f(x):
    out = np.empty((len(ws), *x.shape))
    out0 = np.abs(np.multiply.outer(ws, x))
    out1 = np.multiply.outer(ws, np.exp(x))
    out[ws < 0] = out0[ws < 0]
    out[ws >= 0] = out1[ws >= 0]
    return out
def integrand(x):
    return f(x) * np.log(np.sum(f(x), axis=0))
val, err = quadpy.quad(integrand, -1, +1, epsabs=1.0e-10)
[0.3266534 1.44001826 0.68767868 0.30035222 0.18011948 0.97630376
0.14724906 2.62169217 3.10276876 0.27499376]

SciPy UnivariateSpline Specifying Axis?

Using scipy.interpolate.interp1d it is possible to pass in a (1080, 4) ndarray and compute an interpolation function for each 'row' in a single command:
spline = interp1d(np.arange(1,5), np.random.random((1080,4)), kind='cubic')
I am getting slightly different interpolation results (off the knots) than some existing Fortran code. I believe this is because the SciPy source is using a b-spline and the Fortran code is using splines derived from numerical recipes.
I am attempting to perform the same interpolation using UnivariateSpline with s=0, which is equivalent to InterpolatedUnivariateSpline.
I am able to get this working if I pass the data row by row, i.e. using an iterator to step over all 1080 rows - this is highly inefficient.
spline = UnivariateSpline(np.arange(1,5).reshape(-1,1), np.random.random((1080,4)), s=0, k=3)
I am seeing:
failed in converting 2nd argument `y' of dfitpack.fpcurf0 to C/Fortran array
I believe this is an issue with getting the multi-dimensional array into FITPACK. Any insight into how to avoid an iterator? Additionally, is there a SciPy interpolation function that matches the one described in Numerical Recipes (section 3.3, p. 120)? You have to type the page number; I cannot link to it directly because it is a Flash viewer.
In older versions of SciPy (I observed it in 0.14) the splines returned by interp1d were of relatively poor quality. In versions 0.19 and later, interp1d is consistent with other spline routines, and since it accepts vector inputs, I think that answers the question. Here is a comparison of three spline constructors; the latter two take only one row as input.
import numpy as np
from scipy.interpolate import interp1d, UnivariateSpline, splrep, splev
x = np.arange(1, 5)
y = np.random.normal(size=(1080, 4))
spl1 = interp1d(x, y, kind='cubic')
spl2 = UnivariateSpline(x, y[123, :], s=0, k=3)
spl3 = splrep(x, y[123, :], s=0, k=3)
t = 2.345
print(spl1(t)[123], spl2(t), splev(t, spl3))
This prints (with my random numbers)
-0.333973049011 -0.333973049011 -0.333973049011
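Another vectorized option, sketched here under assumptions, is scipy.interpolate.CubicSpline, which takes an axis argument and a choice of boundary condition. The guess that the Fortran code uses natural boundary conditions (zero second derivative at both ends, which the Numerical Recipes routine can be asked for) is not confirmed by the question. If it holds, bc_type='natural' may be the closer match. The variable names are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

rng = np.random.default_rng(0)
x = np.arange(1, 5)
y = rng.normal(size=(1080, 4))

# All 1080 rows at once; interpolation runs along axis 1
not_a_knot = CubicSpline(x, y, axis=1)                 # default boundary condition
natural = CubicSpline(x, y, axis=1, bc_type="natural") # assumed Fortran behaviour
reference = interp1d(x, y, kind="cubic")

t = 2.345
# The default CubicSpline should agree with interp1d's cubic spline
print(np.allclose(not_a_knot(t), reference(t)))
# The natural spline hits the same knots but differs in between
print(np.abs(natural(t) - reference(t)).max())
```

Comparing the natural-spline values against the Fortran output at a few off-knot points would show whether the boundary condition explains the discrepancy.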