Writing SKLearn Regression Coefficients To Pandas Series - pandas

I have a regression model that I fit with scikit-learn's LinearRegression.
To extract the coefficients, I used this code:
coefficients = model.coef_
It produced the following array with a shape of (1, 10):
[-4.72307152e-05 1.29731143e-04 8.75483702e-05 -6.28749019e-04
1.75096740e-04 -3.30209379e-06 1.35937650e-03 3.89048429e-11
8.48406857e-03 -1.36499030e-05]
Now, I would like to save the array to a pd.Series. I am taking the following approach:
features = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10"]
model_coefs = pd.Series(coefficients, index=features)
And, the system gives me the following error:
ValueError: Length of passed values is 1, index implies 10.
What I have tried:
Transposing the underlying array, coefficients, to give it a length of 10.
Reshaping the array to give it a shape of (10,1).
But nothing seems to work. I am not sure where I am going wrong.

In your case you want to flatten the array, so .ravel() should do the trick. For example:
pd.Series(np.zeros((1, 10)).ravel(), index=features)
It's strange that your coefficients have shape (1, 10). When I run the basic sklearn example (with multiple features), my coefficients are 1-D:
In [27]: regr.coef_
Out[27]:
array([ 3.03499549e-01, -2.37639315e+02, 5.10530605e+02, 3.27736980e+02,
-8.14131709e+02, 4.92814588e+02, 1.02848452e+02, 1.84606489e+02,
7.43519617e+02, 7.60951722e+01])
In [28]: regr.coef_.shape
Out[28]: (10,)
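A minimal sketch of the flattening step, using a made-up (1, 10) array as a stand-in for model.coef_ (the values are illustrative, not the asker's coefficients). One likely explanation for the extra dimension is that scikit-learn returns coef_ with shape (n_targets, n_features) when y is passed as a 2-D array, and with shape (n_features,) when y is 1-D; whether that is what happened here is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in for model.coef_ with the shape reported in the question.
coefficients = np.arange(10, dtype=float).reshape(1, 10)
features = [f"f{i}" for i in range(1, 11)]

# pd.Series expects 1-D data, so collapse the (1, 10) array to (10,).
flat = coefficients.ravel()
model_coefs = pd.Series(flat, index=features)
```

Transposing or reshaping to (10, 1) does not help because the data is still 2-D; only a 1-D array (or a single row such as coefficients[0]) matches the 10-entry index.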

Related

Efficient axis-wise cartesian product of multiple 2D matrices with Numpy or TensorFlow

So first off, I think what I'm trying to achieve is some sort of Cartesian product but elementwise, across the columns only.
What I'm trying to do is, if you have multiple 2D arrays of size [ (N,D1), (N,D2), (N,D3)...(N,Dn) ]
The result is thus to be a combinatorial product across axis=1 such that the final result will then be of shape (N, D) where D=D1*D2*D3*...Dn
e.g.
A = np.array([[1,2],
[3,4]])
B = np.array([[10,20,30],
[5,6,7]])
cartesian_product( [A,B], axis=1 )
>> np.array([[ 1*10, 1*20, 1*30, 2*10, 2*20, 2*30 ]
[ 3*5, 3*6, 3*7, 4*5, 4*6, 4*7 ]])
and extendable to cartesian_product([A,B,C,D...], axis=1)
e.g.
A = np.array([[1,2],
[3,4]])
B = np.array([[10,20],
[5,6]])
C = np.array([[50, 0],
[60, 8]])
cartesian_product( [A,B,C], axis=1 )
>> np.array([[ 1*10*50, 1*10*0, 1*20*50, 1*20*0, 2*10*50, 2*10*0, 2*20*50, 2*20*0]
[ 3*5*60, 3*5*8, 3*6*60, 3*6*8, 4*5*60, 4*5*8, 4*6*60, 4*6*8]])
I have a working solution that creates an empty (N,D) matrix and then, inside nested for loops over each matrix in the provided list, fills it with broadcast column-wise products. Clearly this is horrible once the arrays get larger!
Is there an existing solution within numpy or tensorflow for this? Ideally one that is efficiently parallelizable. (A tensorflow solution would be wonderful, but a numpy one is OK; as long as the vector logic is clear, it shouldn't be hard to make a tf equivalent.)
I'm not sure if I need to use einsum, tensordot, meshgrid or some combination thereof to achieve this. I have a solution from https://stackoverflow.com/a/11146645/2123721, but it only works for one-dimensional vectors, even though it claims to work for arrays of arbitrary dimensions (which appears to mean vectors). With that one I can do a .prod(axis=1), but again this is only valid for vectors.
thanks!
Here's one approach that works iteratively, accumulating the result pair by pair over the list of arrays: extend the dimensions of each pair, then rely on broadcasting for the elementwise multiplications:
L = [A,B,C] # list of arrays
n = L[0].shape[0]
out = (L[1][:,None]*L[0][:,:,None]).reshape(n,-1)
for i in L[2:]:
    out = (i[:,None]*out[:,:,None]).reshape(n,-1)
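The same accumulation can be wrapped in a small function and checked against the examples from the question. This is only a sketch; the name columnwise_product is made up here and is not a NumPy function.

```python
import numpy as np

def columnwise_product(arrays):
    """Row-wise Cartesian product of the columns of several (N, Dk) arrays."""
    arrays = [np.asarray(a) for a in arrays]
    n = arrays[0].shape[0]
    out = arrays[0]
    for arr in arrays[1:]:
        # (N, D_acc, 1) * (N, 1, D_k) broadcasts to (N, D_acc, D_k),
        # which is then flattened back to (N, D_acc * D_k).
        out = (out[:, :, None] * arr[:, None, :]).reshape(n, -1)
    return out

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20, 30], [5, 6, 7]])
result_ab = columnwise_product([A, B])

B2 = np.array([[10, 20], [5, 6]])
C = np.array([[50, 0], [60, 8]])
result_abc = columnwise_product([A, B2, C])
```

The intermediate array has N * D1 * ... * Dk elements, so memory use grows with the product of the column counts, just like the final result.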

column_stack returns non-contiguous array

I am having a problem in my code with non-contiguous arrays.
In particular I get the following warning message:
C:\Program Files\Anaconda2\lib\site-packages\skimage\util\shape.py:247: RuntimeWarning: Cannot provide views on a non-contiguous input array without copying.
warn(RuntimeWarning("Cannot provide views on a non-contiguous input "
I am using np.column_stack
import numpy as np
x = np.array([1,2,3,4])
y = np.array([5,6,7,8])
stack = np.column_stack((x,y))
stack.flags.f_contiguous
Out[2]: False
but I get a non-contiguous array.
Do you know how I can get a contiguous array? Should I always use ascontiguousarray after column_stack?
np.stack([x, y]) is not F-contiguous (the flag you checked). However, np.stack([x, y]).T is.
np.stack([x, y]) # Transpose of what you want and not F-contiguous
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
Instead:
stack = np.stack([x, y]).T
In [276]: xy=np.column_stack((x,y))
In [277]: np.info(xy)
class: ndarray
shape: (4, 2)
strides: (8, 4)
itemsize: 4
aligned: True
contiguous: True
fortran: False
data pointer: 0xa836ec0
byteorder: little
byteswap: False
type: int32
The skimage code (https://github.com/scikit-image/scikit-image/blob/master/skimage/util/shape.py) contains this check:
# -- build rolling window view
if not arr_in.flags.contiguous:
    warn(RuntimeWarning("Cannot provide views on a non-contiguous input "
                        "array without copying."))
arr_in = np.ascontiguousarray(arr_in)
The column_stack result passes that test:
In [278]: xy.flags.contiguous
Out[278]: True
In [279]: xy.T.flags.contiguous
Out[279]: False
Normally constructed 2-D arrays are C-contiguous, which is what flags.contiguous reports, whereas their transpose is F-contiguous. The warning says that np.ascontiguousarray will produce a copy. For very large arrays that could be a problem.
If this warning comes up often you could either suppress it, or routinely use ascontiguousarray before calling this function.
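The two contiguity flags can be compared side by side on the arrays from the question. A small sketch using only NumPy; the variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])

stack = np.column_stack((x, y))      # shape (4, 2)
c_flag = stack.flags.c_contiguous    # row-major layout, the one skimage tests
f_flag = stack.flags.f_contiguous    # column-major layout, the one the question tested

transposed = stack.T                 # a view with swapped strides
fixed = np.ascontiguousarray(transposed)  # copies only when the input is not C-contiguous
```

Here stack is C-contiguous but not F-contiguous, and the transpose is the reverse, so checking f_contiguous on the column_stack result does not say whether the skimage warning would fire.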

Trying to reshape a numpy array from 4x4 to 2x2

I have a simple numpy array, left_lines, of shape (4, 4), where each row is [x1,y1,x2,y2]. I am trying to reshape it to 2x2 and take the mean of the X's and Y's. I use this code:
mean_left = np.mean(left_lines.reshape(2,2),axis=0)
But I get this error:
total size of new array must be unchanged
print(type(left_lines))
Gives:
<type 'numpy.ndarray'>
Not sure what is wrong with the reshape syntax!?
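For what it is worth, the error comes from the element count: a (4, 4) array holds 16 values, and reshape(2, 2) only has room for 4. A possible sketch, assuming the goal is one mean X and one mean Y over all endpoints (the sample values below are made up):

```python
import numpy as np

# Made-up stand-in for left_lines: four rows of [x1, y1, x2, y2].
left_lines = np.array([[0, 1, 2, 3],
                       [4, 5, 6, 7],
                       [8, 9, 10, 11],
                       [12, 13, 14, 15]])

try:
    left_lines.reshape(2, 2)         # 16 elements cannot fit into 2 * 2 = 4
    reshape_failed = False
except ValueError:
    reshape_failed = True

# Let NumPy infer the row count: every row becomes one (x, y) point.
points = left_lines.reshape(-1, 2)   # shape (8, 2)
mean_left = points.mean(axis=0)      # [mean of all x, mean of all y]
```

If the means of the first and second endpoints should stay separate, left_lines.mean(axis=0) gives the four column means [x1, y1, x2, y2] instead.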

numpy argmax not getting all values from generator expression

The output of the following
import numpy as np
print(np.argmax([i for i in range(0, 10)]))
print(np.argmax(i for i in range(0, 10)))
is
9
0
Why does argmax reduce the generator expression only once?
Compare these two expressions:
In [682]: np.asarray([i for i in range(3)])
Out[682]: array([0, 1, 2])
In [683]: np.asarray(i for i in range(3))
Out[683]: array(<generator object <genexpr> at 0xb367bb9c>, dtype=object)
asarray (or array) applied to a list produces an array with numbers. The same thing applied to the generator produces a dtype=object array with 1 item, the generator itself. In fact its shape is () (0d). You can recover this generator with np.array(i for i in range(3))[()]
fromiter can iterate a generator, but array only iterates on things like lists and tuples.
In [688]: np.fromiter((i for i in range(3)),int)
Out[688]: array([0, 1, 2])
And argmax depends on its input being an array.
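A short sketch of the two usual workarounds, which both turn the generator into real array data before argmax sees it:

```python
import numpy as np

# Option 1: let fromiter consume the generator into a 1-D integer array.
from_gen = np.fromiter((i for i in range(0, 10)), dtype=int)
idx_fromiter = np.argmax(from_gen)

# Option 2: materialise the generator as a list first.
idx_list = np.argmax(list(i for i in range(0, 10)))
```

Both indices are 9, matching the list-comprehension result from the question.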
As I don't have the required reputation to comment, I am adding this as an answer.
As suggested by @hpaulj, np.argmax calls the asarray function in numeric.py. The developers mention this in its docstring:
def asarray(a, dtype=None, order=None):
    """Convert the input to an array.
    Parameters
    ----------
    a : array_like
        Input data, in any form that can be converted to an array. This
        includes lists, lists of tuples, tuples, tuples of tuples, tuples
        of lists and ndarrays.
    ...
Hence your a doesn't match the requirements. As for why zero is returned for any input: the return value comes from this line
result = getattr(asarray(obj), method)(*args, **kwds)
in fromnumeric.py, which is called first. For a generator object the code can't resolve the method the way it does for numeric data, so this function might return 0 by default.
This is what my research on the question turned up.

Why does MinMaxScaler add lines to image?

I want to normalize the pixel values of an image to the range [0, 1] for each channel (R, G, B).
Minimal Example
#!/usr/bin/env python
import numpy as np
import scipy
from sklearn import preprocessing
original = scipy.misc.imread('Crocodylus-johnsoni-3.jpg')
scipy.misc.imshow(original)
transformed = np.zeros(original.shape, dtype=np.float64)
scaler = preprocessing.MinMaxScaler()
for channel in range(3):
    transformed[:, :, channel] = scaler.fit_transform(original[:, :, channel])
scipy.misc.imsave("transformed.jpg", transformed)
What happens
Taking https://commons.wikimedia.org/wiki/File:Crocodylus-johnsoni-3.jpg,
I get the following "normalized" result:
As you can see there are lines from top to bottom at the right side. What happened there? It seems to me that the normalization went wrong. If so: How do I fix it?
In scikit-learn, a two-dimensional array with shape (m, n) is usually interpreted as a collection of m samples, with each sample having n features.
MinMaxScaler.fit_transform() transforms each feature, so each column of your array is transformed independently of the others. That results in the vertical "stripes" in the image.
It looks like you intended to scale each color channel independently. To do that using MinMaxScaler, reshape the input so that each channel becomes one column. That is, if the original image has shape (m, n, 3), reshape it to (m*n, 3) before passing it to the fit_transform() method, and then restore the shape of the result to create the transformed array.
For example,
ascolumns = original.reshape(-1, 3)
t = scaler.fit_transform(ascolumns)
transformed = t.reshape(original.shape)
With this, transformed looks like this:
The image looks exactly like the original, because it turns out that in the array original, the minimum and maximum are 0 and 255, respectively, in each channel:
In [41]: original.min(axis=(0, 1))
Out[41]: array([0, 0, 0], dtype=uint8)
In [42]: original.max(axis=(0, 1))
Out[42]: array([255, 255, 255], dtype=uint8)
So all fit_transform does in this case is transform all the input values to the floating point range [0.0, 1.0] uniformly. If the minimum or maximum was different in one of the channels, the transformed image would look different.
By the way, it is not difficult to perform the transform using pure numpy. (I'm using Python 3, so in the following, the division automatically casts the result to floating point. If you are using Python 2, you'll need to convert one of the arguments to floating point, or use from __future__ import division.)
In [58]: omin = original.min(axis=(0, 1), keepdims=True)
In [59]: omax = original.max(axis=(0, 1), keepdims=True)
In [60]: xformed = (original - omin)/(omax - omin)
In [61]: np.allclose(xformed, transformed)
Out[61]: True
(One potential problem with that method is that it will generate an error if one of the channels is constant, because then one of the values in omax - omin will be 0.)
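One way to sidestep the constant-channel problem is to replace a zero range with 1 before dividing. The sketch below uses a small random array in place of the crocodile image; the function name minmax_per_channel and the choice to map a constant channel to all zeros are assumptions made for illustration.

```python
import numpy as np

def minmax_per_channel(image):
    """Scale each channel of an (m, n, channels) image to [0, 1] independently."""
    image = np.asarray(image, dtype=np.float64)
    cmin = image.min(axis=(0, 1), keepdims=True)
    cmax = image.max(axis=(0, 1), keepdims=True)
    span = cmax - cmin
    # A constant channel has span 0; divide by 1 instead so it maps to 0.0.
    safe_span = np.where(span == 0, 1.0, span)
    return (image - cmin) / safe_span

rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(4, 5, 3)).astype(np.uint8)
demo[:, :, 2] = 7                    # force one constant channel
scaled = minmax_per_channel(demo)
```

Converting to float64 up front also avoids relying on Python 3 division semantics for the uint8 input.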