Numpy Array Shape Issue - numpy

I have initialized this empty 2d np.array
inputs = np.empty((300, 2), int)
And I am attempting to append a 2d row to it as such
inputs = np.append(inputs, np.array([1,2]), axis=0)
But I'm getting
ValueError: all the input arrays must have same number of dimensions
And Numpy thinks it's a 2 row 0 dimensional object (transpose of 2d)
np.array([1, 2]).shape
(2,)
Where have I gone wrong?

To add a row to a (300,2) shape array, you need a (1,2) shape array. Note the matching 2nd dimension.
np.array([[1,2]]) works. So does np.array([1,2])[None, :] and np.atleast_2d([1,2]).
I encourage the use of np.concatenate. It forces you to think more carefully about the dimensions.
Do you really want to start with np.empty? Look at its values. They are uninitialized (whatever happened to be in memory), so they look random and are often large.
@Divakar suggests np.row_stack. That puzzled me a bit, until I checked and found that it is just another name for np.vstack. That function passes all inputs through np.atleast_2d before doing np.concatenate. So ultimately it is the same solution: turn the (2,) array into a (1,2) array.
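A minimal sketch of the options above, assuming a zero-filled (300, 2) starting array in place of np.empty (the variable names are illustrative only):

```python
import numpy as np

inputs = np.zeros((300, 2), dtype=int)   # zeros instead of uninitialized memory
row = np.array([1, 2])                   # shape (2,), one-dimensional

# Three equivalent ways to get a (1, 2) row.
row_a = np.array([[1, 2]])
row_b = row[None, :]
row_c = np.atleast_2d(row)

# concatenate needs both inputs to be 2D with a matching second dimension.
result = np.concatenate((inputs, row_b), axis=0)

# vstack applies atleast_2d itself, so the 1D row is accepted directly.
stacked = np.vstack((inputs, row))

print(result.shape, stacked.shape)
```

If many rows are added in a loop, collecting them in a Python list and stacking once at the end is usually cheaper, because every concatenate call copies the whole array.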

NumPy needs double brackets in the array literal to create a 2D row (single brackets give a 1D array), so
np.array([1,2])
needs to be
np.array([[1,2]])

If you intend to append that as the last row into inputs, you can just simply use np.row_stack -
np.row_stack((inputs,np.array([1,2])))
Please note this np.array([1,2]) is a 1D array.
You can even pass it a 2D row version for the same result -
np.row_stack((inputs,np.array([[1,2]])))

Related

assign certain entries of Tensor, like set_subtensor of Theano

Can I just assign values to certain entries in a tensor? I ran into this problem when computing the cross-correlation matrix of an NxP feature matrix feats, where N is the number of observations and P is the dimension. Some columns are constant, so their standard deviation is zero, and I don't want to divide by std for those constant columns. Here is what I did:
fmean, fvar = tf.nn.moments(feats, axes = [0], keep_dims = False)
fstd = tf.sqrt(fvar)
feats = feats - fmean
sel = (fstd != 0)
feats[:, sel] = feats[:, sel]/ fstd[sel]
corr = tf.matmul(tf.transpose(feats), feats)
However, I got this error: TypeError: 'Tensor' object does not support item assignment. Is there any workaround for such issue?
You can make your feats a tf.Variable and use tf.scatter_update to update locations selectively.
It's a bit awkward in that scatter_update needs a list of linear indices to update, so you'd need to convert your [:, sel] implicit 2D specification into an explicit list of 1D indices. There's an example of constructing 1D indices from 2D here
There's some work in simplifying this kind of use-case in issue #206
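The index conversion described above can be illustrated in plain NumPy. This is only a sketch of the arithmetic, not TensorFlow code; the sizes, the mask sel, and the divisor are made-up values chosen for the example:

```python
import numpy as np

N, P = 4, 3
feats = np.arange(N * P, dtype=float).reshape(N, P)
sel = np.array([True, False, True])      # hypothetical mask of columns to update

cols = np.flatnonzero(sel)               # selected column numbers
rows = np.arange(N)

# In row-major order, element (r, c) of an (N, P) array sits at r * P + c.
linear_idx = (rows[:, None] * P + cols[None, :]).ravel()

# Update only those positions in the flattened copy, then restore the shape.
flat = feats.ravel().copy()
flat[linear_idx] = flat[linear_idx] / 2.0
updated = flat.reshape(N, P)

print(linear_idx)
```

A list like linear_idx is the kind of explicit 1D specification the answer refers to; np.ravel_multi_index computes the same mapping for arbitrary index pairs.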

how to switch a tensor in theano to numpy.array

I have a tensor T (shape: 300) and an array A (shape: 300). What I want to do is combine them into a new array [T, A] with shape (600). I tried the solutions below:
1. Combine them directly with np.concatenate((T, A)). The result is: zero-dimensional arrays cannot be concatenated
2. Convert T to a numpy.array with a = np.array(T). But when I print a.shape, it is (), with nothing in the brackets.
Besides, when I print T.shape and A.shape, T.shape is ([300]) and A.shape is (300,). What is the difference?
To get a numpy.array from a tensor T, you can call T.eval(); I found this after a lot of trial and error. But I haven't yet found a way to convert a numpy.array into a tensor T. Can anyone help?
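One likely explanation for the () shape, sketched with a stand-in class: np.array wraps any object it cannot interpret as a sequence in a zero-dimensional object array, and concatenate then rejects it. Symbolic below is hypothetical and only mimics an object with no visible elements; T_values stands in for the array that evaluating the tensor would return.

```python
import numpy as np

class Symbolic:
    """Hypothetical stand-in for a symbolic tensor; holds no numeric data."""

A = np.arange(300)

wrapped = np.array(Symbolic())   # NumPy sees one opaque object, not 300 values
print(wrapped.shape, wrapped.dtype)

try:
    np.concatenate((wrapped, A))
    failed = False
except ValueError as err:
    failed = True
    print(err)

# After evaluation there is an ordinary array, which concatenates normally.
T_values = np.ones(300)          # assumed result of evaluating the tensor
combined = np.concatenate((T_values, A))
print(combined.shape)
```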

Should a pandas dataframe column be converted in some way before passing it to a scikit learn regressor?

I have a pandas dataframe and passing df[list_of_columns] as X and df[[single_column]] as Y to a Random Forest regressor.
What does the following warning mean and what should be done to resolve it?
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
probas = cfr.fit(trainset_X, trainset_Y).predict(testset_X)
Simply check the shape of your Y variable, it should be a one-dimensional object, and you are probably passing something with more (possibly trivial) dimensions. Reshape it to the form of list/1d array.
You can use df.single_column.values or df['single_column'].values to get the underlying numpy array of your series (which, in this case, should also have the correct 1D-shape as mentioned by lejlot).
Actually the warning tells you exactly what is the problem:
You pass a 2d array that happens to have shape (X, 1), but the method expects a 1d array of shape (X, ).
Moreover the warning tells you what to do to transform to the form you need: y.values.ravel().
Using Y = df[[single_column]].values.ravel() solves the DataConversionWarning for me.
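A small sketch of the shapes involved, using a made-up three-row DataFrame (the column names are placeholders, not from the question):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "target": [0.1, 0.2, 0.3]})

y_2d = df[["target"]]                    # DataFrame of shape (3, 1): the column-vector case
y_1d = df["target"].values               # 1D array of shape (3,)
y_ravel = df[["target"]].values.ravel()  # flattens the (3, 1) array to (3,)

print(y_2d.shape, y_1d.shape, y_ravel.shape)
```

Either of the last two forms gives the (n_samples, ) shape that the warning asks for.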

numpy concatenate of empty with non empty arrays yields in float

I just found that concatenating an empty array with a non-empty array yields a one-element array containing the value of the non-empty array, but converted to float.
for example:
import numpy as np
np.concatenate([[1], [1]])
array([1, 1])
but
np.concatenate([[], [1]])
array([1.])
this works the same with np.hstack
By default, the empty array in the code
np.concatenate([[], [1]])
is initialized with dtype=float, and concatenate casts the second int array to float.
Now, it's worth asking if it ever happens that you use concatenate on empty arrays. Clearly, you never write code like
a = np.array([1, 2, 3])  # int array
b = np.concatenate([[], a])
One scenario where it may happen is the following:
a = np.array([1, 2, 3])  # int array
b = np.concatenate((a[:j], a))  # usually j != 0 here
Then for some reason the code is run with j=0. It is true that a[:0] is empty, but it still retains dtype=int, so the result of concatenate is an integer array anyway, as you expected.
So I would say that yes, your example shows somewhat unexpected behaviour at first sight, but it's quite harmless.
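The dtype behaviour described above can be checked directly. A short sketch (variable names are illustrative):

```python
import numpy as np

# An empty list carries no type information, so NumPy falls back to float64.
print(np.array([]).dtype)

mixed = np.concatenate([[], [1]])        # the int is promoted to float
print(mixed, mixed.dtype)

a = np.array([1, 2, 3])                  # int array
empty_slice = a[:0]                      # empty, but still an int array
same = np.concatenate((empty_slice, a))  # stays integer
print(same, same.dtype)

# An explicitly typed empty array also keeps the result integer.
typed = np.concatenate([np.array([], dtype=int), [1]])
print(typed, typed.dtype)
```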

Iterating over multidimensional arrays(images) with numpy array - python

Hi!
I have two images (same dimensions) as numpy arrays, imgA and imgB.
I would like to iterate over each row and column and compute something like this:
for i in range(0, h-1):
    for j in range(0, w-1):
        final[i][j] = imgA[i,j] - imgB[i-k[i],j]
where h and w are the height and the width of the image and k is an array with dimension [h*w].
i have seen this topic:
Iterating over a numpy array
but it doesn't work with images; I get the error: too many values to unpack
Is there any way to do that with numpy and python 2.7?
thanks
edit
I try to explain better myself.
I have 2 images in LAB color space.
these images are (288,384,3).
Now I would like to compute deltaE, so I could do it like this (splitting the 2 arrays):
imgLabL=np.dsplit(imgL,3)
imgLabR=np.dsplit(imgR,3)
imgLl=imgLabL[0]
imgLa=imgLabL[1]
imgLb=imgLabL[2]
imgRl=imgLabR[0]
imgRa=imgLabR[1]
imgRb=imgLabR[2]
delta=np.sqrt(((imgLl-imgRl)**2) + ((imgLa - imgRa)**2) + ((imgLb - imgRb)**2) )
Till now everything is fine.
But now I have this array k of size (288,384).
So now I need a new delta with a displacement along the x axis: for the pixel at imgRl(0,0), I want to use the pixel at imgLl(0+k,0).
Does that make my problem clearer?
I'm pretty sure that whatever it is you are trying to do can be vectorized and run without any loops in it. But the way your code is written, it is no surprise that it doesn't work...
If k is an array of shape (h, w), then k[i] is an array of shape (w,). When you do i-k[i], numpy will do its broadcasting magic, and you will get an array of shape (w,). So you are indexing imgB with an array of shape (w,) and a single integer. Because one of the items in the indexing is an array, fancy indexing kicks in. So assuming imgB also has shape (h, w, 1), the return value of imgB[i-k[i], j] will not be an array of shape (1,), but an array of shape (w, 1). When you then try to subtract that from imgA[i, j], which is an array of shape (1,), broadcasting magic works again, and so you get an array of shape (w, 1).
We do not know what final is. But if it is an array of shape (h, w, 1), like imgA and imgB, then final[i][j] is an array of shape (1,), and you are trying to assign to it an array of shape (w, 1), which does not fit. Hence the "operand requires a reduction, but reduction is not enabled" error message.
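The shape reasoning above can be traced on small arrays. This sketch assumes images of shape (h, w, 1) and an integer k of shape (h, w), as the answer does; the sizes and values are made up, and the exact error text depends on the NumPy version:

```python
import numpy as np

h, w = 4, 5
imgA = np.zeros((h, w, 1))
imgB = np.zeros((h, w, 1))
final = np.zeros((h, w, 1))
k = np.ones((h, w), dtype=int)

i, j = 2, 3
row_index = i - k[i]          # broadcasting: scalar minus (w,) gives (w,)
picked = imgB[row_index, j]   # fancy indexing: (w,) index gives shape (w, 1)
diff = imgA[i, j] - picked    # (1,) minus (w, 1) broadcasts to (w, 1)

print(row_index.shape, picked.shape, diff.shape)

try:
    final[i][j] = diff        # target has shape (1,), value has shape (w, 1)
    assigned = True
except ValueError as err:
    assigned = False
    print(err)
```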
EDIT
You don't really need to split your arrays to compute DeltaE...
def deltaE(a, b):
    return np.sqrt(((a - b)**2).sum(axis=-1))
delta = deltaE(imgLabL, imgLabR)
I still don't understand what you want to do in the second case... If you want to compare the two images displaced along the x-axis, I would suggest using np.roll:
deltaE(imgLabL, np.roll(imgLabR, k, axis=0))
will have at position (r, c) the deltaE between the pixel (r, c) of imgLabL and the pixel (r - k, c) of imgLabR. Is that what you want?
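A sketch of both variants on random test data. The first part follows the np.roll suggestion with a scalar shift. The second part is an assumption about what the question asks for, namely a separate shift k[r, c] for every pixel, done with fancy indexing; both wrap around at the image border, and all names and sizes are illustrative.

```python
import numpy as np

def deltaE(a, b):
    """Euclidean distance over the last (channel) axis."""
    return np.sqrt(((a - b) ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
h, w = 6, 8
imgL = rng.random((h, w, 3))
imgR = rng.random((h, w, 3))

# Scalar shift: position (r, c) compares imgL[r, c] with imgR[r - shift, c].
shift = 2
delta_roll = deltaE(imgL, np.roll(imgR, shift, axis=0))

# Per-pixel shift (assumed intent): compare imgL[r, c] with imgR[r - k[r, c], c].
k = rng.integers(0, 3, size=(h, w))
rows = (np.arange(h)[:, None] - k) % h          # shape (h, w), wraps at the border
cols = np.broadcast_to(np.arange(w), (h, w))    # column index of every pixel
delta_pixel = deltaE(imgL, imgR[rows, cols])

print(delta_roll.shape, delta_pixel.shape)
```

With a constant k the per-pixel version reduces to the np.roll result, which is a convenient sanity check.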
I usually use numpy.nditer, the docs for which are here and have many examples. Briefly:
import numpy as np
a = np.ones([4,4])
it = np.nditer(a)
for elem in it:
    pass  # do stuff with elem
You can also use C-style iteration, i.e.
while not it.finished:
    # do stuff with it[0]
    it.iternext()
This form is useful if you need to access the indices of your arrays. In your situation, I would stack your two images together to create an array of shape (2, h, w) and then iterate over this, filling an empty array with the results of the computation.
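A minimal sketch of C-style iteration with index access, using nditer's multi_index flag on a small made-up array. The computation inside the loop is arbitrary and only shows that both the value and its position are available:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
out = np.empty_like(a)

it = np.nditer(a, flags=["multi_index"])
while not it.finished:
    r, c = it.multi_index          # position of the current element
    out[r, c] = it[0] * 10 + r     # it[0] is the current value
    it.iternext()

print(out)
```

For large images a vectorized expression is usually much faster than any element-by-element loop, so this pattern is best kept for cases that cannot be expressed with array operations.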