Wrong result of np.dot - numpy

I'm new to Python and I'm trying to multiply a 2D matrix by a 1D one. I use np.dot to do it, but it gives me the wrong output. I'm trying to do this:
# X_train.shape = (60000, 784)
w = np.zeros([784, 1])
lista = range(0, len(X_train))
for i in lista:
    score = np.dot(X_train[i,:], w)
print score.shape
Out-> (1L,)
The output should be (60000,1).
Any idea of how I can resolve the problem?

You should avoid the for loop altogether. Indeed, np.dot is supposed to work on N-dim arrays and does the looping internally. See for example
In [1]: import numpy as np
In [2]: a = np.random.rand(1,2) # a.shape = (1,2)
In [3]: b = np.random.rand(2,3) # b.shape = (2,3)
In [4]: np.dot(a,b)
Out[4]: array([[ 0.33735571, 0.29272468, 0.09361096]])
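Applied to the question, this means passing the whole matrix to np.dot in a single call. A minimal sketch, assuming X_train is a 2D array with 784 columns (a small random stand-in is used here, and random weights instead of zeros so the comparison is not trivial):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real data: 1000 samples instead of 60000 (assumption).
X_train = rng.random((1000, 784))
w = rng.random((784, 1))

# One call, no Python loop: (1000, 784) dot (784, 1) -> (1000, 1)
scores = np.dot(X_train, w)
print(scores.shape)

# The loop from the question computes one row at a time; each result has
# shape (1,), so the rows must be collected to get the full score matrix.
looped = np.stack([np.dot(X_train[i, :], w) for i in range(len(X_train))])
print(looped.shape)
```

In the original loop, score is overwritten on every iteration, which is why only the shape of a single row's result is printed.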

Related

Strange implicit conversion after appending an empty list in numpy

After the thread "strange implicit conversion of data type in numpy", I found another strange conversion with numpy:
import numpy as np
a = np.array([1,2,3], dtype=int)
c = np.append(a, [])
Printing c gives:
array([1., 2., 3.])
However,
c = np.append(a, [4])
gives:
array([1, 2, 3, 4])
Why is there such a strange automatic conversion? It does not make any sense at all.
The empty list has to be first turned into an array:
In [149]: np.array([])
Out[149]: array([], dtype=float64)
np.append actually does:
In [151]: np.ravel([])
Out[151]: array([], dtype=float64)
The append code:
def append(arr, values, axis=None):
    arr = asanyarray(arr)
    if axis is None:
        if arr.ndim != 1:
            arr = arr.ravel()
        values = ravel(values)
        axis = arr.ndim-1
    return concatenate((arr, values), axis=axis)
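So the result dtype comes from concatenating an integer array with a float64 array. A small sketch of that promotion, together with one possible way to keep the integer dtype (passing an explicitly typed empty array is a suggestion here, not something np.append does for you):

```python
import numpy as np

a = np.array([1, 2, 3], dtype=int)

# An empty list carries no type information, so NumPy falls back to float64.
empty = np.array([])
print(empty.dtype)

# Concatenating int and float64 arrays promotes the result to float64.
print(np.result_type(a.dtype, empty.dtype))

c_float = np.append(a, [])                         # promoted to float64
c_int = np.append(a, np.array([], dtype=a.dtype))  # dtype preserved
print(c_float.dtype, c_int.dtype)
```

With [4] the list is converted to an integer array, so no promotion happens and the result stays integer.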

Speed up applying a transformation to each index value of a given array

I need to apply a function to the result of a transformation of all index values of a given numpy array. The following code does this:
import numpy as np
from matplotlib.transforms import IdentityTransform
# some 2D array
a = np.empty((2,3))
# some affine transformation, identity is just an example here
trans = IdentityTransform()
# some function taking a 2D index and returning some value depending
# on that index, again just an example
def f(idx):
    return (idx[0]+idx[1])/2
# apply f to the result of transforming each index of a
b=np.empty_like(a)
for idx in np.ndindex(a.shape):
    b[idx] = f(trans.transform(idx))
print(b)
This prints the following correct result:
[[0.  0.5 1. ]
 [0.5 1.  1.5]]
The problem now is, the code is too slow when the shape of a gets larger, say 2000x3000. Is there a way to speed this up?
My idea is to create an array of indices of a idx = [[0,0], [0,1], ..., [1,2]], then transform this array in one go using something like tmp = trans.transform(idx), and lastly apply f to every element with np.vectorize(f)(tmp).
Is this a reasonable approach? If yes, what would this actually look like? If no, are there any alternatives?
Edit: I managed to get at tmp via the following code:
tmp=trans.transform(np.asarray([idx for idx in np.ndindex(a.shape)]))
So now I have an array containing the results of the affine transformation for every index value of a. But this seems to use an awful lot of memory.
I'll post an answer myself with what I figured out now. Maybe it is of use for someone.
To answer the first part of my question, I found a fast and efficient way to create the result of transforming the index values, using the result of np.indices() and then massaging the result of that until it fits to what t.transform() expects.
Given some array a = np.empty((2,3)), the indices of that array can be obtained via np.indices(a.shape). This returns two 2D arrays (one for each dimension of a, actually). What I failed to understand was how to turn these results into something transform() understands.
The key here is to apply np.ravel() to each of the arrays that np.indices() returns:
>>> a=np.empty((2,3))
>>> list(map(np.ravel, np.indices(a.shape)))
[array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 1, 2])]
Now I have a list of arrays containing all the x and y indices, which just needs to be put together with np.vstack() and then transposed to get an array of all (x, y) indices, and this is the form transform() will accept.
>>> l=list(map(np.ravel, np.indices(a.shape)))
>>> np.vstack(l).transpose()
array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])
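As a side note, the same (N, 2) index array can probably be built a little more directly by reshaping the output of np.indices(). A short sketch comparing both routes (the variable names are made up for illustration):

```python
import numpy as np

shape = (2, 3)

# Route used above: ravel each index grid, stack them, transpose.
via_vstack = np.vstack(list(map(np.ravel, np.indices(shape)))).transpose()

# Alternative: np.indices returns one array of shape (ndim, *shape),
# so flattening the trailing axes and transposing gives the same rows.
via_reshape = np.indices(shape).reshape(len(shape), -1).T

print(np.array_equal(via_vstack, via_reshape))
```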
And finally, for some arbitrary affine transformation:
>>> from matplotlib.transforms import Affine2D
>>> t = Affine2D().translate(10, 20).scale(0.5)
>>> t.transform(np.vstack(l).transpose())
array([[ 5. , 10. ],
       [ 5. , 10.5],
       [ 5. , 11. ],
       [ 5.5, 10. ],
       [ 5.5, 10.5],
       [ 5.5, 11. ]])
This is quite fast, even for larger array sizes. If the shape gets big enough (something like 20000x30000), I run out of memory, but for shapes 10000x10000 it still is amazingly fast.
>>> import timeit
>>> timeit.timeit("t.transform(np.vstack(list(map(np.ravel, np.indices(a.shape, dtype=np.uint16)))).transpose())",
... "import numpy as np ; from matplotlib.transforms import Affine2D ; a = np.empty((20, 10)) ; t = Affine2D().translate(10, 20).scale(0.5)", number=10)
0.0003051299718208611
>>> timeit.timeit("t.transform(np.vstack(list(map(np.ravel, np.indices(a.shape, dtype=np.uint16)))).transpose())",
... "import numpy as np ; from matplotlib.transforms import Affine2D ; a = np.empty((200, 100)) ; t = Affine2D().translate(10, 20).scale(0.5)", number=10)
0.0026413939776830375
>>> timeit.timeit("t.transform(np.vstack(list(map(np.ravel, np.indices(a.shape, dtype=np.uint16)))).transpose())",
... "import numpy as np ; from matplotlib.transforms import Affine2D ; a = np.empty((2000, 1000)) ; t = Affine2D().translate(10, 20).scale(0.5)", number=10)
0.35055489401565865
>>> timeit.timeit("t.transform(np.vstack(list(map(np.ravel, np.indices(a.shape, dtype=np.uint16)))).transpose())",
... "import numpy as np ; from matplotlib.transforms import Affine2D ; a = np.empty((20000, 10000)) ; t = Affine2D().translate(10, 20).scale(0.5)", number=10)
43.62860555597581
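Now for the second part: to apply the function to each of the transformed index values, I use the following code for now, which is fast enough in my case. Note that f is called here with the x and y values as two separate arguments, unlike f(idx) in the question.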
xxyy = t.transform(np.vstack(...).transpose())
np.fromiter((f(*xy) for xy in xxyy), dtype=np.short, count=len(xxyy))
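If f can be written with array operations, the per-element Python call can be dropped entirely. A sketch under two assumptions: the transform is affine and given as a 3x3 matrix (the kind of matrix Affine2D.get_matrix() returns; the values below are written out by hand to match translate(10, 20) followed by scale(0.5)), and f is the averaging function from the question. The helper name transform_indices is made up for this example.

```python
import numpy as np

def transform_indices(shape, matrix):
    """Apply a 3x3 affine matrix to every 2D index of an array of `shape`."""
    idx = np.indices(shape).reshape(2, -1).T          # (N, 2) index pairs
    return idx @ matrix[:2, :2].T + matrix[:2, 2]     # linear part + offset

# Hand-written matrix for translate(10, 20) followed by scale(0.5) (assumption).
matrix = np.array([[0.5, 0.0, 5.0],
                   [0.0, 0.5, 10.0],
                   [0.0, 0.0, 1.0]])

shape = (2, 3)
pts = transform_indices(shape, matrix)

# Vectorized version of f(idx) = (idx[0] + idx[1]) / 2, reshaped back to `shape`.
b = ((pts[:, 0] + pts[:, 1]) / 2).reshape(shape)
print(b)
```

This only helps when f can be expressed on whole arrays; for an arbitrary Python function, np.fromiter as above (or np.vectorize) still loops in Python.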

reshape numpy 2d array into pandas 1d

I have a numpy array as follows:
a.shape = (100, 500)
and would like to transform it into a pandas DataFrame as follows:
df.shape = (100 * 500, 1)
df[500*i+j,0] = a[i, j]
without a loop...
I'm sure I'm missing something, but isn't it a simple flattening?
df = pd.DataFrame(a.flatten())
If I misunderstood what you mean by i and j, a transpose should do:
df = pd.DataFrame(a.T.flatten())
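A quick way to check which variant matches the requested layout is to test the index relation on a small array. A sketch with a (4, 5) array standing in for the (100, 500) one (the sizes are chosen only for illustration):

```python
import numpy as np
import pandas as pd

rows, cols = 4, 5                      # stand-ins for 100 and 500
a = np.arange(rows * cols).reshape(rows, cols)

# Row-major flattening puts a[i, j] at position cols*i + j.
df = pd.DataFrame(a.flatten())
print(df.shape)

matches = all(
    df.iloc[cols * i + j, 0] == a[i, j]
    for i in range(rows)
    for j in range(cols)
)
print(matches)
```

The transposed variant a.T.flatten() would instead put a[i, j] at position rows*j + i.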

Numpy eigenvectors aren't eigenvectors?

I was doing some matrix calculations and wanted to calculate the eigenvalues and eigenvectors of this particular matrix:
I found its eigenvalues and eigenvectors analytically and wanted to confirm my answer using numpy.linalg.eigh, since this matrix is symmetric. Here is the problem: I find the expected eigenvalues, but the corresponding eigenvectors do not appear to be eigenvectors at all.
Here is the little piece of code I used:
import numpy as n

def createA():
    # create the matrix A
    m = 3
    # integer sizes and offsets: n.ones and n.diag reject floats in current numpy
    T = n.diag(n.ones(m-1), -1) + n.diag(n.ones(m)*-4.) +\
        n.diag(n.ones(m-1), 1)
    I = n.identity(m)
    A = n.zeros([m*m, m*m])
    for i in range(m):
        a, b, c = i*m, (i+1)*m, (i+2)*m
        A[a:b, a:b] = T
        if i < m - 1:
            A[b:c, a:b] = A[a:b, b:c] = I
    return A

A = createA()
ev, vecs = n.linalg.eigh(A)
print(vecs[0])
print(n.dot(A, vecs[0])/ev[0])
So for the first eigenvalue/eigenvector pair, this yields:
[ 2.50000000e-01 5.00000000e-01 -5.42230975e-17 -4.66157689e-01
3.03192985e-01 2.56458619e-01 -7.84539156e-17 -5.00000000e-01
2.50000000e-01]
[ 0.14149052 0.21187998 -0.1107808 -0.35408209 0.20831606 0.06921674
0.14149052 -0.37390646 0.18211242]
In my understanding of the eigenvalue problem, it appears that this vector doesn't satisfy the equation A.vec = ev.vec, and that therefore this vector is no eigenvector at all.
I am pretty sure the matrix A itself is correctly implemented and that there is a correct eigenvector. For example, my analytically derived eigenvector:
rvec = [0.25,-0.35355339,0.25,-0.35355339,0.5,-0.35355339,0.25,
-0.35355339,0.25]
b = n.dot(A,rvec)/ev[0]
print(n.allclose(rvec, b))
yields True.
Can anyone, by any means, explain this strange behaviour? Am I misunderstanding the Eigenvalue problem? Might numpy be erroneous?
(As this is my first post here: my apologies for any unconventionalities in my question. Thank you in advance for your patience.)
The eigenvectors are stored as column vectors, as described in the documentation. So you have to use vecs[:,0] instead of vecs[0].
For example, this works for me (I use eig because this A is not symmetric):
import numpy as np
import numpy.linalg as LA

A = np.random.randint(10, size=(4, 4))
# array([[4, 7, 7, 7],
#        [4, 1, 9, 1],
#        [7, 3, 7, 7],
#        [6, 4, 6, 5]])
evals, evecs = LA.eig(A)  # avoid the name eval, which shadows a builtin
evecs[:, 0]
# array([ 0.55545073+0.j, 0.37209887+0.j, 0.56357432+0.j, 0.48518131+0.j])
np.dot(A, evecs[:, 0]) / evals[0]
# array([ 0.55545073+0.j, 0.37209887+0.j, 0.56357432+0.j, 0.48518131+0.j])
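The same check can be run on the symmetric matrix from the question. The sketch below rebuilds that matrix with np.kron instead of the original loop (a shortcut chosen for this example, assumed to produce the same block structure) and verifies the column convention for all eigenpairs at once:

```python
import numpy as np

m = 3
# Off-diagonal ones, and the tridiagonal block T with -4 on the diagonal.
S = np.diag(np.ones(m - 1), 1) + np.diag(np.ones(m - 1), -1)
T = S - 4 * np.eye(m)

# T blocks on the diagonal, identity blocks next to the diagonal.
A = np.kron(np.eye(m), T) + np.kron(S, np.eye(m))

ev, vecs = np.linalg.eigh(A)

# Column i of vecs belongs to eigenvalue ev[i].
col_ok = np.allclose(A @ vecs[:, 0], ev[0] * vecs[:, 0])

# All pairs at once: multiplying vecs by ev scales column i by ev[i].
all_ok = np.allclose(A @ vecs, vecs * ev)
print(col_ok, all_ok)
```

Taking vecs[0] instead picks the first component of every eigenvector, which in general is not an eigenvector itself.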

Turn 2D NumPy array into 1D array for plotting a histogram

I'm trying to plot a histogram with matplotlib.
I need to convert my one-line 2D Array
[[1,2,3,4]] # shape is (1,4)
into a 1D Array
[1,2,3,4] # shape is (4,)
How can I do this?
Adding ravel as another alternative for future searchers. From the docs,
It is equivalent to reshape(-1, order=order).
Since the array is 1xN, all of the following are equivalent:
arr1d = np.ravel(arr2d)
arr1d = arr2d.ravel()
arr1d = arr2d.flatten()
arr1d = np.reshape(arr2d, -1)
arr1d = arr2d.reshape(-1)
arr1d = arr2d[0, :]
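These give the same values, but they differ in whether the result shares memory with the original. A small sketch of that difference (variable names are illustrative):

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4]])

flat_copy = arr2d.flatten()   # always a copy
flat_view = arr2d.ravel()     # a view when the memory layout allows it
row_view = arr2d[0, :]        # basic slicing returns a view

# Modify the original and see which results follow the change.
arr2d[0, 0] = 99
print(flat_copy[0], flat_view[0], row_view[0])
print(np.shares_memory(arr2d, flat_copy), np.shares_memory(arr2d, flat_view))
```

For plotting a histogram the difference does not matter, but it does if the 1D array is modified afterwards.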
You can directly index the row:
>>> import numpy as np
>>> x2 = np.array([[1,2,3,4]])
>>> x2.shape
(1, 4)
>>> x1 = x2[0,:]
>>> x1
array([1, 2, 3, 4])
>>> x1.shape
(4,)
Or you can use squeeze:
>>> xs = np.squeeze(x2)
>>> xs
array([1, 2, 3, 4])
>>> xs.shape
(4,)
reshape will do the trick.
There's also a more specific function, flatten, that appears to do exactly what you want.
The answer provided by mtrw does the trick for an array that really has only one row, like this one. However, if you have a 2D array with values in both dimensions, you can convert it as follows:
a = np.array([[1,2,3],[4,5,6]])
You can get the shape of the array with np.shape and take the product of its entries with np.prod, which gives the total number of elements. If you then use np.reshape() to reshape the array to a single dimension of that length, you have a solution that always works.
>>> np.reshape(a, np.prod(a.shape))
array([1, 2, 3, 4, 5, 6])
Use ndarray.flat:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([[1,0,0,1],
              [2,0,1,0]])
plt.hist(a.flat, [0,1,2,3])
The flat property returns a 1D iterator over your 2D array. This method generalizes to any number of rows (or dimensions). For large arrays it can be much more efficient than making a flattened copy.
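To see the numbers such a histogram would be based on without opening a plot window, the same bins can be passed to np.histogram. A minimal sketch using a.ravel() for the 1D input (np.histogram is used here only as a stand-in for the counting that plt.hist does):

```python
import numpy as np

a = np.array([[1, 0, 0, 1],
              [2, 0, 1, 0]])

# Bin edges 0, 1, 2, 3 give the bins [0, 1), [1, 2) and [2, 3].
counts, edges = np.histogram(a.ravel(), bins=[0, 1, 2, 3])
print(counts)   # four 0s, three 1s, one 2 -> [4 3 1]
```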