Why does numpy reshape mess up my data pattern? - numpy

Let's say I have the following array A -
import numpy as np
batch_size, seq_len = 3, 5
A = np.zeros((batch_size, seq_len))
A[0,0:] = 1
A[1,0:] = 2
A[2,0:] = 3
A has the following value -
array([[1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.]])
Now, if I reshape it in the following way -
A4 = A.reshape(seq_len, -1)
A4 now has the value -
array([[1., 1., 1.],
       [1., 1., 2.],
       [2., 2., 2.],
       [2., 3., 3.],
       [3., 3., 3.]])
However, I expected it to be -
array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])
Kudos to this forum post for bringing the problem to my attention - https://discuss.pytorch.org/t/for-beginners-do-not-use-view-or-reshape-to-swap-dimensions-of-tensors/75524

From the np.reshape docs
You can think of reshaping as first raveling the array (using the given index order), then inserting the elements from the raveled array into the new array using the same kind of index ordering as was used for the raveling.
A4 is (5, 3), with the elements in the same raveled order [1, 1, 1, 1, 1, 2, 2, ...]. reshape never reorders the underlying data; it only changes how the flat sequence is carved into rows.
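The layout the question expects is an axis swap, not a reshape, so a transpose gives it directly. A minimal sketch using the question's setup:

```python
import numpy as np

batch_size, seq_len = 3, 5
A = np.zeros((batch_size, seq_len))
A[0, :] = 1
A[1, :] = 2
A[2, :] = 3

# reshape keeps the raveled element order, so values bleed across rows
A4 = A.reshape(seq_len, -1)   # A4[1] is [1., 1., 2.]

# transpose actually swaps the axes, producing the expected layout
A_t = A.T                     # shape (5, 3), every row is [1., 2., 3.]
```

Note that `A.T` is a view, so it costs nothing; call `.copy()` on it if you need contiguous memory afterwards.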

Related

Filtering a ndarray in numpy

I have a ndarray and I want to filter out a particular value of it. My array is:
arr = np.array([
    [1., 6., 1.],
    [1., 7., 0.],
    [1., 8., 0.],
    [3., 5., 1.],
    [5., 1., 1.],
    [5., 2., 2.],
    [6., 1., 1.],
    [6., 2., 2.],
    [6., 7., 3.],
    [6., 8., 0.]
])
I want to filter out [6., 1., 1.]. So I have tried:
arr[arr != [6., 1., 1.]]
and I got:
array([1., 6., 1., 7., 0., 1., 8., 0., 3., 5., 5., 5., 2., 2., 2., 2., 7.,
       3., 8., 0.])
which is not what I want (and also destroyed the previous structure of the array). I have also tried:
arr[arr[:] != [6., 1., 1.]]
but I got the same output as before.
P.S.: I know I can delete an element by its index, but I don't want to do that. I want to check for the particular element.
P.P.S.: For 1-d arrays my method works.
You're very close. The comparison arr != [6, 1, 1] is elementwise, so the boolean array tells you which individual elements match, not which rows. To drop a row, all of its elements must match the target; equivalently, you keep a row if any of its elements differ:
arr[(arr != [6, 1, 1]).any(axis=1)]
You can also write it as
arr[~(arr == [6, 1, 1]).all(axis=1)]
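Putting the condition together as a small runnable check (using a subset of the question's rows):

```python
import numpy as np

arr = np.array([
    [1., 6., 1.],
    [5., 1., 1.],
    [6., 1., 1.],
    [6., 7., 3.],
])

# keep a row if any of its elements differ from the target row
mask = (arr != [6., 1., 1.]).any(axis=1)
filtered = arr[mask]           # the row [6., 1., 1.] is gone,
                               # the 2-D structure is preserved
```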

How does Reflection/Mirroring padding work? In numpy or in general?

I am running the following against the vector [1, 2, 3]. The first two padded values I can explain: each one is mirrored around the last element (3). Beyond that, I can't.
There is clearly a cycle of length 4 here, which suggests an index modulo 2*(len(a) - 1).
I'd appreciate it if someone broke this down. This example is for end reflection; if begin reflection works any differently, I'd love to hear about that too:
>>> a
array([1., 2., 3.])
>>> np.pad(a, ((0,1)), 'reflect')
array([1., 2., 3., 2.])
>>> np.pad(a, ((0,2)), 'reflect')
array([1., 2., 3., 2., 1.])
>>> np.pad(a, ((0,3)), 'reflect')
array([1., 2., 3., 2., 1., 2.])
>>> np.pad(a, ((0,4)), 'reflect')
array([1., 2., 3., 2., 1., 2., 3.])
>>> np.pad(a, ((0,5)), 'reflect')
array([1., 2., 3., 2., 1., 2., 3., 2.])
>>> np.pad(a, ((0,6)), 'reflect')
array([1., 2., 3., 2., 1., 2., 3., 2., 1.])
>>> np.pad(a, ((0,7)), 'reflect')
array([1., 2., 3., 2., 1., 2., 3., 2., 1., 2.])
>>> np.pad(a, ((0,8)), 'reflect')
array([1., 2., 3., 2., 1., 2., 3., 2., 1., 2., 3.])
>>> np.pad(a, ((0,9)), 'reflect')
array([1., 2., 3., 2., 1., 2., 3., 2., 1., 2., 3., 2.])
Imagine stepping through the original array, and every time you hit a boundary you go the other direction.
When you progress to the right and get to the end, you reflect and start iterating back to the beginning. When you progress to the left and get to the beginning, you reflect and start iterating back to the end.
It might help to visualize the sequence as a walk that bounces off the ends of [1, 2, 3]:
1 → 2 → 3 → 2 → 1 → 2 → 3 → 2 → 1 → 2 → 3 → 2 → 1
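That bounce rule can be written as an index formula with the period 2*(len(a) - 1) the question noticed. A sketch (the helper name reflect_index is mine, not numpy's) that reproduces np.pad(..., 'reflect') for end padding:

```python
import numpy as np

def reflect_index(i, n):
    """Map position i (possibly past the end) back into [0, n-1]
    by bouncing off the boundaries, as 'reflect' padding does."""
    period = 2 * (n - 1)
    i = i % period
    return i if i < n else period - i

a = np.array([1., 2., 3.])
n = len(a)

# walk 9 positions past the end and look up each reflected index
padded = np.array([a[reflect_index(i, n)] for i in range(n + 9)])
# matches np.pad(a, (0, 9), 'reflect')
```

Begin reflection is the same formula run leftwards: position -1 maps to index 1, -2 to index 2, and so on with the same period.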

Torch7: Addition layer with Tensors of inconsistent size like numpy

I would like to add two tensors whose sizes differ in one dimension.
For example:
x = torch.ones(4,5)
y = torch.ones(4,3,5)
In numpy I can do this with:
import numpy as np
x = np.ones((4,5))
y = np.ones((4,3,5))
y + x[:,None,:]
#Prints out
array([[[ 2., 2., 2., 2., 2.],
        [ 2., 2., 2., 2., 2.],
        [ 2., 2., 2., 2., 2.]],

       [[ 2., 2., 2., 2., 2.],
        [ 2., 2., 2., 2., 2.],
        [ 2., 2., 2., 2., 2.]],

       [[ 2., 2., 2., 2., 2.],
        [ 2., 2., 2., 2., 2.],
        [ 2., 2., 2., 2., 2.]],

       [[ 2., 2., 2., 2., 2.],
        [ 2., 2., 2., 2., 2.],
        [ 2., 2., 2., 2., 2.]]])
It has a shape of (4,3,5).
Is there any way to reproduce this with nn.CAddTable()? When I view the x tensor as x:view(4,1,5), it gives me an inconsistent tensor size error.
m = nn.CAddTable()
m:forward({y, x:view(4,1,5)})
Any ideas?
Use expandAs to obtain a 4x3x5 tensor:
m:forward({y, x:view(4,1,5):expandAs(y)})
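For comparison with the numpy side of the question: Torch's expandAs corresponds roughly to numpy's broadcast_to, which makes a stretched view without copying any data. A minimal sketch:

```python
import numpy as np

x = np.ones((4, 5))
y = np.ones((4, 3, 5))

# broadcast_to plays the role of expandAs: it produces a (4, 3, 5)
# read-only view of x[:, None, :] without allocating new storage
x_exp = np.broadcast_to(x[:, None, :], y.shape)
result = y + x_exp            # shape (4, 3, 5), all elements 2.0
```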

Does numpy provide methods for fundamental matrix operations?

Namely, rearranging rows, adding multiples of rows, and multiplying by scalars.
I don't see these methods defined in http://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html or elsewhere.
And if they aren't defined, then why not?
Yes, you can manipulate array rows, adding and multiplying them. For example:
In [1]: import numpy as np
In [2]: m = np.ones((3, 4))
In [3]: m
Out[3]:
array([[ 1., 1., 1., 1.],
       [ 1., 1., 1., 1.],
       [ 1., 1., 1., 1.]])
In [4]: m[1, :] = 2*m[1, :] # Multiply
In [5]: m
Out[5]:
array([[ 1., 1., 1., 1.],
       [ 2., 2., 2., 2.],
       [ 1., 1., 1., 1.]])
In [6]: m[0, :] = m[0, :] + 2*m[1, :] # Multiply and add
In [7]: m
Out[7]:
array([[ 5., 5., 5., 5.],
       [ 2., 2., 2., 2.],
       [ 1., 1., 1., 1.]])
In [8]: m[ (0, 2), :] = m[ (2, 0), :] # Swap rows
In [9]: m
Out[9]:
array([[ 1., 1., 1., 1.],
       [ 2., 2., 2., 2.],
       [ 5., 5., 5., 5.]])

vstack numpy arrays

If I have two ndarrays:
a.shape # returns (200,300, 3)
b.shape # returns (200, 300)
numpy.vstack((a,b)) # Gives error
Would print out the error:
ValueError: arrays must have same number of dimensions
I tried doing vstack((a.reshape(-1, 300), b)), which kind of works, but the output is very weird.
You don't specify what final shape you actually want. If it's (200, 300, 4), you can use dstack instead:
>>> import numpy as np
>>> a = np.random.random((200,300,3))
>>> b = np.random.random((200,300))
>>> c = np.dstack((a,b))
>>> c.shape
(200, 300, 4)
Basically, when you're stacking, the lengths have to agree in all the other axes.
[Updated based on comment:]
If you want (800, 300) you could try something like this:
>>> a = np.ones((2, 3, 3)) * np.array([1,2,3])
>>> b = np.ones((2, 3)) * 4
>>> c = np.dstack((a,b))
>>> c
array([[[ 1., 2., 3., 4.],
        [ 1., 2., 3., 4.],
        [ 1., 2., 3., 4.]],

       [[ 1., 2., 3., 4.],
        [ 1., 2., 3., 4.],
        [ 1., 2., 3., 4.]]])
>>> c.T.reshape(c.shape[0]*c.shape[-1], -1)
array([[ 1., 1., 1.],
       [ 1., 1., 1.],
       [ 2., 2., 2.],
       [ 2., 2., 2.],
       [ 3., 3., 3.],
       [ 3., 3., 3.],
       [ 4., 4., 4.],
       [ 4., 4., 4.]])
>>> c.T.reshape(c.shape[0]*c.shape[-1], -1).shape
(8, 3)
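A caveat on that last step: c.T.reshape(...) happens to give the right values here only because every channel slab of c is constant. For general data, moving the stacked axis to the front before reshaping preserves each row intact; a sketch of that variant:

```python
import numpy as np

a = np.ones((2, 3, 3)) * np.array([1, 2, 3])
b = np.ones((2, 3)) * 4
c = np.dstack((a, b))                 # shape (2, 3, 4)

# Bring the channel axis to the front, then flatten the first two
# axes: each channel's (2, 3) slab is stacked vertically in order,
# with rows left undisturbed.
stacked = np.moveaxis(c, -1, 0).reshape(-1, c.shape[1])
# stacked.shape == (8, 3)
```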