Differences between X.ravel() and X.reshape(s0*s1*s2) when number of axes known - numpy

Seeing this answer I am wondering if the creation of a flattened view of X are essentially the same, as long as I know that the number of axes in X is 3:
A = X.ravel()
s0, s1, s2 = X.shape
B = X.reshape(s0*s1*s2)
C = X.reshape(-1) # thanks to #hpaulj below
I'm not asking if A and B and C are the same.
I'm wondering if the particular use of ravel and reshape in this situation are essentially the same, or if there are significant differences, advantages, or disadvantages to one or the other, provided that you know the number of axes of X ahead of time.
The second method takes a few microseconds, but that does not seem to be size dependent.

Look at their __array_interface__ and do some timings. The only difference that I can see is that ravel is faster.
.flatten() has a more significant difference - it returns a copy.
A.reshape(-1)
is a simpler way to use reshape.
You could study the respective docs, and see if there is something else. I haven't explored what happens when you specify order.
I would use ravel if I just want it to be 1d. I use .reshape most often to change a 1d (e.g. arange()) to nd.
e.g.
np.arange(10).reshape(2,5).ravel()
Or choose the one that makes your code most readable.
reshape and ravel are defined in numpy C code:
In https://github.com/numpy/numpy/blob/0703f55f4db7a87c5a9e02d5165309994b9b13fd/numpy/core/src/multiarray/shape.c
PyArray_Ravel(PyArrayObject *arr, NPY_ORDER order) requires nearly 100 lines of C code. And it punts to PyArray_Flatten if the order changes.
In the same file, reshape punts to newshape. That in turn returns a view is the shape doesn't actually change, tries _attempt_nocopy_reshape, and as last resort returns a PyArray_NewCopy.
Both make use of PyArray_Newshape and PyArray_NewFromDescr - depending on how shapes and order mix and match.
So identifying where reshape (to 1d) and ravel are different would require careful study.
Another way to do this ravel is to make a new array, with a new shape, but the same data buffer:
np.ndarray((24,),buffer=A.data)
It times the same as reshape. Its __array_interface__ is the same. I don't recommend using this method, but it may clarify what is going on with these reshape/ravel functions. They all make a new array, with new shape, but with share data (if possible). Timing differences are the result of different sequences of function calls - in Python and C - not in different handling of the data.

Related

numpy: Different results between diff and gradient for finite differences

I want to calculate the numerical derivative of two arrays a and b.
If I do
c = diff(a) / diff(b)
I get what I want, but I loose the edge (the last point) so c.shape ~= a.shape.
If I do
c = gradient(a, b)
then c.shape = a.shape, but I get a completely different result.
I have read how gradient is calculated in numpy and I guess it does a completely different thing, although I dont understand quite well the difference yet. But is there a way or another function to calculate the differential which also gives the values at the edges?
And why is the result so different between gradient and diff?
These functions, although related, do different actions.
np.diff simply takes the differences of matrix slices along a given axis, and used for n-th difference returns a matrix smaller by n along the given axis (what you observed in the n=1 case). Please see: https://docs.scipy.org/doc/numpy/reference/generated/numpy.diff.html
np.gradient produces a set of gradients of an array along all its dimensions while preserving its shape https://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html Please also observe that np.gradient should be executed for one input array, your second argument b does not make sense here (was interpreted as first non-keyword argument from *varargs which is meant to describe spacings between the values of the first argument), hence the results that don't match your intuition.
I would simply use c = diff(a) / diff(b) and append values to c if you really need to have c.shape match a.shape. For instance, you might append zeros if you expect the gradient to vanish close to the edges of your window.

In tensorflow, how to gather with indices matching params first dimensions?

If i have indices of shape (D_0,...,D_k) and params of shape (D_0,...,D_k,I,F) (with 0 ≤ indices[i_0,...,i_k] < I), what is the fastest/most elegant way to get the array output of shape (D_0,...,D_k,F) with
output[i_0,...,i_k,f]=params[i_0,...,i_k,indices[i_0,...,i_k],f]
If k=0, then we can use gather. So, in the past, I had a solution based on flattening. Is there a nicer solution now that tensorflow has matured?
Most of the times, when I want this type of gathering, indices is obtained by indices = tf.argmax(params[:,...,:,:,0]). For every (i_0,...,i_k), I have I vectors of size (F,) and I want to keep only those with the maximal value for one of the features. A solution which would only work for this special case (a kind of reduce_max only using one feature to decide how to reduce) would satisfy me.

any reasons for inconsistent numpy arguments of numpy.zeros and numpy.random.randn

I'm implementing a computation using numpy zeros and numpy.random.randn
W1 = np.random.randn(n_h, n_x) * .01
b1 = np.zeros((n_h, 1))
I'm not sure why random.randn() can accept two integers while zeros() needs a tuple. Is there a good reason for that?
Cheers, JChen.
Most likely it's just a matter of history. numpy results from the merger of several prior packages, and has a long development. Some quirks get cleaned up, others left as is.
randn(d0, d1, ..., dn)
zeros(shape, dtype=float, order='C')
randn has this note:
This is a convenience function. If you want an interface that takes a
tuple as the first argument, use numpy.random.standard_normal instead.
standard_normal(size=None)
With * it is easy to pass a tuple to randn:
np.random.randn(*(1,2,3))
np.zeros takes a couple of keyword arguments. randn does not. You can define a Python function with a (*args, **kwargs) signature. But accepting a tuple, especially one with a common usage as shape, fits better. But that's a matter of opinion.
np.random.rand and np.random.random_sample are another such pair. Most likely rand and randn are the older versions, and standard_normal and random_sample are newer ones designed to conform to the more common tuple style.

Why is tf.transpose so important in a RNN?

I've been reading the docs to learn TensorFlow and have been struggling on when to use the following functions and their purpose.
tf.split()
tf.reshape()
tf.transpose()
My guess so far is that:
tf.split() is used because inputs must be a sequence.
tf.reshape() is used to make the shapes compatible (Incorrect shapes tends to be a common problem / mistake for me). I used numpy for this before. I'll probably stick to tf.reshape() now. I am not sure if there is a difference between the two.
tf.transpose() swaps the rows and columns from my understanding. If I don't use tf.transpose() my loss doesn't go down. If the parameter values are incorrect the loss doesn't go down. So the purpose of me using tf.transpose() is so that my loss goes down and my predictions become more accurate.
This bothers me tremendously because I'm using tf.transpose() because I have to and have no understanding why it's such an important factor. I'm assuming if it's not used correctly the inputs and labels can be in the wrong position. Making it impossible for the model to learn. If this is true how can I go about using tf.transpose() so that I am not so reliant on figuring out the parameter values via trial and error?
Question
Why do I need tf.transpose()?
What is the purpose of tf.transpose()?
Answer
Why do I need tf.transpose()? I can't imagine why you would need it unless you coded your solution from the beginning to require it. For example, suppose I have 120 student records with 50 stats per student and I want to use that to try and make a linear association with their chance of taking 3 classes. I'd state it like so
c = r x m
r = records, a matrix with a shape if [120x50]
m = the induction matrix. it has a shape of [50x3]
c = the chance of all students taking one of three courses, a matrix with a shape of [120x3]
Now if instead of making m [50x3], we goofed and made m [3x50], then we'd have to transpose it before multiplication.
What is the purpose of tf.transpose()?
Sometimes you just need to swap rows and columns, like above. Wikipedia has a fantastic page on it. The transpose function has some excellent properties for matrix math function, like associativeness and associativeness with the inverse function.
Summary
I don't think I've ever used tf.transpose in any CNN I've written.

Update submatrix in Tensorflow

Quite simply, what I want to do is the following
A = np.ones((3,3)) #arbitrary matrix
B = np.ones((2,2)) #arbitrary matrix
A[1:,1:] = A[1:,1:] + B
except in Tensorflow (where the matrices can be arbitrarily complicated tensor expressions). Neither A nor B is a Tensorflow Variable, but just a run-of-the-mill tensor.
What I have gathered so far: tensors are immutable, so I cannot assign to a submatrix. tf.scatter_nd is the current option for sub-assignment, but does not appear to support sub-matrices, only slices.
Methods that should work, but are perhaps not ideal:
I could pad B with zeros, but I'm sure this leads to instantiation of
an unnecessarily large B - can it be made sparse, maybe?
I could use the padding idea, but write it as a low-rank decomposition, e.g. in Numpy: A+U.dot(B).U.T where U is a stacked zero and identity matrix. I'm not sure this is actually advantageous.
I could split A into submatrices, and stack them back together. Might be the most efficient, but sounds like the code would be convoluted.
Ideally, I want to do this operation N times for progressively smaller matrices, resulting in one large final result, but this is tangential.
I'll use one of the hacks for now, but I'm hoping someone can tell me what the idiomatic version is!