Issue appending ndarray's with different shapes - numpy

I have a numpy ndarray with shape (25,2) and I am trying to append one more value that has shape (2,).
I have tried using the append method, but so far no luck.
Any thoughts?
Thanks!

For append to work in this way you'll need to satisfy two conditions specified in the documentation.
The appended object must have the same dimensions. It should be of shape (1, 2).
You must specify an axis to concatenate, otherwise numpy will flatten the arrays.
For example:
import numpy
x = numpy.ones((3, 2))
y = [[1, 2]]
numpy.append(x, y, axis=0)
Results in:
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.],
[ 1., 2.]])

What kind of errors did you get with the append method? 'no luck' is as bad a descriptor as 'didnt work'. In a proper question you should show the expected value along with errors. However this topic comes up often enough that we can make good guesses.
In [336]: a = np.ones((3,2),int)
In [337]: b = np.zeros((2,),int)
But first I'll be pedantic and try an append method:
In [338]: a.append(b)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-338-d6231792f85d> in <module>()
----> 1 a.append(b)
AttributeError: 'numpy.ndarray' object has no attribute 'append'
lists have an append method; numpy arrays do not.
There is a poorly named append function:
In [339]: np.append(a,b)
Out[339]: array([1, 1, 1, 1, 1, 1, 0, 0])
In [340]: _.reshape(-1,2)
Out[340]:
array([[1, 1],
[1, 1],
[1, 1],
[0, 0]])
That works - in a way. But if I read the docs, and provide an axis parameter:
In [341]: np.append(a,b, axis=0)
...
-> 5166 return concatenate((arr, values), axis=axis)
ValueError: all the input arrays must have same number of dimensions
Now it is just calling np.concatenate, turning the 2 arguments into a list.
If this is the error you got, and didn't understand it, you may need to review basic numpy docs about dimensions and shapes.
a is 2d, b is 1d. To concatenate, we need to reshape b so it is (1,2), a shape that is compatible with the (3,2) of a. There are several ways of doing that:
In [342]: np.concatenate((a, b.reshape(1,2)), axis=0)
Out[342]:
array([[1, 1],
[1, 1],
[1, 1],
[0, 0]])
Stay away from the np.append; it's too confusing for many beginners, and doesn't add anything significant to the base concatenate.

Related

How to relaibly create a multi-dimensional array and a one-dimensional view of it in numpy, so that the memory layout be contiguous?

According to the documentation of numpy.ravel,
Return a contiguous flattened array.
A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.
For convenience and efficiency of indexing, I would like to have a one-dimensional view of a 2-dimensional array. I am using ravel for creating the view, and so far so good.
However, it is not clear to me what is meant by "A copy is made only if needed." If some day a copy is created while my code is executed, the code will stop working.
I know that there is numpy.reshape, but its documentation says:
It is not always possible to change the shape of an array without copying the data.
In any case, I would like the data to be contiguous.
How can I reliably create at 2-dimensional array and a 1-dimensional view into it? I would like the data to be contiguous in memory (for efficiency). Are there any attributes to specify when creating the 2-dimensional array to assure that it is contiguous and ravel will not need to copy it?
Related question: What is the difference between flatten and ravel functions in numpy?
The warnings for ravel and reshape are the same. ravel is just reshape(-1), to 1d. Conversely reshape docs tells us that we can think of it as first doing a ravel.
Normal array construction produces a contiguous array, and reshape with the same order will produce a view. You can visually test that by looking at the ravel and checking if the values appear in the expected order.
In [348]: x = np.arange(6).reshape(2,3)
In [349]: x
Out[349]:
array([[0, 1, 2],
[3, 4, 5]])
In [350]: x.ravel()
Out[350]: array([0, 1, 2, 3, 4, 5])
I started with the arange, reshaped it to 2d, and back to 1d. No change in order.
But if I make a sliced view:
In [351]: x[:,:2]
Out[351]:
array([[0, 1],
[3, 4]])
In [352]: x[:,:2].ravel()
Out[352]: array([0, 1, 3, 4])
This ravel has a gap, and thus is a copy.
Transpose is also a view, which cannot be reshaped to a view:
In [353]: x.T
Out[353]:
array([[0, 3],
[1, 4],
[2, 5]])
In [354]: x.T.ravel()
Out[354]: array([0, 3, 1, 4, 2, 5])
Except, if we specify the right order, the ravel is a view.
In [355]: x.T.ravel(order='F')
Out[355]: array([0, 1, 2, 3, 4, 5])
reshape has a extensive discussion of order. And transpose actually works by returning a view with different shape and strides. For a 2d array transpose produces a order F array.
So as long as you are aware of manipulations like this, you can safely assume that the reshape/ravel is contiguous.
Note that even though [354] is a copy, assignment to the flat changes the original
In [361]: x[:,:2].flat[:] = [3,4,2,1]
In [362]: x
Out[362]:
array([[3, 4, 2],
[2, 1, 5]])
x[:,:2].ravel()[:] = [10,11,2,3] does not change x. In cases like this y = x[:,:2].flat may be more useful than the ravel equivalent.

Whats the difference between `arr[tuple(seq)]` and `arr[seq]`? Relating to Using a non-tuple sequence for multidimensional indexing is deprecated

I am using an ndarray to slice another ndarray.
Normally I use arr[ind_arr]. numpy seems to not like this and raises a FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated use arr[tuple(seq)] instead of arr[seq].
What's the difference between arr[tuple(seq)] and arr[seq]?
Other questions on StackOverflow seem to be running into this error in scipy and pandas and most people suggest the error to be in the particular version of these packages. I am running into the warning running purely in numpy.
Example posts:
FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated use `arr[tuple(seq)]` instead of `arr[seq]`
FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated use `arr[tuple(seq)]`
FutureWarning with distplot in seaborn
MWE reproducing warning:
import numpy as np
# generate a random 2d array
A = np.random.randint(20, size=(7,7))
print(A, '\n')
# define indices
ind_i = np.array([1, 2, 3]) # along i
ind_j = np.array([5, 6]) # along j
# generate index array using meshgrid
ind_ij = np.meshgrid(ind_i, ind_j, indexing='ij')
B = A[ind_ij]
print(B, '\n')
C = A[tuple(ind_ij)]
print(C, '\n')
# note: both produce the same result
meshgrid returns a list of arrays:
In [50]: np.meshgrid([1,2,3],[4,5],indexing='ij')
Out[50]:
[array([[1, 1],
[2, 2],
[3, 3]]), array([[4, 5],
[4, 5],
[4, 5]])]
In [51]: np.meshgrid([1,2,3],[4,5],indexing='ij',sparse=True)
Out[51]:
[array([[1],
[2],
[3]]), array([[4, 5]])]
ix_ does the same thing, but returns a tuple:
In [52]: np.ix_([1,2,3],[4,5])
Out[52]:
(array([[1],
[2],
[3]]), array([[4, 5]]))
np.ogrid also produces the list.
In [55]: arr = np.arange(24).reshape(4,6)
indexing with the ix tuple:
In [56]: arr[_52]
Out[56]:
array([[10, 11],
[16, 17],
[22, 23]])
indexing with the meshgrid list:
In [57]: arr[_51]
/usr/local/bin/ipython3:1: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
#!/usr/bin/python3
Out[57]:
array([[10, 11],
[16, 17],
[22, 23]])
Often the meshgrid result is used with unpacking:
In [62]: I,J = np.meshgrid([1,2,3],[4,5],indexing='ij',sparse=True)
In [63]: arr[I,J]
Out[63]:
array([[10, 11],
[16, 17],
[22, 23]])
Here [I,J] is the same as [(I,J)], making a tuple of the 2 subarrays.
Basically they are trying to remove a loophole that existed for historical reasons. I don't know if they can change the handling of meshgrid results without causing further compatibility issues.
For Future Readers having this FutureWarning: ... and want to correctly resolve them.
Read the answer of #hpaulj here. And notice the point is when he(she) said:
[...] returns a list of [...]
[...] returns a tuple of [...]
If you don't know why 1. is the point please read the answer: https://stackoverflow.com/a/71487259/5290519, which is also written by him(her). This answer provides:
Why, in the first place, the usage of list as index will cause a warning.
[4] is a problem because in the past, certain lists were interpreted as though they were tuples. This is a legacy case that developers are trying to cleanup, hence the FutureWarning.
A series of concise examples on the (correct) usage of list,tuple,np.array as array index.
If you still don't get the point, try my own answer after I figuring all these complicated concepts: https://stackoverflow.com/a/71493474/5290519

what does the np.array command do?

question about the np.array command.
let's say the content of caches when you displayed it with the print command is
caches = [array([1,2,3]),array([1,2,3]),...,array([1,2,3])]
Then I executed following code:
train_x = np.array(caches)
When I print the content of train_x I have:
train_x = [[1,2,3],[1,2,3],...,[1,2,3]]
Now, the behavior is exactly as I want but do not really understand in dept what the np.array(caches) command has done. Can somebody explain this to me?
Making a 1d array
In [89]: np.array([1,2,3])
Out[89]: array([1, 2, 3])
In [90]: np.array((1,2,3))
Out[90]: array([1, 2, 3])
[1,2,3] is a list; (1,2,3) is a tuple. np.array treats them as the same. (list versus tuple does make a difference when creating structured arrays, but that's a more advanced topic.)
Note the shape is (3,) (shape is a tuple)
Making a 2d array from a nested list - a list of lists:
In [91]: np.array([[1,2],[3,4]])
Out[91]:
array([[1, 2],
[3, 4]])
In [92]: _.shape
Out[92]: (2, 2)
np.array takes data, not shape information. It infers shape from the data.
array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
In these examples the object parameter is a list or list of lists. We aren't, at this stage, defining the other parameters.

Using Index Arrays on Columns of an MXNet NDArray

Given an index array index and, say, a matrix A I want a matrix B with the corresponding permutation of the columns of A.
In Numpy I would do the following,
>>> A = np.arange(6).reshape(2,3); A
array([[0, 1, 2],
[3, 4, 5]])
>>> index = [2,0,1]
>>> A[:,index]
array([[2, 0, 1],
[5, 3, 4]])
Is there a natural or efficient way to do this in MXNet? The functions pick() and take() don't seem to work in this way. I managed to come up with the following but it's not elegant.
>>> mx.nd.take(A.T, mx.nd.array([[2],[0],[1]])).T.reshape((2,3))
[[ 2. 0. 1.]
[ 5. 3. 4.]]
<NDArray 2x3 #cpu(0)>
Finally, to throw a wrench into the works, is there a way to do this in-place?
Update Here is a slightly more elegant, but presumably not as efficient (due to the transposition), version of above:
>>> mx.nd.take(A.T, mx.nd.array([2,0,1])).T
[[ 2. 0. 1.]
[ 5. 3. 4.]]
<NDArray 2x3 #cpu(0)>
What you need is the so-called advanced indexing in MXNet. There is a PR submitted for getting elements through advanced indexing from MXNet NDArray and will add the functionality of setting elements to NDArray as well. It is expected to come out in the release 1.0.
https://github.com/apache/incubator-mxnet/pull/8246

Storing multidimensional variable length array with h5py

I'm trying to store a list of variable length arrays in an HDF file with the following procedure:
phn_mfccs = []
# Import wav files
for waveform in files:
phn_mfcc = mfcc(waveform) # produces a variable length multidim array of the shape (x, 13, 1)
# Add MFCC and label to dataset
# phn_mfccs has dimension (len(files),)
# phn_mfccs[i] has variable dimension ([# of frames in ith segment] (variable), 13, 1)
phn_mfccs.append(phn_mfcc)
dt = h5py.special_dtype(vlen=np.dtype('float64'))
mfccs_out.create_dataset('phn_mfccs', data=phn_mfccs, dtype=dt)
It seems like my datatypes aren't working out though -- instead of each element of the mfccs_out dataset containing a multidimensional array, it contains just a 1D array. e.g. if the first phn_mfcc I append originally has dimension (59,13,1), mfccs_out['phn_mfccs'][0] has dimension (59,).
I suspect it is because I'm just using a float64 datatype, and I need something else for an array of arrays? If I don't specify the dataset or try to use dtype='O', though, it spits out an error like "Object dtype 'O' has no native HDF equivalent."
Ideally, what I'd like is for mfccs_out['phn_mfccs'][i] to contain the ith phn_mfcc that I appended to the list phn_mfccs.
The essence of your code is:
phn_mfccs = []
<loop several layers>
phn_mfcc = <some sort of array expanded by one dimension>
phn_mfccs.append(phn_mfcc)
At the end of loops phn_mfccs is a list of arrays. I can't tell from the code what the dtype and shape is. Or whether it differs for each element of the list.
I'm not entirely sure what create_dataset does when given a list of arrays. It may wrap it in np.array.
mfccs_out.create_dataset('phn_mfccs', data=phn_mfccs, dtype=dt)
What does np.array(phn_mfccs) produce? Shape, dtype? If all the elements are arrays of the same shape and dtype it will produce a higher dimensional array. If they differ in shape, it will produce a 1d array with object dtype. Given the error message, I suspect the latter.
I've answered a few vlen questions but haven't worked with it a lot
http://docs.h5py.org/en/latest/special.html
I vaguely recall that the 'ragged' dimension of a h5 array can only be 1d. So a phn_mfccs object array that contains 1d float arrays of varying dimensions might work.
I might come up with a simple example. And I suggest you construct a simpler problem that we can copy-n-paste and experiement with. We don't need to know how you read the data from your directory. We just need to understand the content of the array (list) that you are trying to write.
A 2015 post on vlen arrays
Inexplicable behavior when using vlen with h5py
H5PY - How to store many 2D arrays of different dimensions
1d ragged arrays example
In [24]: f = h5py.File('vlen.h5','w')
In [25]: dt = h5py.special_dtype(vlen=np.dtype('float64'))
In [26]: dataset = f.create_dataset('vlen',(4,), dtype=dt)
In [27]: dataset.value
Out[27]:
array([array([], dtype=float64), array([], dtype=float64),
array([], dtype=float64), array([], dtype=float64)], dtype=object)
In [28]: for i in range(4):
...: dataset[i]=np.arange(i+3)
In [29]: dataset.value
Out[29]:
array([array([ 0., 1., 2.]), array([ 0., 1., 2., 3.]),
array([ 0., 1., 2., 3., 4.]),
array([ 0., 1., 2., 3., 4., 5.])], dtype=object)
If I try to write 2d arrays to dataset I get an error
OSError: Can't prepare for writing data (Src and dest data spaces have different sizes)
The dataset itself may be multidimensional, but the vlen object has to be a 1d array of floats.