what does the np.array command do? - numpy

I have a question about the np.array command.
Let's say the content of caches, when displayed with the print command, is
caches = [array([1,2,3]),array([1,2,3]),...,array([1,2,3])]
Then I executed the following code:
train_x = np.array(caches)
When I print the content of train_x I have:
train_x = [[1,2,3],[1,2,3],...,[1,2,3]]
Now, the behavior is exactly what I want, but I do not really understand in depth what the np.array(caches) command has done. Can somebody explain this to me?

Making a 1d array
In [89]: np.array([1,2,3])
Out[89]: array([1, 2, 3])
In [90]: np.array((1,2,3))
Out[90]: array([1, 2, 3])
[1,2,3] is a list; (1,2,3) is a tuple. np.array treats them as the same. (list versus tuple does make a difference when creating structured arrays, but that's a more advanced topic.)
Note the shape is (3,) (shape is a tuple)
Making a 2d array from a nested list - a list of lists:
In [91]: np.array([[1,2],[3,4]])
Out[91]:
array([[1, 2],
       [3, 4]])
In [92]: _.shape
Out[92]: (2, 2)
np.array takes data, not shape information. It infers shape from the data.
array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
In these examples the object parameter is a list or list of lists. We aren't, at this stage, defining the other parameters.
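Applied to the question, a list of equal-length 1d arrays is handled the same way as a list of lists. The following is a minimal sketch; the contents of caches are made up for illustration, and the comparison with np.stack is only there to show what the result corresponds to.

```python
import numpy as np

# A list of equal-length 1d arrays, like `caches` in the question
# (the values here are made up for illustration).
caches = [np.array([1, 2, 3]) for _ in range(4)]

# np.array infers the shape from the nesting: 4 items, each of length 3.
train_x = np.array(caches)
print(train_x.shape)   # (4, 3)

# For equal-length inputs this matches stacking along a new first axis.
print(np.array_equal(train_x, np.stack(caches)))   # True
```

So the list level becomes the first axis and each inner array becomes a row.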

Related

Newshape parameter in numpy.reshape can be a list?

The numpy official document specifies the synopsis of reshape as follows:
numpy.reshape(a, newshape, order='C')
where newshape can be an int or tuple of ints.
The document does not say newshape can be a list, but my testing indicates that newshape can be a list. For example:
a = np.array([[1,2,3],[4,5,6]])
b = a.reshape([3,2])
>>> b
array([[1, 2],
       [3, 4],
       [5, 6]])
Is the feature of providing newshape as a list a nonstandard extension, so that the document does not mention it?
Tuples and lists behave similarly here, and NumPy functions often accept sequences such as lists, tuples, or arrays interchangeably where a shape is expected. The documentation does not list every accepted type. I guess it mentions tuple in this case because the shape attribute of an array returns a tuple of ints.
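A quick check of that claim, as a sketch rather than a statement about what the documentation guarantees: the shape is passed positionally, once as a tuple and once as a list, and the results are compared.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

b_tuple = a.reshape((3, 2))   # documented form: tuple of ints
b_list = a.reshape([3, 2])    # a list is accepted as well

print(np.array_equal(b_tuple, b_list))   # True
print(type(a.shape))                     # <class 'tuple'>
```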

How to reliably create a multi-dimensional array and a one-dimensional view of it in numpy, so that the memory layout is contiguous?

According to the documentation of numpy.ravel,
Return a contiguous flattened array.
A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.
For convenience and efficiency of indexing, I would like to have a one-dimensional view of a 2-dimensional array. I am using ravel for creating the view, and so far so good.
However, it is not clear to me what is meant by "A copy is made only if needed." If some day a copy is created while my code is executed, the code will stop working.
I know that there is numpy.reshape, but its documentation says:
It is not always possible to change the shape of an array without copying the data.
In any case, I would like the data to be contiguous.
How can I reliably create a 2-dimensional array and a 1-dimensional view into it? I would like the data to be contiguous in memory (for efficiency). Are there any attributes to specify when creating the 2-dimensional array to ensure that it is contiguous and ravel will not need to copy it?
Related question: What is the difference between flatten and ravel functions in numpy?
The warnings for ravel and reshape are the same. ravel is just reshape(-1), to 1d. Conversely, the reshape docs tell us that we can think of it as first doing a ravel.
Normal array construction produces a contiguous array, and reshape with the same order will produce a view. You can visually test that by looking at the ravel and checking if the values appear in the expected order.
In [348]: x = np.arange(6).reshape(2,3)
In [349]: x
Out[349]:
array([[0, 1, 2],
       [3, 4, 5]])
In [350]: x.ravel()
Out[350]: array([0, 1, 2, 3, 4, 5])
I started with the arange, reshaped it to 2d, and back to 1d. No change in order.
But if I make a sliced view:
In [351]: x[:,:2]
Out[351]:
array([[0, 1],
       [3, 4]])
In [352]: x[:,:2].ravel()
Out[352]: array([0, 1, 3, 4])
This ravel has a gap, and thus is a copy.
Transpose is also a view, but its default (order C) ravel cannot be a view:
In [353]: x.T
Out[353]:
array([[0, 3],
       [1, 4],
       [2, 5]])
In [354]: x.T.ravel()
Out[354]: array([0, 3, 1, 4, 2, 5])
Except, if we specify the right order, the ravel is a view.
In [355]: x.T.ravel(order='F')
Out[355]: array([0, 1, 2, 3, 4, 5])
reshape has an extensive discussion of order. And transpose actually works by returning a view with different shape and strides. For a C-contiguous 2d array transpose produces an order F array.
So as long as you are aware of manipulations like this, you can safely assume that the reshape/ravel of a normally constructed array is a view of the same contiguous data.
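Instead of judging by eye, the view-versus-copy question can be checked directly. This is a sketch using np.shares_memory and the flags attribute on the same x as in the session; it reports what happened, it does not force a view.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
print(x.flags['C_CONTIGUOUS'])                       # True

# ravel of a C-contiguous array is a view onto the same buffer.
print(np.shares_memory(x, x.ravel()))                # True

# A column slice has gaps in memory, so its ravel must copy.
print(np.shares_memory(x, x[:, :2].ravel()))         # False

# The transpose is F-ordered: a C-order ravel copies, an F-order ravel does not.
print(np.shares_memory(x, x.T.ravel()))              # False
print(np.shares_memory(x, x.T.ravel(order='F')))     # True
```

An assertion on np.shares_memory right after creating the 1d array is one way to fail early if a copy ever sneaks in.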
Note that even though the ravel in [352] is a copy, assignment through .flat changes the original:
In [361]: x[:,:2].flat[:] = [3,4,2,1]
In [362]: x
Out[362]:
array([[3, 4, 2],
       [2, 1, 5]])
x[:,:2].ravel()[:] = [10,11,2,3] does not change x. In cases like this y = x[:,:2].flat may be more useful than the ravel equivalent.
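The difference between the two can be sketched side by side. The names sub, r, f and x_after_ravel are only for illustration.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
sub = x[:, :2]                 # non-contiguous view of x

r = sub.ravel()                # a copy, because of the gap
r[:] = [10, 11, 2, 3]          # writes into the copy only
x_after_ravel = x.copy()
print(x_after_ravel.tolist())  # [[0, 1, 2], [3, 4, 5]]

f = sub.flat                   # flat iterator over the view itself
f[:] = [3, 4, 2, 1]            # writes through to x
print(x.tolist())              # [[3, 4, 2], [2, 1, 5]]
```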

Issue appending ndarrays with different shapes

I have a numpy ndarray with shape (25,2) and I am trying to append one more value that has shape (2,).
I have tried using the append method, but so far no luck.
Any thoughts?
Thanks!
For append to work in this way you'll need to satisfy two conditions specified in the documentation:
The appended object must have the same number of dimensions as the array, so it should have shape (1, 2) rather than (2,).
You must specify an axis to concatenate along; otherwise numpy flattens both arrays.
For example:
import numpy
x = numpy.ones((3, 2))
y = [[1, 2]]
numpy.append(x, y, axis=0)
Results in:
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  2.]])
What kind of errors did you get with the append method? 'no luck' is as bad a descriptor as 'didnt work'. In a proper question you should show the expected value along with errors. However this topic comes up often enough that we can make good guesses.
In [336]: a = np.ones((3,2),int)
In [337]: b = np.zeros((2,),int)
But first I'll be pedantic and try an append method:
In [338]: a.append(b)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-338-d6231792f85d> in <module>()
----> 1 a.append(b)
AttributeError: 'numpy.ndarray' object has no attribute 'append'
lists have an append method; numpy arrays do not.
There is a poorly named append function:
In [339]: np.append(a,b)
Out[339]: array([1, 1, 1, 1, 1, 1, 0, 0])
In [340]: _.reshape(-1,2)
Out[340]:
array([[1, 1],
       [1, 1],
       [1, 1],
       [0, 0]])
That works - in a way. But if I read the docs, and provide an axis parameter:
In [341]: np.append(a,b, axis=0)
...
-> 5166 return concatenate((arr, values), axis=axis)
ValueError: all the input arrays must have same number of dimensions
With an axis, np.append just calls np.concatenate, packing the 2 arguments into a tuple.
If this is the error you got, and didn't understand it, you may need to review basic numpy docs about dimensions and shapes.
a is 2d, b is 1d. To concatenate, we need to reshape b so it is (1,2), a shape that is compatible with the (3,2) of a. There are several ways of doing that:
In [342]: np.concatenate((a, b.reshape(1,2)), axis=0)
Out[342]:
array([[1, 1],
       [1, 1],
       [1, 1],
       [0, 0]])
Stay away from np.append; it's too confusing for many beginners, and doesn't add anything significant to the base concatenate.
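For reference, a few of the other ways to give b the extra dimension, sketched with the same a and b as above. The result names r1, r2 and r3 are only for illustration.

```python
import numpy as np

a = np.ones((3, 2), int)
b = np.zeros((2,), int)

# Add a leading axis by indexing with None (same as np.newaxis).
r1 = np.concatenate((a, b[None, :]), axis=0)

# np.atleast_2d turns a (2,) array into (1, 2).
r2 = np.concatenate((a, np.atleast_2d(b)), axis=0)

# np.vstack promotes 1d inputs to rows before concatenating.
r3 = np.vstack((a, b))

print(r1.shape, r2.shape, r3.shape)   # (4, 2) (4, 2) (4, 2)
```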

What does tf.gather_nd intuitively do?

Can you intuitively explain or give more examples about tf.gather_nd for indexing and slicing into high-dimensional tensors in Tensorflow?
I read the API docs, but they are so concise that I find it hard to follow the function's concept.
Ok, so think about it like this:
You are providing a list of index values that are used to index the provided tensor and get those slices. Each entry along the first dimension of the indices is one lookup to perform. Let's pretend that the tensor is just a list of lists.
[[0]] means you want to get one specific slice(list) at index 0 in the provided tensor. Just like this:
[tensor[0]]
[[0], [1]] means you want to get two specific slices at indices 0 and 1 like this:
[tensor[0], tensor[1]]
Now what if the tensor has more than one dimension? We do the same thing:
[[0, 0]] means you want to get one element at index [0,0], that is, item 0 of the 0-th list. Like this:
[tensor[0][0]]
[[0, 1], [2, 3]] means you want return two slices at the indices and dimensions provided. Like this:
[tensor[0][1], tensor[2][3]]
I hope that makes sense. I tried using Python indexing to help explain how it would look in Python to do this to a list of lists.
You provide a tensor and indices representing locations in that tensor. It returns the elements of the tensor corresponding to the indices you provide.
EDIT: An example
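import tensorflow as tf
# tf.Session is the TensorFlow 1.x API; in 2.x it lives at tf.compat.v1.Session
sess = tf.Session()
x = [[1,2,3],[4,5,6]]
# pick x[1][1] and x[1][2]
y = tf.gather_nd(x, [[1,1],[1,2]])
print(sess.run(y))
[5 6]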
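The same lookup rule can be sketched in plain NumPy, which makes it easy to experiment without a session. gather_nd_sketch is a hypothetical helper written for this explanation, not TensorFlow's implementation, and it ignores options such as batch_dims.

```python
import numpy as np

def gather_nd_sketch(params, indices):
    """Hypothetical NumPy emulation of the basic gather_nd lookup."""
    params = np.asarray(params)
    indices = np.asarray(indices)
    # The last axis of `indices` holds one index tuple per lookup.
    # Moving it to the front gives one index array per indexed dimension.
    return params[tuple(np.moveaxis(indices, -1, 0))]

x = [[1, 2, 3], [4, 5, 6]]

print(gather_nd_sketch(x, [[1, 1], [1, 2]]).tolist())   # [5, 6]
print(gather_nd_sketch(x, [[0]]).tolist())              # [[1, 2, 3]]
print(gather_nd_sketch(x, [[0], [1]]).tolist())         # [[1, 2, 3], [4, 5, 6]]
```

Index tuples shorter than the number of dimensions return whole slices, and full-length tuples return single elements, as in the description above.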

Irregular Numpy matrix

In Numpy, it appears that the matrix can simply be a nested list of anything not limited to numbers. For example
import numpy as np
a = [[1,2,5],[3,'r']]
b = np.matrix(a)
generates no complaints.
What is the purpose of this tolerance, when a plain list can already hold an object that is not a matrix in the strict mathematical sense?
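What you've created is an object dtype array (recent NumPy releases raise an error for ragged input like this unless you pass dtype=object; older releases inferred it):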
In [302]: b=np.array([[1,2,5],[3,'r']])
In [303]: b
Out[303]: array([[1, 2, 5], [3, 'r']], dtype=object)
In [304]: b.shape
Out[304]: (2,)
In [305]: b[0]
Out[305]: [1, 2, 5]
In [306]: b[1]=None
In [307]: b
Out[307]: array([[1, 2, 5], None], dtype=object)
The elements of this array are pointers - pointers to objects elsewhere in memory. It has a data buffer just like other arrays, in this case holding 2 pointers:
In [308]: b.__array_interface__
Out[308]:
{'data': (169809984, False),
'descr': [('', '|O')],
'shape': (2,),
'strides': None,
'typestr': '|O',
'version': 3}
In [309]: b.nbytes
Out[309]: 8
In [310]: b.itemsize
Out[310]: 4
It is very much like a list - which also stores object pointers in a buffer. But it differs in that it doesn't have an append method, but does have all the array ones like .reshape.
And for many operations, numpy treats such an array like a list - iterating over the pointers, etc. Many of the math operations that work with numeric values fail with object dtypes.
Why allow this? Partly it's just a generalization, expanding the concept of element values/dtypes beyond the simple numeric and string ones. numpy also allows compound dtypes (structured arrays). MATLAB expanded their matrix class to include cells, which are similar.
I see a lot of questions on SO about object arrays. Sometimes they are produced in error, as in the question "Creating numpy array from list gives wrong shape".
Sometimes they are created intentionally. pandas readily changes a data series to object dtype to accommodate a mix of values (string, nan, int).
np.array() tries to create as high a dimension array as it can, resorting to object dtype only when it can't, for example when the sublists differ in length. In fact you have to resort to special construction methods to create an object array when the sublists are all the same.
This is still an object array, but the dimension is higher:
In [316]: np.array([[1,2,5],[3,'r',None]])
Out[316]:
array([[1, 2, 5],
       [3, 'r', None]], dtype=object)
In [317]: _.shape
Out[317]: (2, 3)
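One such special construction, as a sketch: create an empty object array of the desired shape and fill it element by element, so that each sublist is stored as a single object. The names ragged and obj are only for illustration.

```python
import numpy as np

# Ragged sublists: request object dtype explicitly; the result is 1d.
ragged = np.array([[1, 2, 5], [3, 'r']], dtype=object)
print(ragged.shape)     # (2,)

# Equal-length sublists would normally become a (2, 3) array.
# To keep them as 2 list objects, fill a preallocated object array.
obj = np.empty(2, dtype=object)
obj[0] = [1, 2, 5]
obj[1] = [3, 'r', None]
print(obj.shape)        # (2,)
print(type(obj[0]))     # <class 'list'>
```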