I would like to know what it means to flatten e.g. flatten an iterator of iterators. Can you tell me? Are there any C/Java/Python idioms for it?
In this context, to flatten means to remove nesting. For instance, an array of arrays (an array where each element is an array) of integers is nested; if we flatten it we get an array of integers which contains the same values in the same order, but next to each other in a single array, rather than split into several arrays: [[1 2] [3 4]] -> [1 2 3 4]. Same difference with iterators, other collections, and deeper nesting (array of array of sets of iterators of strings).
As for idioms, there aren't really many -- it's not a common task, and often simple. Note that in the case of regular arrays (all nested arrays are of the same size), nested[i][j] is equivalent to nested[i * INNER_ARRAY_SIZE + j]. This is sometimes used to avoid nesting, especially in languages which treat arrays as reference types and thus require many separately-allocated arrays if you nest them. In Python, you can flatten iterables with itertools.chain(*iterable_of_iterables).
Flattening means to remove nesting of sequence types. Python provides itertools.chain(*iterables) for this purpose (http://docs.python.org/library/itertools.html#itertools.chain).
Related
Could someone, please clarify that what is the benefit of using a Numba typed list over an ND array? Also, how do the two compares in terms of speed, and in what context would it be recommended to use the typed list?
Typed lists are useful when your need to append a sequence of elements but you do not know the total number of elements and you could not even find a reasonable bound. Such a data structure is significantly more expensive than a 1D array (both in memory space and computation time).
1D arrays cannot be resized efficiently: a new array needs to be created and a copy must be performed. However, the indexing of 1D arrays is very cheap. Numpy also provide many functions that can natively operate on them (lists are implicitly converted to arrays when passed to a Numpy function and this process is expensive). Note that is the number of items can be bounded to a reasonably size (ie. not much higher than the number of actual element), you can create a big array, then add the elements and finally work on a sub-view of the array.
ND arrays cannot be directly compared with lists. Note that lists of lists are similar to jagged array (they can contains lists of different sizes) while ND array are likes a (fixed-size) N x ... x M table. Lists of lists are very inefficient and often not needed.
As a result, use ND arrays when you can and you do not need to often resize them (or append/remove elements). Otherwise, use typed lists.
When I mix mask-indices and normal indices, I get a weird result:
a = np.zeros((3,2,5))
cond = [True]*5
print(a[0][:,cond].shape) # prints '(2,5)' as expected
print(a[0,:,cond].shape) # prints '(5,2)' which is surprising
i.e., the resulting array is transposed from what I think should happend
Is this a bug? is this a feature? If someone can point me to a piece of documentation that explains this I would be very glad :)
No, this is not a bug. It happens because you have mixed basic indexing (slice) and advanced indexing (integer and boolean array). It's documented here:
Two cases of index combination need to be distinguished:
The advanced indexes are separated by a slice, Ellipsis or newaxis. For example x[arr1, :, arr2].
The advanced indexes are all next to each other. For example x[..., arr1, arr2, :] but not x[arr1, :, 1] since 1 is an advanced
index in this regard.
In the first case, the dimensions resulting from the advanced indexing
operation come first in the result array, and the subspace dimensions
after that. In the second case, the dimensions from the advanced
indexing operations are inserted into the result array at the same
spot as they were in the initial array (the latter logic is what makes
simple advanced indexing behave just like slicing).
Running numpy.allclose(a, b) throws TypeError: invalid type promotion on structured arrays. What would be the correct way of checking whether the contents of two structured arrays are almost equal?
np.allclose does an np.isclose followed by all(). isclose tests abs(x-y) against tolerances, with accomodations for np.nan and np.inf. So it is designed primarily to work with floats, and by extension ints.
The arrays have to work with np.isfinite(a), as well as a-b and np.abs. In short a.astype(float) should work with your arrays.
None of this works with the compound dtype of a structured array. You could though iterate over the fields of the array, and compare those with isclose (or allclose). But you will have ensure that the 2 arrays have matching dtypes, and use some other test on fields that don't work with isclose (eg. string fields).
So in the simple case
all([np.allclose(a[name], b[name]) for name in a.dtype.names])
should work.
If the fields of the arrays are all the same numeric dtype, you could view the arrays as 2d arrays, and do allclose on those. But usually structured arrays are used when the fields are a mix of string, int and float. And in the most general case, there are compound dtypes within dtypes, requiring some sort of recursive testing.
import numpy.lib.recfunctions as rf
has functions to help with complex structured array operations.
Assuming b is a scalar, you can just iterate over the fields of a:
all(np.allclose(a[field], b) for field in a.dtype.names)
Is there a way of defining a matrix (say m) in numpy with rows of different lengths, but such that m stays 2-dimensional (i.e. m.ndim = 2)?
For example, if you define m = numpy.array([[1,2,3], [4,5]]), then m.ndim = 1. I understand why this happens, but I'm interested if there is any way to trick numpy into viewing m as 2D. One idea would be padding with a dummy value so that rows become equally sized, but I have lots of such matrices and it would take up too much space. The reason why I really need m to be 2D is that I am working with Theano, and the tensor which will be given the value of m expects a 2D value.
I'll give here very new information about Theano. We have a new TypedList() type, that allow to have python list with all elements with the same type: like 1d ndarray. All is done, except the documentation.
There is limited functionality you can do with them. But we did it to allow looping over the typed list with scan. It is not yet integrated with scan, but you can use it now like this:
import theano
import theano.typed_list
a = theano.typed_list.TypedListType(theano.tensor.fvector)()
s, _ = theano.scan(fn=lambda i, tl: tl[i].sum(),
non_sequences=[a],
sequences=[theano.tensor.arange(2, dtype='int64')])
f = theano.function([a], s)
f([[1, 2, 3], [4, 5]])
One limitation is that the output of scan must be an ndarray, not a typed list.
No, this is not possible. NumPy arrays need to be rectangular in every pair of dimensions. This is due to the way they map onto memory buffers, as a pointer, itemsize, stride triple.
As for this taking up space: np.array([[1,2,3], [4,5]]) actually takes up more space than a 2×3 array, because it's an array of two pointers to Python lists (and even if the elements were converted to arrays, the memory layout would still be inefficient).
I've been unable to figure out how to access, add, multiply, replace, etc. single columns of a NumPy matrix. I can do this via looping over individual elements of the column, but I'd like to treat the column as a unit, something that I can do with rows.
When I've tried to search I'm usually directed to answers handling NumPy arrays, but this is not the same thing.
Can you provide code that's giving trouble? The operations on columns that you list are among the most basic operations that are supported and optimized in NumPy. Consider looking over the tutorial on NumPy for MATLAB users, which has many examples of accessing rows or columns, performing vectorized operations on them, and modifying them with copies or in-place.
NumPy for MATLAB Users
Just to clarify, suppose you have a 2-dimensional NumPy ndarray or matrix called a. Then a[:, 0] would access the first column just the same as a[0] or a[0, :] would access the first row. Any operations that work for rows should work for columns as well, with some caveats for broadcasting rules and certain mathematical operations that depend upon array alignment. You can also use the numpy.transpose(a) function (which is also exposed with a.T) to transpose a making the columns become rows.