I'm trying to understand numpy's where condition.
>>> import numpy as np
>>> x = np.arange(9.).reshape(3, 3)
>>> x
array([[ 0.,  1.,  2.],
       [ 3.,  4.,  5.],
       [ 6.,  7.,  8.]])
>>> np.where( x > 5 )
(array([2, 2, 2]), array([0, 1, 2]))
In the above case, what does the output actually mean? I can see array([0, 1, 2]) in the input, but what is array([2, 2, 2])?
The first array gives the row indices and the second array gives the corresponding column indices.
If the array is following:
array([[ 0.,  1.,  2.],
       [ 3.,  4.,  5.],
       [ 6.,  7.,  8.]])
Then the following
(array([2, 2, 2]), array([0, 1, 2]))
Can be interpreted as
x[2, 0] => 6
x[2, 1] => 7
x[2, 2] => 8
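To confirm this pairing, you can index x with the tuple that np.where returns; since the row and column arrays are matched element-wise, this recovers exactly the values that satisfied the condition. A small sketch using the same x as above:

```python
import numpy as np

x = np.arange(9.).reshape(3, 3)
rows, cols = np.where(x > 5)   # rows = [2, 2, 2], cols = [0, 1, 2]

# Indexing with the two index arrays pairs them up: x[2,0], x[2,1], x[2,2]
print(x[rows, cols])           # -> [6. 7. 8.]
```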
You might also want to know where those values appear visually in your array. In that case, you can return the array's value where the condition is True and a fill value where it is False. In the example below, the value of x is returned at every position where x > 5; otherwise -1 is assigned.
x = np.arange(9.).reshape(3, 3)
np.where(x>5, x, -1)
array([[-1., -1., -1.],
       [-1., -1., -1.],
       [ 6.,  7.,  8.]])
Three elements were found, located at (2, 0), (2, 1), and (2, 2).
By the way, trying help(np.where) will help you a lot.
I want to compute the dot product between two numpy arrays.
For example, my arrays have shapes (3,) and (1,), so from basic math understanding I should get a vector of shape (3, 1). However, numpy's dot does not give that result. In general, my inputs would have shapes (x, n) and (n, x), and I would like to get shape (x, x), or a scalar if x = 1.
The only real issue here is that you're using arrays of size (3,) and (1,) but you should be using (3,1) and (1,1). With that it behaves exactly as you want/expect:
>>> np.dot([3, 2, 1], [1])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: shapes (3,) and (1,) not aligned: 3 (dim 0) != 1 (dim 0)
>>> np.dot([[3], [2], [1]], [[1]])
array([[3],
       [2],
       [1]])
For (x, n) and (n, x) shapes:
>>> x = 5
>>> n = 4
>>> A = np.ones((x, n))
>>> B = np.ones((n, x))
>>> A.dot(B)
array([[ 4.,  4.,  4.,  4.,  4.],
       [ 4.,  4.,  4.,  4.,  4.],
       [ 4.,  4.,  4.,  4.,  4.],
       [ 4.,  4.,  4.,  4.,  4.],
       [ 4.,  4.,  4.,  4.,  4.]])
>>> A.dot(B).shape
(5, 5)
Again, exactly as you want/expect. Note that in numpy, an array with shape (n,) is a one-dimensional array, while an array with shape (n, 1) is a two-dimensional array (a column vector). Two-dimensional arrays are needed for dot to behave like ordinary matrix multiplication.
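If your data does arrive as one-dimensional arrays of shape (n,), a small sketch of reshaping them into column/row form before calling dot:

```python
import numpy as np

a = np.array([3., 2., 1.])   # shape (3,)
b = np.array([1.])           # shape (1,)

# Reshape to (3, 1) and (1, 1) so the inner dimensions line up
result = a.reshape(-1, 1).dot(b.reshape(1, -1))
print(result.shape)          # -> (3, 1)
```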
I'm trying to feed 1D numpy arrays (flattened images) via a generator into an h5py data file in order to create training and validation matrices.
The following code was adapted from a solution (I can't find it now) in which the data argument of the h5py File object's create_dataset method is given the result of a call to np.fromiter with a generator as one of its arguments.
from scipy.misc import imread
import h5py
import numpy as np
import os
# Creating h5 data file
f = h5py.File('../data.h5', 'w')
# Source directory for image data
src = '/datasets/aic540/train/images/'
# Showing quantity and dimensionality of data
images = os.listdir(src)
ex_img = imread(src + images[0])
flat_img = ex_img.flatten()
print "# of images is {}".format(len(images))
print "image shape is {}".format(ex_img.shape)
print "flattened image shape is {}".format(flat_img.shape)
# Creating generator to feed in data to h5py's `create_dataset` function
gen = (imread(src + i).flatten().astype(np.int8) for i in os.listdir(src))
# Creating h5 dataset
f.create_dataset(name='training',
#shape=(59482, 1555200),
data=np.fromiter(gen, dtype=np.int8))
Output:
# of images is 59482
image shape is (540, 960, 3)
flattened image shape is (1555200,)
Traceback (most recent call last):
File "process_images.py", line 30, in <module>
data=np.fromiter(gen, dtype=np.int8))
ValueError: setting an array element with a sequence.
I've read, when searching for this error in this context, that the problem is that np.fromiter() needs a list and not a generator (which seems contrary to what the name "fromiter" implies). Wrapping the generator in a list call, list(gen), lets the code run, but of course it uses up all the memory expanding that list before create_dataset is ever called.
How do I use a generator to feed data into an H5py data file?
If my approach is entirely wrong, what is the correct way to build a very large numpy matrix that doesn't fit in memory -- using H5py or otherwise?
The "setting an array element with a sequence" error comes from what you are trying to feed fromiter, not from the generator part.
In Python 3, range is generator-like:
In [15]: np.fromiter(range(3),dtype=int)
Out[15]: array([0, 1, 2])
In [16]: np.fromiter((2*x for x in range(3)),dtype=int)
Out[16]: array([0, 2, 4])
But if I start with a 2d array (which imread produces, right?), and create a generator expression as you do:
In [17]: gen = (np.ones((2,3)).flatten().astype(np.int8) for i in range(3))
In [18]: list(gen)
Out[18]:
[array([1, 1, 1, 1, 1, 1], dtype=int8),
 array([1, 1, 1, 1, 1, 1], dtype=int8),
 array([1, 1, 1, 1, 1, 1], dtype=int8)]
I generate a list of arrays.
In [19]: gen = (np.ones((2,3)).flatten().astype(np.int8) for i in range(3))
In [21]: np.fromiter(gen, np.int8)
...
ValueError: setting an array element with a sequence.
np.fromiter creates a 1d array from an iterator that provides 'numbers' one at a time, not something that dishes out lists or arrays.
In any case, np.fromiter creates a full array, not some sort of generator. There's no such thing as an array 'generator'.
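If you do want a single array out of such a generator, one workaround (a sketch; note it still materializes the full result array in memory) is to flatten the stream of arrays into a stream of scalars with itertools.chain.from_iterable, which is the one-number-at-a-time input that np.fromiter expects:

```python
import itertools

import numpy as np

# Three stand-in "flattened images" of 6 pixels each
gen = (np.ones((2, 3)).flatten().astype(np.int8) for i in range(3))

# chain.from_iterable yields the scalars one at a time; reshape afterwards
flat = np.fromiter(itertools.chain.from_iterable(gen), dtype=np.int8)
arr = flat.reshape(3, 6)
print(arr.shape)   # -> (3, 6)
```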
Even without chunking you can write data to the file by 'row' or other slice.
In [28]: f = h5py.File('test.h5', 'w')
In [29]: data = f.create_dataset(name='test',shape=(100,10))
In [30]: for i in range(100):
...: data[i,:] = np.arange(i,i+10)
...:
In [31]: data
Out[31]: <HDF5 dataset "test": shape (100, 10), type "<f4">
The equivalent in your case is to load an image, reshape it, and write it immediately to the h5py dataset. No need to collect all the images in an array or list.
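A sketch of that idea for the image case; the dataset sizes here are small stand-ins (the question uses 59482 images of length 1555200), and the commented-out imread call marks where the real image loading would go:

```python
import os
import tempfile

import h5py
import numpy as np

n_images, flat_len = 4, 12   # stand-ins for 59482 and 1555200
path = os.path.join(tempfile.mkdtemp(), 'data.h5')

with h5py.File(path, 'w') as f:
    dset = f.create_dataset('training', shape=(n_images, flat_len),
                            dtype=np.int8)
    for i in range(n_images):
        # Real code would do: img = imread(src + name).flatten().astype(np.int8)
        img = np.full(flat_len, i, dtype=np.int8)  # stand-in flattened image
        dset[i, :] = img                           # written straight to the file

with h5py.File(path, 'r') as f:
    print(f['training'].shape)   # -> (4, 12)
```

Only one image is held in memory at a time; the rest lives on disk in the HDF5 file.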
read 10 rows:
In [33]: data[:10,:]
Out[33]:
array([[  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.],
       [  2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.],
       [  3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,  12.],
       [  4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,  12.,  13.],
       [  5.,   6.,   7.,   8.,   9.,  10.,  11.,  12.,  13.,  14.],
       [  6.,   7.,   8.,   9.,  10.,  11.,  12.,  13.,  14.,  15.],
       [  7.,   8.,   9.,  10.,  11.,  12.,  13.,  14.,  15.,  16.],
       [  8.,   9.,  10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.],
       [  9.,  10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.]], dtype=float32)
Enabling chunking might help with really large datasets, but I don't have experience in that area.
How can I individually access one specific element in each row with numpy?
In[308]: cards
Out[296]:
array([[  3.,   8.,   7.,  12.,   1.,   4.,  12.],
       [  5.,   6.,   2.,  11.,  10.,   9.,   6.],
       [  3.,   4.,   3.,   9.,   3.,   3.,  10.]])
The following will access the same columns [1, 2, 1] in all rows. But I want column 1 of the first row, column 2 of the second row, and column 1 of the third row instead.
cards[:,[1,2,1]]
array([[ 8.,  7.,  8.],
       [ 6.,  2.,  6.],
       [ 4.,  3.,  4.]])
Desired output:
array([[ 8.],
       [ 2.],
       [ 4.]])
You can pass indices for both the rows and the columns:
In [91]: cards[[0, 1, 2], [1, 2, 1]]
Out[91]: array([ 8., 2., 4.])
If the indices have matching shape, they are processed pair-wise. More details can be found in the documentation.
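When the row indices are simply 0, 1, 2, ..., you can generate them with np.arange instead of writing them out; a sketch that also produces the column-shaped output from the question:

```python
import numpy as np

cards = np.array([[  3.,   8.,   7.,  12.,   1.,   4.,  12.],
                  [  5.,   6.,   2.,  11.,  10.,   9.,   6.],
                  [  3.,   4.,   3.,   9.,   3.,   3.,  10.]])

cols = [1, 2, 1]
rows = np.arange(cards.shape[0])   # [0, 1, 2], one row index per column index
picked = cards[rows, cols]
print(picked)                      # -> [8. 2. 4.]
print(picked[:, np.newaxis])       # column of shape (3, 1), as in the desired output
```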
You can pass the indices as two iterables, pairing one row index with one column index:
cards[[0, 1, 2], [1, 2, 1]]
Would it be possible to use numpy/scipy to multiply matrices composed of polynomials?
Specifically, I wish to multiply a 120 by 120 sparse matrix, whose entries can look like a + 7*b + c, by itself.
Honestly, I haven't tried very hard to do this. I see that there is a polynomial module in numpy but I have no experience with it. I am just hoping that someone sees this and says "obviously it's possible, do this".
There is one relevant question asked before from what I've seen: Matrices whose entries are polynomials
I don't know about sparse, but numpy object arrays work fine.
In [1]: from numpy.polynomial import Polynomial as P
In [2]: a = np.array([[P([1,2]), P([3,4])]]*2)
In [3]: a
Out[3]:
array([[Polynomial([ 1.,  2.], [-1,  1], [-1,  1]),
        Polynomial([ 3.,  4.], [-1,  1], [-1,  1])],
       [Polynomial([ 1.,  2.], [-1,  1], [-1,  1]),
        Polynomial([ 3.,  4.], [-1,  1], [-1,  1])]], dtype=object)
In [4]: np.dot(a, a)
Out[4]:
array([[Polynomial([  4.,  14.,  12.], [-1.,  1.], [-1.,  1.]),
        Polynomial([ 12.,  34.,  24.], [-1.,  1.], [-1.,  1.])],
       [Polynomial([  4.,  14.,  12.], [-1.,  1.], [-1.,  1.]),
        Polynomial([ 12.,  34.,  24.], [-1.,  1.], [-1.,  1.])]], dtype=object)
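As a sanity check, the (0, 0) entry of the product should be the row-by-column sum P([1,2])*P([1,2]) + P([3,4])*P([1,2]); a small sketch verifying that by hand:

```python
import numpy as np
from numpy.polynomial import Polynomial as P

a = np.array([[P([1, 2]), P([3, 4])]] * 2)
prod = np.dot(a, a)

# Row 0 of a dotted with column 0 of a:
# (1 + 2x)^2 + (3 + 4x)(1 + 2x) = 4 + 14x + 12x^2
expected = P([1, 2]) * P([1, 2]) + P([3, 4]) * P([1, 2])
print(prod[0, 0].coef)   # -> [ 4. 14. 12.]
print(expected.coef)     # -> [ 4. 14. 12.]
```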
If I have two ndarrays:
a.shape # returns (200,300, 3)
b.shape # returns (200, 300)
numpy.vstack((a,b)) # Gives error
Would print out the error:
ValueError: arrays must have same number of dimensions
I tried doing np.vstack((a.reshape(-1, 300), b)), which kind of works, but the output is very weird.
You don't specify what final shape you actually want. If it's (200, 300, 4), you can use dstack instead:
>>> import numpy as np
>>> a = np.random.random((200,300,3))
>>> b = np.random.random((200,300))
>>> c = np.dstack((a,b))
>>> c.shape
(200, 300, 4)
Basically, when you're stacking, the lengths have to agree in all the other axes.
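An equivalent way to express the same idea, assuming the goal is (200, 300, 4), is to give b a trailing axis of length 1 and concatenate along the last axis:

```python
import numpy as np

a = np.random.random((200, 300, 3))
b = np.random.random((200, 300))

# b[..., np.newaxis] has shape (200, 300, 1), so all other axes agree
c = np.concatenate((a, b[..., np.newaxis]), axis=2)
print(c.shape)   # -> (200, 300, 4)
```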
[Updated based on comment:]
If you want (800, 300) you could try something like this:
>>> a = np.ones((2, 3, 3)) * np.array([1,2,3])
>>> b = np.ones((2, 3)) * 4
>>> c = np.dstack((a,b))
>>> c
array([[[ 1.,  2.,  3.,  4.],
        [ 1.,  2.,  3.,  4.],
        [ 1.,  2.,  3.,  4.]],

       [[ 1.,  2.,  3.,  4.],
        [ 1.,  2.,  3.,  4.],
        [ 1.,  2.,  3.,  4.]]])
>>> c.T.reshape(c.shape[0]*c.shape[-1], -1)
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 3.,  3.,  3.],
       [ 3.,  3.,  3.],
       [ 4.,  4.,  4.],
       [ 4.,  4.,  4.]])
>>> c.T.reshape(c.shape[0]*c.shape[-1], -1).shape
(8, 3)