what does myarray[0][:,0] mean - numpy

This is an excerpt from some documentation:
lambda ind, r: 1.0 + any(np.array(points_2d)[ind][:,0] == 0.0)
But I don't understand np.array(points_2d)[ind][:,0].
It seems equivalent to myarray[0][:,0], which doesn't make sense to me.
Can anyone help to explain?

With points_2d from earlier in the doc:
In [38]: points_2d = [(0., 0.), (0., 1.), (1., 1.), (1., 0.),
...: (0.5, 0.25), (0.5, 0.75), (0.25, 0.5), (0.75, 0.5)]
In [39]: np.array(points_2d)
Out[39]:
array([[0. , 0. ],
[0. , 1. ],
[1. , 1. ],
[1. , 0. ],
[0.5 , 0.25],
[0.5 , 0.75],
[0.25, 0.5 ],
[0.75, 0.5 ]])
Indexing with a scalar gives a 1d array, which can't be further indexed with [:,0].
In [40]: np.array(points_2d)[0]
Out[40]: array([0., 0.])
But with a list or slice:
In [41]: np.array(points_2d)[[0,1,2]]
Out[41]:
array([[0., 0.],
[0., 1.],
[1., 1.]])
In [42]: np.array(points_2d)[[0,1,2]][:,0]
Out[42]: array([0., 0., 1.])
So this selects the first column of a subset of rows.
In [43]: np.array(points_2d)[[0,1,2]][:,0]==0.0
Out[43]: array([ True, True, False])
In [44]: any(np.array(points_2d)[[0,1,2]][:,0]==0.0)
Out[44]: True
I think they could have used:
In [45]: np.array(points_2d)[[0,1,2],0]
Out[45]: array([0., 0., 1.])
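To make the comparison concrete, here is a minimal sketch (assuming `ind` is a list of row indices, which the excerpt does not state explicitly) showing that the chained form and the single-step form select the same values, and that a scalar index fails at the second step:

```python
import numpy as np

points = np.array([(0., 0.), (0., 1.), (1., 1.), (1., 0.)])
ind = [0, 1, 2]  # assumed: a list of row indices, not a scalar

chained = points[ind][:, 0]   # select rows first, then take column 0
combined = points[ind, 0]     # same selection in one indexing step
print(np.array_equal(chained, combined))

# A scalar index drops a dimension, so [:, 0] then has too many indices.
try:
    points[0][:, 0]
    scalar_ok = True
except IndexError:
    scalar_ok = False
print(scalar_ok)

# The test used in the lambda: does any selected point have x == 0?
has_zero_x = bool(np.any(points[ind, 0] == 0.0))
```

The single-step form also avoids building the intermediate array of selected rows.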

Related

I'm using a mask to slice a numpy array, but the output is flattened. How do I retain the number of columns?

Here is what I have so far:
arr = np.round(np.random.uniform(0,1,size = (10,10)),decimals = 0)
print(arr)
arr2 = np.cumsum(arr,axis=0)
print(arr2)
mask = np.where((arr == 1)&(arr2<=3),1,0)
print(mask)
population = np.round(np.random.uniform(0,5,size=(10,10)),decimals=0)
print(population)
maskedPop = population[mask==1]
print(maskedPop)
This outputs a flattened array. Is there a way to keep the 10 columns, so that the output would be 3x10?
Your code, reduced in scale:
In [153]: arr = np.round(np.random.uniform(0,1,size = (5,5)),decimals = 0)
...: print(arr)
...: arr2 = np.cumsum(arr,axis=0)
...: print(arr2)
...: mask = np.where((arr == 1)&(arr2<=3),1,0)
...: print(mask)
...: population = np.round(np.random.uniform(0,5,size=(5,5)),decimals=0)
...: print(population)
...: print(mask==1)
...: maskedPop = population[mask==1]
...: print(maskedPop)
The print results - I added the mask==1 line, since that's what's doing the indexing:
[[0. 1. 1. 0. 1.]
[1. 0. 1. 1. 1.]
[1. 0. 0. 1. 1.]
[1. 1. 0. 0. 1.]
[0. 0. 0. 0. 0.]]
[[0. 1. 1. 0. 1.]
[1. 1. 2. 1. 2.]
[2. 1. 2. 2. 3.]
[3. 2. 2. 2. 4.]
[3. 2. 2. 2. 4.]]
[[0 1 1 0 1]
[1 0 1 1 1]
[1 0 0 1 1]
[1 1 0 0 0]
[0 0 0 0 0]]
[[0. 5. 2. 2. 2.]
[1. 4. 2. 4. 0.]
[2. 3. 3. 2. 2.]
[4. 4. 3. 1. 3.]
[4. 2. 2. 1. 5.]]
[[False True True False True]
[ True False True True True]
[ True False False True True]
[ True True False False False]
[False False False False False]]
[5. 2. 2. 1. 2. 4. 0. 2. 2. 2. 4. 4.]
Count the number of True values per row and per column: they are not all the same. Tell us how this selection could retain some sort of 2d result!
===
I see you already display mask, so mask==1 is the same as
In [158]: mask.astype(bool)
Out[158]:
array([[False, True, True, False, True],
[ True, False, True, True, True],
[ True, False, False, True, True],
[ True, True, False, False, False],
[False, False, False, False, False]])
There is a MaskedArray class that lets you work with an array with certain values 'masked-out':
In [161]: np.ma.masked_array(population, mask!=1)
Out[161]:
masked_array(
data=[[--, 5.0, 2.0, --, 2.0],
[1.0, --, 2.0, 4.0, 0.0],
[2.0, --, --, 2.0, 2.0],
[4.0, 4.0, --, --, --],
[--, --, --, --, --]],
mask=[[ True, False, False, True, False],
[False, True, False, False, False],
[False, True, True, False, False],
[False, False, True, True, True],
[ True, True, True, True, True]],
fill_value=1e+20)
===
Another way to keep the 2d shape is to overwrite the unwanted values in a copy, here with nan:
In [162]: mpop = population.copy()
In [163]: mpop[mask!=1] = np.nan
In [164]: mpop
Out[164]:
array([[nan, 5., 2., nan, 2.],
[ 1., nan, 2., 4., 0.],
[ 2., nan, nan, 2., 2.],
[ 4., 4., nan, nan, nan],
[nan, nan, nan, nan, nan]])
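The same result can be built without an explicit copy. The following is a small sketch using `np.where` on a reduced 3x3 example (the data values are illustrative, not the ones printed above), followed by a NaN-aware reduction:

```python
import numpy as np

population = np.array([[0., 5., 2.],
                       [1., 4., 2.],
                       [2., 3., 3.]])
mask = np.array([[0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 0]])

# Keep values where mask == 1, put nan everywhere else; the shape is preserved.
kept = np.where(mask == 1, population, np.nan)

# NaN-aware reductions then ignore the masked-out entries.
col_sums = np.nansum(kept, axis=0)
print(kept)
print(col_sums)
```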
It looks like the mask produces the same number of True entries in every column. So you could probably mask (using the boolean array directly) and reshape:
population[(arr == 1)&(arr2<=3)].reshape(3,-1)
array([[3., 2., 5., 0., 4., 2., 0., 4., 5., 1.],
[4., 3., 5., 3., 4., 1., 1., 4., 5., 4.],
[3., 3., 4., 3., 4., 2., 4., 4., 1., 5.]])
Note that the output is flattened because numpy can't know that the result is expected to be a 2d homogeneous array. If mask.sum(0) gave different values per column, the result couldn't be reconstructed as an ndarray, so numpy doesn't try to guess for you.
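One caveat with reshaping a boolean selection: the selected values come back in row-major order, so a plain reshape keeps each column's values together only if the True entries fall in the same rows for every column. A minimal sketch, assuming a hypothetical mask with the same number of True entries per column (2 here), masks the transpose so that values are collected column by column:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.integers(0, 6, size=(6, 4)).astype(float)

# Hypothetical mask: exactly 2 True entries per column, in different rows.
mask = np.zeros((6, 4), dtype=bool)
mask[[0, 3], 0] = True
mask[[1, 2], 1] = True
mask[[4, 5], 2] = True
mask[[0, 5], 3] = True

counts = mask.sum(axis=0)
if not (counts == counts[0]).all():
    raise ValueError("columns keep different numbers of values")
k = int(counts[0])

# Indexing the transpose walks the data column by column, so each row of the
# intermediate (n_cols, k) array holds one column's kept values; transpose back.
kept = population.T[mask.T].reshape(mask.shape[1], k).T
print(kept.shape)
```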

How to change dtypes of numpy array for tensorflow

I am creating a neural network in tensorflow and I have created the placeholders like this:
input_tensor = tf.placeholder(tf.float32, shape = (None,n_input), name = "input_tensor")
output_tensor = tf.placeholder(tf.float32, shape = (None,n_classes), name = "output_tensor")
During the training process, I was getting the following error:
Traceback (most recent call last):
File "try.py", line 150, in <module>
sess.run(optimizer, feed_dict={X: x_train[i: i + 1], Y: y_train[i: i + 1]})
TypeError: unhashable type: 'numpy.ndarray'
I identified that this is because the datatypes of my x_train and y_train differ from the datatypes of the placeholders.
My x_train looks somewhat like this:
array([[array([[ 1., 0., 0.],
[ 0., 1., 0.]])],
[array([[ 0., 1., 0.],
[ 1., 0., 0.]])],
[array([[ 0., 0., 1.],
[ 0., 1., 0.]])]], dtype=object)
It was initially a dataframe like this:
0 [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
1 [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
2 [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
I did x_train = train_x.values to get the numpy array
And y_train looks like this:
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
x_train has dtype object and y_train has dtype float64.
What I want to know is how I can change the datatypes of my training data so that it works with the tensorflow placeholders. Please also tell me if I am missing something.
It is a little hard to guess what shape you want your data to have, but I am guessing you are looking for one of the two combinations below. I will also try to simulate your data in a Pandas dataframe.
df = pd.DataFrame([[[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]],
[[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]],
[[[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]]], columns = ['Mydata'])
print(df)
x = df.Mydata.values
print(x.shape)
print(x)
print(x.dtype)
Output:
Mydata
0 [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
1 [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
2 [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
(3,)
[list([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
list([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
list([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])]
object
Combination 1
y = [item for sub_list in x for item in sub_list]
y = np.array(y, dtype = np.float32)
print(y.dtype, y.shape)
print(y)
Output:
float32 (6, 3)
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 1. 0.]
[ 1. 0. 0.]
[ 0. 0. 1.]
[ 0. 1. 0.]]
Combination 2
y = [sub_list for sub_list in x]
y = np.array(y, dtype = np.float32)
print(y.dtype, y.shape)
print(y)
Output:
float32 (3, 2, 3)
[[[ 1. 0. 0.]
[ 0. 1. 0.]]
[[ 0. 1. 0.]
[ 1. 0. 0.]]
[[ 0. 0. 1.]
[ 0. 1. 0.]]]
Your x_train is a nested object containing arrays, so you have to unpack it and reshape it. Here's a general purpose hack:
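def unpack(a, aggregate=None):
    # A mutable default ([]) would be shared between calls, so create it here.
    if aggregate is None:
        aggregate = []
    for x in a:
        # Also accept numpy scalars such as np.float32, not only Python floats.
        if isinstance(x, (float, np.floating)):
            aggregate.append(x)
        else:
            unpack(x, aggregate=aggregate)
    return np.array(aggregate)
# x_train is already a numpy array here (it came from train_x.values).
x_train = unpack(x_train).reshape(x_train.shape[0], -1)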
Once you've got a dense array (y_train is already dense), you can use a function like the following:
def cast(placeholder, array):
    dtype = placeholder.dtype.as_numpy_dtype
    return array.astype(dtype)
x_train, y_train = cast(X, x_train), cast(Y, y_train)
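When every cell of the object array holds an array of the same shape, as in the question, stacking the cells is a shorter route to a dense array. This is only a sketch: the object array is rebuilt by hand to mimic the question's x_train, and float32 is chosen because the placeholders are declared as tf.float32.

```python
import numpy as np

blocks = [[[1., 0., 0.], [0., 1., 0.]],
          [[0., 1., 0.], [1., 0., 0.]],
          [[0., 0., 1.], [0., 1., 0.]]]

# Rebuild an object array of shape (3, 1) whose cells are (2, 3) float arrays.
x_train = np.empty((3, 1), dtype=object)
for i, block in enumerate(blocks):
    x_train[i, 0] = np.array(block)

# Stack the cells into one dense array and cast to the placeholder's dtype.
dense = np.stack(list(x_train.ravel())).astype(np.float32)   # shape (3, 2, 3)

# Flatten each sample if the placeholder expects shape (None, n_input).
flat = dense.reshape(len(dense), -1)                         # shape (3, 6)
print(dense.dtype, dense.shape, flat.shape)
```

`np.stack` raises a ValueError if the cells have different shapes, which makes ragged data visible early instead of silently producing another object array.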

Index variable range in numpy

I have a numpy zero matrix A of the shape (2, 5).
A = [[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]]
I have another array seq of size 2, the same as the length of the first axis of A.
seq = [2, 3]
I want to create another matrix B which looks like this:
B = [[ 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0.]]
B is constructed by setting the first seq[i] elements in the ith row of A to 1.
This is a toy example. A and seq can be large so efficiency is required. I would be extra thankful if someone knows how to do this in tensorflow.
You can do this in TensorFlow (and with some analogous code in NumPy) as follows:
seq = [2, 3]
b = tf.expand_dims(tf.range(5), 0) # A 1 x 5 matrix.
seq_matrix = tf.expand_dims(seq, 1) # A 2 x 1 matrix.
b_bool = tf.greater(seq_matrix, b) # A 2 x 5 bool matrix.
B = tf.to_int32(b_bool) # A 2 x 5 int matrix.
Example output:
In [7]: b = tf.expand_dims(tf.range(5), 0)
[[0 1 2 3 4]]
In [21]: b_bool = tf.greater(seq_matrix, b)
In [22]: op = sess.run(b_bool)
In [23]: print(op)
[[ True True False False False]
[ True True True False False]]
In [24]: bint = tf.to_int32(b_bool)
In [25]: op = sess.run(bint)
In [26]: print(op)
[[1 1 0 0 0]
[1 1 1 0 0]]
This is @mrry's solution, expressed a little differently:
In [667]: [[2],[3]]>np.arange(5)
Out[667]:
array([[ True, True, False, False, False],
[ True, True, True, False, False]], dtype=bool)
In [668]: ([[2],[3]]>np.arange(5)).astype(int)
Out[668]:
array([[1, 1, 0, 0, 0],
[1, 1, 1, 0, 0]])
The idea is to compare [2,3] with [0,1,2,3,4] in an 'outer' broadcasting sense. The result is boolean which can be easily changed to 0/1 integers.
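The broadcasting comparison generalizes to any `seq` and row width. A minimal sketch follows; the helper name `prefix_ones` is made up for illustration:

```python
import numpy as np

def prefix_ones(seq, width):
    """Row i gets ones in its first seq[i] positions and zeros elsewhere."""
    seq = np.asarray(seq)
    # (n, 1) compared with (width,) broadcasts to an (n, width) boolean array.
    return (np.arange(width) < seq[:, None]).astype(float)

B = prefix_ones([2, 3], 5)
print(B)
```

No Python-level loop is involved, so the cost is one comparison over an n x width array.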
Another approach would be to use cumsum (or another ufunc.accumulate function):
In [669]: A=np.zeros((2,5))
In [670]: A[range(2),[2,3]]=1
In [671]: A
Out[671]:
array([[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 1., 0.]])
In [672]: A.cumsum(axis=1)
Out[672]:
array([[ 0., 0., 1., 1., 1.],
[ 0., 0., 0., 1., 1.]])
In [673]: 1-A.cumsum(axis=1)
Out[673]:
array([[ 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0.]])
Or a variation starting with 1's:
In [681]: A=np.ones((2,5))
In [682]: A[range(2),[2,3]]=0
In [683]: A
Out[683]:
array([[ 1., 1., 0., 1., 1.],
[ 1., 1., 1., 0., 1.]])
In [684]: np.minimum.accumulate(A,axis=1)
Out[684]:
array([[ 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0.]])
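One edge case for the accumulate variants: if some seq[i] equals the row width, the index assignment is out of bounds. A hedged sketch of one workaround (the helper name is hypothetical) adds a spare column for the marker and drops it afterwards:

```python
import numpy as np

def prefix_ones_accumulate(seq, width):
    """Ones in the first seq[i] positions of row i, using minimum.accumulate."""
    seq = np.asarray(seq)
    n = len(seq)
    A = np.ones((n, width + 1))      # spare column so seq[i] == width is valid
    A[np.arange(n), seq] = 0         # mark where each run of ones ends
    # The running minimum turns everything from the marker onwards into 0.
    return np.minimum.accumulate(A, axis=1)[:, :width]

B2 = prefix_ones_accumulate([2, 5, 0], 5)
print(B2)
```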

Correct use of Tensorflow tf.split function in SKFlow

There is a minimal example of an RNN in the Skflow documentation. The input data is a matrix with shape (4,5). Why is the data split according to the following function for input?:
def input_fn(X):
    return tf.split(1, 5, X)
This function returns a list of 5 arrays, each with shape (4, 1):
[array([[ 2.],
[ 2.],
[ 3.],
[ 2.]], dtype=float32), array([[ 1.],
[ 2.],
[ 3.],
[ 4.]], dtype=float32), array([[ 2.],
[ 3.],
[ 1.],
[ 5.]], dtype=float32), array([[ 2.],
[ 4.],
[ 2.],
[ 4.]], dtype=float32), array([[ 3.],
[ 5.],
[ 1.],
[ 1.]], dtype=f
What is the difference, in terms of impact on the RNN, between the function above and one defined as follows? Both input functions run.
def input_fn(X):
    return tf.split(1, 1, X)
Which returns the following:
[[[ 1., 3., 3., 2., 1.],
[ 2., 3., 4., 5., 6.]]
Presented here:
def testRNN(self):
    random.seed(42)
    import numpy as np
    data = np.array(list([[2, 1, 2, 2, 3],
                          [2, 2, 3, 4, 5],
                          [3, 3, 1, 2, 1],
                          [2, 4, 5, 4, 1]]), dtype=np.float32)
    # labels for classification
    labels = np.array(list([1, 0, 1, 0]), dtype=np.float32)
    # targets for regression
    targets = np.array(list([10, 16, 10, 16]), dtype=np.float32)
    test_data = np.array(list([[1, 3, 3, 2, 1], [2, 3, 4, 5, 6]]))
    def input_fn(X):
        return tf.split(1, 5, X)
    # Classification
    classifier = skflow.TensorFlowRNNClassifier(
        rnn_size=2, cell_type='lstm', n_classes=2, input_op_fn=input_fn)
    classifier.fit(data, labels)
    classifier.weights_
    classifier.bias_
    predictions = classifier.predict(test_data)
    self.assertAllClose(predictions, np.array([1, 0]))
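The shapes produced by the two input functions can be inspected without TensorFlow. The sketch below uses `np.split` as a NumPy analogue of the old `tf.split(axis, num_split, value)` call (an assumption made for illustration, not the Skflow code itself). Since this style of RNN consumes a list with one tensor per time step, the first variant would presumably feed 5 steps of 1 feature each and the second a single step of 5 features.

```python
import numpy as np

X = np.array([[2, 1, 2, 2, 3],
              [2, 2, 3, 4, 5],
              [3, 3, 1, 2, 1],
              [2, 4, 5, 4, 1]], dtype=np.float32)

# Analogue of tf.split(1, 5, X): five pieces along axis 1, one per column.
five_steps = np.split(X, 5, axis=1)
# Analogue of tf.split(1, 1, X): a single piece holding the whole matrix.
one_step = np.split(X, 1, axis=1)

print(len(five_steps), five_steps[0].shape)
print(len(one_step), one_step[0].shape)
```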

turn around sparse matrix

I got some sparse matrix like this
>>> import numpy as np
>>> from scipy.sparse import *
>>> A = csr_matrix((np.identity(3)))
>>> print A
(0, 0) 1.0
(1, 1) 1.0
(2, 2) 1.0
For better understanding A is something like this:
>>>print A.todense()
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]
And I would like to have an operator (let us call it op1(n) ) doing this:
>>>A.op1(1)
[[ 0. 1. 0.]
[ 0. 0. 1.]
[ 1. 0. 0.]]
=> makes the last n columns the first n ones,
so
>>>A == A.op1(3)
true
Is there some built-in solution (EDIT: one that returns a sparse matrix again)?
The solution with roll:
X = np.roll(X.todense(),-tau, axis = 0)
print X.__class__
returns
<class 'numpy.matrixlib.defmatrix.matrix'>
scipy.sparse doesn't have roll, but you can simulate it with hstack:
from scipy.sparse import *
A = eye(3, 3, format='csr')
hstack((A[:, 1:], A[:, :1]), format='csr') # roll left
hstack((A[:, -1:], A[:, :-1]), format='csr') # roll right
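The hstack idea can be wrapped in a small function that matches the op1(n) behaviour asked for, moving the last n columns to the front. This is a sketch only; the name `roll_columns` is hypothetical and not part of scipy.sparse:

```python
import numpy as np
from scipy import sparse

def roll_columns(A, n):
    """Move the last n columns of sparse matrix A to the front; returns CSR."""
    n = n % A.shape[1]          # rolling by the full width is a no-op
    if n == 0:
        return A.tocsr(copy=True)
    A = A.tocsc()               # column slicing is cheap in CSC format
    return sparse.hstack((A[:, -n:], A[:, :-n]), format="csr")

A = sparse.identity(3, format="csr")
B = roll_columns(A, 1)
print(B.toarray())
```

On the dense equivalent this corresponds to `np.roll(a, n, axis=1)`, and the result stays sparse throughout.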
>>> a = np.identity(3)
>>> a
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
>>> np.roll(a, -1, axis=0)
array([[ 0., 1., 0.],
[ 0., 0., 1.],
[ 1., 0., 0.]])
>>> a == np.roll(a, 3, axis=0)
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)