numpy - tuple is not "(...)" in numpy?


My understanding is that a tuple can be enclosed in parentheses. From the Python docs:
class tuple([iterable])
Tuples may be constructed in a number of ways:
Using a pair of parentheses to denote the empty tuple: ()
Using a trailing comma for a singleton tuple: a, or (a,)
Separating items with commas: a, b, c or (a, b, c)
Using the tuple() built-in: tuple() or tuple(iterable)
The NumPy docs on Basic Slicing and Indexing say that a slice index can be a tuple of slice objects and integers:
Basic slicing occurs when obj is a slice object (constructed by
start:stop:step notation inside of brackets), an integer, or a tuple
of slice objects and integers.
However, writing (...) inside the brackets causes an error. Is the error from NumPy or from Python?
I suppose that without parentheses 0:1, 2:3, 1 is still a tuple, but why can't I use parentheses if the index is specified as a tuple of slice objects and integers?
It is not that important, but after struggling with NumPy indexing this makes it even more confusing, so a clarification would help.
import numpy as np
Z = np.arange(36).reshape(3, 3, 4)
print("Z is \n{}\n".format(Z))
a = Z[
    (0:1, 2:3, 1)
]
---
File "<ipython-input-53-26b1604433cd>", line 5
(0:1, 2:3, 1)
^
SyntaxError: invalid syntax
This works.
import numpy as np
Z = np.arange(36).reshape(3, 3, 4)
print("Z is \n{}\n".format(Z))
a = Z[
0:1, 2:3, 1
]
print(a)
print(a.base is not None)
As per the comment by hpaulj, numpy's s_ takes the same slice notation inside its brackets and returns the corresponding tuple of slice objects and integers:
import numpy as np
from numpy import s_
print(s_[0:1, 2:3, 1])
Z = np.arange(36).reshape(3, 3, 4)
print("Z is \n{}\n".format(Z))
print(Z[s_[0:1, 2:3, 1]])
---
(slice(0, 1, None), slice(2, 3, None), 1)
Z is 
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]

 [[24 25 26 27]
  [28 29 30 31]
  [32 33 34 35]]]

[[9]]

Slice notation like 1:2 is syntax, not an expression: it does not create an object on its own, so you cannot put it in a list, a tuple, or anywhere else outside the subscript brackets. That is why (0:1, 2:3, 1) is rejected with a SyntaxError by Python's parser before NumPy ever sees it. Slice objects, on the other hand, are what slice() returns; they behave the same way, and they are what NumPy means by "tuple of slice objects and integers". Inside the brackets, Python converts 0:1 into slice(0, 1) for you. The valid syntax for what you were expecting would be Z[(slice(0, 1), slice(2, 3), 1)]. This also allows you to save slices to a variable and use them later.
Here's a simple code snippet to demonstrate:
>>> 1:2
File "<stdin>", line 1
SyntaxError: illegal target for annotation
>>> slice(1, 2)
slice(1, 2, None)
>>> [1, 2, 3][1:2]
[2]
>>> [1, 2, 3][slice(1, 2)]
[2]
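To make the point concrete, here is a minimal sketch that echoes whatever Python passes to __getitem__ (the ShowKey class is a hypothetical helper for illustration, not part of NumPy). It shows that Python itself turns the comma-separated contents of the brackets into a tuple of slice objects and integers, which is why an explicit tuple, a tuple stored in a variable, and np.s_ all index Z the same way.

```python
import numpy as np

class ShowKey:
    """Hypothetical helper: returns the key Python passes to __getitem__."""
    def __getitem__(self, key):
        return key

show = ShowKey()
key = show[0:1, 2:3, 1]      # Python builds the tuple from the bracket contents
print(key)                   # (slice(0, 1, None), slice(2, 3, None), 1)

Z = np.arange(36).reshape(3, 3, 4)
idx = (slice(0, 1), slice(2, 3), 1)   # same tuple, built explicitly and stored
print(Z[idx])                                    # [[9]]
print(np.array_equal(Z[idx], Z[0:1, 2:3, 1]))    # True
print(np.s_[0:1, 2:3, 1] == idx)                 # True
```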

Related

What is the tensorflow equivalent of numpy tuple/array indexing?

Question
What is the TensorFlow equivalent of NumPy tuple/array indexing to select non-contiguous indices? With NumPy, multiple rows/columns can be selected with a tuple or an array.
import numpy as np
a = np.arange(12).reshape(3,4)
print(a)
print(a[
    (0,2),  # select rows 0 and 2
    1       # select col 1
])
---
[[ 0  1  2  3]   # a[0][1] -> 1
 [ 4  5  6  7]
 [ 8  9 10 11]]  # a[2][1] -> 9
[1 9]
I looked at Multi-axis indexing, but there seems to be no equivalent way:
Higher rank tensors are indexed by passing multiple indices.
Using a tuple or an array causes TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got (0, 2, 5).
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
training_data = np.array([["This is the 1st sample."], ["And here's the 2nd sample."]])
vectorizer = TextVectorization(output_mode="int")
vectorizer.adapt(training_data)
word_indices = vectorizer(training_data)
word_indices = tf.cast(word_indices, dtype=tf.int8)
print(f"word vocabulary:{vectorizer.get_vocabulary()}\n")
print(f"word indices:\n{word_indices}\n")
index_to_word = tf.reshape(tf.constant(vectorizer.get_vocabulary()), (-1, 1))
print(f"index_to_word:\n{index_to_word}\n")
words = index_to_word  # the indexing lines below refer to this tensor as words
# Numpy tuple indexing
print(f"indices to words:{words.numpy()[(0,2,5),::]}")
# What is TF equivalent indexing?
print(f"indices to words:{words[(0,2,5),::]}") # <--- cannot use tuple/array indexing
Result:
word vocabulary:['', '[UNK]', 'the', 'sample', 'this', 'is', 'heres', 'and', '2nd', '1st']
word indices:
[[4 5 2 9 3]
[7 6 2 8 3]]
index_to_word:
[[b'']
[b'[UNK]']
[b'the']
[b'sample']
[b'this']
[b'is']
[b'heres']
[b'and']
[b'2nd']
[b'1st']]
indices to words:[[b'']
[b'the']
[b'is']]
TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got (0, 2, 5)
What indexing methods are available in TensorFlow to select multiple non-consecutive indices?

You can use tf.gather.
>>> tf.gather(words,[0,2,5])
<tf.Tensor: shape=(3, 1), dtype=string, numpy=
array([[b''],
[b'the'],
[b'is']], dtype=object)>
Read more in the guide: Introduction to tensor slicing
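For the 2-D example from the question, here is a sketch (assuming TensorFlow 2.x in eager mode) of two ways to reproduce a[(0,2), 1]: tf.gather along an axis, and tf.gather_nd with explicit (row, column) pairs. The variable names are only illustrative.

```python
import tensorflow as tf

a = tf.reshape(tf.range(12), (3, 4))   # same values as np.arange(12).reshape(3, 4)

# tf.gather picks whole slices along one axis: rows 0 and 2
rows = tf.gather(a, [0, 2])            # shape (2, 4)
# ...then column 1 of those rows (scalar index drops that axis)
col = tf.gather(rows, 1, axis=1)       # shape (2,)

# tf.gather_nd takes full index tuples, one (row, col) pair per output element
pairs = tf.gather_nd(a, [[0, 1], [2, 1]])   # shape (2,)

print(col.numpy(), pairs.numpy())
```

tf.gather is the closer match when whole rows or columns are selected; tf.gather_nd is handy when each output element has its own coordinate pair.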

numpy.VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences

Here's an example of behavior I cannot understand; maybe someone can share some insight into the logic behind it:
import numpy as np
ccn = np.ones(1)
bbb = 7
bbn = np.array(bbb)
bbn * ccn # this is OK
array([7.])
np.prod((bbn,ccn)) # but this is NOT
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2.2\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
File "<__array_function__ internals>", line 5, in prod
File "C:\Users\...\venv\lib\site-packages\numpy\core\fromnumeric.py", line 2999, in prod
return _wrapreduction(a, np.multiply, 'prod', axis, dtype, out,
File "C:\Users\...\venv\lib\site-packages\numpy\core\fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
numpy.VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
Why? Why would a simple multiplication of two numbers be a problem? As far as formal algebra goes, there are no dimension problems and no datatype problems. The result is invariably a single number; there's no chance it "suddenly" turns into a vector, an object or anything like that. prod(a,b) for a and b being scalars or 1-by-1 "matrices" is something MATLAB or Octave would handle without any problem.
I know I can turn this error off, but why is it even an error?

In [346]: ccn = np.ones(1)
...: bbb = 7
...: bbn = np.array(bbb)
In [347]: ccn.shape
Out[347]: (1,)
In [348]: bbn.shape
Out[348]: ()
In [349]: np.array((bbn,ccn))
<ipython-input-349-997419ba7a2f>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
np.array((bbn,ccn))
Out[349]: array([array(7), array([1.])], dtype=object)
You have arrays with different numbers of dimensions, which can't be combined into one numeric array.
As can be deduced from your traceback, that np.prod expression is actually:
np.multiply.reduce(np.array([bbn,ccn]))
In Octave both objects have shape (1,1), 2d
>> ccn = ones(1)
ccn = 1
>> ccn = ones(1);
>> size(ccn)
ans =
1 1
>> bbn = 7;
>> size(bbn)
ans =
1 1
>> [bbn,ccn]
ans =
7 1
It doesn't have true scalars; everything is 2d (even 3d is a fudge on the last dimension).
And with 'raw' Python inputs:
In [350]: np.array([1,[1]])
<ipython-input-350-f17372e1b22d>:1: VisibleDeprecationWarning: ...
np.array([1,[1]])
Out[350]: array([1, list([1])], dtype=object)
The object dtype array preserves the type of the inputs.
Edit:
prod isn't a simple multiplication. It's a reduction operation, like the big Pi in math. Even in Octave, combining mismatched inputs for prod fails:
>> prod([[2,3],[3;4]])
error: horizontal dimensions mismatch (1x2 vs 2x1)
>> [2,3]*[3;4]
ans = 18
>> [2,3].*[3;4]
ans =
6 9
8 12
The numpy equivalent:
In [97]: np.prod((np.array([2,3]),np.array([[3],[4]])))
/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py:87: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences...
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: could not broadcast input array from shape (2,1) into shape (2,)
In [98]: np.array([2,3])@np.array([[3],[4]])
Out[98]: array([18])
In [99]: np.array([2,3])*np.array([[3],[4]])
Out[99]:
array([[ 6, 9],
[ 8, 12]])
The warning, and here the error, is produced by trying to make ONE array from (np.array([2,3]),np.array([[3],[4]])).
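A short sketch of that point, using the same bbn and ccn: elementwise * broadcasts shapes () and (1,) without complaint, whereas np.prod on the tuple first has to stack both inputs into a single array. Giving both inputs the same shape first (np.atleast_1d is one way, chosen here for illustration) makes that stacking well defined, so the reduction works. The failing call itself is left out because, depending on the NumPy version, it either warns or raises.

```python
import numpy as np

ccn = np.ones(1)     # shape (1,)
bbn = np.array(7)    # shape (), a 0-d array

# Elementwise multiplication broadcasts () with (1,) and is fine
elementwise = bbn * ccn
print(elementwise)               # [7.]

# np.prod((bbn, ccn)) would first call np.array((bbn, ccn)); shapes () and (1,)
# cannot be stacked into one rectangular numeric array, hence the ragged warning/error.
print(bbn.shape, ccn.shape)      # () (1,)

# With matching shapes, stacking gives a (2, 1) float array and prod reduces all of it
stacked = np.array((np.atleast_1d(bbn), ccn))
print(stacked.shape)             # (2, 1)
print(np.prod(stacked))          # 7.0
```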

Numpy: the best way to remove rows with identical values

I'm struggling with numpy lib.
I have a tensor of the shape (batch_size, timestep, feature):
For example, let's create a dummy:
import numpy as np
x = np.arange(42).reshape(2,7,3)
#now make some rows have homogeneous values
x[:,::3,:] =0
x[:,::5,:] =2
Now I need a numpy-ish way (which is repeatable in TensorFlow) to remove rows (axis=-2) where all values are the same. So in the end I need the tensor to look like this:
[[[ 3 4 5]
[ 6 7 8]
[12 13 14]]
[[24 25 26]
[27 28 29]
[33 34 35]]]
Thanks.
P.S. This is not the same question as "remove all zero rows", since here we are talking about rows with homogeneous values, which is a bit trickier.

If you are okay with losing one dimension (so that your array remains homogeneous), then you can do:
x[~np.all(x == x[:, :, 0, np.newaxis], axis=-1)]
# out:
[[ 3 4 5]
[ 6 7 8]
[12 13 14]
[24 25 26]
[27 28 29]
[33 34 35]]
Credit: @unutbu's answer to a similar problem, adapted here to one more dimension.
Why is the 3rd dimension removed? Imagine if your conditions were such that you wanted to select 2 rows from your first array and 3 from your second: then the result would be heterogeneous, which would have to be stored as a masked array or as a list of arrays.
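To illustrate why the batch axis gets flattened, here is a sketch built on the same mask: indexing with the (2, 7) boolean mask always returns a 2-D result, but when every batch keeps the same number of rows (as in the example data) the 3-D shape can be restored with reshape. The counts check and the list fallback are illustrative additions, not part of the original answer.

```python
import numpy as np

x = np.arange(42).reshape(2, 7, 3)
x[:, ::3, :] = 0
x[:, ::5, :] = 2

# True where a row (last axis) contains at least two different values
mask = ~np.all(x == x[:, :, :1], axis=-1)   # shape (2, 7)
flat = x[mask]                              # shape (6, 3): batch and row axes merged

counts = mask.sum(axis=1)                   # rows kept per batch
if np.all(counts == counts[0]):
    # every batch keeps the same number of rows, so the result is rectangular again
    kept = flat.reshape(x.shape[0], counts[0], x.shape[2])
else:
    # ragged case: fall back to one array per batch
    kept = [x[i][mask[i]] for i in range(x.shape[0])]

print(kept)
```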

There might be a more clever way using only numpy. However, you could just iterate over the 2nd dimension and do a comparison.
not_same = []
for n in range(x.shape[1]):  # iterate over the 2nd dimension (timesteps)
    # test if it is homogeneous, i.e. the first value equals all values (in every batch)
    not_same.append(~np.all(x[:, n, :] == x[0, n, 0]))
out = x[:, not_same, :]  # boolean list keeps only the non-homogeneous timesteps
This gives you:
array([[[ 3, 4, 5],
[ 6, 7, 8],
[12, 13, 14]],
[[24, 25, 26],
[27, 28, 29],
[33, 34, 35]]])
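The loop can also be written without Python iteration. This sketch reproduces the loop's exact test (every value of x[:, n, :], across all batches, equals x[0, n, 0]) with one broadcast comparison; because a single mask is applied to axis 1 for every batch, the 3-D shape is preserved. Like the loop, it drops a timestep only if it is constant in all batches.

```python
import numpy as np

x = np.arange(42).reshape(2, 7, 3)
x[:, ::3, :] = 0
x[:, ::5, :] = 2

# x[:1, :, :1] has shape (1, 7, 1): the value x[0, n, 0] for each timestep n,
# broadcast against every batch and every feature
same = np.all(x == x[:1, :, :1], axis=(0, 2))   # shape (7,)
out = x[:, ~same, :]                            # shape (2, 3, 3)
print(out)
```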

Numpy Advanced Indexing : How the broadcast is happening?

Given the array x:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
If we run the following statement
x[1:, [2,0,1]]
we get the following result
array([[ 6, 4, 5],
[10, 8, 9]])
According to numpy's doc:
Advanced indexes always are broadcast and iterated as one:
I am unable to understand how the indices are paired here, and how broadcasting comes into it.

The selected answer is not correct.
Here [2,0,1] does have shape (3,) and is not extended during broadcasting.
The 1:, on the other hand, means you first slice the array before broadcasting. During broadcasting, just think of the slice : as a placeholder for a 0-d scalar at each run. So we get:
shape([2,0,1]) = (3,)
shape([:]) = () -> (1,) -> (3,)
So it's the [:] conceptually extended into shape (3,), like this:
x[[1,1,1], [2,0,1]] =
[6 4 5]
x[[2,2,2], [2,0,1]] =
[10 8 9]
Finally, we need to stack the results back
[[6 4 5]
[10 8 9]]

From NumPy User Guide, Section 3.4.7 Combining index arrays with slices
the slice is converted to an index array np.array that is broadcast
with the index array to produce the resultant array.
In our case the slice 1: is converted to an index array np.array([[1],[2]]), which has shape (2,1). This is the row index array.
The next index array (the column index array) np.array([2,0,1]) has shape (3,).
row index array shape (2,1)
column index array shape (3,)
The index arrays do not have the same shape, but they can be broadcast to the same shape, (2,3): the row indices are repeated along the columns and the column indices along the rows, which gives the (2,3) result.
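A quick check of the broadcasting view, written as a sketch: replacing the slice 1: with an explicit row index array of shape (2, 1) and broadcasting it against the column indices [2, 0, 1] (shape (3,)) gives a (2, 3) index grid and the same result as x[1:, [2,0,1]]. np.ix_ builds the same kind of open grid automatically.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

basic = x[1:, [2, 0, 1]]                 # slice on rows, advanced index on columns

rows = np.arange(1, 3)[:, np.newaxis]    # the slice 1: written as a (2, 1) index array
cols = np.array([2, 0, 1])               # shape (3,)
print(np.broadcast(rows, cols).shape)    # (2, 3): shape of the result
explicit = x[rows, cols]                 # explicit[i, j] == x[rows[i, 0], cols[j]]

print(basic)
print(np.array_equal(basic, explicit))                        # True
print(np.array_equal(basic, x[np.ix_([1, 2], [2, 0, 1])]))    # True
```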

Elementwise multiplication of NumPy arrays of different shapes

When I use numpy.multiply(a,b) to multiply numpy arrays with shapes (2, 1),(2,) I get a 2 by 2 matrix. But what I want is element-wise multiplication.
I'm not familiar with numpy's rules. Can anyone explain what's happening here?

When doing an element-wise operation between two arrays that do not have the same dimensionality, NumPy performs broadcasting. In your case NumPy broadcasts b along the rows of a:
import numpy as np
a = np.array([[1],
[2]])
b = np.array([3, 4])  # must be an array (not a list) for b[:, np.newaxis] below
print(a * b)
Gives:
[[3 4]
[6 8]]
To prevent this, you need to give b the same shape as a, (2, 1). You can add a dimension to an array by using np.newaxis or None in your indexing, like this:
print(a * b[:, np.newaxis])
Gives:
[[3]
[8]]

Let's say you have two arrays, a and b, with shape (2,3) and (2,) respectively:
a = np.random.randint(10, size=(2,3))
b = np.random.randint(10, size=(2,))
The two arrays, for example, contain:
a = np.array([[8, 0, 3],
[2, 6, 7]])
b = np.array([7, 5])
For an element-wise product where each row of a is multiplied by the corresponding element of b, plain a*b raises an error because the trailing dimensions 3 and 2 don't match; you have to tell NumPy where the missing axis=1 of b goes. You can do so by adding None:
result = a*b[:,None]
With result being:
array([[56, 0, 21],
[10, 30, 35]])

Here are input arrays a and b with the shapes you mentioned:
In [136]: a
Out[136]:
array([[0],
[1]])
In [137]: b
Out[137]: array([0, 1])
Now, when we do multiplication using either * or numpy.multiply(a, b), we get:
In [138]: a * b
Out[138]:
array([[0, 0],
[0, 1]])
The result is a (2,2) array because numpy uses broadcasting:
        b
  a  |  0     1
-----+-----------
  0  | 0*0   0*1
  1  | 1*0   1*1

I just explained the broadcasting rules in broadcasting arrays in numpy
In your case
(2,1) + (2,) => (2,1) + (1,2) => (2,2)
It has to add a dimension to the 2nd argument, and can only add it at the beginning (to avoid ambiguity).
So if you want a (2,1) result, you have to expand the 2nd argument yourself, with reshape or [:, np.newaxis].
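The rule above can be written out as a small sketch. The broadcast_shape function is a hypothetical helper for illustration (NumPy does this internally); it pads the shorter shape with 1s on the left, then requires each pair of dimensions to be equal or to contain a 1.

```python
import numpy as np

def broadcast_shape(s1, s2):
    """Hypothetical helper: result shape of broadcasting s1 with s2."""
    n = max(len(s1), len(s2))
    # Step 1: pad the shorter shape with 1s on the LEFT (never on the right)
    p1 = (1,) * (n - len(s1)) + tuple(s1)
    p2 = (1,) * (n - len(s2)) + tuple(s2)
    out = []
    for d1, d2 in zip(p1, p2):
        # Step 2: each pair must match, or one of them must be 1
        if d1 == d2 or d2 == 1:
            out.append(d1)
        elif d1 == 1:
            out.append(d2)
        else:
            raise ValueError(f"shapes {s1} and {s2} are not broadcastable")
    return tuple(out)

print(broadcast_shape((2, 1), (2,)))     # (2, 2): (2,) is treated as (1, 2)
print(broadcast_shape((2, 1), (2, 1)))   # (2, 1)
try:
    broadcast_shape((2, 3), (2,))        # the (2,3) * (2,) case: 3 vs 2
except ValueError as err:
    print(err)

a = np.array([[1], [2]])
b = np.array([3, 4])
print((a * b).shape)                     # (2, 2)
print(a * b.reshape(-1, 1))              # element-wise: [[3] [8]]
```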
