Can I get some help regarding the transpose function in TensorFlow?

The TensorFlow function tf.transpose takes a second argument called perm.
Could someone explain it to me, perhaps with some examples?

It's easier to demonstrate on a larger tensor, so consider one of shape (4, 2, 3):
import tensorflow as tf

xx = tf.constant([[[ 1,  2,  3],
                   [ 4,  5,  6]],
                  [[ 7,  8,  9],
                   [10, 11, 12]],
                  [[13, 14, 15],
                   [16, 17, 18]],
                  [[19, 20, 21],
                   [22, 23, 24]]])
print('Shape of matrix:', xx.shape)
tf.transpose(xx)
>>> Shape of matrix: (4, 2, 3)
<tf.Tensor: shape=(3, 2, 4), dtype=int32, numpy=
array([[[ 1,  7, 13, 19],
        [ 4, 10, 16, 22]],
       [[ 2,  8, 14, 20],
        [ 5, 11, 17, 23]],
       [[ 3,  9, 15, 21],
        [ 6, 12, 18, 24]]], dtype=int32)>
As you can see, the (4, 2, 3) tensor becomes (3, 2, 4): by default, TensorFlow reverses the order of the dimensions, so tf.transpose(xx, perm=[2, 1, 0]) has the same effect as tf.transpose(xx). In general, axis i of the output is axis perm[i] of the input; for example, tf.transpose(xx, perm=[1, 2, 0]) changes the shape to (2, 3, 4).
If you're having trouble visualising what the transformation looks like, it helps to work through an instance:
tf.transpose(xx, perm=[1, 2, 0])
>>> tf.Tensor(
[[[ 1  7 13 19]
  [ 2  8 14 20]
  [ 3  9 15 21]]
 [[ 4 10 16 22]
  [ 5 11 17 23]
  [ 6 12 18 24]]], shape=(2, 3, 4), dtype=int32)
Notice that the axis of length 4 moves to the last place, so each innermost row now needs 4 elements: the 0th elements of the 4 blocks in the original xx (1, 7, 13, 19) form the first row. And since the axis of length 3 moves to the 2nd place, each inner matrix has 3 rows, built the same way from the corresponding elements of the original blocks.
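Equivalently, in index terms: with perm=[1, 2, 0], element (i, j, k) of the output is element (k, i, j) of the input. A minimal sketch to verify this, reusing the xx defined above (the variable y is just an illustrative name):

y = tf.transpose(xx, perm=[1, 2, 0])
# Output axis i is input axis perm[i], so y[i, j, k] should equal xx[k, i, j]
for i in range(2):
    for j in range(3):
        for k in range(4):
            assert int(y[i, j, k]) == int(xx[k, i, j])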

Related

How to get the specific output for NumPy array slicing?

x is an array of shape (n_dim, n_row, n_col) containing the first n natural numbers; b is a boolean array of shape (2,) with elements True, False.
import numpy as np

def array_slice(n, n_dim, n_row, n_col):
    x = np.arange(0, n).reshape(n_dim, n_row, n_col)
    b = np.full((2,), True)
    print(x[b])
    print(x[b, :, 1:3])
Expected output:
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]]]
[[[ 1  2]
  [ 6  7]
  [11 12]]]
My output:
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]]
 [[15 16 17 18 19]
  [20 21 22 23 24]
  [25 26 27 28 29]]]
[[[ 1  2]
  [ 6  7]
  [11 12]]
 [[16 17]
  [21 22]
  [26 27]]]
An example:
In [83]: x = np.arange(24).reshape(2,3,4)
In [84]: b = np.full((2,), True)
In [85]: x
Out[85]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],
       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
In [86]: b
Out[86]: array([ True,  True])
With two True values, b selects both planes of the 1st dimension:
In [87]: x[b]
Out[87]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],
       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
A b with a mix of True and False:
In [88]: b = np.array([True, False])
In [89]: b
Out[89]: array([ True, False])
In [90]: x[b]
Out[90]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]])
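That mixed mask is exactly what the question needs: np.full((2,), True) yields [True, True] and keeps both planes, while the expected output corresponds to [True, False]. A minimal sketch of the corrected function (the arguments 30, 2, 3, 5 are assumed here to match the outputs shown):

import numpy as np

def array_slice(n, n_dim, n_row, n_col):
    x = np.arange(0, n).reshape(n_dim, n_row, n_col)
    b = np.array([True, False])   # mask with a False, as the problem statement describes
    print(x[b])                   # keeps only the first plane
    print(x[b, :, 1:3])           # first plane, all rows, columns 1 and 2

array_slice(30, 2, 3, 5)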

The attribute 'children_' of Agglomerative clustering

I am writing a very basic program with observations not exceeding 20 values (X1 is the original dataset).
X1_test = X1_df.iloc[0:20, ]
from sklearn.cluster import AgglomerativeClustering
ag = AgglomerativeClustering(n_clusters=6, affinity='euclidean', linkage='ward',
                             compute_full_tree=True, compute_distances=True)
ag.fit(X1_test)
When I inspect the attribute ag.children_, the values come out as
array([[10, 13],
       [ 1,  6],
       [16, 18],
       [ 2, 19],
       [ 4, 20],
       [ 8, 15],
       [12, 23],
       [14, 21],
       [ 0, 17],
       [ 9, 26],
       [22, 27],
       [11, 24],
       [ 5, 29],
       [ 7, 25],
       [ 3, 28],
       [30, 31],
       [32, 35],
       [33, 36],
       [34, 37]], dtype=int64)
How come some values in this output are greater than 20 when I have only 20 observations? Please help.
According to what I can understand from Scikit-learn's documentation, each row represents a node with two values. If a value is smaller than your sample size, it is a leaf, and you can therefore read it as a sample index. But if the value is greater than or equal to your sample size, then it is not a leaf; it is another node (one that was merged earlier). Which node is it? The node stored at index [value - n_samples] in the children_ attribute.
So for example, if your sample size is 20 and you have a node that merges 3 with 28, you can understand that 3 is the leaf for sample index 3 and 28 is the node at children_[8] (because 28 - 20 = 8). So it will be the node [0, 17] in your case.
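To make the bookkeeping concrete, here is a minimal sketch, assuming the fitted ag from the question with 20 samples, that prints every merge in readable form:

n_samples = 20  # number of observations in X1_test

def node_name(v):
    # Labels below n_samples are leaves (sample indices);
    # anything >= n_samples refers back to an earlier row of children_.
    return f"sample {v}" if v < n_samples else f"node children_[{v - n_samples}]"

for i, (left, right) in enumerate(ag.children_):
    print(f"merge {i}: {node_name(left)} + {node_name(right)} -> node {n_samples + i}")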

For loop to obtain sum and mean on np 3d array

I have the following array
arr = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])
I want to go through each element and sum on axis 0, so I do:
lst = []
for x in arr:
    for y in np.sum(x, axis=0):
        lst.append(y)
after which lst is
[5, 7, 9, 17, 19, 21]
However I want the output to be in the following form:
[[5, 7, 9], [17, 19, 21]]
to then take the mean of its axis 0 namely (5+17)/2 and so on. The final output should look like
[11., 13., 15.]
I wonder how I can do this. Is it possible to write this whole operation in a compact form, as a list comprehension?
Update: To get the final output I can do:
np.mean(np.reshape(lst, (len(arr), -1)), axis=0)
Yet I am sure there is a Pythonic way of doing this.
In [5]: arr = np.array([[[1, 2, 3], [4, 5, 6]],
   ...:                 [[7, 8, 9], [10, 11, 12]]])
In [7]: arr
Out[7]:
array([[[ 1,  2,  3],
        [ 4,  5,  6]],
       [[ 7,  8,  9],
        [10, 11, 12]]])
The for loop iterates on the 1st dimension, as though the array were a list of arrays:
In [8]: for x in arr: print(x)
[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
list(arr) also makes such a list (though it is slower than arr.tolist()).
One common way of iterating on other dimensions is to use an index:
In [10]: for i in range(2): print(arr[:, i])
[[1 2 3]
 [7 8 9]]
[[ 4  5  6]
 [10 11 12]]
You could also transpose the array, placing the desired axis first. But you don't need to iterate at all:
In [13]: arr.sum(axis=1)
Out[13]:
array([[ 5,  7,  9],
       [17, 19, 21]])
In [14]: arr.sum(axis=1).mean(axis=0)
Out[14]: array([11., 13., 15.])
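And since the question also asked about a list comprehension: a minimal one-line equivalent (the vectorized version above is still preferable) would be:

np.mean([x.sum(axis=0) for x in arr], axis=0)
# array([11., 13., 15.])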

Efficiently construct numpy matrix from offset ranges of 1D array [duplicate]

Let's say I have a NumPy array a.
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
I want to create a matrix of subsequences of length 5 from this array, with stride 3. The resulting matrix would hence look as follows:
np.array([[1, 2, 3, 4, 5],
          [4, 5, 6, 7, 8],
          [7, 8, 9, 10, 11]])
One possible way of implementing this would be using a for-loop:
result_matrix = np.zeros((3, 5))
for i in range(0, len(a) - 4, 3):       # window starts: 0, 3, 6
    result_matrix[i // 3] = a[i:i + 5]  # row index is window number, not start offset
Is there a cleaner way to implement this in NumPy?
Approach #1 : Using broadcasting -
def broadcasting_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    return a[S * np.arange(nrows)[:, None] + np.arange(L)]
Approach #2 : Using more efficient NumPy strides -
def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))
Sample run -
In [143]: a
Out[143]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
In [144]: broadcasting_app(a, L=5, S=3)
Out[144]:
array([[ 1,  2,  3,  4,  5],
       [ 4,  5,  6,  7,  8],
       [ 7,  8,  9, 10, 11]])
In [145]: strided_app(a, L=5, S=3)
Out[145]:
array([[ 1,  2,  3,  4,  5],
       [ 4,  5,  6,  7,  8],
       [ 7,  8,  9, 10, 11]])
Starting in NumPy 1.20, we can make use of the new sliding_window_view to slide/roll over windows of elements. Coupled with a stepping [::3], it simply becomes:
from numpy.lib.stride_tricks import sliding_window_view
# values = np.array([1,2,3,4,5,6,7,8,9,10,11])
sliding_window_view(values, window_shape=5)[::3]
# array([[ 1,  2,  3,  4,  5],
#        [ 4,  5,  6,  7,  8],
#        [ 7,  8,  9, 10, 11]])
where the intermediate result of the sliding is:
sliding_window_view(values, window_shape=5)
# array([[ 1,  2,  3,  4,  5],
#        [ 2,  3,  4,  5,  6],
#        [ 3,  4,  5,  6,  7],
#        [ 4,  5,  6,  7,  8],
#        [ 5,  6,  7,  8,  9],
#        [ 6,  7,  8,  9, 10],
#        [ 7,  8,  9, 10, 11]])
Modified version of @Divakar's code with checks to ensure that memory is contiguous and that the returned array cannot be modified. (Variable names changed for my DSP application.)
def frame(a, framelen, frameadv):
    """frame - Frame a 1D array

    a - 1D array
    framelen - Samples per frame
    frameadv - Samples between starts of consecutive frames
        Set to framelen for non-overlapping consecutive frames

    Modified from Divakar's 10/17/16 11:20 solution:
    https://stackoverflow.com/questions/40084931/taking-subarrays-from-numpy-array-with-given-stride-stepsize

    CAVEATS:
    Assumes array is contiguous
    Output is not writable as there are multiple views on the same memory
    """
    if not isinstance(a, np.ndarray) or \
       not (a.flags['C_CONTIGUOUS'] or a.flags['F_CONTIGUOUS']):
        raise ValueError("Input array a must be a contiguous numpy array")

    # Output
    nrows = ((a.size - framelen) // frameadv) + 1
    oshape = (nrows, framelen)
    # Size of each element in a
    n = a.strides[0]
    # Indexing in the new object will advance by frameadv * element size
    ostrides = (frameadv * n, n)
    return np.lib.stride_tricks.as_strided(a, shape=oshape, strides=ostrides,
                                           writeable=False)
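For instance, a quick check of frame on the array from the question (a hypothetical usage, matching L = 5 and S = 3 above):

a = np.arange(1, 12)   # array([ 1,  2, ..., 11])
frame(a, framelen=5, frameadv=3)
# array([[ 1,  2,  3,  4,  5],
#        [ 4,  5,  6,  7,  8],
#        [ 7,  8,  9, 10, 11]])
# The result is a read-only view; attempting to assign into it raises ValueError.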

Split last dimension of arrays in lower dimensional arrays

Assume we have an array of shape NxMxD. I want to get a list of D NxM arrays.
The obvious way of doing it would be:
np.dsplit(myarray, D)
However, this returns D NxMx1 arrays.
I can achieve the desired result by doing something like:
[myarray[..., i] for i in range(D)]
Or:
[np.squeeze(subarray) for subarray in np.dsplit(myarray, D)]
However, I feel like it is a bit redundant to need to perform an additional operation. Am I missing any numpy function that returns the desired result?
Try myarray.swapaxes(1, 2).swapaxes(1, 0):
>>> import numpy as np
>>> a = np.arange(24).reshape(2,3,4)
>>> a
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],
       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> [a[:,:,i] for i in range(4)]
[array([[ 0,  4,  8],
        [12, 16, 20]]),
 array([[ 1,  5,  9],
        [13, 17, 21]]),
 array([[ 2,  6, 10],
        [14, 18, 22]]),
 array([[ 3,  7, 11],
        [15, 19, 23]])]
>>> a.swapaxes(1,2).swapaxes(1,0)
array([[[ 0,  4,  8],
        [12, 16, 20]],
       [[ 1,  5,  9],
        [13, 17, 21]],
       [[ 2,  6, 10],
        [14, 18, 22]],
       [[ 3,  7, 11],
        [15, 19, 23]]])
Edit: As pointed out by ajcr (thanks again), the transpose method is more convenient, since the two swaps can be done in one step by using
myarray.transpose(2, 0, 1)
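If what is wanted really is a list of D NxM arrays rather than a (D, N, M) array, one alternative worth noting (assuming the same myarray) is np.moveaxis, which brings the last axis to the front so that plain list iteration splits it up:

list(np.moveaxis(myarray, -1, 0))
# a list of D arrays, each of shape (N, M), with no singleton dimension to squeeze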
np.dsplit uses np.array_split, the core of which is:
sub_arys = []
sary = _nx.swapaxes(ary, axis, 0)
for i in range(Nsections):
    st = div_points[i]; end = div_points[i + 1]
    sub_arys.append(_nx.swapaxes(sary[st:end], axis, 0))
With axis=-1, this is equivalent to:
[x[..., i:(i+1)] for i in np.arange(x.shape[-1])]  # or
[x[..., [i]] for i in np.arange(x.shape[-1])]
which accounts for the singleton dimension.
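To see that singleton dimension concretely, a quick check (reusing the a = np.arange(24).reshape(2,3,4) from the first answer):

a[..., 0].shape    # (2, 3)    - integer index drops the last dimension
a[..., [0]].shape  # (2, 3, 1) - list index keeps it, like dsplit's pieces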
So there's nothing wrong or inefficient about your
[x[..., i] for i in np.arange(x.shape[-1])]
Actually, in quick timing tests, any use of dsplit is slow. Its generality costs. So adding squeeze is relatively cheap.
But judging by the answer you accepted, it looks like you are really looking for an array of the correct shape rather than a list of arrays. For many operations that makes sense. split is more useful when the subarrays have more than one 'row' along the split axis, or even an uneven number of 'rows'.