Transform sequential 2d array to time-windowed dataset - pandas

I have a 2d dataframe:
   C1  C2  C3
0   2   3   6
1   8   2   1
2   8   6   2
3   4   9   0
4   6   7   1
5   2   3   0
I want to turn it into 3D data of shape <num_windows, window_size, num_features>.
So if the window size is 5, the shape of the 3D data will be <2,5,3>, and the data will be:
[[2,3,6],[8,2,1],[8,6,2],[4,9,0],[6,7,1]] , [[8,2,1],[8,6,2],[4,9,0],[6,7,1],[2,3,0]]
What is the best way to do this?

You can use sliding_window_view:
import numpy as np

num_windows = 2
window_size = 5
num_features = 3
np.lib.stride_tricks.sliding_window_view(df, (window_size, num_features))[:num_windows,0,:,:]
gives a 3D array of shape (num_windows, window_size, num_features):
array([[[2., 3., 6.],
[8., 2., 1.],
[8., 6., 2.],
[4., 9., 0.],
[6., 7., 1.]],
[[8., 2., 1.],
[8., 6., 2.],
[4., 9., 0.],
[6., 7., 1.],
[2., 3., 0.]]])
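If the number of windows should follow from the data instead of being set by hand, a minimal sketch could look like the one below. It assumes NumPy 1.20 or later (where sliding_window_view was added) and uses a plain array named `data` as a stand-in for `df.to_numpy()`; both names are illustrative, not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Stand-in for df.to_numpy(): 6 time steps, 3 features.
data = np.array([[2, 3, 6],
                 [8, 2, 1],
                 [8, 6, 2],
                 [4, 9, 0],
                 [6, 7, 1],
                 [2, 3, 0]])
window_size = 5

# Windowing over full rows gives shape
# (num_windows, 1, window_size, num_features); drop the singleton axis.
windows = sliding_window_view(data, (window_size, data.shape[1]))[:, 0]
print(windows.shape)  # (2, 5, 3)
```

The result is a read-only view onto the original data, so call `.copy()` on it if the windows need to be modified.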

Related

Why does numpy reshape mess up my data pattern?

Let's say I have the following array A -
import numpy as np
batch_size, seq_len = 3, 5
A = np.zeros((batch_size, seq_len))
A[0,0:] = 1
A[1,0:] = 2
A[2,0:] = 3
A has the following value -
array([[1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2.],
[3., 3., 3., 3., 3.]])
Now, if I reshape it in the following way -
A4 = A.reshape(seq_len, -1)
array([[1., 1., 1.],
[1., 1., 2.],
[2., 2., 2.],
[2., 3., 3.],
[3., 3., 3.]])
However, I expected it to be -
array([[1., 2., 3.],
[1., 2., 3.],
[1., 2., 3.],
[1., 2., 3.],
[1., 2., 3.]])
Kudos to this awesome post for bringing this problem to my attention - https://discuss.pytorch.org/t/for-beginners-do-not-use-view-or-reshape-to-swap-dimensions-of-tensors/75524
From the np.reshape docs:
You can think of reshaping as first raveling the array (using the given index order), then inserting the elements from the raveled array into the new array using the same kind of index ordering as was used for the raveling.
A4 is (5,3) with the elements in the same raveled order [1,1,1,1,1,2,2,...]. reshape never reorders elements; to swap the axes, use a transpose (A.T) instead.
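To make the difference concrete, here is a small sketch that rebuilds A from the question and compares the two operations. The variable names `reshaped` and `swapped` are only for illustration.

```python
import numpy as np

batch_size, seq_len = 3, 5
A = np.zeros((batch_size, seq_len))
A[0, :] = 1
A[1, :] = 2
A[2, :] = 3

reshaped = A.reshape(seq_len, -1)  # same element order, new row breaks
swapped = A.T                      # axes swapped, shape (5, 3)

print(reshaped[1])  # [1. 1. 2.]
print(swapped[1])   # [1. 2. 3.]
```

reshape only changes where the rows break in the flattened sequence, while the transpose actually exchanges the batch and sequence axes.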

Filtering a ndarray in numpy

I have an ndarray and I want to filter a particular row out of it. My array is:
arr = np.array([
[1., 6., 1.],
[1., 7., 0.],
[1., 8., 0.],
[3., 5., 1.],
[5., 1., 1.],
[5., 2., 2.],
[6., 1., 1.],
[6., 2., 2.],
[6., 7., 3.],
[6., 8., 0.]
])
I want to filter out [6., 1., 1.]. So I have tried:
arr[arr != [6., 1., 1.]]
and I got:
array([1., 6., 1., 7., 0., 1., 8., 0., 3., 5., 5., 5., 2., 2., 2., 2., 7.,
3., 8., 0.])
which is not what I want (and also destroyed the previous structure of the array). I have also tried:
arr[arr[:] != [6., 1., 1.]]
but I got the same output as before.
P.S.: I know I can delete an element by its index, but I don't want to do that. I want to check for the particular element.
P.P.S.: For 1-d arrays my method works.
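You're very close. The boolean array you get tells you, element by element, which entries differ from the target row. Indexing with that 2D mask picks individual elements, which is why the result is flattened. Reduce the mask to one value per row instead: delete a row only if all of its elements match, or equivalently keep a row if any of its elements don't match: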
arr[(arr != [6, 1, 1]).any(axis=1)]
You can also write it as
arr[~(arr == [6, 1, 1]).all(axis=1)]
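As a quick check that the two forms agree, here is a sketch on a shortened version of the question's array (the four rows and the names `target`, `keep`, `filtered`, and `same` are chosen for illustration only):

```python
import numpy as np

arr = np.array([[1., 6., 1.],
                [5., 1., 1.],
                [6., 1., 1.],
                [6., 2., 2.]])
target = np.array([6., 1., 1.])

# One boolean per row: True where at least one entry differs from target.
keep = (arr != target).any(axis=1)
filtered = arr[keep]

# Equivalent form: drop rows where every entry equals target.
same = arr[~(arr == target).all(axis=1)]

print(filtered)
```

Because the mask is one-dimensional, it selects whole rows and the 2D structure of the array is preserved.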

Operation from each vector in a 2D tensor on each 2D matrix in a 3D tensor

I have a 2D tensor. I would like to take each vector in that 2D tensor and apply tf.tensordot(vector, matrix, axes=1) to it and the matrix at the same index in a 3D tensor.
Essentially, I'd like the same result as I'd get with this for loop, but by doing tensorflow matrix operations rather than numpy and looping:
import numpy as np

tensor2d = np.array([[1.,1.,1.,0.,0.],
                     [1.,1.,0.,0.,0.]],
                    np.float32)
tensor3d = np.array([
[
[1., 2., 3.],
[2., 2., 3.],
[3., 2., 3.],
[4., 2., 3.],
[5., 2., 3.],
],
[
[1., 2., 3.],
[2., 2., 3.],
[3., 2., 3.],
[4., 2., 3.],
[5., 2., 3.],
]
], np.float32)
results = []
for i in range(len(tensor2d)):
    results.append(np.tensordot(tensor2d[i], tensor3d[i], axes=1))
Output of this should be a matrix that looks like this (though types would be different):
[array([6., 6., 9.], dtype=float32), array([3., 4., 6.], dtype=float32)]
OK, the answer I found myself boils down to using tf.math.multiply and rearranging the axes with transposes until the result has the desired shape. It would be great if someone could come up with a more principled answer at some point, but for now, this worked:
result = tf.transpose(tf.math.multiply(tensor2d, tensor3d.transpose([2,0,1])), [1,2,0])
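One possibly more principled alternative is an einsum over a shared batch index. The sketch below uses NumPy so it runs without TensorFlow; tf.einsum should accept the same subscript string, though that is not tested here. Note that the element-wise product above has shape (2, 5, 3), and summing it over axis 1 is what reproduces the loop's output.

```python
import numpy as np

tensor2d = np.array([[1., 1., 1., 0., 0.],
                     [1., 1., 0., 0., 0.]], np.float32)
# The question uses the same 5x3 matrix for both batch entries.
matrix = np.array([[1., 2., 3.],
                   [2., 2., 3.],
                   [3., 2., 3.],
                   [4., 2., 3.],
                   [5., 2., 3.]], np.float32)
tensor3d = np.stack([matrix, matrix])

# b = batch index, i = contracted axis, j = output feature
result = np.einsum('bi,bij->bj', tensor2d, tensor3d)

# Same computation written as broadcast multiply followed by a sum.
via_multiply = (tensor2d[:, :, None] * tensor3d).sum(axis=1)

print(result)
```

Both expressions contract the length-5 axis per batch entry, which is what the for loop with np.tensordot does one row at a time.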

How do I do a matrix multiplication on the last 2 dimensions of a tensor [duplicate]

Say I have a shape (3, 5, 3) tensor like so:
x = [[[ 4., 6., 6.],
[ 0., 0., 3.],
[ 6., 6., 5.],
[ 4., 1., 8.],
[ 3., 6., 7.]],
[[ 4., 0., 5.],
[ 4., 7., 2.],
[ 4., 5., 3.],
[ 4., 2., 1.],
[ 3., 4., 4.]],
[[ 0., 3., 4.],
[ 6., 7., 5.],
[ 1., 2., 2.],
[ 3., 8., 3.],
[ 8., 5., 7.]]]
And a shape (3, 3, 4) tensor like so:
y = [[[ 3., 2., 5., 4.],
[ 8., 7., 1., 8.],
[ 4., 0., 5., 3.]],
[[ 8., 7., 7., 3.],
[ 5., 4., 0., 1.],
[ 6., 5., 4., 4.]],
[[ 7., 0., 1., 2.],
[ 7., 5., 0., 6.],
[ 7., 5., 4., 1.]]]
How would I do a matrix multiplication so that the result has shape (3, 5, 4),
whereby the first element of the result is given by the matrix multiplication of
[[ 4., 6., 6.],
[ 0., 0., 3.],
[ 6., 6., 5.],
[ 4., 1., 8.],
[ 3., 6., 7.]]
and
[[ 3., 2., 5., 4.],
[ 8., 7., 1., 8.],
[ 4., 0., 5., 3.]]
I've tried using tf.tensordot like:
z = tf.tensordot(x, y, axes = [[2],[1]])
which I believe multiplies the 3rd axis of x with the 2nd axis of y, but it gives me a tensor of shape (3, 5, 3, 4). Any ideas?
Silly me: after reading the tf.matmul docs, it seems that since the inner dimensions match, I can just do tf.matmul(x, y), and it gives me the answer.
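The shape difference between the two calls can be reproduced with NumPy, whose matmul and tensordot follow the same conventions here. This is a sketch with random stand-in data (an assumption, not the tensors from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 9, size=(3, 5, 3)).astype(np.float32)
y = rng.integers(0, 9, size=(3, 3, 4)).astype(np.float32)

# matmul treats the leading axis as a batch axis: x[i] @ y[i] for each i.
z = np.matmul(x, y)
print(z.shape)  # (3, 5, 4)

# tensordot pairs every batch of x with every batch of y.
t = np.tensordot(x, y, axes=[[2], [1]])
print(t.shape)  # (3, 5, 3, 4)
```

The batched result is the "diagonal" of the tensordot result, that is, z[i] equals t[i, :, i, :], which is why tensordot returns the extra axis.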

Access individual array in numpy

How can I individually access one specific element in each row with numpy?
In[308]: cards
Out[296]:
array([[ 3., 8., 7., 12., 1., 4., 12.],
[ 5., 6., 2., 11., 10., 9., 6.],
[ 3., 4., 3., 9., 3., 3., 10.]])
The following accesses the same columns [1,2,1] in all rows. But I want element 1 of the first row, element 2 of the second row, and element 1 of the third row instead.
cards[:,[1,2,1]]
array([[ 8., 7., 8.],
[ 6., 2., 6.],
[ 4., 3., 4.]])
Desired output:
array([[ 8.],
[ 2.],
[ 4.]])
You can pass indices for both the rows and the columns:
In [91]: cards[[0, 1, 2], [1, 2, 1]]
Out[91]: array([ 8., 2., 4.])
If the indices have matching shape, they are processed pair-wise. More details can be found in the documentation.
You can pass the row indices and the column indices as two iterables:
cards[[0, 1, 2], [1, 2, 1]]
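Both answers return a flat array of shape (3,), while the desired output in the question is a column of shape (3, 1). A sketch of two ways to get that shape follows; the names `cols`, `rows`, `picked`, `column`, and `also` are illustrative, and np.take_along_axis assumes NumPy 1.15 or later.

```python
import numpy as np

cards = np.array([[3., 8., 7., 12., 1., 4., 12.],
                  [5., 6., 2., 11., 10., 9., 6.],
                  [3., 4., 3., 9., 3., 3., 10.]])
cols = np.array([1, 2, 1])  # one column index per row

rows = np.arange(cards.shape[0])
picked = cards[rows, cols]   # pair-wise indexing, shape (3,)
column = picked[:, None]     # add a trailing axis, shape (3, 1)

# Alternative: index along axis 1 with a (3, 1) index array.
also = np.take_along_axis(cards, cols[:, None], axis=1)

print(column)
```

Using np.arange for the row indices avoids hard-coding [0, 1, 2] when the number of rows changes.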