TensorFlow Dataset operation equivalent to timeseries_dataset_from_array possible?

I want more control over TensorFlow dataset generation. For this reason, I want to mirror the behavior of timeseries_dataset_from_array, but with the ability to use consecutive or non-overlapping windows (timeseries_dataset_from_array does not allow sequence_stride=0).
# df_with_inputs.shape == (x, 19), df_with_labels.shape == (x, 1)
ds = (tf.data.Dataset.from_tensor_slices((df_with_inputs.values, df_with_labels.values))
      .window(20, shift=1, stride=1, drop_remainder=True)
      .batch(32))
should be equivalent to:
ds = tf.keras.preprocessing.timeseries_dataset_from_array(
    df_with_inputs[df_with_inputs.columns], df_with_labels[df_with_labels.columns],
    sequence_length=window_size, sequence_stride=1, shuffle=False, batch_size=batch_size)
Both create a BatchDataset with the same number of samples, but the type spec of the dataset built manually is somehow different. The first gives me:
<BatchDataset shapes: (DatasetSpec(TensorSpec(shape=(19,), dtype=tf.float32, name=None), TensorShape([None])), DatasetSpec(TensorSpec(shape=(1,), dtype=tf.float32, name=None), TensorShape([None]))), types: (DatasetSpec(TensorSpec(shape=(19,), dtype=tf.float32, name=None), TensorShape([None])), DatasetSpec(TensorSpec(shape=(1,), dtype=tf.float32, name=None), TensorShape([None])))>
whereas the second gives me:
<BatchDataset shapes: ((None, None, 19), (None, 1)), types: (tf.float64, tf.int32)>
Both contain the same number of elements (3063 in my case). Note that stride and sequence_stride behave differently in the two methods (for the same behavior, you need shift=1). Additionally, when I try to feed the first to my NN, I receive the following error (whereas the ds from timeseries_dataset_from_array works like a charm):
TypeError: Inputs to a layer should be tensors.
Any idea what I am missing here?
My model:
input_shape = (window_size, num_features)  # (20, 19)
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu',
                           padding="same", input_shape=input_shape),
    # [...]
])

The equivalent of this:
import tensorflow as tf
tf.random.set_seed(345)
samples = 30
df_with_inputs = tf.random.normal((samples, 2), dtype=tf.float32)
df_with_labels = tf.random.uniform((samples, 1), maxval=2, dtype=tf.int32)
batch_size = 2
window_size = 20
ds1 = tf.keras.preprocessing.timeseries_dataset_from_array(
    df_with_inputs, df_with_labels, sequence_length=window_size,
    sequence_stride=1, shuffle=False, batch_size=batch_size)
for x, y in ds1.take(1):
    print(x, y)
tf.Tensor(
[[[-0.01898661 1.2348452 ]
[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]]
[[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]
[-0.14423785 0.95039433]]], shape=(2, 20, 2), dtype=float32) tf.Tensor(
[[1]
[1]], shape=(2, 1), dtype=int32)
Using tf.data.Dataset.from_tensor_slices, the equivalent would be this:
ds2 = tf.data.Dataset.from_tensor_slices((df_with_inputs, df_with_labels)).batch(batch_size)
inputs_only_ds = ds2.map(lambda x, y: x)
inputs_only_ds = (inputs_only_ds
                  .flat_map(tf.data.Dataset.from_tensor_slices)
                  .window(window_size, shift=1, stride=1, drop_remainder=True)
                  .flat_map(lambda x: x.batch(window_size))
                  .batch(batch_size))
ds2 = tf.data.Dataset.zip((inputs_only_ds, ds2.map(lambda x, y: y)))
for x, y in ds2.take(1):
    print(x, y)
tf.Tensor(
[[[-0.01898661 1.2348452 ]
[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]]
[[-0.33379436 -0.13637085]
[-2.239644 1.5407541 ]
[-0.14988706 0.50577176]
[-1.6328571 -0.9512018 ]
[-3.0481005 0.8019097 ]
[-0.683125 -0.12166552]
[-0.5408724 -0.97584397]
[ 0.47595206 1.0512688 ]
[ 0.15297593 0.7393363 ]
[-0.17052855 -0.12541457]
[ 1.1617764 -2.491248 ]
[-2.5665069 0.9241422 ]
[ 0.40681016 -1.031384 ]
[-0.23945935 1.5275828 ]
[-1.3431666 0.2940185 ]
[ 1.7351524 0.34276873]
[ 0.8059861 2.0647929 ]
[-0.3017126 0.729208 ]
[-0.8672192 -0.79938954]
[-0.14423785 0.95039433]]], shape=(2, 20, 2), dtype=float32) tf.Tensor(
[[1]
[1]], shape=(2, 1), dtype=int32)
Note that flat_map is necessary to flatten the tensor in order to apply sliding windows more easily. The call flat_map(lambda x: x.batch(window_size)) simply batches the flattened tensors back into windows of length window_size after the sliding-window operation.
With the line inputs_only_ds = ds2.map(lambda x, y: x), I extract only the inputs (x) without the labels (y) to run the sliding windows on. Afterwards, in tf.data.Dataset.zip((inputs_only_ds, ds2.map(lambda x, y: y))), I zip the windowed inputs back together with the labels (y), which yields the final result ds2.
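If the original goal was non-overlapping windows, the same pipeline only needs shift=window_size; a minimal sketch (inputs only, labels left out for brevity; note that timeseries_dataset_from_array can express this too via sequence_stride=window_size, it just does not allow sequence_stride=0):
non_overlap = (tf.data.Dataset.from_tensor_slices(df_with_inputs)
               .window(window_size, shift=window_size, stride=1, drop_remainder=True)
               .flat_map(lambda w: w.batch(window_size))  # window dataset -> (window_size, 2) tensors
               .batch(batch_size))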

Related

Removing row of a tensor in tensorflow

Assuming I have a tensor k
k = tf.random.normal([4, 5], 0, 1)

def sample_without_replacement(logits, K):
    """
    Courtesy of https://github.com/tensorflow/tensorflow/issues/9260#issuecomment-437875125
    """
    logits = tf.transpose(logits)
    # Gumbel-top-k trick: add Gumbel noise, then take the top K
    z = -tf.math.log(-tf.math.log(tf.random.uniform(tf.shape(logits), 0, 1)))
    _, indices = tf.math.top_k(logits + z, K)
    return indices

indices = sample_without_replacement(k, 2)
k.remove(indices)  # pseudocode -- 'remove' does not exist
Which function can I use in place of 'remove' to remove the rows contained in indices from k?
You can remove a specific row from a tensor by using np.delete(). For example, if I have to remove the row at index 1 from a tensor k:
<tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[-0.02622163, -1.2923028 , 3.2072415 , 1.2431644 , -0.11518966],
[ 1.2594987 , -1.6813043 , -0.4560027 , 1.4999349 , -0.7349123 ],
[ 0.21005473, 1.1832136 , -2.4060364 , -0.59930867, -0.1646447 ],
[ 0.7740495 , 0.48236254, 0.682837 , -0.54411227, 1.0912068 ]],
dtype=float32)>
np.delete(k, obj=1, axis=0)
output:
array([[-0.02622163, -1.2923028 , 3.2072415 , 1.2431644 , -0.11518966],
[ 0.21005473, 1.1832136 , -2.4060364 , -0.59930867, -0.1646447 ],
[ 0.7740495 , 0.48236254, 0.682837 , -0.54411227, 1.0912068 ]],
dtype=float32)
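Note that np.delete returns a NumPy array rather than a tensor. If you would rather stay in TensorFlow, a boolean mask does the same job; a minimal sketch, where indices_to_drop is a hypothetical variable holding the row indices you want removed:
import tensorflow as tf

k = tf.random.normal([4, 5], 0, 1)
indices_to_drop = tf.constant([1])  # hypothetical: rows to remove

# Start with an all-True mask and switch off the rows to drop.
mask = tf.ones([tf.shape(k)[0]], dtype=tf.bool)
mask = tf.tensor_scatter_nd_update(
    mask,
    tf.expand_dims(indices_to_drop, axis=1),        # shape (num_drop, 1)
    tf.zeros_like(indices_to_drop, dtype=tf.bool))  # shape (num_drop,)
k_reduced = tf.boolean_mask(k, mask, axis=0)        # shape (3, 5)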

How to construct an equivalent multivariate normal distribution in tensorflow-probability, using TransformedDistribution?

How to construct an equivalent multivariate normal distribution in tensorflow-probability, using TransformedDistribution and tfb.ScaleMatvecLinearOperator?
I'm reading a tutorial on a bijector in tensorflow_probability: tfp.bijectors.ScaleMatvecLinearOperator.
An example was provided.
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors

n = 10000
loc = 0
scale = 0.5
normal = tfd.Normal(loc=loc, scale=scale)
The above code creates a univariate normal distribution.
tril = tf.random.normal((2, 4, 4))
scale_low_tri = tf.linalg.LinearOperatorLowerTriangular(tril)
scale_low_tri.to_dense()
The above code creates a tensor consisting of 2 lower triangular matrices:
<tf.Tensor: shape=(2, 4, 4), dtype=float32, numpy=
array([[[-0.56953585, 0. , 0. , 0. ],
[ 1.1368589 , 0.32028311, 0. , 0. ],
[-0.8328388 , -1.9963025 , -0.6005632 , 0. ],
[ 0.596155 , -0.214932 , 1.0988408 , -0.41731614]],
[[ 2.0778096 , 0. , 0. , 0. ],
[-1.1863967 , 2.4897904 , 0. , 0. ],
[ 0.38001925, 1.4962028 , 1.7609248 , 0. ],
[ 2.9253726 , 0.7047957 , 0.050508 , 0.58643174]]],
dtype=float32)>
Then a matrix-vector multiplication bijector is created:
scale_lin_op = tfb.ScaleMatvecLinearOperator(scale_low_tri)
After that, a TransformedDistribution is constructed as follows:
mvn = tfd.TransformedDistribution(normal, scale_lin_op, batch_shape=[2], event_shape=[4])
This would have worked in older versions of tensorflow_probability. However, the constructor of TransformedDistribution has since changed and no longer accepts the last two parameters, batch_shape and event_shape. Therefore I tried the following to do the same:
mvn2 = tfd.TransformedDistribution(
    distribution=tfd.Sample(
        normal,
        sample_shape=[4]  # base_dist.event_shape == [4]
    ),
    bijector=scale_lin_op,
)  # batch_shape=[2], event_shape=[4]
mvn2
And the result seems to have the correct batch_shape and event_shape:
<tfp.distributions.TransformedDistribution 'scale_matvec_linear_operatorSampleNormal' batch_shape=[2] event_shape=[4] dtype=float32>
Then, another distribution for comparison is created:
mvn3 = tfd.MultivariateNormalLinearOperator(loc=loc, scale=scale_low_tri)
mvn3
According to the tutorial, the TransformedDistribution mvn2 should be equivalent to the MultivariateNormalLinearOperator mvn3.
# Check
xn = normal.sample((n, 2, 4)) # sample_shape = (n, 2, 4)
tf.norm(mvn2.log_prob(xn) - mvn3.log_prob(xn)) / tf.norm(mvn2.log_prob(xn))
<tf.Tensor: shape=(), dtype=float32, numpy=0.7498207>
But in my result they are not equivalent. (If they were, the tensor above would be 0.)
What have I done wrong?
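One observation (not from the original thread, so treat it as a hypothesis): mvn3 applies scale_low_tri to a standard normal base, whereas mvn2's base Normal already has scale=0.5, so the two distributions should differ by exactly that factor. Folding the base scale into the linear operator ought to make the check pass; a sketch:
# Hypothetical fix: both now describe X = (scale * tril) @ Z with Z ~ N(0, I)
scale_low_tri2 = tf.linalg.LinearOperatorLowerTriangular(tril * scale)
mvn3b = tfd.MultivariateNormalLinearOperator(loc=loc, scale=scale_low_tri2)
tf.norm(mvn2.log_prob(xn) - mvn3b.log_prob(xn)) / tf.norm(mvn2.log_prob(xn))
# expected: ~0 if the base scale is the only discrepancy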

What is the difference between the following matrix?

I have a piece of code like the following. I have to implement image2vector(), which takes an input of shape (length, height, 3) and returns a vector of shape (length*height*3). It doesn't give me the result I expect. Actually, I don't understand the difference between the result I got and the expected one.
def image2vector(image):
    v = None
    v = image.reshape(1, 9, image.shape[0] * image.shape[1] * image.shape[2])
    return v
image = np.array([[[ 0.67826139, 0.29380381],
[ 0.90714982, 0.52835647],
[ 0.4215251 , 0.45017551]],
[[ 0.92814219, 0.96677647],
[ 0.85304703, 0.52351845],
[ 0.19981397, 0.27417313]],
[[ 0.60659855, 0.00533165],
[ 0.10820313, 0.49978937],
[ 0.34144279, 0.94630077]]])
print ("image2vector(image) = " + str(image2vector(image)))
I got the following result:
image2vector(image) = [[ 0.67826139 0.29380381 0.90714982 0.52835647 0.4215251 0.45017551
0.92814219 0.96677647 0.85304703 0.52351845 0.19981397 0.27417313
0.60659855 0.00533165 0.10820313 0.49978937 0.34144279 0.94630077]]
But I want to get the following one:
[[ 0.67826139] [ 0.29380381] [ 0.90714982] [ 0.52835647] [ 0.4215251 ] [ 0.45017551] [ 0.92814219] [ 0.96677647] [ 0.85304703] [ 0.52351845] [ 0.19981397] [ 0.27417313] [ 0.60659855] [ 0.00533165] [ 0.10820313] [ 0.49978937] [ 0.34144279] [ 0.94630077]]
What is the difference between them? How do I get the second matrix from the first one?
Your image does not have the shape (length, height, 3):
In [1]: image = np.array([[[ 0.67826139, 0.29380381],
...: [ 0.90714982, 0.52835647],
...: [ 0.4215251 , 0.45017551]],
...:
...: [[ 0.92814219, 0.96677647],
...: [ 0.85304703, 0.52351845],
...: [ 0.19981397, 0.27417313]],
...:
...: [[ 0.60659855, 0.00533165],
...: [ 0.10820313, 0.49978937],
...: [ 0.34144279, 0.94630077]]])
In [2]: image.shape
Out[2]: (3, 3, 2)
and you can't do the reshape you're attempting:
In [3]: image.reshape(1, 9, image.shape[0] * image.shape[1] * image.shape[2])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-aac5649a99ea> in <module>
----> 1 image.reshape(1, 9, image.shape[0] * image.shape[1] * image.shape[2])
ValueError: cannot reshape array of size 18 into shape (1,9,18)
It has only 18 elements; you can't increase the number of elements with reshape.
In [4]: image.reshape(1, image.shape[0] * image.shape[1] * image.shape[2])
Out[4]:
array([[0.67826139, 0.29380381, 0.90714982, 0.52835647, 0.4215251 ,
0.45017551, 0.92814219, 0.96677647, 0.85304703, 0.52351845,
0.19981397, 0.27417313, 0.60659855, 0.00533165, 0.10820313,
0.49978937, 0.34144279, 0.94630077]])
In [5]: _.shape
Out[5]: (1, 18)
The apparently desired shape is:
In [6]: image.reshape(image.shape[0] * image.shape[1] * image.shape[2],1)
Out[6]:
array([[0.67826139],
[0.29380381],
[0.90714982],
[0.52835647],
...
[0.94630077]])
In [7]: _.shape
Out[7]: (18, 1)
The difference is whether you want just a plain 1-D vector, or a row or column vector. Usually a column ("vertical") vector has shape (n, 1) and a row ("horizontal") vector has shape (1, n).
import numpy as np
image = np.array([[[ 0.67826139, 0.29380381],
[ 0.90714982, 0.52835647],
[ 0.4215251 , 0.45017551]],
[[ 0.92814219, 0.96677647],
[ 0.85304703, 0.52351845],
[ 0.19981397, 0.27417313]],
[[ 0.60659855, 0.00533165],
[ 0.10820313, 0.49978937],
[ 0.34144279, 0.94630077]]])
reshapedImage = image.reshape(18,1)
reshapedImage
array([[0.67826139],
[0.29380381],
[0.90714982],
[0.52835647],
[0.4215251],
[0.45017551],
[0.92814219],
[0.96677647],
[0.85304703],
[0.52351845],
[0.19981397],
[0.27417313],
[0.60659855],
[0.00533165],
[0.10820313],
[0.49978937],
[0.34144279],
[0.94630077]])
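As a side note (not part of the original answer), you can let NumPy infer the length with -1 instead of hard-coding 18, which is handy inside image2vector where the size varies:
column = image.reshape(-1, 1)  # shape (18, 1): column vector
row = image.reshape(1, -1)     # shape (1, 18): row vector
flat = image.ravel()           # shape (18,): plain 1-D vector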

How to understand the conv2d_transpose in tensorflow

The following is a test for conv2d_transpose.
import tensorflow as tf
import numpy as np
x = tf.constant(np.array([[
    [[-67], [-77]],
    [[-117], [-127]]
]]), tf.float32)
# shape = (3, 3, 1, 1) -> (height, width, input_channels, output_channels) - 3x3x1 filter
f = tf.constant(np.array([
    [[[-1]], [[2]], [[-3]]],
    [[[4]], [[-5]], [[6]]],
    [[[-7]], [[8]], [[-9]]]
]), tf.float32)
conv = tf.nn.conv2d_transpose(x, f, output_shape=(1, 5, 5, 1), strides=[1, 2, 2, 1], padding='VALID')
The result:
tf.Tensor(
[[[[ 67.]
[ -134.]
[ 278.]
[ -154.]
[ 231.]]
[[ -268.]
[ 335.]
[ -710.]
[ 385.]
[ -462.]]
[[ 586.]
[ -770.]
[ 1620.]
[ -870.]
[ 1074.]]
[[ -468.]
[ 585.]
[-1210.]
[ 635.]
[ -762.]]
[[ 819.]
[ -936.]
[ 1942.]
[-1016.]
[ 1143.]]]], shape=(1, 5, 5, 1), dtype=float32)
To my understanding, it should work as described in Figure 4.5 in the doc. Therefore, the first element (conv[0,0,0,0]) should be -67 * -9 = 603. Why does it turn out to be 67?
The result may be explained by the following image (not reproduced here). But why is the convolution kernel inverted?
To explain this best, I have made a draw.io figure (not reproduced here) illustrating the results you obtained. I hope the illustration helps explain why the first element of the transposed-convolution feature map is 67.
A key thing to note:
Unlike traditional convolution, in transposed convolution each element of the filter is multiplied by one element of the input feature map, and the resulting intermediate feature maps are overlaid on one another (summed) to create the final feature map. The stride determines how far apart the overlays are. In our case stride = 2, so the filter moves by 2 in both the x and y dimensions for each element of the original downsampled feature map; see the sketch below.
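To make the overlay arithmetic concrete, here is a small NumPy sketch (not TensorFlow's actual implementation, just the same arithmetic) that reproduces the 5x5 output by stamping a scaled copy of the unflipped kernel at stride-2 offsets and summing the overlaps:
import numpy as np

x = np.array([[-67., -77.],
              [-117., -127.]])
f = np.array([[-1., 2., -3.],
              [4., -5., 6.],
              [-7., 8., -9.]])
stride = 2

out = np.zeros((5, 5))
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        # Each input element stamps x[i, j] * f into the output;
        # overlapping stamps are summed.
        out[i * stride:i * stride + 3, j * stride:j * stride + 3] += x[i, j] * f

print(out[0, 0])  # 67.0, i.e. x[0, 0] * f[0, 0] = (-67) * (-1) -- the kernel is not flipped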

Transposing data in an array using numpy

I have a list as follows that needs to be transposed to a numpy array:
samplelist= [ [ ['Name-1','Name-2','Name-3'] , ['Age-1','Age-2','Age-3'] ],
[ ['new_Name_1','new_Name_2','new_Name_3'], ['new_Age_1','new_Age_2','new_Age_3'] ]
]
Expected Result:
samplearray = [ [ ['Name-1','Age-1'], ['Name-2','Age-2'], ['Name-3','Age-3'] ],
[ ['new_Name_1','new_Age_1'], ['new_Name_2','new_Age_2'], ['new_Name_3','new_Age_3'] ]
]
np.transpose results:
np.transpose(samplelist)
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')
Converted to a NumPy array, samplelist is 3-D:
In [58]: samplelist.shape
Out[58]: (2, 2, 3)
Using transpose swaps the first and last axes (0 and 2):
In [55]: samplelist.T
Out[55]:
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')
In [57]: samplelist.swapaxes(0,2)
Out[57]:
array([[['Name-1', 'new_Name_1'],
['Age-1', 'new_Age_1']],
[['Name-2', 'new_Name_2'],
['Age-2', 'new_Age_2']],
[['Name-3', 'new_Name_3'],
['Age-3', 'new_Age_3']]],
dtype='|S10')
To get the desired array, swap axes 1 and 2:
import numpy as np
samplelist = np.array([
[ ['Name-1','Name-2','Name-3'] , ['Age-1','Age-2','Age-3'] ],
[ ['new_Name_1','new_Name_2','new_Name_3'], ['new_Age_1','new_Age_2','new_Age_3'] ]
])
print(samplelist.swapaxes(1,2))
# [[['Name-1' 'Age-1']
# ['Name-2' 'Age-2']
# ['Name-3' 'Age-3']]
# [['new_Name_1' 'new_Age_1']
# ['new_Name_2' 'new_Age_2']
# ['new_Name_3' 'new_Age_3']]]
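For completeness, np.transpose with an explicit axes argument expresses the same permutation as swapaxes(1, 2):
print(np.transpose(samplelist, axes=(0, 2, 1)))  # identical to samplelist.swapaxes(1, 2)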