Keras summation Layer acting weird, summing over training set - tensorflow

I am having trouble understanding the basic way Keras works. I am experimenting with a single summation layer, implemented as a Lambda layer using tensorflow as a backend:
import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Lambda

test_model = Sequential()
test_model.add(Lambda(lambda x: K.sum(x, axis=0), input_shape=(2, 3)))
x = np.reshape(np.arange(12), (2, 2, 3))
test_model.predict(x)
This returns:
array([[  6.,   8.,  10.],
       [ 12.,  14.,  16.]], dtype=float32)
Which is very weird, as it sums over the first index, which to my understanding corresponds to the index of the training samples. Also, if I change the axis to axis=1, the sum is taken over the second coordinate, which is what I would have expected for axis=0.
What is going on? Why does the chosen axis seem to affect how the data is passed to the Lambda layer?

The input_shape is the shape of one sample of the batch.
It doesn't matter if you have 200 or 10000 samples in a batch, all the samples should be (2,3).
But the batch itself is what is passed along from one layer to another.
A batch contains "n" samples, each sample with the input_shape:
Batch shape then is: (n, 2, 3) -- n samples, each sample with input_shape = (2,3)
You don't include "n" in input_shape, because "n" is only determined later, when you use fit or another training command with a batch_size. (In your example, n = 2.)
This is the original array:
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
Sample 1 = [ 0 1 2], [ 3 4 5]
Sample 2 = [ 6 7 8], [ 9 10 11]
Summing on index 0 (the batch size dimension) will sum sample 1 with sample 2:
[ 6 8 10], [12 14 16]
Summing on index 1 will sum over the first dimension of each sample's input_shape, i.e. within each sample:
[ 3, 5, 7 ], [15, 17, 19]
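To see both behaviors side by side, here is a minimal NumPy check (pure NumPy, no Keras involved) of what each axis sums:
import numpy as np

x = np.arange(12).reshape(2, 2, 3)

# axis=0 sums across the batch dimension: sample 1 + sample 2
print(x.sum(axis=0))
# [[ 6  8 10]
#  [12 14 16]]

# axis=1 sums within each sample, over the first dimension of input_shape
print(x.sum(axis=1))
# [[ 3  5  7]
#  [15 17 19]]
So if the intent is a per-sample sum, the Lambda layer should use axis=1 (or axis=-2), not axis=0.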

Related

NumPy and the best way to remove rows with identical values

I'm struggling with numpy lib.
I have a tensor of the shape (batch_size, timestep, feature):
For example, let's create a dummy:
import numpy as np

x = np.arange(42).reshape(2, 7, 3)
# now make some rows have homogeneous values
x[:, ::3, :] = 0
x[:, ::5, :] = 2
Now I need a numpyish way (one that is reproducible in TensorFlow) to remove the rows (axis=-2) where all values are the same. So in the end I need a tensor that looks like this:
[[[ 3  4  5]
  [ 6  7  8]
  [12 13 14]]

 [[24 25 26]
  [27 28 29]
  [33 34 35]]]
Thanks.
P.S. This is not the same question as "remove all zero rows": here we are talking about rows with homogeneous values, which is a bit trickier.
If you are okay with losing one dimension (so that your array remains homogeneous), then you can do:
x[~np.all(x == x[:, :, 0, np.newaxis], axis=-1)]
# out:
[[ 3 4 5]
[ 6 7 8]
[12 13 14]
[24 25 26]
[27 28 29]
[33 34 35]]
Credit: @unutbu's answer to a similar problem, here adapted to one more dimension.
Why is the 3rd dimension removed? Imagine if your conditions were such that you wanted to select 2 rows from your first array and 3 from your second: then the result would be heterogeneous, which would have to be stored as a masked array or as a list of arrays.
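If keeping the per-batch grouping matters, here is a small sketch (my addition, not part of the answer above) of the list-of-arrays variant just mentioned:
per_batch = [xi[~np.all(xi == xi[:, :1], axis=-1)] for xi in x]
# per_batch[0] -> the non-homogeneous rows of the first batch element, shape (3, 3)
# per_batch[1] -> the same for the second batch element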
There might be a more clever way using only numpy. However, you could just iterate over the 2nd dimension and do a comparison.
not_same = []
for n in range(x.shape[1]):  # iterate over the 2nd dimension
    # test if the row is homogeneous, i.e. the first value equals all values
    not_same.append(~np.all(x[:, n, :] == x[0, n, 0]))

out = x[:, not_same, :]
This gives you:
array([[[ 3,  4,  5],
        [ 6,  7,  8],
        [12, 13, 14]],

       [[24, 25, 26],
        [27, 28, 29],
        [33, 34, 35]]])
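Since the question asks for something reproducible in TensorFlow, here is a hedged sketch of the same masking idea with TF ops; like the first answer it flattens the batch dimension, and whatever reshaping you need afterwards is an assumption about your use case:
import tensorflow as tf

xt = tf.constant(x)  # shape (batch, timestep, feature)
# a row is homogeneous if every feature equals the row's first feature
homogeneous = tf.reduce_all(tf.equal(xt, xt[:, :, :1]), axis=-1)          # shape (batch, timestep)
rows = tf.boolean_mask(xt, tf.logical_not(homogeneous))                   # shape (num_kept_rows, feature)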

Keras training with shuffled tf.data: if training is interrupted, how to continue training at last data iteration/order of last saved checkpoint

I am training with Keras model.fit, and the data comes from TFRecords, loaded into a tf.data.Dataset, which uses .shuffle to shuffle the data. I am also using callbacks.ModelCheckpoint to save the model every x number of steps/batches.
Sometimes my cloud instance disconnects or crashes before an epoch is finished, but the model at y step is saved into my drive.
I would like to finish training over the data in that epoch (I have very long epochs) before training another epoch, so that each data example is trained over once per epoch.
Is there a way to get the original order of the data, and the place within the data where model was last saved?
What I have found so far
It looks like you can set a specific order in .shuffle by setting the seed. However, shuffling only occurs in the buffer, so I am not 100% sure that setting the seed will perfectly reproduce the order. Also, I am not sure how that will work with reshuffle_each_iteration. Is a different seed used after each epoch? If so, I guess a workaround is to train only 1 epoch at a time, with a specified seed for each epoch.
Even if I do get a replica of the training order, I'm not sure how to find where in that order the model was last saved, and then start training from that point. One idea I have for getting to the right place in the order is to iterate through the dataset manually until I reach it, although I'm not sure whether model.fit() would continue from that point or start all over.
For getting the step/batch number from where the model was last saved, I could probably log this somewhere.
These solutions seem like rough workarounds, and I am wondering if there's some features in Keras that I may be overlooking to help with this.
There seems to be no Keras built-in way to do this, but please correct me if I am wrong.
My Approach
Dataset.shuffle internally uses the initial seed value to generate the seeds used for reshuffling during iterations when reshuffle_each_iteration=True. So to re-create the same order for a particular epoch and continue training that epoch at a particular batch, we have to re-create the Dataset with the same seed and move the dataset iterator to the same epoch and batch.
Debugging
For debugging and making sure the epochs and batches are generated in the same order, we need a way to print how the data points are picked up in each epoch/batch. This is tricky in Keras, so for debugging purposes I will use a regression problem and make the ground truth a sequence of consecutive numbers. Then I can use a custom loss that prints the ground truth and lets me make sure the order is correct.
Model and Data
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import backend as K
import numpy as np

# Data
x_train = np.random.randn(15, 10).astype("float32")
y_train = np.arange(15).astype("float32")

# Custom MSE loss, used only to track the order in which data is picked up
def my_mse(y_true, y_pred):
    tf.print(tf.keras.backend.flatten(y_true))
    loss = K.square(y_pred - y_true)
    loss = K.sum(loss, axis=1)
    return loss

# Model
def get_model():
    inputs = keras.Input(shape=(10,))
    outputs = layers.Dense(1, activation="linear")(inputs)
    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(
        optimizer="rmsprop",
        loss=my_mse,
    )
    return model
Dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=8, reshuffle_each_iteration=True, seed=0).batch(8)

epochs = 2
print("Runs 1")
for e in range(epochs):
    for i, (x, y) in enumerate(train_dataset):
        print(e, i, y)

train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=8, reshuffle_each_iteration=True, seed=0).batch(8)

print("Runs 2")
for e in range(epochs):
    for i, (x, y) in enumerate(train_dataset):
        print(e, i, y)
Output:
Runs 1
0 tf.Tensor([1. 3. 5. 7. 4. 0. 8. 2.], shape=(8,), dtype=float32)
1 tf.Tensor([ 6. 11. 10. 14. 9. 12. 13.], shape=(7,), dtype=float32)
2 tf.Tensor([4. 2. 5. 8. 1. 9. 7. 3.], shape=(8,), dtype=float32)
3 tf.Tensor([13. 10. 0. 14. 6. 11. 12.], shape=(7,), dtype=float32)
4 tf.Tensor([ 0. 1. 5. 6. 9. 3. 7. 14.], shape=(8,), dtype=float32)
5 tf.Tensor([13. 8. 4. 10. 2. 12. 11.], shape=(7,), dtype=float32)
Runs 2
0 tf.Tensor([1. 3. 5. 7. 4. 0. 8. 2.], shape=(8,), dtype=float32)
1 tf.Tensor([ 6. 11. 10. 14. 9. 12. 13.], shape=(7,), dtype=float32)
2 tf.Tensor([4. 2. 5. 8. 1. 9. 7. 3.], shape=(8,), dtype=float32)
3 tf.Tensor([13. 10. 0. 14. 6. 11. 12.], shape=(7,), dtype=float32)
4 tf.Tensor([ 0. 1. 5. 6. 9. 3. 7. 14.], shape=(8,), dtype=float32)
5 tf.Tensor([13. 8. 4. 10. 2. 12. 11.], shape=(7,), dtype=float32)
Yes, with the seed the order is reproduced.
Now let's write a method to forward the dataset to a certain epoch and batch combination:
def forward(dataset, n=None):
    if not n:
        return dataset

    i = 0
    while True:
        for _ in dataset:
            i += 1
            if i == n:
                return dataset
Test cases:
Let's run it normally and observe the order.
Data from the beginning
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = forward(train_dataset.shuffle(buffer_size=8, reshuffle_each_iteration=True, seed=0).batch(4), None)
model = get_model()
model.fit(train_dataset, epochs=3, verbose=0, workers=4, shuffle=False)
Output:
[7 3 6 10]
[11 0 1 2]
[8 14 9 13]
[12 5 4]
[5 8 6 3]
[1 12 10 9]
[2 11 0 4]
[14 13 7]
[2 3 0 10]
[4 1 13 6]
[8 7 14 11]
[12 5 9]
Data from the nth state of Dataset
Let's forward our dataset to the 4th iteration and run the training:
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = forward(train_dataset.shuffle(buffer_size=8, reshuffle_each_iteration=True, seed=0).batch(4), 4)
model = get_model()
model.fit(train_dataset, epochs=3, verbose=0, workers=4, shuffle=False)
Output:
[5 8 6 3]
[1 12 10 9]
[2 11 0 4]
[14 13 7]
[2 3 0 10]
[4 1 13 6]
[8 7 14 11]
[12 5 9]
Nice, now we know how to forward the dataset correctly. Let's now write a callback to track the current iteration number:
Custom callback to track the iteration (epoch-batch combination)
Now we need to identify the epoch and batch combination at which the model is checkpointed. If we have this information we can load the last checkpointed model, forward our dataset to its batch and epoch combination, and continue the training. We will do this using callbacks.
class MyCustomCallback(tf.keras.callbacks.ModelCheckpoint, keras.callbacks.Callback):

    def __init__(self, the_id=0, **args):
        self.the_id = the_id
        self.epoch = 0
        super().__init__(**args)

    def _save_model(self, epoch, logs):
        # expose the running iteration id so it can be used in the checkpoint filename
        logs['the_id'] = self.the_id
        super()._save_model(epoch, logs)

    def on_batch_end(self, batch, logs={}):
        self.the_id += 1
        super().on_batch_end(batch, logs)


checkpoint_filepath = 'checkpoint-{the_id}'
model_checkpoint_callback = MyCustomCallback(
    filepath=checkpoint_filepath,
    save_freq=2,
    save_best_only=False)
model = get_model()
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = forward(train_dataset.shuffle(buffer_size=8, reshuffle_each_iteration=True, seed=0).batch(4), None)
model.fit(train_dataset, epochs=5, verbose=0, callbacks=[model_checkpoint_callback], workers=4, shuffle=False)
Output:
[7 3 6 10]
[11 0 1 2]
[8 14 9 13]
[12 5 4]
[5 8 6 3]
[1 12 10 9]
[2 11 0 4]
[14 13 7]
[2 3 0 10]
[4 1 13 6]
[8 7 14 11]
[12 5 9]
We are checkpointing every two batches. So let's assume training crashes and the last checkpoint is checkpoint-4. We can load this model, forward our dataset to 4, and continue training.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = forward(train_dataset.shuffle(buffer_size=8, reshuffle_each_iteration=True, seed=0).batch(4), 4)
model = get_model()
model.fit(train_dataset, epochs=2, verbose=0, workers=4, shuffle=False)
Output:
[5 8 6 3]
[1 12 10 9]
[2 11 0 4]
[14 13 7]
[2 3 0 10]
[4 1 13 6]
[8 7 14 11]
[12 5 9]
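In a real recovery run you would load the checkpointed model instead of calling get_model() again; a minimal sketch, assuming the checkpoint-4 files were saved in a format load_model can read, and remembering to pass the custom loss:
model = keras.models.load_model('checkpoint-4', custom_objects={'my_mse': my_mse})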
I suppose you want to restore the shuffle order to avoid repeating some samples within this epoch.
According to the shuffle description, during an unfinished epoch your model had access only to the first current_step_number + shuffle_buffer_size samples from your dataset.
So when you restore your training, if you know how many steps were processed, you can just skip those steps plus shuffle_buffer_size steps, and your training will continue on the following samples, which have not yet been observed within the current epoch.
Note that some random shuffle_buffer_size samples from the first part of the dataset will not be observed at all during this epoch. As you say your epoch is very long, you probably have a lot of data, so losing shuffle_buffer_size samples should not be a problem for you.
So when saving a checkpoint also save the step number; then, after loading the checkpoint, create a dataset copy with those steps skipped (using dataset.skip), use model.fit with this smaller dataset for one epoch (to finish the current epoch), and then continue your training in the usual way.
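A minimal sketch of that skip-based idea, assuming the number of already-processed batches was logged next to the checkpoint (processed_batches, batch_size and buffer_size are illustrative names, not values from the question):
import tensorflow as tf

processed_batches = 4   # batches already trained on in the interrupted epoch (from your own log)
batch_size = 8
buffer_size = 8         # the value that was passed to .shuffle()

# Skip the samples that may already have been consumed (including those pulled into the
# shuffle buffer), then finish the interrupted epoch on the remainder.
seen = processed_batches * batch_size + buffer_size
remainder = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
             .skip(seen)
             .batch(batch_size))
# model.fit(remainder, epochs=1, shuffle=False)   # finish the interrupted epoch, then resume normally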

converting a Tensorflow Dataset of time series elements to a Dataset of windowed sequences

I have a tf.data.Dataset (r1.4) whose elements represent a time series. For example (line breaks separate elements):
1
2
3
4
5
6
7
8
9
Now I want to run a window operation on it so that I get a Dataset of subsequences of length WINDOW_SIZE for training an RNN. For example, for WINDOW_SIZE=4:
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
The closest Dataset op I could find is tf.contrib.data.group_by_window, but I am not sure how to apply it to this use case.
Another way is to use tf.contrib.data.batch_and_drop_remainder, but it divides the elements into buckets and won't produce all the subsequences.
A third option I thought of was to create WINDOW_SIZE iterators and run them individually so that they point to consecutive elements, and then start using them in sequence. However, this looks quite counterintuitive.
In TensorFlow 2.0, the Dataset class now has a window() method. You can use it like this:
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
dataset = dataset.window(5, shift=1, drop_remainder=True)
for window in dataset:
    print([elem.numpy() for elem in window])
It will output:
[0, 1, 2, 3, 4]
[1, 2, 3, 4, 5]
[2, 3, 4, 5, 6]
[3, 4, 5, 6, 7]
[4, 5, 6, 7, 8]
[5, 6, 7, 8, 9]
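If you want to feed these windows to a model, note that each window is itself a small Dataset and usually needs to be flattened into a regular tensor first. A short sketch of the standard flat_map/batch follow-up (my addition, not part of the answer above):
window_size = 5
dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
dataset = dataset.window(window_size, shift=1, drop_remainder=True)
# batch each window-Dataset into a single tensor, then flatten the nesting away
dataset = dataset.flat_map(lambda window: window.batch(window_size))
for tensor in dataset:
    print(tensor.numpy())   # [0 1 2 3 4], [1 2 3 4 5], ...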
I found myself in a similar situation and I solved it this way (I wrote the dataset values at each step in the comments to make it clear enough):
import numpy as np
import tensorflow as tf

length = 12
components = np.array([[i] for i in range(length)], dtype=np.int64)
# components = np.arange(6 * 4, dtype=np.int64).reshape((-1, 4))
dataset = tf.data.Dataset.from_tensor_slices(components)
window_size = 4

# window consecutive elements with batch
dataset = dataset.apply(tf.contrib.data.batch_and_drop_remainder(window_size))
# [[0][1][2][3]]
# [[4][5][6][7]]
# [[8][9][10][11]]
# Skip the first row and duplicate every row; this allows the creation of overlapping windows
dataset1 = dataset.apply(tf.contrib.data.group_by_window(lambda x: 0, lambda k, d: d.repeat(2), window_size=1)).skip(1)
# [[0][1][2][3]]
# [[4][5][6][7]]
# [[4][5][6][7]]
# [[8][9][10][11]]
# [[8][9][10][11]]
# Use batch to merge duplicated rows into a single row holding both window(i) and window(i+1)
dataset1 = dataset1.apply(tf.contrib.data.batch_and_drop_remainder(2))
# [ [[0][1][2][3]] [[4][5][6][7]] ]
# [ [[4][5][6][7]] [[8][9][10][11]] ]
# use slicing to keep only the values that form the overlapping window
dataset1 = dataset1.map(lambda x: filter_overlapping_values(x, window_size))
# [[2][3][4][5]]
# [[6][7][8][9]]
# Now insert overlapping window into the dataset at the right position
dataset = tf.data.Dataset.zip((dataset, dataset1))
# x0: [[0][1][2][3]] x1: [[2][3][4][5]]
# x0: [[4][5][6][7]] x1: [[6][7][8][9]]
# Interleave the dataset of original windows with the dataset of overlapping windows and flatten into a single dataset
dataset = dataset.flat_map(lambda x0, x1: tf.data.Dataset.from_tensors(x0).concatenate(tf.data.Dataset.from_tensors(x1)))
# [[0][1][2][3]]
# [[2][3][4][5]]
# [[4][5][6][7]]
# [[6][7][8][9]]
Finally, the helper that merges the overlapping windows (used in the map step above):
def filter_overlapping_values(x, window_size):
    s1 = tf.slice(x[0], [window_size // 2, 0], [-1, -1])
    s2 = tf.slice(x[1], [0, 0], [window_size // 2, -1])
    return tf.concat((s1, s2), axis=0)
This approach only works with an even window_size.

Indexing per row in TensorFlow

I have a matrix:
Params =
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
For each row I want to select some elements using column indices:
col_indices =
[[0 1]
[1 2]
[2 3]]
In Numpy, I can create row indices:
row_indices =
[[0 0]
[1 1]
[2 2]]
and do params[row_indices, col_indices]
In TensorFlow, I did this:
tf_params = tf.constant(params)
tf_col_indices = tf.constant(col_indices, dtype=tf.int32)
tf_row_indices = tf.constant(row_indices, dtype=tf.int32)
tf_params[row_indices, col_indices]
But there raised an error:
ValueError: Shape must be rank 1 but is rank 3
What does it mean? How should I do this kind of indexing properly?
Thanks!
Tensor rank (sometimes referred to as order or degree or n-dimension) is the number of dimensions of the tensor. For example, the following tensor (defined as a Python list) has a rank of 2:
t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
A rank-two tensor is what we typically think of as a matrix; a rank-one tensor is a vector. For a rank-two tensor you can access any element with the syntax t[i, j]. For a rank-three tensor you would need to address an element with t[i, j, k]. See this for more details.
ValueError: Shape must be rank 1 but is rank 3 means you are trying to create a 3-tensor (cube of numbers) instead of a vector.
To see how you can declare tensor constants of different shape, you can see this.
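For the actual per-row indexing, a common approach (a sketch, not part of the quoted answer) is to stack the row and column indices into (row, col) pairs and use tf.gather_nd:
import numpy as np
import tensorflow as tf

params = np.arange(12).reshape(3, 4)
col_indices = np.array([[0, 1], [1, 2], [2, 3]])
row_indices = np.array([[0, 0], [1, 1], [2, 2]])

# Pair up (row, col) indices along a new last axis, then gather the elements
indices = tf.stack([tf.constant(row_indices), tf.constant(col_indices)], axis=-1)  # shape (3, 2, 2)
result = tf.gather_nd(tf.constant(params), indices)
# [[ 0  1]
#  [ 5  6]
#  [10 11]]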

TensorFlow: transform a (structured) dense matrix to sparse, when the number of rows is unknown

My task is to transform a specially formed dense matrix tensor into a sparse one, e.g. the input matrix M below (a dense positive-integer sequence followed by 0-padding in each row):
[[3 5 7 0]
[2 2 0 0]
[1 3 9 0]]
Additionally, the non-padding length of each row is given, e.g. by the tensor L = [3, 2, 3].
The desired output would be sparse tensor S.
SparseTensorValue(indices=array([[0, 0],[0, 1],[0, 2],[1, 0],[1, 1],[2, 0],[2, 1], [2, 2]]), values=array([3, 5, 7, 2, 2, 1, 3, 9], dtype=int32), shape=array([3, 4]))
This is useful in models where objects are described by variable-sized descriptors (S are then used in embedding_lookup_sparse to connect embeddings of descriptors.)
I am able to do it when the number of M's rows is known (with a Python loop and ops like slice and concat). However, the number of rows here is determined by the mini-batch size and could change (say, in the testing phase). Is there a good way to implement this? I have been trying some control_flow_ops but haven't succeeded.
Thanks!!
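One shape-agnostic sketch of the desired transformation, offered as a possible approach rather than a confirmed answer; it relies only on tf.sequence_mask, tf.where and tf.boolean_mask, none of which need the row count to be known statically:
import tensorflow as tf

M = tf.constant([[3, 5, 7, 0],
                 [2, 2, 0, 0],
                 [1, 3, 9, 0]], dtype=tf.int32)   # in practice, a mini-batch tensor with dynamic row count
L = tf.constant([3, 2, 3])

# True at non-padding positions; the mask shape follows M's (possibly dynamic) shape
mask = tf.sequence_mask(L, maxlen=tf.shape(M)[1])
S = tf.SparseTensor(indices=tf.where(mask),
                    values=tf.boolean_mask(M, mask),
                    dense_shape=tf.shape(M, out_type=tf.int64))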