Why does sharing layers in Keras make building the graph extremely slow (TensorFlow backend)?

I am building a graph where the input is split into a list of 30 tensors. I then apply a shared RNN layer to each element of the list.
It takes roughly one minute until the model is built. Does it have to be like this (and why), or am I doing something wrong?
Code:
import keras
import keras.backend as K
from keras.layers import TimeDistributed

shared_lstm = keras.layers.LSTM(4, return_sequences=True)
shared_dense = TimeDistributed(keras.layers.Dense(1, activation='sigmoid'))
inp_train = keras.layers.Input([None, se.action_space, 3])
# Split each possible measured label into a list:
inputs_train = [ keras.layers.Lambda(lambda x: x[:, :, i, :])(inp_train) for i in range(se.action_space) ]
# Apply the shared weights on each tensor:
lstm_out_train = [shared_lstm(x) for x in inputs_train]
dense_out_train = [(shared_dense(x)) for x in lstm_out_train]
# Merge the tensors again:
out_train = keras.layers.Lambda(lambda x: K.stack(x, axis=2))(dense_out_train)
# "Pick" the unique element along where the inp_train tensor is == 1.0 (along axis=2, in the next time step, of the first dimension of axis=3)
# (please disregard this line if it seems too complex)
shift_and_pick_layer = keras.layers.Lambda(lambda x: K.sum(x[0][:, :-1, :, 0] * x[1][:, 1:, :, 0], axis=2))
out_train = shift_and_pick_layer([out_train, inp_train])
m_train = keras.models.Model(inp_train, out_train)

Related

How to make 2 tensors the same length by mean | median imputation of the shorter tensor?

I'm trying to subclass the base Keras layer to create a layer that merges the rank-1 outputs of 2 layers of a skip connection by outputting the dot product of the 2 tensors. The 2 incoming tensors are created by Dense layers parsed by a Neural Architecture Search algorithm that randomly selects the number of Dense units, and hence the lengths of the 2 tensors. These will of course usually not be of the same length. I am running an experiment to see whether casting them to the same length, by appending a mathematically meaningful imputation [e.g. mean | median | hypotenuse | cos | ... etc.] to the shorter tensor and then merging them with the dot product, will outperform the Add or Concatenate merging strategies. To make them the same length:
I try the overall strategy:
Find the shorter tensor.
Pass it to tf.reduce_mean() (aliasing the resulting mean as "rm" for the sake of discussion).
Create a list [rm for _ in range(difference_in_length)], where difference_in_length is the length difference between the longer and the shorter tensor. Cast it to a tensor if necessary.
[pad | concatenate] the shorter tensor with the result of the operation above to make it equal in length.
Here is where I am running into a brick wall:
Since tf.reduce_mean returns a symbolic tensor whose shape is set to None (rather than being treated as a scalar of length 1), the resulting tensors end up with a shape of '(None,)', which the tf.keras.layers.Dot layer refuses to ingest and throws a ValueError, as it does not see them as being the same length, even though they always will be:
KerasTensor(type_spec=TensorSpec(shape=(None,), dtype=tf.float32, name=None), name='tf.math.reduce_mean/Mean:0', description="created by layer 'tf.math.reduce_mean'")
ValueError: A Concatenate layer should be called on a list of at least 1 input. Received: input_shape=[[(None,), (None,)], [(None, 3)]]
My code (in the package/module):
import tensorflow as tf
import numpy as np
class Linear1dDot(tf.keras.layers.Layer):
    def __init__(self, input_dim=None):
        super(Linear1dDot, self).__init__()

    def __call__(self, inputs):
        max_len = tf.reduce_max(tf.Variable(
            [inp.shape[1] for inp in inputs]))
        print(f"max_len:40,630")
        for i in range(len(inputs)):
            inp = inputs[i]
            print(inp.shape)
            inp_length = inp.shape[1]
            if inp_length < max_len:
                print(f"{inp_length} < 40,630")
                # pad_with = inp.reduce_mean()
                pad_with = tf.reduce_mean(inp, axis=1)
                print(pad_with)
                padding = [pad_with for _ in range(max_len - inp_length)]
                inputs[i] = tf.keras.layers.concatenate([padding, [inp]])
                # inputs[i] = tf.reshape(
                #     tf.pad(inp, padding, mode="constant"), (None, max_len))
        print(inputs)
        return tf.keras.layers.Dot(axes=1)(inputs)
...
# Alternatively, substituting the last few lines with:
pad_with = tf.reduce_mean(inp, axis=1, keepdims=True)
print(pad_with)
padding = tf.keras.layers.concatenate(
    [pad_with for _ in range(max_len - inp_length)])
inputs[i] = tf.keras.layers.concatenate([padding, [inp]])
# inputs[i] = tf.reshape(
#     tf.pad(inp, padding, mode="constant"), (None, max_len))
print(inputs)
return tf.keras.layers.Dot(axes=1)(inputs)
... and countless other permutations of attempts ...
Does anyone know a workaround, or have any advice (other than 'Don't try to do this')?
In the parent folder of this module's package ...
Test to simulate a skip connection merging into the current layer:
from linearoneddot.linear_one_d_dot import Linear1dDot
x = tf.constant([1, 2, 3, 4, 5])
y = tf.constant([0, 9, 8])
inp1 = tf.keras.layers.Input(shape=3)
inp2 = tf.keras.layers.Input(shape=5)
xd = tf.keras.layers.Dense(3, "relu")(inp1)
yd = tf.keras.layers.Dense(5, 'elu')(inp2)
combined = Linear1dDot()([xd, yd]) # tf.keras.layers.Dot(axes=1)([xd, yd])
z = tf.keras.layers.Dense(2)(combined)
model = tf.keras.Model(inputs=[inp1, inp2], outputs=z)
print(model([x, y]))
print(model([np.random.random((3, 3)), np.random.random((3, 5))]))
Does anyone know a workaround that can get the mean of the shorter rank-1 tensor as a scalar, which I can then append / pad onto the shorter tensor to bring it to the intended length (the same length as the longer tensor)?
Try this, hopefully it will work: pad the shorter input with ones, concatenate the padding with that input, take the dot product, and finally subtract the extra contribution that the added ones made to the dot product...
class Linear1dDot(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(Linear1dDot, self).__init__()

    def __call__(self, inputs):
        _input1, _input2 = inputs
        _input1_shape = _input1.shape[1]
        _input2_shape = _input2.shape[1]
        difference = tf.math.abs(_input1_shape - _input2_shape)
        padded_input = tf.ones(shape=(1, difference))
        if _input1_shape > _input2_shape:
            padded_tensor = tf.concat([_input2, padded_input], axis=1)
            scaled_output = tf.keras.layers.Dot(axes=1)([padded_tensor, _input1])
            scaled_output -= tf.reduce_sum(padded_input)
            return scaled_output
        else:
            padded_tensor = tf.concat([_input1, padded_input], axis=1)
            scaled_output = tf.keras.layers.Dot(axes=1)([padded_tensor, _input2])
            scaled_output -= tf.reduce_sum(padded_input)
            return scaled_output
x = tf.constant([[1, 2, 3, 4, 5, 9]])
y = tf.constant([[0, 9, 8]])
inp1 = tf.keras.layers.Input(shape=3)
inp2 = tf.keras.layers.Input(shape=5)
xd = tf.keras.layers.Dense(5, "relu")(x)
yd = tf.keras.layers.Dense(3, 'elu')(y)
combined = Linear1dDot()([xd, yd]) # tf.keras.layers.Dot(axes=1)([xd, yd])
Output:
<tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[4.4694786]], dtype=float32)>
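As a side note on the original mean-imputation idea, below is a minimal sketch (not from the answer above; the layer name MeanPadDot is made up for illustration) that keeps tf.reduce_mean as a (batch, 1) tensor via keepdims=True and repeats it with tf.repeat, so the padded tensor keeps a static last dimension that tf.keras.layers.Dot will accept. It assumes both inputs are rank-2 (batch, features) tensors with known feature sizes:

import tensorflow as tf

class MeanPadDot(tf.keras.layers.Layer):
    # Hypothetical layer: pads the shorter input with its own mean, then takes the dot product.
    def call(self, inputs):
        a, b = inputs
        # make `a` the shorter tensor along the feature axis
        if a.shape[1] > b.shape[1]:
            a, b = b, a
        diff = b.shape[1] - a.shape[1]
        if diff > 0:
            # keepdims=True keeps shape (batch, 1) instead of (batch,),
            # so the concatenated result still has a fully known last dimension
            mean = tf.reduce_mean(a, axis=1, keepdims=True)
            a = tf.concat([a, tf.repeat(mean, diff, axis=1)], axis=1)
        return tf.keras.layers.Dot(axes=1)([a, b])

Used in place of Linear1dDot in the test above, this should give a (batch, 1) output without the '(None,)' shape error.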

Custom TensorFlow loss function with batch size > 1?

I have a neural network with the following code snippets; note that batch_size == 1 and input_dim == output_dim:
net_in = tf.Variable(tf.zeros(shape=[batch_size, input_dim]), dtype=tf.float32)
input_placeholder = tf.compat.v1.placeholder(shape=[batch_size, input_dim], dtype=tf.float32)
assign_input = net_in.assign(input_placeholder)
# Some matmuls, activations, dropouts, normalizations...
net_out = tf.tanh(output_before_activation)

def loss_fn(output, input):
    # input.shape = output.shape = (batch_size, input_dim)
    output = tf.reshape(output, [input_dim, ])  # shape them into 1d vectors
    input = tf.reshape(input, [input_dim, ])
    return my_fn_that_only_takes_in_vectors(output, input)

# Create session, preprocess data ...
for epoch in epoch_num:
    for batch in range(total_example_num // batch_size):
        sess.run(assign_input, feed_dict={input_placeholder: some_appropriate_numpy_array})
        sess.run(optimizer.minimize(loss_fn(net_out, net_in)))
Currently the neural network above works fine, but it is very slow because it updates the gradients after every sample (batch size = 1). I would like to set batch size > 1, but my_fn_that_only_takes_in_vectors cannot accommodate matrices whose first dimension is not 1. Due to the nature of my custom loss, flattening the batch input into a vector of length (batch_size * input_dim) does not seem to work.
How would I write my new custom loss_fn now that the input and output are N x input_dim with N > 1? In Keras this would not have been an issue, because Keras somehow takes the average of the gradients of each example in the batch. For my TensorFlow function, should I take each row as a vector individually, pass them to my_fn_that_only_takes_in_vectors, and then take the average of the results?
You can use a function that computes the loss on the whole batch and works independently of the batch size. Basically, the operations are applied to the whole first dimension of the input (the first dimension represents the element index in the batch). Here is an example; I hope this helps to see how the operations are carried out:
def my_loss(y_true, y_pred):
    dx2 = tf.math.squared_difference(y_true[:, 0], y_true[:, 2])  # shape: (BatchSize,)
    dy2 = tf.math.squared_difference(y_true[:, 1], y_true[:, 3])  # shape: (BatchSize,)
    denominator = dx2 + dy2  # shape: (BatchSize,)
    dst_vec = tf.math.squared_difference(y_true, y_pred)  # shape: (BatchSize, n_labels)
    numerator = tf.reduce_sum(dst_vec, axis=-1)  # shape: (BatchSize,)
    # loss_vector contains the loss of each element of the batch
    loss_vector = tf.cast(numerator / denominator, dtype="float32")  # shape: (BatchSize,)
    loss = tf.reduce_sum(loss_vector)  # if you want to sum the losses
    return loss
I am not sure whether you need to return the sum or the average of the losses for the batch.
If you sum, make sure to use a validation dataset with the same batch size, otherwise the loss values are not comparable.
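If my_fn_that_only_takes_in_vectors really cannot be rewritten to work on a whole batch, a minimal sketch of the row-by-row alternative mentioned in the question (assuming that function returns a scalar per pair of vectors) is to map it over the batch with tf.map_fn and average the results, which mirrors what Keras does by default:

def batched_loss_fn(output, input):
    # apply the vector-only loss to each (output_row, input_row) pair
    per_example_loss = tf.map_fn(
        lambda pair: my_fn_that_only_takes_in_vectors(pair[0], pair[1]),
        (output, input),
        dtype=tf.float32)  # shape: (batch_size,)
    # average over the batch
    return tf.reduce_mean(per_example_loss)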

Custom Merge Function for Different Size Tensors in Tensorflow

I have two tensors of different sizes and want to write a custom merge function:
a = tf.constant([[1,2,3]])
b = tf.constant([[1,1,2,2,3,3]])
I want to take the dot product of each element in tensor a with two elements in tensor b. So in the example above, element 1 of a is multiplied by the first two elements of b, and so on. I'm unsure how to write loops in TensorFlow:
def customMergeFunct(x):
    # not sure how to write a loop over a tensor
The output should be:
c = Lambda(customMergeFunct)([a,b])
with tf.Session() as sess:
    print(c.eval())
=> [[2,8,18]]
I'm not exactly sure why you call this a merge function. You don't really need to define a custom function. You can do this with a simple lambda function. Here's my solution.
import tensorflow as tf
from tensorflow.keras.layers import Lambda
import tensorflow.keras.backend as K
a = tf.constant([[1,2,3]])
b = tf.constant([[1,1,2,2,3,3]])
a_res = tf.reshape(a,[-1,1]) # make a.shape [3,1]
b_res = tf.reshape(b,[-1,2]) # make b.shape [3,2]
layer = Lambda(lambda x: K.sum(x[0]*x[1],axis=1))
res = layer([a_res,b_res])
with tf.Session() as sess:
    print(res.eval())
You can do something like the following:
a = tf.constant([[1,2,3]]) # Shape: (1, 3)
b = tf.constant([[1,1,2,2,3,3]]) # Shape: (1, 6)
def customMergeFunct(x):
    # a_ = tf.tile(x[0], [2, 1])  # Duplicating 2 times (original). Update: no need, since tf.multiply broadcasts
    b_ = tf.transpose(tf.reshape(x[1], [-1, 2]))  # reshape followed by transpose to get shape (2, 3), matching the multiplication rule
    return tf.reduce_sum(tf.multiply(x[0], b_), axis=0)  # element-wise multiplication followed by sum
# Using function
c = Lambda(customMergeFunct)([a,b])
# OR in a reduced form
c = Lambda(lambda x: tf.reduce_sum(tf.multiply(x[0], tf.transpose(tf.reshape(x[1], [-1, 2]))), axis=0))([a,b])
Output:
with tf.Session() as sess:
    print(c.eval())  # Output: [2 8 18]

# OR in eager mode
print(c.numpy())  # Output: [2 8 18]
The updated solution is more computationally efficient than the original one, since we don't actually need to apply tile to x[0].
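For reference, a minimal sketch of the same computation in TF 2.x eager mode (assuming a and b keep the shapes from the question), with no Lambda layers or sessions:

import tensorflow as tf

a = tf.constant([[1, 2, 3]])           # shape (1, 3)
b = tf.constant([[1, 1, 2, 2, 3, 3]])  # shape (1, 6)

# pair up b's elements two at a time, then dot each pair with the matching element of a
b_pairs = tf.reshape(b, [-1, 2])                              # shape (3, 2)
c = tf.reduce_sum(tf.reshape(a, [-1, 1]) * b_pairs, axis=1)   # shape (3,)
print(c.numpy())  # [ 2  8 18]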

Cannot get GradientTape to give non-null results

I am trying to manually implement a very simple RNN using TensorFlow 2. I modeled my code on the example for manually building models on the TensorFlow website. The code, stripped to the bare essentials for this purpose, is:
class ModelSimple(object):
    def __init__(self):
        # Initialize the weights to `5.0` and the bias to `0.0`
        # In practice, these should be initialized to random values (for example, with `tf.random.normal`)
        self.W = tf.Variable(tf.random.normal([]))
        self.b = tf.Variable(tf.random.normal([]))

    def __call__(self, x):
        return self.W * x + self.b


def loss(predicted_y, target_y):
    return tf.reduce_mean(tf.square(predicted_y - target_y))


NUM_EXAMPLES = 1000
inputs = tf.random.normal(shape=[NUM_EXAMPLES])
outputs = tf.zeros(NUM_EXAMPLES)

model = ModelSimple()
with tf.GradientTape() as t:
    t.watch([model.W, model.b])
    current_loss = loss(model(inputs), outputs)
dW, db = t.gradient(current_loss, [model.W, model.b])
print(dW, db)
This gives nice tensors for dW and db. Then I try to do what I described above:
class ModelRNN(object):
    def __init__(self, n_inputs, n_neurons):
        self.n_inputs = n_inputs
        self.n_neurons = n_neurons
        # weights for new input
        self.Wx = tf.Variable(tf.random.normal(shape=[self.n_inputs, self.n_neurons], dtype=tf.float32))
        # weights for previous output
        self.Wy = tf.Variable(tf.random.normal(shape=[self.n_neurons, self.n_neurons], dtype=tf.float32))
        # bias weights
        self.b = tf.Variable(tf.zeros([1, self.n_neurons], dtype=tf.float32))

    def __call__(self, X_batch):
        # get shape of input
        batch_size, num_time_steps, _ = X_batch.get_shape()
        # we will loop through the time steps and the output of the previous computation feeds into
        # the next one. this variable keeps track of it and is initialized to zero
        y_last = tf.Variable(tf.zeros([batch_size, self.n_neurons], dtype=tf.float32))
        # the outputs will be stored in this tensor
        Ys = tf.Variable(tf.zeros([batch_size, num_time_steps, self.n_neurons], dtype=tf.float32))
        for t in range(num_time_steps):
            Xt = X_batch[:, t, :]
            yt = tf.tanh(tf.matmul(y_last, self.Wy) +
                         tf.matmul(Xt, self.Wx) +
                         self.b)
            y_last.assign(yt)
            Ys[:, t, :].assign(yt)
        return Ys


inputs = tf.convert_to_tensor(np.array([
    # t = 0      t = 1
    [[0, 1, 2], [9, 8, 7]],  # instance 1
    [[3, 4, 5], [0, 0, 0]],  # instance 2
    [[6, 7, 8], [6, 5, 4]],  # instance 3
    [[9, 0, 1], [3, 2, 1]],  # instance 4
], dtype=np.float32))
outputs = tf.Variable(tf.zeros((4, 2, 5), dtype=np.float32))

model = ModelRNN(3, 5)
with tf.GradientTape() as t:
    t.watch([model.Wx, model.Wy, model.b])
    current_loss = loss(model(inputs), outputs)
dWx, dWy, db = t.gradient(current_loss, [model.Wx, model.Wy, model.b])
print(dWx, dWy, db)
and it turns out dWx, dWy and db are all None. I have tried several things (including watching them with the GradientTape despite them being variables), yet I keep getting None. What am I doing wrong?
It looks like this is related to this issue:
Tensorflow cannot get gradient wrt a Variable, but can wrt a Tensor
Replacing the assign calls with a Python list and tf.stack results in a gradient being returned:
Ys = []
for t in range(num_time_steps):
    Xt = X_batch[:, t, :]
    yt = tf.tanh(tf.matmul(y_last, self.Wy) +
                 tf.matmul(Xt, self.Wx) +
                 self.b)
    y_last.assign(yt)
    Ys.append(yt)
return tf.stack(Ys, axis=1)
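Put together, a minimal sketch of how the modified __call__ might look (assuming the rest of ModelRNN from the question is unchanged; the Ys variable is dropped entirely, and y_last can also be kept as a plain tensor so no assign is needed at all):

def __call__(self, X_batch):
    batch_size, num_time_steps, _ = X_batch.get_shape()
    # running "previous output", kept as an ordinary tensor instead of a Variable
    y_last = tf.zeros([batch_size, self.n_neurons], dtype=tf.float32)
    Ys = []
    for t in range(num_time_steps):
        Xt = X_batch[:, t, :]
        y_last = tf.tanh(tf.matmul(y_last, self.Wy) +
                         tf.matmul(Xt, self.Wx) +
                         self.b)
        Ys.append(y_last)
    # stack the per-step outputs along the time axis: (batch, time, neurons)
    return tf.stack(Ys, axis=1)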

TensorFlow: How to embed float sequences to fixed size vectors?

I am looking for methods to embed variable-length sequences of float values into fixed-size vectors. The input is formatted as follows:
[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]-> ... -> [f1,f2,f3,f4]
[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]-> ... -> [f1,f2,f3,f4]
...
[f1,f2,f3,f4]-> ... -> ->[f1,f2,f3,f4]
Each line is a variable-length sequence, with max length 60. Each unit in a sequence is a tuple of 4 float values. I have already padded with zeros to fill all sequences to the same length.
The following architecture seems to solve my problem if I use the output as the same as the input; I need the thought vector in the center as the embedding for the sequences.
In TensorFlow, I have found two candidate methods, tf.contrib.legacy_seq2seq.basic_rnn_seq2seq and tf.contrib.legacy_seq2seq.embedding_rnn_seq2seq.
However, these two methods seem to be meant for NLP problems, where the input must be discrete values for words.
So, is there another function that solves my problem?
All you need is an RNN, not the seq2seq model, since seq2seq comes with an additional decoder, which is unnecessary in your case.
Example code:
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn
input_size = 4
max_length = 60
hidden_size=64
output_size = 4
x = tf.placeholder(tf.float32, shape=[None, max_length, input_size], name='x')
seqlen = tf.placeholder(tf.int64, shape=[None], name='seqlen')
lstm_cell = rnn.BasicLSTMCell(hidden_size, forget_bias=1.0)
outputs, states = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=x, sequence_length=seqlen, dtype=tf.float32)
encoded_states = states[-1]
W = tf.get_variable(
    name='W',
    shape=[hidden_size, output_size],
    dtype=tf.float32,
    initializer=tf.random_normal_initializer())
b = tf.get_variable(
    name='b',
    shape=[output_size],
    dtype=tf.float32,
    initializer=tf.random_normal_initializer())
z = tf.matmul(encoded_states, W) + b
results = tf.sigmoid(z)
###########################
## cost computing and training components goes here
# e.g.
# targets = tf.placeholder(tf.float32, shape=[None, input_size], name='targets')
# cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=z))
# optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(cost)
###############################
init = tf.global_variables_initializer()
batch_size = 4
data_in = np.zeros((batch_size, max_length, input_size), dtype='float32')
data_in[0, :4, :] = np.random.rand(4, input_size)
data_in[1, :6, :] = np.random.rand(6, input_size)
data_in[2, :20, :] = np.random.rand(20, input_size)
data_in[3, :, :] = np.random.rand(60, input_size)
data_len = np.asarray([4, 6, 20, 60], dtype='int64')
with tf.Session() as sess:
    sess.run(init)
    #########################
    # training process goes here
    #########################
    res = sess.run(results,
                   feed_dict={
                       x: data_in,
                       seqlen: data_len})
    print(res)
To encode a sequence into a fixed-length vector you typically use recurrent neural networks (RNNs) or convolutional neural networks (CNNs).
If you use a recurrent neural network, you can use the output at the last time step (the last element in your sequence). This corresponds to the thought vector in your question. Have a look at tf.nn.dynamic_rnn. dynamic_rnn requires you to specify the type of RNN cell you want to use; tf.contrib.rnn.LSTMCell and tf.contrib.rnn.GRUCell are the most common.
If you want to use CNNs, you need to use 1-dimensional convolutions. To build CNNs you need tf.layers.conv1d and tf.layers.max_pooling1d.
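As a rough illustration of that CNN option, here is a minimal sketch of a 1-D convolutional encoder (assuming the same zero-padded (batch, 60, 4) input as above; the filter counts and kernel sizes are arbitrary choices, not from the original answer):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 60, 4], name='x')  # (batch, time, features)

# two conv/pool stages over the time axis
h = tf.layers.conv1d(x, filters=32, kernel_size=3, padding='same', activation=tf.nn.relu)
h = tf.layers.max_pooling1d(h, pool_size=2, strides=2)
h = tf.layers.conv1d(h, filters=64, kernel_size=3, padding='same', activation=tf.nn.relu)

# collapse the remaining time dimension into a fixed-size embedding
embedding = tf.reduce_max(h, axis=1)  # shape (batch, 64)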
I have found a solution to my problem, using the following architecture:
[architecture diagram]
The lower LSTM layers encode the series x1, x2, ..., xn. The last output (the green one in the diagram) is duplicated as many times as there are input steps and fed into the decoding LSTM layers above. The TensorFlow code is as follows:
series_input = tf.placeholder(tf.float32, [None, conf.max_series, conf.series_feature_num])
print("Encode input Shape", series_input.get_shape())
# encoding layer
encode_cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(conf.rnn_hidden_num, reuse=False) for _ in range(conf.rnn_layer_num)]
)
encode_output, _ = tf.nn.dynamic_rnn(encode_cell, series_input, dtype=tf.float32, scope='encode')
print("Encode output Shape", encode_output.get_shape())
# last output
encode_output = tf.transpose(encode_output, [1, 0, 2])
last = tf.gather(encode_output, int(encode_output.get_shape()[0]) - 1)
# duplicate the last output of the encoding layer
decoder_input = tf.stack([last for _ in range(conf.max_series)], axis=1)
print("Decoder input shape", decoder_input.get_shape())
# decoding layer
decode_cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(conf.series_feature_num, reuse=False) for _ in range(conf.rnn_layer_num)]
)
decode_output, _ = tf.nn.dynamic_rnn(decode_cell, decoder_input, dtype=tf.float32, scope='decode')
print("Decode output", decode_output.get_shape())
# Loss Function
loss = tf.losses.mean_squared_error(labels=series_input, predictions=decode_output)
print("Loss", loss)