I'm trying to train an autoencoder, with constraints that force one or more of the hidden/encoded nodes/neurons to have an interpretable value. My training approach uses paired images (though after training the model should operate on a single image) and utilizes a joint loss function that includes (1) the reconstruction loss for each of the images and (2) a comparison between values of the hidden/encoded vector, from each of the two images.
I've created an analogous simple toy problem and model to make this clearer. In the toy problem, the autoencoder is given a vector of length 3 as input. The encoding uses one dense layer to compute the mean (a scalar) and another dense layer to compute some other representation of the vector (given my construction, it will likely just learn an identity matrix, i.e., copy the input vector). See the figure below. The lowest node of the hidden layer is intended to compute the mean of the input vector. The rest of the hidden nodes are unconstrained aside from having to accommodate a reconstruction that matches the input.
The figure below exhibits how I wish to train the model, using paired images. "MSE" is mean-squared-error, although the identity of the actual function is not important for the question I'm asking here. The loss function is the sum of the reconstruction loss and the mean-estimation loss.
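To make the joint loss concrete, here is a minimal pure-Python sketch (no TensorFlow) of the two terms for a single pair of vectors. The helper names `mse`, `mean`, and `joint_loss` are made up for illustration, and the "ideal" and "bad" models are hand-written stand-ins rather than trained networks.

```python
import random

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mean(v):
    return sum(v) / len(v)

def joint_loss(vec1, vec2, recon1, recon2, pred_mean_diff, target_mean_diff):
    """Reconstruction loss (averaged over the pair) plus mean-estimation loss."""
    loss_reconstruct = mse(vec1, recon1) / 2 + mse(vec2, recon2) / 2
    loss_mean = (pred_mean_diff - target_mean_diff) ** 2
    return loss_reconstruct + loss_mean

random.seed(0)
vec1 = [random.gauss(0, 1) for _ in range(3)]
target_mean_diff = 0.75
vec2 = [x + target_mean_diff for x in vec1]  # same vector with a shifted mean

# Stand-in "ideal" model: perfect reconstruction, hidden node equals the true mean.
ideal_loss = joint_loss(vec1, vec2, vec1, vec2,
                        mean(vec2) - mean(vec1), target_mean_diff)

# Stand-in "bad" model: perfect reconstruction, but the hidden node always outputs 0.
bad_loss = joint_loss(vec1, vec2, vec1, vec2, 0.0, target_mean_diff)

print(ideal_loss, bad_loss)
```

The point of the pairing is visible here: reconstruction alone cannot distinguish the two stand-in models, while the mean-difference term penalizes only the one whose hidden node does not track the mean.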
I've tried creating (1) a tf.data.Dataset to generate paired vectors, (2) a Keras model, and (3) a custom loss function. However, I'm failing to understand how to do this correctly for this particular situation.
I can't get Model.fit() to run correctly and to associate the model outputs with the Dataset targets as intended. See the code and errors below. Can anyone help? I've done many Google and Stack Overflow searches and still don't understand how to implement this.
import tensorflow as tf
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
DTYPE = tf.dtypes.float32
N_VEC = 3
def my_generator(n):
    while True:
        # Create two vectors of length n that are identical except for their means.
        # An internal layer (single neuron) of the model should predict the
        # mean of the input vector. To train it to do so, with paired
        # vector inputs, use a loss function that penalizes incorrect
        # predictions of the difference of the means of two input vectors.
        input_vec1 = tf.random.normal((n,), dtype=DTYPE)
        target_mean_diff = tf.random.normal((1,), dtype=DTYPE)
        input_vec2 = input_vec1 + target_mean_diff
        # Model is a constrained autoencoder. Output targets are
        # identical to the input vectors. Including them as explicit
        # targets in this generator, for generalization.
        target_vec1 = tf.identity(input_vec1)
        target_vec2 = tf.identity(input_vec2)
        yield ({'input_vec1': input_vec1,
                'input_vec2': input_vec2},
               {'target_vec1': target_vec1,
                'target_vec2': target_vec2,
                'target_mean_diff': target_mean_diff})
def my_dataset(n, batch_size=4):
    ds = tf.data.Dataset.from_generator(
        my_generator,
        output_signature=({'input_vec1': tf.TensorSpec(shape=(n,), dtype=DTYPE),
                           'input_vec2': tf.TensorSpec(shape=(n,), dtype=DTYPE)},
                          {'target_vec1': tf.TensorSpec(shape=(n,), dtype=DTYPE),
                           'target_vec2': tf.TensorSpec(shape=(n,), dtype=DTYPE),
                           'target_mean_diff': tf.TensorSpec(shape=(1,), dtype=DTYPE)}),
        args=(n,))
    ds = ds.batch(batch_size)
    return ds
## Do a brief test using the Dataset
ds = my_dataset(N_VEC, batch_size=4)
ds_iter = iter(ds)
dict_inputs, dict_targets = next(ds_iter)
print(dict_inputs)
print(dict_targets)
## Define the Model
layer_encode_vec = tf.keras.layers.Dense(N_VEC, activation=None, name='encode_vec')
layer_decode_vec = tf.keras.layers.Dense(N_VEC, activation=None, name='decode_vec')
layer_encode_mean = tf.keras.layers.Dense(1, activation=None, name='encode_mean')
layer_decode_mean = tf.keras.layers.Dense(N_VEC, activation=None, name='decode_mean')
input1 = tf.keras.Input(shape=(N_VEC,), name='input_vec1')
input2 = tf.keras.Input(shape=(N_VEC,), name='input_vec2')
vec_encoded1 = layer_encode_vec(input1)
vec_encoded2 = layer_encode_vec(input2)
mean_encoded1 = layer_encode_mean(input1)
mean_encoded2 = layer_encode_mean(input2)
mean_diff = mean_encoded2 - mean_encoded1
pred_vec1 = layer_decode_vec(vec_encoded1) + layer_decode_mean(mean_encoded1)
pred_vec2 = layer_decode_vec(vec_encoded2) + layer_decode_mean(mean_encoded2)
model = tf.keras.Model(inputs=[input1, input2], outputs=[pred_vec1, pred_vec2, mean_diff])
print(model.summary())
## Define the joint loss function
def loss_total(y_true, y_pred):
    loss_reconstruct = tf.reduce_mean(tf.keras.losses.MSE(y_true[0], y_pred[0]))/2 + \
                       tf.reduce_mean(tf.keras.losses.MSE(y_true[1], y_pred[1]))/2
    loss_mean = tf.reduce_mean(tf.keras.losses.MSE(y_true[2], y_pred[2]))
    return loss_reconstruct + loss_mean
## Compile model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=optimizer, loss=loss_total)
## Train model
history = model.fit(x=ds, epochs=10, steps_per_epoch=10)
Output: Example batch from the Dataset:
{'input_vec1': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.53022575, -0.02389329, 0.32843253],
[-0.61793506, -0.8276422 , -1.3469328 ],
[-0.5401968 , 0.3141346 , -1.3638284 ],
[-1.2189807 , 0.23848908, 0.75108534]], dtype=float32)>, 'input_vec2': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.23415083, 0.27218163, 0.6245074 ],
[-0.57636774, -0.7860749 , -1.3053654 ],
[ 0.65463066, 1.508962 , -0.16900098],
[-0.49326736, 0.9642024 , 1.4767987 ]], dtype=float32)>}
{'target_vec1': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.53022575, -0.02389329, 0.32843253],
[-0.61793506, -0.8276422 , -1.3469328 ],
[-0.5401968 , 0.3141346 , -1.3638284 ],
[-1.2189807 , 0.23848908, 0.75108534]], dtype=float32)>, 'target_vec2': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.23415083, 0.27218163, 0.6245074 ],
[-0.57636774, -0.7860749 , -1.3053654 ],
[ 0.65463066, 1.508962 , -0.16900098],
[-0.49326736, 0.9642024 , 1.4767987 ]], dtype=float32)>, 'target_mean_diff': <tf.Tensor: shape=(4, 1), dtype=float32, numpy=
array([[0.29607493],
[0.04156734],
[1.1948274 ],
[0.7257133 ]], dtype=float32)>}
Output: The model summary:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_vec1 (InputLayer) [(None, 3)] 0
__________________________________________________________________________________________________
input_vec2 (InputLayer) [(None, 3)] 0
__________________________________________________________________________________________________
encode_vec (Dense) (None, 3) 12 input_vec1[0][0]
input_vec2[0][0]
__________________________________________________________________________________________________
encode_mean (Dense) (None, 1) 4 input_vec1[0][0]
input_vec2[0][0]
__________________________________________________________________________________________________
decode_vec (Dense) (None, 3) 12 encode_vec[0][0]
encode_vec[1][0]
__________________________________________________________________________________________________
decode_mean (Dense) (None, 3) 6 encode_mean[0][0]
encode_mean[1][0]
__________________________________________________________________________________________________
tf.__operators__.add (TFOpLambd (None, 3) 0 decode_vec[0][0]
decode_mean[0][0]
__________________________________________________________________________________________________
tf.__operators__.add_1 (TFOpLam (None, 3) 0 decode_vec[1][0]
decode_mean[1][0]
__________________________________________________________________________________________________
tf.math.subtract (TFOpLambda) (None, 1) 0 encode_mean[1][0]
encode_mean[0][0]
==================================================================================================
Total params: 34
Trainable params: 34
Non-trainable params: 0
__________________________________________________________________________________________________
Output: The error message when calling model.fit():
Epoch 1/10
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
ValueError: Found unexpected keys that do not correspond to any
Model output: dict_keys(['target_vec1', 'target_vec2', 'target_mean_diff']).
Expected: ['tf.__operators__.add', 'tf.__operators__.add_1', 'tf.math.subtract']
You can pass a dict to Model for both inputs and outputs like so:
model = tf.keras.Model(
    inputs={"input_vec1": input1, "input_vec2": input2},
    outputs={
        "target_vec1": pred_vec1,
        "target_vec2": pred_vec2,
        "target_mean_diff": mean_diff,
    },
)
which avoids having to name the output layers.
For the losses, it's currently applying loss_total to each of the 3 outputs individually and summing to get the final loss, which is not what you want. So you can either break out each of the losses individually:
model.compile(
    optimizer=optimizer,
    loss={"target_vec1": "mse", "target_vec2": "mse", "target_mean_diff": "mse"},
    loss_weights={"target_vec1": 0.5, "target_vec2": 0.5, "target_mean_diff": 1},
)
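As a sanity check on the weights, the following pure-Python sketch (no Keras involved) shows that a weighted sum of per-output MSEs with weights 0.5, 0.5, and 1 adds up to the same number as the hand-written joint loss. The batch values are invented for illustration, and `mse` is a hypothetical helper that averages over features and then over the batch.

```python
def mse(y_true, y_pred):
    """Batch-averaged mean squared error over a list of equal-length rows."""
    per_row = [sum((t - p) ** 2 for t, p in zip(rt, rp)) / len(rt)
               for rt, rp in zip(y_true, y_pred)]
    return sum(per_row) / len(per_row)

# Hypothetical batch of 2 samples: targets and predictions keyed by output name.
y_true = {
    "target_vec1": [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]],
    "target_vec2": [[0.5, 1.5, 2.5], [0.0, 0.0, 0.0]],
    "target_mean_diff": [[0.5], [-1.0]],
}
y_pred = {
    "target_vec1": [[0.0, 1.0, 1.0], [1.0, 2.0, 1.0]],
    "target_vec2": [[0.5, 1.5, 2.5], [0.0, 0.0, 3.0]],
    "target_mean_diff": [[0.0], [-1.0]],
}
loss_weights = {"target_vec1": 0.5, "target_vec2": 0.5, "target_mean_diff": 1.0}

# Weighted sum of the per-output losses, one term per named output.
weighted_total = sum(w * mse(y_true[k], y_pred[k]) for k, w in loss_weights.items())

# The joint loss written out by hand, as in the question.
joint_total = (mse(y_true["target_vec1"], y_pred["target_vec1"]) / 2
               + mse(y_true["target_vec2"], y_pred["target_vec2"]) / 2
               + mse(y_true["target_mean_diff"], y_pred["target_mean_diff"]))

print(weighted_total, joint_total)
```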
or you can manually train the model using a modified loss function that takes dict input. Something like:
def loss_total(y_true, y_pred):
    loss_reconstruct = (
        tf.reduce_mean(tf.keras.losses.MSE(y_true["target_vec1"], y_pred["target_vec1"])) / 2
        + tf.reduce_mean(tf.keras.losses.MSE(y_true["target_vec2"], y_pred["target_vec2"])) / 2
    )
    loss_mean = tf.reduce_mean(tf.keras.losses.MSE(y_true["target_mean_diff"], y_pred["target_mean_diff"]))
    return loss_reconstruct + loss_mean
for epoch in range(10):
    for batch, (x, y) in zip(range(10), ds):
        with tf.GradientTape() as tape:
            outputs = model(x, training=True)
            loss = loss_total(y, outputs)
        trainable_vars = model.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        optimizer.apply_gradients(zip(gradients, trainable_vars))
        print(f"Batch: {batch}, loss: {loss.numpy()}")
This is my first question on Stack Overflow, so if I missed something please point it out to me.
I have a problem with my Lambda layer using Keras and TensorFlow 1. In this Lambda layer I take a 100-dimensional GloVe vector as input and compute its cosine similarity to 8 other vectors (which I previously converted to tensors). As output I want the eight resulting cosine similarities as a tensor (I thought this is necessary in TensorFlow?).
My problem now is that the shape of the resulting tensor is (8, 1), but I think I need the output shape (None, 8). Otherwise it will not match the subsequent layer in my network, which is the output layer and should output six class probabilities.
This is the code for the custom function I feed into the Lambda layer, taken from "Sentence similarity using keras":
import tensorflow as tf
from keras import backend as K
sess = K.get_session()
def cosine_distance(ref_vector):
    global emo_vec_array
    ref_vector = K.l2_normalize(ref_vector, axis=-1)
    cos_sim_list = []
    for emo_vector in emo_vec_array:
        emo_vector = K.l2_normalize(emo_vector, axis=-1)
        cos_sim = K.mean(ref_vector * emo_vector, axis=-1, keepdims=True)*100
        cos_sim_list.append(cos_sim[0])
    return tf.convert_to_tensor(cos_sim_list)
def cos_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1, 8)
test_vector = tf.convert_to_tensor(embeddings_index['happy'], dtype='float32')
test_result = cosine_distance(test_vector)
array = sess.run(test_result)
The output when printing the test result tensor and its evaluated value is:
Tensor("packed_53:0", shape=(8,), dtype=float32)
[0.5166239 0.2958691 0.317714 0.44583628 0.39608976 0.4195615 0.6432581 0.2618766 ]
The result is what I want, but the output shapes in my NN are not right. These are the last few layers, followed by their respective output shapes:
hidden = Dense(vector_dimension, activation='relu')(attention)
distance = Lambda(cosine_distance)(hidden)
out = Dense(6, activation='softmax')(distance)
dense_41 (Dense) (None, 100) 20100
_________________________________________________________________
lambda_26 (Lambda) (8, 1) 0
_________________________________________________________________
dense_42 (Dense) (8, 6) 12
What I want at the end is the following:
dense_41 (Dense) (None, 100) 20100
_________________________________________________________________
lambda_26 (Lambda) (None, 8) 0
_________________________________________________________________
dense_42 (Dense) (None, 6) 12
I already tried transposing the tensor with K.transpose and experimenting with the output-shape function, but that didn't have the desired effect.
Any help would be very highly appreciated.
I hope I have made my problem clear. Thank you very much in advance.
Simply change your cosine computation to a vectorized operation:
import tensorflow as tf
from tensorflow.keras import backend as K
def cosine_dist(inp):
    # The reference vectors are defined inside the function here,
    # but they could also be defined outside and passed in as an input.
    emo_vectors = tf.ones(shape=(8, 100))
    def normalize(x):
        # L2-normalize each row so the dot product below is a cosine similarity.
        return x / K.sqrt(K.sum(K.square(x), axis=1, keepdims=True))
    inp = normalize(inp)
    emo_vectors = normalize(emo_vectors)
    cdist = K.dot(inp, K.transpose(emo_vectors))
    return cdist
Here's an example of this in use:
from tensorflow.keras import layers, models
inp = layers.Input(shape=(100,))
hidden = layers.Lambda(lambda x: cosine_dist(x))(inp)
model = models.Model(inputs=inp, outputs=hidden)
model.summary()
which gives:
Model: "model_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) [(None, 100)] 0
_________________________________________________________________
lambda_7 (Lambda) (None, 8) 0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
As you can see, the output of the lambda layer is (None, 8) now.
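To see why the vectorized form yields one row per batch element and one column per reference vector, here is a dependency-free sketch of the same computation on plain Python lists. The 4-dimensional vectors are toy stand-ins for the 100-dimensional GloVe vectors, and the helper names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarities(batch, refs):
    """Return a len(batch) x len(refs) matrix of cosine similarities."""
    batch_n = [l2_normalize(v) for v in batch]
    refs_n = [l2_normalize(r) for r in refs]
    return [[sum(a * b for a, b in zip(v, r)) for r in refs_n] for v in batch_n]

# Toy batch of 3 input vectors and 8 reference vectors (dimension 4, not 100).
batch = [[1.0, 0.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 2.0, 0.0]]
refs = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [1.0, 1.0, 0.0, 0.0],
        [-1.0, 0.0, 0.0, 0.0],
        [1.0, 1.0, 1.0, 1.0],
        [0.0, 0.0, 3.0, 0.0]]

sims = cosine_similarities(batch, refs)
print(len(sims), len(sims[0]))  # rows = batch size, columns = number of references
```

The batch dimension stays on the first axis whatever the batch size is, which is what the (None, 8) output shape expresses.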
After a while I was also able to find a second solution. The trick was to account for a flexible batch size. Here is the altered code of the cosine function:
import tensorflow as tf
from keras import backend as K
def cosine_distance(ref_vector):
    global emo_vec_array
    ref_vector = K.l2_normalize(ref_vector, axis=-1)
    cos_sim_list = []
    for emo_vector in emo_vec_array:
        emo_vector = K.l2_normalize(emo_vector, axis=-1)
        emo_vector = tf.reshape(emo_vector, [emo_vector.shape[0], 1])
        cos_sim = K.dot(ref_vector, emo_vector)
        cos_sim_list.append(cos_sim)
    result = tf.convert_to_tensor(cos_sim_list)
    result = tf.reshape(result, [len(emo_vec_array), -1])
    result = tf.transpose(result)
    return result
For reinforcement learning I would like to explicitly (1) compute the neural network gradient with respect to the output softmax probabilities and (2) update the neural network weights by gradients * advantage score of actions (increase the probability of successful actions, decrease the probability of unsuccessful actions).
I created an agent with a simple policy network:
def simple_policy_model(self):
    inputs = Input(shape=(self.state_size,), name="Input")
    outputs = Dense(self.action_size, activation='softmax', name="Output")(inputs)
    predict_model = Model(inputs=[inputs], outputs=[outputs])
    return predict_model
Then I try to get gradients:
agent = REINFORCE_Agent(state_size=env.observation_space.shape[0],
action_size=env.action_space.n)
print(agent.predict_model.summary())
state_memory = np.random.uniform(size=(3,4))/10
#state_memory = tf.convert_to_tensor(state_memory)
print(state_memory)
print(agent.predict_model.predict(state_memory))
with tf.GradientTape() as tape:
    probs = agent.predict_model.predict(state_memory)
### fails below ###
grads = tape.gradient(probs, agent.predict_model.trainable_weights)
Output:
Model: "model_18"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Input (InputLayer) (None, 4) 0
_________________________________________________________________
Output (Dense) (None, 2) 10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
None
state_memory [[0.01130021 0.01476066 0.09524527 0.05552276]
[0.02018996 0.03127809 0.07232339 0.07146596]
[0.08925738 0.08890574 0.04845396 0.0056015 ]]
prediction [[0.5127161 0.4872839 ]
[0.5063317 0.49366832]
[0.4817074 0.51829267]]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
...
AttributeError: 'numpy.dtype' object has no attribute 'is_floating'
If I convert state_memory to tensor by uncommenting convert_to_tensor it fails at .predict():
ValueError: If your data is in the form of symbolic tensors, you should specify the `steps` argument (instead of the `batch_size` argument, because symbolic tensors are expected to produce batches of input data).
This seems simple enough, but I'm pretty stuck. Any idea what the correct way to obtain the gradients is?
The problem is that
probs = agent.predict_model.predict(state_memory)
produces a NumPy array as the output, and you cannot get gradients with respect to NumPy arrays. Instead you need a tf.Tensor from your model. For that, call the model directly:
with tf.GradientTape() as tape:
    probs = agent.predict_model(state_memory)
grads = tape.gradient(probs, agent.predict_model.trainable_weights)
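For the update described in the question (gradients scaled by the advantage), here is a hand-derived, dependency-free sketch for a single Dense softmax layer. It is not the GradientTape code above: it uses the gradient of the log-probability of the chosen action, which is the usual REINFORCE form, rather than the gradient of the raw probabilities. All function names, the learning rate, and the state values are assumptions made for illustration.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def policy_probs(W, b, state):
    """Action probabilities of a linear layer followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, state)) + bi for row, bi in zip(W, b)]
    return softmax(logits)

def grad_log_prob(W, b, state, action):
    """Gradient of log pi(action | state) with respect to W and b."""
    p = policy_probs(W, b, state)
    # d log pi(a) / d logit_i = 1[i == a] - p_i
    dz = [(1.0 if i == action else 0.0) - pi for i, pi in enumerate(p)]
    dW = [[d * x for x in state] for d in dz]
    return dW, dz

def reinforce_step(W, b, state, action, advantage, lr=0.1):
    """One gradient-ascent step on advantage * log pi(action | state)."""
    dW, db = grad_log_prob(W, b, state, action)
    W_new = [[w + lr * advantage * g for w, g in zip(rw, rg)] for rw, rg in zip(W, dW)]
    b_new = [bi + lr * advantage * g for bi, g in zip(b, db)]
    return W_new, b_new

state = [0.01, 0.01, 0.10, 0.06]          # made-up 4-dimensional state
W = [[0.0] * 4 for _ in range(2)]         # 2 actions, zero-initialized weights
b = [0.0, 0.0]

before = policy_probs(W, b, state)[0]
W_pos, b_pos = reinforce_step(W, b, state, action=0, advantage=1.0)
W_neg, b_neg = reinforce_step(W, b, state, action=0, advantage=-1.0)
after_pos = policy_probs(W_pos, b_pos, state)[0]
after_neg = policy_probs(W_neg, b_neg, state)[0]
print(before, after_pos, after_neg)
```

A positive advantage raises the probability of the action that was taken and a negative advantage lowers it, which is the behaviour the question asks for.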
I'm implementing a basic RNN composed of a 512-unit GRU and a dense layer using Keras:
model = Sequential()
model.add(GRU(units=512,
return_sequences=True,
input_shape=(None, num_x_signals,)))
model.add(Dense(num_y_signals, activation='sigmoid'))
I needed to generate input batches on the fly, so I used fit_generator:
model.fit_generator(generator=generator_train, epochs=NB_EPOCHS, steps_per_epoch=STEPS_PER_EPOCH,
validation_data=generator_test, validation_steps=900, callbacks=callbacks)
And here is how I define my batch generator:
SAMPLE_PERIOD_PER_INPUT = 1728
PERIOD_TO_PREDICT = 288
BATCH_SIZE = 64
def batch_generator(batch_size, sequence_length, train=True):
    while True:
        x_shape = (batch_size, sequence_length, num_x_signals)
        x_batch = np.zeros(shape=x_shape, dtype=np.float16)
        y_shape = (batch_size, PERIOD_TO_PREDICT, num_y_signals)
        y_batch = np.zeros(shape=y_shape, dtype=np.float16)
        for i in range(batch_size):
            if train:
                idx = np.random.randint(num_train - sequence_length)
                predict_idx = (idx + sequence_length) - PERIOD_TO_PREDICT
                x_batch[i] = x_train_scaled[idx:idx+sequence_length]
                y_batch[i] = y_train_scaled[predict_idx:idx+sequence_length]
            else:
                idx = np.random.randint(num_test - sequence_length)
                predict_idx = (idx + sequence_length) - PERIOD_TO_PREDICT
                x_batch[i] = x_test_scaled[idx:idx+sequence_length]
                y_batch[i] = y_test_scaled[predict_idx:idx+sequence_length]
        yield (x_batch, y_batch)
generator_train = batch_generator(batch_size=BATCH_SIZE, sequence_length=SAMPLE_PERIOD_PER_INPUT)
generator_test = batch_generator(batch_size=BATCH_SIZE, sequence_length=SAMPLE_PERIOD_PER_INPUT, train=False)
I also use a "custom" loss function because I need to ignore the first part of the computed sequence, which is not expected to be accurate:
warmup_steps = 50
def loss_mse_warmup(y_true, y_pred):
    y_true_slice = y_true[:, warmup_steps:, :]
    y_pred_slice = y_pred[:, warmup_steps:, :]
    loss = tf.losses.mean_squared_error(labels=y_true_slice,
                                        predictions=y_pred_slice)
    loss_mean = tf.reduce_mean(loss)
    return loss_mean
optimizer = RMSprop(lr=1e-3)
model.compile(loss=loss_mse_warmup, optimizer=optimizer)
Here is the summary of my model:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
gru (GRU) (None, None, 512) 798720
_________________________________________________________________
dense (Dense) (None, None, 1) 513
=================================================================
Total params: 799,233
Trainable params: 799,233
Non-trainable params: 0
_________________________________________________________________
But when I run this, it says that there are shape errors:
2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [64,238,1] vs. [64,1678,1]
[[{{node loss_4/dense_loss/mean_squared_error/SquaredDifference}}]]
[[loss_4/mul/_167]]
(1) Invalid argument: Incompatible shapes: [64,238,1] vs. [64,1678,1]
[[{{node loss_4/dense_loss/mean_squared_error/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.
Any ideas why? Where did I write something wrong?
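The two shapes in the error can be reproduced with plain arithmetic from the constants above. With return_sequences=True the GRU emits one output per input time step, so the prediction has 1728 steps while the generator's target has only 288. The sketch below uses no TensorFlow; `sliced_shape` is a hypothetical helper, and the final alignment is just one possible option, not a confirmed fix.

```python
SAMPLE_PERIOD_PER_INPUT = 1728
PERIOD_TO_PREDICT = 288
BATCH_SIZE = 64
warmup_steps = 50
num_y_signals = 1

def sliced_shape(shape, start):
    """Shape of tensor[:, start:, :] for a (batch, time, features) tensor."""
    batch, time, feats = shape
    return (batch, max(time - start, 0), feats)

# Target produced by the generator: only the last PERIOD_TO_PREDICT steps.
y_true_shape = (BATCH_SIZE, PERIOD_TO_PREDICT, num_y_signals)
# Model output: one prediction per input time step.
y_pred_shape = (BATCH_SIZE, SAMPLE_PERIOD_PER_INPUT, num_y_signals)

y_true_slice = sliced_shape(y_true_shape, warmup_steps)
y_pred_slice = sliced_shape(y_pred_shape, warmup_steps)
print(y_true_slice, y_pred_slice)

# One possible alignment (an assumption): compare only the last
# PERIOD_TO_PREDICT predicted steps against the target.
aligned_pred_shape = sliced_shape(y_pred_shape,
                                  SAMPLE_PERIOD_PER_INPUT - PERIOD_TO_PREDICT)
print(aligned_pred_shape)
```

Alternatively, the generator could yield targets covering the full input window, so that both tensors have 1728 steps before the warmup slice.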
I want to use Google's AI Platform to deploy my Keras model, which requires the model to be in the TensorFlow SavedModel format. I am converting the Keras model to a TensorFlow Estimator and then exporting that Estimator. I run into issues when defining my serving_input_receiver_fn.
Here is a summary of my model:
Model: "model_49"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_49 (InputLayer) [(None, 400, 254)] 0
_________________________________________________________________
gru_121 (GRU) (None, 400, 64) 61248
_________________________________________________________________
gru_122 (GRU) (None, 64) 24768
_________________________________________________________________
dropout_73 (Dropout) (None, 64) 0
_________________________________________________________________
1M (Dense) (None, 1) 65
=================================================================
Total params: 86,081
Trainable params: 86,081
Non-trainable params: 0
_________________________________________________________________
and here is the error I run into:
KeyError: "The dictionary passed into features does not have the expected
inputs keys defined in the keras model.\n\tExpected keys:
{'input_49'}\n\tfeatures keys: {'col1','col2', ..., 'col254'}
Below is my code.
def serving_input_receiver_fn():
    feature_placeholders = {
        column.name: tf.placeholder(tf.float64, [None]) for column in INPUT_COLUMNS
    }
    # feature_placeholders = {
    #     'input_49': tf.placeholder(tf.float64, [None])
    # }
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in feature_placeholders.items()
    }
    return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)
def run():
    h5_model_file = '../models/model2.h5'
    json_model_file = '../models/model2.json'
    model = get_keras_model(h5_model_file, json_model_file)
    print(model.summary())
    estimator_model = tf.keras.estimator.model_to_estimator(keras_model=model, model_dir='estimator_model')
    export_path = estimator_model.export_saved_model('export',
                                                     serving_input_receiver_fn=serving_input_receiver_fn)
It seems that my model expects a single feature key, input_49 (the first layer of my neural network). However, in the code samples I've seen, the serving_input_receiver_fn feeds a dict of all features into the model.
How can I resolve this?
I am using tensorflow==2.0.0-beta1.
I've managed to save a Keras model and host it with TF Serving using the tf.saved_model.Builder() object. I'm not sure if this can be easily generalized to your application, but below is what worked for me, made as general as I can make it.
# Set the path where the model will be saved.
export_base_path = os.path.abspath('models/versions/')
model_version = '1'
export_path = os.path.join(tf.compat.as_bytes(export_base_path),
tf.compat.as_bytes(model_version))
# Make the model builder.
builder = tf.saved_model.builder.SavedModelBuilder(export_path)
# Define the TensorInfo protocol buffer objects that encapsulate our
# input/output tensors.
# Note you can have a list of model.input layers, or just a single model.input
# without any indexing. I'm showing a list of inputs and a single output layer.
# Input tensor info.
tensor_info_input0 = tf.saved_model.utils.build_tensor_info(model.input[0])
tensor_info_input1 = tf.saved_model.utils.build_tensor_info(model.input[1])
# Output tensor info.
tensor_info_output = tf.saved_model.utils.build_tensor_info(model.output)
# Define the call signatures used by the TF Predict API. Note the name
# strings here should match what the layers are called in your model definition.
# Might have to play with that because I forget if it's the name parameter, or
# the actual object handle in your code.
prediction_signature = (
tf.saved_model.signature_def_utils.build_signature_def(
inputs={'input0': tensor_info_input0, 'input1': tensor_info_input1},
outputs={'prediction': tensor_info_output},
method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
# Now we build the SavedModel protocol buffer object and then save it.
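# `sess` is assumed to be the session that holds the model's variables,
# e.g. the one returned by tf.keras.backend.get_session() in TF 1.x.
builder.add_meta_graph_and_variables(sess,
                                     [tf.saved_model.tag_constants.SERVING],
                                     signature_def_map={'predict': prediction_signature})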
builder.save(as_text=True)
I will try to find the references that got me here, but I failed to make a note of them at the time. I'll update with links when I find them.
I ended up changing the following:
feature_placeholders = {
column.name: tf.placeholder(tf.float64, [None]) for column in INPUT_COLUMNS
}
to this:
feature_placeholders = {
'input_49': tf.placeholder(tf.float32, (254, None), name='input_49')
}
and I was able to get a folder with my saved_model.pb.