Merge multiple CNN models - tensorflow

I am trying to implement the paper Sarcasm Detection Using Deep Learning With Contextual Features.
This is the CNN architecture I'm trying to implement here:
This text is from the Paper itself that describes the layers:
The CNN architecture in Figure 5 is shown in a top-down manner
starting from the start (top) to the finish (bottom) node. ‘‘NL’’
stands for N-gram Length. The breakdown is:
An input layer of size 1 × 100 × N where N is the number of instances from the dataset. Vectors of embedded-words are used as the
initial input.
Then the layers between the input and the concatenation is introduced:
One convolutional layer with 200 neurons to receive and filter size 1 × 100 × N where N is the number of instances from the dataset.
The stride is [1 1].
Two convolutional layer with 200 neurons to receive and filter size 1 × 100 × 200. The stride is [1 1].
Three batch normalization with 200 channels.
Three ReLU activation layers.
Three dropout layers with 20 percent dropout.
A max pooling layer with stride [1 1].
A depth concatenation layer to concatenate all the last max pooling layers.
A fully connected layer with ten neurons.
The code that I have tried so far is here.
model1 = Input((train_vector1.shape[1:]))
#1_1
model1 = Conv1D(200, filters=train_vector1.shape[0], kernel_size=(1, 100), strides = 1, activation = "relu")(model1)
model1 = BatchNormalization(200)(model1)
model1 = Dropout(0.2)(model1)
#1_2
model1 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model1)
model1 = BatchNormalization(200)(model1)
model1 = Dropout(0.2)(model1)
#1_3
model1 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model1)
model1 = BatchNormalization(200)(model1)
model1 = Dropout(0.2)(model1)
model1 = MaxPooling1D(strides=1)(model1)
model1 = Flatten()(model1)
## Second Part
model2 = Input((train_vector1.shape[1:]))
#2_1
model2 = Conv1D(200, filters=train_vector1.shape[0], kernel_size=(1, 100), strides = 1, activation = "relu")(model2)
model2 = BatchNormalization(200)(model2)
model2 = Dropout(0.2)(model2)
#2_2
model2 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model2)
model2 = BatchNormalization(200)(model2)
model2 = Dropout(0.2)(model2)
#2_3
model2 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model2)
model2 = BatchNormalization(200)(model2)
model2 = Dropout(0.2)(model2)
model2 = MaxPooling1D(strides=1)(model2)
model2 = Flatten()(model2)
## Third Part
model3 = Input((train_vector1.shape[1:]))
#3_1
model3 = Conv1D(200, filters=train_vector1.shape[0], kernel_size=(1, 100), strides = 1, activation = "relu")(model3)
model3 = BatchNormalization(200)(model3)
model3 = Dropout(0.2)(model3)
#3_2
model3 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model3)
model3 = BatchNormalization(200)(model3)
model3 = Dropout(0.2)(model3)
#3_3
model3 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model3)
model3 = BatchNormalization(200)(model3)
model3 = Dropout(0.2)(model3)
model3 = MaxPooling1D(strides=1)(model3)
model3 = Flatten()(model3)
concat_model = Concatenate()([model1, model2, model3])
output = Dense(10, activation='sigmoid')
I just want to know if my implementation is correct here, or am I misinterpreting something?
Am I understanding what the author is trying to do here?

From that image I think that the input could be shared among the other layers.
In that case you would have:
input = Input((train_vector1.shape[1:]))
model1 = Conv1D(...)(input)
# ...
model1 = Flatten()(model1)
model2 = Conv1D(...)(input)
# ...
model2 = Flatten()(model2)
model3 = Conv1D(...)(input)
# ...
model3 = Flatten()(model3)
concat_model = Concatenate()([model1, model2, model3])
output = Dense(10, activation='sigmoid')
Also most probably the convolutions are not 1D but 2D. You can get confirmation of it from the fact that it says:
The stride is [1 1]
Se we are in two dimensions. Same for MaxPooling.
Also you said:
when I run this code, it say too many arguments for "filters". Am I
doing anything wrong here?
Let's take:
model1 = Conv1D(200, filters=train_vector1.shape[0], kernel_size=(1, 100), strides = 1, activation = "relu")(model1)
The Conv1D function accepts this arguments (full documentation):
tf.keras.layers.Conv1D(
filters,
kernel_size,
strides=1,
...
)
It says too many arguments because you are trying to write the number of neurons of the Convolutional layer, but there is simply no argument for that, so you don't have to. The number of neurons depends on the other parameters you set.
Same thing also for BatchNormalization. From the docs:
tf.keras.layers.BatchNormalization(
axis=-1,
momentum=0.99,
...
)
There is no "number of neurons" argument.

Related

Adding Luong attention Layer to CNN

I'm using keras to implement a functional CNN model where I have images with the size of 64x64x1. with 6 convolutional layer like this :
num_classes = 5
def get_model():
##creating CNN functional api for learning
input_ = keras.layers.Input(shape=[64, 64,1])
##first layer of convolutional layer
Conv1 = keras.layers.Conv2D(32, kernel_size=5,activation=tf.nn.relu)(input_layer)
#second convolutional layer
Conv12 = keras.layers.Conv2D(32, kernel_size=5,activation=tf.nn.relu)(Conv1)
#third convolutional layer
Conv13 = keras.layers.Conv2D(32, kernel_size=5,activation=tf.nn.relu)(Conv12)
##i add max pooling with a stride of 2
Max1 = keras.layers.MaxPool2D(2, strides=2)(Conv13)
##i add a second layer of convlutional layer
Conv2 = keras.layers.Conv2D(64, kernel_size=5,activation=tf.nn.relu)(Max1)
##adding second convolutional layer
Conv21 = keras.layers.Conv2D(64, kernel_size=5,activation=tf.nn.relu)(Conv2)
##adding third convolutional layer
Conv23 = keras.layers.Conv2D(64, kernel_size=5,activation=tf.nn.relu)(Conv21)
##i add another layer of max pooling
Max2 = keras.layers.MaxPool2D(2, strides=2)(Conv23)
##here i execute data flatting, i will change this to use attention layer Att.
Flat = keras.layers.Flatten()(Max2)
#i add another dense architecture
Dense1= keras.layers.Dense(2048,activation=tf.nn.relu)(Flat)
Dense2= keras.layers.Dense(700,activation=tf.nn.relu)(Dense1)
#i add now the output layer with softmax
# Output layer, class prediction.
output = keras.layers.Dense(num_classes,activation=tf.nn.softmax)(Dense2)
model = Model(inputs=input_, outputs=output)
##end of creating CNN using functional api
##defining loss function and training data and epoche. I modify the optimizer to rmsprop
optimize_rmsprop = keras.optimizers.RMSprop(learning_rate=0.001, epsilon=1e-08, decay=0.0)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer_rmsprop,metrics=["accuracy"])
###i return the model
return model
to get better performance i want to add this attention layer to the above CNN :
#already imported
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Model
from keras.layers import Dropout
# Variable-length int sequences. 64*64*1
query_input = tf.keras.Input(shape=(4096,), dtype='int32')
value_input = tf.keras.Input(shape=(4096,), dtype='int32')
# Embedding lookup.
token_embedding = tf.keras.layers.Embedding(input_dim=1000, output_dim=64)
# Query embeddings of shape [batch_size, Tq, dimension].
query_embeddings = token_embedding(query_input)
# Value embeddings of shape [batch_size, Tv, dimension].
value_embeddings = token_embedding(value_input)
# CNN layer.
cnn_layer = tf.keras.layers.Conv1D(
filters=100,
kernel_size=4,
# Use 'same' padding so outputs have the same shape as inputs.
padding='same')
# Query encoding of shape [batch_size, Tq, filters].
query_seq_encoding = cnn_layer(query_embeddings)
# Value encoding of shape [batch_size, Tv, filters].
value_seq_encoding = cnn_layer(value_embeddings)
# Query-value attention of shape [batch_size, Tq, filters].
query_value_attention_seq = tf.keras.layers.Attention()(
[query_seq_encoding, value_seq_encoding])
# Reduce over the sequence axis to produce encodings of shape
# [batch_size, filters].
query_encoding = tf.keras.layers.GlobalAveragePooling1D()(
query_seq_encoding)
query_value_attention = tf.keras.layers.GlobalAveragePooling1D()(
query_value_attention_seq)
# Concatenate query and document encodings to produce a DNN input layer.
input_layer = tf.keras.layers.Concatenate()(
[query_encoding, query_value_attention])
but the problem is i don't know how to link the attention layer with my cnn model because when i connect the first convolutional network with the attention layer like this :
Conv1 = keras.layers.Conv2D(32, kernel_size=5)(input_layer)
I get this error :
ValueError: Input 0 of layer "conv2d" is incompatible with the layer: expected min_ndim=4, found ndim=2. Full shape received: (None, 200)
can someone show me how to add an attention layer to the CNN model.
Updated Answer...
keras.backend.clear_session()
class AttentionLayer(tf.keras.layers.Layer):
def __init__(self,
output_dims):
super(AttentionLayer, self).__init__()
self.output_dims = output_dims
self.embeddings = tf.keras.layers.Embedding(input_dim=4096, output_dim = output_dims)
self.conv = tf.keras.layers.Conv1D(2048, 4 , padding='same')
self.attn_layer = tf.keras.layers.Attention()
self.global_pooling_1 = tf.keras.layers.GlobalAveragePooling1D()
self.global_pooling_2 = tf.keras.layers.GlobalAveragePooling1D()
def call (self, query_input, value_input):
batch_size = tf.shape(query_input)[0]
query_input = tf.reshape(query_input, (batch_size, 4096))
value_input = tf.reshape(value_input, (batch_size, 4096))
# Query embeddings of shape [batch_size, Tq, dimension].
query_embeddings = self.embeddings(query_input)
# Value embeddings of shape [batch_size, Tv, dimension].
value_embeddings = self.embeddings(value_input)
# Query encoding of shape [batch_size, Tq, filters].
query_seq_encoding = self.conv(query_embeddings)
# Value encoding of shape [batch_size, Tv, filters].
value_seq_encoding = self.conv(value_embeddings)
# Query-value attention of shape [batch_size, Tq, filters].
query_value_attention_seq = self.attn_layer(
[query_seq_encoding, value_seq_encoding])
# Reduce over the sequence axis to produce encodings of shape
# [batch_size, filters].
query_encoding = self.global_pooling_1(
query_seq_encoding)
query_value_attention = self.global_pooling_2(
query_value_attention_seq)
# Concatenate query and document encodings to produce a DNN input layer.
input_layer = tf.keras.layers.Concatenate()(
[query_encoding, query_value_attention])
input_layer = tf.reshape(input_layer , (batch_size, 64,64 ,1))
return input_layer
keras.backend.clear_session()
num_classes = 5
class Model(tf.keras.Model):
def __init__(self):
super(Model, self).__init__()
self.attn_layer = AttentionLayer(64)
##first layer of convolutional layer
self.Conv1 = keras.layers.Conv2D(32, kernel_size=5,activation=tf.nn.relu)
#second convolutional layer
self.Conv12 = keras.layers.Conv2D(32, kernel_size=5,activation=tf.nn.relu)
#third convolutional layer
self.Conv13 = keras.layers.Conv2D(32, kernel_size=5,activation=tf.nn.relu)
##i add max pooling with a stride of 2
self.Max1 = keras.layers.MaxPool2D(2, strides=2)
##i add a second layer of convlutional layer
self.Conv2 = keras.layers.Conv2D(64, kernel_size=5,activation=tf.nn.relu)
##adding second convolutional layer
self.Conv21 = keras.layers.Conv2D(64, kernel_size=5,activation=tf.nn.relu)
##adding third convolutional layer
self.Conv23 = keras.layers.Conv2D(64, kernel_size=5,activation=tf.nn.relu)
##i add another layer of max pooling
self.Max2 = keras.layers.MaxPool2D(2, strides=2)
##here i execute data flatting, i will change this to use attention layer Att.
self.Flat = keras.layers.Flatten()
#i add another dense architecture
self.Dense1= keras.layers.Dense(2048,activation=tf.nn.relu)
self.Dense2= keras.layers.Dense(700,activation=tf.nn.relu)
#i add now the output layer with softmax
# Output layer, class prediction.
self.outputs = keras.layers.Dense(num_classes,activation=tf.nn.softmax)
def call(self, x):
x = self.attn_layer(x , x)
x = self.Conv1(x)
x = self.Conv12(x)
x = self.Conv13(x)
x = self.Max1(x)
x = self.Conv2(x)
x = self.Conv21(x)
x = self.Conv23(x)
x = self.Max2(x)
x = self.Flat(x)
x = self.Dense1(x)
x = self.Dense2(x)
return self.outputs(x)
model = Model()
x = np.random.randint(0 ,2 , size = (8, 64, 64,1))
y = np.random.randint(0,5, size=(8,1))
print(model(x).shape)
optimize_rmsprop = keras.optimizers.RMSprop(learning_rate=0.001, epsilon=1e-08, decay=0.0)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimize_rmsprop,metrics=["accuracy"])
model.fit(x, y, epochs=1)
Output:

Input layer 0 of sequence is incompatible with the layer - CNNs

I am trying to create a CNN model using hyperparameterization for image classification. When I run the code I receive the following error:
ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 32, 32, 32, 3), found shape=(32, 32, 32, 3)
How to fix the error? Here is the whole code pasted below:
# first we create our actual code which requires the arguments, units, activation, dropout, lr:
def build_model(hp):
model = ks.Sequential([
# adding first conv2d layer
ks.layers.Conv2D(
#Let's tune the filters, kernel_size, activation function.
filters = hp.Int("conv_1_filter", min_value=1,max_value=100, step = 16),
kernel_size = hp.Choice("conv_1_kernel", values = [3,5]),
activation = hp.Choice("conv_1_activation", ["relu", "tanh", "softmax"]),
input_shape = (32,32,32,3)
),
# adding second conv2d layer
ks.layers.Conv2D(
#Let's tune the filters, kernel_size, activation function.
filters = hp.Int("conv_2_filter", min_value=1,max_value=50, step = 16),
kernel_size = hp.Choice("conv_2_kernel", values = [3,5]),
activation = hp.Choice("conv_2_activation", ["relu", "tanh", "softmax"]),
input_shape = (32,32,32,3)
)])
model.add(layers.Flatten())
# Let's tune the number of Dense layers.
for i in range(hp.Int("num_dense_layers", 1, 3)):
model.add(
layers.Dense(
# Let's tune the number of units separately
units = hp.Int(f"units_{i}", min_value=1, max_value = 100, step = 16),
activation = hp.Choice("activation", ["relu", "tanh", "softmax"])
))
if hp.Boolean("dropout"):
model.add(layers.Dropout(rate = 0.25))
model.add(layers.Dense(10, activation = "softmax"))
learning_rate = hp.Float("lr", min_value = 1e-4, max_value = 1e-2, sampling="log")
model.compile(
optimizer = ks.optimizers.Adam(learning_rate = learning_rate),
loss = "categorical_crossentropy",
metrics = ["accuracy"]
)
return model
build_model(keras_tuner.HyperParameters())
You are getting this error due the input shape mismatch.
Here i have implemented the hypermodel on the mnist fashion dataset which contains images of shape (28,282,1).
def build_model(hp):
model = tf.keras.Sequential([
tf.keras.Input(shape=(28,28,1)),
# adding first conv2d layer
tf.keras.layers.Conv2D(
#Let's tune the filters, kernel_size, activation function.
filters = hp.Int("conv_1_filter", min_value=1,max_value=100, step = 16),
kernel_size = hp.Choice("conv_1_kernel", values = [3,5]),
activation = hp.Choice("conv_1_activation", ["relu", "tanh", "softmax"]),
input_shape = (28,28,1)
),
tf.keras.layers.MaxPooling2D(
pool_size=hp.Choice('pooling_1',values=[2,3])),
# adding second conv2d layer
tf.keras.layers.Conv2D(
#Let's tune the filters, kernel_size, activation function.
filters = hp.Int("conv_2_filter", min_value=1,max_value=50, step = 16),
kernel_size = hp.Choice("conv_2_kernel", values = [3,5]),
activation = hp.Choice("conv_2_activation", ["relu", "tanh", "softmax"]),
input_shape = (28,28,1)
)])
tf.keras.layers.MaxPooling2D(
pool_size=hp.Choice('pooling_2',values=[2,3])),
model.add(tf.keras.layers.Flatten())
if hp.Boolean("dropout"):
model.add(tf.keras.layers.Dropout(rate = 0.25))
model.add(tf.keras.layers.Dense(10, activation = "softmax"))
learning_rate = hp.Float("lr", min_value = 1e-4, max_value = 1e-2, sampling="log")
model.compile(
optimizer = tf.keras.optimizers.Adam(learning_rate = learning_rate),
loss = "categorical_crossentropy",
metrics = ["accuracy"]
)
return model
By providing the correct shape you will not get any error.
For more details, Please refer to this gist and this documentation. Thank You!

Tensorflow2.0 reuse layers

def head(self, input, num_anchors, name, flatten=False):
out_channels = (self.num_classes + 4) * num_anchors
conv = layers.Conv2D(256, 3, 1, 'same', activation='relu', name=name+'_conv1')(input)
conv = layers.Conv2D(256, 3, 1, 'same', activation='relu', name=name+'_conv2')(conv)
conv = layers.Conv2D(256, 3, 1, 'same', activation='relu', name=name+'_conv3')(conv)
out = layers.Conv2D(out_channels, 3, 1, 'same', name=name+'output')(conv)
if flatten is True:
batch_size = tf.shape(out)[0]
out = tf.reshape(out, [batch_size, -1, num_anchors, self.num_classes+4])
out = tf.reshape(out, [batch_size, -1, self.num_classes+4])
return out
I want to know how to reuse these layers as tf.variable_scope(scope resue=tf.AUTO_REUSE) in tensorflow1
In tensorflow1
with tf.variable_scope('', resue=tf.AUTO_REUSE) as scope:
all layers here could be auto reuse
You can reuse the layers by just having a common reference. I have attached a sample code below. I am making use of a variable named as common_layer which will be used in three separate models (sequential and functional). The first model is trained and after that we subtract the weights from common_layer of all the three models. It proves that what changes happen in first model's layer gets reflected in other model's common layer.
import tensorflow as tf
import numpy as np
common_layer = tf.keras.layers.Dense(100, name='common_layer')
model1 = tf.keras.models.Sequential([
tf.keras.layers.Input((100)),
common_layer,
tf.keras.layers.Dense(1)
])
model2 = tf.keras.models.Sequential([
tf.keras.layers.Input((100)),
common_layer,
tf.keras.layers.Dense(10)
])
input_layer = tf.keras.layers.Input((100))
output_layer = common_layer(input_layer)
output_layer = tf.keras.layers.Dense(20)(output_layer)
model3 = tf.keras.Model(inputs=[input_layer], outputs=[output_layer])
model1.compile('adam', loss='mse')
model1.fit(np.random.rand(128, 100), np.random.rand(128, 1))
weights1 = model1.get_weights()[0]
weights2 = model2.get_weights()[0]
weights3 = model3.get_weights()[0]
print(np.sum(weights1 - weights2)) # 0.0
print(np.sum(weights1 - weights3)) # 0.0

Neural network output issue

I built a neural network with tensorflow, here the code :
class DQNetwork:
def __init__(self, state_size, action_size, learning_rate, name='DQNetwork'):
self.state_size = state_size
self.action_size = action_size
self.learning_rate = learning_rate
with tf.variable_scope(name):
# We create the placeholders
self.inputs_ = tf.placeholder(tf.float32, shape=[state_size[1], state_size[0]], name="inputs")
self.actions_ = tf.placeholder(tf.float32, [None, self.action_size], name="actions_")
# Remember that target_Q is the R(s,a) + ymax Qhat(s', a')
self.target_Q = tf.placeholder(tf.float32, [None], name="target")
self.fc = tf.layers.dense(inputs = self.inputs_,
units = 50,
kernel_initializer=tf.contrib.layers.xavier_initializer(),
activation = tf.nn.elu)
self.output = tf.layers.dense(inputs = self.fc,
units = self.action_size,
kernel_initializer=tf.contrib.layers.xavier_initializer(),
activation=None)
# Q is our predicted Q value.
self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_))
# The loss is the difference between our predicted Q_values and the Q_target
# Sum(Qtarget - Q)^2
self.loss = tf.reduce_mean(tf.square(self.target_Q - self.Q))
self.optimizer = tf.train.AdamOptimizer(self.learning_rate).minimize(self.loss)
But i have an issue with the output,
the output should normaly be at the same size than "action_size", and action_size value is 3
but i got an output like [[5][3]] instead of just [[3]] and i realy don't understand why...
This network got 2 dense layers, one with 50 perceptrons and the other with 3 perceptrons (= action_size).
state_size is format : [[9][5]]
If someone know why my output is two dimensions i will be very thankful
Your self.inputs_ placeholder has shape (5, 9). You perform the matmul(self.inputs_, fc1.w) operation in dense layer fc1 which has shape (9, 50) and it results in shape (5, 50). You then apply another dense layer with shape (50, 3) which results in output shape (5, 3).
The same schematically:
matmul(shape(5, 9), shape(9, 50)) ---> shape(5, 50) # output of 1st dense layer
matmul(shape(5, 50), shape(50, 3)) ---> shape(5, 3) # output of 2nd dense layer
Usually, the first dimension of the input placeholder represents batch size and the second dimension is the dimension of inputs feature vector. So for each sample in a batch you (batch size is 5 in your case) you get the output shape 3.
To get probabilities, use this:
import tensorflow as tf
import numpy as np
inputs_ = tf.placeholder(tf.float32, shape=(None, 9))
actions_ = tf.placeholder(tf.float32, shape=(None, 3))
fc = tf.layers.dense(inputs=inputs_, units=2)
output = tf.layers.dense(inputs=fc, units=3)
reduced = tf.reduce_mean(output, axis=0)
probs = tf.nn.softmax(reduced) # <--probabilities
inputs_vals = np.ones((5, 9))
actions_vals = np.ones((1, 3))
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
print(probs.eval({inputs_:inputs_vals,
actions_:actions_vals}))
# [0.01858923 0.01566187 0.9657489 ]

Why doesn't connect to the expected layer in Keras model

I want to construct a variational autoencoder in Keras (2.2.4, with TensorFlow backend), here is my code:
dims = [1000, 256, 64, 32]
x_inputs = Input(shape=(dims[0],), name='inputs')
h = x_inputs
# internal layers in encoder
for i in range(n_stacks-1):
h = Dense(dims[i + 1], activation='relu', kernel_initializer='glorot_uniform', name='encoder_%d' % i)(h)
# hidden layer
z_mean = Dense(dims[-1], kernel_initializer='glorot_uniform', name='z_mean')(h)
z_log_var = Dense(dims[-1], kernel_initializer='glorot_uniform', name='z_log_var')(h)
z = Lambda(sampling, output_shape=(dims[-1],), name='z')([z_mean, z_log_var])
encoder = Model(inputs=x_inputs, outputs=z, name='encoder')
encoder_z_mean = Model(inputs=x_inputs, outputs=z_mean, name='encoder_z_mean')
# internal layers in decoder
latent_inputs = Input(shape=(dims[-1],), name='latent_inputs')
h = latent_inputs
for i in range(n_stacks-1, 0, -1):
h = Dense(dims[i], activation='relu', kernel_initializer='glorot_uniform', name='decoder_%d' % i)(h)
# output
outputs = Dense(dims[0], activation='relu', kernel_initializer='glorot_uniform' name='mean')
decoder = Model(inputs=latent_inputs, outputs=outputs, name='decoder')
ae_output = decoder(encoder_z_mean(x_inputs))
ae = Model(inputs=x_inputs, outputs=ae_output, name='ae')
ae.summary()
vae_output = decoder(encoder(x_inputs))
vae = Model(inputs=x_inputs, outputs=vae_output, name='vae')
vae.summary()
The problem is I can print the summary of the "ae" and "vae" models, but when I train the ae model, it says
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'latent_inputs' with dtype float and shape [?,32]
In the model "decoder" is supposed to connect to the output of "encoder_z_mean" layer in the ae model. But when I print the summary of the "ae" model, "decoder" is actually connected to "encoder_z_mean[1][0]". Should it be "encoder_z_mean[0][0]"?
A few corrections:
x_inputs is already the input of the encoders, don't call it again with encoder_z_mean(x_inputs) or with encoder(x_inputs)
Besides creating a second node (the 1 that you are worried with, and that is not a problem), it may be the source of the error because it's not an extra input, but the same input
A healthy usage of this would need the creation of a new Input(...) tensor to be called
The last Dense layer is not being called on a tensor. You probably want (h) there.
Do it this way:
# output - called h in the last layer
outputs = Dense(dims[0], activation='relu', kernel_initializer='glorot_uniform' name='mean')(h)
#unchanged
decoder = Model(inputs=latent_inputs, outputs=outputs, name='decoder')
#adjusted inputs
ae_output = decoder(encoder_z_mean.output)
ae = Model(encoder_z_mean.input, ae_output, name='ae')
ae.summary()
vae_output = decoder(encoder.output)
vae = Model(encoder.input, vae_output, name='vae')
vae.summary()
It's possible that the [1][0] still occurs with the decoder, but this is not a problem at all. It means that the decoder itself has its own input node (number 0), and you created an extra input node (number 1) when you called it with the output of another model. This is harmless. The node 1 will be used while node 0 will be ignored.