Keras Model fit throws shape mismatch error - tensorflow

I am building a Siamese network using Keras(TensorFlow) where the target is a binary column, i.e., match or mismatch(1 or 0). But the model fit method throws an error saying that the y_pred is not compatible with the y_true shape. I am using the binary_crossentropy loss function.
Here is the error I see:
Here is the code I am using:
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[tf.keras.metrics.Recall()])
history = model.fit([X_train_entity_1.todense(),X_train_entity_2.todense()],np.array(y_train),
epochs=2,
batch_size=32,
verbose=2,
shuffle=True)
My Input data shapes are as follows:
Inputs:
X_train_entity_1.shape is (700,2822)
X_train_entity_2.shape is (700,2822)
Target:
y_train.shape is (700,1)
In the error it throws, y_pred is the variable which was created internally. What is y_pred dimension is 2822 when I am having a binary target. And 2822 dimension actually matches the input size, but how do I understand this?
Here is the model I created:
in_layers = []
out_layers = []
for i in range(2):
input_layer = Input(shape=(1,))
embedding_layer = Embedding(embed_input_size+1, embed_output_size)(input_layer)
lstm_layer_1 = Bidirectional(LSTM(1024, return_sequences=True,recurrent_dropout=0.2, dropout=0.2))(embedding_layer)
lstm_layer_2 = Bidirectional(LSTM(512, return_sequences=True,recurrent_dropout=0.2, dropout=0.2))(lstm_layer_1)
in_layers.append(input_layer)
out_layers.append(lstm_layer_2)
merge = concatenate(out_layers)
dense1 = Dense(256, activation='relu', kernel_initializer='he_normal', name='data_embed')(merge)
drp1 = Dropout(0.4)(dense1)
btch_norm1 = BatchNormalization()(drp1)
dense2 = Dense(32, activation='relu', kernel_initializer='he_normal')(btch_norm1)
drp2 = Dropout(0.4)(dense2)
btch_norm2 = BatchNormalization()(drp2)
output = Dense(1, activation='sigmoid')(btch_norm2)
model = Model(inputs=in_layers, outputs=output)
model.summary()
Since my data is very sparse, I used todense. And there the type is as follows:
type(X_train_entity_1) is scipy.sparse.csr.csr_matrix
type(X_train_entity_1.todense()) is numpy.matrix
type(X_train_entity_2) is scipy.sparse.csr.csr_matrix
type(X_train_entity_2.todense()) is numpy.matrix
Summary of last few layers as follows:

Mismatched shape in the Input layer. The input shape needs to match the shape of a single element passed as x, or dataset.shape[1:]. So since your dataset size is (700,2822), that is 700 samples of size 2822. So your input shape should be 2822.
Change:
input_layer = Input(shape=(1,))
To:
input_layer = Input(shape=(2822,))

You need to set return_sequences in the lstm_layer_2 to False:
lstm_layer_2 = Bidirectional(LSTM(512, return_sequences=False, recurrent_dropout=0.2, dropout=0.2))(lstm_layer_1)
Otherwise, you will still have the timesteps of your input. That is why you have the shape (None, 2822, 1). You can also add a Flatten layer prior to your output layer, but I would recommend setting return_sequences=False.
Note that a Dense layer computes the dot product between the inputs and the kernel along the last axis of the inputs.

Related

How to pass bert embeddings to an LSTM layer

I want to do sentiment analysis using bert-embedding and lstm layer.
This is my code:
i = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
x = bert_preprocess(i)
x = bert_encoder(x)
x = tf.keras.layers.Dropout(0.2, name="dropout")(x['pooled_output'])
x = tf.keras.layers.LSTM(128, dropout=0.2)(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
x = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(x)
model = tf.keras.Model(i, x)
When compiling this code I got the following error:
ValueError: Input 0 of layer "lstm_2" is incompatible with the layer: expected
ndim=3, found ndim=2. Full shape received: (None, 768)
Is the logic of my code correct? Can anyone please correct my code?
From bert like models you can expect generally three kinds of outputs (taken from huggingface's TFBertModel documentation)
last_hidden_state with shape (batch_size, sequence_length, hidden_size)
pooler_output with shape (batch_size, hidden_size)
hidden_states with shape (batch_size, sequence_length, hidden_size)
hidden_size is 768 above..
As the error says, the output from dropout layer lacks 3 dimensions (essentially the bert_encoder layer because dropout layers do not change tensor shape) and has only 2 dimensions.
x = bert_encoder(x)
x = tf.keras.layers.Dropout(0.2, name="dropout")(x['pooled_output'])
x = tf.keras.layers.LSTM(128, dropout=0.2)(x)
So if you are planning to use an LSTM layer after the bert_encoder layer, you would need a three dimensional input to the LSTM in the form of (batch_size, num_timesteps, num_features) hence you would have to use either the hidden_states or the last_hidden_state outputs instead of pooler_output.
You will have to choose between the two depending on your objective/use-case.

Given an input and get two outputs

I am using Tensorflow 2.4.0 and Ubuntu 16.04
Given this model
base_model = tf.keras.applications.EfficientNetB0(weights="imagenet", include_top=False)
base_model.trainable = base_model_trainable
inputs = tf.keras.Input(shape=(IMG_SIZE,IMG_SIZE,3), name="input")
x = tf.keras.applications.efficientnet.preprocess_input(inputs)
# more details - https://www.tensorflow.org/tutorials/images/transfer_learning#important_note_about_batchnormalization_layers
x = base_model(x,
training=False) # training=training is needed only if there are layers with different behavior during training versus inference (e.g. Dropout)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(len(class_names), name="output")(x)
model = tf.keras.Model(inputs, outputs)
My objective is given the input to get the value of model.get_layer("efficientnetb0").output,
and model.output
if I do (a) (it does not give model.get_layer("efficientnetb0").output)
layer_name="efficientnetb0"
grad_model = tf.keras.models.Model([model.inputs], [model.output])
or (b) (it requires an additional input after preprocessing)
grad_model = tf.keras.models.Model([model.inputs, model.get_layer(layer_name).input], [model.get_layer(layer_name).output, model.output])
both works
but if I do
grad_model = tf.keras.models.Model([model.inputs], [model.get_layer(layer_name).output, model.output])
It gives exception:
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'") at layer "rescaling". The following previous layers were accessed without issue: []
What is the reason, and is there a way for me to pass in just an input and get two outputs?

How can I concatenate Tensorflow Dataset columns?

I have a Keras model that takes an input layer with shape (n, 288, 1), of which 288 is the number of features. I am using a TensorFlow dataset tf.data.experimental.make_batched_features_dataset and my input layer will be (n, 1, 1) which means it gives one feature to the model at a time. How can I make an input tensor with the shape of (n, 288, 1)? I mean how can I use all my features in one tensor?
Here is my code for the model:
def _gzip_reader_fn(filenames):
"""Small utility returning a record reader that can read gzip'ed files."""
return tf.data.TFRecordDataset(filenames, compression_type='GZIP')
def _input_fn(file_pattern, tf_transform_output, batch_size):
"""Generates features and label for tuning/training.
Args:
file_pattern: input tfrecord file pattern.
tf_transform_output: A TFTransformOutput.
batch_size: representing the number of consecutive elements of returned
dataset to combine in a single batch
Returns:
A dataset that contains (features, indices) tuple where features is a
dictionary of Tensors, and indices is a single Tensor of label indices.
"""
transformed_feature_spec = (
tf_transform_output.transformed_feature_spec().copy())
dataset = tf.data.experimental.make_batched_features_dataset(
file_pattern=file_pattern,
batch_size=batch_size,
features=transformed_feature_spec,
reader=_gzip_reader_fn,
label_key=features.transformed_name(features.LABEL_KEY))
return dataset
def _build_keras_model(nb_classes=2, input_shape, learning_rate):
# Keras needs the feature definitions at compile time.
input_shape = (288,1)
input_layer = keras.layers.Input(input_shape)
padding = 'valid'
if input_shape[0] < 60:
padding = 'same'
conv1 = keras.layers.Conv1D(filters=6, kernel_size=7, padding=padding, activation='sigmoid')(input_layer)
conv1 = keras.layers.AveragePooling1D(pool_size=3)(conv1)
conv2 = keras.layers.Conv1D(filters=12, kernel_size=7, padding=padding, activation='sigmoid')(conv1)
conv2 = keras.layers.AveragePooling1D(pool_size=3)(conv2)
flatten_layer = keras.layers.Flatten()(conv2)
output_layer = keras.layers.Dense(units=nb_classes, activation='sigmoid')(flatten_layer)
model = keras.models.Model(inputs=input_layer, outputs=output_layer)
optimizer = keras.optimizers.Adam(lr=learning_rate)
# Compile Keras model
model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=['accuracy'])
model.summary(print_fn=logging.info)
return model
This is the error:
tensorflow:Model was constructed with shape (None, 288, 1) for input Tensor("input_1:0", shape=(None, 288, 1), dtype=float32), but it was called on an input with incompatible shape (128, 1, 1).

Why doesn't connect to the expected layer in Keras model

I want to construct a variational autoencoder in Keras (2.2.4, with TensorFlow backend), here is my code:
dims = [1000, 256, 64, 32]
x_inputs = Input(shape=(dims[0],), name='inputs')
h = x_inputs
# internal layers in encoder
for i in range(n_stacks-1):
h = Dense(dims[i + 1], activation='relu', kernel_initializer='glorot_uniform', name='encoder_%d' % i)(h)
# hidden layer
z_mean = Dense(dims[-1], kernel_initializer='glorot_uniform', name='z_mean')(h)
z_log_var = Dense(dims[-1], kernel_initializer='glorot_uniform', name='z_log_var')(h)
z = Lambda(sampling, output_shape=(dims[-1],), name='z')([z_mean, z_log_var])
encoder = Model(inputs=x_inputs, outputs=z, name='encoder')
encoder_z_mean = Model(inputs=x_inputs, outputs=z_mean, name='encoder_z_mean')
# internal layers in decoder
latent_inputs = Input(shape=(dims[-1],), name='latent_inputs')
h = latent_inputs
for i in range(n_stacks-1, 0, -1):
h = Dense(dims[i], activation='relu', kernel_initializer='glorot_uniform', name='decoder_%d' % i)(h)
# output
outputs = Dense(dims[0], activation='relu', kernel_initializer='glorot_uniform' name='mean')
decoder = Model(inputs=latent_inputs, outputs=outputs, name='decoder')
ae_output = decoder(encoder_z_mean(x_inputs))
ae = Model(inputs=x_inputs, outputs=ae_output, name='ae')
ae.summary()
vae_output = decoder(encoder(x_inputs))
vae = Model(inputs=x_inputs, outputs=vae_output, name='vae')
vae.summary()
The problem is I can print the summary of the "ae" and "vae" models, but when I train the ae model, it says
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'latent_inputs' with dtype float and shape [?,32]
In the model "decoder" is supposed to connect to the output of "encoder_z_mean" layer in the ae model. But when I print the summary of the "ae" model, "decoder" is actually connected to "encoder_z_mean[1][0]". Should it be "encoder_z_mean[0][0]"?
A few corrections:
x_inputs is already the input of the encoders, don't call it again with encoder_z_mean(x_inputs) or with encoder(x_inputs)
Besides creating a second node (the 1 that you are worried with, and that is not a problem), it may be the source of the error because it's not an extra input, but the same input
A healthy usage of this would need the creation of a new Input(...) tensor to be called
The last Dense layer is not being called on a tensor. You probably want (h) there.
Do it this way:
# output - called h in the last layer
outputs = Dense(dims[0], activation='relu', kernel_initializer='glorot_uniform' name='mean')(h)
#unchanged
decoder = Model(inputs=latent_inputs, outputs=outputs, name='decoder')
#adjusted inputs
ae_output = decoder(encoder_z_mean.output)
ae = Model(encoder_z_mean.input, ae_output, name='ae')
ae.summary()
vae_output = decoder(encoder.output)
vae = Model(encoder.input, vae_output, name='vae')
vae.summary()
It's possible that the [1][0] still occurs with the decoder, but this is not a problem at all. It means that the decoder itself has its own input node (number 0), and you created an extra input node (number 1) when you called it with the output of another model. This is harmless. The node 1 will be used while node 0 will be ignored.

How to use convolution 1D with lstm ?

I have time series data input 72 value by separate last 6 value for test prediction. I want to use CONV1D with LSTM.
This is my code.
df = pd.read_csv('D://data.csv',
engine='python')
df['DATE_'] = pd.to_datetime(df['DATE_']) + MonthEnd(1)
df = df.set_index('DATE_')
df.head()
split_date = pd.Timestamp('03-01-2015')
train = df.loc[:split_date, ['COLUMN3DATA']]
test = df.loc[split_date:, ['COLUMN3DATA']]
sc = MinMaxScaler()
train_sc = sc.fit_transform(train)
test_sc = sc.transform(test)
X_train = train_sc[:-1]
y_train = train_sc[1:]
X_test = test_sc[:-1]
y_test = test_sc[1:]
################### Convolution #######################
X_train_t = X_train[None,:]
print(X_train_t.shape)
X_test_t = X_test[:, None]
K.clear_session()
model = Sequential()
model.add(Conv1D(6, 3, activation='relu', input_shape=(12,1)))
model.add(LSTM(6, input_shape=(1,3), return_sequences=True))
model.add(LSTM(3))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam' )
model.summary()
model.fit(X_train_t, y_train, epochs=400, batch_size=10, verbose=1)
y_pred = model.predict(X_test_t)
When I run it show error like this
ValueError: Error when checking input: expected conv1d_1_input to have shape (None, 12, 1) but got array with shape (1, 64, 1)
How to use conv1D with lstm
The problem is between your input data and your input shape.
You said in the model that your input shape is (12,1) (= batch_shape=(None,12,1))
But your data X_train_t has shape (1,64,1).
Either you fix the input shape of the model, or you fix your data if this is not the expected shape.
For variable lengths/timesteps, you can use input_shape=(None,1).
You don't need an input_shape in the second layer.