Optimization of Hyperparameters in a CNN


EDIT: I adjusted the model as suggested. That means I included lr and dropout as arguments in the ConvNet function.
I am new to Neural Networks and CNNs and facing a problem regarding Optimization of Hyperparameters. So now I will explain my process so far:
With the help of various excellent blog posts I was able to build a CNN that works for my project. In my project I am trying to predict the VIX and S&P 500 with the help of the FOMC meeting statements. So basically I have text data on the one hand and financial data (returns) on the other hand. After preprocessing and applying Google's pre-trained Word2Vec word embeddings I built the following convolutional network:
def ConvNet(embeddings, max_sequence_length, num_words, embedding_dim, trainable=False, extra_conv=True,
            lr=0.001, dropout=0.5):
    embedding_layer = Embedding(num_words,
                                embedding_dim,
                                weights=[embeddings],
                                input_length=max_sequence_length,
                                trainable=trainable)
    sequence_input = Input(shape=(max_sequence_length,), dtype='int32')
    embedded_sequences = embedding_layer(sequence_input)
    convs = []
    filter_sizes = [3, 4, 5]
    for filter_size in filter_sizes:
        l_conv = Conv1D(filters=128, kernel_size=filter_size, activation='relu')(embedded_sequences)
        l_pool = MaxPooling1D(pool_size=3)(l_conv)
        convs.append(l_pool)
    l_merge = concatenate([convs[0], convs[1], convs[2]], axis=1)
    # add a 1D convnet with global maxpooling, instead of Yoon Kim model
    conv = Conv1D(filters=128, kernel_size=3, activation='relu')(embedded_sequences)
    pool = MaxPooling1D(pool_size=3)(conv)
    if extra_conv == True:
        x = Dropout(dropout)(l_merge)
    else:
        # Original Yoon Kim model
        x = Dropout(dropout)(pool)
    x = Flatten()(x)
    x = Dense(128, activation='relu')(x)
    preds = Dense(1, activation='linear')(x)
    model = Model(sequence_input, preds)
    sgd = SGD(learning_rate=lr, momentum=0.8)
    model.compile(loss='mean_squared_error',
                  optimizer=sgd,
                  metrics=['mean_squared_error'])
    model.summary()
    return model
My model architecture looks like this:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 1086) 0
__________________________________________________________________________________________________
embedding_1 (Embedding) (None, 1086, 300) 532500 input_1[0][0]
__________________________________________________________________________________________________
conv1d_1 (Conv1D) (None, 1084, 128) 115328 embedding_1[0][0]
__________________________________________________________________________________________________
conv1d_2 (Conv1D) (None, 1083, 128) 153728 embedding_1[0][0]
__________________________________________________________________________________________________
conv1d_3 (Conv1D) (None, 1082, 128) 192128 embedding_1[0][0]
__________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D) (None, 361, 128) 0 conv1d_1[0][0]
__________________________________________________________________________________________________
max_pooling1d_2 (MaxPooling1D) (None, 361, 128) 0 conv1d_2[0][0]
__________________________________________________________________________________________________
max_pooling1d_3 (MaxPooling1D) (None, 360, 128) 0 conv1d_3[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 1082, 128) 0 max_pooling1d_1[0][0]
max_pooling1d_2[0][0]
max_pooling1d_3[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) (None, 1082, 128) 0 concatenate_1[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 138496) 0 dropout_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 128) 17727616 flatten_1[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) (None, 1) 129 dense_3[0][0]
==================================================================================================
Total params: 18,721,429
Trainable params: 18,188,929
Non-trainable params: 532,500
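The shapes and parameter counts in this summary can be reproduced by hand, which is a useful sanity check before tuning anything. The following is a minimal sketch, assuming 'valid' padding with stride 1 for Conv1D and a pooling stride equal to the pool size (the Keras defaults). The helper names are illustrative and not part of Keras.

```python
def conv1d_out_len(length, kernel_size):
    """Output length of a Conv1D with 'valid' padding and stride 1."""
    return length - kernel_size + 1


def pool_out_len(length, pool_size):
    """Output length of MaxPooling1D with stride equal to pool_size."""
    return length // pool_size


def conv1d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units


seq_len, embed_dim, filters = 1086, 300, 128
kernel_sizes = (3, 4, 5)
embedding_params = 532500  # taken from the summary (frozen embedding matrix)

# Each branch: convolution, then pooling with pool_size=3.
pooled = [pool_out_len(conv1d_out_len(seq_len, k), 3) for k in kernel_sizes]
# Concatenating on axis=1 adds the sequence lengths of the branches.
merged_len = sum(pooled)
# Flatten multiplies the remaining sequence length by the channel count.
flat = merged_len * filters

conv_total = sum(conv1d_params(k, embed_dim, filters) for k in kernel_sizes)
total = embedding_params + conv_total + dense_params(flat, 128) + dense_params(128, 1)

print(pooled, merged_len, flat, total)
```

The first Dense layer after Flatten accounts for the large majority of the parameters, so the pooling and flattening choices matter far more for model size than the number of filters does.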
So now I am facing the next big problem, and I am really running out of ideas on how to solve it: optimization of hyperparameters.
My problem is that every code example I have found so far for hyperparameter optimization is applied to this kind of architecture:
model = Sequential()
embedding = model.add(layers.Embedding(MAX_VOCAB_SIZE, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH))
model.add(layers.Conv1D(filters=128, kernel_size=5, activation='relu'))
model.add(layers.MaxPool1D(pool_size=3))
model.add(Dropout(0.25))
So my specific question is how to perform the optimization of hyperparameters, because whenever I change something in my ConvNet I get errors, and as I said, all tutorials I can find are applied to model = Sequential().
The new error message is:
__________________________________________________________________________________________________
0%| | 0/100 [00:00<?, ?trial/s, best loss=?]
job exception: 'Model' object is not subscriptable
Traceback (most recent call last):
File "/Users/lukaskoston/Desktop/MasterarbeitFOMCAnalysis/07_Regression/CNN regression neu.py", line 262, in <module>
max_evals=100)
File "/Users/lukaskoston/.local/lib/python3.7/site-packages/hyperopt/fmin.py", line 482, in fmin
show_progressbar=show_progressbar,
File "/Users/lukaskoston/.local/lib/python3.7/site-packages/hyperopt/base.py", line 686, in fmin
show_progressbar=show_progressbar,
File "/Users/lukaskoston/.local/lib/python3.7/site-packages/hyperopt/fmin.py", line 509, in fmin
rval.exhaust()
File "/Users/lukaskoston/.local/lib/python3.7/site-packages/hyperopt/fmin.py", line 330, in exhaust
self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
File "/Users/lukaskoston/.local/lib/python3.7/site-packages/hyperopt/fmin.py", line 286, in run
self.serial_evaluate()
File "/Users/lukaskoston/.local/lib/python3.7/site-packages/hyperopt/fmin.py", line 165, in serial_evaluate
result = self.domain.evaluate(spec, ctrl)
File "/Users/lukaskoston/.local/lib/python3.7/site-packages/hyperopt/base.py", line 894, in evaluate
rval = self.fn(pyll_rval)
File "/Users/lukaskoston/Desktop/MasterarbeitFOMCAnalysis/07_Regression/CNN regression neu.py", line 248, in train_and_score
return hist['val_loss'][-1]
TypeError: 'Model' object is not subscriptable
Thanks in advance,
Lukas

You should make your hyperparameters arguments to your method.
def ConvNet(embeddings, max_sequence_length, num_words, embedding_dim, trainable=False, extra_conv=True,
            lr=1.0, dropout=0.5):
    # ...
Then you can update your code to use those values instead of the fixed values or Keras defaults.
if extra_conv == True:
    x = Dropout(dropout)(l_merge)
else:
    # Original Yoon Kim model
    x = Dropout(dropout)(pool)
And:
model.compile(loss='mean_squared_error',
              optimizer=keras.optimizers.Adadelta(learning_rate=lr),
              metrics=['mean_squared_error'])
I would start with those two, and leave batch size and epoch count for later. Those can have a big effect on run time, which is hard to account for in the hyperparameter optimization.
Then you can optimize with a library like hyperopt.
import numpy as np
from hyperopt import Trials, fmin, hp, tpe, space_eval

def train_and_score(args):
    # Train the model with the fixed params plus the optimization args.
    # Note that this method should return the final History object,
    # so ConvNet must call model.fit(...) and return its result.
    hist = ConvNet(embeddings, max_sequence_length, num_words,
                   embedding_dim, trainable=False, extra_conv=True,
                   lr=args['lr'], dropout=args['dropout'])
    # Unpack and return the last validation loss from the history.
    # The per-epoch values live in the History object's .history dict.
    return hist.history['val_loss'][-1]

# Define the space to optimize over.
space = {
    'lr': hp.loguniform('lr', np.log(0.1), np.log(10.0)),
    'dropout': hp.uniform('dropout', 0, 1),
}
# Minimize the validation loss over the space.
trials = Trials()
best = fmin(train_and_score, space, trials=trials, algo=tpe.suggest, max_evals=100)
# Print details about the best results and hyperparameters.
print(best)
print(space_eval(space, best))
There are also libraries that will help you directly integrate this with Keras. A popular choice is hyperas. In that case you would modify your function to use some templates instead of parameters, but it is otherwise very similar.
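The shape of the objective function is easier to see without Keras or hyperopt in the way. The sketch below is a stand-in under stated assumptions: FakeHistory and fake_fit are hypothetical placeholders for the History object and for "build the model, then call fit", the loss surface is synthetic, and plain random search replaces hyperopt's TPE algorithm. Only the structure carries over: the objective receives a dict of hyperparameters and returns one number, read from the history's val_loss list.

```python
import math
import random


class FakeHistory:
    """Hypothetical stand-in for the History object returned by model.fit()."""

    def __init__(self, val_losses):
        self.history = {"val_loss": val_losses}


def fake_fit(lr, dropout):
    """Hypothetical replacement for building the model and calling fit()."""
    # Synthetic loss surface with its minimum at lr=1.0, dropout=0.5.
    final = math.log(lr) ** 2 + (dropout - 0.5) ** 2
    return FakeHistory([final + 1.0, final + 0.5, final])


def train_and_score(args):
    hist = fake_fit(lr=args["lr"], dropout=args["dropout"])
    # The per-epoch losses live in hist.history, not on hist itself.
    return hist.history["val_loss"][-1]


def sample_space(rng):
    return {
        # Log-uniform between 0.1 and 10, mirroring hp.loguniform above.
        "lr": math.exp(rng.uniform(math.log(0.1), math.log(10.0))),
        # Capped at 0.9 here (an assumption): a rate close to 1 drops
        # nearly every activation.
        "dropout": rng.uniform(0.0, 0.9),
    }


rng = random.Random(0)
trials = [sample_space(rng) for _ in range(100)]
best = min(trials, key=train_and_score)
print(best, train_and_score(best))
```

Indexing the returned object directly, or returning the model instead of the fit result, is what produces errors such as "'Model' object is not subscriptable".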

Related

Tensorflow music generation with lstm - model.fit not working

I am trying to train an LSTM model for music generation, but something seems to cause an error when calling model.fit().
# Compile the model
lstm.compile(loss='categorical_crossentropy', optimizer='rmsprop')
tf.keras.utils.plot_model(lstm, show_shapes=True)
# Train the model
lstm.fit([trainChords, trainDurations], [targetChords, targetDurations], epochs=500)
Model: "model_6"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_19 (InputLayer) [(None, 32)] 0 []
input_20 (InputLayer) [(None, 32)] 0 []
embedding_18 (Embedding) (None, 32, 64) 2048 ['input_19[0][0]']
embedding_19 (Embedding) (None, 32, 64) 2048 ['input_20[0][0]']
concatenate_9 (Concatenate) (None, 64, 64) 0 ['embedding_18[0][0]',
'embedding_19[0][0]']
lstm_8 (LSTM) (None, 64, 512) 1181696 ['concatenate_9[0][0]']
dense_18 (Dense) (None, 64, 256) 131328 ['lstm_8[0][0]']
dense_19 (Dense) (None, 64, 32) 8224 ['dense_18[0][0]']
dense_20 (Dense) (None, 64, 32) 8224 ['dense_18[0][0]']
==================================================================================================
Total params: 1,333,568
Trainable params: 1,333,568
Non-trainable params: 0
I get the error that some shapes are not compatible, but I don't know how to fix this.
ValueError Traceback (most recent call last)
<ipython-input-63-8785c106bc4b> in <module>
5
6 # Train the model
----> 7 lstm.fit([trainChords, trainDurations], [targetChords, targetDurations], epochs=500)
ValueError: Shapes (None,) and (None, 64, 32) are incompatible
Any help would be appreciated.
Update on building the model. Maybe the error is with the output layers(?)
# Define input layers
chordInput = tf.keras.layers.Input(shape = (nChords))
durationInput = tf.keras.layers.Input(shape = (nDurations))
# Define embedding layers
chordEmbedding = tf.keras.layers.Embedding(nChords, embedDim, input_length = sequenceLength)(chordInput)
durationEmbedding = tf.keras.layers.Embedding(nDurations, embedDim, input_length = sequenceLength)(durationInput)
# Merge embedding layers using a concatenation layer
mergeLayer = tf.keras.layers.Concatenate(axis=1)([chordEmbedding, durationEmbedding])
# Define LSTM layer
lstmLayer = tf.keras.layers.LSTM(512, return_sequences=True)(mergeLayer)
# Define dense layer
denseLayer = tf.keras.layers.Dense(256)(lstmLayer)
# Define output layers
chordOutput = tf.keras.layers.Dense(nChords, activation = 'softmax')(denseLayer)
durationOutput = tf.keras.layers.Dense(nDurations, activation = 'softmax')(denseLayer)
# nChords and nDurations are both 32
# Define model
lstm = tf.keras.Model(inputs = [chordInput, durationInput], outputs = [chordOutput, durationOutput])
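One way to read the error is to trace the shapes layer by layer. The sketch below does that in plain Python, without the batch dimension. The helper functions are illustrative and only mimic how each Keras layer changes the shape, and embed_dim = 64 is read off the summary rather than stated in the code. It reproduces the (64, 32) per-sample output of the summary, while the error message suggests the targets arrive with shape (None,).

```python
def embedding(shape, embed_dim):
    """An Embedding layer appends the embedding dimension."""
    return shape + (embed_dim,)


def concatenate(shapes, axis):
    """Concatenate along `axis`, where axis 0 would be the (omitted) batch axis."""
    i = axis - 1
    out = list(shapes[0])
    out[i] = sum(s[i] for s in shapes)
    return tuple(out)


def lstm(shape, units, return_sequences):
    """Keep the time axis only when return_sequences is True."""
    return (shape[0], units) if return_sequences else (units,)


def dense(shape, units):
    """A Dense layer acts on the last axis only."""
    return shape[:-1] + (units,)


n_chords = n_durations = 32
embed_dim = 64  # inferred from the summary: embedding output is (None, 32, 64)

chord = embedding((n_chords,), embed_dim)
duration = embedding((n_durations,), embed_dim)
merged = concatenate([chord, duration], axis=1)
seq = lstm(merged, 512, return_sequences=True)
chord_out = dense(dense(seq, 256), n_chords)

# For comparison: keeping only the last LSTM step removes the time axis.
last_step_out = dense(dense(lstm(merged, 512, return_sequences=False), 256), n_chords)

print(chord, merged, seq, chord_out, last_step_out)
```

Whether the right change is return_sequences=False, differently shaped or one-hot targets, or a sparse loss depends on how targetChords and targetDurations are built, which the question does not show.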

How to set output_shape of BERT preprocessing layer from tensorflow hub?

I am building a simple BERT model for text classification, using the tensorflow hub.
import tensorflow as tf
import tensorflow_hub as tf_hub
bert_preprocess = tf_hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = tf_hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
preprocessed_text = bert_preprocess(text_input)
encoded_input = bert_encoder(preprocessed_text)
l1 = tf.keras.layers.Dropout(0.3, name="dropout1")(encoded_input['pooled_output'])
l2 = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(l1)
model = tf.keras.Model(inputs=[text_input], outputs = [l2])
model.summary()
Upon analyzing the output of the bert_preprocess step, I noticed that they are arrays of length 128. My texts are much shorter on average than 128 tokens and as such, my intention would be to decrease this length parameter, so that the preprocessing yields, say, arrays of length 40 only. However, I cannot figure out how to pass this max_length or output_shape parameter to the bert_preprocess.
Printed model summary:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
text (InputLayer) [(None,)] 0 []
keras_layer_16 (KerasLayer) {'input_word_ids': 0 ['text[0][0]']
(None, 128),
'input_type_ids':
(None, 128),
'input_mask': (Non
e, 128)}
keras_layer_17 (KerasLayer) {'sequence_output': 109482241 ['keras_layer_16[0][0]',
(None, 128, 768), 'keras_layer_16[0][1]',
'default': (None, 'keras_layer_16[0][2]']
768),
'encoder_outputs':
[(None, 128, 768),
(None, 128, 768),
(None, 128, 768),
(None, 128, 768),
(None, 128, 768),
(None, 128, 768),
(None, 128, 768),
...
Total params: 109,483,010
Trainable params: 769
Non-trainable params: 109,482,241
Checking the documentation, I found there is an output_shape argument for tf_hub.KerasLayer, so I tried passing the following arguments:
bert_preprocess = tf_hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3", output_shape=(64,))
bert_preprocess = tf_hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3", output_shape=[64])
However, in both of these cases, the following line throws an error:
bert_preprocess(["we have a very sunny day today don't you think so?"])
Error:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_23952\4048288771.py in <module>
----> 1 bert_preprocess("we have a very sunny day today don't you think so?")
~\AppData\Roaming\Python\Python37\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
c:\Users\username\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_hub\keras_layer.py in call(self, inputs, training)
237 result = smart_cond.smart_cond(training,
238 lambda: f(training=True),
--> 239 lambda: f(training=False))
240
241 # Unwrap dicts returned by signatures.
c:\Users\username\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_hub\keras_layer.py in <lambda>()
237 result = smart_cond.smart_cond(training,
238 lambda: f(training=True),
--> 239 lambda: f(training=False))
240
241 # Unwrap dicts returned by signatures.
...
Keyword arguments: {}
Call arguments received:
• inputs="we have a very sunny day today don't you think so?"
• training=False

You need to go one level lower to achieve this. The page of the preprocessing layer shows how to do it, but does not explain it properly.
You can wrap this in a custom TF layer:
class ModifiedBertPreprocess(tf.keras.layers.Layer):
    def __init__(self, max_len):
        super(ModifiedBertPreprocess, self).__init__()
        preprocessor = tf_hub.load(
            "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
        self.tokenizer = tf_hub.KerasLayer(preprocessor.tokenize, name="tokenizer")
        self.prep_layer = tf_hub.KerasLayer(
            preprocessor.bert_pack_inputs,
            arguments={"seq_length": max_len})

    def call(self, inputs, training):
        tokenized = [self.tokenizer(seq) for seq in inputs]
        return self.prep_layer(tokenized)
Basically, you tokenize and pack the inputs yourself. The preprocessor has a method named bert_pack_inputs, which lets you specify the maximum length of the inputs through its seq_length argument.
For some reason, self.tokenizer expects the inputs in a list format. Most likely this allows it to accept multiple input segments.
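Conceptually, packing to a fixed seq_length means truncating the token ids, adding the start and end markers, padding, and building a mask that separates real tokens from padding. The sketch below imitates that for a single segment in plain Python. It is an illustration, not the TF Hub implementation: pack_single_segment is a hypothetical helper, the ids 101, 102 and 0 are assumed to be the [CLS], [SEP] and [PAD] entries of the uncased BERT vocabulary, and the token ids are arbitrary placeholders.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # assumed special-token ids


def pack_single_segment(token_ids, seq_length):
    """Return fixed-length word ids, mask and type ids for one segment."""
    # Leave room for the [CLS] and [SEP] markers.
    body = list(token_ids[: seq_length - 2])
    word_ids = [CLS_ID] + body + [SEP_ID]
    n_real = len(word_ids)
    n_pad = seq_length - n_real
    return {
        "input_word_ids": word_ids + [PAD_ID] * n_pad,
        "input_mask": [1] * n_real + [0] * n_pad,
        # A single segment uses type id 0 everywhere.
        "input_type_ids": [0] * seq_length,
    }


short = pack_single_segment(range(1000, 1006), seq_length=10)
long = pack_single_segment(range(1000, 1050), seq_length=10)
print(short)
print(long)
```

Every output has exactly seq_length entries regardless of the input length, which is why passing a smaller seq_length shrinks the encoder's sequence dimension.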
Your model should look like this:
bert_encoder = tf_hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")
text_input = [tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')]
bert_seq_changed = ModifiedBertPreprocess(max_len=40)
encoder_inputs = bert_seq_changed(text_input)
encoded_input = bert_encoder(encoder_inputs)
l1 = tf.keras.layers.Dropout(0.3, name="dropout1")(encoded_input['pooled_output'])
l2 = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(l1)
model = tf.keras.Model(inputs=[text_input], outputs = [l2])
Note that the text_input layer is now inside a list, as self.tokenizer's input signature expects a list.
Here's the model summary:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
text (InputLayer) [(None,)] 0 []
modified_bert_preprocess (Modi {'input_type_ids': 0 ['text[0][0]']
fiedBertPreprocess) (None, 40),
'input_word_ids':
(None, 40),
'input_mask': (Non
e, 40)}
keras_layer (KerasLayer) {'encoder_outputs': 109482241 ['modified_bert_preprocess[0][0]'
[(None, 40, 768), , 'modified_bert_preprocess[0][1]
(None, 40, 768), ',
(None, 40, 768), 'modified_bert_preprocess[0][2]'
(None, 40, 768), ]
(None, 40, 768),
(None, 40, 768),
(None, 40, 768),
(None, 40, 768),
(None, 40, 768),
(None, 40, 768),
(None, 40, 768),
(None, 40, 768)],
'default': (None,
768),
'pooled_output': (
None, 768),
'sequence_output':
(None, 40, 768)}
dropout1 (Dropout) (None, 768) 0 ['keras_layer[0][13]']
output (Dense) (None, 1) 769 ['dropout1[0][0]']
==================================================================================================
Total params: 109,483,010
Trainable params: 769
Non-trainable params: 109,482,241
When calling the custom preprocessing layer:
bert_seq_changed([tf.convert_to_tensor(["we have a very sunny day today don't you think so?"], dtype=tf.string)])
Note that the inputs should be in a list. The model itself can be called either way:
model([tf.convert_to_tensor(["we have a very sunny day today don't you think so?"], dtype=tf.string)])
or
model(tf.convert_to_tensor(["we have a very sunny day today don't you think so?"], dtype=tf.string))

Keras ELMO fails during training: Unsupported object type int

I've got this network that's using TF Hub's ELMo layer for a classification task. Oddly, it starts training but fails partway through with the error:
Unsupported object type int
import tensorflow_hub as hub
import tensorflow as tf
elmo = hub.Module("https://tfhub.dev/google/elmo/3", trainable=True)
from tensorflow.keras.layers import Input, Lambda, Bidirectional, Dense, Dropout, Flatten, LSTM
from tensorflow.keras.models import Model
def ELMoEmbedding(input_text):
    return elmo(tf.reshape(tf.cast(input_text, tf.string), [-1]), signature="default", as_dict=True)["elmo"]

def build_model():
    input_layer = Input(shape=(1,), dtype="string", name="Input_layer")
    embedding_layer = Lambda(ELMoEmbedding, output_shape=(1024, ), name="Elmo_Embedding")(input_layer)
    BiLSTM = Bidirectional(LSTM(128, return_sequences=False, recurrent_dropout=0.2, dropout=0.2), name="BiLSTM")(embedding_layer)
    Dense_layer_1 = Dense(64, activation='relu')(BiLSTM)
    Dropout_layer_1 = Dropout(0.5)(Dense_layer_1)
    Dense_layer_2 = Dense(32, activation='relu')(Dropout_layer_1)
    Dropout_layer_2 = Dropout(0.5)(Dense_layer_2)
    output_layer = Dense(1, activation='sigmoid')(Dropout_layer_2)
    model = Model(inputs=[input_layer], outputs=output_layer, name="BiLSTM with ELMo Embeddings")
    model.summary()
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
elmo_BiDirectional_model = build_model()
import numpy as np
import io
import re
from tensorflow import keras
i = 0
max_cells = 51
x_data = np.zeros((max_cells, 1), dtype='object')
y_data = np.zeros((max_cells, 1), dtype='float32')
with io.open('./data/names-sample.txt', encoding='utf-8') as f:
    content = f.readlines()
for line in content:
    line = re.sub("[\n]", " ", line)
    x_data[i] = line
    y_data[i] = .1 #testing!
    i = i+1
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
    model_elmo = elmo_BiDirectional_model.fit(x_data, y_data, epochs=100, batch_size=5)
    train_prediction = elmo_BiDirectional_model.predict(x_data)
Full error:
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
Model: "BiLSTM with ELMo Embeddings"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Input_layer (InputLayer) [(None, 1)] 0
_________________________________________________________________
Elmo_Embedding (Lambda) (None, None, 1024) 0
_________________________________________________________________
BiLSTM (Bidirectional) (None, 256) 1180672
_________________________________________________________________
dense_43 (Dense) (None, 64) 16448
_________________________________________________________________
dropout_28 (Dropout) (None, 64) 0
_________________________________________________________________
dense_44 (Dense) (None, 32) 2080
_________________________________________________________________
dropout_29 (Dropout) (None, 32) 0
_________________________________________________________________
dense_45 (Dense) (None, 1) 33
=================================================================
Total params: 1,199,233
Trainable params: 1,199,233
Non-trainable params: 0
_________________________________________________________________
Train on 51 samples
Epoch 1/100
30/51 [================>.............] - ETA: 2s - loss: 0.5324 - acc: 0.0000e+00 Traceback (most recent call last):
File "C:\temp\Simon\TestElmo2.py", line 52, in <module>
model_elmo = elmo_BiDirectional_model.fit(x_data, y_data, epochs=100, batch_size=5)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 675, in fit
steps_name='steps_per_epoch')
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 394, in model_iteration
batch_outs = f(ins_batch)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3476, in __call__
run_metadata=self.run_metadata)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\client\session.py", line 1472, in __call__
run_metadata_ptr)
InternalError: Unsupported object type int

This turned out to be a data issue. I had an empty line in the dataset!
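A small guard when reading the file avoids this class of problem. The sketch below is a minimal example, assuming one sample per line. read_nonempty_lines is an illustrative helper, and the in-memory sample stands in for the real file. Sizing the arrays from the number of lines actually read, rather than from a fixed max_cells, also means no row keeps its np.zeros placeholder. In an object array that placeholder is the integer 0, which is possibly where the "int" in the message comes from.

```python
import io


def read_nonempty_lines(handle):
    """Return stripped lines, skipping any that are empty or whitespace-only."""
    return [line.strip() for line in handle if line.strip()]


# In-memory stand-in for the data file, including blank lines.
sample = io.StringIO("Alice Smith\n\nBob Jones\n   \nCarol White\n")
names = read_nonempty_lines(sample)
print(len(names), names)
```

The same function works on a real file handle, for example one opened with io.open(path, encoding='utf-8').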

"Tap" a specific layer in existing Keras Model and make a branch to a new output?

Environment:
I'm using tf.keras (TensorFlow 1.14) on Google Colab, and my model architecture is MobileNet V2 1.00 224.
Problem:
I am trying (and failing) to attach a new layer, with its own output, to an existing layer that is not the normal output of my model, i.e. to make a branch earlier in MobileNet V2.
I want this new branch to be for a regression output, but I don't want that output to be serially connected to the final embedding layer of MobileNet. It should hang off a much earlier stage (which one, I'm not sure; I'm experimenting). Basically: a branch with its own output, plus the normal pre-trained ImageNet embedding output.
Grab MobileNet V2 as base_model:
base_model = tf.keras.applications.MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3),
include_top=False,
weights='imagenet')
base_model.trainable = False
Make my layers from base_model and make my new outputs.
# get layers from mobilenet base layer
mobilenet_input = base_model.get_layer('input_1')
mobilenet_output = base_model.get_layer('out_relu')
# add our average pooling layer to our MobileNetV2 output like all of our other classifiers so we split our graph on the same nodes
out_global_pooling = tf.keras.layers.GlobalAveragePooling2D(name='embedding_pooling')(mobilenet_output.output)
out_global_pooling.trainable = False
# Our new branch and outputs for the branch
expanded_conv_depthwise_BN = base_model.get_layer('expanded_conv_depthwise_BN')
regression_dropout = tf.keras.layers.Dropout(0.5) (expanded_conv_depthwise_BN.output)
regression_global_pooling = tf.keras.layers.GlobalAveragePooling2D(name="regression_pooling")(regression_dropout)
new_regression_output = tf.keras.layers.Dense(num_labels, activation = 'sigmoid', name = "cinemanet_output") (regression_global_pooling)
This appears to be fine, and I can even make my model via the functional API:
model = tf.keras.Model(inputs=mobilenet_input.input, outputs=[out_global_pooling, new_regression_output])
My Training Code
My data set is a set of 30 floats (10 RGB triplets) I want to predict from an input image. My data set works when training a 'sequence' model, but fails when I try to train this model.
ops.reset_default_graph()
tf.keras.backend.set_learning_phase(1) # 0 testing, 1 training mode
# preview contents of CSV to verify things are sane
import csv
import math
def lenopenreadlines(filename):
    with open(filename) as f:
        return len(f.readlines())

def csvheaderrow(filename):
    with open(filename) as f:
        reader = csv.reader(f)
        return next(reader, None)
# !head {label_file}
NUM_IMAGES = ( lenopenreadlines(label_file) - 1) # remove header
COLUMN_NAMES = csvheaderrow(label_file)
LABEL_NAMES = COLUMN_NAMES[:]
LABEL_NAMES.remove("filepath")
ALL_LABELS.extend(LABEL_NAMES)
# make our data set
BATCH_SIZE = 256
NUM_EPOCHS = 50
FILE_PATH = ["filepath"]
LABELS_TO_PRINT = ' '.join(LABEL_NAMES)
print("Label contains: " + str(NUM_IMAGES) + " images")
print("Label Are: " + LABELS_TO_PRINT)
print("Creating Data Set From " + label_file)
csv_dataset = get_dataset(label_file, BATCH_SIZE, NUM_EPOCHS, COLUMN_NAMES)
# make a new data set from our csv by mapping every value to the above function
split_dataset = csv_dataset.map(split_csv_to_path_and_labels)
# make a new data set that loads our images from the first path
image_and_labels_ds = split_dataset.map(load_and_preprocess_image_batch, num_parallel_calls=AUTOTUNE)
# update our image floating point range to match -1, 1
ds = image_and_labels_ds.map(change_range)
print(image_and_labels_ds)
model = build_model(LABEL_NAMES, use_masked_loss)
#split the final data set into train / validation splits to use for our model.
DATASET_SIZE = NUM_IMAGES
ds = ds.repeat()
steps_per_epoch = int(math.floor(DATASET_SIZE/BATCH_SIZE))
history = model.fit(ds, epochs=NUM_EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[TensorBoardColabCallback(tbc)])
print(history)
# results = model.evaluate(test_dataset)
# print('test loss, test acc:', results)
export_model(model, model_name, LABEL_NAMES, date)
ValueError: Error when checking model target:
the list of Numpy arrays that you are passing to your model is not the size the model expected.
Expected to see 2 array(s), but instead got the following list of 1 arrays:
[<tf.Tensor 'IteratorGetNext:1' shape=(?, 30) dtype=float32>]
If I instead use a Sequence and naively try to train my regression task against final output of mobile net (rather than the branch) - training works fine (although I get poor results).
My model summary appears to tell me things are wired as I expect. My dropout is connected to expanded_conv_depthwise_BN, my regression pooling is connected to my dropout, and my output layer appears in the summary connected to my regression pooling.
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224, 224, 3) 0
__________________________________________________________________________________________________
Conv1_pad (ZeroPadding2D) (None, 225, 225, 3) 0 input_1[0][0]
__________________________________________________________________________________________________
Conv1 (Conv2D) (None, 112, 112, 32) 864 Conv1_pad[0][0]
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization) (None, 112, 112, 32) 128 Conv1[0][0]
__________________________________________________________________________________________________
Conv1_relu (ReLU) (None, 112, 112, 32) 0 bn_Conv1[0][0]
__________________________________________________________________________________________________
expanded_conv_depthwise (Depthw (None, 112, 112, 32) 288 Conv1_relu[0][0]
__________________________________________________________________________________________________
expanded_conv_depthwise_BN (Bat (None, 112, 112, 32) 128 expanded_conv_depthwise[0][0]
__________________________________________________________________________________________________
expanded_conv_depthwise_relu (R (None, 112, 112, 32) 0 expanded_conv_depthwise_BN[0][0]
__________________________________________________________________________________________________
expanded_conv_project (Conv2D) (None, 112, 112, 16) 512 expanded_conv_depthwise_relu[0][0
__________________________________________________________________________________________
< snip for brevity >
________
block_16_project (Conv2D) (None, 7, 7, 320) 307200 block_16_depthwise_relu[0][0]
__________________________________________________________________________________________________
block_16_project_BN (BatchNorma (None, 7, 7, 320) 1280 block_16_project[0][0]
__________________________________________________________________________________________________
Conv_1 (Conv2D) (None, 7, 7, 1280) 409600 block_16_project_BN[0][0]
__________________________________________________________________________________________________
Conv_1_bn (BatchNormalization) (None, 7, 7, 1280) 5120 Conv_1[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, 112, 112, 32) 0 expanded_conv_depthwise_BN[0][0]
__________________________________________________________________________________________________
out_relu (ReLU) (None, 7, 7, 1280) 0 Conv_1_bn[0][0]
__________________________________________________________________________________________________
regression_pooling (GlobalAvera (None, 32) 0 dropout[0][0]
__________________________________________________________________________________________________
embedding_pooling (GlobalAverag (None, 1280) 0 out_relu[0][0]
__________________________________________________________________________________________________
cinemanet_output (Dense) (None, 30) 990 regression_pooling[0][0]
==================================================================================================
Total params: 2,258,974
Trainable params: 990
Non-trainable params: 2,257,984

It looks like you are setting things up correctly, but your training dataset doesn't include tensors for both outputs: the model has two outputs, while the dataset yields a single target tensor, hence "Expected to see 2 array(s)". If you only want to train the new output, you can provide dummy tensors (or even real training data) for the other one while using a loss weight of 0 to prevent the parameters from updating. That should also prevent any parameters that are not directly "upstream" of the new output layer from updating during training.
When compiling your model, use the argument loss_weights to pass the weights as either a list (e.g., loss_weights=[0, 1]) or a dictionary keyed by output layer name (e.g., loss_weights={'embedding_pooling': 0, 'cinemanet_output': 1}).
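The effect of the weights is easy to check by hand, because the total loss that Keras minimizes for a multi-output model is the weighted sum of the per-output losses. The sketch below shows only that arithmetic. The output names are the names of the two output layers in the question's summary, and the loss values are made up for illustration.

```python
def total_loss(losses, loss_weights):
    """Weighted sum of per-output losses, keyed by output name."""
    return sum(loss_weights[name] * value for name, value in losses.items())


# Made-up per-output loss values for one batch.
losses = {"embedding_pooling": 3.7, "cinemanet_output": 0.25}
# Weight 0 removes the embedding output from the objective entirely.
weights = {"embedding_pooling": 0.0, "cinemanet_output": 1.0}

combined = total_loss(losses, weights)
print(combined)
```

With a weight of 0 the dummy targets for the embedding output never influence the gradients, so their values do not matter as long as their shape matches that output.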

Serializing a keras model with an embedding layer

I've trained a model with pre-trained word embeddings like this:
embedding_matrix = np.zeros((vocab_size, 100))
for word, i in text_tokenizer.word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
embedding_layer = Embedding(vocab_size,
                            100,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=50,
                            trainable=False)
With the architecture looking like this:
sequence_input = Input(shape=(50,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
text_cnn = Conv1D(filters=5, kernel_size=5, padding='same', activation='relu')(embedded_sequences)
text_lstm = LSTM(500, return_sequences=True)(embedded_sequences)
char_in = Input(shape=(50, 18, ))
char_cnn = Conv1D(filters=5, kernel_size=5, padding='same', activation='relu')(char_in)
char_cnn = GaussianNoise(0.40)(char_cnn)
char_lstm = LSTM(500, return_sequences=True)(char_in)
merged = concatenate([char_lstm, text_lstm])
merged_d1 = Dense(800, activation='relu')(merged)
merged_d1 = Dropout(0.5)(merged_d1)
text_class = Dense(len(y_unique), activation='softmax')(merged_d1)
model = Model([sequence_input,char_in], text_class)
When I go to convert the model to json, I get this error:
ValueError: can only convert an array of size 1 to a Python scalar
Similarly, if I use the model.save() function, it seems to save correctly, but when I go to load it, I get TypeError: Expected Float32.
My question is: is there something I am missing when trying to serialize this model? Do I need some sort of Lambda layer or something of the sort?
Any help would be greatly appreciated!

You can use the weights argument of the Embedding layer to provide the initial weights, instead of passing the matrix through a Constant initializer:
embedding_layer = Embedding(vocab_size,
                            100,
                            weights=[embedding_matrix],
                            input_length=50,
                            trainable=False)
The weights should remain non-trainable after model saving/loading:
model.save('1.h5')
m = load_model('1.h5')
m.summary()
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_3 (InputLayer)            (None, 50)           0
__________________________________________________________________________________________________
input_4 (InputLayer)            (None, 50, 18)       0
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 50, 100)      1000000     input_3[0][0]
__________________________________________________________________________________________________
lstm_4 (LSTM)                   (None, 50, 500)      1038000     input_4[0][0]
__________________________________________________________________________________________________
lstm_3 (LSTM)                   (None, 50, 500)      1202000     embedding_1[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 50, 1000)     0           lstm_4[0][0]
                                                                 lstm_3[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 50, 800)      800800      concatenate_2[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 50, 800)      0           dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 50, 15)       12015       dropout_2[0][0]
==================================================================================================
Total params: 4,052,815
Trainable params: 3,052,815
Non-trainable params: 1,000,000
__________________________________________________________________________________________________

Make sure you are saving the model after compiling it, for example:
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
To save the model, you can do:
from keras.models import load_model
model.save('model.h5')  # full model in a single HDF5 file
model = load_model('model.h5')  # reload the file that was just saved
model_json = model.to_json()  # architecture only, as a JSON string
with open("model.json", "w") as json_file:
    json_file.write(model_json)
To load the model:
from keras.models import model_from_json
with open('model.json', 'r') as json_file:
    model_json = json_file.read()
model = model_from_json(model_json)  # rebuild the architecture
model.load_weights("model.h5")  # then restore the weights

I tried multiple methods. The problem is that when the model contains an embedding layer, pickle doesn't work and is not able to save the data.
So here is what you can do when you have layers like these:
## Creating model
embedding_vector_features=100
model=Sequential()
model.add(Embedding(voc_size,embedding_vector_features,input_length=sent_length))
model.add(LSTM(100))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()
Then you can save the model to a file with the .h5 extension and convert it to JSON (the reloaded model is called model2 here):
from tensorflow.keras.models import load_model
model.save('model.h5')
model = load_model('model.h5')
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
And this is how to load it:
from tensorflow.keras.models import model_from_json
json_file = open('model.json', 'r')
model_json = json_file.read()
model2 = model_from_json(model_json)
model2.load_weights("model.h5")
