TensorFlow / TensorBoard: I am running a hyperparameter sweep with the following setup:
HP_NUM_NODES_ONE = hp.HParam('nodes_one', hp.Discrete([128]))
HP_NUM_NODES_TWO = hp.HParam('nodes_two', hp.Discrete([64, 128, 256]))
HP_NUM_NODES_THR = hp.HParam('nodes_thr', hp.Discrete([64, 128, 256]))
HP_NUM_FILT = hp.HParam('num_filter', hp.Discrete([64, 128, 256]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.3))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd', 'RMSprop']))
MSE = 'mean_squared_error'
with tf.summary.create_file_writer('logs/hparam_21-6').as_default():
    hp.hparams_config(
        hparams=[HP_NUM_NODES_ONE, HP_NUM_NODES_TWO, HP_NUM_NODES_THR,
                 HP_NUM_FILT, HP_DROPOUT, HP_OPTIMIZER],
        metrics=[hp.Metric(MSE, display_name='Mean Squared Error')],
    )
with this model:
def train_test_model(hparams):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv1D(filters=hparams[HP_NUM_FILT], kernel_size=6, strides=1,
                               activation='relu', input_shape=(300, 4), use_bias=True),
        tf.keras.layers.MaxPooling1D(pool_size=100),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hparams[HP_NUM_NODES_ONE], activation=tf.nn.relu),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
        tf.keras.layers.Dense(hparams[HP_NUM_NODES_TWO], activation=tf.nn.relu),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
        tf.keras.layers.Dense(hparams[HP_NUM_NODES_THR], activation="linear"),
    ])
    model.compile(
        optimizer=hparams[HP_OPTIMIZER],
        loss='mean_squared_error',
        metrics=['mean_squared_error'],
    )
    model.fit(feature3, label2[0,], epochs=500)
    _, mean_squared_error = model.evaluate(x_test, y_test[0,])
    return mean_squared_error
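For reference, a minimal sketch of how each trial is typically logged so that every hyperparameter value (including the optimizer) is recorded per run, following the standard TensorBoard hparams pattern (the run_dir naming is illustrative):

def run(run_dir, hparams):
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record this trial's hyperparameter values
        mse = train_test_model(hparams)
        tf.summary.scalar(MSE, mse, step=1)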
It all seems to run fine and I do get outputs; however, the outputs in TensorBoard do not show the results for the different values of the optimizer, which I also want. Does it need to be treated/coded differently? Happy to supply more of the code if this is not clear. Thanks, J
I found the problem - as simple as an unticked box in the TensorBoard results! The standard results returned hide the optimizer column; however, if I simply tick the optimizer box, the results appear.
I have no idea why TensorBoard would, by default, turn this off.
I am trying to use a CNN to classify cats/dogs and noticed something strange.
When I define the model compile statement as below:
cat_dog_model.compile(optimizer=optimizers.Adam(),
                      metrics=[metrics.Accuracy()], loss=losses.binary_crossentropy)
my accuracy is very bad - something like 0.15% after 25 epochs.
When I define the same as
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
my accuracy shoots up to 55% in the first epoch and almost 80% by epoch 25.
When I read the Keras docs - https://keras.io/api/optimizers/ - they mention explicitly that
You can either instantiate an optimizer before passing it to model.compile(), as in the above example, or you can pass it by its string identifier. In the latter case, the default parameters for the optimizer will be used.
The metrics parameter is also used as per the API - the Keras Metrics API.
So, as per my understanding, I am using default parameters in both cases. Also, when I hard-code the metric as a string I get the same (good) accuracy, so somehow the metrics argument is causing this issue. But I can't figure out why - any help is appreciated.
My question is: why is hard-coding the metric better than passing it as a parameter?
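For what it's worth, the two spellings do not necessarily select the same metric class. A minimal sketch of how they can differ (assuming the standard tf.keras metric classes; the values are made up for the example):

import tensorflow as tf

y_true = tf.constant([[1.0], [0.0], [1.0]])
y_pred = tf.constant([[0.9], [0.2], [0.7]])  # raw sigmoid outputs

exact = tf.keras.metrics.Accuracy()  # counts predictions that EQUAL the label
exact.update_state(y_true, y_pred)
print(exact.result().numpy())  # 0.0 - no float probability equals 0 or 1 exactly

thresh = tf.keras.metrics.BinaryAccuracy()  # what 'accuracy' resolves to with a binary loss
thresh.update_state(y_true, y_pred)
print(thresh.result().numpy())  # 1.0 - probabilities are thresholded at 0.5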
Some more details: I am using about 8k images for training and about 2k images for validation.
Sample code (you can change the compile line to reproduce the different results):
from keras import models, layers, losses, metrics, optimizers
import numpy as np
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
train_set = train_datagen.flow_from_directory('/content/drive/MyDrive/....../training_set/',
                                              target_size=(64, 64), batch_size=32, class_mode='binary')
test_datagen = ImageDataGenerator(rescale=1./255)
test_set = test_datagen.flow_from_directory(
    '/content/drive/MyDrive/........./test_set/',
    target_size=(64, 64), batch_size=32, class_mode='binary')
cat_dog_model = models.Sequential()
cat_dog_model.add(layers.Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=[64, 64, 3]))
cat_dog_model.add(layers.MaxPool2D(pool_size=2, strides=2))
cat_dog_model.add(layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cat_dog_model.add(layers.MaxPool2D(pool_size=2, strides=2) )
cat_dog_model.add(layers.Flatten())
cat_dog_model.add(layers.Dense(units=128, activation='relu'))
cat_dog_model.add(layers.Dense(units=1, activation='sigmoid'))
cat_dog_model.compile(optimizer=optimizers.Adam(), metrics=[metrics.Accuracy()], loss=losses.binary_crossentropy)
cat_dog_model.summary()
cat_dog_model.fit(x=train_set,validation_data=test_set, epochs=25)
As homework, we were given train, validation, and test data to build a CNN1D and compare its results against another model for our exam marks.
I tried the model below; however, I'm getting 84.18% accuracy vs. 84.58% for the competitor model. My classmates started from the same model and managed to improve it to 85.20% accuracy. I am only allowed to change the hyperparameters or to add/modify/delete some layers after the fusion = concate() step.
Can anyone please help me improve this?
import tensorflow as tf
from tensorflow.keras.layers import (Conv1D, Dropout, GlobalAveragePooling1D,
                                     Dense, BatchNormalization)

def CNN1D():
    n_filters = 256
    dropout_rate = 0.4
    conv1 = Conv1D(filters=n_filters, kernel_size=3, padding='valid', name="conv1_", activation="relu")
    Dropout1 = Dropout(rate=dropout_rate, name="dropOut1_")
    conv2 = Conv1D(filters=n_filters, kernel_size=3, padding='valid', name="conv2_", activation="relu")
    Dropout2 = Dropout(rate=dropout_rate, name="dropOut2_")
    conv3 = Conv1D(filters=n_filters*2, kernel_size=3, padding='valid', name="conv3_", activation="relu")
    Dropout3 = Dropout(rate=dropout_rate, name="dropOut3_")
    conv4 = Conv1D(filters=n_filters*2, kernel_size=1, padding='valid', name="conv4_", activation="relu")
    Dropout4 = Dropout(rate=dropout_rate, name="dropOut4_")
    globPool = GlobalAveragePooling1D()

def TwoBranchModel():
    num_units = 256
    branch1 = CNN1D()
    branch2 = CNN1D()
    fusion = concate()  # as given in the assignment; presumably a Concatenate() fusion
    out = tf.keras.Sequential([
        Dense(num_units, activation='relu'),
        BatchNormalization(),
        Dense(n_classes, activation='softmax')
    ])
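Not part of the assignment code, but to make the fragment concrete, here is one way the pieces could be wired up with the functional API (the input shapes and n_classes=3 are illustrative assumptions, not from the original):

import tensorflow as tf
from tensorflow.keras.layers import (Input, Conv1D, Dropout, GlobalAveragePooling1D,
                                     Concatenate, Dense, BatchNormalization)
from tensorflow.keras.models import Model

def cnn1d_branch(x, n_filters=256, dropout_rate=0.4, prefix="b1_"):
    # mirrors the Conv1D/Dropout stack from CNN1D above
    x = Conv1D(n_filters, 3, padding='valid', activation='relu', name=prefix + "conv1")(x)
    x = Dropout(dropout_rate, name=prefix + "drop1")(x)
    x = Conv1D(n_filters, 3, padding='valid', activation='relu', name=prefix + "conv2")(x)
    x = Dropout(dropout_rate, name=prefix + "drop2")(x)
    x = Conv1D(n_filters * 2, 3, padding='valid', activation='relu', name=prefix + "conv3")(x)
    x = Dropout(dropout_rate, name=prefix + "drop3")(x)
    x = Conv1D(n_filters * 2, 1, padding='valid', activation='relu', name=prefix + "conv4")(x)
    x = Dropout(dropout_rate, name=prefix + "drop4")(x)
    return GlobalAveragePooling1D()(x)

in1 = Input(shape=(128, 8))  # illustrative input shape
in2 = Input(shape=(128, 8))
fusion = Concatenate()([cnn1d_branch(in1, prefix="b1_"), cnn1d_branch(in2, prefix="b2_")])
h = Dense(256, activation='relu')(fusion)
h = BatchNormalization()(h)
out = Dense(3, activation='softmax')(h)  # n_classes assumed to be 3
model = Model([in1, in2], out)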
I would suggest you try playing with the following:
- decrease the dropout percentage
- play with the BatchNormalization hyperparameters (see https://keras.io/api/layers/normalization_layers/batch_normalization/ and adjust the momentum); also try removing the BN layer and compare the accuracy
I am not sure if you can change the items below (given your constraint):
- change the GlobalAveragePooling1D to max pooling
- your filter count is constant; it is a good idea to increase the number of filters as you go deeper, for example starting with 32, then 64, 128 and so on (see the sketch after this list)
- remove some dropout layers
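A rough sketch of that filter progression (purely illustrative, reusing the naming from the question's code; it also folds in the lower dropout and fewer dropout layers suggested above):

from tensorflow.keras.layers import Conv1D, Dropout

def cnn1d_growing_filters(dropout_rate=0.3):  # lower dropout than the original 0.4
    layers = []
    for i, n_filters in enumerate([32, 64, 128, 256], start=1):  # 32 -> 64 -> 128 -> ...
        layers.append(Conv1D(filters=n_filters, kernel_size=3, padding='valid',
                             activation='relu', name="conv{}_".format(i)))
        if i % 2 == 0:  # keep only every other dropout layer
            layers.append(Dropout(rate=dropout_rate, name="dropOut{}_".format(i)))
    return layers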
I created a simple CNN to detect custom digits and I am trying to visualize the activations of my layers. When I run the following code layer_outputs = [layer.output for layer in model.layers[:9]] I get the error Layer conv2d has no inbound nodes.
When I searched online, the advice was to define the input shape of the first layer, but I've already done that and I'm not sure why this is happening. Below is my model.
class myModel(Model):
    def __init__(self):
        super().__init__()
        self.conv1 = Conv2D(filters=32, kernel_size=(3,3), activation='relu', padding='same',
                            input_shape=(image_height, image_width, num_channels))
        self.maxPool1 = MaxPool2D(pool_size=(2,2))
        self.conv2 = Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same')
        self.maxPool2 = MaxPool2D(pool_size=(2,2))
        self.conv3 = Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same')
        self.maxPool3 = MaxPool2D(pool_size=(2,2))
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.maxPool1(x)
        x = self.conv2(x)
        x = self.maxPool2(x)
        x = self.conv3(x)
        x = self.maxPool3(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.d2(x)
        return x
Based on your stated goal and what you've posted, I believe the problem here is slightly (and very understandably) misunderstanding the way the TensorFlow APIs work. The model object and its constituent parts only store state for the model, not the evaluation of it: for example, the hyperparameters you've set and the parameters the model learns when it's fed training data. Even if you fixed the problem with what you're trying, the .output of the layer objects wouldn't return the activations you want to visualize; it instead returns the part of the TensorFlow graph that represents that part of the computation.
For what you want to do, you'll need to manipulate an object that is the result of calling .predict on the model you've set up and trained. Or you could drop below the Keras abstractions and manipulate the tensors directly.
If I gave this more thought, there's probably a reasonably elegant way to get this by evaluating your graph (i.e., calling .predict) only once, but the most obvious naïve way is simply to instantiate several new models (or several subclasses of your model), each with a layer of interest as the terminal output, which should get you what you want.
For example, you could do something like this for each of the layers whose outputs you're interested in:
my_test_image = # get an image
# reuse the model's own input so the graph stays connected; pick the layer of interest as the output
outputs_of_interest = Model(inputs=my_model.input, outputs=my_model.layers[-2].output)
outputs_of_interest.predict(my_test_image)  # <=== this has the output you want
My issue:
I am trying to train a semantic segmentation model in tf.keras. It works very well when I use channels_last (HWC) mode (it reaches 96%+ validation accuracy). I wanted to train it in channels_first (CHW) mode so the weights are compatible with TensorRT. When I do this, the ~80% training accuracy of the first few epochs dips down to around 0.020% and stays there permanently.
It is useful to know that the base of my model is a tf.keras.applications.MobileNet() model with the pre-trained 'imagenet' weights. (Model architecture at the bottom.)
The transformation process:
I used the guidelines provided and changed only a few things here:
Set tf.keras.backend.set_image_data_format() to 'channels_first'.
I changed the channel order in the input tensor from input_tensor=Input(shape=(376, 672, 3)) to input_tensor=Input(shape=(3, 376, 672)).
In my image preprocessing (using tf.data.Dataset), I use tf.transpose(img, perm=[2, 0, 1]) on both my input image and the one-hot encoded mask to change the channel order (a sketch of this step follows below). I checked this with an equality assertion to make sure it's correct and it seems to be fine.
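For reference, a minimal sketch of that preprocessing step (assuming img and mask arrive as HWC tensors in a tf.data pipeline; the function name and dummy shapes are illustrative):

import tensorflow as tf

def to_channels_first(img, mask):
    img = tf.transpose(img, perm=[2, 0, 1])    # (H, W, C) -> (C, H, W)
    mask = tf.transpose(mask, perm=[2, 0, 1])  # same reordering for the one-hot mask
    return img, mask

dataset = tf.data.Dataset.from_tensor_slices(
    (tf.zeros([4, 376, 672, 3]), tf.zeros([4, 376, 672, 3])))  # dummy HWC image/mask pairs
dataset = dataset.map(to_channels_first)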
When I change these, the training starts fine, but as I said, the training accuracy goes down to almost zero. When I revert, everything is fine again.
Possible leads:
What am I doing wrong, or what could be the problematic part here? My suspicions are around these questions:
Are the pre-trained ImageNet weights also converted to the 'channels_first' order when I set the backend? Is this something I should consider at all?
Could it be that the tf.transpose() function messes up the mask's one-hot encoding? (I have 3 classes represented by 3 colors: lane, opposing lane, background.)
Maybe I am not seeing something obvious. I can provide further code and answers as needed.
EDIT:
08/17: This is still an ongoing issue; I have tried several things:
I checked whether the image and the mask are correct after the transpose with numpy assertions; they seem correct.
I suspected that the loss function calculates along the wrong axis, so I customized the loss function for the first axis (where the channels are). Here it is:
def ReverseAxisLoss(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred, from_logits=True, axis=1)
My main suspicion is that the 'channels_first' backend setting does nothing to transpose the pretrained 'imagenet' weights for the MobileNet part. Is there an updated way in TF2.x / Keras to transpose the pre-trained weights into CHW format?
Here is the architecture that I use (skipNet() is the head network and MobileNet is the base; they are connected in the create_model() function):
def skipNet(encoder_output, feed1, feed2, classes):
    # random initializer and regularizer
    stddev = 0.01
    init = RandomNormal(stddev=stddev)
    weight_decay = 1e-3
    reg = l2(weight_decay)

    score_feed2 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
                         kernel_initializer=init, kernel_regularizer=reg)(feed2)
    score_feed2_bn = BatchNormalization()(score_feed2)
    score_feed1 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
                         kernel_initializer=init, kernel_regularizer=reg)(feed1)
    score_feed1_bn = BatchNormalization()(score_feed1)

    upscore2 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg)(encoder_output)
    height_pad1 = ZeroPadding2D(padding=((1,0),(0,0)))(upscore2)
    upscore2_bn = BatchNormalization()(height_pad1)

    fuse_feed1 = add([score_feed1_bn, upscore2_bn])

    upscore4 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg)(fuse_feed1)
    height_pad2 = ZeroPadding2D(padding=((0,1),(0,0)))(upscore4)
    upscore4_bn = BatchNormalization()(height_pad2)

    fuse_feed2 = add([score_feed2_bn, upscore4_bn])

    upscore8 = Conv2DTranspose(kernel_size=(16, 16), filters=classes, strides=(8, 8),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg, activation="softmax")(fuse_feed2)

    return upscore8
def create_model(classes):
    base_model = tf.keras.applications.MobileNet(input_tensor=Input(shape=IMG_SHAPE),
                                                 include_top=False,
                                                 weights='imagenet')

    conv4_2_output = base_model.get_layer(index=43).output
    conv3_2_output = base_model.get_layer(index=30).output
    conv_score_output = base_model.output

    head_model = skipNet(conv_score_output, conv4_2_output, conv3_2_output, classes)

    for layer in base_model.layers:
        layer.trainable = False

    model = Model(inputs=base_model.input, outputs=head_model)
    return model
I am now working on building a stereo matching network using Keras with TensorFlow as the backend. The network has the following structure:
After training the whole network, I need to test it. However, the training phase and the testing phase are quite different, so I have to split the model into two parts. The first part is the CNN + Concatenate block, which only needs to be run once, while the fully-connected part (actually I modify it to a fully-convolutional form when testing) needs to be run d times with slightly different inputs, where d varies from 100 to 228.
The first part network code:
# input image dimensions
img_rows, img_cols = X1.shape[0], X1.shape[1]
input_shape = (img_rows, img_cols, 1)
X1 = X1.reshape(1, img_rows, img_cols, 1)
X2 = X2.reshape(1, img_rows, img_cols, 1)
# number of conv filters to use
nb_filters = 112
# CNN kernel size
kernel_size = (3,3)
left_branch = Sequential()
left_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same', input_shape=input_shape))
left_branch.add(Activation('relu'))
left_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
left_branch.add(Activation('relu'))
left_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
left_branch.add(Activation('relu'))
left_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
left_branch.add(Activation('relu'))
right_branch = Sequential()
right_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same', input_shape=input_shape))
right_branch.add(Activation('relu'))
right_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
right_branch.add(Activation('relu'))
right_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
right_branch.add(Activation('relu'))
right_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
right_branch.add(Activation('relu'))
merged = Merge([left_branch, right_branch], mode='concat')
cnn = Sequential()
cnn.add(merged)
I load the weights obtained from the training phase into the first part of the network and try to get a prediction from it.
import h5py

def load_cnn_weights(filepath):
    f = h5py.File(filepath, mode='r')
    weights = []
    for i in range(1, 9):
        weights.append(f['model_weights/conv2d_{}/conv2d_{}/kernel:0'.format(i, i)][()])
        weights.append(f['model_weights/conv2d_{}/conv2d_{}/bias:0'.format(i, i)][()])
    f.close()
    return weights
weights = load_cnn_weights("/home/users/shixin.li/segment/Lecun_stereo_rebuild/weights.hdf5")
cnn.set_weights(weights)
output_cnn = cnn.predict([X1, X2])
I have already checked that the weights are read successfully and fit into the network (verified by calling get_weights()). X1 and X2 are not zero; they are normalized grayscale image matrices. I even tried compiling the network before predict. But the result output_cnn is still all zeros.
I haven't seen anyone else with this problem and I have been stuck for two days. The part that really confuses me is that the inputs and weights are all non-zero, so why is the result zero? If you could help, I would really appreciate it!
You might want to try using tfdbg to find out exactly what the inputs to the op with all-zero outputs are, to try to understand what is going on.
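For example, a sketch of hooking tfdbg into the Keras session (this assumes the TF 1.x-era CLI debug wrapper, which matches the old-style Keras code above):

import tensorflow as tf
import keras.backend as K
from tensorflow.python import debug as tf_debug

# wrap the session so every run drops into the tfdbg CLI
sess = tf_debug.LocalCLIDebugWrapperSession(tf.Session())
K.set_session(sess)

output_cnn = cnn.predict([X1, X2])  # now you can step through and inspect intermediate tensors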