I have two models that draw their input values from the same training dataset. I am trying to train the two models alternately with two optimizers that have different learning rates. Hence, while training one model, I have to freeze the weights of the other model and vice versa. However, the approach that I am using is taking too long to train and even gives an OOM error. However, when I simply train the models together, no such problem occurs.
The code snippet and the image of a sample model are attached below. However, the actual models have numerous layers and high dimensional input.
def convA(x):
conv1 = keras.layers.Conv2D(64, (3,3), strides=(1, 1), padding='valid', activation='relu', name = 'conv21')(x)
conv2 = keras.layers.Conv2D(16, (3,3), strides=(1, 1), padding='valid',activation='relu', name = 'conv22')(conv1)
return conv2
def convB(x):
conv1 = keras.layers.Conv2D(64, (3,3), strides=(1, 1), padding='valid', activation='relu', name = 'conv2a')(x)
conv2 = keras.layers.Conv2D(16, (3,3), strides=(1, 1), padding='valid',activation='relu', name = 'conv2b')(conv1)
return conv2
x = Input(shape=(11,11,32), name='input1')
convP = convA(x)
convQ = convB(x)
model1 = Model(x,convP)
model2 = Model(x,convQ)
multiply_layer = keras.layers.Multiply()([model1(x), model2(x)])
conv1_reshape = keras.layers.Reshape([7*7*16],name = 'fc_reshape')(multiply_layer)
fc = keras.layers.Dense(15, activation='softmax', name = 'fc1')(conv1_reshape)
model_main = Model(x,fc)
optim1 = keras.optimizers.SGD(0.0009, momentum=0.01, nesterov=True)
optim2 = keras.optimizers.SGD(0.00009, momentum=0.01, nesterov=True)
for epoch in range(250):
for batch in range(100):
x_batch = x_train[batch*16:(batch+1)*16,:,:,:]
y_batch = y_train[batch*16:(batch+1)*16,:]
model1.trainable = False
model2.trainable = True
model_main.compile(loss='categorical_crossentropy', optimizer=optim1, metrics=['accuracy'])
model_main.train_on_batch(x_batch, y_batch)
model1.trainable = True
model2.trainable = False
model_main.compile(loss='categorical_crossentropy', optimizer=optim2, metrics=['accuracy'])
model_main.train_on_batch(x_batch, y_batch)
Related
I am very much a beginner and currently trying to get started with CNNs. I wanted to try out lane detection.
Using the tusimple dataset, I have images of roads and create masks with the lanes. It looks like this:
I was following this blog where something similar is done. However, my results look nothing like in the blog. For simplicity reasons, I only use one image as dataset. This way, the network should very easily be able to detect the lane in this one image. However, this cnn basically just adds a red filter on the input image. The output looks somewhat like this:
Maybe you can point me into the right direction / tell me what I am doing wrong.
I posted the whole notebook here: https://colab.research.google.com/drive/1igOulIU-1HA-Ecf4diQTLXM-mrnAeFXz?usp=sharing
Or the most relevant code included:
def convolutional_block(inputs=None, n_filters=32, dropout_prob=0, max_pooling=True):
conv = Conv2D(n_filters,
kernel_size = 3,
activation='relu',
padding='same',
kernel_initializer=tf.keras.initializers.HeNormal())(inputs)
conv = Conv2D(n_filters,
kernel_size = 3,
activation='relu',
padding='same',
kernel_initializer=tf.keras.initializers.HeNormal())(conv)
if dropout_prob > 0:
conv = Dropout(dropout_prob)(conv)
if max_pooling:
next_layer = MaxPooling2D(pool_size=(2,2))(conv)
else:
next_layer = conv
#conv = BatchNormalization()(conv)
skip_connection = conv
return next_layer, skip_connection
def upsampling_block(expansive_input, contractive_input, n_filters=32):
up = Conv2DTranspose(
n_filters,
kernel_size = 3,
strides=(2,2),
padding='same')(expansive_input)
merge = concatenate([up, contractive_input], axis=3)
conv = Conv2D(n_filters,
kernel_size = 3,
activation='relu',
padding='same',
kernel_initializer=tf.keras.initializers.HeNormal())(merge)
conv = Conv2D(n_filters,
kernel_size = 3,
activation='relu',
padding='same',
kernel_initializer=tf.keras.initializers.HeNormal())(conv)
return conv
def unet_model(input_size=(720, 1280,3), n_filters=32, n_classes=3):
inputs = Input(input_size)
#contracting path
cblock1 = convolutional_block(inputs, n_filters)
cblock2 = convolutional_block(cblock1[0], 2*n_filters)
cblock3 = convolutional_block(cblock2[0], 4*n_filters)
cblock4 = convolutional_block(cblock3[0], 8*n_filters, dropout_prob=0.2)
cblock5 = convolutional_block(cblock4[0],16*n_filters, dropout_prob=0.2, max_pooling=None)
#expanding path
ublock6 = upsampling_block(cblock5[0], cblock4[1], 8 * n_filters)
ublock7 = upsampling_block(ublock6, cblock3[1], n_filters*4)
ublock8 = upsampling_block(ublock7,cblock2[1] , n_filters*2)
ublock9 = upsampling_block(ublock8,cblock1[1], n_filters)
conv9 = Conv2D(n_classes,
1,
activation='relu',
padding='same',
kernel_initializer='he_normal')(ublock9)
conv10 = Activation('softmax')(conv9)
model = tf.keras.Model(inputs=inputs, outputs=conv10)
return model
I'm Using tensorflow and keras to predict handwrting digits. For training I'm using nmist dataset.
the accuracy is about 98.8% after training. but sometimes in test its confuse between 4 and 9 , 7 and 3, i'm alerady optimize the image input with opencv, like remove noise, rescale, threshold etc.
What should i do next to improved this prdiction accuracy?
My plan is add more sample, and resize the sample image from 28x28 to 56x56.
Will this affect accuracy?
This my model for training:
epoc=15, batch size=64
def build_model():
model = Sequential()
# add Convolutional layers
model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu', padding='same', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
# Densely connected layers
model.add(Dense(128, activation='relu'))
# output layer
model.add(Dense(10, activation='softmax'))
# compile with adam optimizer & categorical_crossentropy loss function
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
return model
You can try to add regularization:
def conv2d_bn(x,
units,
kernel_size=(3, 3),
activation='relu',
dropout=.5):
y = Dropout(x)
y = Conv2D(units, kernel_size=kernel_size, use_bias=False)(y)
y = BatchNormalization(y)
y = Activation(activation)(y)
return y
def build_model(..., dropout=.5):
x = Input(shape=[...])
y = conv2d_bn(x, 32)
y = MaxPooling2D(y)
...
y = Dropout(dropout)(y)
y = Dense(10, activation='softmax')
model = Model(x, y)
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
return model
You can tweak the class weights to force the model to pay more attention to classes 3, 4, 7 and 9 during training:
model.fit(..., class_weights={0: 1, 1: 1, 2:1, 3:2, 4:2, 5:1, 6:1, 7:2, 8:1, 9:2})
If you have some time to burn, you can also try to grid or random-search the models hyperparameters. Something in the lines:
def build(conv_layers, dense_layers, dense_units, activation, dropout):
y = x = Input(shape=[...])
kernels = 32
kernel_size = (2, 2)
for i in range(conv_layers):
y = conv2d_bn(y, kernel_size, kernels, activation, dropout)
if i % 2 == 0: # or 3 or 4.
y = MaxPooling2D(y)
kernels *= 2
kernel_size = tuple(k+1 for k in kernel_size)
y = GlobalAveragePooling2D()(y)
for i in range(dense_layers):
y = Dropout(dropout)(y)
y = Dense(dense_units)(y)
y = Dense(10, activation='softmax')(y)
model = KerasClassifier(build_model,
epochs=epochs,
validation_split=validation_split,
verbose=0,
...)
params = dict(conv_layers=[2, 3, 4],
dense_layers=[0, 1],
activation=['relu', 'selu'],
dropout=[.2, .3, .5],
callbacks=[callbacks.EarlyStopping(patience=10,
restore_best_weights=True)])
grid = GridSearchCV(model, params,
scoring='balanced_accuracy_score',
verbose=2,
n_jobs=1)
Now, combining hyperparams searching with the NumpyArrayIterator is a little tricky, because the latter assumes we have all training samples (and targets) at hand before the training steps. It's still doable, though:
g = ImageDataGenerator(...)
cv = StratifiedKFold(n_splits=3)
results = dict(params=[], valid_score=[])
for params in ParameterGrid(params):
fold_scores = []
for t, v in cv.split(train_data, train_labels):
train = g.flow(train_data[t], train_labels[t], subset='training')
nn_valid = g.flow(train_data[t], train_labels[t], subset='validation')
fold_valid = g.flow(train_data[v], train_labels[v])
nn = build_model(**params)
nn.fit_generator(train, validation_data=nn_valid, ...)
probabilities = nn.predict_generator(fold_valid, steps=...)
p = np.argmax(probabilities, axis=1)
fold_scores += [metrics.accuracy_score(valid.classes_, p)]
results['params'] += [params]
results['valid_score'] += [fold_scores]
best_ix = np.argmax(np.mean(results['valid_score'], axis=1))
best_params = results['params'][best_ix]
nn = build_model(**best_params)
nn.fit_generator(...)
I use keras with tf as the backend.
The goal of the simulation is attempting to use geo-spatial time series dataset to build a classifier. The target Y is labeled on -1, 0, 1 and 2, where -1 indicates the measured data at that grid point, 0 is meaning the data at good quality, 1 is middle quality and 2 is the worst.
Right now, i have two inputs. I have some atmospheric surface variables, such as wind, wind speed, and rain as one input. And , oceanic surface variables, such as sea surface temperature, and sea surface salinity as second input. The dimensions of all the datasets should be in, for example, (n_samples, n_timesteps, n_variables, n_xpoints: longitude, n_ypoints: latitude). The target dataset is in 3D dimensions like this: (n_samples, n_xpoints: longitude, n_ypoints: latitude).
In addition, all of the input variables are normalized by their value range. For example, the sea surface current velocity is normalized in the rage of (-1,1) from (-2, 2) [m/s], and the surface wind speed is normalized in the rage of (-1,1) from (-20,20) [m/s].
The model configuration is designed as described below.
def cnn():
model = Sequential()
model.add( Conv2D(64, (3,3), activation='relu',
data_format='channels_first', kernel_initializer='he_normal',
name='conv1') )
model.add( MaxPooling2D(pool_size=(2, 2), strides = (2,2)))
model.add( BatchNormalization() )
model.add( Conv2D(32, (3,3), activation='relu',
kernel_initializer='he_normal', data_format='channels_first',
name='conv2') )
model.add( MaxPooling2D(pool_size=(2, 2), strides = (2,2)))
model.add( Dropout(0.2) )
model.add( BatchNormalization() )
model.add( Activation('relu') )
model.add( MaxPooling2D(pool_size=(2, 2), strides = (2,2)))
model.add( Flatten() )
model.add( Dense(128,activation='relu') )
return model
def cnn2lstm(Input_shape, premo, name):
branch_in = Input(shape=Input_shape, dtype='float32')
model = TimeDistributed(premo)(branch_in)
model = LSTM(256, return_sequences=True, name=name+'_lstm1')(model)
model = TimeDistributed(Dense(4096, activation='relu'))(model)
model = TimeDistributed(Dropout(0.3))(model)
model = LSTM(256, return_sequences = True, name=name+'_lstm2')(model)
model = Dense(101, activation='sigmoid')(model)
model = Dropout(0.3)(model)
return branch_in, model
atm_in, atm = cnn2lstm(Train_atm.shape[1:], cnn(),'atm')
ocn_in, ocn = cnn2lstm(Train_ocn.shape[1:], cnn(),'ocn')
#--- two inputs into one output
x = keras.layers.concatenate([atm,ocn],axis=1)
x = LSTM(150,return_sequences=True)(x)
x = Dropout(0.2)(x)
x = LSTM(200,return_sequences=True)(x)
x = Dropout(0.2)(x)
x = LSTM(500)(x)
x = Dense(1001,activation='relu')(x)
x = Dense(2001,activation='relu')(x)
x = Dense(2501,activation='tanh')(x)
x = Dense(2701,activation='relu')(x)
x = Dense(3355,activation='softmax')(x)
x = Reshape((61,55),input_shape=(3355,))(x)
model2 = Model(inputs=[atm_in, ocn_in, bio_in], outputs=x)
plot_model(model2, show_shapes = True, to_file='model_way4_2.png')
model2.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
filepath='ways4_model02_best.hdf5'
checkpoint = ModelCheckpoint(filepath,monitor='val_acc', verbose=1,save_best_only=True,mode='max')
callbacks_list = [checkpoint]
hist = model2.fit([Train_atm, Train_ocn, Train_bio], Train_Y,
epochs=150, batch_size=3, validation_split=0.1,
shuffle=True, callbacks=callbacks_list, verbose=0)
scores = model2.evaluate([Train_atm, Train_ocn, Train_bio], Train_Y)
print("MODEL 2 %s: %.2f%%" % (model2.metrics_names[1], scores[1]*100))
The evaluation scores here is mostly like 83% or higher. But the value of the output from model2.predict doesn't give me the valid range like my target dataset. In contrary, the model output give me the value from 0 to 1 (0,1) with a similar pattern as the target dataset shows.
Could anyone tell any big issue I have in my DL algorithm?
I want to classification for pictures with different input sizes. I would like to use the following paper ideas.
'Fully Convolutional Networks for Semantic Segmentation'
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf
I did change the dense layer to conv2D layer like this.
def FullyCNN(input_shape, n_classes):
inputs = Input(shape=(None, None, 1))
first_layer = Conv2D(filters=16, kernel_size=(12,16), strides=1, activation='relu', kernel_initializer='he_normal', name='conv1')(inputs)
first_layer = BatchNormalization()(first_layer)
first_layer = MaxPooling2D(pool_size=2)(first_layer)
second_layer = Conv2D(filters=24, kernel_size=(8,12), strides=1, activation='relu', kernel_initializer='he_normal', name='conv2')(first_layer)
second_layer = BatchNormalization()(second_layer)
second_layer = MaxPooling2D(pool_size=2)(second_layer)
third_layer = Conv2D(filters=32, kernel_size=(5,7), strides=1, activation='relu', kernel_initializer='he_normal', name='conv3')(first_layer)
third_layer = BatchNormalization()(third_layer)
third_layer = MaxPooling2D(pool_size=2)(third_layer)
fully_layer = Conv2D(64, kernel_size=8, activation='relu', kernel_initializer='he_normal')(third_layer)
fully_layer = BatchNormalization()(fully_layer)
fully_layer = Dropout(0.5)(fully_layer)
fully_layer = Conv2D(n_classes, kernel_size=1)(fully_layer)
output = Conv2DTranspose(n_classes, kernel_size=1, activation='softmax')(fully_layer)
model = Model(inputs=inputs, outputs=output)
return model
and I made generator for using fit_generator().
def data_generator(x_train, y_train):
while True:
index = np.asscalar(np.random.choice(len(x_train),1))
feature = np.expand_dims(x_train[index],-1)
feature = np.resize(feature,(-1,feature.shape))
feature = np.expand_dims(feature,0) # make (1,input_height,input_width,1)
label = y_train[index]
yield (feature,label)
and These are images about my data.
However, there is some problems about dimension.
Since the output layer must have 4 dimensions unlike the original CNN model, dimensions do not fit in the label.
Model summary:
Original CNN model summary:
How can I handle this problem? I tried to change dimension about label by expanding the dimension.
label = np.expand_dims(label,0)
label = np.expand_dims(label,0)
label = np.expand_dims(label,0)
I think there is a better way and I wonderd that is it necessary to have conv2DTranspose? And should batch size be 1?
In CNTK - how can I use several filter sizes on the same layer (e.g. filter sizes 2,3,4,5)?
Following the work done here (link to code in github below(1)), I want to take text, use an embedding layer, apply four different sizes of filters (2,3,4,5), concatenate the results and feed it to a fully connected layer.
Network architecture figure
Keras sample code:
main_input = Input(shape=(100,)
embedding = Embedding(output_dim=32, input_dim=100, input_length=100, dropout=0)(main_input)
conv1 = getconvmodel(2,256)(embedding)
conv2 = getconvmodel(3,256)(embedding)
conv3 = getconvmodel(4,256)(embedding)
conv4 = getconvmodel(5,256)(embedding)
merged = merge([conv1,conv2,conv3,conv4],mode="concat")
def getconvmodel(filter_length,nb_filter):
model = Sequential()
model.add(Convolution1D(nb_filter=nb_filter,
`enter code here`input_shape=(100,32),
filter_length=filter_length,
border_mode='same',
activation='relu',
subsample_length=1))
model.add(Lambda(sum_1d, output_shape=(nb_filter,)))
#model.add(BatchNormalization(mode=0))
model.add(Dropout(0.5))
return model
(1): /joshsaxe/eXposeDeepNeuralNetwork/blob/master/src/modeling/models.py
You can do something like this:
import cntk as C
import cntk.layers as cl
def getconvmodel(filter_length,nb_filter):
#Function
def model(x):
f = cl.Convolution(filter_length, nb_filter, activation=C.relu))(x)
f = C.reduce_sum(f, axis=0)
f = cl.Dropout(0.5) (f)
return model
main_input = C.input_variable(100)
embedding = cl.Embedding(32)(main_input)
conv1 = getconvmodel(2,256)(embedding)
conv2 = getconvmodel(3,256)(embedding)
conv3 = getconvmodel(4,256)(embedding)
conv4 = getconvmodel(5,256)(embedding)
merged = C.splice([conv1,conv2,conv3,conv4])
Or with Sequential() and a lambda:
def getconvmodel(filter_length,nb_filter):
return Sequential([
cl.Convolution(filter_length, nb_filter, activation=C.relu)),
lambda f: C.reduce_sum(f, axis=0),
cl.Dropout()
])