I'd like to use an LSTM network in TensorFlow to implement a difference equation. I searched the internet but didn't find anything on this topic.
The equation is:
y(k) = K * (b[0]·x(k) + b[1]·x(k-1) + b[2]·x(k-2)) - a[1]·y(k-1) - a[2]·y(k-2)
in which b = [1, 2, 1] and a = [1, -1.6641, 0.8387].
My aim is to use a neural network to learn the mapping between input and output. Since the output at instant k depends on the previous inputs and outputs as well, my idea is to implement an LSTM network (many-to-one structure).
Suppose we have an input vector of 500 samples and use a window size of 5; the input of the LSTM network is then a tensor of shape (500, 5, 1), while the output is (500, 1, 1).
The input/output pairs of the first iteration are:
[x(k-4), x(k-3), x(k-2), x(k-1), x(k)] -> y(k)
and of the second iteration:
[x(k-3), x(k-2), x(k-1), x(k), x(k+1)] -> y(k+1)
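For illustration, here is a minimal NumPy sketch of the intended windowing (separate from my actual preprocessing code below):
import numpy as np

# Toy illustration of the windowing scheme above: each row of X holds
# the last `window` input samples used to predict one target y(k).
x = np.arange(10, dtype=np.float32)                      # toy input signal
window = 5
X = np.lib.stride_tricks.sliding_window_view(x, window)  # shape (6, 5)
X = X[..., np.newaxis]                                   # shape (6, 5, 1) for the LSTM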
So I used an LSTM network with stateful set to True to allow the network to remember past states, but it doesn't converge.
It seems to me that the idea is correct, but I cannot see where I am going wrong. Could someone help me find the problem? I paste the code below; the network is developed in TensorFlow.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from scipy import signal
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop

# Difference equation
K = 0.0436
b = np.array([1, 2, 1])
a = np.array([1, -1.6641, 0.8387])
x = np.random.uniform(0, 1, 100)
y = K*(signal.lfilter(b, a, x))
# Generate Dataset
X_train = np.random.uniform(0, 1, 100)
y_train = K*(signal.lfilter(b,a,X_train))
X_val = np.ones(100)
y_val = K*(signal.lfilter(b,a,X_val))
X_test = np.random.uniform(0.5, 0.8, 100)
y_test = K*(signal.lfilter(b,a,X_test))
def get_x_split(data, windows_size):
    """ Return sliding window dataset. """
    x_temp = np.zeros([1, windows_size-1])
    x = np.array([])
    for i in range(0, len(data)):
        x_temp = np.append(x_temp[-windows_size+1:], data[i]).T
        x = np.append(x, x_temp, axis=0)
    x = np.reshape(x, (int(len(x)/windows_size), windows_size))
    return x
windows_size = 10
X_train = get_x_split(X_train, windows_size)
X_val = get_x_split(X_val, windows_size)
X_test = get_x_split(X_test, windows_size)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_val = np.reshape(X_val, (X_val.shape[0], X_val.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
# Model Definition
activation_function = 'tanh'
def build_model():
    input_layer = Input(shape=(X_train.shape[1], 1), batch_size=1)
    HL_1 = LSTM(1, activation=activation_function, return_sequences=True, stateful=True)(input_layer)
    HL_2 = LSTM(1, activation=activation_function, return_sequences=False, stateful=True)(HL_1)
    output_layer = Dense(1, activation='relu', name='Output')(HL_2)
    model = Model(inputs=input_layer, outputs=output_layer)
    return model
model = build_model()
model.compile(optimizer=RMSprop(),
              loss={'Output': 'mse'},
              metrics={'Output': tf.keras.metrics.RootMeanSquaredError()})
# Training
history = model.fit(x=X_train,
                    y=y_train,
                    batch_size=1,
                    validation_data=(X_val, y_val),
                    epochs=5000,
                    verbose=1,
                    shuffle=False)
# Test
y_pred = model.predict(X_test)
pred_samples = 400
plt.figure(dpi=1200)
plt.plot(y_test[300:pred_samples,3,0], label='true', linewidth=0.8, alpha=0.5)
plt.plot(y_pred[300:pred_samples,3,0], label='pred')
plt.legend()
plt.grid()
plt.title("Test")
plt.show()
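For reference, the target y can also be computed directly from the recursion above; this sanity check (mine, not part of the model) matches signal.lfilter:
def difference_equation(x, b, a, K):
    # Direct recursion equivalent to K * signal.lfilter(b, a, x)
    w = np.zeros_like(x)
    for k in range(len(x)):
        w[k] = b[0] * x[k]
        if k >= 1:
            w[k] += b[1] * x[k-1] - a[1] * w[k-1]
        if k >= 2:
            w[k] += b[2] * x[k-2] - a[2] * w[k-2]
    return K * w

assert np.allclose(difference_equation(x, b, a, K), K * signal.lfilter(b, a, x))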
I've got a working CNN model that classifies images from a custom dataset loaded via a CSV file. The dataset is shuffled and then split into training, validation and test sets. Now I want to extend the image input with four extra inputs containing metadata about the images.
I've already learnt that I should split my CNN model into two branches, one for the images and one for the extra inputs. My question is: how must I modify my data input so that the model can correctly process both the images and the additional inputs?
I'm very new to creating neural networks in TensorFlow. My entire code is basically from this website; however, none of the topics I found could solve the problem in my code.
This is my code: (additional metadata are called usages, completions, heights, constructions)
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers
from keras.callbacks import History
import matplotlib.pyplot as plt
import sklearn.metrics
from sklearn.metrics import confusion_matrix
import seaborn as sns
import io
# READ IMAGES, METADATA AND LABELS
df = pd.read_csv('dataset.csv')
df = df.sample(frac=1)
file_paths = df['file_name'].values
labels = df['label'].values
usages = df['usage'].values
completions = df['completion'].values
heights = df['height'].values
constructions = df['construction'].values
# SPLITTING THE DATASET INTO 80 % TRAINING DATA, 10 % VALIDATION DATA, 10 % TEST DATA
dataset_size = len(df.index)
train_size = int(0.8 * dataset_size)
val_size = int(0.1 * dataset_size)
test_size = int(0.1 * dataset_size)
img_height = 350
img_width = 350
batch_size = 16
autotune = tf.data.experimental.AUTOTUNE
# FUNCTION TO READ AND NORMALIZE THE IMAGES
def read_image(image_file, label, usg, com, hei, con):
    image = tf.io.read_file(image_file)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, (img_width, img_height))
    return tf.cast(image, tf.float32) / 255.0, label, \
           tf.cast(usg, tf.float32), tf.cast(com, tf.float32), \
           tf.cast(hei, tf.float32), tf.cast(con, tf.float32)
# FUNCTION FOR DATA AUGMENTATION
def augment(image, label, usg, com, hei, con):
    if tf.random.uniform((), minval=0, maxval=1) < 0.1:
        image = tf.tile(tf.image.rgb_to_grayscale(image), [1, 1, 3])
    image = tf.image.random_brightness(image, max_delta=0.25)
    image = tf.image.random_contrast(image, lower=0.75, upper=1.25)
    image = tf.image.random_saturation(image, lower=0.75, upper=1.25)
    image = tf.image.random_flip_left_right(image)
    return image, label, usg, com, hei, con
# SETUP FOR TRAINING, VALIDATION & TEST DATASET
ds_train = ds_train.map(read_image, num_parallel_calls=autotune)
ds_train = ds_train.cache()
ds_train = ds_train.map(augment, num_parallel_calls=autotune)
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(autotune)
ds_val = ds_val.map(read_image, num_parallel_calls=autotune)
ds_val = ds_val.batch(batch_size)
ds_val = ds_val.prefetch(autotune)
ds_test = ds_test.map(read_image, num_parallel_calls=autotune)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.prefetch(autotune)
## HOW TO SPLIT UP THE DATASET FOR THE MODEL FROM HERE? ##
# DEFINING FUNCTIONAL MODEL
input_img = keras.Input(shape=(img_width, img_height, 3))
input_dat = keras.Input(shape=(4,)) # how is this shape supposed to be?
x = layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.02), padding='same')(input_img)
x = layers.BatchNormalization(momentum=0.9)(x)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.02), padding='same')(x)
x = layers.BatchNormalization(momentum=0.9)(x)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.02), padding='same')(x)
x = layers.BatchNormalization(momentum=0.9)(x)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(128, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.02), padding='same')(x)
x = layers.BatchNormalization(momentum=0.9)(x)
x = layers.MaxPooling2D()(x)
out1 = layers.Flatten()(x)
out2 = layers.Dense(128, activation='relu')(input_dat)
merge = layers.concatenate([out1, out2])
x = layers.Dense(256, activation='relu')(merge)
x = layers.Dropout(0.35)(x)
output = layers.Dense(8, activation='sigmoid')(x)
model = keras.Model(inputs=[input_img, input_dat], outputs=output)
history = History()
no_overfit = keras.callbacks.EarlyStopping(monitor='val_loss',  # stop training when overfitting occurs
                                           min_delta=0.015, patience=1,
                                           verbose=2, mode='auto')
# TRAINING STEP
model.compile(
    optimizer=keras.optimizers.Adam(3e-5),
    loss=[keras.losses.SparseCategoricalCrossentropy()],
    metrics=["accuracy"])

model.fit(ds_train, epochs=30, callbacks=[no_overfit, history],
          verbose=1, validation_data=ds_val)
So far I've only added the extra inputs to the dataset tensors and changed the model structure. How exactly do I split my dataset into input_img and input_dat so that each model branch receives its proper input? My best guess is sketched below.
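My guess (untested): restructure the tuples in a map call, before batching, so that the dataset yields ((image, metadata), label) pairs matching the two Input layers:
def pack_inputs(image, label, usg, com, hei, con):
    # Untested: pack the four metadata scalars into one 4-vector and
    # group the tensors as ((image, metadata), label), the structure
    # a two-input keras.Model expects.
    metadata = tf.stack([usg, com, hei, con], axis=-1)  # shape (4,)
    return (image, metadata), label

ds_train = ds_train.map(pack_inputs, num_parallel_calls=autotune)
ds_val = ds_val.map(pack_inputs, num_parallel_calls=autotune)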
Also, I have a custom test step in order to plot a confusion matrix. How is this supposed to be modified? Here is the working code for just the image input:
y_true = []
y_pred = []

for x, y in ds_test:
    y_true.append(y)
    predicts = model.predict(x)  # compute model predictions for test step
    y_pred.append(np.argmax(predicts, axis=-1))

true = tf.concat([item for item in y_true], axis=0)
pred = tf.concat([item for item in y_pred], axis=0)

cm = confusion_matrix(true, pred)  # confusion matrix from sklearn
testacc = np.trace(cm) / float(np.sum(cm))  # calculating test accuracy
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
fig, ax = plt.subplots(figsize=(10, 10))
color = sns.light_palette("seagreen", as_cmap=False)
sns.heatmap(cm, annot=True, square=True, cmap=color, fmt=".3f",
            linewidths=0.6, linecolor='k', cbar_kws={"shrink": 0.8})
plt.yticks(rotation=0)
plt.xlabel('\nPredicted Labels', fontsize=18)
plt.ylabel('True Labels\n', fontsize=18)
plt.title('Multiclass Model - Confusion Matrix (Test Step)\n', fontsize=24)
plt.text(10, 1.1, 'Accuracy = {:0.4f}'.format(testacc), fontsize=20)
ax.axhline(y=8, color='k', linewidth=1.5) # depending on amount of classes
ax.axvline(x=8, color='k', linewidth=1.5)
plt.show()
print('\naccuracy: {:0.4f}'.format(testacc))
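My guess for the modified loop (untested), since each ds_test element would then be ((image, metadata), label):
y_true, y_pred = [], []
for (x_img, x_dat), y in ds_test:  # unpack the two-input structure
    y_true.append(y)
    predicts = model.predict((x_img, x_dat))
    y_pred.append(np.argmax(predicts, axis=-1))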
Any help is greatly appreciated!!
I am trying to plot flower images with both the label and the prediction, each drawn with a bounding box. I am reusing some lower layers of a pre-trained Xception model.
I have set the output layer to 4 units, since there will be four coordinates for the bounding box:
loc_output = keras.layers.Dense(4)(avg)
For simplicity, I just set the four coordinates for the label as random numbers using tf.random.uniform.
How do I write a function using matplotlib that draws the image with both boxes and labels? Here is my code so far:
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
dataset, info = tfds.load("tf_flowers", as_supervised=True, with_info=True)
test_set_raw, valid_set_raw, train_set_raw = tfds.load(
    "tf_flowers",
    split=["train[:10%]", "train[10%:25%]", "train[25%:]"],
    as_supervised=True)
class_names = info.features["label"].names
n_classes = info.features["label"].num_classes
## Shuffle & Preprocess
def preprocess(image, label):
    resized_image = tf.image.resize(image, [224, 224])
    final_image = keras.applications.xception.preprocess_input(resized_image)
    return final_image, label
batch_size = 32
train_set = train_set_raw.shuffle(1000).repeat()
train_set = train_set.map(preprocess).batch(batch_size).prefetch(1)
valid_set = valid_set_raw.map(preprocess).batch(batch_size).prefetch(1)
test_set = test_set_raw.map(preprocess).batch(batch_size).prefetch(1)
base_model = keras.applications.xception.Xception(weights="imagenet",
                                                  include_top=False)  # reuse lower layers of pretrained Xception model
avg = keras.layers.GlobalAveragePooling2D()(base_model.output)
class_output = keras.layers.Dense(n_classes, activation="softmax")(avg)
loc_output = keras.layers.Dense(4)(avg) # 4 coordinates for our bounding box
model = keras.models.Model(inputs=base_model.input, outputs=[class_output, loc_output])
# for layer in base_model.layers:
# layer.trainable = False
optimizer = keras.optimizers.SGD(lr=0.2, momentum=0.9, decay=0.01)
model.compile(loss=["sparse_categorical_crossentropy", "mse"],
              loss_weights=[0.8, 0.2],
              optimizer=optimizer, metrics=["accuracy"])
def add_random_bounding_boxes(images, labels):
    fake_bboxes = tf.random.uniform([tf.shape(images)[0], 4])
    return images, (labels, fake_bboxes)
fake_train_set = train_set.take(5).repeat(2).map(add_random_bounding_boxes)
model.fit(fake_train_set, steps_per_epoch=5, epochs=2)
Here is one way to achieve what you want. Note, however, that the dummy bounding boxes drawn with tf.random.uniform make little sense: by default minval=0 and maxval=1, so the dummy coordinates fall within that range, which is not appropriate for a bounding box in pixel space. That's why, in the demonstration below, we rescale the coordinates with a scalar value (say 150); hopefully you get the point.
After training, we prepare the test set for inference.
import numpy as np
import matplotlib.pyplot as plt
print(class_names)
test_set = test_set_raw.map(preprocess).batch(1).prefetch(1)
test_set = test_set.map(add_random_bounding_boxes)
['dandelion', 'daisy', 'tulips', 'sunflowers', 'roses']
Display functionalities using matplotlib.
for i, (X, y) in enumerate(test_set.take(1)):
    # true labels
    true_label = y[0].numpy()
    true_bboxs = y[1].numpy()

    # model predictions
    pred_label, pred_boxes = model.predict(X)
    pred_label = np.argmax(pred_label, axis=-1)

    # rescaling
    dummy_true_boxes = (true_bboxs*150).astype(np.int32).clip(min=0, max=224)
    dummy_predict_boxes = (pred_boxes*150).astype(np.int32).clip(min=0, max=224)

    # info printing
    print('GT bbox scores: ', true_bboxs)
    print('PRED bbox scores: ', pred_boxes)
    print('After Rescaling and Clipped True BBOX: ', dummy_true_boxes)
    print('After Rescaling and Clipped Pred BBOX: ', dummy_predict_boxes)
    print('True label : {}, Predicted label {}'.format(class_names[int(true_label)],
                                                       class_names[int(pred_label)]))

    plt.figure(figsize=(10, 10))
    plt.axis("off")
    plt.imshow(X[0])
    ax = plt.gca()

    for tbox, tcls, pbox, pcls in zip(dummy_true_boxes, true_label, dummy_predict_boxes, pred_label):
        # gt and pred labels
        ttext = "GT: {}".format(class_names[tcls])
        ptext = "Pred: {}".format(class_names[pcls])

        # gt and pred coordinates
        tx1, ty1, x2, y2 = tbox  # xmin, ymin, xmax, ymax
        tw, th = x2 - tx1, y2 - ty1  # width = xmax - xmin; height = ymax - ymin
        px1, py1, x2, y2 = pbox  # xmin, ymin, xmax, ymax
        pw, ph = x2 - px1, y2 - py1

        # ground-truth box (green) with label text
        patch = plt.Rectangle([tx1, ty1], tw, th, fill=False,
                              edgecolor=[0, 1, 0], linewidth=1)
        ax.add_patch(patch)
        ax.text(tx1, ty1, ttext,
                bbox={"facecolor": [1, 1, 1], "alpha": 0.5},
                clip_box=ax.clipbox, clip_on=True)

        # predicted box (white) with label text
        patch = plt.Rectangle([px1, py1], pw, ph, fill=False,
                              edgecolor=[1, 1, 1], linewidth=1)
        ax.add_patch(patch)
        ax.text(px1, py1, ptext,
                bbox={"facecolor": [1, 1, 1], "alpha": 0.5},
                clip_box=ax.clipbox, clip_on=True)

    plt.show()
GT bbox scores: [[0.75246954 0.36959255 0.18266702 0.7125735 ]]
PRED bbox scores: [[1.1755341 0.98745024 0.90438926 1.285707 ]]
After Rescaling and Clipped True BBOX: [[112 55 27 106]]
After Rescaling and Clipped Pred BBOX: [[176 148 135 192]]
True label : tulips, Predicted label sunflowers
I am building a preprocessing and data augmentation pipeline for my image segmentation dataset.
There is a powerful API from Keras to do this, but I ran into the problem of reproducing the same augmentation on the image as well as on the segmentation mask (the second image). Both images must undergo exactly the same manipulations. Is this not supported yet?
https://www.tensorflow.org/tutorials/images/data_augmentation
Example / Pseudocode
data_augmentation = tf.keras.Sequential([
    layers.experimental.preprocessing.RandomFlip(mode="horizontal_and_vertical", seed=SEED_VAL),
    layers.experimental.preprocessing.RandomRotation(factor=0.4, fill_mode="constant", fill_value=0, seed=SEED_VAL),
    layers.experimental.preprocessing.RandomZoom(height_factor=(-0.0, -0.2), fill_mode='constant', fill_value=0, seed=SEED_VAL)])
(train_ds, test_ds), info = tfds.load('somedataset', split=['train[:80%]', 'train[80%:]'], with_info=True)
This code does not work but illustrates how my dream api would work:
train_ds = train_ds.map(lambda datapoint: data_augmentation((datapoint['image'], datapoint['segmentation_mask']), training=True))
Alternative
The alternative is to code a custom load and manipulation / randomization method as is proposed in the image segmentation tutorial (https://www.tensorflow.org/tutorials/images/segmentation)
Any tips on state of the art data augmentation for this type of dataset is much appreciated :)
Here is my own implementation, in case someone else wants to use the tf built-ins (the tf.image API) as of December 2020 :)
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

@tf.function
def load_image(datapoint, augment=True):
    # resize image and mask
    img_orig = input_image = tf.image.resize(datapoint['image'], (IMG_SIZE, IMG_SIZE))
    mask_orig = input_mask = tf.image.resize(datapoint['segmentation_mask'], (IMG_SIZE, IMG_SIZE))

    # rescale the image
    if IMAGE_CHANNELS == 1:
        input_image = tf.image.rgb_to_grayscale(input_image)
    input_image = tf.cast(input_image, tf.float32) / 255.0

    # augmentation
    if augment:
        # zoom in a bit
        if tf.random.uniform(()) > 0.5:
            # use original image to preserve high resolution
            input_image = tf.image.central_crop(img_orig, 0.75)
            input_mask = tf.image.central_crop(mask_orig, 0.75)
            # resize
            input_image = tf.image.resize(input_image, (IMG_SIZE, IMG_SIZE))
            input_mask = tf.image.resize(input_mask, (IMG_SIZE, IMG_SIZE))

        # random brightness adjustment (illumination)
        input_image = tf.image.random_brightness(input_image, 0.3)
        # random contrast adjustment
        input_image = tf.image.random_contrast(input_image, 0.2, 0.5)

        # random horizontal or vertical flipping
        if tf.random.uniform(()) > 0.5:
            input_image = tf.image.flip_left_right(input_image)
            input_mask = tf.image.flip_left_right(input_mask)
        if tf.random.uniform(()) > 0.5:
            input_image = tf.image.flip_up_down(input_image)
            input_mask = tf.image.flip_up_down(input_mask)

        # rotation in 15° steps
        rot_factor = tf.cast(tf.random.uniform(shape=[], maxval=12, dtype=tf.int32), tf.float32)
        angle = np.pi/12*rot_factor
        input_image = tfa.image.rotate(input_image, angle)
        input_mask = tfa.image.rotate(input_mask, angle)

    return input_image, input_mask
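Mapped over a TFDS dict-style dataset, usage could look like this (a sketch; the dataset variable and the 'image' / 'segmentation_mask' keys are assumed to match the segmentation tutorial linked above):
# Sketch under the assumptions above: augment only the training split.
train_ds = dataset['train'].map(lambda dp: load_image(dp, augment=True),
                                num_parallel_calls=tf.data.AUTOTUNE)
test_ds = dataset['test'].map(lambda dp: load_image(dp, augment=False),
                              num_parallel_calls=tf.data.AUTOTUNE)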
You can also try external libraries for extra image augmentations. These links may help with augmenting images together with segmentation masks:
albumentations
https://github.com/albumentations-team/albumentations
https://albumentations.ai/docs/getting_started/mask_augmentation/
imgaug
https://github.com/aleju/imgaug
https://nbviewer.jupyter.org/github/aleju/imgaug-doc/blob/master/notebooks/B05%20-%20Augment%20Segmentation%20Maps.ipynb
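For instance, with albumentations the mask is passed alongside the image and receives the same spatial transforms; a minimal sketch (see the mask-augmentation docs linked above):
import albumentations as A
import numpy as np

# The same geometric transform is applied to image and mask
# when both are passed to the composed transform.
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=30, p=0.5),
])

image = np.zeros((256, 256, 3), dtype=np.uint8)  # placeholder image
mask = np.zeros((256, 256), dtype=np.uint8)      # placeholder mask

augmented = transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]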
I solved this by using concat to create one image, and then applying the augmentation layers to it.
def augment_using_layers(images, mask, size=None):
    if size is None:
        h_s = mask.shape[0]
        w_s = mask.shape[1]
    else:
        h_s = size[0]
        w_s = size[1]

    def aug(height=h_s, width=w_s):
        flip = tf.keras.layers.RandomFlip(mode="horizontal")
        rota = tf.keras.layers.RandomRotation(0.2, fill_mode='constant')
        zoom = tf.keras.layers.RandomZoom(
            height_factor=(-0.05, -0.15),
            width_factor=(-0.05, -0.15))
        trans = tf.keras.layers.RandomTranslation(height_factor=(-0.1, 0.1),
                                                  width_factor=(-0.1, 0.1),
                                                  fill_mode='constant')
        crop = tf.keras.layers.RandomCrop(h_s, w_s)

        layers = [flip, zoom, crop, trans, rota]
        aug_model = tf.keras.Sequential(layers)
        return aug_model

    aug = aug()

    # stack the mask to 3 channels and concatenate it with the image,
    # so one pass through the layers transforms both identically
    mask = tf.stack([mask, mask, mask], -1)
    mask = tf.cast(mask, 'float32')
    images_mask = tf.concat([images, mask], -1)
    images_mask = aug(images_mask)

    image = images_mask[:, :, 0:3]
    mask = images_mask[:, :, 4]

    return image, tf.cast(mask, 'uint8')
Then you can map your dataset:
# create dataset
dataset = tf.data.Dataset.from_tensor_slices(data)
dataset = dataset.map(lambda x: load_dataset(x, (400, 400)))
# aug. dataset
dataset_aug = dataset.map(lambda x, y: augment_using_layers(x, y, (400, 400)))
Fixing a common seed will apply the same augmentations to image and mask.
def Augment(tar_shape=(512, 512), seed=37):
    img = tf.keras.Input(shape=(None, None, 3))
    msk = tf.keras.Input(shape=(None, None, 1))

    i = tf.keras.layers.RandomFlip(seed=seed)(img)
    m = tf.keras.layers.RandomFlip(seed=seed)(msk)
    i = tf.keras.layers.RandomTranslation((-0.75, 0.75), (-0.75, 0.75), seed=seed)(i)
    m = tf.keras.layers.RandomTranslation((-0.75, 0.75), (-0.75, 0.75), seed=seed)(m)
    i = tf.keras.layers.RandomRotation((-0.35, 0.35), seed=seed)(i)
    m = tf.keras.layers.RandomRotation((-0.35, 0.35), seed=seed)(m)
    i = tf.keras.layers.RandomZoom((-0.1, 0.05), (-0.1, 0.05), seed=seed)(i)
    m = tf.keras.layers.RandomZoom((-0.1, 0.05), (-0.1, 0.05), seed=seed)(m)
    i = tf.keras.layers.RandomCrop(tar_shape[0], tar_shape[1], seed=seed)(i)
    m = tf.keras.layers.RandomCrop(tar_shape[0], tar_shape[1], seed=seed)(m)

    return tf.keras.Model(inputs=(img, msk), outputs=(i, m))

Augment = Augment()

ds_train = ds_train.map(lambda img, msk: Augment((img, msk)), num_parallel_calls=AUTOTUNE)
Important:
The above functions can change the dtype of the image and mask from int32/uint8 to float32.
Also, the output mask can contain values other than 0/1 (like 0.9987, ...) due to interpolation. To overcome this, change the interpolation from bilinear to nearest, as sketched below.
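A sketch of that change, applied to the mask branch of the model above:
# 'nearest' interpolation keeps mask values discrete, while the image
# branch can keep the default 'bilinear'.
m = tf.keras.layers.RandomRotation((-0.35, 0.35), seed=seed,
                                   interpolation='nearest')(m)
m = tf.keras.layers.RandomZoom((-0.1, 0.05), (-0.1, 0.05), seed=seed,
                               interpolation='nearest')(m)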
This is the method described in the official docs: [Image Segmentation Official Tutorial][1]
class Augment(tf.keras.layers.Layer):
    def __init__(self, seed=42):
        super().__init__()
        self.augment_inputs = tf.keras.Sequential([
            layers.experimental.preprocessing.RandomFlip(mode="horizontal_and_vertical", seed=seed),
            layers.experimental.preprocessing.RandomRotation(factor=0.4, fill_mode="constant", fill_value=0, seed=seed),
            layers.experimental.preprocessing.RandomZoom(height_factor=(-0.0, -0.2), fill_mode='constant', fill_value=0, seed=seed)])
        self.augment_labels = tf.keras.Sequential([
            layers.experimental.preprocessing.RandomFlip(mode="horizontal_and_vertical", seed=seed),
            layers.experimental.preprocessing.RandomRotation(factor=0.4, fill_mode="constant", fill_value=0, seed=seed),
            layers.experimental.preprocessing.RandomZoom(height_factor=(-0.0, -0.2), fill_mode='constant', fill_value=0, seed=seed)])

    def call(self, inputs, labels):
        inputs = self.augment_inputs(inputs)
        labels = self.augment_labels(labels)
        return inputs, labels
After this, you can call Augment() in the input pipeline:
train_batches = (
    train_images
    .cache()
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE)
    .repeat()
    .map(Augment())
    .prefetch(buffer_size=tf.data.AUTOTUNE))
This will make sure that your inputs and masks are equally randomly augmented.
[1]: https://www.tensorflow.org/tutorials/images/segmentation
Posting here to check whether there's anything wrong with my implementation of a simple semantic segmentation model in TensorFlow. This code is a sanity check I'm doing with just a single image from the database, on which I'm trying to overfit the model.
It is a binary classification problem with each image pixel mapped to [0,1] in the ground truth label.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
img = plt.imread('image.png')  # image of size [750, 750, 3]
lab = plt.imread('map.png')    # ground truth of size [750, 750]
img = np.expand_dims(img, 0)
lab = np.expand_dims(lab, 0)
w1 = tf.Variable(tf.constant(0.001, shape=[3,3,3,32]))
b1 = tf.Variable(tf.constant(0.0, shape=[32]))
w2 = tf.Variable(tf.constant(0.001, shape=[3,3,32,2]))
b2 = tf.Variable(tf.constant(0.0, shape=[2]))
mul = tf.nn.conv2d(img, w1, strides=[1,1,1,1], padding='SAME')
bias_add = tf.add(mul, b1)
conv1 = tf.nn.relu(bias_add)
mul2 = tf.nn.conv2d(conv1, w2, strides=[1,1,1,1], padding='SAME')
bias_add2 = tf.add(mul2, b2)
conv2 = tf.nn.relu(bias_add2)
sess = tf.InteractiveSession()
lab = lab.astype('int32')
conv2_out = tf.reshape(conv2, [-1, 2])
lab = np.reshape(lab, [-1])
prediction = tf.nn.softmax(conv2_out)  # I use this to visualize the model's prediction and calculate accuracy
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=conv2_out, labels=lab))
optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)
correct_pred = tf.equal(tf.argmax(prediction, 1), lab)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.int32)
tf.initialize_all_variables().run()

step = 1
iter = 5
while step < iter:
    sess.run(optimizer, feed_dict={x: img, y: lab})
    loss_val, acc = sess.run([loss, accuracy], feed_dict={x: img, y: lab})
    print("Iter: " + str(step) + " Loss: " + "{:.6f}".format(loss_val) + " Accuracy: " + "{:.6f}".format(acc))
    step += 1
print("optimization finished!")
prediction_logits = prediction.eval()
weights = w1.eval() # first layer learned weights
prediction_logits = np.reshape(prediction_logits, [750,750,2])
plt.figure()  # plotting original image with predicted labels
plt.imshow(img[0, :, :, :])
plt.imshow(prediction_logits[:, :, 0], cmap=plt.cm.binary)
plt.show()

plt.figure()  # plotting first-layer weights
for i in range(32):
    plt.subplot(8, 4, i+1)
    plt.imshow(weights[:, :, :, i])
plt.show()
When I run this (as an interactive session), just to train the model to overfit on this single image, the loss minimizes, but my accuracy doesn't seem to change. I'm not quite sure I understand how the tf.argmax function works, or whether I've implemented it correctly; the accuracy sticks to a single value no matter how many iterations I run.
Thoughts? Also, am I going about plotting the figure and the predicted label correctly, or are there errors there? (If there are other errors, or best practices I'm not following, please point them out.)
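To illustrate my current understanding of tf.argmax, here is a toy check (np.argmax behaves the same way on the reshaped [num_pixels, 2] predictions):
# argmax over axis 1 picks the class index with the larger value per pixel
logits = np.array([[0.2, 0.8],
                   [0.9, 0.1]])
print(np.argmax(logits, axis=1))  # -> [1 0]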
Additionally, what is the recommended way to implement regularization over the weights? I found tf.contrib.layers.l2_regularizer, which seems a feasible option; how do I include it in this scenario, though? A simple sum with the loss function, like the sketch below?
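A sketch of what I have in mind (untested; using tf.nn.l2_loss instead of the contrib regularizer, with a hypothetical strength beta):
# Untested sketch: add a weighted L2 penalty on the conv kernels to the
# data loss; beta is a hypothetical regularization strength.
beta = 1e-4
l2_penalty = tf.nn.l2_loss(w1) + tf.nn.l2_loss(w2)
total_loss = loss + beta * l2_penalty
optimizer = tf.train.AdamOptimizer(0.001).minimize(total_loss)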