Loss stuck from first epoch when using with float16 in 3D (Keras) - tensorflow

I've got a 3D segmentation task that I'd like to solve with a CNN in keras, using 64x64x64 patches. I know that float16 is more than good enough for the kind of data I've got in input. It's also more than good enough for the output, it's a segmentation, it could even be a uint8 (or a bool).
If I work with float32 my network works perfectly (I mean... as much as you can expect before optimizing it). I've tried switching to float16 for speed and... The loss gets stuck from iteration 0, the weights don't get updated etc... What am I doing wrong?
I've tried the following: changing loss function, changing optimizer, changing learning rate by many orders of magnitude.
I've been able to reproduce the issue with a minimal example: a network with 2 layers, no downsampling, that has to learn to predict segmented masks from the masks themselves. Works fine in float32 not in float16. I'm working in Colab with default settings, therefore keras v2.2.5 and tensorflow 1.15.0
EDIT: more updates The issue is most likely one of numerical precision in computing the loss function when using float16 and a 64x64x64 patch. When using 32x32x32 ones it does work.
from keras import backend as K
from keras.engine import Input, Model
from keras.layers import Conv3D, Activation
chosenDataType = 'float16' #swap between float16 and float32 to test
#set up a random, minimal 3D CNN
K.set_image_data_format("channels_last")
K.set_floatx(chosenDataType)
inputL = Input([64,64,64,1],dtype=chosenDataType)
l1 = Conv3D(32,[3,3,3],padding='same')(inputL)
l2 = Activation('relu') (l1)
l3 = Conv3D(32,[3,3,3],padding='same')(l2)
l4 = Activation('relu') (l3)
l5 = Conv3D(1,[1,1,1],padding='same')(l4)
model = Model(inputs=inputL,output=l5)
model.compile(optimizer='sgd',loss='mse')
#create some fake black images with white spots
import numpy as np
import random
X_train = np.zeros([30,64,64,64,1],dtype=chosenDataType)
for imIdx in range(30):
centPoin = random.randrange(20,44)
X_train[imIdx,centPoin-10:centPoin+10,centPoin-10:centPoin+10,centPoin-10:centPoin+10,0]=1
#ask the network to fit the images themselves.
Y_train = X_train.copy()
model.fit(X_train,Y_train,batch_size=30,epochs=100)

Related

Why can't I classify my data perfectly on this simple problem using a NN?

I have a set of observations made of 10 features, each of these features being a real number in the interval (0,2). Say I wanted to train a simple neural network to classify whether the average of those features is above or below 1.0.
Unless I'm missing something, it should be enough with a two-layer network with one neuron on each layer. The activation functions would be a linear one (i.e. no activation function) on the first layer and a sigmoid on the output layer. An example of a NN with this architecture that would work is one that calculates the average on the first layer (i.e. all weights = 0.1 and bias=0) and asseses whether that is above or below 1.0 in the second layer (i.e. weight = 1.0 and bias = -1.0).
When I implement this using TensorFlow (see code below), I obviously get a very high accuracy quite quickly, but never get to 100% accuracy... I would like some help to understand conceptually why this is the case. I don't see why the backppropagation algorithm does not reach a set of optimal weights (may be this is related with the loss function I'm using, which has local minmums?). Also I would like to know whether a 100% accuracy is achievable if I use different activations and/or loss function.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
X = [np.random.random(10)*2.0 for _ in range(10000)]
X = np.array(X)
y = X.mean(axis=1) >= 1.0
y = y.astype('int')
train_ratio = 0.8
train_len = int(X.shape[0]*0.8)
X_train, X_test = X[:train_len,:], X[train_len:,:]
y_train, y_test = y[:train_len], y[train_len:]
def create_classifier(lr = 0.001):
classifier = tf.keras.Sequential()
classifier.add(tf.keras.layers.Dense(units=1))
classifier.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))#, input_shape=input_shape))
optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
metrics=[tf.keras.metrics.BinaryAccuracy()],
classifier.compile(optimizer=optimizer, loss=tf.keras.losses.BinaryCrossentropy(from_logits=False), metrics=metrics)
return classifier
classifier = create_classifier(lr = 0.1)
history = classifier.fit(X_train, y_train, batch_size=1000, validation_split=0.1, epochs=2000)
Ignoring the fact that a neural network is an odd approach for this problem, and answering your specific question - it looks like your learning rate might be too high which could explain the fluctuations around the optimal point.

Learning a Categorical Variable with TensorFlow Probability

I would like to use TFP to write a neural network where the output are the probabilities of a categorical variable with 3 classes, and train it using the negative log-likelihood.
As I'm moving my first steps with TF and TFP, I started with a toy model where the input layer has only 1 unit receiving a null input, and the output layer has 3 units with softmax activation function. The idea is that the biases should learn (up to an additive constant) the log of the probabilities.
Here below is my code, true_p are the true parameters I use to generate the data and I would like to learn, while learned_p is what I get from the NN.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from functions import nll
from tensorflow.keras.optimizers import SGD
import tensorflow.keras.layers as layers
import tensorflow_probability as tfp
tfd = tfp.distributions
# params
true_p = np.array([0.1, 0.7, 0.2])
n_train = 1000
# training data
x_train = np.array(np.zeros(n_train)).reshape((n_train,))
y_train = np.array(np.random.choice(len(true_p), size=n_train, p=true_p)).reshape((n_train,))
# model
input_layer = layers.Input(shape=(1,))
p_layer = layers.Dense(len(true_p), activation=tf.nn.softmax)(input_layer)
p_y = tfp.layers.DistributionLambda(tfd.Categorical)(p_layer)
model_p = keras.models.Model(inputs=input_layer, outputs=p_y)
model_p.compile(SGD(), loss=nll)
# training
hist_p = model_p.fit(x=x_train, y=y_train, batch_size=100, epochs=3000, verbose=0)
# check result
learned_p = np.round(model_p.layers[1].call(tf.constant([0], shape=(1, 1))).numpy(), 3)
learned_p
With this setup, I get the result:
>>> learned_p
array([[0.005, 0.989, 0.006]], dtype=float32)
I over-estimate the second category, and can't really distinguish between the first and the third one. What's worst, if I plot the probabilities at the end of each epoch, it looks like they are converging monotonically to the vector [0,1,0], which doesn't make sense (it seems to me the gradient should push in the opposite direction once I start to over-estimate).
I really can't figure out what's going on here, but have the feeling I'm doing something plain wrong. Any idea? Thank you for your help!
For the record, I also tried using other optimizers like Adam or Adagrad playing with the hyper-params, but with no luck.
I'm using Python 3.7.9, TensorFlow 2.3.1 and TensorFlow probability 0.11.1
I believe the default argument to Categorical is not the vector of probabilities, but the vector of logits (values you'd take softmax of to get probabilities). This is to help maintain precision in internal Categorical computations like log_prob. I think you can simply eliminate the softmax activation function and it should work. Please update if it doesn't!
EDIT: alternatively you can replace the tfd.Categorical with
lambda p: tfd.Categorical(probs=p)
but you'll lose the aforementioned precision gains. Just wanted to clarify that passing probs is an option, just not the default.

Loaded keras model fails to continue training, dimensions mismatch

I'm using tensorflow with keras to train to a char-RNN using google colabs. I train my model for 10 epochs and save it, using 'model.save()' as shown in the documentation for saving models. Immediately after, I load it again just to check, I try to call model.fit() on the loaded model and I get a "Dimensions must be equal" error using the exact same training set. The training data is in a tensorflow dataset organised in batches as shown in the documentation for tf datasets. Here is a minimal working example:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
X = np.random.randint(0,50,(10000))
seq_len = 150
batch_size = 20
dataset = tf.data.Dataset.from_tensor_slices(X)
dataset = dataset.batch(seq_len+1,drop_remainder=True)
dataset = dataset.map(lambda x: (x[:-1],x[1:]))
dataset = dataset.shuffle(20).batch(batch_size,drop_remainder=True)
def make_model(vocabulary_size,embedding_dimension,rnn_units,batch_size,stateful):
model = Sequential()
model.add(Embedding(vocabulary_size,embedding_dimension,
batch_input_shape=[batch_size,None]))
model.add(LSTM(rnn_units,return_sequences=True,stateful=stateful))
model.add(Dense(vocabulary_size))
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer='adam',metrics=['accuracy'])
model.summary()
return model
vocab_size = 51
emb_dim = 20
rnn_units = 10
model = make_model(vocab_size,emb_dim,rnn_units,batch_size,False)
model.fit(dataset,epochs=10)
model.save('/content/test_model')
model2 = tf.keras.models.load_model('/content/test_model')
model2.fit(dataset,epochs=10)
The first training line, "model.fit()", runs fine but the last line returns the error:
ValueError: Dimensions must be equal, but are 20 and 150 for '{{node
Equal}} = Equal[T=DT_INT64, incompatible_shape_error=true](ArgMax,
ArgMax_1)' with input shapes: [20], [20,150].
I want to be able to resume training later, as my real dataset is much larger. Therefore, saving only the weights is not an ideal option.
Any advice?
Thanks!
If you have saved checkpoints than, from those checkpoints, you can resume with reduced dataset. Your neural network / layers and dimensions should be same.
The problem is the 'accuracy' metric. For some reason, there is some mishandling of dimensions on the predictions when the model is loaded with this metric, as I found in this thread (see last comment). Running model.compile() on the loaded model with the same metric allows training to continue. However, it shouldn't be necessary to compile the model again. Moreover, this means that the optimiser state is lost, as explained in this answer, thus, this is not very useful for resuming training.
On the other hand, using 'sparse_categorical_accuracy' from the start works just fine. I am able to load the model and continue training without having to recompile. In hindsight, this choice is more appropriate given that the outputs of my last layer are logits over the distribution of characters. Thus, this is not a binary but a multiclass classification problem. Nonetheless, I verified that both 'accuracy' and 'sparse_categorical_accuracy' returned the same values in my specific example. Thus, I believe that keras is internally converting accuracy to categorical accuracy, but something goes wrong when doing this on a model that has been just loaded which forces the need to recompile.
I also verified that if the saved model was compiled with 'accuracy', loading the model and recompiling with 'sparse_categorical_accuracy' will allow resuming training. However, as mentioned before, this would discard the state of the optimiser and I suspect that it would be no better than just making a new model and loading only the weights from the saved one.

Too Much Memory Issue with Semantic Image Segmentation NN (DeepLabV3+)

I first explain my task: I have nearly 3000 images from two different ropes. They contain rope 1, rope 2 and the background. My Labels/Masks are images, where for example the pixel value 0 represents the background, 1 represents the first rope and 2 represents the second rope. You can see both the input picture and the ground truth/labels here on picture 1 and 2 below. Notice that my ground truth/label has only 3 values: 0, 1 and 2.
My input picture is gray, but for DeepLab i converted it to a RGB Picture, because DeepLab was trained on RGB Pictures. But my converted picture still doesn't contain color.
The idea of this task is that the Neural Network should learn the structure from ropes, so it can label ropes correctly even if there are knotes. Therfore the color information is not important, because my ropes have different color, so it is easy to use KMeans for creating the ground truth/labels.
For this task i choose a Semantic Segmentation Network called DeepLab V3+ in Keras with TensorFlow as Backend. I want to train the NN with my nearly 3000 images. The size of alle the images is under 100MB and they are 300x200 pixels.
Maybe DeepLab is not the best choice for my task, because my pictures doesn't contain color information and the size of my pictures are very small (300x200), but i didn't find any better Semantic Segmentation NN for my task so far.
From the Keras Website i know how to load the Data with flow_from_directory and how to use the fit_generator method. I don't know if my code is logical correct...
Here are the links:
https://keras.io/preprocessing/image/
https://keras.io/models/model/
https://github.com/bonlime/keras-deeplab-v3-plus
My first question is:
With my implementation my graphic card used nearly all the memory (11GB). I don't know why. Is it possible, that the weights from DeepLab are that big? My Batchsize is default 32 and all my nearly 300 images are under 100MB big. I already used the config.gpu_options.allow_growth = True code, see my code below.
A general question:
Does somebody know a good semantic segmentation NN for my task? I don't need NN, which were trained with color images. But i also don't need NN, which were trained with binary ground truth pictures...
I tested my raw color image(picture 3) with DeepLab, but the result label i got was not good...
Here is my code so far:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"
import numpy as np
from model import Deeplabv3
import tensorflow as tf
import time
import tensorboard
import keras
from keras.preprocessing.image import img_to_array
from keras.applications import imagenet_utils
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import TensorBoard
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
from keras import backend as K
K.set_session(session)
NAME = "DeepLab-{}".format(int(time.time()))
deeplab_model = Deeplabv3(input_shape=(300,200,3), classes=3)
tensorboard = TensorBoard(log_dir="logpath/{}".format(NAME))
deeplab_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
# we create two instances with the same arguments
data_gen_args = dict(featurewise_center=True,
featurewise_std_normalization=True,
rotation_range=90,
width_shift_range=0.1,
height_shift_range=0.1,
zoom_range=0.2)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)
# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1
#image_datagen.fit(images, augment=True, seed=seed)
#mask_datagen.fit(masks, augment=True, seed=seed)
image_generator = image_datagen.flow_from_directory(
'/path/Input/',
target_size=(300,200),
class_mode=None,
seed=seed)
mask_generator = mask_datagen.flow_from_directory(
'/path/Label/',
target_size=(300,200),
class_mode=None,
seed=seed)
# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)
print("compiled")
#deeplab_model.fit(X, y, batch_size=32, epochs=10, validation_split=0.3, callbacks=[tensorboard])
deeplab_model.fit_generator(train_generator, steps_per_epoch= np.uint32(2935 / 32), epochs=10, callbacks=[tensorboard])
print("finish fit")
deeplab_model.save_weights('deeplab_1.h5')
deeplab_model.save('deeplab-1')
session.close()
Here is my code to test DeepLab (from Github):
from matplotlib import pyplot as plt
import cv2 # used for resize. if you dont have it, use anything else
import numpy as np
from model import Deeplabv3
import tensorflow as tf
from PIL import Image, ImageEnhance
deeplab_model = Deeplabv3(input_shape=(512,512,3), classes=3)
#deeplab_model = Deeplabv3()
img = Image.open("Path/Input/0/0001.png")
imResize = img.resize((512,512), Image.ANTIALIAS)
imResize = np.array(imResize)
img2 = cv2.cvtColor(imResize, cv2.COLOR_GRAY2RGB)
w, h, _ = img2.shape
ratio = 512. / np.max([w,h])
resized = cv2.resize(img2,(int(ratio*h),int(ratio*w)))
resized = resized / 127.5 - 1.
pad_x = int(512 - resized.shape[0])
resized2 = np.pad(resized,((0,pad_x),(0,0),(0,0)),mode='constant')
res = deeplab_model.predict(np.expand_dims(resized2,0))
labels = np.argmax(res.squeeze(),-1)
plt.imshow(labels[:-pad_x])
plt.show()
First question: The DeepLabV3+ is a very large model (I assume you are using the Xception backbone?!) and 11 GB of needed GPU capacity is totally normal regarding a bachsize of 32 with 200x300 pixels :) (Training DeeplabV3+, I needed approx. 11 GB using a batchsize of 5 with 500x500 pixels). One note to the second sentence of your question: the needed GPU resources are influenced by many factors (model, optimizer, batchsize, image crop, preprocessing etc) but the actual size of your dataset set shouldn't influence it. So it doesn't matter if your dataset is 300MB or 300GB large.
General Question: You are using a small dataset. Choosing DeeplabV3+ & Xception might not be a good fit, since the model might be too large. This might lead to overfitting. If you haven't obtained satisfying results yet you might try a smaller network. If you want to stick to the DeepLab-framework you could switch the backbone from the Xception network to MobileNetV2 (In the official tensorflow version it is already implemented). Alternatively, you could try using a standalone network like the Inception network with a FCN head...
In each case it would be essential to use a pre-trained encoder with a well-trained feature representation. If you don't find a good initialization of your desired model based on grayscale input images, just use a model pre-trained on RGB images and extend the pre-training with a grayscale dataset (basically you can convert any big rgb dataset to be grayscale) and finetune the weights on the grayscale input before using your data.
I hope this helps! Cheers, Frank
IBM's Large Model Support (LMS) library enables training of large deep neural networks that would normally exhaust GPU memory while training. LMS manages this over-subscription of GPU memory by temporarily swapping tensors to host memory when they are not needed.
Description - https://developer.ibm.com/components/ibm-power/articles/deeplabv3-image-segmentation-with-pytorch-lms/
Pytorch - https://github.com/IBM/pytorch-large-model-support
TensorFlow - https://github.com/IBM/tensorflow-large-model-support

Having trouble introducing a constant matrix to multiplication in Tensorflow

I am trying to implement a layer that is not fully connected. I have a matrix that specifies the connectivity I desire in the variable connectivity_matrix, which is a numpy array of ones and zeros.
The way I am currently trying to impliment the layer is by pairwise multiplying the weights, by this connectivity matrix F:
Is this the correct way to do this in tensorflow? Here is what I have so far
import numpy as np
import tensorflow as tf
import tflearn
num_input = 10
num_layer1 = 313
num_output = 700
# For example:
connectivity_matrix = np.array(np.random.choice([0, 1], size=(num_layer1, num_output)), dtype='float32')
input = tflearn.input_data(shape=[None, num_input])
# Here is where I specify the connectivity in tensorflow
connectivity = tf.constant(connectivity_matrix, shape=[num_layer1, num_output])
# One basic, fully connected layer
layer1 = tflearn.fully_connected(input, num_layer1, activation='relu')
# Here is where I want to have a non-fully connected layer
W = tf.Variable(tf.random_uniform([num_layer1, num_output]))
b = tf.Variable(tf.zeros([num_output]))
# so take a fully connected W, and do a pairwise multiplication with my tf_connectivity matrix
W_filtered = tf.mul(connectivity, W)
output = tf.matmul(layer1, W_filtered) + b
Masking out unwanted connections in each iteration should work, but I am not sure what the convergence properties are like. It may okay for a small enough learning rate?
Another approach would be to penalize unwanted weights in the cost function. You would use a mask matrix with 1's at unwanted connections, and 0's at wanted ones (or have a smoother transition). This would be multiplied by weights, squared/scaled and added to the cost function. This should converge more smoothly.
P.S.: If you've made progress on this, it would be great to hear your comments as I am also working on this problem.