How to improve mIoU for DeepLabV3+? - tensorflow

Currently I'm struggling to improve the results on a semantic segmentation problem using DeepLabV3+ trained on my own dataset.
I've trained DeepLabV3+ successfully a few times using different pretrained models from the model zoo, all based on xception_65, but my results keep staying in the same mIoU range, somewhere in the interval [10, 11].
I have only one GPU at my disposal, with 11 GB of memory.
My dataset has 8 classes with various object sizes, from small to large, and is quite unbalanced.
Here are the label weights: [1, 4, 4, 17, 42, 36, 19, 20].
In my dataset I have 757 instances for training and 100 for validation.
The general tendency during training is that the loss decreases for the first 10k iterations and then only oscillates.
I’ve tried:
to adjust parameters like the learning rate, last_layer_gradient_multiplier, and weight decay
training on various image sizes (321, 513, 769)
a kind of class weighting using the above weights in the following formula (a more compact equivalent is sketched right after it)
weights = (tf.to_float(tf.equal(scaled_labels, 0)) * 1 +
           tf.to_float(tf.equal(scaled_labels, 1)) * 4 +
           tf.to_float(tf.equal(scaled_labels, 2)) * 4 +
           tf.to_float(tf.equal(scaled_labels, 3)) * 17 +
           tf.to_float(tf.equal(scaled_labels, 4)) * 42 +
           tf.to_float(tf.equal(scaled_labels, 5)) * 36 +
           tf.to_float(tf.equal(scaled_labels, 6)) * 19 +
           tf.to_float(tf.equal(scaled_labels, 7)) * 20 +
           tf.to_float(tf.equal(scaled_labels, ignore_label)) * 0.0)
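For reference, an equivalent but more compact weighting can be written with tf.gather; this is just a sketch, assuming scaled_labels is an integer tensor and ignore_label lies outside the 0-7 class range:
class_weights = tf.constant([1.0, 4.0, 4.0, 17.0, 42.0, 36.0, 19.0, 20.0])
not_ignore = tf.to_float(tf.not_equal(scaled_labels, ignore_label))
# replace ignore_label entries with a valid index before the lookup
safe_labels = tf.where(tf.equal(scaled_labels, ignore_label),
                       tf.zeros_like(scaled_labels), scaled_labels)
weights = tf.gather(class_weights, safe_labels) * not_ignore  # ignored pixels get weight 0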
I've trained without fine-tuning the batch normalization parameters (fine_tune_batch_norm = False), although I also tried training those parameters (fine_tune_batch_norm = True) with a 321 crop size in order to fit a batch size of 12 on my GPU.
The point is that I need some tips to figure out what I can do to improve these results.
What do you guys think? Do I need more data, or better hardware, in order to increase my mIoU?

Related

How should the output of my embedding layer look? Keras to PyTorch

I am in the process of translating a Keras implementation to a PyTorch one. After the full conversion my model was not converging fast enough, although the loss did seem to be decreasing. As I was tracing back my steps, I noticed something a bit odd about my embedding layer. Let me explain the data:
I have a batch of 4 sequences, each with a sequence length of 100, and a vocab size of 83. I am working with songs in ABC notation, so each song can contain 83 different symbols and is 100 symbols long.
So now I have an ndarray of shape (4, 100) which contains my 4 song sequences. Let's call it x.
Now if I pass x into an embedding layer in Keras:
tf.keras.layers.Embedding(83, 256, batch_input_shape=[4, None])(x).numpy()
I get a more "narrow" set of values for each batch than I do in PyTorch; does this affect my convergence? I.e., the minimum value in the first batch is -0.04999 and the maximum value is 0.04999.
Now if I pass the same x into my PyTorch embedding layer:
torch.nn.Embedding(4*100, 256)(torch.tensor(x)).detach().numpy()
I get a "wider" set of values for each batch. The maximum value is 3.3865 and the minimum value is -3.917.
My question is, should I be worried that this is a cause for my model not converging properly?
The difference comes from how the two layers initialize their weights, not from the layers computing different things: Keras's Embedding uses a uniform initializer in [-0.05, 0.05] by default, while PyTorch's nn.Embedding draws its weights from a standard normal distribution, so the PyTorch outputs span a wider range before any training has happened. By itself this should not stop the model from converging; the layers that follow the embedding (e.g. a Conv or LSTM with its filters) adapt to the scale of their inputs during training. You can verify that the embedding output is just the initialized weights, as in the game example below.
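If you want the PyTorch embedding to start in the same range as the Keras one, you can re-initialize its weights yourself. A minimal sketch, assuming the default initializers described above:
import torch

emb = torch.nn.Embedding(83, 256)                # vocab size 83, embedding dim 256
torch.nn.init.uniform_(emb.weight, -0.05, 0.05)  # Keras-style uniform initialization

x = torch.randint(0, 83, (4, 100))               # 4 sequences of length 100
out = emb(x)
print(out.min().item(), out.max().item())        # now within [-0.05, 0.05]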
Embedding layer:
layer_1 = model.get_layer(name="embedding_layer")
print(layer_1)  # <keras.layers.embeddings.Embedding object at 0x000001AD42102A30>
print(layer_1.get_weights()[0].shape)  # (83, 256)
print('min: ' + str(np.min(layer_1.get_weights()[0])))  # min: -0.049991023
print('max: ' + str(np.max(layer_1.get_weights()[0])))  # max: 0.049998153
Output:
👉 the first time
<keras.layers.embeddings.Embedding object at 0x000001FA0BE74A30>
(83, 256)
min: -0.049991023
max: 0.049998153
👉 the second time
<keras.layers.embeddings.Embedding object at 0x00000214A1C34A30>
(83, 256)
min: -0.04999887
max: 0.049993087
👉 the third time
<keras.layers.embeddings.Embedding object at 0x00000283B20F3A30>
(83, 256)
min: -0.049999725
max: 0.049998928
Sample of actions from limited inputs:
This shows that the random actions work correctly with a few simple lines of code:
gameState = p.getGameState()
### {'player_x': 102, 'player_vel': 0.0, 'fruit_x': 30, 'fruit_y': -120}
player_x_array = gameState['player_x']
player_vel_array = gameState['player_vel']
fruit_x_array = gameState['fruit_x']
fruit_y_array = gameState['fruit_y']
### if the x difference is negative, go left
var_1 = player_x_array - fruit_x_array  ## right
var_2 = player_x_array - fruit_x_array  ## left
var_3 = fruit_y_array - (player_x_array - fruit_x_array)
print(str(var_1) + " " + str(var_2) + " " + str(var_3))
temp = tf.random.normal([len(posibility_actions)], 1, 0.2, tf.float32)
temp = np.asarray(temp) * np.asarray([ var_1, var_2, var_3 ])
temp = tf.nn.softmax(temp)
action = int(np.argmax(temp))
reward = p.act(posibility_actions[action])
print('random action: ' + str(posibility_actions[action]))
There should not be any problem when it passes through multiple layers that filter out the information that is not needed; look at the inputs and outputs to see what task they generate.

How to get the CNN model size?

https://keras.io/api/applications/#available-models
From the table given by Keras, we know Xception has 22,910,480 parameters in total, which is the number of weights and biases of the convolution and FC layers. How do we get the size of 88 MB from the number of parameters?
Every tf.float32 / tf.int32 parameter takes 4 bytes, so 22,910,480 × 4 ≈ 91.6 million bytes ≈ 87.4 MiB, which the table rounds to 88 MB. (There could also be some tf.float16 / tf.int16 parameters, which take 2 bytes each.)
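A minimal sketch of the arithmetic, assuming every parameter is stored as a 4-byte float32:
params = 22_910_480          # Xception's parameter count from the Keras table
size_bytes = params * 4      # float32 = 4 bytes per parameter
print(size_bytes / 1024**2)  # ~87.4 MiB, i.e. the "88 MB" from the table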

what these dimensions represent in a neural network?

I am following Andrew Ng's course on Deep Learning, in a programming assignment that uses the SIGN dataset. From what I know, each image is 64 by 64 pixels in width and height, with another dimension of 3 that corresponds to the RGB channels.
According to the author, the value of:
n_x = num_px * num_px * 3 = 64 * 64 * 3 = 12288
and having the following data:
number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
The part that I do not understand is when the author initializes the weights: he says that the shape of W1 (an array of weights) is
W1 : [25, 12288]
Why 25 as the number of rows? I get that the number of columns corresponds to the formula for n_x, but what does this 25 refer to? Is it the number of neurons inside a hidden layer?
Thanks
It looks like 12288 is the number of input nodes and 25 is the number of nodes in the hidden layer.
Since each node in layer i is connected to each node in layer i+1, the number of weights between the input and the hidden layer is 25 * 12288, and that is the size of the matrix.
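A minimal NumPy sketch of the shapes involved, assuming a first hidden layer of 25 units as in the assignment:
import numpy as np

n_x = 64 * 64 * 3                      # 12288 features per flattened image
m = 1080                               # number of training examples

X = np.random.rand(n_x, m)             # (12288, 1080): one example per column
W1 = np.random.randn(25, n_x) * 0.01   # (25, 12288): one row per hidden unit
b1 = np.zeros((25, 1))

Z1 = W1 @ X + b1                       # (25, 1080): 25 activations per example
print(Z1.shape)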

How to handle log(0) when using cross entropy

In order to make the case simple and intuitive, I will use binary (0 and 1) classification for illustration.
Loss function
loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY)) #cross entropy
cost = -np.sum(loss)/m #num of examples in batch is m
Probability of Y
predY is computed using the sigmoid function, and the logits can be thought of as the output of the neural network before the classification step.
predY = sigmoid(logits) #binary case
def sigmoid(X):
    return 1/(1 + np.exp(-X))
Problem
Suppose we are running a feed-forward net.
Inputs: [3, 5]: 3 is number of examples and 5 is feature size (fabricated data)
Num of hidden units: 100 (only 1 hidden layer)
Iterations: 10000
Such an arrangement is set up to overfit. When it overfits, we can perfectly predict the probabilities for the training examples; in other words, the sigmoid outputs exactly 1 or 0 because the exponential explodes. In that case we would have np.log(0), which is undefined. How do you usually handle this issue?
If you don't mind the dependency on scipy, you can use scipy.special.xlogy. You would replace the expression
np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
with
xlogy(Y, predY) + xlogy(1 - Y, 1 - predY)
If you expect predY to contain very small values, you might get better numerical results using scipy.special.xlog1py in the second term:
xlogy(Y, predY) + xlog1py(1 - Y, -predY)
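For instance, a quick check that these functions stay finite where the plain expression would produce NaN (the sample values here are made up):
import numpy as np
from scipy.special import xlogy, xlog1py

Y = np.array([1.0, 0.0, 1.0])
predY = np.array([1.0, 0.0, 0.5])  # contains exact 0s and 1s
loss = xlogy(Y, predY) + xlog1py(1 - Y, -predY)
cost = -np.sum(loss) / len(Y)
print(cost)  # finite; xlogy treats 0 * log(0) as 0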
Alternatively, knowing that the values in Y are either 0 or 1, you can compute the cost in an entirely different way:
Yis1 = Y == 1
cost = -(np.log(predY[Yis1]).sum() + np.log(1 - predY[~Yis1]).sum())/m
How do you usually handle this issue?
Add a small number (something like 1e-15) to predY; this doesn't throw the predictions off much, and it solves the log(0) issue.
BTW, if your algorithm outputs zeros and ones, it might be useful to check the histogram of the returned probabilities; when the algorithm is so sure that something is happening, it can be a sign of overfitting.
One common way to deal with log(x) and y / x, where x is always non-negative but can become 0, is to add a small constant (as Jakub wrote).
You can also clip the value (e.g. tf.clip_by_value or np.clip), as sketched below.
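A minimal sketch of the clipping approach, assuming predY, Y and m are defined as in the question; the epsilon is an arbitrary choice:
import numpy as np

eps = 1e-15
predY_safe = np.clip(predY, eps, 1 - eps)  # keep probabilities away from exact 0 and 1
loss = Y * np.log(predY_safe) + (1 - Y) * np.log(1 - predY_safe)
cost = -np.sum(loss) / m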

tensorflow : conv2d_transpose : Matching desired output dimensions

How can I force a certain dimensionality of the output of the conv2d_transpose layer? My problem is that I use it for upsampling and I want to match the dimensionality of my labels and the output of the NN. For example, if I have a feature map of shape Bx25x40xC, how can I make it Bx100x160xC (i.e. upsample exactly 4x)?
It seems like dimensions of the output can be calculated using
h = ((h_in - 1) * stride_h) + kernel_h - 2 * pad_h
w = ((w_in - 1) * stride_w) + kernel_w - 2 * pad_w
One can manipulate strides and kernels, but the padding is controlled by the 'same'/'valid' algorithms, which, to my understanding, makes the padding (and therefore the resulting output size) pretty much uncontrollable. For comparison, in Caffe one can at least force the padding explicitly in an attempt to match the desired output.
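For what it's worth, with 'same' padding the output spatial size of a transposed convolution is simply input_size * stride, so a stride of 4 gives an exact 4x upsample; the lower-level tf.nn.conv2d_transpose also accepts an explicit output_shape argument. A minimal Keras sketch (the filter count is arbitrary):
import tensorflow as tf

x = tf.zeros([1, 25, 40, 16])  # a Bx25x40xC feature map
up = tf.keras.layers.Conv2DTranspose(
    filters=16, kernel_size=4, strides=4, padding='same')(x)
print(up.shape)  # (1, 100, 160, 16), i.e. exactly 4x upsampled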