https://keras.io/api/applications/#available-models
From the table given by Keras, we know Xception has 22,910,480 parameters in total, i.e. the combined number of weights and biases of its convolutional and fully connected layers. How do we get the size of 88 MB from the number of parameters?
Every tf.float32 / tf.int32 value takes 4 bytes, so 22,910,480 × 4 ≈ 91.6 million bytes ≈ 87.4 MiB, which is reported as 88 MB. If some parameters were stored as tf.float16 / tf.int16 they would take only 2 bytes each and the size would be smaller.
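A quick back-of-the-envelope check (a minimal sketch; it assumes every parameter is stored as a 4-byte float32):

num_params = 22_910_480                 # total Xception parameters from the Keras table
size_bytes = num_params * 4             # 4 bytes per float32 parameter
print(size_bytes / 1e6)                 # ~91.6 (decimal megabytes)
print(size_bytes / (1024 ** 2))         # ~87.4 MiB, i.e. roughly the 88 MB listed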
I want to increase the learning rate from batch to batch within one epoch, so that the first batches the network sees in an epoch get a low learning rate and the last batches it sees get a high learning rate. How do I do this in tf.keras?
To modify the learning rate after every epoch, you can use tf.keras.callbacks.LearningRateScheduler as mentioned in the docs here.
But in our case, we need to modify the learning rate after every batch is passed to the model. We'll use tf.keras.optimizers.schedules.LearningRateSchedule for this purpose, which modifies the learning rate at each step, i.e. at each gradient update.
Suppose I have 100 samples in my training dataset and my batch size is 5. The number of steps will be 100 / 5 = 20 steps per epoch. In other words, in a single epoch, 20 batches are passed to the model and 20 gradient updates occur.
Using the code given in the docs,
import tensorflow as tf

batch_size = 5
num_train_samples = 100

class MyLRSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):

    def __init__(self, initial_learning_rate):
        self.initial_learning_rate = initial_learning_rate

    def __call__(self, step):
        # step is the global step (gradient update) counter, counted across epochs.
        return self.initial_learning_rate / (step + 1)

optimizer = tf.keras.optimizers.SGD(learning_rate=MyLRSchedule(0.1))
The value of step will go from 0 to 19 for the 1st epoch in our example. For the 2nd epoch, it will go from 20 to 39, and so on. For your use-case, we can modify the above like this:
batch_size = 5
num_train_samples = 100
num_steps = num_train_samples // batch_size   # 20 steps (batches) per epoch

class MyLRSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):

    def __init__(self, initial_learning_rate):
        self.initial_learning_rate = initial_learning_rate

    def __call__(self, step):
        # Position of the current step within its epoch: 0, 1, ..., num_steps - 1
        # (equivalent to step % num_steps).
        step_in_epoch = step - ((step // num_steps) * num_steps)
        # Update the LR according to step_in_epoch, e.g. grow it linearly within the epoch:
        return self.initial_learning_rate * tf.cast(step_in_epoch + 1, tf.float32)

optimizer = tf.keras.optimizers.SGD(learning_rate=MyLRSchedule(0.1))
The value of step_in_epoch will go from 0 to 19 for the 1st epoch, from 0 to 19 again for the 2nd epoch, and likewise for every epoch. Update the LR accordingly (the linear growth above is just one example).
Make sure that num_train_samples is evenly divisible by the batch size; otherwise the last batch of each epoch is smaller and adds one extra step, which makes the per-epoch step calculation slightly off.
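For completeness, here is how the schedule could be wired into training (a minimal sketch with made-up data; the model, x_train and y_train below are placeholders, not part of the original question):

import numpy as np

x_train = np.random.rand(num_train_samples, 10).astype("float32")
y_train = np.random.randint(0, 2, size=(num_train_samples,))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer=optimizer, loss="binary_crossentropy")

# With batch_size = 5 and 100 samples, every epoch runs 20 steps,
# so step_in_epoch in the schedule cycles through 0..19 each epoch.
model.fit(x_train, y_train, batch_size=batch_size, epochs=3)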
I'm working on a simple classification problem. I proceeded through the example and created a model.
I arranged the label column as given below.
label 0 1 1 0 0 1
Then I wanted to test the model on some samples, but it outputs the value as a percentage (a probability).
I expect it to give one of the 2 class values, either 0 or 1.
Example code:
input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}
predictions = reloaded_model.predict(input_dict)
prob = tf.nn.sigmoid(predictions[0])
print(
    "This particular pet had a %.1f percent probability "
    "of getting adopted." % (100 * prob)
)
What code will result in 0 and 1?
Thank you.
What to do depends on how your model was constructed. With only two labels you are doing binary classification. If the last dense layer of your model has 1 neuron, then it is set up for binary classification. In that case your loss function in model.compile should be
loss=tf.keras.losses.BinaryCrossentropy()
Model.predict in that case will produce a single probability value as output. You can just use an if statement to determine the class: if the probability is less than 0.5 it is one class, if greater than or equal to 0.5 it is the other class. Alternatively, you may have constructed your model so that the last dense layer has 2 neurons. In that case you should be using either sparse_categorical_crossentropy (if the labels are integers) or categorical_crossentropy (if the labels are one-hot encoded) as your loss function. Model.predict in this case will produce two probabilities as the output, and you want to select the index with the highest probability as the class.
You can do that with predicted_class = np.argmax(predictions) (note that class is a reserved word in Python, so use a different variable name).
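A small self-contained sketch of both cases (the prob and scores values are made up, standing in for the output of model.predict):

import numpy as np

# Case 1: the last Dense layer has 1 neuron and the loss is BinaryCrossentropy.
# predict() gives one probability per sample; threshold it at 0.5.
prob = 0.83                                  # e.g. the sigmoid output for one sample
predicted_class = 1 if prob >= 0.5 else 0    # -> 1

# Case 2: the last Dense layer has 2 neurons and the loss is
# (sparse_)categorical_crossentropy. predict() gives two scores per sample;
# take the index of the largest one.
scores = np.array([0.17, 0.83])              # e.g. the softmax output for one sample
predicted_class = int(np.argmax(scores))     # -> 1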
I am following Andrew Ng's course on Deep Learning. One programming assignment uses the SIGN dataset. As far as I know, each image is 64 by 64 pixels in width and height, with another dimension of 3 that corresponds to the RGB channels.
According to the author, the value of
n_x = num_px * num_px * 3 = 64 * 64 * 3 = 12288
and we have the following data:
number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
The part that I do not understand is when the author initializes the weights: he says that the shape of W1 (an array of weights) is
W1 : [25, 12288]
This part I do not get: why 25 as the number of rows? I understand that the number of columns corresponds to n_x, but what does the 25 refer to? Is it the number of neurons inside a hidden layer?
Thanks
It looks like 12288 is the number of input nodes and 25 is the number of nodes in the hidden layer.
Thus, the number of weights is 25 * 12288 (each node in layer i is connected to each node in layer i+1), and that is the size of the matrix.
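A quick shape check with NumPy (a sketch with random values, just to show how the dimensions line up in the forward pass):

import numpy as np

n_x = 64 * 64 * 3            # 12288 input features per image
m = 1080                     # number of training examples
n_h1 = 25                    # neurons in the first hidden layer

X = np.random.rand(n_x, m)         # X_train shape: (12288, 1080)
W1 = np.random.randn(n_h1, n_x)    # W1 shape: (25, 12288)
b1 = np.zeros((n_h1, 1))

Z1 = W1 @ X + b1                   # (25, 12288) @ (12288, 1080) -> (25, 1080)
print(Z1.shape)                    # (25, 1080): one 25-dimensional activation per example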
Currently I’m struggling to improve the results on a semantic segmentation problem using deeplabV3+ trained on my own dataset.
I’ve trained deeplabV3+ successfully a few times using different pretrained models from the model zoo, all based on xception_65, but my results keep staying in the same mIOU range, somewhere around the interval [10, 11].
I have only one GPU at my disposal with 11GB GPU memory.
My dataset has 8 classes with various object sizes, from little to big, and is quite unbalanced.
Here are the label weights: [1, 4, 4, 17, 42, 36, 19, 20].
In my dataset I have 757 instances for training and 100 for validation.
When training, the general tendency is that the loss decreases for the first 10k iterations and then it only oscillates.
I’ve tried:
adjusting parameters like the learning rate, last_layer_gradient_multiplier, and weight decay
training on various crop sizes: 321, 513, 769
weighting the classes using the above weights in this formula (a more compact equivalent is sketched right after it)
weights = (tf.to_float(tf.equal(scaled_labels, 0)) * 1 +
           tf.to_float(tf.equal(scaled_labels, 1)) * 4 +
           tf.to_float(tf.equal(scaled_labels, 2)) * 4 +
           tf.to_float(tf.equal(scaled_labels, 3)) * 17 +
           tf.to_float(tf.equal(scaled_labels, 4)) * 42 +
           tf.to_float(tf.equal(scaled_labels, 5)) * 36 +
           tf.to_float(tf.equal(scaled_labels, 6)) * 19 +
           tf.to_float(tf.equal(scaled_labels, 7)) * 20 +
           tf.to_float(tf.equal(scaled_labels, ignore_label)) * 0.0)
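For reference, an equivalent and more compact way to build the same weight map (just a sketch; scaled_labels and ignore_label come from the DeepLab training code, and class_weights holds the values listed above):

# Look the per-class weight up in a table instead of summing masked terms.
class_weights = tf.constant([1., 4., 4., 17., 42., 36., 19., 20.])
valid_mask = tf.to_float(tf.not_equal(scaled_labels, ignore_label))
# Map the ignore label to class 0 so tf.gather stays in range, then zero it out again.
safe_labels = tf.where(tf.equal(scaled_labels, ignore_label),
                       tf.zeros_like(scaled_labels), scaled_labels)
weights = tf.gather(class_weights, safe_labels) * valid_mask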
I’ve trained without fine-tuning the batch normalization parameters (fine_tune_batch_norm = False), although I also tried training those parameters (fine_tune_batch_norm = True) with a 321 crop size in order to be able to fit a batch size of 12 on my GPU.
The point is that I need some tips to figure out what I can do to improve those results.
What do you guys think? Do I need more data or better hardware in order to increase my mIOU?
I have the following model file from LIBSVM:
svm_type c_svc
kernel_type linear
nr_class 2
total_sv 3
rho 0.0666415
label 1 -1
nr_sv 2 1
SV
0.004439511653718091 1:4.5 2:0.5
0.07111595083031433 1:2 2:2
-0.07555546248403242 1:-0.5 2:-2.5
My question is how do I figure out the weight vector from this information?
The coefficients of the support vectors are the first numbers on each of the support vector lines (the last three lines). Despite using a linear kernel, libsvm is built for general kernel SVMs, so it isn't storing a weight vector and bias explicitly.
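For a linear kernel you can still recover them yourself: the weight vector is the coefficient-weighted sum of the support vectors, and the bias is -rho. A small sketch using the numbers from the model file above:

import numpy as np

# sv_coef (alpha_i * y_i) and support vectors, copied from the model file
sv_coef = np.array([0.004439511653718091, 0.07111595083031433, -0.07555546248403242])
svs = np.array([[4.5, 0.5],
                [2.0, 2.0],
                [-0.5, -2.5]])

w = sv_coef @ svs     # weight vector: sum_i (alpha_i * y_i) * x_i  -> ~[0.200, 0.333]
b = -0.0666415        # bias = -rho
print(w, b)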
If you know you want a linear kernel, and you want that information, you can use liblinear (from the same folks as libsvm). Given this trivial data:
1 1:1 2:1
0 1:-1 2:-1
you can get this model, which has explicit weight and bias:
solver_type L2R_L2LOSS_SVC_DUAL
nr_class 2
label 1 0
nr_feature 2
bias -1
w
0.4327936
0.4327936
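With those weights, predicting a point is just the sign of a dot product (a minimal sketch; bias -1 in the header indicates liblinear was run without a bias term, so there is nothing to add):

import numpy as np

w = np.array([0.4327936, 0.4327936])   # weights from the liblinear model above
x = np.array([1.0, 1.0])               # the first training point: 1:1 2:1

decision = w @ x                       # ~0.866; no bias term to add
# The header says "label 1 0", so a positive decision value maps to class 1.
predicted = 1 if decision > 0 else 0
print(decision, predicted)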