I want to search for the best learning rate using the TensorFlow Object Detection API, but I can't find anything for that in the config file. I can add a schedule, but a schedule can't search for the best learning rate.
learning_rate: {
  manual_step_learning_rate {
    initial_learning_rate: 0.003
    schedule {
      step: 6000
      learning_rate: .0003
    }
    schedule {
      step: 12000
      learning_rate: .00003
    }
  }
}
Is there any trick or way to search for the best learning rate?
If you are referring to the learning-rate finder (as described by Smith, for example, here: https://arxiv.org/abs/1803.09820), it seems you can emulate it by using:
learning_rate: {
  exponential_decay_learning_rate {
    initial_learning_rate: 0.004
    decay_steps: 10000
    decay_factor: 1.3
  }
}
with a decay_factor above 1, so the learning rate grows instead of decaying.
You will still have to look at the loss and choose the best learning rate yourself, though.
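As a rough sketch of what this growing schedule does (assuming TF's usual exponential-decay formula, lr = initial_lr * decay_factor ** (step / decay_steps), with staircase behaviour off):

```python
def lr_at_step(step, initial_lr=0.004, decay_factor=1.3, decay_steps=10000):
    # With decay_factor > 1 the "decay" is actually growth: after every
    # decay_steps steps the rate is multiplied by decay_factor.
    return initial_lr * decay_factor ** (step / decay_steps)

for step in (0, 10000, 20000, 30000):
    print(step, lr_at_step(step))
```

Log the training loss against lr_at_step(global_step) and pick the largest learning rate at which the loss is still clearly decreasing, as the learning-rate-finder procedure suggests.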
I want to add an action_recognition calculator node to the pose_landmark detector (pose_landmark_gpu.pbtxt). Does anyone know if there is already a calculator implementation suited to that purpose?
i.e.
Input: pose landmarks
Inference via tflite model
Output: probability values for the respective action classes
I've seen that the original pose landmark detector uses tensors_to_landmarks_calculator.cc. I would need a similar file, but for different input and output types. Any idea if there is a "template" .cc file that I could adapt to my use case?
Just for better understanding, here is my edited pbtxt of the pose_landmark detector with an additional node for action classification:
# GPU buffer. (GpuBuffer)
input_stream: "input_video"
output_stream: "output_video"      # Output image with rendered results. (GpuBuffer)
output_stream: "pose_landmarks"    # Pose landmarks. (NormalizedLandmarkList)
output_stream: "action_detection"  # Action probabilities.

node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:output_video"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
}

# Subgraph that detects poses and corresponding landmarks.
node {
  calculator: "PoseLandmarkGpu"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "LANDMARKS:pose_landmarks"
  output_stream: "DETECTION:pose_detection"
  output_stream: "ROI_FROM_LANDMARKS:roi_from_landmarks"
}

# Subgraph that renders pose-landmark annotation onto the input image.
node {
  calculator: "PoseRendererGpu"
  input_stream: "IMAGE:throttled_input_video"
  input_stream: "LANDMARKS:pose_landmarks"
  input_stream: "ROI:roi_from_landmarks"
  input_stream: "DETECTION:pose_detection"
  output_stream: "IMAGE:output_video"
}

# Subgraph that detects actions from poses.
node {
  calculator: "ActionDetectorGPU"
  input_stream: "LANDMARKS:pose_landmarks"
  output_stream: "ACTION:action_detection"
}
Update
There is an open source project called SigNN that does the same thing as I intend, just for hand pose classification (into American Sign Language letters). I'm going to plow through that...
Here is a more general formulation of a similar problem. There is a solution using MediaPipeUnityPlugin (the same graph would also work in pure MediaPipe, though there was no released driver code at the time of writing this).
I used the tensorflow object detection API.
Here is my environment.
All images are from the COCO API
Tensorflow version: 1.13.1
Tensorboard version: 1.13.1
Number of test images: 3000
Number of train images: 24000
Pre-trained model: SSD MobileNet v2 quantized 300x300 COCO
Number of detection classes: 1 (person)
And here is my train_config.
train_config: {
  batch_size: 6
  optimizer {
    adam_optimizer: {
      learning_rate {
        exponential_decay_learning_rate: {
          initial_learning_rate: 0.000035
          decay_steps: 7
          decay_factor: 0.98
        }
      }
    }
  }
  fine_tune_checkpoint: "D:/TF/models/research/object_detection/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/model.ckpt"
  fine_tune_checkpoint_type: "detection"
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
I can't find an optimal learning rate or appropriate decay steps and factor.
I have run many trainings, but the results are always similar.
How can I fix this?
I have already spent a week on this problem.
In another post, someone recommended adding noise to the dataset (images).
But I don't know what that means.
How can I do that?
I think what was referenced in the other post was to do some data augmentation by adding noisy images to your training dataset, i.e. applying random transformations to your input so that the model generalizes better.
A type of noise that can be used is random Gaussian noise (https://en.wikipedia.org/wiki/Gaussian_noise), which the object-detection API applies per patch.
Although it seems that you have enough training images, it is worth a shot.
The noise option would look like:
...
data_augmentation_options {
  random_horizontal_flip {
  }
}
data_augmentation_options {
  ssd_random_crop {
  }
}
data_augmentation_options {
  random_patch_gaussian {
    # The patch size will be chosen to be in the range
    # [min_patch_size, max_patch_size).
    min_patch_size: 300
    max_patch_size: 300  # if you want the whole image to be noisy
  }
}
...
For the full list of data augmentation options you can check:
https://github.com/tensorflow/models/blob/master/research/object_detection/protos/preprocessor.proto
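To make the idea concrete, here is a simplified NumPy illustration of what patch-Gaussian augmentation does; this is a hypothetical re-implementation for intuition, not the object-detection API's actual code:

```python
import numpy as np

def add_gaussian_patch_noise(image, top, left, patch_size, stddev=0.1, rng=None):
    """Add Gaussian noise to a square patch of a float image in [0, 1].

    Simplified sketch of patch-Gaussian augmentation, not the API's code.
    """
    rng = rng or np.random.default_rng(0)
    noisy = image.copy()
    patch = noisy[top:top + patch_size, left:left + patch_size]
    patch = patch + rng.normal(0.0, stddev, size=patch.shape)
    noisy[top:top + patch_size, left:left + patch_size] = np.clip(patch, 0.0, 1.0)
    return noisy

# With patch_size equal to the image size, the whole image gets noisy,
# matching min_patch_size = max_patch_size = 300 for 300x300 inputs.
image = np.full((300, 300, 3), 0.5, dtype=np.float64)
noisy = add_gaussian_patch_noise(image, top=0, left=0, patch_size=300)
```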
Regarding the learning rate, one common strategy is to try one large learning rate (0.02, for instance) and one very small one, as you have already done. I would recommend trying 0.02, leaving it for a while, or using the exponential decay learning rate, and seeing if the results are better.
Changing the batch_size can also help; try batch_size: 2 instead of 6.
I would also recommend leaving the training running for more steps, until you see no improvement at all, perhaps up to the 200000 steps defined in your configuration.
Some deeper strategies can help the model perform better; they are described in this answer: https://stackoverflow.com/a/61699696/14203615
That being said, if your dataset is correctly made you should get good results on your test set.
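Incidentally, the schedule in the question decays extremely aggressively: decay_steps: 7 means the rate shrinks by a factor of 0.98 every 7 steps. A quick sketch (using the standard exponential-decay formula, assuming staircase behaviour is off) shows the rate is effectively zero long before the 200000 steps are up, which could explain why the runs all look similar:

```python
def lr_at_step(step, initial_lr=3.5e-5, decay_factor=0.98, decay_steps=7):
    # lr(step) = initial_lr * decay_factor ** (step / decay_steps)
    return initial_lr * decay_factor ** (step / decay_steps)

print(lr_at_step(0))     # 3.5e-05
print(lr_at_step(7000))  # on the order of 1e-13: training has effectively stopped
```

Raising decay_steps to a few tens of thousands keeps the rate meaningful throughout training.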
I have a SavedModel with saved_model.pbtxt and variables\ which was pre-trained on a single GPU, from this repo: https://github.com/sthalles/deeplab_v3. I'm trying to serve this SavedModel with tensorflow-serving, but it can only utilise GPU:0 on a multi-GPU machine. I learned from https://github.com/tensorflow/serving/issues/311 that tensorflow-serving loads the graph with tensorflow, and this model was trained on a single GPU. I tried to save the model with the clear_devices=True flag, but it did not help; it still ran on GPU:0.
Then I tried to read the GraphDef in saved_model.pbtxt. From https://www.tensorflow.org/guide/extend/model_files#device I know that the device assigned to a node/operation is defined in its NodeDef.
My problem is that in this saved_model.pbtxt only some operations/nodes have a device assigned in their NodeDef, as device: "/device:CPU:0", while no GPU is specifically assigned anywhere; the operations executed on the GPU have no device field in their NodeDef at all.
I wonder where the device placement information for GPU operations is saved in a SavedModel, and whether I can change the device info in a graph. Thanks for your help.
For example, in this saved_model.pbtxt a CPU op was defined as:
node {
  name: "save/RestoreV2/tensor_names"
  op: "Const"
  device: "/device:CPU:0"
  ...
}
A computation op was:
node {
  name: "resnet_v2_50/block1/unit_1/bottleneck_v2/conv2/kernel/Regularizer/l2_regularizer"
  op: "Mul"
  input: "resnet_v2_50/block1/unit_1/bottleneck_v2/conv2/kernel/Regularizer/l2_regularizer/scale"
  input: "resnet_v2_50/block1/unit_1/bottleneck_v2/conv2/kernel/Regularizer/l2_regularizer/L2Loss"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "_class"
    value {
      list {
        s: "loc:#resnet_v2_50/block1/unit_1/bottleneck_v2/conv2/weights"
      }
    }
  }
  attr {
    key: "_output_shapes"
    value {
      list {
        shape {
        }
      }
    }
  }
}
I am working on a number detector and use the object-detection API from tensorflow. Sometimes the predicted bounding box does not contain the whole number, which then cannot be read. I would like to change the loss function to penalize much more heavily when part of a number is missing than when the predicted bounding box is too large.
I found the definition of IOU in the file utils/np_box_ops.py, but it is probably not used during training. Where can I find the implementation of the loss function used during training?
First of all, keep in mind that there might be a problem with your dataset and/or the model/config you are using, which we can't assess because you didn't share any information about them.
With that said, the available loss functions are defined in:
https://github.com/tensorflow/models/blob/master/research/object_detection/core/losses.py
With corresponding .proto definitions for your config file in:
https://github.com/tensorflow/models/blob/master/research/object_detection/protos/losses.proto
You might be interested in trying WeightedIOULocalizationLoss.
You can also try to adjust the parameter localization_weight in the loss section of your config file:
loss {
  classification_loss {
    weighted_sigmoid {
    }
  }
  localization_loss {
    weighted_smooth_l1 {
    }
  }
  hard_example_miner {
    num_hard_examples: 3000
    iou_threshold: 0.99
    loss_type: CLASSIFICATION
    max_negatives_per_positive: 3
    min_negatives_per_image: 0
  }
  classification_weight: 1.0
  localization_weight: 1.0
}
And, as a little hack, you could try to post-process the predicted boxes by adding a small offset.
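A minimal sketch of that post-processing hack, assuming boxes in the API's normalized [ymin, xmin, ymax, xmax] convention (the function name and margin value are illustrative, not part of the API):

```python
import numpy as np

def expand_boxes(boxes, margin=0.02):
    """Grow normalized [ymin, xmin, ymax, xmax] boxes by a fixed margin.

    Hypothetical post-processing step: enlarging each predicted box slightly
    makes it more likely that a clipped digit is fully enclosed.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    expanded = boxes + np.array([-margin, -margin, margin, margin])
    return np.clip(expanded, 0.0, 1.0)  # keep coordinates inside the image

boxes = expand_boxes([[0.10, 0.20, 0.30, 0.40]], margin=0.02)
```

The trade-off is the one described in the question: a slightly too-large box is still readable, while a too-tight one is not.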
I am working on a binary classifier using Encog (via Java). I have it set up using an SVM or a neural network, and I want to evaluate the quality of the different models using (in part) the area under the ROC curve.
More specifically, I would ideally like to convert the output of the model into some kind of prediction confidence score that can be used for rank ordering in the ROC, but I have yet to find anything in the documentation.
In the code, I get the model results with something like:
MLData result = ((MLRegression) method).compute( pair.getInput() );
String classification = normHelper.denormalizeOutputVectorToString( result )[0];
How do I also get a numerical confidence of the classification?
I have found a way to coax prediction probabilities out of an SVM inside the Encog framework. This method relies on the equivalent of the -b option of libSVM (see http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html).
To do this, override the SVM class from Encog. The constructor will enable probability estimates via the svm_parameter object (see below). Then, when doing the calculation, call the method svm_predict_probability as shown below.
Caveat: below is only a code fragment; to be useful, you will probably need to write other constructors and pass the resulting probabilities out of the methods below. This fragment is based on Encog version 3.3.0.
public class MySVMProbability extends SVM {

    public MySVMProbability(SVM method) {
        super(method.getInputCount(), method.getSVMType(), method.getKernelType());
        // Enable probability estimates (libSVM's -b option)
        getParams().probability = 1;
    }

    @Override
    public int classify(final MLData input) {
        svm_model model = getModel();
        if (model == null) {
            throw new EncogError(
                    "Can't use the SVM yet, it has not been trained, "
                    + "and no model exists.");
        }
        final svm_node[] formattedInput = makeSparse(input);
        final double[] probs = new double[svm.svm_get_nr_class(model)];
        final double d = svm.svm_predict_probability(model, formattedInput, probs);
        /* probabilities for each class are in probs[] */
        return (int) d;
    }

    @Override
    public MLData compute(MLData input) {
        svm_model model = getModel();
        if (model == null) {
            throw new EncogError(
                    "Can't use the SVM yet, it has not been trained, "
                    + "and no model exists.");
        }
        final MLData result = new BasicMLData(1);
        final svm_node[] formattedInput = makeSparse(input);
        final double[] probs = new double[svm.svm_get_nr_class(model)];
        final double d = svm.svm_predict_probability(model, formattedInput, probs);
        /* probabilities for each class are in probs[] */
        result.setData(0, d);
        return result;
    }
}
Encog has no direct support for ROC curves. A ROC curve is more of a visualization than an actual model type, and model types are the primary focus of Encog.
Generating a ROC curve for SVMs and neural networks is somewhat different. For a neural network, you must establish thresholds for the classification neurons. There is a good paper about that here: http://www.lcc.uma.es/~jja/recidiva/048.pdf
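The threshold-sweeping idea is framework-independent; here is a small sketch (in Python rather than Java, purely for illustration) of building ROC points from confidence scores and computing the area under the curve:

```python
def roc_points(scores, labels):
    """Build (FPR, TPR) points by sweeping a decision threshold.

    scores: model confidence per example (higher = more likely positive),
    labels: true binary labels (1 = positive, 0 = negative).
    """
    thresholds = sorted(set(scores), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    # Trapezoidal area under the (FPR, TPR) curve.
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

pts = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
print(auc(pts))  # 1.0 for this perfectly separated toy example
```

The SVM probabilities from the answer above, or a neural network's raw output before thresholding, can serve as the scores.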
I may eventually add direct support for ROC curves to Encog; they are becoming a very common visualization.